Security Overview

Sophon gives an autonomous agent real power: tokens for dozens of services, the ability to run untrusted code, and (via Sophon Node) the ability to click buttons on a user's machine. The security model exists to make that power safe by default. It boils down to four risks: credentials the agent must use but never see, irreversible actions that need a human in the loop, untrusted code from the marketplace or self-authored skills, and untrusted text that tries to hijack the agent.

This section is the hub for how Sophon hardens against each of them.

The model at a glance

Risk	Control	Where it lives
Raw secrets reaching the LLM	Brokered credential vault	Credential vault + tool executor — Credential Vault
Sending email, money, deletes	Human approval gates	Approval gate + risk classification — Approval Gates
Marketplace / custom skill code	Docker + gVisor sandbox	Sandbox policy presets — Sandbox
Injected instructions in messages, tool output, webhooks	External-content wrapping + detection	External-content wrapper — Prompt Injection Defense
Request floods	Per-user / per-IP rate limiting	Gateway rate limiter
Accountability	Audit logging	Audit trail across every control
Adversarial-AI attack surface	MITRE ATLAS technique mapping + residual-risk roadmap	Threat Model

Credential brokering

The agent never touches a raw token. When the LLM calls a tool, the Tool Executor resolves the call, fetches the token from the vault, makes the API call outside the model context, and returns only the sanitized result. Tokens are encrypted at rest (AES-256-GCM), OAuth 2.1 + PKCE is preferred over static keys, and every access is logged. Sandboxed skills get credentials either through a proxy (the executor makes the call for them) or, opt-in, through a short-lived scoped token injected into the container.
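The brokering flow above can be illustrated with a short Python sketch. It is not Sophon's implementation, and every class and function name in it (`ToolCall`, `Vault`, `ToolExecutor`, `sanitize`) is hypothetical. The model only ever produces a tool name and arguments. The executor looks up the token, calls the API outside the model context, and strips or scrubs anything token-shaped before the result goes back. Encryption at rest and OAuth are out of scope here, so an in-memory dict stands in for the vault.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict

log = logging.getLogger("audit.credentials")

@dataclass(frozen=True)
class ToolCall:
    """What the LLM emits: a tool name and arguments, never a token."""
    tool: str
    args: Dict[str, str]

class Vault:
    """Hypothetical in-memory stand-in for the encrypted credential vault."""
    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = dict(secrets)

    def get(self, service: str, user_id: str) -> str:
        # Every credential access is logged.
        log.info("credential access service=%s user=%s", service, user_id)
        return self._secrets[service]

# An API client receives the token plus arguments and returns a raw response.
ApiClient = Callable[[str, Dict[str, str]], Dict[str, object]]

def sanitize(raw: Dict[str, object], token: str) -> Dict[str, object]:
    """Drop auth-looking fields and scrub any echoed token before the LLM sees it."""
    clean: Dict[str, object] = {}
    for key, value in raw.items():
        if key.lower() in {"authorization", "access_token", "token"}:
            continue
        if isinstance(value, str):
            value = value.replace(token, "[REDACTED]")
        clean[key] = value
    return clean

class ToolExecutor:
    def __init__(self, vault: Vault, clients: Dict[str, ApiClient],
                 services: Dict[str, str]) -> None:
        self._vault = vault
        self._clients = clients    # tool name -> API client
        self._services = services  # tool name -> vault service key

    def execute(self, call: ToolCall, user_id: str) -> Dict[str, object]:
        token = self._vault.get(self._services[call.tool], user_id)
        raw = self._clients[call.tool](token, call.args)  # runs outside the model context
        return sanitize(raw, token)
```

With a fake calendar client that echoes its token back, the model sees `{"events": "standup 9am", "debug": "used [REDACTED]"}`:

```python
def fake_calendar(token, args):
    return {"events": "standup 9am", "access_token": token, "debug": "used " + token}

executor = ToolExecutor(Vault({"gcal": "tok-123"}),
                        {"calendar.list": fake_calendar}, {"calendar.list": "gcal"})
result = executor.execute(ToolCall("calendar.list", {"day": "today"}), user_id="u1")
```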

See Credential Vault and Brokering.

Sandboxed code execution

All user-authored, self-authored, and marketplace skill code runs in Docker containers under the gVisor runtime, with CPU/memory/disk/time limits, an ephemeral workspace, and no outbound network unless the skill's manifest declares an explicit allowlist. Built-in policy presets range from minimal isolation to locked-down (no network, read-only filesystem, no process spawn), and Linux deployments add a Landlock LSM layer. Uploaded skills pass through static analysis, a sandboxed test run, and signature verification before they can load.
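The following Python sketch shows how a policy preset could be turned into a `docker run` command line under the gVisor runtime. The preset names come from the text above, but their values, the `sophon-egress` network, and the function names are hypothetical. The Docker flags themselves (`--runtime=runsc`, `--cpus`, `--memory`, `--tmpfs`, `--network=none`, `--read-only`, `--pids-limit`) are standard. Using a PID limit of 1 to block process spawning is a rough approximation. A wall-clock time limit would be enforced by the caller, for example with `subprocess.run(args, timeout=...)`.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class SandboxPolicy:
    cpus: float
    memory_mb: int
    network: bool
    read_only_root: bool
    allow_spawn: bool

# Hypothetical values for the two ends of the preset range named in the docs.
PRESETS = {
    "minimal": SandboxPolicy(cpus=2.0, memory_mb=1024, network=True,
                             read_only_root=False, allow_spawn=True),
    "locked-down": SandboxPolicy(cpus=0.5, memory_mb=256, network=False,
                                 read_only_root=True, allow_spawn=False),
}

def docker_run_args(image: str, preset: str,
                    egress_allowlist: Optional[List[str]] = None) -> List[str]:
    p = PRESETS[preset]
    args = ["docker", "run", "--rm",
            "--runtime=runsc",                   # gVisor runtime
            f"--cpus={p.cpus}", f"--memory={p.memory_mb}m",
            "--tmpfs", "/workspace:rw,size=64m"]  # ephemeral workspace
    if not p.network or not egress_allowlist:
        # No outbound network unless the manifest declares an allowlist.
        args.append("--network=none")
    else:
        # Hypothetical network whose proxy enforces egress_allowlist (not shown).
        args.append("--network=sophon-egress")
    if p.read_only_root:
        args.append("--read-only")
    if not p.allow_spawn:
        args.append("--pids-limit=1")            # approximates "no process spawn"
    args.append(image)
    return args
```

Even the `minimal` preset stays offline when the manifest lists no allowed hosts. This follows the default-deny rule for egress described above.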

See Sandbox and Code Execution.

Human approval gates

Before any sensitive or irreversible action, Sophon pauses and asks. Every tool call is risk-classified (None through Critical). Read-only queries auto-approve. Sending email, making a purchase, or running a shell command on a node requires review by default, and the only way to skip that review is to explicitly raise the auto-approve ceiling for a session. The user sees exactly what will happen and can edit drafts before approving. A timeout counts as a rejection, so nothing is ever auto-approved through inaction. Destructive actions are default-deny, and any batch larger than the bulk threshold is gated.
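The gating rules above fit in a small Python decision function. This is a sketch under stated assumptions, not Sophon's code. The intermediate levels `LOW`, `MEDIUM`, and `HIGH` are assumed, because the docs name only None and Critical. The default bulk threshold of 10 is hypothetical. The sketch also assumes that destructive actions stay gated even when the ceiling is raised. A queue with a timeout models the "timeout counts as a rejection" rule.

```python
import queue
from enum import IntEnum

class Risk(IntEnum):
    NONE = 0      # read-only queries
    LOW = 1       # LOW/MEDIUM/HIGH are assumed intermediate levels
    MEDIUM = 2
    HIGH = 3      # e.g. sending email, purchases, node shell commands
    CRITICAL = 4

def needs_approval(risk: Risk, destructive: bool, batch_size: int,
                   ceiling: Risk = Risk.NONE, bulk_threshold: int = 10) -> bool:
    if destructive:                  # default-deny: always gated (assumed to ignore the ceiling)
        return True
    if batch_size > bulk_threshold:  # bulk operations are always gated
        return True
    return risk > ceiling            # auto-approve only at or below the session ceiling

def await_decision(decisions: "queue.Queue[bool]", timeout_s: float) -> bool:
    """Block until the user answers; if no answer arrives in time, reject."""
    try:
        return decisions.get(timeout=timeout_s)
    except queue.Empty:
        return False
```

Because `await_decision` returns `False` on timeout, an unattended approval request fails closed instead of open.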

See Approval Gates.

Prompt-injection defense

Untrusted text — inbound channel messages, tool results, and webhook payloads — is wrapped in external_content boundary markers before it reaches the model. Each block carries a random per-block nonce so content cannot forge a closing marker to "break out," and any literal closing marker inside the content is de-fanged. A curated pattern list flags known injection phrases; on a match Sophon annotates and logs but never blocks, so legitimate text is never dropped. The behavior is configurable under Sophon:PromptInjection.
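The wrapping and detection steps described above can be sketched in Python. The marker syntax, the three sample patterns, and the `wrap_external` name are all assumptions, and the real curated pattern list and configuration keys under `Sophon:PromptInjection` are not reproduced here. The sketch shows three things: a random nonce on each block, de-fanging of any literal closing marker, and annotate-and-log behavior that never drops content.

```python
import logging
import re
import secrets
from typing import List, Tuple

log = logging.getLogger("prompt_injection")

# A few illustrative patterns; the real curated list is not shown here.
PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
)]

def wrap_external(content: str, source: str) -> Tuple[str, List[str]]:
    nonce = secrets.token_hex(8)  # random per block, so content cannot predict the closer
    # De-fang any literal closing marker so content cannot end the block early.
    safe = re.sub(r"</\s*external_content", "&lt;/external_content",
                  content, flags=re.IGNORECASE)
    hits = [p.pattern for p in PATTERNS if p.search(content)]
    header = f'<external_content source="{source}" nonce="{nonce}"'
    if hits:
        # Annotate and log, but never block: the text is still passed through.
        header += ' suspicious="true"'
        log.warning("possible injection from %s: %s", source, hits)
    wrapped = f'{header}>\n{safe}\n</external_content nonce="{nonce}">'
    return wrapped, hits
```

A message that contains both an injection phrase and a forged closing tag is still delivered in full. It is flagged, and the wrapper's own marker is the only real closing tag left in the output.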

See Prompt Injection Defense.

Rate limiting

The Gateway applies global HTTP rate limiting: per-user when the request is authenticated, per-IP otherwise. A request over the limit receives HTTP 429 with a Retry-After header. The defaults are generous flat limits, every limit is tunable in configuration, and the health, metrics, and hub endpoints are exempt. Separately, cron jobs can opt into random schedule jitter so that jobs sharing a schedule do not all fire at the same instant (a thundering herd).
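This Python sketch shows one way to implement the keying, 429, and exemption behavior described above. It uses a token bucket, which is an assumption, since the docs do not name the Gateway's algorithm. The capacity and refill defaults, the exempt path prefixes, and all names are hypothetical. An injectable clock keeps the example deterministic, and `jittered_delay` shows the cron jitter idea.

```python
import math
import random
import time
from typing import Callable, Dict, Optional, Tuple

EXEMPT_PREFIXES = ("/health", "/metrics", "/hub")  # hypothetical paths for the exempt endpoints

class RateLimiter:
    """Token bucket per client key: 'user:<id>' when authenticated, else 'ip:<addr>'."""
    def __init__(self, capacity: int = 100, refill_per_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last timestamp)

    def check(self, path: str, user_id: Optional[str], ip: str) -> Tuple[int, Dict[str, str]]:
        if path.startswith(EXEMPT_PREFIXES):
            return 200, {}
        key = f"user:{user_id}" if user_id else f"ip:{ip}"
        now = self.clock()
        tokens, last = self._buckets.get(key, (float(self.capacity), now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_s)
        if tokens < 1:
            self._buckets[key] = (tokens, now)
            # Seconds until one full token has refilled, rounded up.
            retry = math.ceil((1 - tokens) / self.refill_per_s)
            return 429, {"Retry-After": str(retry)}
        self._buckets[key] = (tokens - 1, now)
        return 200, {}

def jittered_delay(base_s: float, max_jitter_s: float) -> float:
    """Cron opt-in: add random jitter so jobs sharing a schedule do not fire together."""
    return base_s + random.uniform(0, max_jitter_s)
```

Keying on `user:` first means an authenticated user's budget follows them across IP addresses. Anonymous traffic is limited by source address instead.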

See Rate Limiting.

Audit logging

Every tool call, approval decision, credential access, and admin action is recorded with a timestamp, user and tenant ID, redacted parameters, outcome, and a correlation ID for tracing. On Enterprise, the audit trail can be browsed and queried for compliance review.
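The following Python sketch shows an audit record carrying the fields listed above. The field names, the sensitive-key list, and the JSON serialization are assumptions rather than Sophon's actual schema. Redaction happens when the record is constructed, so unredacted parameters never reach the serialized form. The redaction here is shallow and only checks top-level keys.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

SENSITIVE_KEYS = {"password", "token", "api_key", "authorization", "secret"}  # hypothetical

def redact(params: Dict[str, Any]) -> Dict[str, Any]:
    """Mask values of sensitive top-level keys (nested dicts are not inspected)."""
    return {k: ("[REDACTED]" if k.lower() in SENSITIVE_KEYS else v) for k, v in params.items()}

@dataclass
class AuditRecord:
    event: str       # e.g. "tool_call", "approval", "credential_access", "admin"
    user_id: str
    tenant_id: str
    params: Dict[str, Any]
    outcome: str
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        self.params = redact(self.params)  # redact before the record can be stored

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

Passing an existing request ID as `correlation_id` lets the tool call, its approval decision, and its credential access be joined into a single trace.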

See Audit Logging.

In this section

Credential Vault and Brokering — how the agent uses tokens it never sees
Sandbox and Code Execution — Docker + gVisor isolation and policy presets
Prompt Injection Defense — wrapping and detecting untrusted text
Rate Limiting — per-user and per-IP request budgets
MITRE ATLAS Threat Model — trust boundaries, ATLAS technique coverage, and the hardening roadmap
