Security Overview
How Sophon keeps agents, credentials, and code execution safe.
Sophon gives an autonomous agent real power: tokens for dozens of services, the ability to run untrusted code, and (via Sophon Node) the ability to click buttons on a user's machine. The security model exists to make that power safe by default. It boils down to four risks: credentials the agent must use but never see, irreversible actions that need a human in the loop, untrusted code from the marketplace or self-authored skills, and untrusted text that tries to hijack the agent.
This section is the hub for how Sophon mitigates each of them.
The model at a glance
| Risk | Control | Where it lives |
|---|---|---|
| Raw secrets reaching the LLM | Brokered credential vault | Credential vault + tool executor (see Credential Vault) |
| Sending email, moving money, deleting data | Human approval gates | Approval gate + risk classification (see Approval Gates) |
| Marketplace / custom skill code | Docker + gVisor sandbox | Sandbox policy presets (see Sandbox) |
| Injected instructions in messages, tool output, webhooks | External-content wrapping + detection | External-content wrapper (see Prompt Injection Defense) |
| Request floods | Per-user / per-IP rate limiting | Gateway rate limiter |
| Accountability | Audit logging | Audit trail across every control |
| Adversarial-AI attack surface | MITRE ATLAS technique mapping + residual-risk roadmap | Threat Model |
Credential brokering
The agent never touches a raw token. When the LLM calls a tool, the Tool Executor resolves the call, fetches the token from the vault, makes the API call outside the model context, and returns only the sanitized result. Tokens are encrypted at rest (AES-256-GCM), OAuth 2.1 + PKCE is preferred over static keys, and every access is logged. Sandboxed skills get credentials either through a proxy (the executor makes the call for them) or, opt-in, through a short-lived scoped token injected into the container.
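The brokering flow can be sketched roughly as follows. This is a minimal illustration, not Sophon's implementation: `Vault`, `ToolCall`, `sanitize`, and `execute_tool_call` are hypothetical names, the vault is an unencrypted in-memory stand-in, and the API client is passed in as a plain callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToolCall:
    """What the model emits: no credential fields at all (hypothetical shape)."""
    user_id: str
    service: str
    action: str
    params: dict


class Vault:
    """Hypothetical in-memory stand-in for the encrypted credential vault."""

    def __init__(self) -> None:
        self._tokens: Dict[Tuple[str, str], str] = {}
        self.access_log: List[Tuple[str, str]] = []

    def store(self, user_id: str, service: str, token: str) -> None:
        self._tokens[(user_id, service)] = token

    def fetch(self, user_id: str, service: str) -> str:
        # Every access is recorded before the token is handed out.
        self.access_log.append((user_id, service))
        return self._tokens[(user_id, service)]


def sanitize(value: Any, secret: str) -> Any:
    """Recursively strip any echo of the token from an API result."""
    if isinstance(value, str):
        return value.replace(secret, "[REDACTED]")
    if isinstance(value, dict):
        return {k: sanitize(v, secret) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v, secret) for v in value]
    return value


def execute_tool_call(call: ToolCall, vault: Vault,
                      api_client: Callable[[str, dict, str], Any]) -> Any:
    """Resolve the call, use the token outside the model context,
    and return only the sanitized result."""
    token = vault.fetch(call.user_id, call.service)
    raw = api_client(call.action, call.params, token)
    return sanitize(raw, token)
```

The point of the shape is that the token exists only inside `execute_tool_call`; the value returned to the model has already passed through `sanitize`.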
See Credential Vault and Brokering.
Sandboxed code execution
All user-authored, self-authored, and marketplace skill code runs in Docker containers under the gVisor runtime, with CPU/memory/disk/time limits, an ephemeral workspace, and no outbound network unless the skill's manifest declares an explicit allowlist. Built-in policy presets range from minimal isolation to locked-down (no network, read-only filesystem, no process spawn), and Linux deployments add a Landlock LSM layer. Uploaded skills pass through static analysis, a sandboxed test run, and signature verification before they can load.
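As a rough illustration of how a locked-down preset might translate into container flags, the sketch below builds a `docker run` command line. The Docker flags are standard and `runsc` is the gVisor runtime name, but `SandboxPolicy`, `docker_run_args`, the default limits, and the `sophon-egress` network are assumptions made for this example. Docker has no per-host allowlist flag, so the sketch assumes a separate egress proxy enforces it. Disk and time limits are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SandboxPolicy:
    """Hypothetical policy preset; field names and defaults are illustrative."""
    cpus: float = 1.0
    memory_mb: int = 512
    pids_limit: int = 64
    read_only_rootfs: bool = True
    allowed_hosts: List[str] = field(default_factory=list)


def docker_run_args(image: str, policy: SandboxPolicy) -> List[str]:
    """Build a `docker run` argv for a skill container under gVisor."""
    args = [
        "docker", "run", "--rm",
        "--runtime=runsc",                    # gVisor runtime
        f"--cpus={policy.cpus}",
        f"--memory={policy.memory_mb}m",
        f"--pids-limit={policy.pids_limit}",
        "--tmpfs", "/workspace",              # ephemeral workspace
    ]
    if policy.read_only_rootfs:
        args.append("--read-only")
    if not policy.allowed_hosts:
        # No allowlist declared in the manifest: no network at all.
        args += ["--network", "none"]
    else:
        # Assumption: a pre-created network whose egress proxy enforces the list.
        args += ["--network", "sophon-egress",
                 "--env", "ALLOWED_HOSTS=" + ",".join(policy.allowed_hosts)]
    args.append(image)
    return args
```

Calling `docker_run_args("skill:latest", SandboxPolicy())` yields a command with `--network none`, which matches the default of no outbound network unless the manifest declares an allowlist.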
See Sandbox and Code Execution.
Human approval gates
Before any sensitive or irreversible action, Sophon pauses and asks. Every tool call is risk-classified (None through Critical). Read-only queries auto-approve, while sending email, making a purchase, or running a shell command on a node requires review unless you explicitly raise the auto-approve ceiling for a session. The user sees exactly what will happen and can edit drafts before approving. A timeout counts as a rejection: nothing is ever auto-approved by inaction. Destructive actions are default-deny, and any batch over the bulk threshold is gated.
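The gating rules can be sketched as a small decision function. Only the None and Critical levels are named in this page; the intermediate levels, the bulk threshold of 10, and the names `Risk`, `needs_review`, and `resolve` are assumptions for illustration. The sketch also reads "default-deny" as "always gated, whatever the ceiling".

```python
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    NONE = 0
    LOW = 1        # assumed intermediate level
    MEDIUM = 2     # assumed intermediate level
    HIGH = 3       # assumed intermediate level
    CRITICAL = 4


def needs_review(risk: Risk, ceiling: Risk = Risk.NONE,
                 destructive: bool = False,
                 batch_size: int = 1, bulk_threshold: int = 10) -> bool:
    """Return True when a human must approve the call."""
    # Destructive actions and oversized batches are gated regardless of ceiling.
    if destructive or batch_size > bulk_threshold:
        return True
    # Anything above the session's auto-approve ceiling needs review.
    return risk > ceiling


def resolve(risk: Risk, user_decision: Optional[bool], **policy) -> bool:
    """Final allow/deny. `user_decision` is None when the prompt timed out."""
    if not needs_review(risk, **policy):
        return True
    # A timeout (None) is treated exactly like an explicit rejection.
    return user_decision is True
```

With the default ceiling, `resolve(Risk.HIGH, None)` is denied because inaction never approves, while `resolve(Risk.LOW, None, ceiling=Risk.LOW)` passes because the session ceiling was raised explicitly.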
See Approval Gates.
Prompt-injection defense
Untrusted text — inbound channel messages, tool results, and webhook payloads — is wrapped in external_content boundary markers before it reaches the model. Each block carries a random per-block nonce so content cannot forge a closing marker to "break out," and any literal closing marker inside the content is de-fanged. A curated pattern list flags known injection phrases; on a match Sophon annotates and logs but never blocks, so legitimate text is never dropped. The behavior is configurable under Sophon:PromptInjection.
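A minimal sketch of the wrapping step, under stated assumptions: the `<<<external_content:...>>>` marker syntax, the two sample patterns, and the function name `wrap_external` are invented for illustration and are not Sophon's actual format or pattern list.

```python
import re
import secrets
from typing import Tuple

# Assumed sample patterns; the real curated list is not shown in this page.
INJECTION_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"ignore (all )?(previous|prior) instructions",
    r"disregard (the )?system prompt",
)]

# Matches the start of a literal closing marker inside untrusted content.
CLOSE_MARKER = re.compile(r"<<<end_external_content", re.IGNORECASE)


def wrap_external(text: str, source: str) -> Tuple[str, bool]:
    """Wrap untrusted text in nonce-tagged boundary markers.

    Returns the wrapped block and whether a known pattern matched.
    A match only annotates the block; the content is never dropped.
    """
    nonce = secrets.token_hex(8)  # fresh random nonce per block
    # Defang any forged closing marker so it can no longer end the block.
    safe = CLOSE_MARKER.sub("[[[end_external_content", text)
    flagged = any(p.search(text) for p in INJECTION_PATTERNS)
    note = " flagged=possible-injection" if flagged else ""
    wrapped = (
        f"<<<external_content:{nonce} source={source}{note}>>>\n"
        f"{safe}\n"
        f"<<<end_external_content:{nonce}>>>"
    )
    return wrapped, flagged
```

Because the nonce is generated after the content arrives, the content cannot contain the matching closing marker, and the defanging step removes even a lucky guess at the marker prefix.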
Rate limiting
The Gateway applies global HTTP rate limiting: per-user when authenticated, per-IP otherwise, returning 429 with a Retry-After header. Limits ship as generous flat defaults and are fully tunable through configuration; health, metrics, and hub endpoints are exempt. Cron jobs can opt into random schedule jitter to avoid thundering-herd firing.
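The keying and response behavior can be sketched with a simple fixed-window counter. The algorithm choice, the exempt path prefixes, and the class name `FixedWindowLimiter` are assumptions for this example; the page does not say which algorithm the Gateway uses.

```python
import math
import time
from typing import Callable, Dict, Optional, Tuple

# Assumed path prefixes for the exempt health, metrics, and hub endpoints.
EXEMPT_PREFIXES = ("/health", "/metrics", "/hub")


class FixedWindowLimiter:
    """Illustrative fixed-window limiter keyed per user or per IP."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._windows: Dict[str, Tuple[float, int]] = {}  # key -> (start, count)

    def check(self, user_id: Optional[str], ip: str,
              path: str = "/") -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, response headers) for one request."""
        if path.startswith(EXEMPT_PREFIXES):
            return 200, {}
        # Authenticated requests are budgeted per user, anonymous ones per IP.
        key = f"user:{user_id}" if user_id else f"ip:{ip}"
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired, start a new one
        if count >= self.limit:
            retry_after = max(1, math.ceil(start + self.window - now))
            return 429, {"Retry-After": str(retry_after)}
        self._windows[key] = (start, count + 1)
        return 200, {}
```

Injecting the clock keeps the limiter deterministic under test; a production limiter would also need eviction of stale keys and shared state across Gateway instances, which this sketch leaves out.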
See Rate Limiting.
Audit logging
Every tool call, approval decision, credential access, and admin action is recorded with a timestamp, user and tenant ID, redacted parameters, outcome, and a correlation ID for tracing. On Enterprise this is browsable and queryable for compliance review.
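To make the record shape concrete, here is a hedged sketch of an audit entry serialized as one JSON line. The field names follow the list above, but `AuditRecord`, `redact`, `audit_line`, and the set of sensitive keys are assumptions, not Sophon's actual schema or redaction rules.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Assumed key names whose values are never written to the log.
SENSITIVE_KEYS = {"token", "password", "api_key", "authorization", "secret"}


def redact(params: dict) -> dict:
    """Replace sensitive values, descending into nested dicts."""
    out = {}
    for key, value in params.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            out[key] = redact(value)
        else:
            out[key] = value
    return out


@dataclass
class AuditRecord:
    event_type: str      # e.g. tool_call, approval, credential_access, admin
    user_id: str
    tenant_id: str
    parameters: dict     # already redacted
    outcome: str
    correlation_id: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def audit_line(event_type: str, user_id: str, tenant_id: str, params: dict,
               outcome: str, correlation_id: Optional[str] = None) -> str:
    """Build one JSON line; a correlation ID is generated if none is given."""
    record = AuditRecord(event_type, user_id, tenant_id, redact(params),
                         outcome, correlation_id or str(uuid.uuid4()))
    return json.dumps(asdict(record), sort_keys=True)
```

Passing the same `correlation_id` to the tool call, approval, and credential-access entries is what lets a reviewer trace a single request across controls.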
See Audit Logging.
In this section
- Credential Vault and Brokering — how the agent uses tokens it never sees
- Sandbox and Code Execution — Docker + gVisor isolation and policy presets
- Prompt Injection Defense — wrapping and detecting untrusted text
- Rate Limiting — per-user and per-IP request budgets
- MITRE ATLAS Threat Model — trust boundaries, ATLAS technique coverage, and the hardening roadmap
Related: Approval Gates, Audit Logging, Architecture Overview, Sophon Node.