
Security Overview

How Sophon keeps agents, credentials, and code execution safe.

Sophon gives an autonomous agent real power: tokens for dozens of services, the ability to run untrusted code, and (via Sophon Node) the ability to click buttons on a user's machine. The security model exists to make that power safe by default. It boils down to four risks: credentials the agent must use but never see, irreversible actions that need a human in the loop, untrusted code from the marketplace or self-authored skills, and untrusted text that tries to hijack the agent.

This section is the hub for how each of those is hardened.

The model at a glance

| Risk | Control | Where it lives |
|---|---|---|
| Raw secrets reaching the LLM | Brokered credential vault | Credential vault + tool executor (see Credential Vault) |
| Sending email, money, deletes | Human approval gates | Approval gate + risk classification (see Approval Gates) |
| Marketplace / custom skill code | Docker + gVisor sandbox | Sandbox policy presets (see Sandbox) |
| Injected instructions in messages, tool output, webhooks | External-content wrapping + detection | External-content wrapper (see Prompt Injection Defense) |
| Request floods | Per-user / per-IP rate limiting | Gateway rate limiter |
| Accountability | Audit logging | Audit trail across every control |
| Adversarial-AI attack surface | MITRE ATLAS technique mapping + residual-risk roadmap | Threat Model |

Credential brokering

The agent never touches a raw token. When the LLM calls a tool, the Tool Executor resolves the call, fetches the token from the vault, makes the API call outside the model context, and returns only the sanitized result. Tokens are encrypted at rest with AES-256-GCM, OAuth 2.1 with PKCE is preferred over static keys, and every vault access is logged. Sandboxed skills get credentials in one of two ways: through a proxy, where the executor makes the call on their behalf, or, if the skill opts in, through a short-lived scoped token injected into the container.
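The brokering flow can be sketched as follows. This is a minimal illustration, not Sophon's Tool Executor: `Vault`, `execute_tool`, `sanitize`, and the `call_api` callback are hypothetical names, decryption is omitted (a real vault would decrypt the AES-256-GCM ciphertext inside `fetch`), and sanitization is reduced to dropping auth-looking fields and scrubbing any echoed token.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical shape of an upstream API call: (token, params) -> raw response.
ApiCall = Callable[[str, dict], dict]


@dataclass
class Vault:
    """Stand-in for the encrypted credential store (decryption omitted)."""
    _tokens: dict[str, str]
    access_log: list[tuple[str, str]] = field(default_factory=list)

    def fetch(self, user_id: str, service: str) -> str:
        # Every access is recorded before the secret leaves the vault.
        self.access_log.append((user_id, service))
        return self._tokens[f"{user_id}:{service}"]


def sanitize(response: dict, token: str) -> dict:
    """Drop auth-looking fields and scrub any echoed token from string values."""
    clean = {}
    for key, value in response.items():
        if key.lower() in {"authorization", "access_token", "refresh_token"}:
            continue
        if isinstance(value, str):
            value = value.replace(token, "[REDACTED]")
        clean[key] = value
    return clean


def execute_tool(vault: Vault, user_id: str, service: str,
                 params: dict, call_api: ApiCall) -> dict:
    # The token lives only inside this function; the model sees the return value.
    token = vault.fetch(user_id, service)
    raw = call_api(token, params)
    return sanitize(raw, token)
```

The design point is that the only value crossing back into the model context is the output of `sanitize`, so even an upstream API that echoes credentials cannot leak them into the prompt.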

See Credential Vault and Brokering.

Sandboxed code execution

All user-authored, self-authored, and marketplace skill code runs in Docker containers under the gVisor runtime, with CPU/memory/disk/time limits, an ephemeral workspace, and no outbound network unless the skill's manifest declares an explicit allowlist. Built-in policy presets range from minimal isolation to locked-down (no network, read-only filesystem, no process spawn), and Linux deployments add a Landlock LSM layer. Uploaded skills pass through static analysis, a sandboxed test run, and signature verification before they can load.
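A policy preset ultimately becomes container runtime flags. The sketch below builds a `docker run` argument list using real Docker options (`--runtime=runsc` selects gVisor, plus `--cpus`, `--memory`, `--pids-limit`, `--tmpfs`, `--read-only`, `--network=none`), but the `SandboxPolicy` fields, preset names, and numbers are hypothetical. Docker has no per-host egress allowlist, so a manifest allowlist would need an egress proxy or firewall rule that this sketch omits, and the wall-clock limit would be enforced by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    cpus: float
    memory: str        # Docker size string, e.g. "256m"
    scratch: str       # tmpfs size for the ephemeral workspace, e.g. "64m"
    pids_limit: int    # a very low value stops the skill from spawning processes
    read_only: bool    # read-only root filesystem
    network: bool      # False -> no network at all; allowlists need a proxy


# Hypothetical presets: names and numbers are illustrative, not Sophon's defaults.
PRESETS = {
    "minimal": SandboxPolicy(cpus=2.0, memory="1g", scratch="512m",
                             pids_limit=256, read_only=False, network=True),
    "locked_down": SandboxPolicy(cpus=0.5, memory="256m", scratch="64m",
                                 pids_limit=1, read_only=True, network=False),
}


def docker_run_args(image: str, policy: SandboxPolicy) -> list[str]:
    """Build a `docker run` argv for one skill invocation under gVisor."""
    args = [
        "docker", "run", "--rm",          # container is discarded after the run
        "--runtime=runsc",                # gVisor's OCI runtime
        f"--cpus={policy.cpus}",
        f"--memory={policy.memory}",
        f"--pids-limit={policy.pids_limit}",
        "--tmpfs", f"/workspace:rw,size={policy.scratch}",  # ephemeral workspace
    ]
    if policy.read_only:
        args.append("--read-only")
    if not policy.network:
        args.append("--network=none")
    args.append(image)
    return args
```

Keeping presets as frozen data and rendering flags in one function makes it easy to audit exactly which isolation a given preset grants.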

See Sandbox and Code Execution.

Human approval gates

Before any sensitive or irreversible action, Sophon pauses and asks. Every tool call is risk-classified, from None through Critical. Read-only queries auto-approve, while sending email, making a purchase, or running a shell command on a node requires review by default; the only way to skip that review is to explicitly raise the auto-approve ceiling for a session. The user sees exactly what will happen and can edit drafts before approving. A timeout counts as a rejection, so nothing is ever auto-approved by inaction. Destructive actions are default-deny, and any batch larger than the bulk threshold is gated.
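The gating decision can be sketched as a comparison against a ceiling. This is a minimal sketch under several assumptions: five risk tiers (only None and Critical are named above), a hypothetical bulk threshold of 10, and the interpretation that destructive and bulk actions stay gated even when a session's ceiling is raised. `needs_approval` and `resolve` are illustrative names.

```python
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    # None and Critical come from the docs; the middle tiers are assumed.
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


DEFAULT_CEILING = Risk.NONE  # by default only read-only calls auto-approve
BULK_THRESHOLD = 10          # hypothetical value


def needs_approval(risk: Risk, batch_size: int = 1,
                   ceiling: Risk = DEFAULT_CEILING,
                   destructive: bool = False) -> bool:
    """True if the call must wait for a human decision."""
    if destructive or batch_size > BULK_THRESHOLD:
        return True  # assumed: gated regardless of the session ceiling
    return risk > ceiling


def resolve(user_decision: Optional[bool]) -> bool:
    """None means the approval window timed out, which counts as a rejection."""
    return user_decision is True
```

Treating "no answer" as `False` in `resolve` is what makes the gate fail closed.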

See Approval Gates.

Prompt-injection defense

Untrusted text — inbound channel messages, tool results, and webhook payloads — is wrapped in external_content boundary markers before it reaches the model. Each block carries a random per-block nonce so content cannot forge a closing marker to "break out," and any literal closing marker inside the content is de-fanged. A curated pattern list flags known injection phrases; on a match Sophon annotates and logs but never blocks, so legitimate text is never dropped. The behavior is configurable under Sophon:PromptInjection.
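The wrapping step can be sketched like this. The marker syntax, the `wrap_external` helper, and the three patterns are hypothetical stand-ins; Sophon's real marker format and curated pattern list live in its configuration under Sophon:PromptInjection. The sketch shows the three properties described above: a fresh random nonce per block, de-fanging of literal closing markers, and annotate-only detection.

```python
import re
import secrets

CLOSE_TAG = "</external_content"

# A tiny illustrative subset, not Sophon's curated list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]


def wrap_external(content: str, source: str) -> tuple[str, list[str]]:
    """Wrap untrusted text in nonce-tagged markers; return (wrapped, pattern hits)."""
    nonce = secrets.token_hex(8)  # fresh per block, so it cannot be guessed
    # De-fang literal closing markers: "</external_content" -> "<\/external_content".
    safe = re.sub(re.escape(CLOSE_TAG), lambda m: "<\\/" + m.group(0)[2:],
                  content, flags=re.IGNORECASE)
    hits = [p.pattern for p in INJECTION_PATTERNS if p.search(content)]
    header = f'<external_content source="{source}" nonce="{nonce}"'
    if hits:
        header += ' suspicious="true"'  # annotate only; content is never dropped
    wrapped = f'{header}>\n{safe}\n</external_content nonce="{nonce}">'
    return wrapped, hits
```

Because the real closing marker carries a nonce the content never saw, a forged `</external_content ...>` inside a message cannot end the block early.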

See Prompt Injection Defense.

Rate limiting

The Gateway applies global HTTP rate limiting: per-user when authenticated, per-IP otherwise, returning 429 with a Retry-After header. Limits are generous flat defaults, fully config-tunable, and health, metrics, and hub endpoints are exempt. Cron jobs can opt into random schedule jitter to avoid thundering-herd firing.
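The keying, 429 response, and exemption rules can be sketched as below. The docs do not say which algorithm the Gateway uses, so this substitutes a simple fixed-window counter; the exempt path prefixes, `FixedWindowLimiter`, and `jittered_fire_time` are hypothetical. The clock is injectable so the behavior can be tested deterministically.

```python
import math
import random
import time
from typing import Callable, Optional

# Hypothetical prefixes standing in for the health, metrics, and hub endpoints.
EXEMPT_PREFIXES = ("/health", "/metrics", "/hub")


class FixedWindowLimiter:
    """Counts requests per key in fixed windows of `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._windows: dict[str, tuple[float, int]] = {}  # key -> (start, count)

    def check(self, path: str, user_id: Optional[str],
              ip: str) -> tuple[int, dict[str, str]]:
        """Return (HTTP status, extra headers) for one incoming request."""
        if path.startswith(EXEMPT_PREFIXES):
            return 200, {}
        # Authenticated callers are limited per user, anonymous ones per IP.
        key = f"user:{user_id}" if user_id else f"ip:{ip}"
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # previous window expired
        if count >= self.limit:
            retry_after = max(1, math.ceil(start + self.window_s - now))
            return 429, {"Retry-After": str(retry_after)}
        self._windows[key] = (start, count + 1)
        return 200, {}


def jittered_fire_time(scheduled: float, max_jitter_s: float,
                       rng: Optional[random.Random] = None) -> float:
    """Delay a cron firing by a random 0..max_jitter_s seconds."""
    rng = rng or random.Random()
    return scheduled + rng.uniform(0.0, max_jitter_s)
```

A fixed window is the simplest scheme to reason about; a production limiter might prefer a sliding window or token bucket to smooth bursts at window edges.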

See Rate Limiting.

Audit logging

Every tool call, approval decision, credential access, and admin action is recorded with a timestamp, user and tenant ID, redacted parameters, outcome, and a correlation ID for tracing. On Enterprise this is browsable and queryable for compliance review.
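One audit record might be serialized roughly as follows. The fields mirror the list above, but the JSON layout, the `SENSITIVE_KEYS` set, and the `audit_event` helper are assumptions for illustration, not Sophon's schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical set of parameter names whose values are never written to the log.
SENSITIVE_KEYS = {"password", "token", "api_key", "authorization", "secret"}


def redact(params: dict) -> dict:
    """Replace sensitive values, recursing into nested dicts."""
    out = {}
    for key, value in params.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            out[key] = redact(value)
        else:
            out[key] = value
    return out


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str
    user_id: str
    tenant_id: str
    event_type: str      # e.g. "tool_call", "approval", "credential_access", "admin"
    params: dict
    outcome: str
    correlation_id: str  # ties together every event from one request


def audit_event(user_id: str, tenant_id: str, event_type: str, params: dict,
                outcome: str, correlation_id: Optional[str] = None) -> str:
    """Build one JSON audit line with redacted parameters."""
    event = AuditEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user_id,
        tenant_id=tenant_id,
        event_type=event_type,
        params=redact(params),
        outcome=outcome,
        correlation_id=correlation_id or str(uuid.uuid4()),
    )
    return json.dumps(asdict(event), sort_keys=True)
```

Redacting before the record is built, rather than when it is displayed, means secrets never reach log storage at all.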

See Audit Logging.


Related: Approval Gates, Audit Logging, Architecture Overview, Sophon Node.