MITRE ATLAS Threat Model

How Sophon maps to the MITRE ATLAS framework — system trust boundaries, adversarial-AI technique coverage, a residual-risk roadmap, and detection coverage.

MITRE ATLAS (Adversarial Threat Landscape for AI Systems) is the industry knowledge base of real-world tactics and techniques used against AI-enabled systems — the AI-specific counterpart to MITRE ATT&CK. Because Sophon gives an autonomous agent real capabilities — credentials for dozens of services, untrusted-code execution, and (via Sophon Node) control of a physical desktop — we publish a technique-level threat model so security reviewers can see exactly which adversarial behaviours we model and how each is contained.

This page maps the trust boundaries of the Sophon platform to specific ATLAS techniques, names the controls that mitigate each, and keeps an honest residual-risk roadmap for the gaps we are still closing.

In scope: the Gateway API, agent runtime, messaging channels, tools, Sophon Node, marketplace skills, MCP servers, and multi-tenant isolation. Out of scope: model-vendor / provider-side security, host-OS hardening, physical security, and third-party package supply chains beyond first-party review. Technique IDs were verified against the published MITRE ATLAS matrix (atlas.mitre.org) in June 2026.

How to read this page

Controls are described at capability level and link to the relevant security docs — not to internal source. The residual-risk roadmap lists known limitations deliberately and openly, framed as forward-looking hardening with planned mitigations; it intentionally omits step-by-step exploitable detail. A control is only listed as a mitigation where it is actually implemented today — partial or absent controls live in the roadmap, never in the "control" column.

Each technique entry below carries its ATLAS ID, the concrete way it could manifest in Sophon, the trust boundary it crosses, the mitigating control(s), and a pointer (→ R-NN) to the residual-risk roadmap wherever the control is partial. Where a genuine Sophon threat has no clean ATLAS technique, it is labelled a Sophon-specific extension rather than forced onto an ID.

System and trust boundaries

Sophon's defensible perimeter is the set of points where untrusted data or actors meet trusted execution. Untrusted input (channel messages, fetched web/email content, marketplace skills, MCP tool definitions, webhook payloads) is never given direct authority; it is wrapped, classified, sandboxed, or gated before it can influence a privileged action. The diagram shows the ten boundaries; the table that follows is the authoritative reference.

  UNTRUSTED INPUTS            TRUST BOUNDARY (controls)          TRUSTED ZONE (your infra)
  ----------------           --------------------------         -------------------------
  channel users        -->   (a) ingress wrap + rate limit  --> Gateway API
  webhooks             -->   (i) HMAC verify + SSRF guard    --> Context Assembler
  fetched web / email  -->   (c) external-content wrapping   --> Agent (LLM)
  LLM tool calls       -->   (b) risk classify + approval    --> Tool Executor
  marketplace skills   -->   (d) gVisor sandbox              --> credentials brokered
  MCP servers          -->   (e) allowlist + schema validate --> (h) Credential Vault
  desktop via Node     -->   (f) scope + denylist + consent  --> external APIs / OS
  operators (REST/UI)  -->   (j) SSO + RBAC + audit          --> (g) per-tenant isolation

Boundary	What crosses it	Primary control(s)
(a) Inbound channels	Untrusted user messages (WhatsApp, Telegram, Slack, Discord, Signal, Email, Teams, SMS, Matrix)	External-content wrapping, rate limiting, channel pairing
(b) LLM output → tool calls	Model-proposed actions and parameters	Risk classification + approval gates, tool-loop detection
(c) Tool results / fetched content	Web pages, emails, files, API responses	External-content wrapping with nonce fencing
(d) Marketplace / community skills	Untrusted code	Docker sandbox, gVisor where configured, static analysis, signing
(e) MCP external servers	Untrusted tool definitions and results	Per-agent allowlists, schema validation, inherited risk → approval
(f) Node / desktop commands	OS actions on a paired machine	Permission scopes, shell denylist, on-device consent
(g) Multi-tenant isolation	Cross-tenant data access	Per-tenant context + ORM-level query filters
(h) Credential access	Tokens and secrets	Brokered credential vault — the model never sees raw tokens
(i) Webhooks (in / out)	Inbound triggers and outbound deliveries	HMAC-SHA256 sign/verify, random slugs, SSRF guard
(j) Dashboard / REST API	Authenticated operator actions	SSO/OIDC, RBAC, audit logging

ATLAS technique coverage

The mapping covers 15 of the 16 ATLAS tactics (Command and Control has no Sophon-relevant technique and is not mapped; AI Attack Staging is represented only by RAG Poisoning, since Sophon does not train or host first-party model weights). Each tactic is a subsection with a table of its techniques.

Reconnaissance

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0006` Active Scanning	Crafted messages probe a channel-connected agent to map its tools, skills, and behaviour before attacking	(a)	External-content wrapping treats probes as data; suspicious-pattern logging; rate limiting caps probe volume	→ R-04
`AML.T0064` Gather RAG-Indexed Targets	An adversary works out what documents sit in the agent's retrieval corpus to target poisoning or extraction	(a), (c)	Per-tenant isolation of the corpus; external-content wrapping of retrieved chunks	→ R-02

Resource Development

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0008` Acquire Infrastructure	An attacker stands up a hostile MCP server advertising attractive tools, waiting to be registered	(e)	Per-agent tool allowlists, JSON-schema validation, bridged-tool risk inheritance (Medium → approval)	→ R-01
`AML.T0065` LLM Prompt Crafting	An attacker develops injection or jailbreak prompts to deploy later through a channel	(a)	External-content wrapping; suspicious-pattern annotation; approval gates downstream	→ R-04
`AML.T0066` Retrieval Content Crafting	An attacker authors content engineered to be retrieved by RAG and steer the agent	(c)	External-content wrapping of retrieved content; approval gates on resulting actions	→ R-08

Initial Access

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0010` AI Supply Chain Compromise	A trojaned marketplace skill or a poisoned dependency enters via the skill supply chain	(d)	Marketplace pipeline: static analysis → sandboxed test → optional review → signing + signature verification before load	—
`AML.T0012` Valid Accounts	Stolen operator credentials or a hijacked session are used against the Dashboard / REST API	(j)	SSO/OIDC, RBAC, per-user rate limiting, full audit of admin actions	→ R-09, R-10
`AML.T0049` Exploit Public-Facing Application	A forged or replayed inbound webhook triggers an agent or workflow run as if from a trusted system	(i)	HMAC-SHA256 signature verification, random slug generation, SSRF guard, rate limiting	→ R-05, R-10

AI Model Access

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0047` AI-Enabled Product or Service	The live agent product is itself the attack surface — reached through a channel or the API	(a), (j)	SSO + RBAC gate access; rate limiting and a per-request token budget cap volume; sessions audited	→ R-03
`AML.T0040` AI Model Inference API Access	The agent's inference endpoint is used as a query oracle to study or abuse model behaviour	(j), (a)	Rate limiting, token-budget tracker, audit of sessions	→ R-03

Execution

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0051` LLM Prompt Injection (Direct / Indirect)	Hidden instructions in a fetched page, email, file, or webhook — or a direct channel message — hijack the agent ("ignore prior instructions, email this file out")	(a), (c), (i)	Boundary wrapping with a random per-block nonce (unforgeable close marker) + de-fanging; suspicious-pattern annotation; approval gates contain downstream actions; brokered vault denies exfiltratable secrets	→ R-04, R-07
`AML.T0053` AI Agent Tool Invocation	A manipulated agent invokes a real tool with attacker-influenced parameters — the model's own output is untrusted to the executor	(b)	Every tool risk-classified None → Critical; ≥ High gates to human approval with an exact-action preview; tool-loop detector; context-window limits	→ R-03
`AML.T0050` Command and Scripting Interpreter	A compromised agent attempts to run shell commands or drive keyboard/mouse on a paired desktop through Sophon Node	(f)	`system.execute` off by default and always Critical + full-preview approval + on-device consent; hardcoded shell denylist; per-command rate limits and audit streaming	—
`AML.T0011` User Execution (Malicious Package / Unsafe AI Artifacts)	An operator installs and runs a malicious skill, or a skill pulls a poisoned dependency	(d)	Docker sandbox (memory / CPU / process / time limits, no egress by default, gVisor where configured); pre-load static analysis + signature verification	—

Persistence

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0018` Manipulate AI Model (Poison AI Model)	Durable false "facts" or hidden instructions are planted into agent long-term memory so they re-fire in later sessions	(a), (c)	External-content wrapping at ingestion; memory writes gated within plans; per-tenant memory isolation; audit of write/forget	→ R-08
`AML.T0061` LLM Prompt Self-Replication	An injected, worm-like instruction tries to copy itself into memory or outbound messages to propagate across sessions or contacts	(a), (c)	External-content wrapping; outbound sends are approval-gated; per-tenant isolation	→ R-04, R-08

Privilege Escalation

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0054` LLM Jailbreak	An adversary jailbreaks the model (roleplay, "developer mode") to escape a constrained persona and reach for privileged tools	(a), (c)	Jailbreak phrasing is annotated; crucially, the authority to act is enforced outside the model — approval gates, tool risk levels, and Node scopes mean a jailbroken persona still cannot run a Critical action unattended	→ R-04
Sophon-specific extension — Cross-tenant / RBAC boundary escalation	A user or compromised agent attempts to reach another tenant's data or assume higher privileges than granted	(g), (j)	Per-tenant context + ORM-level global query filters on all tenant-scoped data; vault key scoping; RBAC; tenant-stamped audit	→ R-09

Defense Evasion

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0068` LLM Prompt Obfuscation	Encoding, language switching, or obfuscation is used so injected instructions slip past the suspicious-pattern annotator	(a), (c), (e)	Detection is annotate-and-log by design (never blocks legitimate text); nonce fencing holds regardless of evasion; approval gates remain the real prevention	→ R-04
`AML.T0067` LLM Trusted Output Components Manipulation	Trusted-looking output components (markdown, links, citations) are manipulated to mislead the user or smuggle instructions	(b), (c)	External content is rendered as data, not authority; actions still pass through approval gates	→ R-04

Discovery

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0069` Discover LLM System Information (System Prompt)	Probing to extract the agent's system prompt or system instructions	(a), (e)	Suspicious-pattern annotation; brokered vault means even a leaked prompt exposes no raw secrets	→ R-04, R-01
`AML.T0007` Discover AI Artifacts	A foothold enumerates connected services, registered MCP servers, Node scopes, and tenant resources	(e), (f), (g)	Tenant ORM filters scope what is enumerable; Node scope checks before dispatch; MCP allowlists; tool-call audit	→ R-01, R-02

Collection

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0036` Data from Information Repositories	A compromised agent harvests sensitive data from connected services, repositories, and memory	(c), (g), (h)	Per-tenant isolation; brokered credentials; audit of tool calls with redacted parameters and result hashing	→ R-02
`AML.T0035` AI Artifact Collection	Prompts, configurations, and model outputs are gathered for later exfiltration	(c), (g)	Per-tenant isolation; audit logging	→ R-02

AI Attack Staging

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0070` RAG Poisoning	An attacker seeds the retrieval corpus, documents, or memory so future RAG queries surface attacker instructions or false data	(c), (a)	External-content wrapping of retrieved chunks; per-tenant corpus isolation; approval gates downstream	→ R-08

Exfiltration

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0024` Exfiltration via AI Inference API	The agent's own outputs (chat replies, generated files) are used as a channel to smuggle out collected data	(a), (j)	Outbound-sending tools are ≥ High and approval-gated; audit of tool calls; vault limits which secrets are reachable	→ R-02, R-07
`AML.T0025` Exfiltration via Cyber Means	The agent or a skill is induced to POST collected data to an attacker-controlled endpoint, or to reach internal metadata / private-range targets	(c), (d), (f)	SSRF guard blocks private-range, link-local, and metadata endpoints unless allowlisted; sandbox has no egress by default; network-capable tools are approval-gated	→ R-07, R-02
`AML.T0056` Extract LLM System Prompt	The system prompt is coaxed out and exfiltrated	(a)	Suspicious-pattern annotation; the prompt carries no raw secrets (brokered vault)	→ R-04
`AML.T0057` LLM Data Leakage	The agent is coaxed into revealing sensitive data or stored material in its output	(h), (c)	Brokered vault (no raw tokens to leak); outbound sends approval-gated; audit	→ R-02

Impact

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0034` Cost Harvesting (Agentic Resource Consumption)	Runaway inference or tool loops are driven to burn the operator's model spend	(a), (b)	Per-request token budget tracker, tool-loop detector, context-window limits; rate limiting	→ R-03
`AML.T0029` Denial of AI Service	Inbound channels or the API are flooded to exhaust the agent's availability	(a), (i), (j)	Per-user / per-IP rate limiting (429 + Retry-After); Node per-command token buckets; canvas action limiter; cron jitter	—
`AML.T0031` Erode AI Model Integrity	Sustained memory and RAG poisoning erode the agent's reliability over time without touching model weights	(c), (a)	External-content wrapping; per-tenant isolation; audit of memory writes; human-in-the-loop limits acted-on impact	→ R-08, R-02
`AML.T0048` External Harms	A successful injection or jailbreak makes the agent take a real-world harmful action — a damaging email, a purchase, a public post, a destructive command	(a), (b), (f)	Approval gates with exact-action preview and timeout = reject; default-deny on destructive actions; Node denylist + consent; quiet hours auto-reject	—

Credential Access

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0055` Unsecured Credentials	An adversary seeks tokens or secrets that the agent can reach — by coaxing the model or by reaching the at-rest vault key	(h)	Brokered vault — the LLM never sees raw tokens; AES-256 at rest (DPAPI-bound on Windows, key-file protected elsewhere); per-user / per-tenant isolation; every access audited; Enterprise vault backends (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault)	→ R-06

Lateral Movement

Technique	How it could manifest in Sophon	Boundary	Mitigating control	Residual
`AML.T0052` Phishing (Spearphishing via Social Engineering LLM)	A compromised or injected agent uses connected channels to send spearphishing messages to the user's contacts	(a), (b)	Outbound sends are ≥ High and approval-gated with preview; rate limiting; audit	→ R-04

Residual-risk roadmap

These are known limitations we publish on purpose, framed as a forward-looking hardening roadmap. Each row states the honest current posture and the planned improvement; none include exploitable detail. The roadmap is the living security backlog.

R-ID	Known limitation	Related technique(s)	Current posture	Planned hardening
R-01	Tool-definition validation for untrusted MCP servers is limited	`T0008`, `T0069`, `T0007`	JSON-schema validation, per-server exclusions, per-agent allowlists, bridged-tool risk inheritance (Medium → approval)	Stronger attestation of advertised MCP tool definitions, trust tiers for MCP sources, and surfaced diff-on-change review
R-02	No built-in DLP / PII detection across audit logs, memory, and tool results	`T0064`, `T0036`, `T0035`, `T0024`, `T0025`, `T0057`, `T0031`, `T0007`	Parameter redaction and result hashing in audit; per-tenant isolation	Optional DLP / PII classifier on tool results, memory writes, and audit detail with configurable redaction policies
R-03	No per-model cost cap — only a global token budget	`T0047`, `T0040`, `T0053`, `T0034`	Per-request token-budget tracker and tool-loop detector	Per-model and per-tenant spend caps with alerting, enforced at the routing layer
R-04	Prompt-injection / jailbreak detections are logged in-memory, not persisted to the audit trail	`T0006`, `T0065`, `T0051`, `T0054`, `T0068`, `T0067`, `T0069`, `T0056`, `T0061`, `T0052`	Annotate-and-log detection (never blocks legitimate text); nonce-fenced wrapping always applied	Persist detection events into the immutable audit trail and SIEM stream as a first-class event type
R-05	Webhook replay-protection is referenced but not yet verified / enforced	`T0049`	HMAC-SHA256 signature verification, random slugs, delivery backoff	An enforced timestamp / nonce replay window with constant-time validation and audited rejections
R-06	On macOS and Linux the local vault key is protected by a key file on disk rather than the OS keyring	`T0055`	Windows binds the key to the OS user account (DPAPI); AES-256 at rest with per-user / per-tenant scoping; Enterprise vault backends (Vault, AWS SM, Azure KV) remove the local-key dependence	Native Keychain (macOS) and libsecret (Linux) integration for the local vault key
R-07	Exfiltration to an attacker-controlled public endpoint is not detectable today (no egress DLP)	`T0051`, `T0024`, `T0025`	SSRF guard blocks private / link-local / metadata targets; sandbox has no egress by default; network tools approval-gated	Optional egress allowlisting / inspection and DLP-on-egress for sandbox and tool network calls
R-08	Memory poisoning has no automated sanitization	`T0018`, `T0061`, `T0066`, `T0070`, `T0031`	External-content wrapping at ingestion; memory writes gated within plans and audited	Automated memory sanitization / quarantine and provenance scoring before durable writes
R-09	Fine-grained RBAC role definitions are Enterprise-only / partial	`T0012`, cross-tenant escalation	SSO/OIDC, per-tenant isolation via ORM filters, baseline roles	Broader, customizable fine-grained RBAC roles beyond the Enterprise baseline
R-10	API auth-failure and webhook-signature-mismatch logging is partial	`T0012`, `T0049`	Immutable audit for covered actions; SIEM streaming	Complete auth-failure and signature-mismatch audit coverage with rate / anomaly alerting

Detection and logging coverage

Sophon's audit trail records tool calls, approval decisions, credential access, Node commands, memory operations, MCP connections, authentication, and webhook activity. The table maps what is logged today to the ATLAS tactic it helps surface, and states the gaps plainly.

Signal logged today	ATLAS tactic it surfaces	Coverage / gap
Tool calls (name, parameter hash, result hash, risk, duration)	Execution, Collection, Exfiltration	Parameters are hashed, so content-level exfiltration / PII is not visible → R-02
Approval requested / approved / edited / rejected / timed-out	Privilege Escalation, Impact	Strong — the primary prevention signal
Credential access (vault)	Credential Access	Logged; no anomaly alerting on access patterns yet
Node per-command audit (scope, accepted / rejected.* / completed / failed)	Execution, Impact	Strong and granular (denylist / permission / rate-limit reasons captured)
Memory write / forget / reindex	Persistence, AI Attack Staging	No content-sanitization signal → R-08; no PII signal → R-02
MCP connect / disconnect / tool bridge	Resource Development, Discovery, AI Model Access	Tool-definition trust not validated → R-01
Authentication (login success / failure, SSO, token use)	Initial Access	Auth-failure coverage is partial → R-10
Webhook create / delete / delivery	Initial Access, Persistence	Signature-mismatch logging partial → R-10; replay not enforced → R-05
Prompt-injection / jailbreak detections	Reconnaissance, Execution, Defense Evasion	In-memory only — not yet in the audit trail → R-04 (the biggest detection gap)
Model inference as an attack surface	AI Model Access, Exfiltration	Sessions are logged operationally, but inference-as-attack and data egress are not first-class detections → R-07, R-02

The three headline gaps are stated above as R-04 (ephemeral injection detection), R-02 / R-07 (no DLP or egress visibility), and R-10 (incomplete auth-failure and signature-mismatch logging). All three are tracked in the roadmap.

Responsible disclosure

If you discover a vulnerability or a new threat that affects Sophon, please report it to security@buildersoft.io before disclosing publicly. Reproducible reports with a clear impact statement are appreciated and acknowledged.

Please do not include working exploit detail in public issues or messages. Share proof-of-concept material privately through the disclosure address above so a fix can ship before details circulate.

This threat model is a living document: the residual-risk roadmap is the security backlog, reviewed as controls land and as the MITRE ATLAS matrix evolves.

Security Overview — the hub for Sophon's security model
Credential Vault and Brokering — how the agent uses tokens it never sees
Sandbox and Code Execution — Docker + gVisor isolation and policy presets
Prompt Injection Defense — wrapping and detecting untrusted text
Rate Limiting — per-user and per-IP request budgets
Approval Gates — human-in-the-loop risk classification
Audit Logging — the accountability trail
Sophon Node and Permissions & Scopes — desktop-control security

MITRE ATLAS Threat ModelNEW