Sandbox & Code Execution Isolation
How Sophon isolates code execution and risky tools using containers and resource limits.
When an agent needs to run code — a Python data-crunch, a generated C# snippet, a shell command from a marketplace skill — Sophon does not run it directly on the host. Untrusted and generated code goes into an isolated Docker container with resource limits, network policy, and automatic cleanup. The same ISandboxOrchestrator powers both ad-hoc code.execute calls and sandboxed skills.
This page explains the isolation model and contrasts it with running directly on the host.
The three execution tools
Sophon exposes execution through three distinct tools, each with a different risk level. Risk level drives approval gates.
| Tool | What it runs | Where | Risk |
|---|---|---|---|
| code.execute | Python or C# code | Sandboxed container | High |
| os.execute_sandboxed | A shell command, wrapped and run inside the sandbox | Sandboxed container | High |
| os.execute | A shell command (bash, sh, cmd, powershell) | Directly on the host OS | Critical |
The first two are gated at High — they pause and ask a human before running. os.execute is Critical: it touches the real machine with full host privileges, so it always requires explicit approval and confirmation. Prefer the sandboxed path whenever possible; os.execute_sandboxed even tells the model to fall back to os.execute only when Docker is unavailable.
os.execute runs with the same privileges as the Sophon process. Treat every approval for it as you would handing someone a terminal on your server.
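The relationship between tool, risk level, and approval gate can be sketched as a small lookup. This is a minimal illustration, not Sophon's implementation: the tool names and risk levels come from the table above, while `Risk`, `gate_for`, and `pick_shell_tool` are hypothetical names introduced here.

```python
from enum import IntEnum


class Risk(IntEnum):
    # Only the two levels discussed on this page; numeric values are arbitrary.
    HIGH = 3
    CRITICAL = 4


# Tool names and risk levels as listed in the table above.
TOOL_RISK = {
    "code.execute": Risk.HIGH,
    "os.execute_sandboxed": Risk.HIGH,
    "os.execute": Risk.CRITICAL,
}


def gate_for(tool: str) -> dict:
    """Return the (hypothetical) gate requirements for a tool."""
    risk = TOOL_RISK[tool]
    return {
        "risk": risk.name,
        # High and above pause and ask a human before running.
        "needs_approval": risk >= Risk.HIGH,
        # Critical additionally requires an explicit confirmation.
        "needs_confirmation": risk >= Risk.CRITICAL,
    }


def pick_shell_tool(docker_available: bool) -> str:
    """Prefer the sandboxed shell; use the host shell only without Docker."""
    return "os.execute_sandboxed" if docker_available else "os.execute"
```

For example, `gate_for(pick_shell_tool(docker_available=False))` reports a Critical tool that needs both approval and confirmation, which matches the intent that falling back to the host should never be silent.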
What "sandboxed" means
The Docker-based orchestrator (DockerSandboxOrchestrator) creates a fresh container per execution:
- Languages: Python runs on a python:3.13-alpine image; C# runs on the .NET 10 SDK Alpine image via dotnet run. A minimal project file and an offline NuGet config are generated for C# so restores stay local.
- Network policy: the container's network mode is none (fully isolated) unless network access is explicitly required, in which case it switches to bridge. Python that imports third-party packages triggers a pip install, which enables network for that run.
- Resource limits: a memory cap, a CPU quota, and a process-count limit are applied to every container, alongside a hard wall-clock timeout that kills the container if exceeded.
- Cleanup: the per-run workspace directory and the container are removed afterward, success or failure.
- gVisor: where configured, containers run under the gVisor runtime, a user-space kernel that intercepts syscalls so sandboxed code never talks directly to the host kernel.
Limits and knobs
A sandbox run is described by a request object. These are the defaults a single code.execute call uses:
| Knob | Default | Notes |
|---|---|---|
| Timeout | 30 seconds | Container is killed on expiry |
| MemoryLimitBytes | 256 MB | Hard memory cap |
| CpuLimitPercent | 50 | Fraction of one CPU |
| NetworkEnabled | off | none network mode until enabled |
| NetworkAllowList | empty | Optional outbound host allowlist |
| PythonPackages | none | Auto-resolved from imports; enables network for pip |
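To make the defaults concrete, here is a rough model of such a request object. It mirrors the knobs in the table with snake_case field names; the class name `SandboxRequest`, the `code` and `language` fields, and both helper methods are assumptions for illustration. It also assumes "256 MB" means 256 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass, field


@dataclass
class SandboxRequest:
    code: str                                     # hypothetical field
    language: str = "python"                      # hypothetical field
    timeout_seconds: int = 30                     # Timeout
    memory_limit_bytes: int = 256 * 1024 * 1024   # MemoryLimitBytes (assumes MiB)
    cpu_limit_percent: int = 50                   # CpuLimitPercent
    network_enabled: bool = False                 # NetworkEnabled
    network_allow_list: list[str] = field(default_factory=list)  # NetworkAllowList
    python_packages: list[str] = field(default_factory=list)     # PythonPackages

    def effective_network(self) -> bool:
        """Network is on if requested, or if packages must be pip-installed."""
        return self.network_enabled or bool(self.python_packages)

    def cpus(self) -> float:
        """Express the percentage as a fraction of one CPU (50 -> 0.5)."""
        return self.cpu_limit_percent / 100
```

A request built with only `code` set keeps every default from the table, while adding an entry to `python_packages` is enough to flip `effective_network()` to true for that run.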
Skills can declare their own limits through a security policy. The built-in presets give a sense of the range: the standard preset is 512 MB / 1.0 CPU / 256 PIDs / 300 s with no network and a read-only root filesystem; minimal drops to 128 MB / 0.25 CPU / 30 s; and locked-down runs as nobody with a seccomp profile, 256 MB, and a 60 s timeout. Network-enabled presets either open outbound traffic or restrict it to an explicit host allowlist.
Container security policies can also drop all Linux capabilities, mount a read-only root filesystem with only specific writable paths (/workspace, /tmp), and apply a restrictive seccomp profile. The exact policy depends on the preset or per-skill manifest in effect.
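One way to picture how a policy becomes container settings is to translate it into `docker run` flags. The sketch below is illustrative only: Sophon's orchestrator may well use the Docker API rather than the CLI, and `Policy`, `PRESETS`, and `docker_run_args` are hypothetical names. Preset values come from the text above; values the text does not state are left as `None` rather than guessed, and the seccomp profile path is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    memory_mb: int
    timeout_s: int                  # enforced by the caller, not a docker flag
    cpus: Optional[float] = None
    pids: Optional[int] = None
    network: bool = False
    read_only_root: bool = False
    user: Optional[str] = None
    seccomp_profile: Optional[str] = None
    drop_all_caps: bool = False


PRESETS = {
    "standard": Policy(memory_mb=512, timeout_s=300, cpus=1.0, pids=256,
                       read_only_root=True),
    "minimal": Policy(memory_mb=128, timeout_s=30, cpus=0.25),
    # "seccomp.json" is a placeholder path, not a real Sophon file.
    "locked-down": Policy(memory_mb=256, timeout_s=60, user="nobody",
                          seccomp_profile="seccomp.json"),
}


def docker_run_args(policy: Policy, image: str, command: list[str],
                    workspace: str) -> list[str]:
    """Build a docker run argv for the policy (the command is not executed)."""
    args = ["docker", "run", "--rm",
            "--network", "bridge" if policy.network else "none",
            "--memory", f"{policy.memory_mb}m",
            "-v", f"{workspace}:/workspace"]   # writable workspace mount
    if policy.cpus is not None:
        args += ["--cpus", str(policy.cpus)]
    if policy.pids is not None:
        args += ["--pids-limit", str(policy.pids)]
    if policy.read_only_root:
        # Root is read-only; /tmp stays writable through a tmpfs.
        args += ["--read-only", "--tmpfs", "/tmp"]
    if policy.user:
        args += ["--user", policy.user]
    if policy.seccomp_profile:
        args += ["--security-opt", f"seccomp={policy.seccomp_profile}"]
    if policy.drop_all_caps:
        args += ["--cap-drop", "ALL"]
    return args + [image, *command]
```

Calling `docker_run_args(PRESETS["standard"], "python:3.13-alpine", ["python", "/workspace/main.py"], "/tmp/ws")` yields an argv with `--network none`, `--memory 512m`, `--pids-limit 256`, and `--read-only`. The wall-clock timeout is deliberately absent from the flags, since the caller has to enforce it by killing the container.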
Sandboxed vs host execution
| | Sandboxed (Docker) | Host (os.execute) |
|---|---|---|
| Filesystem | Isolated workspace only | Full host filesystem |
| Network | none by default, opt-in bridge | Whatever the host has |
| Resource limits | Memory, CPU, PIDs, timeout | Timeout only |
| Privileges | Reduced, optionally gVisor-isolated | Same as the Sophon process |
| Risk level | High | Critical |
Process fallback
If Docker is not available on the host, the orchestrator can fall back to a process-based runner that executes the code as a child process with a temp workspace and a timeout. This keeps trusted bundled skills working, but it provides reduced isolation — there is no container, no filesystem boundary beyond the temp directory, and the process inherits host network access. Untrusted or network-isolated code should always run under the Docker sandbox.
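A process-based runner of the kind described above might look roughly like this. It is a sketch that assumes a Python payload; the function name `run_in_process` and the shape of the result dict are hypothetical. It shows the two properties the fallback does provide (a temp workspace and a timeout) and, by omission, what it does not.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_process(code: str, timeout_s: float = 30.0) -> dict:
    """Run Python code as a child process in a throwaway workspace."""
    # The directory is deleted when the block exits, on success or failure.
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workspace:
        script = Path(workspace) / "main.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workspace,          # start in the workspace, nothing more
                capture_output=True,
                text=True,
                timeout=timeout_s,      # child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "timed_out": True, "stdout": "", "stderr": ""}
        return {
            "ok": proc.returncode == 0,
            "timed_out": False,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }
```

Nothing here stops the child from reading files outside the workspace or opening network connections, and there is no memory, CPU, or process-count cap. That gap is why this path suits trusted bundled skills only.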
Where to go next
- Approval Gates & Risk Levels — how High and Critical tools get gated
- Skills — how sandboxed skills declare runtime, network, and resource policy
- Claude Code — running coding agents against your environment