Sandbox & Code Execution Isolation

How Sophon isolates code execution and risky tools using containers and resource limits.

When an agent needs to run code — a Python data-crunch, a generated C# snippet, a shell command from a marketplace skill — Sophon does not run it directly on the host. Untrusted and generated code goes into an isolated Docker container with resource limits, network policy, and automatic cleanup. The same ISandboxOrchestrator powers both ad-hoc code.execute calls and sandboxed skills.

This page explains the isolation model and contrasts it with running directly on the host.

The three execution tools

Sophon exposes execution through three distinct tools, each with a different risk level. Risk level drives approval gates.

Tool	What it runs	Where	Risk
`code.execute`	Python or C# code	Sandboxed container	High
`os.execute_sandboxed`	A shell command, wrapped and run inside the sandbox	Sandboxed container	High
`os.execute`	A shell command (bash, sh, cmd, powershell)	Directly on the host OS	Critical

The first two are gated at High: they pause and ask a human before running. os.execute is Critical: it touches the real machine with full host privileges, so it always requires explicit approval and confirmation. Prefer the sandboxed path whenever possible. os.execute_sandboxed itself tells the model to fall back to os.execute only when Docker is unavailable.

os.execute runs with the same privileges as the Sophon process. Treat every approval for it as you would handing someone a terminal on your server.
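
The gating rules above can be pictured as a small lookup. This is an illustrative sketch only, not Sophon's implementation: the `Risk` enum, `TOOL_RISK`, `approval_steps`, and `pick_shell_tool` are hypothetical names, and only the tool names and risk levels come from the table.

```python
from enum import Enum


class Risk(Enum):
    HIGH = "High"
    CRITICAL = "Critical"


# Tool names and risk levels as listed in the table above.
TOOL_RISK = {
    "code.execute": Risk.HIGH,
    "os.execute_sandboxed": Risk.HIGH,
    "os.execute": Risk.CRITICAL,
}


def approval_steps(tool: str) -> list:
    """Return the human steps a call must pass before it runs (hypothetical model)."""
    risk = TOOL_RISK[tool]
    steps = ["approve"]          # High and Critical tools both pause for a human
    if risk is Risk.CRITICAL:
        steps.append("confirm")  # Critical additionally needs explicit confirmation
    return steps


def pick_shell_tool(docker_available: bool) -> str:
    """Prefer the sandboxed tool; use the host tool only when Docker is missing."""
    return "os.execute_sandboxed" if docker_available else "os.execute"
```

The point of the sketch is that the choice of tool, not the content of the command, determines how much human review a call receives.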

What "sandboxed" means

The Docker-based orchestrator (DockerSandboxOrchestrator) creates a fresh container per execution:

Languages — Python runs on a python:3.13-alpine image; C# runs on the .NET 10 SDK Alpine image via dotnet run. A minimal project file and an offline NuGet config are generated for C# so restores stay local.
Network policy — the container's network mode is none (fully isolated) unless network access is explicitly required, in which case it switches to bridge. Python that imports third-party packages triggers a pip install, which enables network for that run.
Resource limits — memory cap, CPU quota, and a process-count limit are applied to every container, alongside a hard wall-clock timeout that kills the container if exceeded.
Cleanup — the per-run workspace directory and the container are removed afterward, success or failure.
gVisor — where configured, containers run under the gVisor runtime, a user-space kernel that intercepts syscalls so sandboxed code never talks directly to the host kernel.
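
To make the list above concrete, here is a rough sketch of how those settings could map onto standard `docker run` flags. It is not the DockerSandboxOrchestrator's actual code, which may well talk to the Docker API rather than the CLI. The function name, its parameters, and the example values for the PID limit and script path are assumptions. `runsc` is the runtime name gVisor is conventionally registered under.

```python
def docker_run_args(image, command, *, memory_bytes, cpus, pids_limit,
                    network_enabled=False, runtime=None):
    """Build a `docker run` argument list for one isolated execution (sketch)."""
    args = [
        "docker", "run",
        "--rm",                                   # remove the container afterward
        "--network", "bridge" if network_enabled else "none",
        "--memory", str(memory_bytes),            # memory cap, in bytes
        "--cpus", str(cpus),                      # CPU quota, in cores
        "--pids-limit", str(pids_limit),          # process-count limit
    ]
    if runtime:
        args += ["--runtime", runtime]            # e.g. "runsc" for gVisor
    return args + [image, *command]


# Example: an isolated Python run. The PID limit and script path are illustrative.
args = docker_run_args(
    "python:3.13-alpine",
    ["python", "/workspace/main.py"],
    memory_bytes=256 * 1024 * 1024,
    cpus=0.5,
    pids_limit=64,
)
```

The wall-clock timeout is not a `docker run` flag. In a sketch like this the caller would enforce it, for example by killing the container once the deadline passes.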

Limits and knobs

A sandbox run is described by a request object. These are the defaults a single code.execute call uses:

Knob	Default	Notes
`Timeout`	30 seconds	Container is killed on expiry
`MemoryLimitBytes`	256 MB	Hard memory cap
`CpuLimitPercent`	50	Percent of one CPU (50 = half a core)
`NetworkEnabled`	off	`none` network mode until enabled
`NetworkAllowList`	empty	Optional outbound host allowlist
`PythonPackages`	none	Auto-resolved from imports; enables network for pip

Skills can declare their own limits through a security policy. The built-in presets give a sense of the range. The standard preset is 512 MB / 1.0 CPU / 256 PIDs / 300 s with no network and a read-only root filesystem. The minimal preset drops to 128 MB / 0.25 CPU / 30 s, and locked-down runs as nobody with a seccomp profile, 256 MB, and a 60 s timeout. Network-enabled presets either open outbound traffic or restrict it to an explicit host allowlist.

Container security policies can also drop all Linux capabilities, mount a read-only root filesystem with only specific writable paths (/workspace, /tmp), and apply a restrictive seccomp profile. The exact policy depends on the preset or per-skill manifest in effect.
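
The sketch below shows one way the presets and hardening options described here could be represented and translated into standard Docker flags. It is an assumption-laden illustration, not Sophon's policy schema: the `Preset` type, its field names, the `seccomp.json` path, and the `drop_all_caps` flag are hypothetical. Values the text does not state for a preset are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Preset:
    memory_mb: int
    timeout_s: int
    cpus: Optional[float] = None            # None = not stated in the text above
    pids: Optional[int] = None
    read_only_root: bool = False
    user: Optional[str] = None
    seccomp_profile: Optional[str] = None   # path to a profile file (hypothetical)
    drop_all_caps: bool = False


PRESETS = {
    "standard": Preset(memory_mb=512, timeout_s=300, cpus=1.0, pids=256,
                       read_only_root=True),
    "minimal": Preset(memory_mb=128, timeout_s=30, cpus=0.25),
    "locked-down": Preset(memory_mb=256, timeout_s=60, user="nobody",
                          seccomp_profile="seccomp.json"),
}


def hardening_flags(preset: Preset, workspace_dir: str) -> List[str]:
    """Translate the hardening part of a preset into `docker run` flags."""
    flags: List[str] = []
    if preset.drop_all_caps:
        flags += ["--cap-drop", "ALL"]
    if preset.read_only_root:
        # Read-only root, with /workspace and /tmp as the only writable paths.
        flags += ["--read-only",
                  "--tmpfs", "/tmp",
                  "--volume", f"{workspace_dir}:/workspace:rw"]
    if preset.user:
        flags += ["--user", preset.user]
    if preset.seccomp_profile:
        flags += ["--security-opt", f"seccomp={preset.seccomp_profile}"]
    return flags
```

Keeping the preset as data and the flag translation as a separate function means a per-skill manifest could override individual fields without touching how the flags are produced.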

Sandboxed vs host execution

	Sandboxed (Docker)	Host (`os.execute`)
Filesystem	Isolated workspace only	Full host filesystem
Network	`none` by default, opt-in `bridge`	Whatever the host has
Resource limits	Memory, CPU, PIDs, timeout	Timeout only
Privileges	Reduced, optionally gVisor-isolated	Same as the Sophon process
Risk level	High	Critical

Process fallback

If Docker is not available on the host, the orchestrator can fall back to a process-based runner that executes the code as a child process with a temp workspace and a timeout. This keeps trusted bundled skills working, but it provides reduced isolation — there is no container, no filesystem boundary beyond the temp directory, and the process inherits host network access. Untrusted or network-isolated code should always run under the Docker sandbox.
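
A minimal sketch of what such a process-based runner amounts to, assuming Python code and the standard library only. The function name and result shape are hypothetical. The sketch also shows why the isolation is weaker: the child gets a temp directory as its working directory, but nothing stops it from reading other paths or using the network.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_process(code: str, timeout_seconds: float = 30.0) -> dict:
    """Run Python code as a child process in a throwaway workspace (no container)."""
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workspace:
        script = Path(workspace) / "main.py"
        script.write_text(code, encoding="utf-8")
        try:
            done = subprocess.run(
                [sys.executable, str(script)],
                cwd=workspace,            # working directory only, not a boundary
                capture_output=True,
                text=True,
                timeout=timeout_seconds,  # child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return {"exit_code": None, "stdout": "", "stderr": "", "timed_out": True}
        return {"exit_code": done.returncode, "stdout": done.stdout,
                "stderr": done.stderr, "timed_out": False}
    # The temp workspace is removed when the `with` block exits, success or failure.
```

For example, `run_in_process("print(2 + 2)")` returns a result whose `stdout` contains `4`, while code that outlives the timeout comes back with `timed_out` set to `True`.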

Where to go next

Approval Gates & Risk Levels — how High and Critical tools get gated
Skills — how sandboxed skills declare runtime, network, and resource policy
Claude Code — running coding agents against your environment
