Common Issues
Quick fixes for the problems most Sophon users hit — connection failures, provider errors, sandbox issues, Node pairing, MCP handshake.
This is a triage checklist for the things that go wrong most often. For each symptom, the fix. If none of these apply, check Diagnostic Logs or reach out via Support.
Gateway won't start
Symptom: sophon exits immediately, or the Docker container restarts in a loop.
Check, in order:
- Port in use — default is 8080. lsof -i :8080 (or netstat -ano | findstr 8080 on Windows) shows what's holding it.
- Database unreachable — if using Postgres, verify connectivity: psql "<connection-string>" -c "SELECT 1".
- Invalid appsettings.json — JSON syntax error, bad config path. Look for the first error in startup logs.
- License expired (Enterprise) — license check runs early; expired license with no grace period blocks startup. Renew or reset tier to Pro temporarily.
- Vector DB unreachable (Pro/Enterprise) — Qdrant down blocks memory init. Check QDRANT_URL.
Fast test: run docker logs sophon-gateway, or read ~/.sophon/logs/gateway-YYYYMMDD.log.
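Where lsof or netstat is not available, the port check can be approximated with a short script. This is a minimal sketch, not part of Sophon: the function name port_in_use is made up for illustration, and it assumes the Gateway listens on the local loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8080 is the default Gateway port named in the checklist above.
    print("8080 in use:", port_in_use(8080))
```

Unlike lsof, this only reports that a listener exists; it does not say which process owns the port.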
Can't log in / authentication loop
Symptom: Dashboard keeps redirecting to login; CLI says "Unauthorized."
Check:
- Time drift — JWTs are time-sensitive. Run ntpdate or an equivalent to sync the clock.
- Cookie / CORS — if Dashboard URL differs from API URL, confirm Sophon:Cors:AllowedOrigins includes the Dashboard URL.
- Expired token — CLI tokens expire after 30 days by default. Run sophon login again.
- SSO misconfiguration (Enterprise) — check OIDC discovery URL is reachable and the authority matches the token's iss claim.
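To see whether a token is expired or carries an unexpected issuer, its payload can be decoded locally. The sketch below relies only on the standard JWT layout (three base64url segments, with exp and iss claims) and does not verify the signature. The helper names jwt_claims and diagnose are hypothetical, and the issuer you pass in is whatever authority your deployment is configured with.

```python
import base64
import json
import time
from typing import List, Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def diagnose(token: str, expected_issuer: str, now: Optional[float] = None) -> List[str]:
    """Return human-readable problems found in the exp and iss claims."""
    now = time.time() if now is None else now
    claims = jwt_claims(token)
    problems = []
    exp = claims.get("exp")
    if exp is not None and exp <= now:
        problems.append(f"token expired {int(now - exp)}s ago")
    if claims.get("iss") != expected_issuer:
        problems.append(f"iss is {claims.get('iss')!r}, expected {expected_issuer!r}")
    return problems
```

An empty list from diagnose means neither claim explains the failure, which points back at clock drift or CORS. Treat the decoded claims as diagnostic output only, since nothing here checks the signature.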
Test provider connection fails
Symptom: Adding an LLM provider (Anthropic, OpenAI, …) and clicking Test returns an error.
Check:
- API key is correct — paste into a quick curl test against the provider directly.
- Outbound network — Gateway needs egress to api.anthropic.com / api.openai.com / etc. Corporate firewalls may block these; add allowlist entries.
- Rate limit already exceeded — the first call might return 429 if you've been testing from other tools. Wait a minute.
- Wrong region (Azure OpenAI, Google) — Azure OpenAI endpoints are region-specific; Gemini has separate endpoints (generativelanguage.googleapis.com vs aiplatform.googleapis.com).
Logs will show the full HTTP response from the provider.
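The direct test against the provider can also be scripted instead of run through curl. The sketch below is an illustration, not Sophon code: it assumes the public model-listing endpoints of OpenAI and Anthropic are an acceptable probe, and the function names build_probe and probe are hypothetical. Azure OpenAI and Google are left out because their URLs depend on your region and project.

```python
import urllib.error
import urllib.request


def build_probe(provider: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that lists models, which only needs a valid key."""
    if provider == "openai":
        return urllib.request.Request(
            "https://api.openai.com/v1/models",
            headers={"Authorization": f"Bearer {api_key}"},
        )
    if provider == "anthropic":
        return urllib.request.Request(
            "https://api.anthropic.com/v1/models",
            headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"},
        )
    raise ValueError(f"no probe defined for {provider!r}")


def probe(provider: str, api_key: str, timeout: float = 10.0) -> int:
    """Send the probe and return the HTTP status code."""
    try:
        with urllib.request.urlopen(build_probe(provider, api_key), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers still prove the network path works.
        return err.code
```

A 200 means the key and the network path are fine, a 401 points at the key, and a 429 at rate limiting. If probe raises urllib.error.URLError instead of returning a status, the request never reached the provider, which matches the outbound network item.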
Chat hangs — agent status stuck on "thinking"
Symptom: A chat message sits with an AgentStatus: thinking event and never completes.
Check:
- LLM provider latency — check provider status pages. Anthropic / OpenAI occasionally have slow responses.
- Task queue backed up — Operations → System. If queue depth is high, either pause / drain or increase MaxConcurrentTasks.
- Tool call hanging — if the agent called a slow tool (web fetch, browser automation), tool execution can exceed its budget. Tasks → <task> shows current tool + duration. Cancel the task; on the next turn the agent will handle the "previous tool hung" state gracefully.
- Budget exceeded — the pipeline short-circuits quietly. Check Settings → Models → Budget usage.
Skill execution errors
"Sandbox not available"
Docker isn't running or the sandbox image isn't pulled. Run:
docker ps # Is Docker up?
docker pull sophon/sandbox:latest
"Permission denied" in skill logs
The skill tried to write to a path outside its sandbox. Check the skill's manifest declares the required paths under sandbox.writablePaths.
"Network access denied"
Default sandbox has network disabled. Add "network": true and a "networkAllowlist" to the manifest.
Python import errors
Required packages aren't in the sandbox image. Either add them via pipDeps in the manifest or rebuild the sandbox image with the dependencies pre-installed.
Workflow not firing on cron
Symptom: Workflow is active, cron job exists, schedule time passes, nothing happens.
Check:
- Scheduler running — Operations → System → Scheduler status. Restart if degraded.
- Cron job enabled — sometimes paused inadvertently. sophon cron list shows status.
- Cron expression timezone — default is UTC unless otherwise specified. 0 9 * * * means 9am UTC, not local.
- Previous run still in flight — by default, concurrent executions are blocked. Check Workflows → Runs for a stuck run.
- Misfire — if the Gateway was down when the job should have fired, the misfire policy may have skipped it. Check Cron → History.
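The timezone item is the easiest to misjudge, so it can help to convert the UTC schedule to local wall-clock time before concluding that a job did not fire. A minimal sketch, assuming a daily schedule and a fixed UTC offset (so it ignores daylight saving changes); utc_cron_to_local is a hypothetical helper, not a Sophon command.

```python
from datetime import datetime, timedelta, timezone


def utc_cron_to_local(minute: int, hour: int, utc_offset_hours: float) -> str:
    """Return the HH:MM local time for a daily cron firing at hour:minute UTC."""
    # The date is arbitrary; only the time of day matters for a daily schedule.
    fire_utc = datetime(2024, 1, 15, hour, minute, tzinfo=timezone.utc)
    local = fire_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%H:%M")


# "0 9 * * *" is minute 0, hour 9 in UTC.
print(utc_cron_to_local(0, 9, -5))   # 04:00 at UTC-5
print(utc_cron_to_local(0, 9, 5.5))  # 14:30 at UTC+5:30
```

If the job is meant to run at 9am local time, either shift the hour field by the offset or set the job's timezone explicitly.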
Approval requests not arriving on mobile
Symptom: Dashboard shows pending approvals; mobile doesn't get pushes.
Check:
- Device registered — mobile: More → Notifications. Should show "Registered" with a device name.
- approvalRequests category enabled — same screen.
- Quiet hours — if in quiet hours, pushes are suppressed (Critical are auto-rejected).
- Expo Push credentials — Admin → Settings → Push Notifications. Check delivery activity for errors.
- Notification permission on device — iOS: Settings → Sophon → Notifications. Android: same path.
Sophon Node
"Waiting for approval" forever
Ticket expired (15 minutes default), or the Dashboard didn't receive the pair request. Regenerate pairing credentials.
"Failed to connect to Gateway"
WebSocket or TLS issue:
- Gateway URL reachable from the Node machine? curl <gateway>/health
- Self-signed cert? Install in OS trust store.
- Firewall blocking WebSockets? Fall back to HTTP polling transport.
Commands time out
- Node online but scopes missing. Check Settings → Devices → <node> → Permissions.
- macOS Accessibility not granted — grant for sophon-node in System Settings.
- Linux Wayland — ydotool daemon not running, or compositor blocks automation.
MCP handshake errors
"Invalid token" from external MCP server
The token in env.GITHUB_PERSONAL_ACCESS_TOKEN (or similar) isn't what the server expects. Test the token manually against the service's API before blaming MCP.
"Tool not found" when agent tries to use a bridged tool
The external server connected at startup but has since dropped. Check Settings → MCP → Connections → <server> → Activity for the last heartbeat, then restart the server.
Claude Desktop doesn't see Sophon tools
claude_desktop_config.json may have a typo, or the stdio subprocess fails to start. Run sophon mcp serve --stdio manually in a terminal — if it errors there, fix the error first.
401 on SSE / HTTP MCP
Token revoked or token label mismatch. Regenerate in Settings → MCP → Server → Tokens.
Documents
Upload stuck on "extracting"
Large PDF with OCR. Look in logs for OCR errors — usually a Tesseract dependency issue. For PDFs with extractable text (not scans), extraction is fast; for scans it can take minutes per page.
Q&A returns no results
- Wait for status: ready on the document. Uploads take seconds to minutes to process.
- Vector index empty? On Personal tier, only keyword search works — semantic queries may miss.
- Reindex: sophon documents reindex <id>.
Voice
No audio on mobile
Device audio permission. Also check the TTS provider (Settings → Voice) — if no provider is configured, TTS silently degrades to text-only.
Voice orb stuck on "listening"
STT permission denied, or the microphone is in use by another app. Close other apps using the mic; re-grant permission.
Robotic / choppy TTS
Streaming sentence-buffered TTS chops on sentence boundaries. If your text has no clear punctuation, the buffer grows indefinitely and playback stutters. Add punctuation in the response (you can nudge the agent via SOUL.md).
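The buffering behaviour described above can be illustrated with a small generator. This is a simplified sketch of sentence-buffered chunking in general, not Sophon's actual TTS pipeline; the name sentence_chunks and the choice of terminators are assumptions made for the example.

```python
from typing import Iterable, Iterator

# Characters treated as sentence terminators in this sketch.
SENTENCE_END = (".", "!", "?")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one chunk per completed sentence from a stream of text fragments."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    # Text without a terminator is only released when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

With punctuated input such as ["Hello ", "there. ", "How are ", "you?"], two chunks are released as soon as each sentence closes. With ["no ", "punctuation ", "here"], nothing is released until the stream ends, which is the single oversized chunk that shows up as delayed or stuttering playback.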
Where to go next
- Diagnostic Logs — where to look for error details
- Support — escalation path
- Operations — admin-level health dashboards