Common Issues

Quick fixes for the problems most Sophon users hit — connection failures, provider errors, sandbox issues, Node pairing, MCP handshake.

This is a triage checklist for the things that go wrong most often: each symptom is paired with its fix. If none of these apply, check Diagnostic Logs or reach out via Support.

Gateway won't start

Symptom: sophon exits immediately, or the Docker container restarts in a loop.

Check, in order:

  1. Port in use — default is 8080. lsof -i :8080 (or netstat -ano | findstr 8080 on Windows) shows what's holding it.
  2. Database unreachable — if using Postgres, verify connectivity: psql "<connection-string>" -c "SELECT 1".
  3. Invalid appsettings.json — JSON syntax error, bad config path. Look for the first error in startup logs.
  4. License expired (Enterprise) — license check runs early; expired license with no grace period blocks startup. Renew or reset tier to Pro temporarily.
  5. Vector DB unreachable (Pro/Enterprise) — Qdrant down blocks memory init. Check QDRANT_URL.

Fast test: docker logs sophon-gateway or ~/.sophon/logs/gateway-YYYYMMDD.log.
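
The first and third checks above can be scripted. The following is a minimal sketch, not part of Sophon: the helper names are hypothetical, and it assumes the Gateway listens on localhost and that appsettings.json is strict JSON (a loader that tolerates comments would make this checker report false positives).

```python
import json
import socket
from typing import Optional


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0


def first_json_error(text: str) -> Optional[str]:
    """Return the location of the first strict-JSON syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None
```

Call port_in_use(8080) before starting the Gateway, and pass the contents of appsettings.json to first_json_error to locate a syntax error without reading the full startup log.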

Can't log in / authentication loop

Symptom: Dashboard keeps redirecting to login; CLI says "Unauthorized."

Check:

  1. Time drift — JWTs are time-sensitive. Sync the clock with ntpdate or an equivalent.
  2. Cookie / CORS — if Dashboard URL differs from API URL, confirm Sophon:Cors:AllowedOrigins includes the Dashboard URL.
  3. Expired token — CLI tokens expire after 30 days by default. Run sophon login again.
  4. SSO misconfiguration (Enterprise) — check OIDC discovery URL is reachable and the authority matches the token's iss claim.
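
Time drift, expiry, and issuer mismatch can all be read off the token itself. The sketch below decodes a JWT payload without verifying the signature and compares the standard exp, nbf, and iss claims against the local clock and an expected authority. It assumes the token is a standard three-part JWT; the function names are hypothetical and this is a diagnostic aid, not how Sophon validates tokens.

```python
import base64
import json
import time
from typing import List, Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def diagnose(token: str, expected_issuer: str,
             now: Optional[float] = None) -> List[str]:
    """Return human-readable problems found in the token's time and issuer claims."""
    now = time.time() if now is None else now
    claims = jwt_claims(token)
    problems = []
    if "exp" in claims and now >= claims["exp"]:
        problems.append("token expired (or the local clock is ahead)")
    if "nbf" in claims and now < claims["nbf"]:
        problems.append("token not yet valid (the local clock is probably behind)")
    if claims.get("iss") != expected_issuer:
        problems.append(
            f"iss is {claims.get('iss')!r}, expected {expected_issuer!r}")
    return problems
```

An empty list means the time and issuer claims look consistent. The issuer comparison here is an exact string match, so a trailing-slash difference shows up as a mismatch.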

Test provider connection fails

Symptom: Adding an LLM provider (Anthropic, OpenAI, …) and clicking Test returns an error.

Check:

  1. API key is correct — paste into a quick curl test against the provider directly.
  2. Outbound network — Gateway needs egress to api.anthropic.com / api.openai.com / etc. Corporate firewalls may block these; add allowlist entries.
  3. Rate limit already exceeded — the first call might 429 if you've been testing from other tools. Wait a minute.
  4. Wrong region (Azure OpenAI, Google) — Azure OpenAI endpoints are region-specific; Gemini has separate generativelanguage.googleapis.com vs aiplatform.googleapis.com paths.

Logs will show the full HTTP response from the provider.
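
To test a key against the provider directly, a cheap authenticated request such as listing models is usually enough. The sketch below builds that request for Anthropic and OpenAI using only the standard library; the function names are hypothetical, and other providers need their own URL and auth header.

```python
import urllib.error
import urllib.request


def models_request(provider: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a cheap authenticated GET for the given provider."""
    if provider == "anthropic":
        return urllib.request.Request(
            "https://api.anthropic.com/v1/models",
            headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"},
        )
    if provider == "openai":
        return urllib.request.Request(
            "https://api.openai.com/v1/models",
            headers={"Authorization": f"Bearer {api_key}"},
        )
    raise ValueError(f"no example request for provider {provider!r}")


def status_of(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status, including error statuses."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401 for a bad key, 429 when rate limited
```

Running status_of(models_request("openai", key)) from the Gateway host separates the cases above: 200 means the key and egress are fine, 401 points at the key, 429 at rate limiting, and a raised URLError at the network path.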

Chat hangs — agent status stuck on "thinking"

Symptom: A chat message sits with an AgentStatus: thinking event and never completes.

Check:

  1. LLM provider latency — check provider status pages. Anthropic / OpenAI occasionally have slow responses.
  2. Task queue backed up — Operations → System. If queue depth is high, either pause / drain or increase MaxConcurrentTasks.
  3. Tool call hanging — if the agent called a slow tool (web fetch, browser automation), tool execution can exceed its budget. Tasks → <task> shows the current tool and its duration. Cancel the task; on the next turn the agent handles the "previous tool hung" condition gracefully.
  4. Budget exceeded — the pipeline short-circuits quietly. Check Settings → Models → Budget usage.

Skill execution errors

"Sandbox not available"

Docker isn't running or the sandbox image isn't pulled. Run:

docker ps                  # Is Docker up?
docker pull sophon/sandbox:latest

"Permission denied" in skill logs

The skill tried to write to a path outside its sandbox. Check the skill's manifest declares the required paths under sandbox.writablePaths.

"Network access denied"

Default sandbox has network disabled. Add "network": true and a "networkAllowlist" to the manifest.

Python import errors

Required packages aren't in the sandbox image. Either add them via pipDeps in the manifest or rebuild the sandbox image with the dependencies pre-installed.
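
Each of the errors above maps to a manifest field. The sketch below looks those fields up in a parsed manifest and reports which are missing or empty for a given error. The key names come from the text above; where they sit in the manifest is not specified here, so the lookup deliberately searches at any depth rather than assuming a layout. The function names and the error labels used as dictionary keys are hypothetical.

```python
from typing import Any, Iterator, List

# Error label -> manifest keys that the section above says to check.
KEYS_BY_ERROR = {
    "Permission denied": ["writablePaths"],
    "Network access denied": ["network", "networkAllowlist"],
    "import error": ["pipDeps"],
}


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def missing_keys(manifest: dict, error: str) -> List[str]:
    """Return the relevant keys that are absent, empty, or false in the manifest."""
    return [key for key in KEYS_BY_ERROR[error]
            if not any(find_key(manifest, key))]
```

For example, a manifest that sets "network": true but has no "networkAllowlist" yields ["networkAllowlist"] for "Network access denied".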

Workflow not firing on cron

Symptom: Workflow is active, cron job exists, schedule time passes, nothing happens.

Check:

  1. Scheduler running — Operations → System → Scheduler status. Restart if degraded.
  2. Cron job enabled — sometimes paused inadvertently. sophon cron list shows status.
  3. Cron expression timezone — default is UTC unless otherwise specified. 0 9 * * * means 9am UTC, not local.
  4. Previous run still in flight — by default, concurrent executions are blocked. Check Workflows → Runs for a stuck run.
  5. Misfire — if the Gateway was down when the job should have fired, the misfire policy may have skipped it. Check Cron → History.
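
The timezone point is the easiest to get wrong, so here is the arithmetic as a small sketch. It converts the hour and minute of a UTC cron schedule to wall-clock time at a fixed UTC offset. The function name is hypothetical, and a fixed offset ignores daylight-saving changes, so treat the result as valid only for the current season.

```python
from datetime import datetime, timedelta, timezone


def utc_cron_to_local(hour_utc: int, minute_utc: int,
                      utc_offset_hours: float) -> str:
    """Return the local HH:MM at which a UTC cron time fires."""
    # The date is arbitrary; only the time of day matters for the conversion.
    fire_utc = datetime(2024, 1, 1, hour_utc, minute_utc, tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return fire_utc.astimezone(local_zone).strftime("%H:%M")


print(utc_cron_to_local(9, 0, -5))   # 04:00
print(utc_cron_to_local(9, 0, 5.5))  # 14:30
```

So 0 9 * * * fires at 04:00 for someone at UTC-5. To run at 9am local time there, the UTC expression would need hour 14 instead.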

Approval requests not arriving on mobile

Symptom: Dashboard shows pending approvals; mobile doesn't get pushes.

Check:

  1. Device registered — mobile: More → Notifications. Should show "Registered" with a device name.
  2. approvalRequests category enabled — same screen.
  3. Quiet hours — during quiet hours, pushes are suppressed (Critical requests are auto-rejected).
  4. Expo Push credentials — Admin → Settings → Push Notifications. Check delivery activity for errors.
  5. Notification permission on device — iOS: Settings → Sophon → Notifications. Android: same path.

Sophon Node

"Waiting for approval" forever

Ticket expired (15 minutes default), or the Dashboard didn't receive the pair request. Regenerate pairing credentials.

"Failed to connect to Gateway"

WebSocket or TLS issue:

  • Gateway URL reachable from the Node machine? curl <gateway>/health
  • Self-signed cert? Install in OS trust store.
  • Firewall blocking WebSockets? Fall back to HTTP polling transport.
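
The first two checks can be told apart programmatically, since an untrusted certificate fails differently from an unreachable host. This is a minimal sketch using the standard library; the function name is hypothetical, and it assumes the /health path from the curl example above. It does not test WebSocket upgrades.

```python
import ssl
import urllib.error
import urllib.request


def check_health(gateway_url: str, timeout: float = 5.0) -> str:
    """Fetch <gateway>/health and classify the outcome as a short message."""
    url = gateway_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status.
        return f"reachable, but /health returned HTTP {err.code}"
    except urllib.error.URLError as err:
        if isinstance(err.reason, ssl.SSLCertVerificationError):
            return "TLS certificate not trusted (self-signed or unknown CA?)"
        return f"unreachable: {err.reason}"
    except OSError as err:
        # Covers timeouts raised while reading the response.
        return f"unreachable: {err}"
```

Run it on the Node machine, not on the Gateway host, so that it exercises the same network path and trust store the Node uses.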

Commands time out

  • Node online but scopes missing. Check Settings → Devices → <node> → Permissions.
  • macOS Accessibility not granted — grant for sophon-node in System Settings.
  • Linux Wayland — ydotool daemon not running, or compositor blocks automation.

MCP handshake errors

"Invalid token" from external MCP server

The token in env.GITHUB_PERSONAL_ACCESS_TOKEN (or similar) isn't what the server expects. Test the token manually against the service's API before blaming MCP.

"Tool not found" when agent tries to use a bridged tool

External server connected at start but dropped. Check Settings → MCP → Connections → <server> → Activity for the last heartbeat, then restart the server.

Claude Desktop doesn't see Sophon tools

claude_desktop_config.json may have a typo, or the stdio subprocess fails to start. Run sophon mcp serve --stdio manually in a terminal — if it errors there, fix the error first.
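
A typo in the config file is quick to rule out mechanically. The sketch below checks that the file parses and that it has an entry under the top-level "mcpServers" object, which is where Claude Desktop reads stdio servers from. The entry name "sophon" and the bare sophon command on PATH are assumptions for illustration; the arguments mirror the command above.

```python
import json
from typing import List

# Assumed example entry; the server name "sophon" is illustrative.
EXAMPLE = """
{
  "mcpServers": {
    "sophon": {
      "command": "sophon",
      "args": ["mcp", "serve", "--stdio"]
    }
  }
}
"""


def check_claude_config(text: str, server_name: str = "sophon") -> List[str]:
    """Return a list of problems found in a claude_desktop_config.json body."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return ['no top-level "mcpServers" object']
    entry = servers.get(server_name)
    if not isinstance(entry, dict):
        return [f'no "{server_name}" entry under "mcpServers"']
    problems = []
    if not isinstance(entry.get("command"), str) or not entry["command"]:
        problems.append('"command" must be a non-empty string')
    if not isinstance(entry.get("args", []), list):
        problems.append('"args" must be a list')
    return problems
```

An empty result only means the file is well-formed; the subprocess can still fail to start, which is what the manual run above checks.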

401 on SSE / HTTP MCP

Token revoked or token label mismatch. Regenerate in Settings → MCP → Server → Tokens.

Documents

Upload stuck on "extracting"

Large PDF with OCR. Look in logs for OCR errors — usually a Tesseract dependency issue. For PDFs with extractable text (not scans), extraction is fast; for scans it can take minutes per page.

Q&A returns no results

  1. Wait for status: ready on the document. Uploads take seconds to minutes to process.
  2. Vector index empty? On Personal tier, only keyword search works — semantic queries may miss.
  3. Reindex: sophon documents reindex <id>.

Voice

No audio on mobile

Check the device audio permission. Also check the TTS provider (Settings → Voice): if no provider is configured, TTS silently degrades to text-only.

Voice orb stuck on "listening"

STT permission denied, or the microphone is in use by another app. Close other apps using the mic; re-grant permission.

Robotic / choppy TTS

Streaming sentence-buffered TTS chops on sentence boundaries. If your text has no clear punctuation, the buffer grows indefinitely and playback stutters. Add punctuation in the response (you can nudge the agent via SOUL.md).
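
To see why punctuation matters, here is a sketch of sentence buffering in general. It is an illustration of the technique, not Sophon's implementation: text is held back until a sentence terminator followed by whitespace appears, and only then released for synthesis.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed text fragments into sentence-sized chunks."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break  # no complete sentence yet; keep buffering
            yield buffer[:match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


punctuated = ["Hello", " there.", " How", " are", " you?", " Fine"]
unpunctuated = ["one", " two", " three"]
print(list(sentence_chunks(punctuated)))    # ['Hello there.', 'How are you?', 'Fine']
print(list(sentence_chunks(unpunctuated)))  # ['one two three']
```

With punctuation, audio can start after the first sentence. Without it, nothing is released until the whole response has arrived, which matches the stutter described above.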

Where to go next