Browser Automation

Playwright-powered browser sessions for navigating, scraping, screenshotting, and filling forms on the live web.

Sophon's browser automation lets agents drive a real Chromium browser via Playwright. They can navigate, screenshot, click, fill forms, extract text, and persist login state between calls. It's the difference between "asking the web" and "using the web."

When to use it

  • Scraping sites that don't have a clean API
  • Testing your own web app under agent-driven flows
  • Form flows: log in, submit, and extract the confirmation
  • Research where screenshots or rendered content matter
  • Automations driven from workflows (e.g., "daily: log into portal X, download this week's report, save to documents")

For pure text search / fetch, use the web.search and web.scrape skills instead — they're faster and don't spin up a full browser.

Browser sessions

Every interaction happens inside a browser session:

  • Per-user — your sessions, your storage, your cookies.
  • Per-profile — within a user, you can have multiple named profiles (default, work-sso, customer-portal). Each has its own storage state — cookies, localStorage, sessionStorage.
  • Persistent — close a session and reopen it later; you're still logged in.
  • Viewport — 1280 × 720 by default, configurable per session.
  • Headless — runs headless; for manual interaction there's a debug-view that renders the session in the Dashboard (Beta).
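The session properties above can be pictured as a small configuration record. This is a minimal sketch, not Sophon's actual data model: the `SessionConfig` class and its field names are hypothetical, and only the defaults (profile `default`, 1280 × 720 viewport, headless) come from the list above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical record of the per-session settings described above."""
    user_id: str                  # sessions are scoped per user
    profile: str = "default"      # named profile within that user
    viewport_width: int = 1280    # documented default width
    viewport_height: int = 720    # documented default height
    headless: bool = True         # sessions run headless

    def storage_key(self) -> Tuple[str, str]:
        # Storage state is isolated per (user, profile) pair.
        return (self.user_id, self.profile)


default_cfg = SessionConfig(user_id="alice")
work_cfg = SessionConfig(user_id="alice", profile="work-sso",
                         viewport_width=1920, viewport_height=1080)
print(default_cfg.storage_key(), work_cfg.storage_key())
```

Keying storage on the (user, profile) pair mirrors the isolation rule: two profiles of the same user never see each other's cookies, and neither do two users who happen to pick the same profile name.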

The tools

Agents have a browser.* tool family:

Tool                     Purpose
browser.create_session   Start a new session on a named profile
browser.navigate         Go to a URL; returns final URL + page title
browser.screenshot       Capture full page or a viewport; returns a PNG
browser.click            Click by selector or accessibility label
browser.fill             Fill form fields (text inputs, selects, checkboxes)
browser.extract          Extract text / attributes via CSS or XPath
browser.wait_for         Wait for a selector / URL / network-idle
browser.evaluate         Run a small JavaScript expression in page context
browser.close_session    Clean up

Each tool is scoped to the active browser session; session IDs are opaque tokens the agent tracks across a conversation.

Storage state and profiles

Profiles are how you stay logged in:

  1. First time: the agent navigates to the site and prompts you to authenticate.
  2. On success, Playwright captures the storage state (cookies, localStorage) and writes it to ~/.sophon/browser/profiles/<profile>.json.
  3. Next session on that profile, the browser loads with the storage state — you're already authenticated.

Re-auth happens when the stored session expires or is revoked server-side. The next navigation will prompt you to log in again.
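To make the profile file concrete, here is a hedged sketch that resolves a profile's path and checks whether any stored cookie is still unexpired. It assumes the file uses Playwright's usual storage-state shape (a `cookies` list and an `origins` list, with `expires` in Unix seconds or -1 for session cookies). The helper names are hypothetical, and a temporary directory stands in for `~/.sophon`. A local check like this can only spot expiry; server-side revocation shows up only when the site rejects the session.

```python
import json
import tempfile
import time
from pathlib import Path


def profile_path(profile: str, base: Path) -> Path:
    """Location of a profile's storage-state file under the Sophon directory."""
    return base / "browser" / "profiles" / f"{profile}.json"


def has_unexpired_cookie(state: dict, now: float) -> bool:
    """True if at least one stored cookie is still valid at time `now`.

    `expires` is Unix seconds; -1 (or a missing value) marks a session cookie.
    """
    return any(c.get("expires", -1) == -1 or c["expires"] > now
               for c in state.get("cookies", []))


# Demo: a temporary directory stands in for ~/.sophon.
base = Path(tempfile.mkdtemp())
path = profile_path("customer-portal", base)
path.parent.mkdir(parents=True)
saved_at = time.time()
state = {
    "cookies": [{"name": "sid", "value": "abc", "domain": "portal.example.com",
                 "path": "/", "expires": saved_at + 3600}],
    "origins": [],
}
path.write_text(json.dumps(state))

loaded = json.loads(path.read_text())
print(has_unexpired_cookie(loaded, saved_at))          # cookie still valid
print(has_unexpired_cookie(loaded, saved_at + 7200))   # two hours later: expired
```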

Never share a profile between users. Profiles are scoped per-user in multi-tenant deployments; sharing would leak credentials.

Example agent flow

User: "Download the Q2 sales report from the portal and save it to documents."

Agent:
1. browser.create_session(profile: "customer-portal")
2. browser.navigate(url: "https://portal.example.com/reports")
3. browser.wait_for(selector: ".report-row")
4. browser.click(selector: ".report-row[data-quarter='Q2']")
5. browser.click(selector: ".download-button")
6. browser.extract(selector: ".download-link", attr: "href")
   → "https://portal.example.com/download/q2-sales.pdf"
7. [calls document.upload with the URL]
8. browser.close_session()

Response: "Done — Q2 sales report is in your documents library."

If the navigation in step 2 lands on a login page because the profile isn't authenticated (or its stored session has expired), the agent prompts you to log in, then retries.
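The prompt-then-retry behavior can be sketched as a small wrapper around the flow. Everything here is illustrative: `NotAuthenticated`, `run_with_reauth`, and the stub functions are hypothetical names, and the real agent's retry policy is not documented on this page.

```python
from typing import Callable


class NotAuthenticated(Exception):
    """Hypothetical signal that the site asked for a login."""


def run_with_reauth(flow: Callable[[], str],
                    prompt_login: Callable[[], None],
                    max_attempts: int = 2) -> str:
    """Run `flow`; on an auth failure, ask the user to log in and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return flow()
        except NotAuthenticated:
            if attempt == max_attempts:
                raise  # give up rather than loop forever
            prompt_login()
    raise ValueError("max_attempts must be at least 1")


# Stub flow: fails until the (simulated) user has logged in.
logged_in = {"value": False}


def download_report() -> str:
    if not logged_in["value"]:
        raise NotAuthenticated()
    return "https://portal.example.com/download/q2-sales.pdf"


def prompt_login() -> None:
    logged_in["value"] = True  # stands in for the interactive login


result = run_with_reauth(download_report, prompt_login)
print(result)
```

Capping the number of attempts keeps a misconfigured profile from trapping the agent in an endless login loop.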

Screenshots in chat

browser.screenshot returns a PNG that's inlined in the chat stream, which is useful when the agent wants to show you what it sees. Both the Dashboard chat renderer and the Mobile app display screenshots inline.

Running in workflows

Workflows can include browser-automation steps just like any other tool call. A common pattern:

Trigger: Cron (every Monday 09:00)
→ Skill: browser.create_session (profile: portal)
→ Skill: browser.navigate (url: /reports)
→ Skill: browser.extract (table rows)
→ Skill: document.create (spreadsheet from extracted rows)
→ Skill: gmail.send_email (attach spreadsheet)

Sandbox and security

Playwright's Chromium runs inside the skills sandbox — Docker + gVisor — not on the Gateway host.

  • No access to the host filesystem.
  • Network policy is governed by the browser skill's manifest; by default, only outbound HTTP(S) is allowed.
  • Downloads are routed into the skills sandbox filesystem and can be explicitly promoted to documents.
  • SSRF guard applies: requests to private, loopback, and link-local addresses (10.x, 127.x, 192.168.x, 169.254.x) and to cloud metadata hosts (metadata.google.internal) are blocked.
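As an illustration of what an SSRF guard of this kind checks, the sketch below rejects URLs whose host is a literal address in the listed ranges or a listed metadata hostname. It is not Sophon's implementation. The CIDR blocks are one assumed reading of "10.x", "127.x", "192.168.x", and "169.254.x", and `is_blocked` is a hypothetical helper. A production guard would also need to check the addresses a hostname resolves to, which this sketch does not do.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed CIDR equivalents of the ranges named in the list above.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in
                    ("10.0.0.0/8", "127.0.0.0/8", "192.168.0.0/16", "169.254.0.0/16")]
BLOCKED_HOSTS = {"metadata.google.internal"}


def is_blocked(url: str) -> bool:
    """Return True if the URL targets a blocked address or metadata host."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return True  # fail closed when no host can be parsed
    if host in BLOCKED_HOSTS:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; resolution checks are out of scope here
    return any(addr in net for net in BLOCKED_NETWORKS)


for url in ("https://portal.example.com/reports",
            "http://169.254.169.254/latest/meta-data/",
            "http://metadata.google.internal/computeMetadata/v1/"):
    print(url, "blocked" if is_blocked(url) else "allowed")
```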

Installation prerequisites

Playwright requires Chromium. Sophon attempts to install it on first use, or you can pre-install:

sophon browser setup          # Installs Playwright Chromium if missing
# Under the hood this is equivalent to `playwright install chromium` in the Gateway container.

If the setup step fails (common on restricted networks), the browser skill stays disabled and degrades gracefully: tools that need the browser return "browser unavailable" errors rather than crashing the agent.
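The graceful-degradation pattern can be sketched as a wrapper that records whether setup succeeded and returns a structured error instead of raising. The `BrowserSkill` class and the result dictionary shape are hypothetical; only the "browser unavailable" message comes from the paragraph above.

```python
from typing import Any, Callable, Dict


class BrowserSkill:
    """Hypothetical wrapper that degrades to an error result when setup failed."""

    def __init__(self, setup: Callable[[], None]) -> None:
        try:
            setup()
            self.enabled = True
        except Exception:
            # Setup failed (for example, a blocked download); stay disabled.
            self.enabled = False

    def call(self, tool: str, handler: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
        if not self.enabled:
            # Return an error the agent can reason about instead of crashing.
            return {"ok": False, "tool": tool, "error": "browser unavailable"}
        return {"ok": True, "tool": tool, "result": handler(**kwargs)}


def failing_setup() -> None:
    raise OSError("download blocked")  # simulated restricted network


def working_setup() -> None:
    return None


offline = BrowserSkill(failing_setup)
online = BrowserSkill(working_setup)
print(offline.call("browser.navigate", lambda url: url, url="https://example.com"))
print(online.call("browser.navigate", lambda url: url, url="https://example.com"))
```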

Limits

  • 1 active browser session per agent per conversation (you can hand off to another session but not run both concurrently in one agent).
  • Screenshot max: 10 MB per capture.
  • Page load timeout: 30 s (configurable per call).
  • JavaScript evaluation is restricted to expressions, not full scripts. For complex logic, drop into an Execute code node in a workflow.
  • Playwright's own tracing (video + network HAR) is off by default. Enable it per session in Settings → Browser if you want to capture a full replay.
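Two of the limits above (one active session per agent per conversation, and the 10 MB screenshot cap) could be enforced along these lines. This is a sketch under assumptions: `SessionRegistry`, `SessionLimitError`, and `check_screenshot` are hypothetical names, "10 MB" is read as 10 × 1024 × 1024 bytes, and the opaque session token is generated with Python's `secrets` module purely for illustration.

```python
import secrets
from typing import Dict, Tuple

MAX_SCREENSHOT_BYTES = 10 * 1024 * 1024  # assumed reading of "10 MB"
DEFAULT_PAGE_LOAD_TIMEOUT_S = 30.0       # documented default, configurable per call


class SessionLimitError(Exception):
    """Raised when an agent tries to open a second concurrent session."""


class SessionRegistry:
    """Tracks at most one active session per (agent, conversation) pair."""

    def __init__(self) -> None:
        self._active: Dict[Tuple[str, str], str] = {}

    def create(self, agent_id: str, conversation_id: str) -> str:
        key = (agent_id, conversation_id)
        if key in self._active:
            raise SessionLimitError("one active browser session per agent per conversation")
        token = secrets.token_urlsafe(16)  # opaque session ID
        self._active[key] = token
        return token

    def close(self, agent_id: str, conversation_id: str) -> None:
        self._active.pop((agent_id, conversation_id), None)


def check_screenshot(png: bytes) -> bytes:
    """Reject captures above the size cap; otherwise pass them through."""
    if len(png) > MAX_SCREENSHOT_BYTES:
        raise ValueError("screenshot exceeds the 10 MB limit")
    return png


registry = SessionRegistry()
first = registry.create("agent-1", "conv-1")
registry.close("agent-1", "conv-1")          # hand off: close, then open another
second = registry.create("agent-1", "conv-1")
print(first != second)
```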

Where to go next

  • Skills — the web.scrape skill for lighter-weight scraping
  • Documents — promote downloaded files
  • Workflows — wire browser steps into scheduled automation