Remote Control & Desktop Automation

Let an agent control a paired desktop — mouse, keyboard, apps, windows — through a Sophon Node.

The remote.control tool is the agent's bridge to a paired desktop. Through a Sophon Node, an agent can take screenshots, read the on-screen UI tree, move and click the mouse, type and press hotkeys, launch or focus applications, manage windows, read and write the clipboard, and, with explicit approval, run a shell command. All reasoning happens on the Gateway; the Node is a thin executor: it runs one command and returns the result.
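To make the Gateway/Node split concrete, here is a minimal TypeScript sketch of the Node side of one round trip. The type names (`NodeCommand`, `NodeResult`, `Handler`) and the `executeOne` function are hypothetical and exist only for illustration; the real wire format is documented under Available Commands. The point it shows is that the Node holds no plan: it dispatches a single command to a handler and reports the outcome.

```typescript
// Hypothetical shapes for illustration only; see Available Commands for the real format.
interface NodeCommand {
  action: string;                  // e.g. "screen.capture"
  params: Record<string, unknown>; // action-specific parameters
}

interface NodeResult {
  ok: boolean;
  data?: unknown; // e.g. image bytes or an element list
  error?: string;
}

type Handler = (params: Record<string, unknown>) => unknown;

// Node-side executor: look up the handler for one command, run it, return one result.
// No decision-making happens here; that all stays on the Gateway.
function executeOne(cmd: NodeCommand, handlers: Map<string, Handler>): NodeResult {
  const handler = handlers.get(cmd.action);
  if (handler === undefined) {
    return { ok: false, error: `unsupported action: ${cmd.action}` };
  }
  try {
    return { ok: true, data: handler(cmd.params) };
  } catch (e) {
    // Surface handler failures as a result instead of crashing the Node.
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}
```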

If you have exactly one paired Node, the tool selects it automatically. With multiple Nodes, the agent passes a nodeId to target one specifically. Each call also accepts an optional timeout (1–120 seconds, default 30).
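The selection and timeout rules above can be sketched as two small functions. This is an illustrative TypeScript sketch, not the Gateway's code: `resolveTarget`, `resolveTimeout`, and `CallOptions` are hypothetical names, and rejecting an out-of-range timeout (rather than clamping it) is an assumption the text does not confirm.

```typescript
// Hypothetical sketch of node selection and timeout handling.
interface CallOptions {
  nodeId?: string;
  timeout?: number; // seconds
}

const DEFAULT_TIMEOUT_S = 30;
const MIN_TIMEOUT_S = 1;
const MAX_TIMEOUT_S = 120;

function resolveTarget(pairedNodeIds: string[], opts: CallOptions): string {
  if (opts.nodeId !== undefined) {
    if (!pairedNodeIds.includes(opts.nodeId)) {
      throw new Error(`node not paired: ${opts.nodeId}`);
    }
    return opts.nodeId;
  }
  if (pairedNodeIds.length === 1) {
    return pairedNodeIds[0]; // exactly one paired Node: selected automatically
  }
  if (pairedNodeIds.length === 0) {
    throw new Error("no paired Node");
  }
  throw new Error("multiple Nodes paired; pass nodeId");
}

// Assumption: out-of-range values are rejected, not clamped.
function resolveTimeout(opts: CallOptions): number {
  const t = opts.timeout ?? DEFAULT_TIMEOUT_S;
  if (!Number.isFinite(t) || t < MIN_TIMEOUT_S || t > MAX_TIMEOUT_S) {
    throw new RangeError(`timeout must be 1-120 seconds, got ${t}`);
  }
  return t;
}
```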

When to use it

Driving a desktop app with no API — a legacy tool, a vendor portal, an Excel macro.
Hands-on confirmation flows — clicking through a dialog on your home PC while you're away.
Visual tasks — capture the screen, query the UI, then act on what's there.

For pure web work, prefer Browser Automation — it's faster and more reliable than driving a real desktop browser.

Command categories

Every action maps to a Node command type, which is gated by a permission scope. The tool exposes these categories:

Category	Actions	Scope
Screen	`screen.capture`, `screen.region`, `screen.queryElements`, `screen.monitors`	`screen.capture`
Mouse	`mouse.move`, `mouse.click`, `mouse.scroll`	`input.control`
Keyboard	`keyboard.type`, `keyboard.press`, `keyboard.hotkey`	`input.control`
Clipboard	`clipboard.read`, `clipboard.write`	`clipboard.access`
Applications	`app.launch`, `app.close`, `app.focus`, `app.list`	`app.manage`
Windows	`window.list`, `window.move`, `window.minimize`, `window.maximize`	`window.manage`
Notifications	`notify.show`	`notify.send`
Shell	`shell.execute`	`system.execute`
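The table above is effectively a lookup from action to required scope. The following TypeScript sketch transcribes it verbatim and inverts it for lookup; `SCOPE_ACTIONS` and `requiredScope` are hypothetical names. Note that `screen.capture` is both an action name and a scope name, which the inverted map handles without special casing. An action missing from the table yields `undefined`, which a caller would treat as unknown.

```typescript
// Scope -> actions, transcribed from the command-category table.
const SCOPE_ACTIONS: Record<string, string[]> = {
  "screen.capture": ["screen.capture", "screen.region", "screen.queryElements", "screen.monitors"],
  "input.control": [
    "mouse.move", "mouse.click", "mouse.scroll",
    "keyboard.type", "keyboard.press", "keyboard.hotkey",
  ],
  "clipboard.access": ["clipboard.read", "clipboard.write"],
  "app.manage": ["app.launch", "app.close", "app.focus", "app.list"],
  "window.manage": ["window.list", "window.move", "window.minimize", "window.maximize"],
  "notify.send": ["notify.show"],
  "system.execute": ["shell.execute"],
};

// Invert to action -> scope for constant-time lookup.
const ACTION_TO_SCOPE = new Map<string, string>();
for (const [scope, actions] of Object.entries(SCOPE_ACTIONS)) {
  for (const action of actions) {
    ACTION_TO_SCOPE.set(action, scope);
  }
}

function requiredScope(action: string): string | undefined {
  return ACTION_TO_SCOPE.get(action);
}
```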

screen.queryElements reads the OS accessibility tree of the focused window, so the agent can target UI by semantics instead of guessing pixel coordinates. See Available Commands for each command's parameters and response shape.
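Targeting by semantics usually means: search the returned tree for an element by role and accessible name, then click the centre of its bounding box. The TypeScript sketch below assumes a hypothetical `UiElement` shape (`role`, `name`, `bounds`, `children`) and assumes `mouse.click` accepts `x`/`y` coordinates; neither is confirmed here, so check Available Commands for the actual response and parameter shapes.

```typescript
// Hypothetical element shape; the real queryElements response may differ.
interface UiElement {
  role: string; // e.g. "button", "edit"
  name: string; // accessible name, e.g. "Save"
  bounds: { x: number; y: number; width: number; height: number };
  children?: UiElement[];
}

// Depth-first search for the first element whose role and name both match.
function findElement(root: UiElement, role: string, name: string): UiElement | undefined {
  if (root.role === role && root.name === name) return root;
  for (const child of root.children ?? []) {
    const hit = findElement(child, role, name);
    if (hit !== undefined) return hit;
  }
  return undefined;
}

// Centre of the element's bounding box, rounded to whole pixels,
// usable as hypothetical x/y params for mouse.click.
function clickPoint(el: UiElement): { x: number; y: number } {
  const { x, y, width, height } = el.bounds;
  return { x: Math.round(x + width / 2), y: Math.round(y + height / 2) };
}
```

Matching on role plus name survives window moves and DPI changes that would break hard-coded pixel coordinates.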

Permission model

Scopes are enforced Gateway-side, before a command ever reaches the Node. If the target Node doesn't hold the required scope, the Gateway returns a permission-denied result and the action never runs. A Node holds only the scopes you granted it in the Dashboard, so a Node configured for screenshots and clicks simply cannot launch an app or run a shell command.

Configure these per-Node under Settings → Devices → (node) → Permissions. The wildcard node.command scope grants everything; for shared or sensitive machines, grant specific scopes instead. Full details are in Permissions & Scopes.
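The Gateway-side check described above reduces to a set-membership test with one wildcard. This TypeScript sketch is illustrative only: `checkScope` and the `Decision` type are hypothetical, and the required scope would come from the command-category table. Passing this check does not bypass approval; the approval gate described below still applies.

```typescript
// Hypothetical Gateway-side scope check.
const WILDCARD_SCOPE = "node.command"; // grants every command scope

type Decision = { allowed: true } | { allowed: false; reason: string };

function checkScope(granted: ReadonlySet<string>, required: string): Decision {
  if (granted.has(WILDCARD_SCOPE) || granted.has(required)) {
    return { allowed: true };
  }
  // The command is never forwarded to the Node in this case.
  return { allowed: false, reason: `permission denied: missing scope ${required}` };
}
```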

remote.control is rated Medium risk overall. Input and window actions require approval at the Medium threshold, and launching or closing an app requires High approval. shell.execute is Critical: every invocation routes through the approval gate with the full command shown to you, and a built-in denylist additionally blocks catastrophic patterns (recursive root deletes, disk formats, shutdown/reboot, and download-pipe-to-shell). There is no "trust this tool" override for shell execution.
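A rough TypeScript sketch of the two layers: an approval tier per action and a denylist screen for shell commands. Everything here is hypothetical. The regular expressions are illustrative stand-ins for the four pattern families named above, not the built-in denylist, and falling back to Medium for actions the text does not classify (screen, clipboard, notify, `app.focus`, `app.list`) is an assumption based on the tool's overall rating. In the described design, a denylisted command is blocked outright, and anything that passes still needs Critical approval.

```typescript
type Risk = "Medium" | "High" | "Critical";

function approvalThreshold(action: string): Risk {
  if (action === "shell.execute") return "Critical";
  if (action === "app.launch" || action === "app.close") return "High";
  // Input and window actions are Medium per the text; any other action falls
  // back to the tool's overall Medium rating (assumption of this sketch).
  return "Medium";
}

// Illustrative patterns only; NOT the real built-in denylist.
const DENYLIST: RegExp[] = [
  /\brm\s+(-\S+\s+)*-\S*[rR]\S*\s+(-\S+\s+)*\/\*?(\s|$)/, // recursive delete of root
  /\bmkfs\b/,                                              // Linux filesystem format
  /\bformat\s+[a-z]:/i,                                    // Windows drive format
  /\b(shutdown|reboot|halt|poweroff)\b/i,                  // shutdown / reboot
  /\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b/,        // download piped to a shell
];

function isDenylisted(command: string): boolean {
  return DENYLIST.some((re) => re.test(command));
}
```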

Showing a canvas on the desktop

A related tool, node.present_canvas, opens the agent Canvas for the current session on the Node's screen. The Node launches the Gateway's Dashboard canvas page in its default browser, so everything the agent pushes via canvas rendering shows up live on the desktop rather than only in the Dashboard. This command requires the canvas.control scope and is consent-gated on the Node by default. See Canvas on a Node for the full flow.
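The gating described above can be sketched as follows. This TypeScript is a hypothetical illustration: `presentCanvas`, `NodeHooks`, and `canvasUrl` are invented names, and the canvas page URL is assumed to be supplied by the Gateway rather than constructed on the Node. In the real system the scope check happens Gateway-side; it is folded in here only so the sketch is self-contained.

```typescript
type CanvasOutcome = "opened" | "denied-scope" | "declined-by-user";

interface CanvasRequest {
  canvasUrl: string; // hypothetical: the Dashboard canvas page for this session
}

interface NodeHooks {
  askUserConsent: () => boolean;               // hypothetical local consent prompt
  openInDefaultBrowser: (url: string) => void; // hypothetical OS integration
}

function presentCanvas(
  req: CanvasRequest,
  grantedScopes: ReadonlySet<string>,
  hooks: NodeHooks,
  consentRequired = true // consent-gated by default
): CanvasOutcome {
  // Gateway-side in practice; node.command is the wildcard scope.
  if (!grantedScopes.has("canvas.control") && !grantedScopes.has("node.command")) {
    return "denied-scope";
  }
  if (consentRequired && !hooks.askUserConsent()) {
    return "declined-by-user";
  }
  hooks.openInDefaultBrowser(req.canvasUrl);
  return "opened";
}
```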

Example agent flow

User: "Open Notepad on my office PC and paste my address."

Agent:
1. remote.control { action: "app.launch", params: { nameOrPath: "notepad.exe" } }
2. remote.control { action: "screen.capture", params: { quality: 70 } }   → confirms Notepad is open and focused
3. remote.control { action: "keyboard.type", params: { text: "123 Example St" } }

Response: "Done — Notepad is open with your address typed."
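The same three steps can be written as data and run in order, stopping at the first failure so the agent never types into whichever window happens to have focus after a failed launch. This TypeScript sketch is hypothetical: `ToolCall`, `ToolResult`, `Transport`, and `runSequentially` are invented names, and `Transport` stands in for however the Gateway actually delivers remote.control calls. In the real flow the agent also inspects the screenshot from step 2 before continuing; this sketch only checks each call's success flag.

```typescript
// Hypothetical call shape mirroring the example flow.
interface ToolCall {
  action: string;
  params: Record<string, unknown>;
}

interface ToolResult {
  ok: boolean;
  error?: string;
}

type Transport = (call: ToolCall) => Promise<ToolResult>;

const notepadFlow: ToolCall[] = [
  { action: "app.launch", params: { nameOrPath: "notepad.exe" } },
  { action: "screen.capture", params: { quality: 70 } }, // agent checks focus here
  { action: "keyboard.type", params: { text: "123 Example St" } },
];

// Send each step in order; record an action-panel-style log line per step
// and stop at the first failure.
async function runSequentially(steps: ToolCall[], send: Transport): Promise<string[]> {
  const log: string[] = [];
  for (const step of steps) {
    const result = await send(step);
    log.push(`${step.action}: ${result.ok ? "ok" : `failed (${result.error ?? "unknown"})`}`);
    if (!result.ok) break;
  }
  return log;
}
```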

When the agent captures the screen, the result is attached inline in chat so you can see exactly what the Node sees. While the agent drives the Node, the chat UI shows an inline action panel logging each step.

Where to go next

Sophon Node — Overview — architecture, pairing, lifecycle
Available Commands — full parameter reference
Permissions & Scopes — what each scope grants
Canvas — interactive surfaces the agent can render
