Extended Thinking & Reasoning
Control how much the model reasons before answering across Anthropic, OpenAI, and Gemini.
Modern reasoning models can spend extra compute "thinking" before they answer — exploring options, checking their own work, planning multi-step solutions. Each provider exposes this differently: Anthropic has a token budget, OpenAI has a reasoning_effort string, Google Gemini has a thinking budget. Sophon hides those differences behind one thinking level so you pick a depth and Sophon maps it to whatever the active model expects.
Thinking levels
There are four levels:
| Level | Meaning |
|---|---|
| Auto | Preserve each provider's default behavior — Sophon sends no explicit reasoning control |
| Off | Suppress reasoning where the provider allows it (fastest, cheapest) |
| Fast | A modest reasoning budget — quick wins on harder prompts without much latency |
| Full | A large reasoning budget — best for complex, multi-step, or agentic problems |
Auto is the default. It is deliberately hands-off: Sophon omits the reasoning parameter entirely so the model behaves exactly as the provider intends out of the box.
How it maps to each provider
Sophon translates the level into the native control for the model you're using:
| Provider | Native control | Fast | Full | Off / Auto |
|---|---|---|---|---|
| Anthropic | thinking.budget_tokens | 4,096 tokens | 12,288 tokens | block omitted |
| Google Gemini | thinking_config.thinking_budget | 4,096 tokens | 24,576 tokens | budget 0 (Off) |
| OpenAI | reasoning_effort | low | high | minimal |
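As a rough illustration of the table above, the mapping can be sketched as a lookup from level to provider-native fields. This is a minimal sketch, not Sophon's actual code: `ThinkingLevel`, `native_reasoning_params`, and the provider strings are hypothetical names, and it assumes that Auto always sends nothing and that `minimal` in the last column refers to Off.

```python
from enum import Enum
from typing import Any


class ThinkingLevel(Enum):  # hypothetical name for the four levels
    AUTO = "auto"
    OFF = "off"
    FAST = "fast"
    FULL = "full"


# Budgets and effort strings copied from the mapping table.
ANTHROPIC_BUDGET = {ThinkingLevel.FAST: 4096, ThinkingLevel.FULL: 12288}
GEMINI_BUDGET = {ThinkingLevel.OFF: 0, ThinkingLevel.FAST: 4096, ThinkingLevel.FULL: 24576}
OPENAI_EFFORT = {ThinkingLevel.OFF: "minimal", ThinkingLevel.FAST: "low", ThinkingLevel.FULL: "high"}


def native_reasoning_params(provider: str, level: ThinkingLevel) -> dict[str, Any]:
    """Return provider-native reasoning fields; an empty dict means 'send nothing'."""
    if level is ThinkingLevel.AUTO:
        return {}  # assumption: Auto never sends an explicit control
    if provider == "anthropic":
        budget = ANTHROPIC_BUDGET.get(level)
        if budget is None:
            return {}  # Off: the thinking block is omitted
        return {"thinking": {"type": "enabled", "budget_tokens": budget}}
    if provider == "gemini":
        return {"thinking_config": {"thinking_budget": GEMINI_BUDGET[level]}}
    if provider == "openai":
        return {"reasoning_effort": OPENAI_EFFORT[level]}
    return {}  # unknown provider: leave the request untouched


print(native_reasoning_params("anthropic", ThinkingLevel.FULL))
print(native_reasoning_params("openai", ThinkingLevel.FAST))
```

Returning an empty dict for Auto keeps the "omit the parameter entirely" behavior explicit: the caller merges the result into the request and nothing changes when there is nothing to send.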
A few provider-specific details Sophon handles for you:
- Anthropic requires `temperature = 1` whenever thinking is enabled, and `max_tokens` must exceed the thinking budget. Sophon adjusts both automatically and adds answer headroom on top of the budget.
- For OpenAI, Sophon only sends `reasoning_effort` to reasoning models (the o-series and gpt-5 family). The `minimal` effort is gpt-5-only, so Sophon omits the parameter for o-series models where it doesn't apply.
- Gemini thinking tokens are billed separately and don't count against the output limit. For Gemini 2.5+ models, Auto applies a sensible heuristic (small utility requests skip thinking entirely).
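The Anthropic adjustment in the first bullet can be pictured roughly as follows. This is a sketch under stated assumptions: `apply_anthropic_thinking` and `ANSWER_HEADROOM` are hypothetical, and the headroom value of 1,024 tokens is invented for illustration, since the amount Sophon actually adds is not documented here.

```python
from typing import Any

# Hypothetical value: the real headroom Sophon adds is not stated in these docs.
ANSWER_HEADROOM = 1024


def apply_anthropic_thinking(request: dict[str, Any], budget_tokens: int) -> dict[str, Any]:
    """Return a copy of `request` with thinking enabled and both constraints satisfied."""
    adjusted = dict(request)
    adjusted["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    adjusted["temperature"] = 1  # required whenever thinking is enabled
    # max_tokens must exceed the budget; leave room for the visible answer as well.
    minimum = budget_tokens + ANSWER_HEADROOM
    adjusted["max_tokens"] = max(adjusted.get("max_tokens", 0), minimum)
    return adjusted


original = {"model": "example-model", "max_tokens": 1000, "temperature": 0.2}
print(apply_anthropic_thinking(original, 4096))
```

Taking the larger of the caller's `max_tokens` and the computed minimum means a generous limit set by the caller is never lowered.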
Only reasoning-capable models expose this. On a model that doesn't support reasoning — for example a standard chat model without a thinking mode — the level is ignored and the request runs normally. Choosing Full does not turn a non-reasoning model into a reasoning one.
Setting the thinking level
In the Dashboard chat, open the Thinking selector in the composer controls and pick Auto, Off, Fast, or Full. The choice rides along with the next message you send, so you can change depth per turn — Full for a thorny architecture question, Off for a quick lookup.
The selected level is sent to the Gateway with your message, parsed into the internal level, and applied by the provider when the model is invoked. Unknown or empty values fall back to Auto.
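The fallback rule can be illustrated with a small parser. This is only a sketch: the `ThinkingLevel` enum, the `parse_thinking_level` function, and the lowercase wire values are assumptions, not the Gateway's actual format.

```python
from enum import Enum
from typing import Optional


class ThinkingLevel(Enum):  # hypothetical internal representation
    AUTO = "auto"
    OFF = "off"
    FAST = "fast"
    FULL = "full"


def parse_thinking_level(raw: Optional[str]) -> ThinkingLevel:
    """Parse a level string; unknown or empty values fall back to Auto."""
    if not raw:
        return ThinkingLevel.AUTO
    try:
        return ThinkingLevel(raw.strip().lower())
    except ValueError:
        return ThinkingLevel.AUTO


print(parse_thinking_level("Full"))   # ThinkingLevel.FULL
print(parse_thinking_level("turbo"))  # ThinkingLevel.AUTO
```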
Seeing the reasoning trace
When a thinking level surfaces the model's reasoning, Sophon streams it on a separate channel from the answer text and renders it as a collapsible Thinking block above the message. The trace stays open while the response streams, then collapses once the final answer settles — you can expand or collapse it anytime.
Reasoning tokens are tracked as a distinct count. The message footer shows total tokens and, when present, the reasoning portion (for example, 1240 tokens · 380 reasoning). For Gemini, reasoning tokens are included in the billed output count and also reported as a display-only subset, mirroring how OpenAI reports reasoning tokens.
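The footer format can be sketched as a small helper. The function name `format_token_footer` is hypothetical, and the sketch assumes the reasoning count is a subset of the total rather than an addition to it, as the example in the text suggests.

```python
def format_token_footer(total_tokens: int, reasoning_tokens: int = 0) -> str:
    """Build the footer text, adding the reasoning portion only when present."""
    footer = f"{total_tokens} tokens"
    if reasoning_tokens > 0:
        footer += f" · {reasoning_tokens} reasoning"
    return footer


print(format_token_footer(1240, 380))  # 1240 tokens · 380 reasoning
print(format_token_footer(500))        # 500 tokens
```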
Cost and billing
Reasoning is not free — thinking tokens are real tokens you pay for. Full can multiply token usage on a single turn, so reserve it for problems that genuinely benefit. Per-provider reasoning-token counts flow into Sophon's usage tracking; see Insights for how token consumption is rolled up across sessions, agents, and providers.
Limits and gotchas
- Auto means "do nothing." Sophon does not force reasoning on or off in Auto — the model keeps its native default.
- Anthropic thinking + tools. When a request includes tools, Anthropic extended thinking is currently disabled, because Anthropic requires the signed thinking block to be replayed verbatim on the follow-up tool turn — a path Sophon hasn't enabled yet. Thinking applies on non-tool turns.
- Not every model qualifies. OpenAI reasoning effort only applies to the o-series and gpt-5 family; Gemini thinking applies to 2.5+ models. Non-reasoning models silently ignore the level.
- OpenAI-compatible providers vary. Reasoning controls aren't portable — some compatible endpoints reason automatically and reject an explicit effort, so Sophon only sends it where it's known to be accepted.
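Taken together, the rules in this list amount to a gate that decides whether any reasoning control is sent at all. The following is a sketch only: `supports_reasoning_control` is a hypothetical helper, the model-name patterns are illustrative heuristics rather than Sophon's real detection logic, and the Anthropic branch assumes the model has already been identified as thinking-capable.

```python
import re


def supports_reasoning_control(provider: str, model: str, has_tools: bool = False) -> bool:
    """Decide whether an explicit reasoning control should be sent for this request."""
    name = model.lower()
    if provider == "anthropic":
        # Extended thinking is currently disabled on requests that include tools.
        return not has_tools
    if provider == "openai":
        # Heuristic: o-series names start with "o" plus a digit; gpt-5 family by prefix.
        return name.startswith("gpt-5") or re.match(r"o\d", name) is not None
    if provider == "gemini":
        # Heuristic: read the version number out of names like "gemini-2.5-flash".
        match = re.search(r"gemini-(\d+(?:\.\d+)?)", name)
        return match is not None and float(match.group(1)) >= 2.5
    # OpenAI-compatible and unknown providers: send nothing unless known to accept it.
    return False


print(supports_reasoning_control("openai", "gpt-5-mini"))
print(supports_reasoning_control("anthropic", "example-model", has_tools=True))
```

Defaulting to `False` for unrecognized providers matches the conservative stance described above: an omitted parameter is harmless, while an unexpected one may be rejected.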
Where to go next
- Insights — token and reasoning-token tracking across your activity
- Models configuration — choosing which model and provider a request uses