Rate Limiting | Sophon

Sophon's Gateway applies a single global token-bucket rate limiter to incoming HTTP requests. It is partitioned: authenticated traffic is limited per user, anonymous traffic is limited per IP. The goal is to absorb short bursts while capping sustained abuse, without ever throttling the things that legitimately spike — inbound channel webhooks and the Dashboard's static assets.

The limiter is built on ASP.NET Core's standard System.Threading.RateLimiting token-bucket primitives, keyed by a Gateway-resolved partition per request.

How partitioning works

For each request the limiter resolves a partition key:

If the request carries a validated user identity (the NameIdentifier claim), the key is user:<id>.
Otherwise it falls back to ip:<address>, derived from the first hop of X-Forwarded-For (when present) or the connection's remote IP.

Each key gets its own independent token bucket. The per-user limit is the unspoofable guarantee — it is keyed on the authenticated claim. Per-IP limiting is best-effort for anonymous traffic, since X-Forwarded-For is client-supplied; it assumes a trusted reverse proxy populates that header.
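The key resolution described above can be sketched in a few lines. This is a language-neutral illustration in Python, not Sophon's actual Gateway code: the function name and parameters are hypothetical, and it assumes "first hop" means the leftmost entry of a comma-separated X-Forwarded-For value.

```python
from typing import Optional


def resolve_partition_key(
    user_id: Optional[str],
    x_forwarded_for: Optional[str],
    remote_ip: str,
) -> str:
    """Return the rate-limit partition key for one request (illustrative only)."""
    # A validated identity always wins: this key cannot be influenced by headers.
    if user_id:
        return f"user:{user_id}"

    # Anonymous traffic: prefer the first X-Forwarded-For entry when present.
    # Assumption: "first hop" is the leftmost entry of the header value.
    if x_forwarded_for:
        first_hop = x_forwarded_for.split(",")[0].strip()
        if first_hop:
            return f"ip:{first_hop}"

    # No usable header: fall back to the connection's remote address.
    return f"ip:{remote_ip}"
```

Because the header is only consulted when no identity is present, a spoofed X-Forwarded-For value can shift an anonymous client between per-IP buckets but can never move it into, or out of, a per-user bucket.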

The rate limiter runs after authentication and tenant resolution in the pipeline, so per-user keying always has the validated identity available. Rejected requests return HTTP 429 with a JSON body { "error": "rate_limited" } and a Retry-After header when a retry estimate is available.
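As a rough illustration of the rejection shape described above, the sketch below builds the status, headers, and body for a throttled request. The function name is hypothetical, and two details are assumptions rather than documented behavior: the Content-Type header, and rounding the retry estimate up to whole seconds.

```python
import json
import math
from typing import Dict, Optional, Tuple


def build_rejection(retry_after_seconds: Optional[float]) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a rate-limited request (illustrative only)."""
    # Assumption: the JSON body is served with a JSON content type.
    headers = {"Content-Type": "application/json"}

    # Retry-After is only emitted when a retry estimate is available.
    if retry_after_seconds is not None:
        # Assumption: round up to whole seconds so clients never retry too early.
        headers["Retry-After"] = str(max(1, math.ceil(retry_after_seconds)))

    body = json.dumps({"error": "rate_limited"})
    return 429, headers, body
```

Clients should treat a missing Retry-After header as normal and fall back to their own backoff policy.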

What is exempt

Two categories of traffic bypass the limiter entirely:

Excluded path prefixes — health checks, metrics, SignalR hubs, and all inbound webhooks. Webhooks are exempt because platform channel receivers authenticate via HMAC signatures (not JWT) and arrive from shared platform egress IPs that would otherwise exhaust the anonymous per-IP bucket on a busy channel.
Dashboard SPA static assets — JS, CSS, and font chunks. Static file serving is registered before the rate limiter in the pipeline, so a matched file short-circuits and is never counted against a bucket. Non-file requests fall through to the limiter and auth layers as normal.

The default excluded prefixes are /api/health, /health, /metrics, /hubs, and /webhooks.
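A minimal sketch of how a prefix exemption check might behave with these defaults. The matching rules here are assumptions, not confirmed Gateway behavior: the comparison is case-insensitive and segment-aware, so /health matches /health and /health/live but not /healthz. The function name and the sample paths are hypothetical.

```python
from typing import Iterable

# Defaults as listed in this section.
DEFAULT_EXCLUDED_PATHS = ("/api/health", "/health", "/metrics", "/hubs", "/webhooks")


def is_excluded(path: str, prefixes: Iterable[str] = DEFAULT_EXCLUDED_PATHS) -> bool:
    """Return True when the request path falls under an excluded prefix."""
    normalized = path.lower()
    for prefix in prefixes:
        p = prefix.lower().rstrip("/")
        # Assumption: match whole path segments, not raw string prefixes.
        if normalized == p or normalized.startswith(p + "/"):
            return True
    return False
```

If the real matcher uses plain string prefixes instead, a path such as /healthz would also be exempt, so it is worth checking the behavior before adding short prefixes to ExcludedPaths.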

Configuration

The limiter reads the Sophon:RateLimit configuration section. Following Sophon's standard mapping, each key is also settable via a SOPHON__RateLimit__<Key> environment variable. All values below are the source defaults.

Key	Default	Description
`Enabled`	`true`	Master switch. When `false`, every partition resolves to a no-op limiter.
`PerUserPermitLimit`	`120`	Sustained requests per window for an authenticated user (tokens replenished per period).
`PerUserBurst`	`20`	Token-bucket capacity (max burst) for an authenticated user.
`PerUserWindowSeconds`	`60`	Replenishment period, in seconds, for the per-user bucket.
`PerIpPermitLimit`	`60`	Sustained requests per window for an anonymous IP.
`PerIpBurst`	`10`	Token-bucket capacity (max burst) for an anonymous IP.
`PerIpWindowSeconds`	`60`	Replenishment period, in seconds, for the per-IP bucket.
`ExcludedPaths`	see above	Request-path prefixes exempt from limiting.

In a token bucket, Burst is the bucket capacity: the number of requests that can fire back-to-back from a full bucket. PermitLimit tokens are added back every WindowSeconds. With the defaults, an authenticated user can burst 20 requests, then sustain 120 per minute (an average of 2 per second); an anonymous IP can burst 10, then sustain 60 per minute.
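To make the arithmetic concrete, here is a small token-bucket simulation using the per-user defaults. It is a sketch under one stated assumption: tokens are refilled continuously at PermitLimit / WindowSeconds per second, whereas the underlying .NET limiter may replenish in discrete steps. The class and parameter names are hypothetical.

```python
class TokenBucket:
    """Illustrative token bucket with continuous refill (an assumption)."""

    def __init__(self, burst: int, permit_limit: int, window_seconds: float) -> None:
        self.burst = float(burst)                  # bucket capacity
        self.rate = permit_limit / window_seconds  # tokens restored per second
        self.tokens = float(burst)                 # the bucket starts full
        self.last = 0.0                            # time of the last refill

    def try_acquire(self, now: float) -> bool:
        # Refill for the elapsed time, never exceeding the capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend one token per admitted request.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(burst=20, permit_limit=120, window_seconds=60)
burst = [bucket.try_acquire(now=0.0) for _ in range(21)]
print(burst.count(True))             # 20: the full bucket absorbs the burst
print(burst[-1])                     # False: the 21st back-to-back request is rejected
print(bucket.try_acquire(now=0.5))   # True: 0.5 s at 2 tokens/s restores one token
```

Under this model a client that drains its burst and keeps sending is admitted at roughly two requests per second, which matches the 120-per-minute sustained figure.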

// appsettings.user.json (or via SOPHON__RateLimit__* env vars)
{
  "Sophon": {
    "RateLimit": {
      "Enabled": true,
      "PerUserPermitLimit": 120,
      "PerUserBurst": 20,
      "PerUserWindowSeconds": 60,
      "PerIpPermitLimit": 60,
      "PerIpBurst": 10,
      "PerIpWindowSeconds": 60
    }
  }
}
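The environment-variable form follows the usual .NET convention in which a double underscore stands in for the ':' section separator. The sketch below illustrates that name mapping only. It is not how Sophon loads configuration, and the function name is hypothetical.

```python
from typing import Dict, Mapping


def env_to_config(environ: Mapping[str, str], prefix: str = "SOPHON__") -> Dict[str, str]:
    """Map SOPHON__Section__Key variables to Sophon:Section:Key config keys."""
    config: Dict[str, str] = {}
    for name, value in environ.items():
        # Only variables carrying the Sophon prefix are considered.
        if not name.upper().startswith(prefix):
            continue
        # A double underscore in the variable name becomes ':' in the key.
        segments = name.split("__")
        segments[0] = "Sophon"
        config[":".join(segments)] = value
    return config


overrides = env_to_config({"SOPHON__RateLimit__PerIpBurst": "25", "PATH": "/usr/bin"})
print(overrides)  # {'Sophon:RateLimit:PerIpBurst': '25'}
```

Values arrive as strings, so numeric and boolean keys such as PerIpBurst or Enabled are parsed by the configuration binder rather than by the environment.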

SignalR hub invocation limits

The /hubs exemption above only applies to the HTTP edge — a SignalR connection is one long-lived request, so an HTTP token bucket can't meaningfully meter it. Realtime traffic is limited at the invocation level instead: hub method calls (chat sends, audio streaming, subscription changes) are metered per authenticated user, with the same per-IP fallback for connections that haven't authenticated yet, using the same token-bucket semantics as the HTTP limits.

Hub invocation limits live in the same configuration section, under the Sophon:RateLimit:Hub* keys (settable via SOPHON__RateLimit__Hub* environment variables, like every other key here). Leave them at the defaults unless you run high-frequency realtime workloads — streaming voice in particular sends many small hub invocations per utterance.

Cron stagger prevents scheduled-job bursts

Rate limiting protects the HTTP edge, but scheduled jobs are an internal source of synchronized load. When many cron jobs share the same schedule they can fire at the same instant — a thundering herd. Sophon's scheduler applies an optional random stagger to spread them out.

The Sophon:Scheduling section exposes one key:

Key	Default	Description
`CronStaggerMaxSeconds`	`0`	Maximum random delay, in seconds, added to a job's start time. `0` disables stagger.

When set, each job receives a random offset in [0, CronStaggerMaxSeconds] seconds. This meaningfully phase-shifts repeating interval triggers so they no longer fire in lockstep. Jitter is not applied to raw cron-expression triggers: there an offset could only delay first eligibility, because the expression itself dictates each recurrence and cannot be individually phase-shifted.
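The stagger rule can be sketched as follows. This is an illustration, not the scheduler's implementation: the function names are hypothetical, and it assumes the offset is drawn uniformly and once per job.

```python
import random
from typing import Optional


def stagger_offset(max_seconds: int, rng: Optional[random.Random] = None) -> float:
    """Return a random delay in [0, max_seconds]; 0 disables stagger."""
    if max_seconds <= 0:
        return 0.0
    rng = rng or random.Random()
    # Assumption: the offset is uniformly distributed over the range.
    return rng.uniform(0.0, float(max_seconds))


def first_fire_time(
    scheduled_start: float,
    max_seconds: int,
    is_interval_trigger: bool,
    rng: Optional[random.Random] = None,
) -> float:
    """Apply stagger to interval triggers only; cron expressions are left as-is."""
    if not is_interval_trigger:
        return scheduled_start
    return scheduled_start + stagger_offset(max_seconds, rng)
```

Because an interval trigger recurs relative to its own start time, shifting the first fire time shifts every later run by the same amount, which is what keeps jobs that share a schedule out of lockstep.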

Where to go next

Cron & Scheduled Jobs — defining schedules and trigger types
Operations — running Sophon in production
