Gateway: mid-stream socket drops (no SSE ping forwarding) + silent body-param stripping breaks Anthropic betas #3092

# PostHog Code gateway — API integrator report: mid-stream socket drops & Anthropic beta passthrough

**Date:** 2026-07-02
**Gateway:** `https://gateway.us.posthog.com/posthog_code` (US region)
**Client:** Anthropic TypeScript SDK (v0.91.x) over Bun's `fetch`, streaming SSE from `/v1/messages`
**Models:** `claude-fable-5` (primary), `claude-opus-4-8`

All findings below were reproduced today against the live gateway with an OAuth access token from the PostHog Code beta. Repro commands use `$POSTHOG_TOKEN` as a placeholder.

---

## Finding 1 — Anthropic `ping` keepalives are not forwarded → mid-stream TCP kills

### Symptom

During long silent stretches of a streaming response (typically extended thinking on `claude-fable-5` / Opus, where Anthropic emits no content deltas for tens of seconds), the TCP connection is closed by an intermediary. The client surfaces:

```
Error: The socket connection was closed unexpectedly. For more information,
pass `verbose: true` in the second argument to fetch()
```

This happens **mid-response** — after `message_start` and often after partial thinking/text has streamed — so it can't be handled as a simple pre-request connection retry.

### Corroboration: the official PostHog Code app shows the same stall

This is not client-specific. In the official PostHog Code desktop app, on the same account and models, we regularly see turns where **nothing arrives for 2–3 minutes**: the UI stays in its "working" state with no output and no error, then eventually recovers or produces a fresh answer. That is exactly the signature of this failure absorbed silently — the stream dies (or goes irrecoverably quiet) mid-turn, and the client retries/re-issues the request without surfacing anything to the user. The user experience is a silent multi-minute hang; the API-integrator experience is the raw socket error above. Same gateway path, same root cause, two presentations.

If you can correlate server-side: look for connections on `/v1/messages` closed upstream-idle after ~60–120s of zero payload bytes during active turns, paired with a same-conversation retry request arriving seconds later.

### What we believe is happening

- `api.anthropic.com` emits SSE `ping` events during quiet periods precisely to keep intermediaries from idle-killing the connection.
- The gateway does not forward these `ping` events to the client (we have never observed one in captured SSE traffic through the gateway, while they are routine against the official endpoint).
- With zero bytes flowing, an LB/proxy on the gateway path enforces an idle timeout and resets the connection. The longer the model thinks, the higher the probability of a kill.

### Impact

- Long-thinking turns die non-deterministically. Integrators must raise client-side stream-idle tolerances anyway, because without pings a quiet-but-alive stream is indistinguishable from a dead one. Doing so makes the raw socket error *more* likely to surface in place of a clean client timeout.
- The failure text is transport-level and vendor-specific (Bun/undici/node each produce different strings), so generic retry classifiers frequently treat it as fatal rather than transient. Integrators each have to discover and special-case it.
- Where a client absorbs the failure with a silent retry (as the official app appears to), the user pays twice: minutes of dead air in the UI, plus the partial turn's input tokens re-billed on the retry.

### Suggested fix

Forward Anthropic's `ping` events verbatim, or synthesize an SSE comment (`: keepalive\n\n`) every ~15–30s of upstream silence. Either keeps the connection warm end-to-end and is invisible to SSE parsers.
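To illustrate why an injected comment is safe for clients, here is a minimal sketch of SSE frame parsing in TypeScript. It is not the SDK's parser: `parseSse` and the sample `wire` string are hypothetical, the `message_start` payload is abbreviated, and only the `event` and `data` fields are handled. Under the event-stream rules, a line beginning with `:` is a comment and a frame with no `data` dispatches no event, so keepalive comments put bytes on the socket without reaching application code.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Minimal SSE frame parser covering only the rules that matter here.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Frames are separated by a blank line; normalize CRLF first.
  for (const frame of raw.replace(/\r\n/g, "\n").split("\n\n")) {
    let event = "message"; // default event type
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // empty or comment: ignored
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // A frame without data lines dispatches nothing, which is why comments are invisible.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Hypothetical gateway output: a real event, two injected keepalive comments, a forwarded ping.
const wire =
  'event: message_start\ndata: {"type":"message_start"}\n\n' +
  ": keepalive\n\n" +
  ": keepalive\n\n" +
  'event: ping\ndata: {"type":"ping"}\n\n';

const parsed = parseSse(wire);
console.log(parsed.map((e) => e.event)); // logs the two event names: message_start, ping
```

Forwarded `ping` events do reach the parser as events, but clients of the official endpoint already have to tolerate them, so either option should be transparent.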

### Client-side mitigation we applied (works, but shouldn't be needed)

We now wrap the response stream and reclassify mid-stream socket deaths as transient connection errors so our retry layer restarts the turn. That re-bills the partial turn's tokens on every retry — server-side keepalives would eliminate both the failure and the re-billing.
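A minimal sketch of that reclassification, under stated assumptions: `TransientStreamError`, `isSocketDeath`, and `reclassify` are hypothetical names, not SDK APIs. Only the Bun message is taken from this report; the Node and undici substrings are assumptions that should be verified per runtime before relying on them.

```typescript
// Hypothetical error type that a retry layer would treat as retryable.
class TransientStreamError extends Error {
  original: unknown;
  constructor(message: string, original: unknown) {
    super(message);
    this.name = "TransientStreamError";
    this.original = original;
  }
}

// Lower-cased substrings that indicate a transport-level death.
const SOCKET_DEATH_PATTERNS: readonly string[] = [
  "socket connection was closed unexpectedly", // Bun fetch (quoted in the Symptom section)
  "econnreset", // assumption: Node net errors
  "socket hang up", // assumption: Node http
  "terminated", // assumption: undici aborted body (broad match, tune as needed)
  "other side closed", // assumption: undici SocketError
];

function isSocketDeath(err: unknown): boolean {
  const text = (err instanceof Error ? err.message : String(err)).toLowerCase();
  return SOCKET_DEATH_PATTERNS.some((pattern) => text.includes(pattern));
}

// Wrap any async iterable of stream events; rethrow socket deaths as transient
// so the retry layer restarts the turn. Other errors pass through unchanged.
async function* reclassify<T>(stream: AsyncIterable<T>): AsyncGenerator<T> {
  try {
    for await (const item of stream) yield item;
  } catch (err) {
    if (isSocketDeath(err)) {
      throw new TransientStreamError("mid-stream socket drop; retry candidate", err);
    }
    throw err;
  }
}
```

Usage would look like `for await (const event of reclassify(stream)) { ... }`, where `stream` is whatever async iterable the client already consumes. The wrapper changes only the error type, so the retried turn is still re-billed.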

---

## Finding 2 — Request body is schema-filtered: unknown params silently dropped → Anthropic betas unusable

### Symptom

The gateway strips top-level request-body parameters it doesn't recognize before forwarding to Anthropic, instead of passing them through or rejecting them. This silently disables Anthropic beta features that are negotiated via new body params + `anthropic-beta` header.

Concrete case: **server-side fallback** (`server-side-fallback-2026-06-01`, [refusals-and-fallback docs](https://platform.claude.com/docs/en/build-with-claude/refusals-and-fallback)) — the feature that retries a Fable-5 classifier refusal on another model inside one API call.

### Evidence (all reproduced 2026-07-02)

The key probe: a `fallbacks` chain that is **guaranteed to 400 on `api.anthropic.com`** (the fallback model is identical to the primary, and the API requires distinct entries) returns **200** through the gateway. Taken together with probes 2–4 below, the only consistent explanation is that `fallbacks` never reaches Anthropic.

| # | Request | Expected (direct Anthropic) | Observed via gateway |
|---|---------|------------------------------|----------------------|
| 1 | `fallbacks: [{model: <same as primary>}]` + `anthropic-beta: server-side-fallback-2026-06-01` | 400 "must be distinct" | **200**, normal completion |
| 2 | unknown top-level param `frobnicate: true`, no beta header | 400 "Unexpected value(s)" | **200**, normal completion |
| 3 | `fallbacks: "bogus"` (wrong type) + beta header | 400 type error | **200**, normal completion |
| 4 | same as #2 but without `?beta=true` query param | 400 | **200** (both gateway paths filter) |

Repro for probe 1:

```bash
curl -sS "https://gateway.us.posthog.com/posthog_code/v1/messages?beta=true" \
  -H "x-api-key: $POSTHOG_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: server-side-fallback-2026-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-fable-5",
    "max_tokens": 16,
    "fallbacks": [{"model": "claude-fable-5"}],
    "messages": [{"role": "user", "content": "Say ok"}]
  }'
# → 200 with a normal assistant message. Direct to api.anthropic.com this is a hard 400.
```

Note the `anthropic-beta` **header** appears to pass through fine (no error, no behavior change) — it's specifically the **body** param that's filtered.
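The other probes differ from probe 1 only in body, headers, and query string. The sketch below builds all four as plain request descriptions so they can be replayed from a script. `buildProbe` and the `Probe` shape are hypothetical helpers, and the use of `?beta=true` on probe 3 is an assumption (the table only states it for probes 2 and 4 by contrast).

```typescript
interface Probe {
  id: number;
  path: string;
  headers: Record<string, string>;
  body: Record<string, unknown>;
}

const FALLBACK_BETA = "server-side-fallback-2026-06-01";
const PRIMARY_MODEL = "claude-fable-5";

function buildProbe(id: 1 | 2 | 3 | 4, token: string): Probe {
  const headers: Record<string, string> = {
    "x-api-key": token,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
  };
  const body: Record<string, unknown> = {
    model: PRIMARY_MODEL,
    max_tokens: 16,
    messages: [{ role: "user", content: "Say ok" }],
  };
  let path = "/v1/messages?beta=true";
  switch (id) {
    case 1: // fallback identical to primary: direct API rejects this
      headers["anthropic-beta"] = FALLBACK_BETA;
      body["fallbacks"] = [{ model: PRIMARY_MODEL }];
      break;
    case 2: // unknown top-level param, no beta header
      body["frobnicate"] = true;
      break;
    case 3: // wrong type for fallbacks, with beta header
      headers["anthropic-beta"] = FALLBACK_BETA;
      body["fallbacks"] = "bogus";
      break;
    case 4: // same as probe 2 but without the ?beta=true query param
      body["frobnicate"] = true;
      path = "/v1/messages";
      break;
  }
  return { id, path, headers, body };
}
```

Each probe can then be sent with `fetch(base + probe.path, { method: "POST", headers: probe.headers, body: JSON.stringify(probe.body) })`, where `base` is the gateway URL from the header of this report.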

### Impact

- Server-side fallback cannot be used through the gateway at all. Given Fable-5's classifier refusals are exactly the problem your own PR [PostHog/code#3078](https://github.com/PostHog/code/pull/3078) addresses client-side (SDK `fallbackModel`), forwarding `fallbacks` would give every API integrator the same protection in one round trip, with Anthropic's fallback-credit billing (a pre-output refusal attempt costs nothing) instead of a full client-side re-send.
- More generally: any future Anthropic beta that introduces body params will be silently broken through the gateway until each param is individually whitelisted. Silent dropping is the worst failure mode — a 400 would at least tell integrators the feature is unsupported.

### Suggested fix

Whitelist `fallbacks` (and ideally adopt pass-through-with-denylist rather than parse-and-rebuild-with-allowlist for `/v1/messages` bodies). If filtering must stay, return a 4xx or a warning header when a param is dropped.
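As a rough sketch of the pass-through-with-denylist policy, assuming nothing about the gateway's real implementation: every top-level key is forwarded unless it is explicitly denied, and anything dropped is reported. `filterBody`, the denylist entry, and the `x-gateway-dropped-params` header name are all hypothetical.

```typescript
type JsonObject = Record<string, unknown>;

interface FilterResult {
  forwarded: JsonObject;
  dropped: string[];
}

// Forward everything except explicitly denied keys, and record what was removed.
function filterBody(body: JsonObject, denylist: ReadonlySet<string>): FilterResult {
  const forwarded: JsonObject = {};
  const dropped: string[] = [];
  for (const [key, value] of Object.entries(body)) {
    if (denylist.has(key)) dropped.push(key);
    else forwarded[key] = value;
  }
  return { forwarded, dropped };
}

// Surface dropped params to the caller instead of failing silently.
// The header name is an assumption made for this sketch.
function droppedParamsHeader(dropped: string[]): Record<string, string> {
  return dropped.length > 0 ? { "x-gateway-dropped-params": dropped.join(",") } : {};
}

// Hypothetical denylist entry; a real gateway would list params it must control itself.
const exampleDenylist: ReadonlySet<string> = new Set(["internal_only_param"]);
const result = filterBody(
  { model: "claude-fable-5", fallbacks: [{ model: "claude-opus-4-8" }], internal_only_param: 1 },
  exampleDenylist,
);
console.log(Object.keys(result.forwarded), droppedParamsHeader(result.dropped));
```

With this shape, a new beta body param such as `fallbacks` reaches Anthropic by default, and upstream validation produces the 400 when a request is malformed.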

---

## Finding 3 (related, previously observed) — `context_management` is half-parsed → upstream 400

When a request contains Anthropic's `context_management` body param (e.g. `{"edits":[{"type":"clear_thinking_20251015"}]}`) alongside `thinking: {"type":"adaptive"}`, the gateway forwards `context_management` but drops the paired `thinking` field. Anthropic then rejects the request:

```
400 ... `clear_thinking_20251015` strategy requires thinking to be enabled or adaptive
```

So the body filtering is not only dropping unknown params (Finding 2) — for at least one known param it forwards the param while dropping a field it depends on. We currently strip `context_management` client-side to avoid the 400. Consistent pass-through would fix this too.
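The client-side workaround amounts to removing one key before sending, and only when the request targets the gateway. A minimal sketch, with `prepareBody` as a hypothetical helper and the gateway check reduced to a URL prefix match:

```typescript
const GATEWAY_PREFIX = "https://gateway.us.posthog.com/";

// Return a copy of the request body without context_management when the
// request goes through the gateway; leave it intact for other base URLs.
function prepareBody(
  baseUrl: string,
  body: Record<string, unknown>,
): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...body };
  if (baseUrl.startsWith(GATEWAY_PREFIX)) {
    delete copy["context_management"];
  }
  return copy;
}
```

The cost of this workaround is that thinking-block clearing is unavailable through the gateway, so long conversations retain context they would otherwise shed.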

### Also worth noting: unsigned thinking blocks

Thinking blocks returned through the gateway carry an empty `signature` (`signature: ""`), while the official endpoint returns signed blocks. Replaying such a block to Anthropic on the next turn produces a 400, so integrators must strip or downgrade thinking blocks in multi-turn conversations. If the gateway is re-serializing responses, preserving the original signature bytes would restore standard multi-turn replay behavior.
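One way to do the stripping, sketched under assumptions: `stripUnsignedThinking` is a hypothetical helper, and the `Message` and `ContentBlock` types are simplified stand-ins for the SDK's message types. It removes thinking blocks whose `signature` is empty or missing and leaves everything else, including properly signed thinking blocks, untouched.

```typescript
interface ContentBlock {
  type: string;
  signature?: string;
  [key: string]: unknown;
}

interface Message {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

// Drop thinking blocks that cannot be replayed because they carry no signature.
// Note: a message consisting only of unsigned thinking ends up with empty
// content, which callers would need to handle separately.
function stripUnsignedThinking(messages: Message[]): Message[] {
  return messages.map((message) => {
    if (typeof message.content === "string") return message;
    const kept = message.content.filter(
      (block) => !(block.type === "thinking" && !block.signature),
    );
    return { ...message, content: kept };
  });
}
```

Applied to the conversation history before each request, this avoids the replay 400 at the price of discarding the model's prior reasoning.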

---

## Summary of asks, in priority order

1. **Keepalives:** forward Anthropic `ping` SSE events (or inject SSE comments) so idle LB timeouts stop killing long thinking turns mid-stream.
2. **`fallbacks` passthrough:** whitelist the `fallbacks` body param + `server-side-fallback-2026-06-01` beta so classifier refusals can fall back server-side (the API-integrator counterpart of PostHog/code#3078).
3. **Body filtering policy:** prefer pass-through; if filtering stays, fail loudly instead of silently dropping params, and forward `context_management` together with its paired `thinking` field.
4. **Thinking signatures:** preserve `signature` on thinking blocks for standard multi-turn replay.

