feat(agent): add refusal handling and model fallback #3078
React Doctor found no issues in the changed files. 🎉 Reviewed by React Doctor for commit |
Reviews (1): Last reviewed commit: "add refusal handling and model fallback"
Technically sound with good direction gating and tests, but this is a T1d-complex feature that changes session initialization for all users (defaulting fallbackModel on every session) and modifies the ACP protocol surface for refusals. No team member has reviewed the current head — only the author's own self-comments and an older-commit bot review.
The code is technically sound and the greptile concerns were all addressed, but this T1d-complex feature defaults fallbackModel on every session (a behavioral change for all users), modifies the ACP protocol surface with new status types, and changes how terminal refusals are surfaced — with no human team member review on the current head.
Problem
End users on Claude Fable 5 can hit safety-classifier refusals: the API returns HTTP 200 with stop_reason: "refusal" and expects integrators to configure a fallback model (https://platform.claude.com/docs/en/build-with-claude/refusals-and-fallback). We configured none, and the adapter streamed the refusal's raw stop_details.explanation into the chat as assistant prose. A user saw their truncated answer followed by "API integrators: you can reduce refusals for your users by configuring a fallback model", which is platform text meant for us, not them.
Changes
packages/agent: sessions now spawn with Options.fallbackModel: "claude-opus-4-8", so the SDK retries a refused (or overloaded) turn on Opus 4.8 and keeps the swap for the session. Caller-provided values win, and a guard avoids the SDK's spawn error when the fallback equals Options.model.
packages/agent: the model_refusal_fallback system message (previously a silent no-op) now emits a _posthog/status notification carrying the model swap and the display-only explanation.
packages/agent: a terminal refusal emits a structured _posthog/status refusal notification instead of an agent_message_chunk with the raw explanation text. The turn still returns ACP stop reason refusal.
packages/ui: renders the two new statuses. A terminal refusal shows an orange callout with the explanation and a retry hint; a fallback retry shows a marker row ("claude-fable-5 declined this request, retried with claude-opus-4-8").
UPSTREAM.md: documents the divergence so upstream syncs don't revert it.
Known limitation: on a mid-stream refusal the already-streamed partial text stays in the transcript. The SDK sends retraction uuids but the chunk-based transcript cannot evict them, so the sequence reads partial answer, fallback marker, then the fresh answer.
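For readers skimming the change list, the fallback default and its guard can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: SessionOptions, DEFAULT_FALLBACK_MODEL and withFallbackModel are hypothetical names, and dropping the fallback when it matches the primary model is an assumption about how the guard behaves.

```typescript
// Hypothetical, trimmed-down stand-in for the SDK's Options type.
interface SessionOptions {
  model?: string;
  fallbackModel?: string;
}

// Default taken from the PR description.
const DEFAULT_FALLBACK_MODEL = "claude-opus-4-8";

// Hypothetical helper: resolve the fallback model for a new session.
function withFallbackModel(options: SessionOptions): SessionOptions {
  const resolved: SessionOptions = { ...options };

  // Caller-provided values win over the default.
  const fallback = options.fallbackModel ?? DEFAULT_FALLBACK_MODEL;

  if (fallback === options.model) {
    // Assumed guard: a fallback equal to the primary model would make the
    // SDK fail at spawn, so no fallback is passed in that case.
    delete resolved.fallbackModel;
  } else {
    resolved.fallbackModel = fallback;
  }
  return resolved;
}
```

Under these assumptions, a session already running on claude-opus-4-8 spawns without a fallback, while every other session gets one unless the caller chose its own.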
No screenshots: both states require a live safety-classifier refusal, which cannot be triggered on demand.
How did you test this?
fallbackModel default, resume and override behavior in options.test.ts; model_refusal_fallback notification shape in sdk-to-acp.test.ts; refusal and refusal_fallback item passthrough in buildConversationItems.test.ts.
pnpm --filter @posthog/agent typecheck, pnpm --filter @posthog/ui typecheck and Biome on the changed files.
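As a rough picture of what the notification-shape check covers, the mapping from the SDK's model_refusal_fallback system message to a _posthog/status notification might look like the sketch below. The field names (fromModel, toModel, explanation) and the helper names are assumptions for illustration only; the real message and notification shapes live in packages/agent and may differ.

```typescript
// Hypothetical shape of the SDK system message; field names are assumed.
interface RefusalFallbackMessage {
  type: "system";
  subtype: "model_refusal_fallback";
  fromModel: string;
  toModel: string;
  explanation?: string;
}

// Hypothetical payloads for the two new statuses.
type StatusParams =
  | { status: "refusal_fallback"; fromModel: string; toModel: string; explanation?: string }
  | { status: "refusal"; explanation?: string };

interface StatusNotification {
  method: "_posthog/status";
  params: StatusParams;
}

// Fallback retry: carry the model swap and the display-only explanation.
function fallbackStatus(msg: RefusalFallbackMessage): StatusNotification {
  return {
    method: "_posthog/status",
    params: {
      status: "refusal_fallback",
      fromModel: msg.fromModel,
      toModel: msg.toModel,
      ...(msg.explanation !== undefined ? { explanation: msg.explanation } : {}),
    },
  };
}

// Terminal refusal: structured status instead of raw assistant prose.
function refusalStatus(explanation?: string): StatusNotification {
  return {
    method: "_posthog/status",
    params: {
      status: "refusal",
      ...(explanation !== undefined ? { explanation } : {}),
    },
  };
}

// Marker row text as quoted in the PR description.
function fallbackMarker(fromModel: string, toModel: string): string {
  return `${fromModel} declined this request, retried with ${toModel}`;
}
```

Keeping the explanation in a structured field rather than in an agent_message_chunk lets the UI decide how to present it, which is the point of the change described above.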