feat(agent): add refusal handling and model fallback #3078

Merged
charlesvien merged 3 commits into main from feat/agent-refusal-fallback on Jul 2, 2026
Conversation

@charlesvien

Member

Problem

End users on Claude Fable 5 can hit safety-classifier refusals: the API returns HTTP 200 with stop_reason: "refusal" and expects integrators to configure a fallback model (https://platform.claude.com/docs/en/build-with-claude/refusals-and-fallback). We configured none, and the adapter streamed the refusal's raw stop_details.explanation into the chat as assistant prose. A user saw their truncated answer followed by "API integrators: you can reduce refusals for your users by configuring a fallback model", which is platform text meant for us, not them.
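
To illustrate the failure mode, here is a minimal sketch of routing a refused result into a structured event rather than assistant prose. Only `stop_reason: "refusal"` and `stop_details.explanation` come from the description above; the `ApiResult` and `ChatEvent` types and the `toChatEvents` function are hypothetical names for illustration, not the adapter's actual code.

```typescript
// Hypothetical result shape: only stop_reason and stop_details.explanation
// are taken from the PR description; everything else is an assumption.
interface ApiResult {
  stop_reason: string;
  stop_details?: { explanation?: string };
  text: string;
}

// Hypothetical chat events: assistant prose versus a structured refusal status.
type ChatEvent =
  | { kind: "assistant_text"; text: string }
  | { kind: "refusal_status"; explanation?: string };

function toChatEvents(result: ApiResult): ChatEvent[] {
  const events: ChatEvent[] = [];
  if (result.text.length > 0) {
    events.push({ kind: "assistant_text", text: result.text });
  }
  if (result.stop_reason === "refusal") {
    // Keep the explanation out of the assistant prose: it is platform text
    // aimed at integrators, so it travels as display-only metadata instead.
    events.push({
      kind: "refusal_status",
      explanation: result.stop_details?.explanation,
    });
  }
  return events;
}
```

The point of the split is that the UI can decide how (or whether) to show the explanation, instead of it being indistinguishable from the model's own answer.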

Changes

  • packages/agent: sessions now spawn with Options.fallbackModel: "claude-opus-4-8", so the SDK retries a refused (or overloaded) turn on Opus 4.8 and keeps the swap for the session. Caller-provided values win, and a guard avoids the SDK's spawn error when the fallback equals Options.model.
  • packages/agent: the model_refusal_fallback system message (previously a silent no-op) now emits a _posthog/status notification carrying the model swap and the display-only explanation.
  • packages/agent: a terminal refusal emits a structured _posthog/status refusal notification instead of an agent_message_chunk with the raw explanation text. The turn still returns ACP stop reason refusal.
  • packages/ui: renders the two new statuses. A terminal refusal shows an orange callout with the explanation and a retry hint; a fallback retry shows a marker row ("claude-fable-5 declined this request, retried with claude-opus-4-8").
  • UPSTREAM.md: documents the divergence so upstream syncs don't revert it.
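
The fallback defaulting described in the first bullet can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `SessionOptions` and `withFallbackModel` are hypothetical names, and dropping the fallback when it equals the primary model is one plausible way to implement the guard.

```typescript
// Default taken from the PR description.
const DEFAULT_FALLBACK_MODEL = "claude-opus-4-8";

// Hypothetical subset of the SDK options; only `model` and `fallbackModel`
// are named in the PR description.
interface SessionOptions {
  model?: string;
  fallbackModel?: string;
}

function withFallbackModel(options: SessionOptions): SessionOptions {
  // Caller-provided values win over the default.
  const candidate = options.fallbackModel ?? DEFAULT_FALLBACK_MODEL;
  const result: SessionOptions = { ...options };
  if (candidate === options.model) {
    // Guard: a fallback identical to the primary model would trigger the
    // SDK's spawn error, so (in this sketch) no fallback is set at all.
    delete result.fallbackModel;
  } else {
    result.fallbackModel = candidate;
  }
  return result;
}
```

For example, `withFallbackModel({ model: "claude-fable-5" })` gains the default fallback, while `withFallbackModel({ model: "claude-opus-4-8" })` comes back without one.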

Known limitation: on a mid-stream refusal, the already-streamed partial text stays in the transcript. The SDK sends retraction UUIDs, but the chunk-based transcript cannot evict the chunks they refer to, so the transcript reads: partial answer, fallback marker, then the fresh answer.
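
A toy model of why this happens, assuming the transcript merges streamed chunks into a single text item and keeps no per-chunk identity. The `ChunkTranscript` class and its item shapes are hypothetical and are not the `packages/ui` implementation.

```typescript
// Hypothetical transcript items: merged agent text and a fallback marker row.
type TranscriptItem =
  | { kind: "agent_text"; text: string }
  | { kind: "refusal_fallback"; fromModel: string; toModel: string };

class ChunkTranscript {
  readonly items: TranscriptItem[] = [];

  // Streamed chunks are merged into the trailing text item, so the identity
  // of each individual chunk is lost once it has been appended.
  appendChunk(text: string): void {
    const last = this.items[this.items.length - 1];
    if (last !== undefined && last.kind === "agent_text") {
      last.text += text;
    } else {
      this.items.push({ kind: "agent_text", text });
    }
  }

  appendFallbackMarker(fromModel: string, toModel: string): void {
    this.items.push({ kind: "refusal_fallback", fromModel, toModel });
  }

  // With no UUID stored per chunk there is nothing to match a retraction
  // against, so in this model eviction is always a no-op.
  retract(_uuid: string): boolean {
    return false;
  }
}
```

Appending two partial chunks, a marker, and then the retried answer leaves three items in order: the partial text, the fallback marker, and the fresh answer.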

No screenshots: both states require a live safety-classifier refusal, which cannot be triggered on demand.

How did you test this?

  • New unit tests: fallbackModel default, resume and override behavior in options.test.ts; model_refusal_fallback notification shape in sdk-to-acp.test.ts; refusal and refusal_fallback item passthrough in buildConversationItems.test.ts.
  • Full suites pass: claude adapter (20 files, 311 tests) and ui sessions (24 files, 306 tests).
  • Ran pnpm --filter @posthog/agent typecheck, pnpm --filter @posthog/ui typecheck, and Biome on the changed files.
  • Not verified against a live refusal; the flow is covered by the unit tests above.
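
For readers unfamiliar with the notification-shape test, a check of that kind might look roughly like this. All field names here (`from_model`, `to_model`, `fromModel`, `toModel`, `status`) are assumptions for illustration; only the `model_refusal_fallback` subtype, the `_posthog/status` method, and the `refusal_fallback` item name appear in this PR's description.

```typescript
// Hypothetical SDK system message; field names are assumptions.
interface RefusalFallbackMessage {
  type: "system";
  subtype: "model_refusal_fallback";
  from_model: string;
  to_model: string;
  explanation?: string;
}

// Hypothetical notification payload carrying the model swap and the
// display-only explanation.
interface StatusNotification {
  method: "_posthog/status";
  params: {
    status: "refusal_fallback";
    fromModel: string;
    toModel: string;
    explanation?: string;
  };
}

function toStatusNotification(msg: RefusalFallbackMessage): StatusNotification {
  return {
    method: "_posthog/status",
    params: {
      status: "refusal_fallback",
      fromModel: msg.from_model,
      toModel: msg.to_model,
      explanation: msg.explanation,
    },
  };
}

// Minimal shape check without a test framework: throws on mismatch.
function expectShape(actual: StatusNotification, expected: StatusNotification): void {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error("unexpected notification shape");
  }
}

expectShape(
  toStatusNotification({
    type: "system",
    subtype: "model_refusal_fallback",
    from_model: "claude-fable-5",
    to_model: "claude-opus-4-8",
  }),
  {
    method: "_posthog/status",
    params: {
      status: "refusal_fallback",
      fromModel: "claude-fable-5",
      toModel: "claude-opus-4-8",
    },
  },
);
```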

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

@github-actions

github-actions Bot commented Jul 2, 2026

React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit b5b54ed.

@greptile-apps

greptile-apps Bot commented Jul 2, 2026

Contributor

Reviews (1): Last reviewed commit: "add refusal handling and model fallback"

Comment thread packages/agent/src/adapters/claude/session/options.test.ts Outdated
Comment thread packages/agent/src/adapters/claude/conversion/sdk-to-acp.test.ts Outdated
Comment thread packages/agent/src/adapters/claude/conversion/sdk-to-acp.ts
@charlesvien charlesvien added the Stamphog This will request an autostamp by stamphog on small changes label Jul 2, 2026

@stamphog stamphog Bot left a comment

Technically sound with good direction gating and tests, but this is a T1d-complex feature that changes session initialization for all users (defaulting fallbackModel on every session) and modifies the ACP protocol surface for refusals. No team member has reviewed the current head — only the author's own self-comments and an older-commit bot review.

@stamphog stamphog Bot removed the Stamphog This will request an autostamp by stamphog on small changes label Jul 2, 2026
@charlesvien charlesvien requested a review from a team July 2, 2026 03:32
@charlesvien charlesvien added the Stamphog This will request an autostamp by stamphog on small changes label Jul 2, 2026

@stamphog stamphog Bot left a comment

The code is technically sound and the greptile concerns were all addressed, but this T1d-complex feature defaults fallbackModel on every session (a behavioral change for all users), modifies the ACP protocol surface with new status types, and changes how terminal refusals are surfaced — with no human team member review on the current head.

@stamphog stamphog Bot removed the Stamphog This will request an autostamp by stamphog on small changes label Jul 2, 2026
@charlesvien charlesvien added the Create Release This will trigger a new release label Jul 2, 2026 — with Graphite App
@charlesvien charlesvien merged commit 246b0b4 into main Jul 2, 2026
32 checks passed
@charlesvien charlesvien deleted the feat/agent-refusal-fallback branch July 2, 2026 04:13

Labels

Create Release This will trigger a new release

2 participants