
fix(agent-server): handle follow-up turn failures and classify timeouts#2938

Merged
tatoalo merged 2 commits into main from fix/agent-turn-failure-handling on Jun 26, 2026

Conversation

@tatoalo

@tatoalo tatoalo commented Jun 25, 2026

Contributor

Problem

When an agent turn crashed mid-prompt on a live follow-up message, the failure was effectively invisible:

  • The follow-up user_message path had no failure handling (unlike the initial and resume paths), so a thrown agent error was simply returned as a JSON-RPC error inside an HTTP 200 response.
  • The Claude SDK also surfaces the error string as an ordinary assistant message, so the crash rendered in the transcript as a normal reply and looked like a healthy turn.
  • "API Error: ... timed out" was classified as a permanent agent_error rather than a transient upstream stall, so even retryable timeouts were treated as hard failures.
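To illustrate the first point: a JSON-RPC 2.0 error travels in the response body, so a caller that only looks at the HTTP status sees a success. The sketch below is not code from this PR. `isFailedTurn` and the sample payload are hypothetical names and values made up for illustration; only the response shape and the `-32603` (internal error) code come from the JSON-RPC 2.0 specification.

```typescript
// Shape of a JSON-RPC 2.0 response, per the JSON-RPC 2.0 specification.
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

// Hypothetical helper: a turn counts as failed if the transport failed
// OR the body carries an `error` member, even when the status is 200.
function isFailedTurn(httpStatus: number, body: JsonRpcResponse): boolean {
  return httpStatus !== 200 || body.error !== undefined;
}

// Illustrative payload only; the id and message are invented for this example.
const crashed: JsonRpcResponse = {
  jsonrpc: "2.0",
  id: 7,
  error: { code: -32603, message: "API Error: Request timed out" },
};

// A status-only check would treat this as healthy; the body check does not.
const statusOnlyLooksHealthy = 200 === 200;
const actuallyFailed = isFailedTurn(200, crashed);
```

Checking the body catches the failure at the transport level, but it does not help with the second point, where the error text also arrives as a regular assistant message.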

Changes

  • Add an upstream_timeout classification.
  • Introduce a single handleTurnFailure(payload, phase, error) used by the initial, resume, and (newly wired) follow-up paths.
  • Always surface the failure as a tagged error session update instead of letting it ride the normal assistant-message path.
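A minimal sketch of how the changes above could fit together, for readers who have not opened the diff. Only the `handleTurnFailure(payload, phase, error)` signature and the `upstream_timeout` / `agent_error` names come from this PR. Everything else is an assumption made for illustration: the `TurnPayload` and `ErrorSessionUpdate` shapes, the phase names, the regex, the choice to return the update rather than emit it, and the exact recoverability rule (the review summary only says recoverability is scoped to follow-up turns in interactive mode).

```typescript
type FailureKind = "upstream_timeout" | "agent_error";
type TurnPhase = "initial" | "resume" | "follow_up"; // assumed phase names

// Hypothetical payload shape; the real payload is not shown in this PR page.
interface TurnPayload {
  sessionId: string;
  mode: "interactive" | "background";
}

// Hypothetical shape of the tagged error session update.
interface ErrorSessionUpdate {
  sessionUpdate: "error";
  sessionId: string;
  phase: TurnPhase;
  kind: FailureKind;
  recoverable: boolean;
  message: string;
}

function errorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Assumed matcher for strings like "API Error: ... timed out".
function classifyFailure(error: unknown): FailureKind {
  return /API Error:.*timed out/i.test(errorMessage(error))
    ? "upstream_timeout"
    : "agent_error";
}

// One entry point shared by the initial, resume, and follow-up paths.
// Returns the update here so the sketch stays testable; a real server
// would presumably send it to the client instead.
function handleTurnFailure(
  payload: TurnPayload,
  phase: TurnPhase,
  error: unknown,
): ErrorSessionUpdate {
  const kind = classifyFailure(error);
  // Assumed rule: only transient upstream stalls on interactive follow-ups
  // are treated as recoverable.
  const recoverable =
    kind === "upstream_timeout" &&
    phase === "follow_up" &&
    payload.mode === "interactive";
  return {
    sessionUpdate: "error",
    sessionId: payload.sessionId,
    phase,
    kind,
    recoverable,
    message: errorMessage(error),
  };
}
```

Routing all three paths through one function keeps the classification and the error tagging in a single place, so a follow-up failure cannot fall back to the assistant-message path by omission.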

before:

[screenshot: image]

after:

[screenshot: good_error]

@github-actions

github-actions Bot commented Jun 25, 2026


React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit d8639b4.

@greptile-apps

greptile-apps Bot commented Jun 25, 2026

Contributor

Reviews (1): Last reviewed commit: "fix(agent-server): handle follow-up turn..."

Comment thread packages/agent/src/server/agent-server.test.ts Outdated
@tatoalo tatoalo self-assigned this Jun 25, 2026
@tatoalo tatoalo marked this pull request as ready for review June 25, 2026 17:27
@tatoalo tatoalo requested a review from a team June 25, 2026 17:28
@greptile-apps

greptile-apps Bot commented Jun 25, 2026

Contributor

Reviews (2): Last reviewed commit: "fix(agent-server): scope recoverable to ..."

@tatoalo tatoalo added the Stamphog label (This will request an autostamp by stamphog on small changes) Jun 26, 2026

@github-actions github-actions Bot left a comment


Focused, well-tested fix: adds timeout error classification, refactors failure handling to distinguish recoverable upstream errors in follow-up turns, and properly scopes recoverability to interactive mode only. No showstoppers; the one bot review comment was addressed and resolved.

@tatoalo tatoalo merged commit 194b3a7 into main Jun 26, 2026
30 checks passed
@tatoalo tatoalo deleted the fix/agent-turn-failure-handling branch June 26, 2026 15:07
