feat: classify run failure error codes and improve error logging#1340
feat: classify run failure error codes and improve error logging#1340
Conversation
🦋 Changeset detectedLatest commit: bcad456 The changes in this PR will be included in the next version bump. This PR includes changesets to release 21 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests🐘 Local Postgres (1 failed)sveltekit-stable (1 failed):
🌍 Community Worlds (56 failed)mongodb (3 failed):
redis (2 failed):
turso (51 failed):
📋 Other (1 failed)e2e-local-postgres-nest-stable (1 failed):
Details by Category✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
❌ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
❌ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details. |
e7068bf to
e019085
Compare
| return 'bg-emerald-500'; | ||
| case 'failed': | ||
| return 'bg-red-500'; | ||
| return isInfra ? 'bg-amber-500' : 'bg-red-500'; |
There was a problem hiding this comment.
I think we should just show red no matter whether it's infra or user error
There was a problem hiding this comment.
also don't render the entire error stack trace in the tooltip. That won't even be accessible without decryption when a run is encrypted - so it's enough to just show the error code for now
There was a problem hiding this comment.
Fixed — all failures are red now, removed the amber/infra distinction.
There was a problem hiding this comment.
Fixed — tooltip now only shows the error code (e.g. USER_ERROR), no stack trace or message. The full error data is behind the ref/encryption anyway.
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express | Nitro workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro | Express workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro | Express Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express | Nitro Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
|
There was a problem hiding this comment.
Pull request overview
Adds run failure classification via standardized error codes and threads those codes through runtime events into the UI, while also tightening up debug/error logging in world implementations.
Changes:
- Introduces
RUN_ERROR_CODES(USER_ERROR,RUNTIME_ERROR) andclassifyRunError()to populaterun_failed.eventData.errorCode. - Updates Web UI status badge to surface/copy error codes (when present) instead of raw error details.
- Adjusts world-local queue error logging and world-vercel schema-validation error messaging / debug gating.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/world-vercel/src/utils.ts | Tweaks DEBUG behavior and improves schema validation error formatting (with optional debug context). |
| packages/world-local/src/queue.ts | Simplifies queue failure logs and standardizes log prefixes/metadata. |
| packages/web/app/components/display-utils/status-badge.tsx | Shows a tooltip for failed statuses that exposes/copies StructuredError.code. |
| packages/errors/src/index.ts | Re-exports new error code constants/types from error-codes. |
| packages/errors/src/error-codes.ts | Defines canonical run error codes and exported union type. |
| packages/core/src/runtime.ts | Classifies caught workflow errors and includes errorCode in run_failed events. |
| packages/core/src/classify-error.ts | Adds helper to classify errors into run error codes. |
| packages/core/src/classify-error.test.ts | Unit tests for classifyRunError. |
| packages/core/e2e/e2e.test.ts | Extends e2e assertions to verify error codes are present in surfaced errors and CLI output. |
| .changeset/classify-run-error-codes.md | Publishes patch bumps and documents the feature/logging changes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| @@ -374,7 +381,7 @@ export function workflowEntrypoint( | |||
| message: errorMessage, | |||
| stack: errorStack, | |||
| }, | |||
| // TODO: include error codes when we define them | |||
| errorCode, | |||
| }, | |||
| }, | |||
There was a problem hiding this comment.
Fixed — the setup failure path now includes errorCode: RUN_ERROR_CODES.RUNTIME_ERROR in the run_failed event data. Both run_failed emission sites are now consistent.
There was a problem hiding this comment.
First pass by Claude:
1. DEBUG truthiness change leaks Authorization Bearer tokens in logs
The DEBUG check changed from process.env.DEBUG === '1' to process.env.DEBUG (truthy). The curl-reproduction debug block at line 302 stringifies all request headers, including Authorization: Bearer <token>. The debug npm package (a dependency of @workflow/core) commonly sets DEBUG=workflow:* or DEBUG=*, which is now truthy and triggers token emission to stderr. This flows into Vercel deployment logs, Datadog log drains, and CI output.
Suggestion: Either (a) revert to process.env.DEBUG === '1' for the curl block specifically, or (b) redact the Authorization header before logging. The schema-validation debug context at line 369 is less sensitive and can use truthy.
// Option (a): revert strict check for curl block
if (process.env.DEBUG === '1') {
// Option (b): redact sensitive headers
const safeHeaders = Array.from(headers.entries())
.filter(([key]) => key.toLowerCase() !== 'authorization')
.map(([key, value]) => `-H "${key}: ${value}"`)
.join(' ');2. UI regression: StatusBadge tooltip lost for runs without error code
The old ErrorStatusBadge showed a tooltip with the full error message whenever status === 'failed' && context?.error. The new ErrorCodeBadge only renders when getErrorCode(context?.error) returns a string. This means:
- Historical failed runs pre-dating this PR have no
.codeon theirStructuredError-- tooltip silently disappears - Step failures never populate
errorCode(onlyrun_failedevents do) -- no tooltip for failed steps - The error message itself is no longer shown anywhere in the tooltip -- only the code string like
USER_ERROR
Suggestion: Keep ErrorCodeBadge for runs with a code, but restore a fallback that shows the error message when no code is present. Consider a two-tier tooltip: error code (if present) + error message.
3. errorCode accepts arbitrary strings in wire schemas
Both WorkflowRunWireBaseSchema and RunFailedEventSchema define errorCode: z.string().optional(). The domain defines only two valid codes (USER_ERROR, RUNTIME_ERROR), but validation doesn't enforce this at the boundary.
Suggestion: Use z.enum(['USER_ERROR', 'RUNTIME_ERROR']).optional() on the write path (event creation). On the read path (wire schema), z.string().optional() is acceptable for forward-compatibility with newer server versions.
4. Misleading retry denominator in queue log message
attempt ${attempt + 1}/${attempt + 1 + defaultRetriesLeft} works by arithmetic coincidence because defaultRetriesLeft is decremented before this line. If retry logic changes (e.g., defaultRetriesLeft is incremented on timeout at line 165), the denominator inflates.
Suggestion: Capture total before the loop:
const maxAttempts = defaultRetriesLeft + 1;
// then in the loop:
`attempt ${attempt + 1}/${maxAttempts}`5. Stale TODO in StructuredError.code definition
Comment says // TODO: currently unused. make this an enum maybe but this PR actively populates it.
Suggestion: Update to reflect current state: // Populated with RunErrorCode values (USER_ERROR, RUNTIME_ERROR) for run_failed events
6. Missing OTEL span attribute for errorCode
The errorCode is logged to runtimeLogger but NOT set as an OTEL span attribute. Span attributes include WorkflowRunStatus, WorkflowErrorName, WorkflowErrorMessage, and ErrorType but omit error classification. Datadog traces cannot filter by USER_ERROR vs RUNTIME_ERROR.
Suggestion: Add Attribute.WorkflowErrorCode(errorCode) to both span attribute blocks (lines 406-409 and 417-421).
pranaygp
left a comment
There was a problem hiding this comment.
Addressed all actionable items:
-
Auth header leak — Fixed.
Authorizationheader is now filtered out of the curl debug output. -
UI tooltip regression — Intentional. The old error message tooltip won't work with encryption (error is behind refs), so we intentionally show only the error code. No tooltip for runs without a code is the expected behavior for backward compat.
-
z.string()vsz.enum— Keepingz.string(). The domain types enforce valid codes at the TypeScript level. Usingz.enumon the wire would break forward-compat when new error codes are added. -
Misleading retry denominator — Fixed. Captured
maxAttemptsbefore the loop. -
Stale TODO — Fixed. Updated comment to reflect current state.
-
Missing OTEL span attribute — Fixed. Added
Attribute.WorkflowErrorCode(errorCode)to both span attribute blocks.
There was a problem hiding this comment.
LGTM, left a few pieces of human feedback.
Also Claude says Steps can fail but have no classification. A step failure that bubbles up to "run_failed" gets a code only at the run level. Document this as an intentional Phase 1 scope limitation if step-level classification is planned.
| const handleCopy = async (e: React.MouseEvent) => { | ||
| e.stopPropagation(); | ||
| await navigator.clipboard.writeText(errorMessage); | ||
| await navigator.clipboard.writeText(errorCode); |
There was a problem hiding this comment.
The Clipboard API throws on unfocused pages, non-HTTPS contexts, or permission denial. The existing copyable-text.tsx:28-34 correctly wraps this in try/catch - this component does not. Could fix.
There was a problem hiding this comment.
Fixed — wrapped in try/catch.
| @@ -90,12 +90,16 @@ export function serializeError<T extends { error?: StructuredError }>( | |||
| * status), but the transformation preserves all other fields correctly. | |||
| */ | |||
| export function deserializeError<T extends Record<string, any>>(obj: any): T { | |||
There was a problem hiding this comment.
Changes here are missing unit tests but fine IMO
There was a problem hiding this comment.
Ack — the errorCode merging in deserializeError is covered implicitly by the e2e tests that assert error.cause.code === 'USER_ERROR' end-to-end.
| expect(WorkflowRunFailedError.is(error)).toBe(true); | ||
| assert(WorkflowRunFailedError.is(error)); | ||
| expect(error.cause.message).toContain('Nested workflow error'); | ||
| expect(error.cause.code).toBe('USER_ERROR'); |
There was a problem hiding this comment.
We should add one e2e test that actually causes + asserts a RUNTIME_ERROR
There was a problem hiding this comment.
Triggering RUNTIME_ERROR in e2e requires corrupting the event log (e.g., injecting an unexpected event type for a step). This is fragile and environment-dependent. We have unit test coverage for WorkflowRuntimeError classification in classify-error.test.ts and for the error itself in step.test.ts. Could add an e2e in a follow-up with a dedicated fault injection mechanism.
There was a problem hiding this comment.
human: I tried this but not possible without chaos testing or a bigger change out of scope of this PR. tl;dr we can't easily inject failures into runs (we do on steps by injecting 500s into the world) but for workflow/run code - we would either need to expose things into the VM to allow it to inject that (i.e. changing runtime code just for a fault injection test) - or another ideas is we need to use a proxy in front of workflow-server and queue to inject these failures
at that point I'm thinking we just do this in the chaos testing @TooTallNate is setting up and we can have validation that it's working once that's up
| * These are populated in the `errorCode` field of `run_failed` events | ||
| * and flow through to `StructuredError.code` on the run entity. | ||
| */ | ||
| export const RUN_ERROR_CODES = { |
There was a problem hiding this comment.
If these are also user-facing, should add to docs
There was a problem hiding this comment.
Good call — will add docs in a follow-up. These are user-facing via WorkflowRunFailedError.cause.code.
There was a problem hiding this comment.
Added docs in #1445 — documents error codes in the errors & retries guide.
- Add RUN_ERROR_CODES (USER_ERROR, RUNTIME_ERROR) to @workflow/errors - Populate errorCode in run_failed events via classifyRunError() - Update web UI StatusBadge to show amber dot for infrastructure errors - Improve world-local queue error logging (concise, no body dump) - Improve schema validation error messages (concise, verbose behind DEBUG) - Add e2e tests for error code flow and infrastructure error retry Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
human: Merging this so I can unblock the next PR. good call on the docs though - I've started another PR for that rather than this one so I don't have to wait for CI again for a docs change |
Summary
USER_ERROR,RUNTIME_ERROR) torun_failedevents, populating the existing but previously unusederrorCodefieldDetails
Error codes
After #1339's structural separation, the
run_failedtry/catch only catches:USER_ERROR(throws from workflow functions, propagated step failures)RUNTIME_ERROR(corrupted event log, missing timestamps — internal bugs)Infrastructure errors (ECONNRESET, 5xx, schema validation) never produce
run_failedat all — they propagate to the queue for retry.The error code flows through the existing (previously unused) plumbing:
eventData.errorCode→StructuredError.code→WorkflowRunFailedError.cause.codeNote on storage:
errorCodeis stored inline as a plain DynamoDB attribute on the run entity — it does NOT go through refs/encryption. Only theerrorobject (message + stack) goes throughrefTracker→errorRef. TheerrorCodeis a sibling field ineventData, extracted and stored separately by the server (events.ts:846).Web UI
RUNTIME_ERROR: amber dot + "Internal Error" tooltip headerUSER_ERROR/ absent (backward compat): red dot + "Error Details" tooltip headerLogging improvements
See examples below.
Error log examples (captured from e2e test runs)
Runtime error logs (before → after)
Before:
After (now includes
errorCode):Queue error logs (before → after)
Before (dumped full request body with traceCarrier, runId, stepId, etc.):
After (concise, actionable):
Schema validation error messages (before → after)
Before (full Zod error dump + CBOR debug context always included):
After (concise issue list, verbose context only when
DEBUGenv var is set):Debug curl reproduction (before → after)
Before: only shown when
DEBUG=1(exact string match)After: shown when
DEBUGis set to any truthy value (consistent withdebugpackage)Test plan
classifyRunError(7 tests, all pass)errorWorkflowNested— assertserror.cause.code === 'USER_ERROR'andrunData.error.code === 'USER_ERROR'errorRetryFatal— assertserror.cause.code === 'USER_ERROR'infraErrorRetryWorkflow— validates infra errors onrun_completedretry via queue (notrun_failed)🤖 Generated with Claude Code