
[nexus] make almost all saga action failures accept an omicron error #9998

Merged
sunshowers merged 2 commits into main from sunshowers/spr/rfc-make-almost-all-uses-of-saga-action-failed-accept-an-omicron-error
Mar 9, 2026

@sunshowers (Contributor) commented Mar 8, 2026

While working on #9997 I saw this error in the logs:

saga ACTION error at node "delete_local_storage": deserialize failed: unknown variant failed to delete local storage: failed at attempt 4: retries exhausted: Error Response: status: 503 Service Unavailable; headers: {\"content-type\": \"application/json\", \"x-request-id\": \"4a21fdec-b7b2-4f37-a99d-c218efa1c701\", \"content-length\": \"94\", \"date\": \"Sat, 07 Mar 2026 05:34:47 GMT\"}; value: Error { error_code: None, message: \"Service Unavailable\", request_id: \"4a21fdec-b7b2-4f37-a99d-c218efa1c701\" }, expected one of ObjectNotFound, ObjectAlreadyExists, InvalidRequest, Unauthenticated, InvalidValue, Forbidden, InternalError, ServiceUnavailable, InsufficientCapacity, TypeVersionMismatch, Conflict, NotFound, Gone

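The root cause was that we were passing a string as the error rather than a structured omicron-common error. This compiled because steno's ActionError::action_failed accepts anything that implements Debug + DeserializeOwned, so nothing rejected the string up front. The mismatch only surfaced later, when the stored error was deserialized back into an omicron-common error and failed with "unknown variant".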
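To make the failure mode concrete, here is a std-only Rust model of it. Everything in it is hypothetical: a trimmed `Error` enum stands in for omicron-common's, a small `Encode` trait stands in for serde serialization, and `ActionError` only loosely mirrors steno's type. In this model a plain string and a unit enum variant both encode to a bare string (mirroring how serde_json writes them), so the type mismatch stays invisible until the payload is decoded back into `Error`.

```rust
use std::fmt::Debug;

/// Hypothetical, trimmed stand-in for omicron-common's external `Error`.
/// The real enum has more variants, most of which carry fields.
#[derive(Debug, PartialEq)]
enum Error {
    ObjectNotFound,
    InternalError,
    ServiceUnavailable,
}

const VARIANTS: &[&str] = &["ObjectNotFound", "InternalError", "ServiceUnavailable"];

/// Hypothetical stand-in for serde's `Serialize`: turns a value into the
/// type-erased string form that gets stored.
trait Encode {
    fn encode(&self) -> String;
}

// A plain string encodes as its own contents.
impl Encode for String {
    fn encode(&self) -> String {
        self.clone()
    }
}

// A unit enum variant encodes as its name.
impl Encode for Error {
    fn encode(&self) -> String {
        format!("{:?}", self)
    }
}

/// Hypothetical stand-in for steno's `ActionError`: only the encoded
/// payload is kept, so the original type is forgotten.
#[derive(Debug)]
struct ActionError {
    encoded: String,
}

impl ActionError {
    /// Loosely bounded constructor: any `Debug + Encode` value is accepted.
    fn action_failed<E: Debug + Encode>(source: E) -> ActionError {
        ActionError { encoded: source.encode() }
    }

    /// Recovers the structured error, failing if the payload was not one.
    fn convert(&self) -> Result<Error, String> {
        match self.encoded.as_str() {
            "ObjectNotFound" => Ok(Error::ObjectNotFound),
            "InternalError" => Ok(Error::InternalError),
            "ServiceUnavailable" => Ok(Error::ServiceUnavailable),
            other => Err(format!(
                "deserialize failed: unknown variant {}, expected one of {}",
                other,
                VARIANTS.join(", ")
            )),
        }
    }
}
```

In this model, `ActionError::action_failed(String::from("failed to delete local storage: ..."))` type-checks, and `convert()` then returns an `Err` with the same "deserialize failed: unknown variant ..., expected one of ..." shape as the log line, while `action_failed(Error::ServiceUnavailable)` round-trips cleanly.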

Fix this by:

  • adding nexus_types::saga::saga_action_failed
  • banning most uses of ActionError::action_failed through clippy disallowed-methods
  • updating all the call sites
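A minimal sketch of the shape of this fix, again with hypothetical trimmed `Error` and `ActionError` types and with `Debug` output standing in for serialization. The wrapper is named after nexus_types::saga::saga_action_failed, but its exact signature here is an assumption, not taken from the PR. Because its parameter is the structured `Error` rather than a generic type, passing a `String` no longer compiles, and the one remaining call to the generic constructor is marked with `#[expect(clippy::disallowed_methods)]`.

```rust
use std::fmt::Debug;

// Assumed clippy.toml entry banning the generic constructor. The
// `disallowed-methods` key and `{ path, reason }` form are real clippy
// configuration; the exact path and reason text are illustrative:
//
// disallowed-methods = [
//     { path = "steno::ActionError::action_failed", reason = "use saga_action_failed" },
// ]

/// Hypothetical, trimmed stand-in for omicron-common's external `Error`.
#[derive(Debug, PartialEq)]
enum Error {
    InternalError { internal_message: String },
    ServiceUnavailable { internal_message: String },
}

/// Hypothetical stand-in for steno's `ActionError`; the payload's `Debug`
/// output stands in for its serialized form.
#[derive(Debug)]
struct ActionError {
    payload: String,
}

impl ActionError {
    /// Loosely bounded constructor: accepts any `Debug` value, including a
    /// bare `String`. This is the method the lint configuration would ban.
    fn action_failed<E: Debug>(source: E) -> ActionError {
        ActionError { payload: format!("{:?}", source) }
    }
}

/// Hypothetical analogue of `nexus_types::saga::saga_action_failed`. Taking
/// `Error` instead of a generic `E` means `saga_action_failed(String::from("..."))`
/// is rejected at compile time. The `expect` marks this as the sanctioned
/// call to the banned constructor (assuming the clippy.toml entry above).
#[expect(clippy::disallowed_methods)]
fn saga_action_failed(error: Error) -> ActionError {
    ActionError::action_failed(error)
}
```

The design choice is to enforce the invariant in two layers: the type system keeps new call sites honest, and the lint catches anyone reaching past the wrapper to the generic constructor.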

Two calls to ActionError::action_failed remain, both marked with #[expect(clippy::disallowed_methods)]:

  1. The invocation inside saga_action_failed.
  2. In nexus/src/app/sagas/instance_update/mod.rs, for the instance updater lock error, which is handled specially.

Created using spr 1.3.6-beta.1
@sunshowers sunshowers requested review from davepacheco and jmpesp March 8, 2026 01:32
@sunshowers sunshowers changed the title [RFC] make almost all uses of saga action failed accept an omicron error [RFC] make almost all saga action failures accept an omicron error Mar 8, 2026
@jmpesp (Contributor) left a comment

This is awesome

@sunshowers sunshowers changed the title [RFC] make almost all saga action failures accept an omicron error [nexus] make almost all saga action failures accept an omicron error Mar 9, 2026
@sunshowers sunshowers merged commit 672943c into main Mar 9, 2026
16 checks passed
@sunshowers sunshowers deleted the sunshowers/spr/rfc-make-almost-all-uses-of-saga-action-failed-accept-an-omicron-error branch March 9, 2026 19:17
2 participants