Skip to content

[9/n] [sync-switch] bail on an incomplete bootstore config instead of silently dropping items#10650

Open
sunshowers wants to merge 1 commit into
sunshowers/spr/main.9n-sync-switch-bail-on-an-incomplete-bootstore-config-instead-of-silently-dropping-itemsfrom
sunshowers/spr/9n-sync-switch-bail-on-an-incomplete-bootstore-config-instead-of-silently-dropping-items
Open

[9/n] [sync-switch] bail on an incomplete bootstore config instead of silently dropping items#10650
sunshowers wants to merge 1 commit into
sunshowers/spr/main.9n-sync-switch-bail-on-an-incomplete-bootstore-config-instead-of-silently-dropping-itemsfrom
sunshowers/spr/9n-sync-switch-bail-on-an-incomplete-bootstore-config-instead-of-silently-dropping-items

Conversation

@sunshowers

Copy link
Copy Markdown
Contributor

Previously, on encountering a transient DB issue, we would silently drop items from the bootstore config. That seems quite obviously bad and was part of the reason #10640 escalated a crdb bug into persistent bootstore corruption.

Rework all this by:

  • collecting all the problems that occurred
  • returning a new error type if there were any problems

This is a functional change that is worth thinking very hard about. Many but not all of the invariants here are maintained by the database layer. If there's persistent db corruption or a similar issue, we will no longer generate a bootstore config for the rack! I think this is overall the right thing to do, but it is quite possible there are latent issues that were being hidden by the looser error handling that this code previously had.

The report is now logged by the background task, unless the task exits with one of the other early errors. (There's a much larger rework of the task that needs to be done, but I'm not doing that here since it'll collide with a lot of what @jgallagher is currently doing.)

Depends on:

Created using spr 1.3.6-beta.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant