feat(engine): SpillError::PriorSpillsLost typed variant + consumer wiring (SPL-35..37)#4749
Merged
Merged
Conversation
9246e05 to
597acf2
Compare
…ring (SPL-35..37)
The reorder spill module previously bubbled a generic io::Error{NotFound}
when the caller-supplied spill directory vanished mid-transfer after prior
records had already committed. The receiver could not distinguish that
unrecoverable data-loss signal from a transient kernel-level ENOENT. This
change adds a typed SpillError::PriorSpillsLost { dir, count } variant,
upgrades both spill writers (per-item + whole-batch) to surface it at the
recovery-refusal sites, and wires the new variant through the spillable
reorder consumer loop so the receiver maps it to exit code 11 with an
actionable diagnostic.
- SPL-35: SpillError::PriorSpillsLost { dir, count } variant + Display.
- SPL-36: Per-item and whole-batch writers upgrade NotFound to the typed
variant when spill_index is non-empty; consumer loop adds the missing
match arm. The pre-existing temp-dir-vanish hardening test now asserts
the typed variant carries the configured dir and count >= 1.
- SPL-37: New regression test forces real spills via the byte threshold,
wipes the directory mid-transfer, and asserts the surfaced error is
exactly SpillError::PriorSpillsLost with the expected dir and count.
597acf2 to
990f265
Compare
4 tasks
oferchen
added a commit
that referenced
this pull request
May 23, 2026
Evaluate five candidate ENOSPC injection mechanisms (tmpfs size-cap, fallocate filler, mock writer, failpoints crate, fuse-mt) against CI compatibility on the Linux musl / macOS / Windows tiles and against the SPL-32 audit's site inventory. Recommend a userspace mock writer for the unit-test layer plus a Linux-gated fallocate filler for one real-kernel integration test, and pin the chosen mechanism's API surface in pseudo-code so SPL-33.b has a concrete target. Cross-references docs/design/spill-fs-error-audit.md (SPL-32) for the syscall site inventory and the SpillError::PriorSpillsLost wiring landed via SPL-35/36/37 (#4749).
oferchen
added a commit
that referenced
this pull request
May 23, 2026
….a) (#4770) Evaluate five candidate ENOSPC injection mechanisms (tmpfs size-cap, fallocate filler, mock writer, failpoints crate, fuse-mt) against CI compatibility on the Linux musl / macOS / Windows tiles and against the SPL-32 audit's site inventory. Recommend a userspace mock writer for the unit-test layer plus a Linux-gated fallocate filler for one real-kernel integration test, and pin the chosen mechanism's API surface in pseudo-code so SPL-33.b has a concrete target. Cross-references docs/design/spill-fs-error-audit.md (SPL-32) for the syscall site inventory and the SpillError::PriorSpillsLost wiring landed via SPL-35/36/37 (#4749).
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements SPL-32's top hardening recommendation: distinguish "prior spills lost" from a generic NotFound at the reorder spill recovery-refusal sites.
SpillError::PriorSpillsLost { dir, count }variant incrates/engine/src/concurrent_delta/spill/error.rs. The variant carries the spill directory that vanished and the count of records known to be unrecoverable.Displayrenders the recommended message"prior spill directory <dir> vanished; <count> chunk(s) cannot be recovered".io_error()andis_out_of_space()correctly classify it as a non-I/O failure.spill_candidates_per_item(per-item path): onNotFoundreturned from the inner refusal branch, calls the newprior_spills_lost_error()helper and surfacesPriorSpillsLostrather thanIo(NotFound).spill_candidates_whole_batch(whole-batch path): same upgrade in its refusal arm beforerestore_taken.run_spillable_loop(consumer/loops.rs) adds the missing match arm so the variant funnels intoDeltaResult::failedwith the actionable diagnostic for the receiver (still mapped to upstream exit code 11).temp_dir_vanish_after_prior_spills_returns_errorassertion was tightened to requireSpillError::PriorSpillsLost { dir, count }withdir == spill_dirandcount >= 1.prior_spills_lost_surfaces_typed_variant_on_dir_wipeexplicitly drives the SPL-37 contract: seeds enough inserts to commit real records to disk, force-wipes the spill directory, then drains the buffer and asserts the surfaced error is exactlyPriorSpillsLostwith the configured dir and a non-zero count.dir_recreate_eventsmust remain0.The variant is added in a way that is non-breaking only at the variant-set level; every existing exhaustive
match SpillErrorin the workspace (consumer loop, hardening tests, top-level Display/source test) was updated to handle the new arm.Test plan
prior_spills_lost_surfaces_typed_variant_on_dir_wipetest passes