docs(engine): design temp-vanish fault-injection mechanism (SPL-34) #4784

Merged
oferchen merged 1 commit into master from docs/spl-34-temp-vanish-injection
May 23, 2026
@oferchen
Owner

Summary

  • Sibling of SPL-33.a (ENOSPC injection design). Specifies how SPL-34.b will inject temp-vanish faults into the reorder/spill module.
  • Defines five concrete failure modes: (a) tempfile unlinked while open, (b) tempfile replaced behind an open fd, (c) parent dir vanishes mid-write, (d) parent dir vanishes between writes, (e) filesystem unmounted.
  • Evaluates the same toolset as SPL-33.a (mock filesystem, real remove_file/remove_dir_all, bind-mount+umount, failpoints, FUSE) against each mode and recommends a layered hybrid:
    • Layer 1 (unit tests): mock filesystem chassis (re-used from SPL-33.b's FaultingFile) covering modes (a) and (b).
    • Layer 2 (integration tests): real remove_file / remove_dir_all covering modes (a), (c), (d) - same pattern as SPL-37's dir-wipe regression test.
    • Layer 3 (skip): mode (e) documented as a known CI coverage gap; it is reachable only via a privileged umount, which hosted runners do not grant.
  • Maps the 23 syscall sites enumerated in docs/design/spill-fs-error-audit.md (SPL-32) to SPL-33 vs SPL-34 coverage; eight new tests land.
  • Recommends introducing SpillError::TempVanished { path } as a typed variant symmetric to SpillError::PriorSpillsLost (shipped by SPL-35/36/37 in PR #4749, "feat(engine): SpillError::PriorSpillsLost typed variant + consumer wiring (SPL-35..37)").
  • Recommends against multi-directory spill fallback inside the buffer; fail-fast preserves the documented "best-effort scratch space" contract.
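
For illustration only, the proposed typed variant could look roughly like the sketch below. It is not the design document's definition: the `Io` variant, the `classify` helper, the `exit_code` function, and the choice of `NotFound` as the vanish signal are all assumptions made for the example. The exit code 11 for FileIo is taken from the test plan in this PR.

```rust
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, trimmed-down error type. The real SpillError has other
/// variants (for example PriorSpillsLost) that are omitted here.
#[derive(Debug)]
pub enum SpillError {
    /// The spill tempfile, or its parent directory, disappeared.
    TempVanished { path: PathBuf },
    /// Any other I/O failure (assumed catch-all for this sketch).
    Io(io::Error),
}

impl fmt::Display for SpillError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SpillError::TempVanished { path } => {
                write!(f, "spill tempfile vanished: {}", path.display())
            }
            SpillError::Io(e) => write!(f, "spill I/O error: {}", e),
        }
    }
}

impl std::error::Error for SpillError {}

/// Classify an I/O error observed at a spill syscall site.
/// Assumption: NotFound is treated as the "temp vanished" signal.
pub fn classify(path: &Path, err: io::Error) -> SpillError {
    if err.kind() == io::ErrorKind::NotFound {
        SpillError::TempVanished { path: path.to_path_buf() }
    } else {
        SpillError::Io(err)
    }
}

/// Exit code for FileIo, as quoted in the test plan ("exit 11 FileIo").
pub const EXIT_FILE_IO: i32 = 11;

/// Both variants map to the same exit code, so adding TempVanished does
/// not change what the receiver reports.
pub fn exit_code(err: &SpillError) -> i32 {
    match err {
        SpillError::TempVanished { .. } | SpillError::Io(_) => EXIT_FILE_IO,
    }
}
```

Keeping the mapping in one exhaustive `match` means a new variant cannot be added without the compiler forcing a decision about its exit code.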

Design-only document. No production source changes.

Test plan

  • Reviewer confirms the per-audit-site test matrix maps every audited site to either SPL-33, SPL-34, or "no new test required" with explicit rationale.
  • Reviewer confirms the recommended FaultPlan extension is compatible with SPL-33.a's chassis (one shared type carrying both ENOSPC and temp-vanish parameters).
  • Reviewer confirms the SpillError::TempVanished proposal does not break the receiver's exit-code mapping (must still surface as exit 11 FileIo).
  • CI green on docs-only diff.
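
As a rough sketch of what "one shared type carrying both ENOSPC and temp-vanish parameters" might mean, the example below wraps any writer and injects a fault according to a plan. The field names, the `FaultingWriter` wrapper, and the error kinds are hypothetical; they are not the API of SPL-33.b's FaultingFile, which this PR does not show. The real errno surfaced by a vanish also depends on the mode and platform, so `NotFound` is only a stand-in here.

```rust
use std::io::{self, Write};

/// Hypothetical shared plan covering both fault families.
#[derive(Debug, Default, Clone)]
pub struct FaultPlan {
    /// Fail with a "no space" error once a write would exceed this many bytes.
    pub enospc_after_bytes: Option<usize>,
    /// Fail with NotFound from this write call onward (0-based index).
    pub vanish_at_write: Option<usize>,
}

/// Hypothetical mock writer that consults the plan before each write.
pub struct FaultingWriter<W: Write> {
    inner: W,
    plan: FaultPlan,
    writes_seen: usize,
    bytes_written: usize,
}

impl<W: Write> FaultingWriter<W> {
    pub fn new(inner: W, plan: FaultPlan) -> Self {
        FaultingWriter { inner, plan, writes_seen: 0, bytes_written: 0 }
    }

    /// Hand back the wrapped writer so a test can inspect what got through.
    pub fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for FaultingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let call = self.writes_seen;
        self.writes_seen += 1;

        // Temp-vanish injection: every call at or after the index fails.
        if let Some(at) = self.plan.vanish_at_write {
            if call >= at {
                return Err(io::Error::new(
                    io::ErrorKind::NotFound,
                    "injected: temp vanished",
                ));
            }
        }

        // ENOSPC injection: refuse writes that would cross the byte limit.
        if let Some(limit) = self.plan.enospc_after_bytes {
            if self.bytes_written + buf.len() > limit {
                return Err(io::Error::new(
                    io::ErrorKind::Other,
                    "injected: no space left on device",
                ));
            }
        }

        let n = self.inner.write(buf)?;
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

A unit test would build a plan such as `FaultPlan { vanish_at_write: Some(1), ..Default::default() }`, wrap an in-memory `Vec<u8>`, and check that the first write lands while the second returns the injected error.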

Sibling of SPL-33.a (ENOSPC). Covers five vanish modes (unlink-while-
open, replace-inode, parent-rmdir-mid-write, parent-rmdir-between-
writes, fs-unmount) and recommends a layered hybrid: mock filesystem
for unit tests (modes a, b), real remove_file/remove_dir_all for
integration tests (modes a, c, d), unmount documented as gap (mode e).
Maps eight new tests across the 23 audited sites.
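
The integration-test layer can be pictured with a std-only sketch like the one below, which reproduces modes (a) and (d) against the real filesystem. The helper names, the scratch-directory naming scheme, and the file names are invented for the example, and it exercises plain `std::fs` calls rather than the actual spill module.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::PathBuf;
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Create a unique scratch directory under the system temp dir.
/// (Hypothetical helper; a real test suite may use a tempdir crate.)
fn scratch_dir(tag: &str) -> io::Result<PathBuf> {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let name = format!("spl34-{}-{}-{}", tag, process::id(), nanos);
    let dir = std::env::temp_dir().join(name);
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// Mode (a): the tempfile is unlinked while a handle is still open.
/// Returns the error kind seen when the path is reopened afterwards.
pub fn unlink_while_open() -> io::Result<io::ErrorKind> {
    let dir = scratch_dir("mode-a")?;
    let path = dir.join("spill-0.tmp");
    let mut file = File::create(&path)?;
    file.write_all(b"chunk 0")?;

    // Inject the fault: remove the path behind the open handle.
    fs::remove_file(&path)?;

    let kind = match File::open(&path) {
        Ok(_) => io::ErrorKind::Other, // reopen unexpectedly succeeded
        Err(e) => e.kind(),
    };
    drop(file);
    fs::remove_dir_all(&dir)?;
    Ok(kind)
}

/// Mode (d): the parent directory vanishes between two writes.
/// Returns the error kind seen when the next spill file is created.
pub fn parent_vanishes_between_writes() -> io::Result<io::ErrorKind> {
    let dir = scratch_dir("mode-d")?;
    File::create(dir.join("spill-0.tmp"))?.write_all(b"chunk 0")?;

    // Inject the fault: wipe the whole spill directory.
    fs::remove_dir_all(&dir)?;

    match File::create(dir.join("spill-1.tmp")) {
        Ok(_) => Ok(io::ErrorKind::Other), // create unexpectedly succeeded
        Err(e) => Ok(e.kind()),
    }
}
```

On typical Unix-like systems both functions are expected to report `NotFound`; behaviour for removing an open file can differ on other platforms, which is one reason to keep such tests behind a platform gate.
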
@github-actions (bot) added the documentation label on May 23, 2026
@oferchen merged commit ea92767 into master on May 23, 2026
10 checks passed
@oferchen deleted the docs/spl-34-temp-vanish-injection branch on May 28, 2026 03:37