
test(orchestrator): end-to-end dedup pause/upload/resume corruption tests #3003

Draft
ValentaTomas wants to merge 2 commits into main from test/dedup-pause-pipeline-e2e


@ValentaTomas

Member

Investigation context (EN-978)

Part of the resume-corruption investigation. These tests exercise the full production artifact pipeline for memfile dedup and verify byte-exact restores after every generation:

  • pause: real memfd -> NewCacheFromMemfdDeduped (compare + block-granular inputEmpty merge + packed drain, with and without O_DIRECT)
  • upload: zstd frame compression, CloneForUpload + ancestor BuildData propagation (full frame tables, V3 sentinel), V4 and V5 SerializeHeader -> DeserializeBytes (which sparse-trims ancestor frame tables)
  • resume: every page resolved via GetShiftedMapping + trimmed FrameTable lookups, the same path the chunker uses
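The upload and resume steps depend on frame-table lookups still resolving correctly after ancestor tables have been sparse-trimmed. The sketch below is a toy model of that idea only: `Frame`, `FrameTable`, `Lookup`, and `Trim` are hypothetical names, not the repository's types, and it does no zstd work. Its one design assumption is that each entry carries absolute uncompressed and compressed offsets, so dropping unreferenced frames cannot shift the frames that remain.

```go
package main

import (
	"fmt"
	"sort"
)

// Frame is a hypothetical frame-table entry: one compressed frame covering a
// contiguous range of uncompressed bytes.
type Frame struct {
	UStart, USize int // uncompressed range [UStart, UStart+USize)
	CStart, CSize int // location of the compressed frame in the object
}

// FrameTable is sorted by UStart. Offsets are absolute, so removing entries
// never changes how the remaining entries are located.
type FrameTable []Frame

// Lookup binary-searches for the frame that contains uncompressed offset off.
func (t FrameTable) Lookup(off int) (Frame, bool) {
	i := sort.Search(len(t), func(i int) bool { return t[i].UStart+t[i].USize > off })
	if i == len(t) || t[i].UStart > off {
		return Frame{}, false // offset falls in a trimmed or missing frame
	}
	return t[i], true
}

// Trim keeps only frames that overlap at least one referenced range [start, end).
func (t FrameTable) Trim(refs [][2]int) FrameTable {
	var out FrameTable
	for _, f := range t {
		for _, r := range refs {
			if r[0] < f.UStart+f.USize && f.UStart < r[1] {
				out = append(out, f)
				break
			}
		}
	}
	return out
}

func main() {
	const mib = 1 << 20
	// Four 4 MiB frames of one ancestor's data, with made-up compressed sizes.
	full := FrameTable{
		{0, 4 * mib, 0, 1000},
		{4 * mib, 4 * mib, 1000, 300},
		{8 * mib, 4 * mib, 1300, 2000},
		{12 * mib, 4 * mib, 3300, 50},
	}
	// The child header references a single page inside the third frame.
	trimmed := full.Trim([][2]int{{9 * mib, 9*mib + 4096}})
	f, ok := trimmed.Lookup(9 * mib)
	fmt.Println(len(trimmed), ok, f.CStart, f.CSize) // 1 true 1300 2000
	_, ok = trimmed.Lookup(1 * mib)
	fmt.Println(ok) // false
}
```

A lookup that lands in a trimmed-away frame fails loudly instead of reading the wrong frame, which is the property a byte-exact resume check relies on.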

The tests run 8-generation chains with hugepage-granular FC-bitmap semantics (identical rewrites, zero writes, balloon-REMOVE blocks, reverts to ancestor content) across all dedup budget combinations. All pass, which exonerates the dedup transform itself (compare classification, packing, header merge, serialization, directIO drain) for this corruption flavor: if the bug is dedup-gated, it is not in this pipeline.
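The chain structure these tests exercise can be sketched in miniature. This is a hypothetical model, not the repository's code: `Gen`, `pause`, `resume`, `runChain`, and `zeroGen` are illustrative names. Each generation dedups its memory against the parent's memory, merges headers by letting unchanged blocks inherit the parent's owner, and the chain is checked for a byte-exact resume after every generation. It does not model the hugepage FC bitmap, compression, or serialization.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

const (
	blockSize = 4
	numBlocks = 8
	zeroGen   = -1 // hypothetical sentinel: block resolves to zeros
)

// Gen is a hypothetical generation: a merged header plus the blocks it stored.
type Gen struct {
	Owner []int          // per block: index of the generation holding its bytes, or zeroGen
	Data  map[int][]byte // blocks this generation packed itself
}

// pause dedups mem against the parent's memory and merges the parent's header:
// zero blocks map to zeroGen, unchanged blocks inherit the parent's owner,
// and changed blocks are stored in this generation.
func pause(id int, mem, parentMem []byte, parent *Gen) *Gen {
	g := &Gen{Owner: make([]int, numBlocks), Data: map[int][]byte{}}
	zero := make([]byte, blockSize)
	for b := 0; b < numBlocks; b++ {
		blk := mem[b*blockSize : (b+1)*blockSize]
		switch {
		case bytes.Equal(blk, zero):
			g.Owner[b] = zeroGen
		case parent != nil && bytes.Equal(blk, parentMem[b*blockSize:(b+1)*blockSize]):
			g.Owner[b] = parent.Owner[b]
		default:
			g.Owner[b] = id
			g.Data[b] = append([]byte(nil), blk...)
		}
	}
	return g
}

// resume resolves every block through the merged header, as a page read would.
func resume(g *Gen, chain []*Gen) []byte {
	out := make([]byte, numBlocks*blockSize)
	for b, owner := range g.Owner {
		if owner != zeroGen {
			copy(out[b*blockSize:], chain[owner].Data[b])
		}
	}
	return out
}

// runChain mutates memory over gens generations and reports whether every
// resume was byte-exact.
func runChain(seed int64, gens int) bool {
	rng := rand.New(rand.NewSource(seed))
	mem := make([]byte, numBlocks*blockSize)
	var chain []*Gen
	var history [][]byte // snapshot of memory after each generation
	for id := 0; id < gens; id++ {
		for k := 0; k < 3; k++ {
			b := rng.Intn(numBlocks)
			blk := mem[b*blockSize : (b+1)*blockSize]
			switch rng.Intn(4) {
			case 0: // fresh content
				rng.Read(blk)
			case 1: // zero write, e.g. a balloon-freed block
				for i := range blk {
					blk[i] = 0
				}
			case 2: // identical rewrite: would be marked dirty, bytes unchanged
			case 3: // revert to some ancestor's content
				if len(history) > 0 {
					old := history[rng.Intn(len(history))]
					copy(blk, old[b*blockSize:(b+1)*blockSize])
				}
			}
		}
		var parent *Gen
		var parentMem []byte
		if id > 0 {
			parent, parentMem = chain[id-1], history[id-1]
		}
		g := pause(id, mem, parentMem, parent)
		chain = append(chain, g)
		if !bytes.Equal(resume(g, chain), mem) {
			return false
		}
		history = append(history, append([]byte(nil), mem...))
	}
	return true
}

func main() {
	fmt.Println(runChain(1, 8)) // true
}
```

In this toy the compare reads `parentMem` from the true history, so it is always correct. The divergent-parent case described in the next paragraph arises precisely when that input is wrong.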

dedup_divergent_parent_test.go documents the one dedup-only amplification mechanism found: the compare trusts the pause node's local chunker view of the parent, so any divergence between the local and the authoritative parent becomes a durably wrong parent mapping in the snapshot. The non-dedup path stores guest bytes verbatim and is immune. This is checkable on a victim chain via the header Checksum fields.
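The mechanism can be reduced to a toy example. `pauseDedup` and `restore` below are hypothetical helpers, not the repository's functions. The example also assumes, purely for illustration, that a checksum over the guest bytes is recorded at pause time; what the real header Checksum fields cover may differ.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

const blockSize = 4

// pauseDedup maps a block to the parent (nil entry) whenever it equals the
// pause node's local view of the parent; otherwise it stores the guest bytes.
func pauseDedup(guest, localParent []byte) [][]byte {
	var out [][]byte
	for off := 0; off < len(guest); off += blockSize {
		blk := guest[off : off+blockSize]
		if bytes.Equal(blk, localParent[off:off+blockSize]) {
			out = append(out, nil) // trusted the local view: parent mapping
		} else {
			out = append(out, append([]byte(nil), blk...))
		}
	}
	return out
}

// restore resolves parent mappings against authoritative parent storage.
func restore(blocks [][]byte, authParent []byte) []byte {
	var mem []byte
	for i, b := range blocks {
		if b == nil {
			b = authParent[i*blockSize : (i+1)*blockSize]
		}
		mem = append(mem, b...)
	}
	return mem
}

func main() {
	authParent := []byte("AAAABBBB")
	localParent := []byte("AAAAZZZZ") // diverged copy on the pause node
	guest := []byte("AAAAZZZZ")       // the guest really holds ZZZZ in block 1

	deduped := restore(pauseDedup(guest, localParent), authParent)
	fmt.Println(string(deduped)) // AAAABBBB: block 1 silently reverted

	// Non-dedup path: guest bytes are stored verbatim, nothing to resolve.
	verbatim := append([]byte(nil), guest...)
	fmt.Println(bytes.Equal(verbatim, guest)) // true

	// A checksum of the guest bytes taken at pause time exposes the mismatch.
	recorded := sha256.Sum256(guest)
	fmt.Println(recorded == sha256.Sum256(deduped)) // false
}
```

The wrong mapping is written once at pause time and then reproduced on every later resume, which is why the divergence becomes durable rather than transient.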

Companion fixes: #3001 (hinting drain), #3002 (REMOVE alignment). Findings are not yet confirmed as the EN-978 root cause; these tests pin down what it is not and where to look next.


Byte-exact verification of the full memfile-dedup artifact pipeline
(memfd dedup + inputEmpty merge, header merge, zstd frame compression,
V4/V5 serialization with frame-table trimming, resume resolution
through trimmed tables) across multi-generation chains, with and
without O_DIRECT drain. Also documents the one dedup-only corruption
amplification found: a divergent local parent cache is baked into the
snapshot as a wrong parent mapping, which the non-dedup path is immune
to.
cla-bot added the cla-signed label Jun 13, 2026

cursor Bot commented Jun 13, 2026


PR Summary

Low Risk
Test-only additions with no production behavior changes; risk is limited to CI time and Linux build tag requirements.

Overview
Adds Linux-only tests for the EN-978 resume-corruption investigation. The first runs multi-generation pause → upload → resume chains through the real memfile dedup path (memfd compare/drain, zstd upload, V4/V5 header round-trip, page reads that mirror production) and asserts byte-exact memory after each cycle under varied dedup budgets, with and without O_DIRECT. The second documents how dedupCompare can bake a wrong parent mapping into a snapshot when the pause node's local parent view disagrees with authoritative storage, a failure the non-dedup path avoids. No runtime code is changed.

Reviewed by Cursor Bugbot for commit 131e083. Bugbot is set up for automated code reviews on this repo.

gemini-code-assist Bot left a comment

Contributor


Code Review

No critical findings or feedback to provide.



codecov Bot commented Jun 13, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

