Skip to content

fix(uffd): stop assuming REMOVE events are page-size aligned#3002

Draft
ValentaTomas wants to merge 2 commits into
mainfrom
fix/uffd-remove-range-alignment
Draft

fix(uffd): stop assuming REMOVE events are page-size aligned#3002
ValentaTomas wants to merge 2 commits into
mainfrom
fix/uffd-remove-range-alignment

Conversation

@ValentaTomas

Copy link
Copy Markdown
Member

Investigation context (EN-978)

The serve loop's REMOVE batch assumed rm.start/rm.end are aligned to the tracker page size ("UFFD invariant" comment). Empirically false: the kernel only guarantees madvise (4 KiB) granularity, and on hugetlbfs a misaligned MADV_REMOVE punches only whole hugepages inward while zeroing the partial edges in place — yet the UFFD event reports the full requested range. The old arithmetic (BlockIdx rounds start down, end derived from len/pageSize) would then mark a fully live hugepage Zero; the pause turns tracker-Zero into Empty -> uuid.Nil, i.e. zeros over live guest memory on resume. The unmarked kernel-zeroed tail is the inverse hazard (stale parent bytes on resume).

Today this is only fenced off by FC's discard_range alignment guard (and balloon hint/report chunks are buddy-aligned), so this is hardening against a kernel-verified hazard, not a confirmed production hit — see the included TestHugetlbMemfd_MisalignedRemove_MarksLivePagesZero, which reproduces the kernel behavior and the old arithmetic's failure on a real hugetlb memfd + userfaultfd.

Fix

  • applyRemoveRange: only blocks fully covered by the event are marked Zero; partially covered Dirty blocks keep their state and are recorded in a removeTainted bitmap (with a warn log — this should never fire with current FC).
  • Uffd.DiffMetadata folds the taint into the dirty set so the export re-reads those blocks' true bytes from the memfd instead of trusting tracker state.
  • remove_shared_memfd_test.go additionally documents the MAP_SHARED memfd REMOVE/DONTNEED semantics the tracker relies on (content destruction, no-event re-access, stale-hint write race), as kernel-verified regression tests; they self-skip without unprivileged userfaultfd / hugepages.

Unit tests cover aligned, head/tail/both-edge misaligned, sub-block, and not-present cases. Full uffd suite passes with a real userfaultfd.

The kernel only guarantees madvise (4 KiB) granularity for
UFFD_EVENT_REMOVE ranges; on hugetlbfs the hole punch discards whole
hugepages and zeroes partial edges in place while the event reports the
full requested range. Rounding the event start down marked live
hugepages Zero, which the pause maps to uuid.Nil - zeroing live guest
memory on resume. Mark only fully covered blocks Zero and taint
partially covered Dirty blocks so the pause re-reads their true content
from the memfd. Adds kernel-semantics regression tests for the
DONTNEED/REMOVE behavior on shared memfds.
@cla-bot cla-bot Bot added the cla-signed label Jun 13, 2026
@cursor

cursor Bot commented Jun 13, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes sandbox memory diff correctness at pause/resume; wrong taint/zero logic could corrupt guest RAM, though current FC alignment guards limit exposure and the change is heavily regression-tested.

Overview
Hardens pause/snapshot bookkeeping when UFFD_EVENT_REMOVE ranges are only madvise-aligned (not tracker-page-aligned): misaligned handling no longer rounds the event down and marks whole live hugepages Zero, which could resume as uuid.Nil over guest memory. REMOVE processing now marks only fully covered blocks Zero, records partially covered Dirty blocks in a removeTainted set, and DiffMetadata folds that set into the dirty bitmap so those pages are re-read from the memfd instead of trusting tracker or WP-async pagemap state.

Reviewed by Cursor Bugbot for commit 3f2ea82. Bugbot is set up for automated code reviews on this repo. Configure here.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces applyRemoveRange to handle unaligned UFFD_EVENT_REMOVE ranges by marking only fully covered blocks as Zero and tracking partially covered Dirty blocks as tainted to be re-read from the memfd during a pause. It also adds comprehensive tests demonstrating these semantics on shared memfds. No review comments were provided, and there is no additional feedback.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@codecov

codecov Bot commented Jun 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 50.00000% with 23 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...trator/pkg/sandbox/uffd/userfaultfd/userfaultfd.go 9.09% 19 Missing and 1 partial ⚠️
packages/orchestrator/pkg/sandbox/uffd/uffd.go 0.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant