uffd: unregister empty and removed huge pages from UFFD missing pages tracking #2974
Draft
bchalios wants to merge 4 commits into
Add bindings for the UFFDIO_REGISTER and UFFDIO_UNREGISTER ioctls. We will use these to unregister zero huge pages and re-register them only for WP bit tracking. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Currently, when we receive a UFFD_EVENT_REMOVE we mark the page as removed so that subsequent page faults know they should provide the zero page. In the case of huge pages we can avoid handling subsequent page faults altogether by unregistering the removed region from UFFD missing events. However, we want asynchronous WP tracking to keep working, because we rely on it to get a correct diff when we later snapshot the sandbox. We can achieve that by unregistering the region, registering it again with UFFDIO_REGISTER_MODE_WP (no _MODE_MISSING) and write-protecting it again. Re-registering with UFFDIO_REGISTER_MODE_WP is enough to ensure that we will get UFFD_EVENT_REMOVE in the future (so we will mark the page for WP tracking again) and, hence, an accurate view of the dirty memory.

We can't do the same for 4K pages. Freeing anonymous 4K pages zaps the page table entry and we lose the ability to do WP tracking. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
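The three-step sequence described in this commit message can be sketched as follows. This is a minimal illustration, not the PR's code: the `uffdOps` interface, the `dropMissingTracking` function and the `recorder` fake are hypothetical names, and a 2 MiB huge page size is assumed. A real implementation would issue the UFFDIO_UNREGISTER, UFFDIO_REGISTER and UFFDIO_WRITEPROTECT ioctls behind that interface.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	modeWP       uint64 = 1 << 1  // UFFDIO_REGISTER_MODE_WP
	hugePageSize uint64 = 2 << 20 // assumption: 2 MiB huge pages
)

// uffdOps is a hypothetical abstraction over the three ioctls involved.
type uffdOps interface {
	Unregister(start, length uint64) error
	Register(start, length, mode uint64) error
	WriteProtect(start, length uint64) error
}

// dropMissingTracking stops missing-page events for a huge-page range
// while keeping write-protect tracking active on it.
func dropMissingTracking(u uffdOps, start, length uint64) error {
	if length == 0 || start%hugePageSize != 0 || length%hugePageSize != 0 {
		return errors.New("range is not huge-page aligned")
	}
	// 1. Drop the existing registration (missing + WP).
	if err := u.Unregister(start, length); err != nil {
		return fmt.Errorf("unregister: %w", err)
	}
	// 2. Register again for WP only, without MODE_MISSING.
	if err := u.Register(start, length, modeWP); err != nil {
		return fmt.Errorf("register wp-only: %w", err)
	}
	// 3. Write-protect the range again so dirty tracking resumes.
	if err := u.WriteProtect(start, length); err != nil {
		return fmt.Errorf("write-protect: %w", err)
	}
	return nil
}

// recorder is a fake uffdOps that only logs the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) Unregister(start, length uint64) error {
	r.calls = append(r.calls, "unregister")
	return nil
}

func (r *recorder) Register(start, length, mode uint64) error {
	r.calls = append(r.calls, fmt.Sprintf("register mode=%#x", mode))
	return nil
}

func (r *recorder) WriteProtect(start, length uint64) error {
	r.calls = append(r.calls, "write-protect")
	return nil
}

func main() {
	r := &recorder{}
	if err := dropMissingTracking(r, 0, hugePageSize); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.calls)
}
```

Keeping the ioctls behind an interface makes the ordering testable without a real userfaultfd descriptor, which usually needs privileges that a unit test does not have.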
When using huge pages to back guest memory we can register memory regions just for WP tracking. This avoids the round trip to user space (the UFFD handler in the orchestrator) for page faults that we know in advance will be served simply by providing zeros. Use the dropMissingEventsTracking method introduced in the previous commit to unregister all ranges for which the backing build ID is nil. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
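One way to find the ranges to unregister at startup is to walk the memory mapping and merge adjacent blocks that have no backing build. The sketch below assumes a hypothetical `block` type sorted by offset, with a nil `BuildID` pointer standing in for "backing build ID is nil"; the orchestrator's real mapping types may look different.

```go
package main

import "fmt"

// block is a hypothetical view of one slice of guest memory.
type block struct {
	Offset  uint64
	Size    uint64
	BuildID *string // nil: no backing build, the range reads as zeros
}

// span is a contiguous range to unregister in a single call.
type span struct{ Start, Len uint64 }

// emptySpans returns the ranges without a backing build, merging
// neighbours so each contiguous run needs only one unregister call.
// It assumes blocks are sorted by Offset and do not overlap.
func emptySpans(blocks []block) []span {
	var out []span
	for _, b := range blocks {
		if b.BuildID != nil {
			continue
		}
		if n := len(out); n > 0 && out[n-1].Start+out[n-1].Len == b.Offset {
			out[n-1].Len += b.Size // extend the previous run
			continue
		}
		out = append(out, span{b.Offset, b.Size})
	}
	return out
}

func main() {
	const huge = 2 << 20 // assumption: 2 MiB huge pages
	id := "build-a"
	blocks := []block{
		{0 * huge, huge, nil},
		{1 * huge, huge, nil},
		{2 * huge, huge, &id},
		{3 * huge, huge, nil},
	}
	fmt.Println(emptySpans(blocks)) // [{0 4194304} {6291456 2097152}]
}
```

Merging neighbours first keeps the number of ioctls proportional to the number of empty runs rather than the number of pages.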
Add an OTEL metric for tracking UFFD unregistering operations. The metric is a TimerFactory triplet tracking a latency histogram (ms), the number of bytes unregistered and the number of unregister operations. The metric is tagged with a source attribute that indicates whether the operation was triggered by unregistering empty pages (should happen once during startup) or by a UFFD_EVENT_REMOVE event. Also add a test that ensures the metric works as expected. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
❌ 116 Tests Failed:
What
Unregister huge pages from UFFD in two cases:
- Ranges that are initially empty, i.e. whose backing build ID is nil
- Ranges for which we receive a UFFD_EVENT_REMOVE

Also, add tests that ensure everything works as expected.

Why
UFFD handling for known zero pages (ranges with nil block ID) and for pages that the guest has deallocated essentially boils down to providing a zero page. This adds real overhead to page fault handling because of the round trip guest -> host kernel -> orchestrator -> host kernel -> guest. Unregistering those pages from UFFD trims the round trip down to guest -> host kernel -> guest.

Using a simple dd-like command that touches 1GiB on my laptop shows a throughput of ~1.3GiB/sec on main vs ~2.0GiB/sec with the changes introduced here.

How
We know the initially empty pages, so we unregister them before starting the sandbox. We also unregister all pages for which we receive a UFFD_EVENT_REMOVE. The catch is that we use UFFD asynchronous write protection to track dirty pages. So, after unregistering the pages we re-register them, this time using only UFFDIO_REGISTER_MODE_WP. This way we keep accurate dirty-page info without handling zero pages.

Note: this doesn't work with anonymous 4K pages, as the userfaultfd documentation mentions. So, at the moment, this behaviour is restricted to huge pages only.