Filesystem only snapshots#2995
Conversation
PR SummaryMedium Risk Overview Reviewed by Cursor Bugbot for commit b2273c7. Bugbot is set up for automated code reviews on this repo. Configure here. |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Code Review
No critical findings or correctness issues were identified in the code changes, and no review comments were provided. I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
f669bc7 to
c609e2c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c609e2c116
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| RootfsDroppedBuilds: uint32(rootDropped), | ||
| } | ||
|
|
||
| if memfileHeader != nil && memfileHeader.Metadata != nil { |
There was a problem hiding this comment.
Preserve rootfs metadata after fs-only reload
This nil-memfile path only works while Pause still has the rootfs header in hand. Once a filesystem-only snapshot is uploaded and later reloaded, there is no memfile header/object, but storageTemplate.SchedulingMetadata still calls t.memfile.WaitWithContext and returns nil on that error before reading rootfs (packages/orchestrator/pkg/sandbox/template/storage_template.go:278-282). In contexts that report scheduling metadata from a loaded template, filesystem-only builds therefore lose even the rootfs affinity data; update that provider to tolerate the missing memfile and call FromHeaders with a nil memfile header and the resolved rootfs header.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed! storageTemplate.SchedulingMetadata now reports rootfs-only scheduling metadata for reloaded fs-only templates instead of dropping it.
Add Sandbox.Pause(WithFilesystemSnapshot()) to produce a snapshot that persists only the rootfs and skips the guest memory snapshot. The default remains a full memory snapshot; fs-only is strictly opt-in. When enabled, before pausing the VM a mandatory guest sync is run via envd (a failed sync fails the pause, since no memory snapshot preserves the page cache and the rootfs would otherwise be missing acknowledged writes), and memory prefetch is cleared. CreateSnapshot is still called for its disk drain+flush side effect, but the memfile diff is left empty (NoDiff) with a resolved-nil header. scheduling.FromHeaders now tolerates a nil memfile header, emitting rootfs-only scheduling metadata instead of dropping the metadata entirely. Snapshot.FilesystemSnapshot records the decision at pause time: it can't be inferred from the diff shape, since a memory snapshot with zero dirty pages also yields a NoDiff memfile but still needs its snapfile uploaded. Add -fs-only flag in cmd/resume-build to allow testing the operation locally. Snapshots taken with -fs-only should include only rootfs (meta)data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
…pshot Export the memorySnapshot helper as MemorySnapshot and embed it in Snapshot as a single field, replacing the flat MemfileDiff / MemfileDiffHeader / MemfileBlockSize fields. The Diff, DiffHeader, and BlockSize fields are exported (read cross-package by the upload, layer, and server paths); the header and newBytes scheduling inputs stay unexported as they are used only at Pause time. Pure rename/restructure: no behavioral change. Consumers updated to read snap.MemorySnapshot.Diff / .DiffHeader / .BlockSize. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
A filesystem-only snapshot now uploads only rootfs.ext4(.header) and metadata.json. The snapfile upload is skipped in both the V3 and V4 paths (it is still created during pause for its disk drain+flush side effect, just not persisted), and NewUpload skips resolving the memfile compress config — which would otherwise fail validation on the zero memfile block size when compression is enabled. The memfile body and header need no change: NoDiff.CachePath returns an empty path (body self-skips) and the resolved-nil diff header makes both upload paths return early. Absence of the memory artifacts on storage is the on-disk signal that a snapshot is filesystem-only. Memory snapshots are unaffected (FilesystemSnapshot is false). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
RebootSandbox cold-boots a fresh Firecracker VM from a snapshot's rootfs without restoring guest memory — the resume primitive for filesystem-only snapshots. It masks an empty memfile (sizing NoopMemory only), selects the NBD provider (empty rootfs cache path) so guest TRIM and overlay chaining work like a normal resume, uses the Sync IO engine, and boots via systemd init. The sandbox is marked running only after envd is ready, matching ResumeSandbox's routing guarantee; the caller's absolute end time is honored so queue delay can't extend the TTL. To support the deferred mark-running, CreateSandbox gains a WithDeferredMarkRunning option (cold boot otherwise marks running before envd is up). A StartTypeReboot start type is added, and the ext4 rootflags are pulled into a named constant documenting that "noload" must never be set — fs-only resume relies on ext4 replaying its journal on cold boot. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add a -reboot flag that cold-boots from the build's rootfs via Factory.RebootSandbox instead of resuming from the memory snapshot, routed through a new runner.startSandbox helper used by the resume, interactive, cmd, and pause paths. Lets the filesystem-only pause + reboot resume cycle be exercised end to end locally (e.g. `resume-build -fs-only -cmd-pause '...'` then `resume-build -reboot -cmd 'cat /proc/uptime'`). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
A memory resume calls setMmds with the access-token hash (fc/process.go Resume) so the guest envd can authenticate /init against the MMDS hash. Cold boot (Process.Create) only configured the MMDS transport and never wrote the data, so a rebooted secure sandbox would start envd with no hash and /init would fall into the unauthenticated first-time-setup branch. Process.Create now writes the MMDS metadata before startVM when ProcessOptions.AccessToken is set, mirroring Resume. RebootSandbox always sets it; an empty token hashes to the "no token" value, matching Resume's behavior for non-secure sandboxes. Both the MMDS hash and the /init token sent by WaitForEnvd are sourced from config.Envd.AccessToken, so they always agree. Template-build cold boots leave AccessToken nil and are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
c609e2c to
b2273c7
Compare
| // bestEffortReclaim's sync step (LD-flag gated, best-effort), this always runs | ||
| // and always reports failure. | ||
| func (s *Sandbox) guestSync(ctx context.Context) error { | ||
| const syncTimeout = 5 * time.Second |
There was a problem hiding this comment.
5 is pretty short if there's a lot of dirty pages and there's a lot of IO on the box. Maybe this is fine in practice, but we should at least math this out for worst case that every page is dirty and is not flushed for a our maximum disk size.
| // Best-effort pre-pause guest reclaim (fstrim, sync, drop_caches, | ||
| // compact_memory) on the live VM via envd. Per-step caps are LD-flag-driven; | ||
| // all default to 0 which disables the chain entirely. Non-fatal. | ||
| s.bestEffortReclaim(ctx) |
There was a problem hiding this comment.
This is locked behind a feature flag (within the function call), but without it there's a race between when sync is called and when the actual snapshot happens.
| return nil, fmt.Errorf("create empty memfile: %w", err) | ||
| } | ||
|
|
||
| maskedTemplate := template.NewMaskTemplate(t, template.WithMemfile(memfile)) |
There was a problem hiding this comment.
Templates created before this PR will have inconsistent disks. This is fine normally because we load the dirty pages from memory, but this might create unexpected situation for the user where disk does not have the data they expect
What
Adds an opt-in snapshot mode that persists only the filesystem (rootfs) and skips the guest memory snapshot. Resuming such a snapshot cold-boots (reboots) a fresh VM from the rootfs instead of restoring memory.
Sandbox.Pause(WithFilesystemSnapshot())— pauses without a memory snapshot; uploads only the rootfs + metadata (no memfile/snapfile).Factory.RebootSandbox— the resume primitive that cold-boots from a snapshot's rootfs, marked routable only once the guest is ready.cmd/resume-buildgains-fs-onlyand-rebootto exercise the full cycle locally.Normal memory snapshots are unchanged — this mode is strictly opt-in.
Why
Memory snapshots are the expensive part of a pause (large artifacts, slow upload, more storage). Many workloads don't need warm RAM restored on resume — they only need their disk state. A filesystem-only snapshot is cheaper and smaller to take and store, and also serves as an escape hatch when memory artifacts are slow, oversized, or missing. The trade-off is explicit: a reboot loses RAM, running processes, connections, and unsynced writes; only the synced filesystem survives.
Scope / not in this PR
Orchestrator-only. No public API/proto/SDK wiring yet — the reboot path is reachable via
resume-buildbut not yet over gRPC. API-side resume selection and auto-resume/connectgating (so a disk-only snapshot is never silently rebooted) are follow-ups. Guest cold-boot speedups live on a separate branch.Testing
Verified to build/lint clean per commit. End-to-end pause→reboot validation (synced files persist, uptime resets, correct on-storage layout) is done locally via
resume-build; secure-sandbox envd auth after reboot still needs validation against a deployment.