Filesystem only snapshots #2995

Open
bchalios wants to merge 6 commits into main from feat/fs-only-snapshots-pause

@bchalios

What

Adds an opt-in snapshot mode that persists only the filesystem (rootfs) and skips the guest memory snapshot. Resuming such a snapshot cold-boots (reboots) a fresh VM from the rootfs instead of restoring memory.

  • Sandbox.Pause(WithFilesystemSnapshot()) — pauses without a memory snapshot; uploads only the rootfs + metadata (no memfile/snapfile).
  • Factory.RebootSandbox — the resume primitive that cold-boots from a snapshot's rootfs, marked routable only once the guest is ready.
  • Cold boot now writes the envd MMDS access-token hash, so rebooted secure sandboxes authenticate the same way memory-resumed ones do.
  • cmd/resume-build gains -fs-only and -reboot to exercise the full cycle locally.

Normal memory snapshots are unchanged — this mode is strictly opt-in.
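
As an illustration of how an opt-in pause mode like this is commonly expressed in Go, here is a minimal sketch of the functional-options pattern. Only the name `WithFilesystemSnapshot` comes from the description above; `pauseConfig`, `PauseOption`, and `resolvePauseConfig` are hypothetical names invented for the example and are not taken from the PR's code.

```go
package main

import "fmt"

// pauseConfig collects the choices made for one pause call.
// Hypothetical type: the PR description does not show the real plumbing.
type pauseConfig struct {
	filesystemOnly bool // false means a full memory snapshot (the default)
}

// PauseOption mutates a pauseConfig (hypothetical name).
type PauseOption func(*pauseConfig)

// WithFilesystemSnapshot opts in to a rootfs-only snapshot.
func WithFilesystemSnapshot() PauseOption {
	return func(c *pauseConfig) { c.filesystemOnly = true }
}

// resolvePauseConfig applies the options on top of the default config.
func resolvePauseConfig(opts ...PauseOption) pauseConfig {
	var cfg pauseConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Println("default fs-only:", resolvePauseConfig().filesystemOnly)
	fmt.Println("opt-in fs-only:", resolvePauseConfig(WithFilesystemSnapshot()).filesystemOnly)
}
```

With this shape, callers that pass no option keep the existing behavior, which is one way to make the new mode strictly opt-in.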

Why

Memory snapshots are the expensive part of a pause (large artifacts, slow upload, more storage). Many workloads don't need warm RAM restored on resume — they only need their disk state. A filesystem-only snapshot is cheaper and smaller to take and store, and also serves as an escape hatch when memory artifacts are slow, oversized, or missing. The trade-off is explicit: a reboot loses RAM, running processes, connections, and unsynced writes; only the synced filesystem survives.

Scope / not in this PR

Orchestrator-only. No public API/proto/SDK wiring yet — the reboot path is reachable via resume-build but not yet over gRPC. API-side resume selection and auto-resume/connect gating (so a disk-only snapshot is never silently rebooted) are follow-ups. Guest cold-boot speedups live on a separate branch.

Testing

Verified to build/lint clean per commit. End-to-end pause→reboot validation (synced files persist, uptime resets, correct on-storage layout) is done locally via resume-build; secure-sandbox envd auth after reboot still needs validation against a deployment.

@cla-bot cla-bot Bot added the cla-signed label Jun 12, 2026
@cursor

cursor Bot commented Jun 12, 2026

PR Summary

Medium Risk
Touches pause/upload/resume lifecycle and guest sync durability; reboot auth via MMDS is new on cold boot, but the mode is opt-in and gRPC resume selection is unchanged.

Overview
Adds an opt-in filesystem-only snapshot path: Pause can skip guest memory capture, run a mandatory guest sync before pause, persist rootfs + metadata only, and mark snapshots with FilesystemSnapshot so upload skips memfile/snapfile. Resume for those builds is RebootSandbox—cold-boot from rootfs with deferred MarkRunning until envd is ready, MMDS access-token setup on FC create for secure cold boots, and a new StartTypeReboot metric. resume-build gains -fs-only and -reboot; scheduling/template loading tolerate missing memfile (rootfs-only affinity). Full memory snapshots are unchanged unless the new pause option is used; orchestrator gRPC still resumes via memory snapshot, not reboot.

Reviewed by Cursor Bugbot for commit b2273c7.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

No critical findings or correctness issues were identified in the code changes, and no review comments were provided. I have no feedback to provide.

@bchalios force-pushed the feat/fs-only-snapshots-pause branch from f669bc7 to c609e2c on June 12, 2026 15:58
@bchalios marked this pull request as ready for review on June 12, 2026 17:01

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c609e2c116

RootfsDroppedBuilds: uint32(rootDropped),
}

if memfileHeader != nil && memfileHeader.Metadata != nil {

P2 Badge Preserve rootfs metadata after fs-only reload

This nil-memfile path only works while Pause still has the rootfs header in hand. Once a filesystem-only snapshot is uploaded and later reloaded, there is no memfile header/object, but storageTemplate.SchedulingMetadata still calls t.memfile.WaitWithContext and returns nil on that error before reading rootfs (packages/orchestrator/pkg/sandbox/template/storage_template.go:278-282). In contexts that report scheduling metadata from a loaded template, filesystem-only builds therefore lose even the rootfs affinity data; update that provider to tolerate the missing memfile and call FromHeaders with a nil memfile header and the resolved rootfs header.

Contributor Author

Fixed! storageTemplate.SchedulingMetadata now reports rootfs-only scheduling metadata for reloaded fs-only templates instead of dropping it.
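
To make the nil-memfile handling discussed in this thread concrete, here is a minimal sketch of a builder that still emits rootfs metadata when the memfile header is absent. The types `header` and `schedulingMetadata` and the function `fromHeaders` are hypothetical stand-ins; they do not reproduce the real `scheduling.FromHeaders` signature or fields.

```go
package main

import "fmt"

// header is a hypothetical stand-in for a diff header that lists the
// builds a layer depends on.
type header struct {
	builds []string
}

// schedulingMetadata is a hypothetical stand-in for the affinity data.
type schedulingMetadata struct {
	RootfsBuilds  []string
	MemfileBuilds []string
}

// fromHeaders tolerates a nil memfile header: a filesystem-only build
// still reports rootfs affinity instead of dropping all metadata.
func fromHeaders(memfile, rootfs *header) *schedulingMetadata {
	if memfile == nil && rootfs == nil {
		return nil // nothing to report at all
	}
	md := &schedulingMetadata{}
	if rootfs != nil {
		md.RootfsBuilds = append(md.RootfsBuilds, rootfs.builds...)
	}
	if memfile != nil {
		md.MemfileBuilds = append(md.MemfileBuilds, memfile.builds...)
	}
	return md
}

func main() {
	md := fromHeaders(nil, &header{builds: []string{"build-a"}})
	fmt.Println(md.RootfsBuilds, len(md.MemfileBuilds))
}
```

The point of the sketch is the ordering: the missing memfile is checked per field rather than used as an early return, so a failed memfile lookup cannot mask the rootfs header.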

bchalios and others added 6 commits June 12, 2026 19:45
Add Sandbox.Pause(WithFilesystemSnapshot()) to produce a snapshot that
persists only the rootfs and skips the guest memory snapshot. The default
remains a full memory snapshot; fs-only is strictly opt-in.

When enabled, before pausing the VM a mandatory guest sync is run via envd
(a failed sync fails the pause, since no memory snapshot preserves the page
cache and the rootfs would otherwise be missing acknowledged writes), and
memory prefetch is cleared. CreateSnapshot is still called for its disk
drain+flush side effect, but the memfile diff is left empty (NoDiff) with a
resolved-nil header.

scheduling.FromHeaders now tolerates a nil memfile header, emitting
rootfs-only scheduling metadata instead of dropping the metadata entirely.

Snapshot.FilesystemSnapshot records the decision at pause time: it can't be
inferred from the diff shape, since a memory snapshot with zero dirty pages
also yields a NoDiff memfile but still needs its snapfile uploaded.

Add -fs-only flag in cmd/resume-build to allow testing the operation
locally. Snapshots taken with -fs-only should include only rootfs
(meta)data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
…pshot

Export the memorySnapshot helper as MemorySnapshot and embed it in Snapshot
as a single field, replacing the flat MemfileDiff / MemfileDiffHeader /
MemfileBlockSize fields. The Diff, DiffHeader, and BlockSize fields are
exported (read cross-package by the upload, layer, and server paths); the
header and newBytes scheduling inputs stay unexported as they are used only
at Pause time.

Pure rename/restructure: no behavioral change. Consumers updated to read
snap.MemorySnapshot.Diff / .DiffHeader / .BlockSize.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
A filesystem-only snapshot now uploads only rootfs.ext4(.header) and
metadata.json. The snapfile upload is skipped in both the V3 and V4 paths
(it is still created during pause for its disk drain+flush side effect, just
not persisted), and NewUpload skips resolving the memfile compress config —
which would otherwise fail validation on the zero memfile block size when
compression is enabled.

The memfile body and header need no change: NoDiff.CachePath returns an empty
path (body self-skips) and the resolved-nil diff header makes both upload
paths return early. Absence of the memory artifacts on storage is the on-disk
signal that a snapshot is filesystem-only.

Memory snapshots are unaffected (FilesystemSnapshot is false).
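
The upload rules described in this commit message can be sketched as a small decision function. This is an illustration only: `artifactsToUpload` is a hypothetical helper, the name `memfile.header` is an assumed placeholder for the memfile header object, and the sketch assumes the rootfs always has something to upload.

```go
package main

import "fmt"

// artifactsToUpload lists the objects a pause would persist.
// filesystemSnapshot is the flag recorded at pause time; memfileHasDiff
// says whether the memory diff is non-empty.
func artifactsToUpload(filesystemSnapshot, memfileHasDiff bool) []string {
	artifacts := []string{"rootfs.ext4", "rootfs.ext4.header", "metadata.json"}
	if filesystemSnapshot {
		// Filesystem-only: no snapfile and no memfile artifacts.
		return artifacts
	}
	// A memory snapshot always needs its snapfile, even when no pages
	// were dirty, which is why the mode cannot be inferred from the
	// diff shape alone.
	artifacts = append(artifacts, "snapfile")
	if memfileHasDiff {
		artifacts = append(artifacts, "memfile", "memfile.header")
	}
	return artifacts
}

func main() {
	fmt.Println(artifactsToUpload(true, false))
	fmt.Println(artifactsToUpload(false, false))
	fmt.Println(artifactsToUpload(false, true))
}
```

Note that the first two calls both have an empty memory diff but differ in whether the snapfile is kept, which is the case the explicit `FilesystemSnapshot` flag exists to distinguish.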

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
RebootSandbox cold-boots a fresh Firecracker VM from a snapshot's rootfs
without restoring guest memory — the resume primitive for filesystem-only
snapshots. It masks an empty memfile (sizing NoopMemory only), selects the
NBD provider (empty rootfs cache path) so guest TRIM and overlay chaining
work like a normal resume, uses the Sync IO engine, and boots via systemd
init. The sandbox is marked running only after envd is ready, matching
ResumeSandbox's routing guarantee; the caller's absolute end time is honored
so queue delay can't extend the TTL.

To support the deferred mark-running, CreateSandbox gains a
WithDeferredMarkRunning option (cold boot otherwise marks running before envd
is up). A StartTypeReboot start type is added, and the ext4 rootflags are
pulled into a named constant documenting that "noload" must never be set —
fs-only resume relies on ext4 replaying its journal on cold boot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add a -reboot flag that cold-boots from the build's rootfs via
Factory.RebootSandbox instead of resuming from the memory snapshot, routed
through a new runner.startSandbox helper used by the resume, interactive,
cmd, and pause paths. Lets the filesystem-only pause + reboot resume cycle be
exercised end to end locally (e.g. `resume-build -fs-only -cmd-pause '...'`
then `resume-build -reboot -cmd 'cat /proc/uptime'`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
A memory resume calls setMmds with the access-token hash (fc/process.go
Resume) so the guest envd can authenticate /init against the MMDS hash. Cold
boot (Process.Create) only configured the MMDS transport and never wrote the
data, so a rebooted secure sandbox would start envd with no hash and /init
would fall into the unauthenticated first-time-setup branch.

Process.Create now writes the MMDS metadata before startVM when
ProcessOptions.AccessToken is set, mirroring Resume. RebootSandbox always
sets it; an empty token hashes to the "no token" value, matching Resume's
behavior for non-secure sandboxes. Both the MMDS hash and the /init token
sent by WaitForEnvd are sourced from config.Envd.AccessToken, so they always
agree. Template-build cold boots leave AccessToken nil and are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

@bchalios force-pushed the feat/fs-only-snapshots-pause branch from c609e2c to b2273c7 on June 12, 2026 17:45

// bestEffortReclaim's sync step (LD-flag gated, best-effort), this always runs
// and always reports failure.
func (s *Sandbox) guestSync(ctx context.Context) error {
const syncTimeout = 5 * time.Second

5 seconds is pretty short if there are a lot of dirty pages and a lot of IO on the box. Maybe this is fine in practice, but we should at least work out the worst case: every page is dirty and unflushed at our maximum disk size.

// Best-effort pre-pause guest reclaim (fstrim, sync, drop_caches,
// compact_memory) on the live VM via envd. Per-step caps are LD-flag-driven;
// all default to 0 which disables the chain entirely. Non-fatal.
s.bestEffortReclaim(ctx)

This is locked behind a feature flag (within the function call), but without it there's a race between when sync is called and when the actual snapshot happens.

return nil, fmt.Errorf("create empty memfile: %w", err)
}

maskedTemplate := template.NewMaskTemplate(t, template.WithMemfile(memfile))

Templates created before this PR will have inconsistent disks. This is normally fine because we load the dirty pages from memory, but on a reboot it might create an unexpected situation for the user where the disk does not have the data they expect.
