perf(orchestrator): reduce mount namespace copy overhead during sandbox startup #3006
Code Review
In packages/orchestrator/pkg/sandbox/network/mount_namespace.go, if restoring the host mount namespace fails during cleanup in createTemplate or create, unlocking the OS thread allows the Go runtime to reuse it for other goroutines, polluting them with the unshared mount namespace. Conditionally unlocking the thread only when the namespace is successfully restored ensures that a corrupted thread is discarded by the runtime.
runtime.LockOSThread()
defer runtime.UnlockOSThread()

if err := unix.Unshare(unix.CLONE_FS); err != nil {
	return nil, fmt.Errorf("failed to unshare fs attributes before creating template mount namespace: %w", err)
}

hostMntNS, err := openCurrentMountNamespace()
if err != nil {
	return nil, fmt.Errorf("failed to open host mount namespace: %w", err)
}
defer hostMntNS.Close()

restoreHostNS := func() {
	if err := unix.Setns(int(hostMntNS.Fd()), unix.CLONE_NEWNS); err != nil {
		logger.L().Error(ctx, "error resetting mount namespace back to the host namespace", zap.Error(err))
	}
}

if err := unix.Unshare(unix.CLONE_NEWNS); err != nil {
	return nil, fmt.Errorf("failed to unshare template mount namespace: %w", err)
}
defer restoreHostNS()
If restoring the host mount namespace fails during cleanup, unlocking the OS thread allows the Go runtime to reuse it for other goroutines, polluting them with the unshared mount namespace. Conditionally unlocking the thread only when the namespace is successfully restored ensures that a corrupted thread is discarded by the runtime.
var unlockThread bool
runtime.LockOSThread()
defer func() {
	if unlockThread {
		runtime.UnlockOSThread()
	}
}()
if err := unix.Unshare(unix.CLONE_FS); err != nil {
	return nil, fmt.Errorf("failed to unshare fs attributes before creating template mount namespace: %w", err)
}
hostMntNS, err := openCurrentMountNamespace()
if err != nil {
	return nil, fmt.Errorf("failed to open host mount namespace: %w", err)
}
defer hostMntNS.Close()
restoreHostNS := func() {
	if err := unix.Setns(int(hostMntNS.Fd()), unix.CLONE_NEWNS); err != nil {
		logger.L().Error(ctx, "error resetting mount namespace back to the host namespace", zap.Error(err))
	} else {
		unlockThread = true
	}
}
if err := unix.Unshare(unix.CLONE_NEWNS); err != nil {
	unlockThread = true
	return nil, fmt.Errorf("failed to unshare template mount namespace: %w", err)
}
defer restoreHostNS()

runtime.LockOSThread()
defer runtime.UnlockOSThread()

if err := unix.Unshare(unix.CLONE_FS); err != nil {
	return nil, fmt.Errorf("failed to unshare fs attributes before creating mount namespace: %w", err)
}

hostMntNS, err := openCurrentMountNamespace()
if err != nil {
	return nil, fmt.Errorf("failed to open host mount namespace: %w", err)
}
defer hostMntNS.Close()

restoreHostNS := func() {
	if err := unix.Setns(int(hostMntNS.Fd()), unix.CLONE_NEWNS); err != nil {
		logger.L().Error(ctx, "error resetting mount namespace back to the host namespace", zap.Error(err))
	}
}

if err := unix.Setns(int(templateMntNS.Fd()), unix.CLONE_NEWNS); err != nil {
	return nil, fmt.Errorf("failed to enter template mount namespace: %w", err)
}
defer restoreHostNS()
If restoring the host mount namespace fails during cleanup, unlocking the OS thread allows the Go runtime to reuse it for other goroutines, polluting them with the unshared mount namespace. Conditionally unlocking the thread only when the namespace is successfully restored ensures that a corrupted thread is discarded by the runtime.
var unlockThread bool
runtime.LockOSThread()
defer func() {
	if unlockThread {
		runtime.UnlockOSThread()
	}
}()
if err := unix.Unshare(unix.CLONE_FS); err != nil {
	return nil, fmt.Errorf("failed to unshare fs attributes before creating mount namespace: %w", err)
}
hostMntNS, err := openCurrentMountNamespace()
if err != nil {
	return nil, fmt.Errorf("failed to open host mount namespace: %w", err)
}
defer hostMntNS.Close()
restoreHostNS := func() {
	if err := unix.Setns(int(hostMntNS.Fd()), unix.CLONE_NEWNS); err != nil {
		logger.L().Error(ctx, "error resetting mount namespace back to the host namespace", zap.Error(err))
	} else {
		unlockThread = true
	}
}
if err := unix.Setns(int(templateMntNS.Fd()), unix.CLONE_NEWNS); err != nil {
	unlockThread = true
	return nil, fmt.Errorf("failed to enter template mount namespace: %w", err)
}
defer restoreHostNS()
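One detail worth keeping in mind with the suggestion above: per the Go documentation for runtime.LockOSThread, a locked thread is only terminated when the goroutine that locked it exits without unlocking. Until then the goroutine simply stays pinned to that thread. A minimal sketch of one way to make that exit explicit, assuming the namespace work can be moved onto a dedicated goroutine. The names runOnPinnedThread and errRestoreFailed are hypothetical and do not appear in this PR:

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// errRestoreFailed is a hypothetical sentinel standing in for a failed
// setns back to the host mount namespace.
var errRestoreFailed = errors.New("failed to restore host mount namespace")

// runOnPinnedThread runs work on a dedicated goroutine that is locked to its
// OS thread. work reports whether the thread's original state was restored.
// When restored is false, the goroutine exits while still locked, so the Go
// runtime terminates the thread instead of handing it to other goroutines.
func runOnPinnedThread(work func() (restored bool, err error)) error {
	errCh := make(chan error, 1)
	go func() {
		runtime.LockOSThread()
		restored, err := work()
		if restored {
			runtime.UnlockOSThread()
		}
		errCh <- err
	}()
	return <-errCh
}

func main() {
	// Stand-in for: unshare, do the mount work, then fail to setns back.
	err := runOnPinnedThread(func() (bool, error) {
		return false, errRestoreFailed
	})
	fmt.Println("result:", err)
}
```

With this shape the caller's own goroutine is never pinned, and a thread left in the wrong mount namespace cannot be picked up by unrelated goroutines.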
Summary
This PR reworks the sandbox creation path to reduce mount namespace copy overhead, which substantially improves sandbox startup speed under high concurrency.
Problem
When creating sandboxes with 100 concurrent requests, sandbox startup is slow. A key step suffers from lock contention, causing the per-sandbox cost to exceed 400ms.
More specifically, NewProcess in packages/orchestrator/pkg/sandbox/fc/process.go is a critical function in the sandbox creation path. Previously, this function prepared a bash-based startup flow that included commands similar to:
unshare -m ... ip netns exec ns-xxx firecracker ...
Both unshare -m and ip netns exec copy the mount tree and eventually reach copy_mnt_ns in the Linux kernel file fs/namespace.c. When each thread copies the mount tree, it calls namespace_lock and tries to acquire a global lock. With 100 concurrent threads contending for the same global lock, sandbox startup latency becomes very high.
In addition:
Copying a mount tree requires traversing the mount tree, which is expensive. The time it takes to copy the mount tree can be measured with:
time unshare -m nproc
The previous flow performed two mount tree copies (unshare -m and ip netns exec) and one mount namespace release (the mount namespace created by unshare -m being released). All of these operations acquire the global namespace lock.
For these reasons, creating the Firecracker process has very high latency under high concurrency.
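Since the copy cost grows with the number of mounts that have to be traversed, it can help to check how large the current mount tree is. A small sketch, assuming a Linux host where /proc/self/mountinfo lists one mount per line. The helper name countMounts is illustrative and not part of this PR:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// countMounts returns the number of mount entries in mountinfo-formatted
// text, where each non-empty line describes one mount.
func countMounts(mountinfo string) int {
	n := 0
	for _, line := range strings.Split(mountinfo, "\n") {
		if strings.TrimSpace(line) != "" {
			n++
		}
	}
	return n
}

func main() {
	data, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read mountinfo:", err)
		return
	}
	fmt.Println("mounts in current namespace:", countMounts(string(data)))
}
```

Running this on the host and again inside a pruned namespace gives a rough idea of how much traversal work each copy has to do.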
Approach
Create a minimal mount tree template.
Before creating sandboxes, the orchestrator prepares a minimal mount tree and unmounts unnecessary nodes. Later, when creating each sandbox, the mount tree is copied from this minimal template instead of being copied from the full host mount tree. This greatly reduces the time spent copying the mount tree.
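As an illustration of the pruning step, here is a sketch of how an unmount order could be derived from a list of mount points and a keep-list. This is an assumption about one possible approach, not the code in this PR: the function unmountPlan, the example mount points, and the keep-list are all hypothetical, and stacked mounts on the same path are ignored.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// unmountPlan returns the mount points that are safe to drop, ordered so that
// deeper paths come first (child mounts are unmounted before their parents).
// A mount point is retained if it is "/", is listed in keep, or is an
// ancestor of something in keep.
func unmountPlan(mountPoints, keep []string) []string {
	needed := func(mp string) bool {
		if mp == "/" {
			return true
		}
		for _, k := range keep {
			if k == mp || strings.HasPrefix(k, mp+"/") {
				return true
			}
		}
		return false
	}

	var plan []string
	for _, mp := range mountPoints {
		if !needed(mp) {
			plan = append(plan, mp)
		}
	}
	// A child path always contains more separators than its parent.
	sort.SliceStable(plan, func(i, j int) bool {
		return strings.Count(plan[i], "/") > strings.Count(plan[j], "/")
	})
	return plan
}

func main() {
	// Hypothetical host mounts and keep-list, for illustration only.
	mounts := []string{"/", "/proc", "/sys", "/sys/fs/cgroup", "/dev", "/dev/shm", "/home", "/run"}
	keep := []string{"/proc", "/dev/shm"}
	for _, mp := range unmountPlan(mounts, keep) {
		fmt.Println("would unmount:", mp)
	}
}
```

The real template would perform the unmounts inside the freshly unshared namespace so the host mount tree is not affected.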
Optimize the Firecracker process startup flow.
The previous unshare -m and ip netns exec ns-xxx flow is replaced with explicit setns system calls. This reduces the startup path from two mount tree copies and one mount namespace release to only one mount tree copy.
Result
This greatly reduces the time required to create the Firecracker process.
In packages/orchestrator/pkg/sandbox/fc/process.go, the configure function is more than 10x faster in our tests, decreasing from over 400ms to under 40ms.
Discussion
This version not only significantly improves startup speed, but also improves sandbox security.
Previously, the host-side Firecracker process inherited a copy of the host mount tree. If an attacker compromised the Firecracker process, they could observe more of the host filesystem structure. This optimization prunes the mount tree used by the Firecracker process and reduces the attack surface.
This optimization reduces the mount tree size and reduces the number of global lock acquisitions.
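To make the two effects concrete, the following is a back-of-the-envelope model, not a measurement. It assumes the global lock is held exclusively, so hold times add up across concurrent startups. The type flow, the function serializedLockTime, and the value placeholderHold are hypothetical, and the hold time is an arbitrary placeholder rather than data from this PR:

```go
package main

import (
	"fmt"
	"time"
)

// flow counts the operations per sandbox that take the global namespace lock.
type flow struct {
	copies   int // mount tree copies
	releases int // mount namespace releases
}

// Operation counts as described in the PR text.
var (
	oldFlow = flow{copies: 2, releases: 1} // unshare -m, ip netns exec, release
	newFlow = flow{copies: 1, releases: 0} // single copy from the template
)

// placeholderHold is an arbitrary per-operation hold time, NOT a measurement.
const placeholderHold = time.Millisecond

// serializedLockTime estimates the total time the exclusive lock is held when
// the given number of sandboxes start concurrently. Under full contention this
// approximates how long the last contender waits.
func serializedLockTime(f flow, sandboxes int, copyHold, releaseHold time.Duration) time.Duration {
	perSandbox := time.Duration(f.copies)*copyHold + time.Duration(f.releases)*releaseHold
	return time.Duration(sandboxes) * perSandbox
}

func main() {
	fmt.Println("old flow:", serializedLockTime(oldFlow, 100, placeholderHold, placeholderHold))
	fmt.Println("new flow:", serializedLockTime(newFlow, 100, placeholderHold, placeholderHold))
}
```

In this model, fewer lock acquisitions shrink the multiplier, while a smaller mount tree shrinks the per-copy hold time, so the two improvements compound.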
Another possible optimization would be to reuse mount namespaces together with reusable network namespaces. Since different sandboxes already reuse network namespace slots, we could prepare a paired mount namespace for each network namespace. When creating a sandbox, after acquiring a network namespace slot, the sandbox could directly use the corresponding mount namespace.
However, that approach would reduce isolation to some extent. If an attacker compromised the Firecracker process, they might be able to attack other users that later reuse the same network namespace and mount namespace pair.
This PR chooses a more conservative approach: each sandbox still gets its own mount namespace, but that namespace is copied from a minimal template and created with fewer global-lock-heavy operations.