Snapshot Runtime: QuickJS WASM VM with snapshot/restore for workflow execution#1300
TooTallNate wants to merge 62 commits into serialization-migration
Phase 1 of the VM snapshot runtime (RFC #1298).

World interface changes (packages/world):
- Add SnapshotMetadata type (lastEventId, createdAt) with zod schema
- Add snapshots sub-interface to Storage: save(), load(), delete()
- Export new types and schema from @workflow/world

world-local implementation (packages/world-local):
- Filesystem-based snapshot storage in {dataDir}/snapshots/
  - {runId}.bin for serialized VM snapshot data
  - {runId}.json for metadata (lastEventId, createdAt)
- save() overwrites existing snapshots (atomic via ensureDir + write)
- load() returns null if no snapshot exists
- delete() removes both files
- Wired into createStorage() with tracing instrumentation
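For reviewers, the save/load/delete contract described above can be sketched with an in-memory stand-in. This is not the world-local implementation: the method signatures, the synchronous style, and the `createMemorySnapshotStore` name are assumptions made for illustration. Only the method names and the metadata fields come from the commit message.

```typescript
// Illustrative sketch only: an in-memory stand-in for the snapshots sub-interface.
interface SnapshotMetadata {
  lastEventId: string;
  createdAt: Date;
}

interface SnapshotRecord {
  data: Uint8Array;
  metadata: SnapshotMetadata;
}

// Hypothetical signatures; the real interface is likely asynchronous.
interface SnapshotStore {
  save(runId: string, data: Uint8Array, metadata: SnapshotMetadata): void;
  load(runId: string): SnapshotRecord | null;
  delete(runId: string): void;
}

function createMemorySnapshotStore(): SnapshotStore {
  const records = new Map<string, SnapshotRecord>();
  return {
    save(runId, data, metadata) {
      // Overwrites any existing snapshot for the run; bytes are copied so
      // later mutation by the caller cannot corrupt the stored snapshot.
      records.set(runId, { data: data.slice(), metadata: { ...metadata } });
    },
    load(runId) {
      // Returns null (not undefined, not an error) when nothing is stored.
      return records.get(runId) ?? null;
    },
    delete(runId) {
      // Removes data and metadata together, mirroring the .bin + .json pair.
      records.delete(runId);
    },
  };
}
```

A filesystem-backed version would follow the same shape, with the data and metadata written to the two files named above.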
Phase 2 of the VM snapshot runtime (RFC #1298).
- Add quickjs-wasi dependency to @workflow/core
- Create snapshot-runtime.ts with the basic structure:
  - runSnapshotWorkflow() entry point
  - Fresh VM creation with deterministic WASI clock and seeded Math.random
  - Snapshot restore path (TODO: event processing)
  - Host function stubs for useStep, sleep, createHook via Symbol.for()
  - Interrupt handler (30s timeout)
  - Memory limit (64MB)
  - Snapshot serialization on suspension

The useStep, sleep, and createHook host functions are stubs with TODO markers; the basic VM lifecycle and snapshot/restore flow is in place.
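To illustrate what "seeded Math.random" buys, here is a minimal sketch of a seedable generator. The commit does not say which PRNG the runtime uses; mulberry32 is swapped in here purely as a well-known small example, and `createSeededRandom` is a hypothetical name.

```typescript
// Sketch: a seedable PRNG (mulberry32), used here only as an example algorithm.
// The PR does not state which generator the snapshot runtime actually uses.
function createSeededRandom(seed: number): () => number {
  let state = seed >>> 0; // keep the state as an unsigned 32-bit integer
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Scale the 32-bit result into [0, 1), like Math.random().
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators with the same seed yield the same sequence, so re-running
// a workflow from scratch produces the same "random" values every time.
const first = createSeededRandom(42);
const second = createSeededRandom(42);
const sequenceA = [first(), first(), first()];
const sequenceB = [second(), second(), second()];
```

The same reasoning applies to the deterministic clock: if time and randomness are both functions of the run's inputs, a restored VM and a freshly replayed one cannot diverge.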
Demonstrates the core snapshot/restore mechanism with a compiled workflow pattern:
- useStep implemented inside QuickJS as JS code (not host functions)
- Pending step resolve/reject functions stored on globalThis.__resolvers
- Step metadata (stepId, args) preserved across snapshot/restore
- Multi-step workflow: snapshot at each suspension, restore and resolve, workflow continues from exact suspension point
- Both tests pass: simple workflow + metadata preservation
The snapshot runtime (runSnapshotWorkflow) now handles the complete workflow lifecycle:
- First run: bootstrap VM with workflow primitives, evaluate compiled workflow bundle, start workflow function, process any existing events
- Snapshot: capture VM state when workflow suspends on step/sleep
- Restore: deserialize snapshot, process delta events to resolve/reject pending promises, execute pending jobs
- Completion: detect workflow result or error

Workflow primitives (useStep, sleep) are implemented as JavaScript code inside the QuickJS VM, not as host function callbacks. This keeps the implementation simple: the host communicates by evaluating small JS snippets to resolve/reject promises.

7 tests covering: simple completion, step suspension, snapshot/restore with step completion, multi-step across 3 snapshots, sleep suspension and wake, step failure with try/catch.
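The pending-resolver pattern behind useStep can be sketched outside the VM. This is a simplified stand-in running in plain TypeScript, not the code evaluated inside QuickJS: the `pending` map plays the role of globalThis.__resolvers, and `settleStep`, the counter-based IDs, and the example workflow are assumptions for illustration.

```typescript
// Sketch: each step call parks its resolve/reject functions in a registry
// and returns a promise that stays pending until the host settles it.
type PendingStep = {
  stepName: string;
  args: unknown[];
  resolve: (value: unknown) => void;
  reject: (error: Error) => void;
};

const pending = new Map<string, PendingStep>(); // stand-in for __resolvers
let nextStepIndex = 0;

function useStep<T>(stepName: string): (...args: unknown[]) => Promise<T> {
  return (...args: unknown[]) =>
    new Promise<T>((resolve, reject) => {
      const correlationId = `step_${nextStepIndex++}`; // hypothetical ID scheme
      pending.set(correlationId, {
        stepName,
        args,
        resolve: (value) => resolve(value as T),
        reject,
      });
    });
}

// What the host does when a step_completed / step_failed event arrives.
function settleStep(
  correlationId: string,
  outcome: { result: unknown } | { error: string },
): boolean {
  const entry = pending.get(correlationId);
  if (!entry) return false; // unknown or already-settled step
  pending.delete(correlationId);
  if ("error" in outcome) entry.reject(new Error(outcome.error));
  else entry.resolve(outcome.result);
  return true;
}

// Example workflow: suspends on the first step as soon as it is started.
const add = useStep<number>("add");
async function exampleWorkflow(): Promise<number> {
  const sum = await add(10, 7);
  return add(sum, 8);
}
```

Starting `exampleWorkflow()` leaves exactly one entry in `pending`; at that point the VM has nothing left to run, which is where the snapshot is taken. After restore, settling the entry and draining pending jobs advances the workflow to its next suspension.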
…napshot flag
- Add snapshot-entrypoint.ts that handles the full lifecycle: snapshot load → event fetching → runSnapshotWorkflow → result handling (create events, queue steps, save/delete snapshots)
- Add feature flag: set WORKFLOW_RUNTIME=snapshot to use the new runtime
- When enabled, the snapshot path runs before the event-replay path
- Step queuing matches the existing step handler's expected payload format
- Wait handling includes timeout calculation for delayed re-queuing
- Extract workflow ID from SWC-compiled bundle's manifest comment
The snapshot runtime now successfully:
1. Evaluates the compiled workflow bundle in QuickJS
2. Suspends on the first step call
3. Snapshots the VM state
4. Creates step_created events and queues step execution

Web API stubs added for TransformStream, ReadableStream, WritableStream, TextEncoder, TextDecoder, Headers, URL, console. These are referenced by the compiled bundle but not needed for basic step/sleep workflows.

Remaining issue: step_created events use raw JSON for step input args, but the step handler expects devalue-serialized data. This is the data serialization boundary that needs to be resolved (RFC #1298 discusses moving devalue inside the QuickJS VM).
…untime

The step_created events now contain properly devalue-serialized input data (Uint8Array with 'devl' format prefix) instead of raw JSON. This makes the step handler's hydrateStepArguments() work correctly.

When processing step_completed events, the output is deserialized via workflow.deserialize() on the host side before passing to the QuickJS VM as JSON. This handles the devalue format prefix correctly. Also properly serializes the run_completed output.
Step arguments are now wrapped in { args: [...], closureVars?: {...} }
before being serialized with workflow.serialize(), matching the format
expected by the step handler's hydrateStepArguments().
The step handler successfully:
- Receives the step message
- Deserializes the step arguments
- Executes the step function (add(10, 7))
- Handles retry on retryable errors
- Completes the step and re-queues the workflow
New files:
- serialization/base64.ts: pure-JS base64 encode/decode (no Buffer)
- serialization/reducers/common-vm.ts: VM-compatible reducers using instanceof Error instead of types.isNativeError(), pure-JS base64 instead of Buffer
- serialization/codec-devalue-vm.ts: devalue codec using VM reducers
- serialization/workflow-vm.ts: VM workflow serialize/deserialize

The VM serializer produces the EXACT same wire format as the Node.js serializer (devl-prefixed devalue data). Verified by 14 tests including critical cross-compatibility:
- VM serialize → Node.js hydrateStepArguments (step handler path)
- Node.js dehydrateStepReturnValue → VM deserialize (step result path)
- Pure-JS base64 matches Node.js Buffer base64

Sub-path export: @workflow/core/serialization/workflow-vm
Re-export: workflow/internal/serialization now points to workflow-vm
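For context on what a Buffer-free base64 helper involves, here is a minimal sketch of standard (RFC 4648) base64 over Uint8Array. It is not the contents of serialization/base64.ts; the function names are hypothetical and input validation is kept deliberately light.

```typescript
// Sketch: pure-JS base64 with no Buffer, btoa, or atob dependency.
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function encodeBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    // Pack up to three bytes into one 24-bit group.
    const group =
      (bytes[i] << 16) |
      ((hasSecond ? bytes[i + 1] : 0) << 8) |
      (hasThird ? bytes[i + 2] : 0);
    out += BASE64_CHARS.charAt((group >> 18) & 63);
    out += BASE64_CHARS.charAt((group >> 12) & 63);
    out += hasSecond ? BASE64_CHARS.charAt((group >> 6) & 63) : "=";
    out += hasThird ? BASE64_CHARS.charAt(group & 63) : "=";
  }
  return out;
}

function decodeBase64(text: string): Uint8Array {
  const clean = text.replace(/=+$/, ""); // padding carries no data
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let buffer = 0;
  let bits = 0;
  let index = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_CHARS.indexOf(clean.charAt(i));
    if (value < 0) throw new Error(`Invalid base64 character at ${i}`);
    buffer = (buffer << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[index++] = (buffer >> bits) & 0xff; // emit the top complete byte
    }
  }
  return out;
}
```

Because the output alphabet and padding follow the standard, the result should match what Node's Buffer produces, which is the property the cross-compatibility tests above rely on.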
Data now flows as format-prefixed devalue bytes (devl + devalue.stringify)
across the VM boundary, with no JSON conversion in the middle:
Step args: VM __wdk_serialize({args}) → Uint8Array → event input
Step results: event output Uint8Array → VM __wdk_deserialize → value
Workflow result: VM __wdk_serialize(result) → Uint8Array → event output
Host functions __wdk_serialize/__wdk_deserialize are installed on
globalThis and use the VM-compatible workflow serializer (pure JS,
no Node.js deps). They are re-installed after snapshot restore since
host callbacks don't survive the snapshot.
VM-compatible serializer (workflow-vm.ts) produces the EXACT same
wire format as the Node.js serializer — verified by cross-compatibility
tests.
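The "format-prefixed bytes" idea can be sketched as a four-character ASCII tag placed in front of the payload. The helper names below are hypothetical, and the payload is treated as opaque bytes; in the real serializer the payload is the output of devalue.stringify.

```typescript
// Sketch: frame a payload with a 4-character ASCII format tag such as 'devl'.
const FORMAT_PREFIX_LENGTH = 4;

function addFormatPrefix(prefix: string, payload: Uint8Array): Uint8Array {
  if (prefix.length !== FORMAT_PREFIX_LENGTH) {
    throw new Error("format prefix must be exactly 4 characters");
  }
  const framed = new Uint8Array(FORMAT_PREFIX_LENGTH + payload.length);
  for (let i = 0; i < FORMAT_PREFIX_LENGTH; i++) {
    framed[i] = prefix.charCodeAt(i); // ASCII only, one byte per character
  }
  framed.set(payload, FORMAT_PREFIX_LENGTH);
  return framed;
}

function readFormatPrefix(bytes: Uint8Array): { prefix: string; payload: Uint8Array } {
  if (bytes.length < FORMAT_PREFIX_LENGTH) {
    throw new Error("data too short to contain a format prefix");
  }
  let prefix = "";
  for (let i = 0; i < FORMAT_PREFIX_LENGTH; i++) {
    prefix += String.fromCharCode(bytes[i]);
  }
  // subarray() is a view, so no bytes are copied when unwrapping.
  return { prefix, payload: bytes.subarray(FORMAT_PREFIX_LENGTH) };
}
```

Keeping the tag outside the payload lets a reader decide how to decode the bytes before touching them, which is what allows the same Uint8Array to pass through the host untouched.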
The serializer (devalue + reducers + TextEncoder/TextDecoder polyfills) is now bundled as a 16.6KB IIFE that's evaluated inside the QuickJS VM during bootstrap. The serialize/deserialize functions are real JS functions running inside the VM, operating on QuickJS-native values (Date, Map, Set, etc.) that can't cross the VM boundary via dump().

Architecture:
- vm-bundle-entry.ts is bundled by esbuild into a self-contained IIFE
- esbuild inject option ensures TextEncoder/TextDecoder polyfills run before any module-level code
- The host only passes opaque Uint8Array blobs (devl-prefixed devalue) across the VM boundary
- On snapshot restore, the serde functions survive in the QuickJS heap (no re-registration needed)

New files:
- polyfills/text-encoder.ts: pure JS TextEncoder (from nx.js)
- polyfills/text-decoder.ts: pure JS TextDecoder (from nx.js)
- polyfills/install-text-coding.ts: installs polyfills on globalThis
- serialization/vm-bundle-entry.ts: esbuild entry for VM serde bundle
- runtime/vm-serde-bundle.generated.ts: auto-generated bundle string
- scripts/build-vm-serde-bundle.js: build script (runs during pnpm build)

Removed: installSerdeHostFunctions (no longer needed; serde is in-VM)
…ecution

The snapshot metadata now stores eventsCursor (the pagination cursor from events.list()) instead of lastEventId (the raw event ID). The world-local pagination expects cursors in 'timestamp|id' format, not raw event IDs.

This fix enables the full workflow lifecycle:
1. First invocation: QuickJS VM evaluates workflow, suspends on step_0
2. Step handler executes add(10, 7) = 17
3. Second invocation: snapshot restored, step_0 resolved, suspends on step_1
4. Step handler executes add(17, 8) = 25
5. Third invocation: snapshot restored, both steps resolved, workflow completes
6. run_completed event created, snapshot cleaned up

Verified end-to-end with the nextjs-turbopack workbench:
- All events created correctly (run_created → run_completed)
- Step retries work (the add function throws on first attempt)
- Snapshots are saved/restored/deleted at correct lifecycle points
- Run status transitions to 'completed'
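A small sketch of why a raw event ID is not a usable cursor. Only the 'timestamp|id' shape comes from the commit message; the parser below, its names, and the choice to keep the timestamp as an opaque string are assumptions, and the real world-local pagination code may differ.

```typescript
// Sketch: a 'timestamp|id' cursor has two parts; a bare event ID has none.
interface EventCursor {
  timestamp: string; // kept opaque; the exact timestamp encoding is not stated
  id: string;
}

function formatCursor(cursor: EventCursor): string {
  return `${cursor.timestamp}|${cursor.id}`;
}

function parseCursor(raw: string): EventCursor | null {
  const separator = raw.indexOf("|");
  // Reject values with no separator, an empty timestamp, or an empty ID.
  if (separator <= 0 || separator === raw.length - 1) return null;
  return {
    timestamp: raw.slice(0, separator),
    id: raw.slice(separator + 1),
  };
}
```

Passing a value like a bare event ID to `parseCursor` yields null, which corresponds to the failure mode this commit fixes: the stored position could not be used to resume event listing.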
🧪 E2E Test Results
❌ Some tests failed

❌ Failed Tests
🌍 Community Worlds (56 failed)
- mongodb (3 failed)
- redis (2 failed)
- turso (51 failed)
📋 Other (7 failed)
- e2e-snapshot-runtime-vercel (7 failed)

Details by Category
✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
❌ 📋 Other
✅ Snapshot runtime tests (local) (non-blocking): success
- Extract workflow arguments from run_created event and pass to the workflow function via __wdk_deserialize()
- Call executePendingJobs() after each step_completed/step_failed/wait_completed event to allow async function await resumptions to unwind one step at a time
- Add debug logging for workflow result bytes

The addTenWorkflow e2e test is still failing: the workflow result bytes are 'devl-1' (devalue for undefined) even though all steps complete successfully. The issue appears to be that the async function return value is not propagating through the SWC-compiled workflow bundle's promise chain. This needs investigation; the unit tests with simple inline workflow code work correctly.
Check if the hook entity already exists before calling events.create for hook_created. This prevents concurrent stale-snapshot invocations from creating spurious hook_conflict events that pollute the event log and interfere with workflow progression.
Replace the runId-prefixed counter approach (wrun_xxx_step_0) with proper ULIDs (step_01ABCDEF...) matching the event-replay runtime format. The ulid package is bundled into the VM serde bundle via esbuild, using the seeded Math.random PRNG for deterministic generation. Also fixes snapshot-runtime unit tests: update makeRun() to match current WorkflowRun type, use dynamic correlationId capture instead of hardcoded step_0/step_1/wait_0 values, and fix eventData.output → eventData.result.
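To show how a seeded PRNG makes ULID-style step IDs deterministic, here is a hand-rolled sketch of the ULID layout (10 Crockford base32 characters of time, 16 of randomness). The PR bundles the ulid package rather than code like this; `makeStepId` and its parameters are hypothetical.

```typescript
// Sketch: ULID-shaped IDs built from an injected clock value and PRNG.
const CROCKFORD_BASE32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

function encodeTime(timestampMs: number, length = 10): string {
  let remaining = Math.floor(timestampMs);
  let out = "";
  for (let i = 0; i < length; i++) {
    const digit = remaining % 32;
    out = CROCKFORD_BASE32.charAt(digit) + out; // least significant digit last
    remaining = (remaining - digit) / 32;
  }
  return out;
}

function encodeRandom(random: () => number, length = 16): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    // Clamp in case random() ever returns exactly 1.
    const index = Math.min(31, Math.floor(random() * 32));
    out += CROCKFORD_BASE32.charAt(index);
  }
  return out;
}

// Both inputs are injected, so identical inputs give identical IDs.
function makeStepId(timestampMs: number, random: () => number): string {
  return `step_${encodeTime(timestampMs)}${encodeRandom(random)}`;
}
```

With a fixed clock and a seeded generator, replaying the same workflow produces the same sequence of correlation IDs, which is what lets events recorded in an earlier invocation match steps in a later one.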
Object.getPrototypeOf(null) throws TypeError. Add early null checks before instanceof/getPrototypeOf in stream reducers so null/undefined values are correctly rejected.
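A minimal sketch of the kind of guard described here. The function name and the prototype-walking approach are illustrative assumptions, not the actual stream reducer code.

```typescript
// Sketch: reject null/undefined and primitives before any prototype lookup,
// since Object.getPrototypeOf(null) throws a TypeError.
function isInstanceByPrototype(
  value: unknown,
  ctor: { prototype: object } | undefined,
): boolean {
  if (value === null || value === undefined) return false; // early null check
  if (typeof value !== "object" && typeof value !== "function") return false;
  if (!ctor) return false; // constructor may be missing in a stripped-down VM
  let proto: object | null = Object.getPrototypeOf(value);
  while (proto !== null) {
    if (proto === ctor.prototype) return true;
    proto = Object.getPrototypeOf(proto);
  }
  return false;
}
```

Returning false for null lets a reducer fall through to the next candidate instead of aborting the whole serialization with a TypeError.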
The minified bundle contains patterns like typeof x<"u" whose escaped quotes inside the string literal confuse downstream esbuild when it re-processes the compiled JS (e.g., Nitro prod builds). Encoding the bundle as base64 avoids all escaping issues — it's decoded at import time via Buffer.from() or atob().
Write the esbuild output as a standalone .js file and read it from disk at runtime with readFileSync. This avoids the escaping issues that arose when embedding minified JS (containing patterns like typeof x<"u") inside a JS string literal that gets re-processed by downstream esbuild (e.g., Nitro prod builds).
Revert the readFileSync approach which broke in CJS bundled contexts (world-testing embedded tests) where import.meta.url is undefined. Instead, use a template literal to embed the bundle string. Template literals avoid the escaping issues that broke Nitro builds — esbuild's minifier produces patterns like typeof x<"u" whose escaped quotes inside a JSON-stringified regular string literal confuse downstream esbuild, but template literals use backticks which don't conflict.
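Embedding arbitrary minified JS in a template literal still needs some escaping, just not of quotes. The sketch below shows one way a build script could do it; `toTemplateLiteral` is a hypothetical helper and is not taken from scripts/build-vm-serde-bundle.js.

```typescript
// Sketch: turn arbitrary source text into a template literal that evaluates
// back to exactly the same string.
function toTemplateLiteral(source: string): string {
  const escaped = source
    .replace(/\\/g, "\\\\") // backslashes first, so later escapes survive
    .replace(/`/g, "\\`") // backticks would otherwise end the literal
    .replace(/\$\{/g, "\\${") // "${" would otherwise start an interpolation
    .replace(/\r/g, "\\r"); // raw CR is normalized to LF inside templates
  return "`" + escaped + "`";
}

// Double and single quotes need no escaping at all, which is the point:
// patterns like typeof x<"u" pass through untouched.
const exampleLiteral = toTemplateLiteral('typeof x<"u"');
```

A generated module would then contain something like an exported constant whose initializer is the returned literal.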
…s, debug logging
- Fix VM serde using wrong symbol names for WritableStream/ReadableStream
(Symbol.for('STREAM_NAME') -> Symbol.for('WORKFLOW_STREAM_NAME')),
causing streams to serialize as '__empty' instead of the correct name
- Add WORKFLOW_GET_STREAM_ID implementation to VM bootstrap so
getWritable() works inside the QuickJS VM
- Include error stack traces in snapshot runtime failure logs
- Fix elapsed waits from prior invocations not firing on snapshot
restore — pending waits with hasCreatedEvent:true now get
wait_completed events created and immediate re-queue
- Add debug logging across the execution path (start, queue handler,
snapshot entrypoint) for visibility with DEBUG=workflow:*
Propagate WORKFLOW_RUNTIME into executionContext.workflowRuntime at start() so the server can select the snapshot runtime on a per-run basis. This allows the same Vercel deployment to serve both replay and snapshot runtime runs — the test runner opts in by setting the env var, and the queue handler reads it from the run entity. Add e2e-snapshot-runtime-vercel CI job that runs the full e2e suite against the nextjs-turbopack Vercel deployment with the snapshot runtime (non-blocking, like the local snapshot job).
- evalCode() now throws JSException directly instead of returning a result union; remove all unwrapResult() calls
- Replace isException checks with try/catch using JSException
- Update extractError() to extract error details from JSException.handle
- Update raw QuickJS proof-of-concept test for new API
The QuickJS VM only understands the 'devl' serialization format. When encryption is enabled (Vercel deployments), event payloads have the 'encr' prefix. Resolve the encryption key in the snapshot entrypoint and decrypt run inputs and step results on the host side before passing them into the VM.
- Use the real nanoid package (via host function) for webhook tokens instead of 'tok_' + ULID, matching the event-replay runtime
- Fix host callback ID mismatch on snapshot restore: quickjs-wasi starts nextCallbackId at 1 (not 0), so Math.random is ID 1 and __generateNanoid is ID 2. Previously registered as 0 and 1, causing Math.random to invoke nanoid and nanoid to return undefined.
- Re-register host callbacks (Math.random, __generateNanoid) after QuickJS.restore() so they survive snapshot/restore
- Add error stack trace logging to world-local queue handler
- Add debug logging for hook_created event creation
- Update quickjs-wasi to v2.0.0 which uses string-based host callback names instead of numeric IDs for registerHostCallback()
- Add native C extensions: encoding (TextEncoder/TextDecoder), base64 (btoa/atob), headers (Headers), url (URL/URLSearchParams), and structuredClone
- Delete packages/core/src/polyfills/ entirely: TextEncoder, TextDecoder, and Headers polyfills replaced by native extensions
- Remove polyfill injection from serde bundle build
- Replace manual base64url implementation with native btoa()
- Serde bundle size reduced from 22.3 KB to 18.0 KB
Compress VM snapshot data with gzip before writing to disk. The
metadata JSON includes a dataFile field with the binary filename
(e.g. '{runId}.bin.gz') so the correct compression format can be
determined on load. Backward compatible with existing uncompressed
.bin snapshots via fallback when dataFile is absent.
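The backward-compatible lookup can be sketched as follows. The metadata shape beyond dataFile, the helper names, and the extension-based check are assumptions for illustration; only the dataFile field and the '{runId}.bin.gz' / '.bin' filenames come from the commit message.

```typescript
// Sketch: pick the snapshot data file and decide whether to gunzip it.
interface StoredSnapshotMetadata {
  createdAt: string;
  dataFile?: string; // absent on snapshots written before compression
}

function resolveSnapshotFile(
  runId: string,
  metadata: StoredSnapshotMetadata,
): { fileName: string; gzip: boolean } {
  // Older snapshots have no dataFile field, so fall back to the plain .bin name.
  const fileName = metadata.dataFile ?? `${runId}.bin`;
  return { fileName, gzip: fileName.endsWith(".gz") };
}

// Optional sanity check: gzip streams start with the magic bytes 0x1f 0x8b.
function looksGzipped(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}
```

Recording the filename in the metadata, rather than probing the directory, keeps load() to a single metadata read followed by a single data read.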
…dles
- Use dynamic import with variable indirection for snapshot-entrypoint to prevent esbuild from pulling quickjs-wasi into the workflow bundle (which is CJS and breaks import.meta.url)
- Externalize quickjs-wasi from Nitro's server bundle (rollup/rolldown)
- Add quickjs-wasi to Next.js serverExternalPackages automatically
The relative path './runtime/snapshot-entrypoint.js' breaks when Turbopack chunks @workflow/core into a different output directory. Use '@workflow/core/runtime/snapshot-entrypoint' package specifier instead, which bundlers resolve correctly regardless of chunking.
…dFileSync

Instead of importing quickjs-wasi subpath modules (quickjs-wasi/base64, quickjs-wasi/encoding, etc.) which use import.meta.url internally and break when bundled to CJS, resolve the package directory via createRequire + require.resolve and read the .wasm and .so files directly with readFileSync. This is compatible with nft file tracing and avoids all bundler issues.
- Remove quickjs-wasi subpath imports entirely
- Construct ExtensionDescriptor objects manually with pre-read bytes
- Remove unused wasm option from SnapshotRuntimeOptions
- Add initFn for structured-clone extension (hyphen → underscore)
The dynamic import was unnecessary — the flow bundle runs in Node.js (not the QuickJS VM), so quickjs-wasi being in the bundle is correct. Remove the subpath export that was added for the dynamic import.
Use require.resolve directly when available (CJS bundles), falling back to createRequire(import.meta.url) for ESM contexts.
Revert to importing quickjs-wasi subpath modules directly (base64, encoding, headers, url, structured-clone) since they work correctly when externalized. Add quickjs-wasi and quickjs-wasi/* to the external list in both the final workflow bundle and steps bundle esbuild configs, so the extension modules stay as require() calls and their import.meta.url-based .so loading works at runtime.
Generate quickjs-assets.generated.ts at build time containing base64-encoded quickjs.wasm and extension .so files. Import the decoded buffers directly in snapshot-runtime.ts and pass to QuickJS.create()/restore(). This eliminates all runtime filesystem access, import.meta.url resolution, and require.resolve calls — it's just JavaScript importing JavaScript. Remove all quickjs-wasi externalizations from builders, nitro, and next since they're no longer needed.
Match the event-replay runtime's refactor (dcb0761) where builtin step functions use this instead of an explicit parameter. Assign useStep() proxies directly to Response/Request prototypes via Object.defineProperties so the this binding provides the instance, which gets serialized as thisVal by WORKFLOW_USE_STEP.
Summary
Implements the snapshot-based workflow runtime described in RFC #1298. Instead of replaying the full event log on every invocation, workflows run in a QuickJS WASM VM that is snapshotted at suspension points and restored on resumption.
Verified end-to-end with the nextjs-turbopack workbench: a multi-step workflow completes successfully using WORKFLOW_RUNTIME=snapshot.

How it works
quickjs-wasi)

Key components
- snapshot-runtime.ts
- snapshot-entrypoint.ts
- vm-serde-bundle.generated.ts
- workflow-vm.ts
- polyfills/text-encoder.ts
- polyfills/text-decoder.ts

What's included
- snapshots.save(), snapshots.load(), snapshots.delete()
- .bin + .json sidecar)
- WORKFLOW_RUNTIME=snapshot

What's NOT included (future work)
- createHook / webhook support in snapshot runtime

How to test
Depends on #1299 (serialization refactor) via the serialization-migration branch.