fix(session): destroy reaches warm replicas via tombstone + sandbox-gone guard (F7)#155
Conversation
…a gone sandbox (#136) Destroying a sandbox on one replica left warm sessions on other replicas serving ghost state: a written session reloaded a deleted tree into an empty pathCache (non-zero exit + garbage stderr inside an HTTP 200 exec), and a never-written session never reloaded because the DEL'd version key read as 0 and matched its lastSeenVersion of 0. Primary (Redis-independent): SqlFs.reload() detects a zero-row loadAllPaths and throws a typed ESANDBOXGONE instead of installing an empty pathCache. The session manager tears the warm session down and surfaces a clean ENOENT -> 404. Secondary (tombstone): destroy writes a distinct DESTROYED sentinel to the version key (with the version-key TTL) instead of deleting it. ensureFreshCache recognizes the sentinel before the numeric parse and tears down, covering the never-written variant. Re-creating a tombstoned sandbox clears the sentinel and starts cleanly at version 0.
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 000f2de941
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
When destroySandboxFn throws for an in-memory session, the sandbox still exists in Postgres but deleteVersionKey was writing the DESTROYED sentinel unconditionally — causing warm sessions on other replicas to tear down with false 404s until the next rehydrate cleared the tombstone.
# Conflicts: # src/sql-fs/sql-fs.ts
Summary
Fixes F7: destroying a sandbox on one replica did not reach warm sessions on other replicas, which served ghost state.
deleteVersionKeyusedredis.del, so the absent counter read as0and matched the session'slastSeenVersion === 0(0 === 0) → no reload ever fired → ghost reads/writes forever.What changed
Primary (Redis-independent) —
src/sql-fs/sql-fs.ts,src/sql-fs/errors.tscreateEsandboxgone(code: "ESANDBOXGONE").SqlFs.reload()now detects a zero-rowloadAllPaths— a live sandbox's recursive-CTE always returns at least its root dir — and throwsESANDBOXGONEwithout clearing the existing caches, instead of installing an empty pathCache. The guard lives inreload()(notready(), which runs right aftercreateSandboxand is guaranteed a root row).ensureFreshCachecatchesESANDBOXGONE, tears the warm session down (mark closing, drop from the map, disconnect the per-session Postgres pool), and rethrows a cleanENOENT→ 404.Secondary (tombstone) —
src/api/session-manager.ts,src/sql-fs/redis-path-snapshot.tsVERSION_TOMBSTONE = "DESTROYED"(distinct from-1, the publish-failure marker; non-numeric).deleteVersionKeynow writesredis.set(key, "DESTROYED", "EX", VERSION_KEY_TTL_SECONDS)instead ofdel.ensureFreshCacherecognizes the sentinel before theNumber(raw) || 0parse → teardown + ENOENT. Covers the never-written variant (lastSeenVersion 0).ensureFreshCachecatch sites preserve a teardownENOENTinstead of remapping it toESESSIONCLOSING.Audited every
Number(raw) || 0clamp site so the sentinel is never silently coerced to 0:ensureFreshCacheGETEX (recognized before parse).initialVersion(sentinel → clears the tombstone and starts at 0, sincebuildFs.createSandboxre-creates the sandbox; prevents a laterINCR DESTROYED).SqlFs.#loadFreshPathCachesnapshot version check (sentinel → impossible-1so it never falsely matches aversion: 0snapshot).Tests
Unit (981 passed, 0 failures)
src/sql-fs/tests/unit/sql-fs.reload-sandbox-gone.test.tsthrows ESANDBOXGONE when loadAllPaths returns zero rows on reload(prior cache left intact)reload succeeds normally when the sandbox still has rowssrc/api/tests/unit/session-manager.f7-destroy-tombstone.test.ts(a) reload's ESANDBOXGONE becomes a teardown + clean ENOENT(b) DESTROYED tombstone is recognised before the numeric parse → teardown + ENOENT(c) never-written variant (lastSeenVersion 0, tombstone) still tears down — not the 0===0 silent ghostre-creating a tombstoned sandbox clears the sentinel and starts at version 0errors.tsemits theESANDBOXGONEcodesession-manager.version-counter.test.ts: the two destroy tests now assert theDESTROYEDtombstone instead of an absent key; FakeRedissetextended for theEXform.Gate:
pnpm typecheck && pnpm lint:fix && pnpm test:unitall pass.Live (real Neon + Redis, server on :8132)
Both vf7 sandboxes torn down; server stopped.
Reviewer manual steps
session_torn_down_sandbox_gonelog line).DESTROYED(self-expiring at the 7-day version TTL) after a destroy, not nil.Closes #136