Skip to content

fix(session): destroy reaches warm replicas via tombstone + sandbox-gone guard (F7)#155

Merged
Hazzng merged 3 commits into
mainfrom
fix/f7-destroy-tombstone
Jun 20, 2026
Merged

fix(session): destroy reaches warm replicas via tombstone + sandbox-gone guard (F7)#155
Hazzng merged 3 commits into
mainfrom
fix/f7-destroy-tombstone

Conversation

@Hazzng

@Hazzng Hazzng commented Jun 14, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes F7: destroying a sandbox on one replica did not reach warm sessions on other replicas, which served ghost state.

  • Written session: the warm replica reloaded a deleted tree into an empty pathCache, surfacing as a non-zero exit + garbage stderr inside an HTTP 200 exec (worse than a 404).
  • Never-written session: deleteVersionKey used redis.del, so the absent counter read as 0 and matched the session's lastSeenVersion === 0 (0 === 0) → no reload ever fired → ghost reads/writes forever.

What changed

Primary (Redis-independent)src/sql-fs/sql-fs.ts, src/sql-fs/errors.ts

  • New createEsandboxgone (code: "ESANDBOXGONE").
  • SqlFs.reload() now detects a zero-row loadAllPaths — a live sandbox's recursive-CTE always returns at least its root dir — and throws ESANDBOXGONE without clearing the existing caches, instead of installing an empty pathCache. The guard lives in reload() (not ready(), which runs right after createSandbox and is guaranteed a root row).
  • ensureFreshCache catches ESANDBOXGONE, tears the warm session down (mark closing, drop from the map, disconnect the per-session Postgres pool), and rethrows a clean ENOENT → 404.

Secondary (tombstone)src/api/session-manager.ts, src/sql-fs/redis-path-snapshot.ts

  • New VERSION_TOMBSTONE = "DESTROYED" (distinct from -1, the publish-failure marker; non-numeric).
  • deleteVersionKey now writes redis.set(key, "DESTROYED", "EX", VERSION_KEY_TTL_SECONDS) instead of del.
  • ensureFreshCache recognizes the sentinel before the Number(raw) || 0 parse → teardown + ENOENT. Covers the never-written variant (lastSeenVersion 0).
  • Both ensureFreshCache catch sites preserve a teardown ENOENT instead of remapping it to ESESSIONCLOSING.

Audited every Number(raw) || 0 clamp site so the sentinel is never silently coerced to 0:

  • ensureFreshCache GETEX (recognized before parse).
  • session-creation initialVersion (sentinel → clears the tombstone and starts at 0, since buildFs.createSandbox re-creates the sandbox; prevents a later INCR DESTROYED).
  • SqlFs.#loadFreshPathCache snapshot version check (sentinel → impossible -1 so it never falsely matches a version: 0 snapshot).

Tests

Unit (981 passed, 0 failures)

  • src/sql-fs/tests/unit/sql-fs.reload-sandbox-gone.test.ts
    • throws ESANDBOXGONE when loadAllPaths returns zero rows on reload (prior cache left intact)
    • reload succeeds normally when the sandbox still has rows
  • src/api/tests/unit/session-manager.f7-destroy-tombstone.test.ts
    • (a) reload's ESANDBOXGONE becomes a teardown + clean ENOENT
    • (b) DESTROYED tombstone is recognised before the numeric parse → teardown + ENOENT
    • (c) never-written variant (lastSeenVersion 0, tombstone) still tears down — not the 0===0 silent ghost
    • re-creating a tombstoned sandbox clears the sentinel and starts at version 0
    • guard that errors.ts emits the ESANDBOXGONE code
  • Updated session-manager.version-counter.test.ts: the two destroy tests now assert the DESTROYED tombstone instead of an absent key; FakeRedis set extended for the EX form.

Gate: pnpm typecheck && pnpm lint:fix && pnpm test:unit all pass.

Live (real Neon + Redis, server on :8132)

1. CREATE  → 201 {"id":"c3febdc8-…","name":"vf7-f7-test"}
2. EXEC    echo hello-f7 > /work.txt → HTTP 200, stdout "hello-f7\n/home/user\n", exit 0
   redis: vfs:default:ver:<id> = "1"
3. DELETE  → HTTP 204
   redis: vfs:default:ver:<id> = "DESTROYED"   (NOT nil)
4. Re-EXEC cat /work.txt → HTTP 404 {"code":"ENOENT"}
   Re-READ files API     → HTTP 404

Warm-replica tombstone path (sandbox row still alive, only the version key tombstoned —
isolates the SECONDARY defense):
  create vf7-warm → exec write (ver=1) → set ver=DESTROYED directly in Redis
  → re-exec on the still-warm session → HTTP 404 {"code":"ENOENT","error":"… '/<id>'"}

Both vf7 sandboxes torn down; server stopped.

Reviewer manual steps

  1. Two-replica check (not reproducible single-replica): warm a session on replica B, destroy the sandbox on replica A, then exec on B — expect a clean 404, with B's session dropped and its PG pool disconnected (look for the session_torn_down_sandbox_gone log line).
  2. Confirm the version key holds DESTROYED (self-expiring at the 7-day version TTL) after a destroy, not nil.

Closes #136

…a gone sandbox (#136)

Destroying a sandbox on one replica left warm sessions on other replicas
serving ghost state: a written session reloaded a deleted tree into an empty
pathCache (non-zero exit + garbage stderr inside an HTTP 200 exec), and a
never-written session never reloaded because the DEL'd version key read as 0
and matched its lastSeenVersion of 0.

Primary (Redis-independent): SqlFs.reload() detects a zero-row loadAllPaths
and throws a typed ESANDBOXGONE instead of installing an empty pathCache. The
session manager tears the warm session down and surfaces a clean ENOENT -> 404.

Secondary (tombstone): destroy writes a distinct DESTROYED sentinel to the
version key (with the version-key TTL) instead of deleting it. ensureFreshCache
recognizes the sentinel before the numeric parse and tears down, covering the
never-written variant. Re-creating a tombstoned sandbox clears the sentinel and
starts cleanly at version 0.
@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: e3867a93-156b-434a-8b43-c3566e3de44a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/f7-destroy-tombstone

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Hazzng

Hazzng commented Jun 20, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 000f2de941

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/api/session-manager.ts
Hazzng added 2 commits June 20, 2026 16:25
When destroySandboxFn throws for an in-memory session, the sandbox still
exists in Postgres but deleteVersionKey was writing the DESTROYED sentinel
unconditionally — causing warm sessions on other replicas to tear down
with false 404s until the next rehydrate cleared the tombstone.
@Hazzng Hazzng merged commit 87a08d6 into main Jun 20, 2026
6 checks passed
@Hazzng Hazzng deleted the fix/f7-destroy-tombstone branch June 20, 2026 07:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

F7: Destroy must reach warm replicas — tombstone + sandbox-gone reload guard

1 participant