feat(run-ops): run-store routing seam + run-engine read seams#4116
Conversation
|
WalkthroughChangesThis PR introduces a RunStore/ControlPlaneResolver seam in RunEngine, separating run-scalar persistence (via Sequence Diagram(s)sequenceDiagram
participant RunEngine
participant WaitpointSystem
participant CrossSeamGuard
participant RunStore
RunEngine->>WaitpointSystem: completeWaitpoint(waitpointId)
WaitpointSystem->>CrossSeamGuard: consult(routeKind)
CrossSeamGuard-->>WaitpointSystem: store decision
WaitpointSystem->>RunStore: updateManyWaitpoints via resolved store
RunStore-->>WaitpointSystem: COMPLETED waitpoint
Related PRs: None identified. Suggested labels: area: run-engine, area: internal-packages Suggested reviewers: matt-aitken, ericallam Poem: 🐰 A rabbit hopped between two dens, 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
7e5633b to
d331802
Compare
88d1290 to
d5610a9
Compare
|
On the out-of-diff |
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
commit: |
5021523 to
857e7d6
Compare
460477f to
1af2bab
Compare
857e7d6 to
6062eac
Compare
f3248e0 to
fc39e05
Compare
0c9f1bb to
271eca8
Compare
fc39e05 to
49667ee
Compare
271eca8 to
eaa09cf
Compare
49667ee to
f2c2b2c
Compare
eaa09cf to
bf7ca7d
Compare
f2c2b2c to
4ba7198
Compare
bf7ca7d to
9612579
Compare
4ba7198 to
0d2f934
Compare
0d2f934 to
8b126c6
Compare
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uting tests The process-global mint flag was removed from friendlyId; drive NEW-residency ids via generateKsuidId() directly instead of the removed setKsuidMintEnabled(true).
…em test comments Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migration/drain is deferred; residency is decided purely by id-shape, so the live-migration marker/fence machinery is unused. Removes redirect_marker DB usage.
… describe current enqueue system Strip development-process scaffolding labels (Test A/B, Group A, GAP A/B, Step N) from run-engine test comments and describe/containerTest title strings, keeping the meaningful behavioral descriptions. Also reword the enqueueSystem store JSDoc to describe current behavior instead of a historical delta. Comments and title strings only; no test logic touched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… drop test enumeration labels Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… comment Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Aligns stripGroupARelations / #hydrateGroupARelations / *_GROUP_A with the already-renamed DedicatedRelationHydrator / DedicatedRelationSpec naming. Identifier-only, no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er, not its lagging replica RoutingRunStore.findRun/findRunOrThrow discarded the caller's client and always routed reads to the owning store's replica. Read-after-write callers (api.v1.sessions, api.v1.tasks.$taskId.trigger, sessions.end-and-continue) deliberately pass the control-plane WRITER to read back a just-committed run and beat replica lag. Routed to the lagging replica the read returned null → the run was "not found" → HTTP 500 (intermittent, hottest on the trigger read-back). Key on the passed client's IDENTITY instead of dropping it: a WRITER (exposes $transaction; a read replica does not) signals read-your-writes, so route to the OWNING store's own primary (writer) for BOTH residencies — the legacy writer for a cuid id, the NEW writer for a ksuid id — WITHOUT leaking a control-plane client into a NEW-DB query (each store reads its OWN writer). A replica / nothing keeps the default (owning store's replica), preserving replica offload for plain reads. Adds findRunOnPrimary / findRunOrThrowOnPrimary to PostgresRunStore + the RunStore interface (mirrors the existing findWaitpointOnPrimary primitive); no caller signature changes. Regression test (heteroRunOpsPostgresTest, never mocked) simulates replica lag with a recording proxy and proves read-after-write resolves via the writer for both LEGACY (cuid) and NEW (ksuid) residency, asserts the writer (not replica) was hit, and that plain reads still route to the replica. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… control-plane class at the store write boundary The store can be backed by either the control-plane @trigger.dev/database client or the @internal/run-ops-database client. These are separately generated clients, each with its own copy of the Prisma runtime, so each has its own PrismaClientKnownRequestError class object (identical code, distinct module identity). A P2002 raised by the run-ops client is therefore not `instanceof` the control-plane class, so the webapp's uniform `error instanceof Prisma.PrismaClientKnownRequestError` P2002->422 conversion is skipped and a raw 500 escapes (e.g. idempotent batch re-create returned 500 not 422). PostgresRunStore now wraps its writer/replica clients so any foreign known-request-error is re-thrown as the control-plane Prisma.PrismaClientKnownRequestError (preserving code/message/meta/clientVersion), including inside runInTransaction. This immunizes the whole class of routed-write error catches regardless of which generated client backs the store. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…esenter The batch-results presenter read the batch row via `this._prisma.batchTaskRun.findFirst`, which only hits the control-plane DB. Under the run-ops split a batch (and its items) can live in the NEW run-ops DB, so a bare-constructed presenter (as the results route builds it) missed a NEW-resident batch and returned 404. Route the batch-row lookup through `runStore.findBatchTaskRunByFriendlyId`, whose router probes NEW then LEGACY (dropping the passed client hint under the split, while the non-split single-store path keeps reading the injected client). Mirrors ApiRunResultPresenter's use of `runStore.findRun`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ency Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…v resolution is unavailable Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nv is null The review fix (c9237f5) took env resolution off the expireRun path: org/ project/environment identity now comes from the run's latest execution snapshot, so the run is fully expired (message acked, waitpoint completed to unblock any parent, runExpired emitted) even when the control-plane resolver returns null. The old test still encoded the pre-fix contract (no event, no ack) and asserted expiredEvents.length === 0. Update it to assert the graceful-degradation contract the fix intends. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e-DB The passthrough resolver runs single-DB, so a run and its environment share one database and there is no cross-seam FK to replace (matches main, which dropped the TaskRun env FK). Skip the always-on hot-path findFirst; the split-aware resolver still enforces env existence when the split makes a DB-level FK impossible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9effd74 to
d46aab2
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6d641654-57d5-483a-9738-2b47363d764c
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (72)
.server-changes/run-engine-store-routing-seam.mdapps/webapp/app/presenters/v3/ApiBatchResultsPresenter.server.tsapps/webapp/test/presenters/ApiBatchResultsPresenter.split.test.tsapps/webapp/test/updateMetadataStoreRoutingHetero.test.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/errors.tsinternal-packages/run-engine/src/engine/index.tsinternal-packages/run-engine/src/engine/systems/batchSystem.test.tsinternal-packages/run-engine/src/engine/systems/batchSystem.tsinternal-packages/run-engine/src/engine/systems/checkpointSystem.tsinternal-packages/run-engine/src/engine/systems/debounceSystem.test.tsinternal-packages/run-engine/src/engine/systems/debounceSystem.tsinternal-packages/run-engine/src/engine/systems/delayedRunSystem.test.tsinternal-packages/run-engine/src/engine/systems/delayedRunSystem.tsinternal-packages/run-engine/src/engine/systems/dequeueSystem.tsinternal-packages/run-engine/src/engine/systems/enqueueSystem.test.tsinternal-packages/run-engine/src/engine/systems/enqueueSystem.tsinternal-packages/run-engine/src/engine/systems/executionSnapshotSystem.test.tsinternal-packages/run-engine/src/engine/systems/executionSnapshotSystem.tsinternal-packages/run-engine/src/engine/systems/pendingVersionSystem.test.tsinternal-packages/run-engine/src/engine/systems/pendingVersionSystem.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.test.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.tsinternal-packages/run-engine/src/engine/systems/systems.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/systems/waitpointSystem.test.tsinternal-packages/run-engine/src/engine/systems/waitpointSystem.tsinternal-packages/run-engine/src/engine/tests/blockEdgeResidency.test.tsinternal-packages/run-engine/src/engine/tests/checkpointSystem.controlPlaneResolver.test.tsinternal-packages/run-engine/src/engine/tests/checkpointSystemStore.test.tsinternal-packages/run-engine/src/engine/tests/completeWaitpointCrossSeamGuard.test.tsinternal-packages/run-engine/src/engine/tests/completeWaitpointReadResidency.test.tsinternal-packages/run-engine/src/engine/tests/controlPlaneResolverInjectability.test.tsinternal-packages/run-engine/src/engine/tests/datetimeWaitpointColocation.test.tsinternal-packages/run-engine/src/engine/tests/debounce.test.tsinternal-packages/run-engine/src/engine/tests/delayedRunSystem.controlPlaneResolver.test.tsinternal-packages/run-engine/src/engine/tests/dequeueSystem.controlPlaneResolver.test.tsinternal-packages/run-engine/src/engine/tests/dequeueSystem.recovery.controlPlaneResolver.test.tsinternal-packages/run-engine/src/engine/tests/engineResidualInversions.controlPlaneResolver.test.tsinternal-packages/run-engine/src/engine/tests/lifecycleRouter.test.tsinternal-packages/run-engine/src/engine/tests/runAttemptSystem.controlPlaneResolver.test.tsinternal-packages/run-engine/src/engine/tests/runStoreInjectability.test.tsinternal-packages/run-engine/src/engine/tests/triggerCreateRouting.test.tsinternal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/tests/waitpointPublicRouter.test.tsinternal-packages/run-engine/src/engine/tests/waitpointSystem.controlPlaneResolver.test.tsinternal-packages/run-engine/src/engine/types.tsinternal-packages/run-engine/src/index.tsinternal-packages/run-store/package.jsoninternal-packages/run-store/src/NoopRunStore.tsinternal-packages/run-store/src/PostgresRunStore.batchProbeReadClient.test.tsinternal-packages/run-store/src/PostgresRunStore.crossGenerationError.test.tsinternal-packages/run-store/src/PostgresRunStore.dedicatedRepro.test.tsinternal-packages/run-store/src/PostgresRunStore.dedicatedSelect.test.tsinternal-packages/run-store/src/PostgresRunStore.dualSchema.test.tsinternal-packages/run-store/src/PostgresRunStore.test.tsinternal-packages/run-store/src/PostgresRunStore.tsinternal-packages/run-store/src/PostgresRunStore.writeAtomicity.test.tsinternal-packages/run-store/src/batchCompletionResidency.test.tsinternal-packages/run-store/src/clientCompat.test.tsinternal-packages/run-store/src/index.tsinternal-packages/run-store/src/runOpsStore.flipWindowDuplicate.test.tsinternal-packages/run-store/src/runOpsStore.forWaitpointCompletion.test.tsinternal-packages/run-store/src/runOpsStore.idempotencyDedup.test.tsinternal-packages/run-store/src/runOpsStore.mixedResidency.test.tsinternal-packages/run-store/src/runOpsStore.readAfterWrite.test.tsinternal-packages/run-store/src/runOpsStore.snapshots.test.tsinternal-packages/run-store/src/runOpsStore.test.tsinternal-packages/run-store/src/runOpsStore.tsinternal-packages/run-store/src/runOpsStore.waitpoints.test.tsinternal-packages/run-store/src/types.ts
💤 Files with no reviewable changes (1)
- internal-packages/run-store/src/NoopRunStore.ts
✅ Files skipped from review due to trivial changes (1)
- .server-changes/run-engine-store-routing-seam.md
🚧 Files skipped from review as they are similar to previous changes (51)
- internal-packages/run-engine/src/index.ts
- internal-packages/run-store/package.json
- internal-packages/run-engine/src/engine/tests/checkpointSystem.controlPlaneResolver.test.ts
- apps/webapp/app/presenters/v3/ApiBatchResultsPresenter.server.ts
- internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
- internal-packages/run-engine/src/engine/systems/systems.ts
- internal-packages/run-engine/src/engine/errors.ts
- internal-packages/run-engine/src/engine/tests/controlPlaneResolverInjectability.test.ts
- internal-packages/run-engine/src/engine/tests/runStoreInjectability.test.ts
- internal-packages/run-store/src/PostgresRunStore.test.ts
- apps/webapp/test/presenters/ApiBatchResultsPresenter.split.test.ts
- internal-packages/run-store/src/PostgresRunStore.batchProbeReadClient.test.ts
- internal-packages/run-engine/src/engine/tests/waitpointSystem.controlPlaneResolver.test.ts
- internal-packages/run-store/src/PostgresRunStore.dedicatedSelect.test.ts
- internal-packages/run-engine/src/engine/systems/debounceSystem.ts
- internal-packages/run-engine/src/engine/tests/delayedRunSystem.controlPlaneResolver.test.ts
- internal-packages/run-engine/src/engine/tests/dequeueSystem.recovery.controlPlaneResolver.test.ts
- internal-packages/run-engine/src/engine/systems/delayedRunSystem.ts
- internal-packages/run-engine/src/engine/systems/enqueueSystem.test.ts
- internal-packages/run-engine/src/engine/types.ts
- internal-packages/run-engine/src/engine/tests/completeWaitpointReadResidency.test.ts
- internal-packages/run-engine/src/engine/tests/dequeueSystem.controlPlaneResolver.test.ts
- internal-packages/run-engine/src/engine/tests/completeWaitpointCrossSeamGuard.test.ts
- internal-packages/run-engine/src/engine/tests/waitpointPublicRouter.test.ts
- internal-packages/run-engine/src/engine/systems/checkpointSystem.ts
- internal-packages/run-engine/src/engine/systems/delayedRunSystem.test.ts
- internal-packages/run-engine/src/engine/systems/runAttemptSystem.test.ts
- internal-packages/run-engine/src/engine/systems/batchSystem.test.ts
- internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
- internal-packages/run-engine/src/engine/systems/debounceSystem.test.ts
- internal-packages/run-store/src/PostgresRunStore.dualSchema.test.ts
- internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.test.ts
- internal-packages/run-engine/src/engine/tests/debounce.test.ts
- internal-packages/run-store/src/PostgresRunStore.crossGenerationError.test.ts
- apps/webapp/test/updateMetadataStoreRoutingHetero.test.ts
- internal-packages/run-engine/src/engine/systems/batchSystem.ts
- internal-packages/run-engine/src/engine/tests/datetimeWaitpointColocation.test.ts
- internal-packages/run-engine/src/engine/tests/engineResidualInversions.controlPlaneResolver.test.ts
- internal-packages/run-engine/src/engine/systems/ttlSystem.test.ts
- internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts
- internal-packages/run-engine/src/engine/tests/triggerCreateRouting.test.ts
- internal-packages/run-engine/src/engine/tests/runAttemptSystem.controlPlaneResolver.test.ts
- internal-packages/run-engine/src/engine/tests/lifecycleRouter.test.ts
- internal-packages/run-engine/src/engine/systems/waitpointSystem.test.ts
- internal-packages/run-engine/src/engine/tests/checkpointSystemStore.test.ts
- internal-packages/run-store/src/PostgresRunStore.dedicatedRepro.test.ts
- internal-packages/run-engine/src/engine/systems/pendingVersionSystem.test.ts
- internal-packages/run-engine/src/engine/tests/blockEdgeResidency.test.ts
- internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
- internal-packages/run-engine/src/engine/index.ts
- internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
📜 Review details
⏰ Context from checks skipped due to timeout. (28)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 10)
- GitHub Check: internal / 🧪 Unit Tests: Internal (12, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (3, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (1, 12)
- GitHub Check: packages / 🧪 Unit Tests: Packages (3, 3)
- GitHub Check: internal / 🧪 Unit Tests: Internal (9, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (11, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (6, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (5, 12)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (10, 10)
- GitHub Check: internal / 🧪 Unit Tests: Internal (7, 12)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (9, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 10)
- GitHub Check: internal / 🧪 Unit Tests: Internal (2, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (10, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (4, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (8, 12)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 10)
- GitHub Check: e2e / 🧪 CLI v3 tests (blacksmith-4vcpu-windows-2025 - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (blacksmith-4vcpu-windows-2025 - pnpm)
- GitHub Check: packages / 🧪 Unit Tests: Packages (1, 3)
- GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
- GitHub Check: typecheck / typecheck
⚠️ CI failures not shown inline (4)
GitHub Actions: 📝 CLAUDE.md Audit / 0_audit.txt: feat(run-ops): run-store routing seam + run-engine read seams
Conclusion: failure
alt@3.0.0-beta.46 -> `@trigger.dev/yalt`@3.0.0-beta.46
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.47 -> `@trigger.dev/yalt`@3.0.0-beta.47
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.48 -> `@trigger.dev/yalt`@3.0.0-beta.48
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.49 -> `@trigger.dev/yalt`@3.0.0-beta.49
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.5 -> `@trigger.dev/yalt`@3.0.0-beta.5
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.50 -> `@trigger.dev/yalt`@3.0.0-beta.50
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.51 -> `@trigger.dev/yalt`@3.0.0-beta.51
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.52 -> `@trigger.dev/yalt`@3.0.0-beta.52
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.53 -> `@trigger.dev/yalt`@3.0.0-beta.53
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.55 -> `@trigger.dev/yalt`@3.0.0-beta.55
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.6 -> `@trigger.dev/yalt`@3.0.0-beta.6
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.7 -> `@trigger.dev/yalt`@3.0.0-beta.7
* [new tag] build-alert-hotfix.rc1 -> build-alert-hotfix.rc1
* [new tag] build-alert-hotfix.rc2 -> build-alert-hotfix.rc2
* [new tag] build-arm-builds-rc.1 -> build-arm-builds-rc.1
* [new tag] build-arm-builds-rc.2 -> build-arm-builds-rc.2
* [new tag] build-arm-builds-rc.3 -> build-arm-builds-rc.3
* [new tag] build-batchid-carryover-rc.0 -> build-batchid-carryover-rc.0
* [new tag] build-batching-rc.1 -> build-batching-rc.1
* [new tag] build-batching-rc.2 -> build-batching-rc.2
* [new tag] build-billing-0.0.1 -> build-billing-0.0.1
* [new tag] build-billing-0.0.2 -> build-billing-0.0.2
* [new tag] build-billing-0.0.3 -> build-billing-0.0.3
* [new tag] build-buildinfo-rc.0 -> b...
GitHub Actions: 📝 CLAUDE.md Audit / audit: feat(run-ops): run-store routing seam + run-engine read seams
Conclusion: failure
##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
with:
anthropic_***REDACTED***
use_sticky_comment: true
allowed_bots: devin-ai-integration[bot]
claude_args: --max-turns 25
--model claude-opus-4-8
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
prompt: You are reviewing a PR to check whether any CLAUDE.md files or .claude/rules/ files need updating.
## Your task
1. Run `git diff origin/main...HEAD --name-only` to see which files changed in this PR.
2. For each changed directory, check if there's a CLAUDE.md in that directory or a parent directory.
3. Determine if any CLAUDE.md or .claude/rules/ file should be updated based on the changes. Consider:
- New files/directories that aren't covered by existing documentation
- Changed architecture or patterns that contradict current CLAUDE.md guidance
- New dependencies, services, or infrastructure that Claude should know about
- Renamed or moved files that are referenced in CLAUDE.md
- Changes to build commands, test patterns, or development workflows
## Response format
If NO updates are needed, respond with exactly:
✅ CLAUDE.md files look current for this PR.
If updates ARE needed, respond with a short list:
📝 **CLAUDE.md updates suggested:**
- `path/to/CLAUDE.md`: [what should be added/changed]
- `.claude/rules/file.md`: [what should be added/changed]
Keep suggestions specific and brief. Only flag things that would actually mislead Claude in future sessions.
Do NOT suggest updates for trivial changes (bug fixes, small refactors within existing patterns).
Do NOT suggest creating new CLAUDE.md files - only updates to existing ones.
trigger_phrase: `@claude`
label_trigger: claude
branch_prefix: claude/
use_bedrock: false
use_vertex: false
use_foundry: false
classify_inline_comments: true
use_commit_signing: false
bot_id: 41898282
bot_name: claude[bot]
track_progress: false
include_fix_links: true
display_report: false...
GitHub Actions: 🔎 REVIEW.md Drift Audit / 0_audit.txt: feat(run-ops): run-store routing seam + run-engine read seams
Conclusion: failure
ild-legacy-run-engine.fix3
* [new tag] build-manual-checkpoints.rc1 -> build-manual-checkpoints.rc1
* [new tag] build-metadata-upgrade-logging.rc1 -> build-metadata-upgrade-logging.rc1
* [new tag] build-metadata-upgrade-logging.rc2 -> build-metadata-upgrade-logging.rc2
* [new tag] build-metadata-upgrade-logging.rc3 -> build-metadata-upgrade-logging.rc3
* [new tag] build-new-build-system.rc.1 -> build-new-build-system.rc.1
* [new tag] build-otel-upgrade-rc.0 -> build-otel-upgrade-rc.0
* [new tag] build-otel-upgrade-rc.1 -> build-otel-upgrade-rc.1
* [new tag] build-pre-pull-deployments-rc.1 -> build-pre-pull-deployments-rc.1
* [new tag] build-prod-rescue-rc.1 -> build-prod-rescue-rc.1
* [new tag] build-rate-limiter-fix-rc.1 -> build-rate-limiter-fix-rc.1
* [new tag] build-re2.rc0 -> build-re2.rc0
* [new tag] build-realtime-v2-stream-fix -> build-realtime-v2-stream-fix
* [new tag] build-realtime-v2-stream-fix-2 -> build-realtime-v2-stream-fix-2
* [new tag] build-realtime-v2-stream-fix-3 -> build-realtime-v2-stream-fix-3
* [new tag] build-realtime-v2-stream-fix-4 -> build-realtime-v2-stream-fix-4
* [new tag] build-realtime-v2-stream-fix-5 -> build-realtime-v2-stream-fix-5
* [new tag] build-realtimestreams-dedupe -> build-realtimestreams-dedupe
* [new tag] build-registry-maintenance-rc.1 -> build-registry-maintenance-rc.1
* [new tag] build-registry-maintenance-rc.2 -> build-registry-maintenance-rc.2
* [new tag] build-remote-ecr-rc.0 -> build-remote-ecr-rc.0
* [new tag] build-reschedule-hotfix.rc1 -> build-reschedule-hotfix.rc1
* [new tag] build-resume-fixes.rc1 -> build-resume-fixes.rc1
* [new tag] build-resume-fix...
GitHub Actions: 🔎 REVIEW.md Drift Audit / audit: feat(run-ops): run-store routing seam + run-engine read seams
Conclusion: failure
##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
with:
anthropic_***REDACTED***
use_sticky_comment: true
allowed_bots: devin-ai-integration[bot]
claude_args: --max-turns 30
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
prompt: You are auditing this PR for drift against `.claude/REVIEW.md`.
## Context
`.claude/REVIEW.md` is the repo's source of truth for what AI / agent code reviewers should treat as critical findings (rolling-deploy safety, hot-table indexes, recovery-path queries, testcontainers usage, Lua versioning, etc.). It is consumed by review agents to calibrate severity. If REVIEW.md goes stale, every future agent review degrades.
## Strategy — read this first
You have a hard turn budget. Spend it on signal, not coverage. The audit is allowed to miss things; it is NOT allowed to time out.
1. Read `.claude/REVIEW.md` once, in full.
2. Run `git diff origin/main...HEAD --name-only` to get the list of changed files. Do NOT read the diff content yet.
3. Scan the file-list for relevance to REVIEW.md scope. Relevance signals: changes to Prisma schema, Redis / queue / Lua code, hot tables, recovery / restart loops, new packages, deletions of paths REVIEW.md cites. Skim everything else.
4. Open at most **5 files** total — only the ones most likely to surface a real signal. If nothing in the file-list looks relevant to any REVIEW.md rule, do NOT read any files; go straight to the verdict.
5. Form a verdict and stop. Do not exhaust the turn budget exploring.
Large PRs (>50 files changed) are a strong signal to be MORE selective, not more thorough. Pick 3-5 files at most.
## What to look for
- **Stale references** — does any REVIEW.md rule cite a file, directory, function, table, Prisma model, or package name that has been removed or renamed in this PR (or is already gone from `main`)?
- **Contradictions** — does code in this PR clearly violate a current REVIEW.md rule? (Don't re-review the PR. Only flag if REVIE...
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
**/*.{test,spec}.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use vitest for all tests in the Trigger.dev repository
Files:
internal-packages/run-engine/src/engine/tests/ttl.test.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
internal-packages/run-engine/src/engine/tests/**/*.test.ts
📄 CodeRabbit inference engine (internal-packages/run-engine/CLAUDE.md)
Implement tests for RunEngine in
src/engine/tests/using testcontainers for Redis and PostgreSQL containerization
Files:
internal-packages/run-engine/src/engine/tests/ttl.test.ts
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}: Usepnpm run typecheckfor changes in apps (apps/*) and internal packages (internal-packages/*), and never usebuildto verify those changes.
Use Vitest for tests, and never mock anything; use testcontainers instead.
Prefer static imports over dynamicimport(), and only use dynamic imports for unresolved circular dependencies, genuine code-splitting needs, or conditional runtime loading.
Files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
**/*.{test,spec}.{ts,tsx,js,jsx,mts,cts,mjs,cjs}
📄 CodeRabbit inference engine (CLAUDE.md)
Place test files next to the source files they cover (for example,
MyService.ts->MyService.test.ts).
Files:
internal-packages/run-engine/src/engine/tests/ttl.test.ts
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs,md,mdx}
📄 CodeRabbit inference engine (CLAUDE.md)
Always import from
@trigger.dev/sdkwhen writing Trigger.dev tasks; never use@trigger.dev/sdk/v3or deprecatedclient.defineJob.
Files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
**/*.{test,spec}.{js,jsx,ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{test,spec}.{js,jsx,ts,tsx}: Keep test files beside the files under test and write tests with descriptivedescribeanditblocks.
Avoid mocks or stubs in tests, and use helpers from@internal/testcontainerswhen Redis or Postgres are needed.
Files:
internal-packages/run-engine/src/engine/tests/ttl.test.ts
internal-packages/run-engine/src/engine/systems/**/*.ts
📄 CodeRabbit inference engine (internal-packages/run-engine/CLAUDE.md)
Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability
Files:
internal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
🧠 Learnings (11)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
📚 Learning: 2026-06-13T19:53:13.759Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3937
File: packages/trigger-sdk/skills/realtime-and-frontend/SKILL.md:258-260
Timestamp: 2026-06-13T19:53:13.759Z
Learning: When reviewing code that uses `trigger.dev/react-hooks`’s `useRealtimeRun`, preserve the call signature where the first argument is the full realtime handle object (not `handle.id`). This is intentional to maintain type-safety and is consistent with the official docs; do not suggest changing the first argument from the handle object to `handle.id`.
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
📚 Learning: 2026-06-17T17:13:49.929Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3948
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.bulk-actions.$bulkActionParam/route.tsx:48-62
Timestamp: 2026-06-17T17:13:49.929Z
Learning: In triggerdotdev/trigger.dev, within `dashboardLoader`/`dashboardAction` (or similar context resolver code) whenever you resolve an organization ID from an organization slug for RBAC/enterprise authorization scope, always read from the primary Prisma client (`prisma`), not `$replica`. Using `$replica` can hit replica-lag and cause the RBAC lookup/authorization to run without the correct org scope (bypassing intended role enforcement). Implement the slug→org lookup with `prisma.organization.findFirst(...)` (or equivalent primary-client query) and add an inline comment documenting why the primary client is required (replica lag could lead to unscoped RBAC checks).
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
📚 Learning: 2026-06-23T13:04:21.413Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4023
File: apps/webapp/app/services/upsertBranch.server.ts:14-18
Timestamp: 2026-06-23T13:04:21.413Z
Learning: In TypeScript, it’s valid to `import { type X }` and then use `typeof X` in a type-only position, e.g. `type Alias = z.infer<typeof X>`. The `type` modifier suppresses the runtime import, but the type checker still has the full exported type so `z.infer<typeof X>` can resolve correctly. In code reviews, don’t flag this as a TypeScript compile error as long as `typeof X` is used in a type context (e.g., with `z.infer`, `type` aliases, generics), not as a runtime value.
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
📚 Learning: 2026-05-18T14:40:02.173Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3658
File: packages/core/src/v3/realtimeStreams/manager.test.ts:1-147
Timestamp: 2026-05-18T14:40:02.173Z
Learning: In the triggerdotdev/trigger.dev repo, the policy “Never mock anything — use testcontainers instead” should only be enforced for integration tests that interact with real external services (e.g., Redis, Postgres) via actual infrastructure. For unit tests that exercise pure in-memory logic (e.g., cache semantics) it is OK to stub collaborators such as `ApiClient` using Vitest (`vi.fn()`) to assert call counts or control behavior. Do not flag `vi.fn()`-based `ApiClient` stubs in unit tests as violations of the testcontainers policy.
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.ts
📚 Learning: 2026-06-04T18:16:35.386Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3836
File: apps/supervisor/src/backpressure/backpressureMonitor.ts:3-5
Timestamp: 2026-06-04T18:16:35.386Z
Learning: When reviewing TypeScript in this repo, apply the rule “prefer type aliases over interfaces” only to data/object shapes and union/intersection type modeling. If an interface is being used as a behavioral contract for collaborators to implement (e.g., method-shape interfaces that define required behavior, such as `BackpressureLogger` / `BackpressureSignalSource` in `apps/supervisor/src/backpressure/backpressureMonitor.ts`), keep it as an `interface` and do not flag it as a type-alias-vs-interface violation.
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
📚 Learning: 2026-06-09T17:58:04.699Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3879
File: apps/webapp/app/models/vercelIntegration.server.ts:619-630
Timestamp: 2026-06-09T17:58:04.699Z
Learning: In this codebase, outbound raw `fetch` calls should typically rely on Node/undici’s default request timeout (about ~300s) rather than adding a per-call `AbortController` + `setTimeout` wrapper inside individual functions (e.g. in files like `apps/webapp/app/models/vercelIntegration.server.ts`). During code review, do not flag the absence of a per-call timeout on a single `fetch` as an issue; if per-call timeouts are needed, they should be implemented via a codebase-wide convention (e.g., a shared fetch wrapper or documented pattern) rather than ad-hoc per-function changes.
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.tsinternal-packages/run-engine/src/engine/systems/ttlSystem.tsinternal-packages/run-engine/src/engine/controlPlaneResolver.tsinternal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
📚 Learning: 2026-06-16T09:19:47.637Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3960
File: apps/webapp/test/prismaInfrastructureErrorCapture.test.ts:0-0
Timestamp: 2026-06-16T09:19:47.637Z
Learning: In this repo’s Vitest setup, `vitest.config.ts` uses `globals: true`, so identifiers like `vi`, `describe`, `it`, and `expect` are available as globals in Vitest test files. During code review, do not flag missing `vi`/`describe`/`it`/`expect` imports as a runtime error or correctness issue when they’re used in `*.test.ts/tsx` or `*.spec.ts/tsx` files. Explicit imports are still preferred for consistency, but they’re not required for runtime behavior.
Applied to files:
internal-packages/run-engine/src/engine/tests/ttl.test.ts
🔇 Additional comments (6)
internal-packages/run-engine/src/engine/controlPlaneResolver.ts (2)
34-186: LGTM!
188-341: 🩺 Stability & AvailabilityDrop the env-scoping concern.
BackgroundWorker.idis globally unique, so theid-only worker lookups don’t cross environments; the prior dequeue flow used the same pattern. TheorderByon the unique-id lookup is redundant.> Likely an incorrect or invalid review comment.internal-packages/run-engine/src/engine/systems/ttlSystem.ts (1)
28-28: LGTM!Also applies to: 84-84, 104-106, 119-121
internal-packages/run-engine/src/engine/tests/ttl.test.ts (1)
8-11: LGTM!Also applies to: 1614-1733
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (2)
208-226: LGTM!Also applies to: 237-240, 272-272, 285-289, 317-317, 527-537, 549-552, 604-613, 835-835, 1077-1095, 1442-1454, 1638-1638, 1652-1707, 1222-1222
393-494: 🩺 Stability & AvailabilityTransaction wiring looks correct; verify
runInTransactionsemantics for the in-txworker.ack.The atomic bump + snapshot via
runStore.runInTransactionwrapped intryCatchis sound. Note theworker.ack(\expireRun:...`)(Line 478) runs inside the DB transaction callback; being a Redis side-effect it won't roll back if the transaction later fails. This mirrors prior behavior so it isn't a regression, but confirm the inner writes (store.startAttempt,createExecutionSnapshot(..., store)) are bound to the tx-scopedstore/tx` and not the router, as the comment asserts.Also, the error log at Line 486 still reads
"prisma.$transaction error"though this now routes throughrunInTransaction.
| const env = await this.$.controlPlaneResolver.resolveEnv(run.runtimeEnvironmentId); | ||
|
|
||
| if (!env) { | ||
| throw new ServiceValidationError("Task run environment not found", 404); | ||
| } | ||
|
|
||
| const newSnapshot = await getLatestExecutionSnapshot(prisma, runId, this.$.runStore); | ||
|
|
||
| await this.$.runQueue.acknowledgeMessage(env.organizationId, runId); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect resolver contract: can resolveEnv/resolveAuthenticatedEnv legitimately return null for a live run's env?
fd controlPlaneResolver.ts internal-packages/run-engine/src/engine --exec cat -nRepository: triggerdotdev/trigger.dev
Length of output: 13091
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the surrounding control-flow in runAttemptSystem and all env resolver call sites.
file="internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts"
echo "== call sites =="
rg -n "resolveEnv\\(|resolveAuthenticatedEnv\\(" "$file"
echo
echo "== line 470-560 =="
sed -n '470,560p' "$file"
echo
echo "== line 760-820 =="
sed -n '760,820p' "$file"
echo
echo "== line 1000-1065 =="
sed -n '1000,1065p' "$file"
echo
echo "== line 1410-1475 =="
sed -n '1410,1475p' "$file"
echo
echo "== line 1620-1685 =="
sed -n '1620,1685p' "$file"Repository: triggerdotdev/trigger.dev
Length of output: 13069
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect runQueue acknowledgement and side-effect helpers referenced by the terminal paths.
root="internal-packages/run-engine/src/engine"
rg -n "acknowledgeMessage\\(|runSucceeded\\(|`#finalizeRun`\\(|completeAttemptSuccess\\(|waitpoint|event emission|emit" "$root" -g '*.ts'
echo
echo "== queue implementation matches =="
fd -a "runQueue" "$root" -t f
echo
echo "== likely queue files =="
rg -n "acknowledgeMessage\\(" "$root" -g '*.ts'Repository: triggerdotdev/trigger.dev
Length of output: 50382
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for any explicit guarantees/comments around run/env presence and transaction ordering.
file="internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts"
rg -n "finished-but-unacknowledged|unacknowledged|Task run environment not found|assertEnvExists|resolveAuthenticatedEnv|completeAttemptSuccess|completeAttemptFailure|finalizeRun|acknowledgeMessage" "$file"
echo
echo "== repo-wide mentions of the error string =="
rg -n "Task run environment not found" .Repository: triggerdotdev/trigger.dev
Length of output: 3295
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the control-plane resolver tests for null-env scenarios and the terminal-path behavior.
testfile="internal-packages/run-engine/src/engine/tests/runAttemptSystem.controlPlaneResolver.test.ts"
systemfile="internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts"
echo "== controlPlaneResolver test snippets =="
sed -n '330,390p' "$testfile"
echo
echo "== completeAttemptSuccess terminal path =="
sed -n '730,860p' "$systemfile"
echo
echo "== cancel/fail terminal path details =="
sed -n '1430,1735p' "$systemfile"Repository: triggerdotdev/trigger.dev
Length of output: 16415
Resolve env before the terminal write. resolveEnv/resolveAuthenticatedEnv now runs after the run is already marked finished in these paths. If it returns null, the 404 skips queue acking, waitpoint completion, event emission, and #finalizeRun, which can leave a finished run unacknowledged. Move the lookup earlier where possible, or keep the cleanup running even when env lookup fails.
…ing (#4117) ## What Wires the run-ops split into the webapp: database topology, environment flags, split-mode gating, and the control-plane resolver/cache layer that the run-store and run-engine seams from the previous PR plug into. - **DB topology & env** (`apps/webapp/app/db.server.ts`, `env.server.ts`, `entry.server.tsx`): adds the run-ops database clients/topology and the environment variables that configure and gate the split. - **runOpsMigration module** (new `apps/webapp/app/v3/runOpsMigration/`): the webapp-side machinery — `splitMode.server.ts`, `controlPlaneResolver.server.ts` + `controlPlaneCache.server.ts`, `readThrough.server.ts`, `crossSeamGuard.server.ts`, `distinctDbSentinel.server.ts`, id-minting helpers (`mintBatchFriendlyId`, `runOpsMintKind`, `resolveInheritedMintKind`), `runOpsCascadeCleanup.server.ts`, the split read gate, and route/unblock catalogs. - **Store/engine wiring** (`app/v3/runStore.server.ts`, `runEngine.server.ts`, `runEngineHandlers.server.ts` + new `runEngineHandlersShared.server.ts`): points the webapp's store/engine construction at the resolver, and factors shared handler logic out so both seams use one path. - **Read-path touch-ups**: `runtimeEnvironment.server.ts`, `eventRepository/index.server.ts`, `taskRunHeartbeatFailed.server.ts`, `engineVersion.server.ts` route their run/environment lookups read-through the resolver. - `413a94511` — interlocks split mode against the native realtime backend so the two aren't enabled in an incompatible combination (see `.server-changes/run-ops-split-realtime-interlock.md`). - `dc74c57fd` — drops the earlier "known-migrated" read layer; residency is determined by id-shape only. ## Why PR5 of the run-ops split stack. This is the webapp foundation layer: it stands up the DB topology, flags, and resolver/cache the rest of the stack depends on, and repoints webapp read paths through the resolver. Additive when the split is not enabled (existing single-DB behavior preserved behind flags); behavior-changing on the read-through paths and the realtime interlock. ## Tests New vitest coverage across `apps/webapp/test/` and colocated `*.server.test.ts` files: db topology, split mode, split read gate, cross-seam guard, mint cutover / flip latency, control-plane cache, control-plane resolver, distinct-db sentinel, read-through loaders (route loaders, run-detail loaders, `findEnvironmentFromRun`), and the run-engine handlers. Testcontainers-backed; no mocks. `pnpm-lock.yaml` synced for the two new webapp deps. ## Notes Draft, **stacked on #4116** (`runops/pr04-store-engine`). Review that first; this diff is against it. Server-change / changeset note to be added at stack-assembly time. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Introduces the run-store routing seam and the run-engine read seams that let run lifecycle operations be dispatched to either the control-plane database or a separately-generated run-ops database, depending on where a run/batch resides.
internal-packages/run-store): addsrunOpsStore.tsand substantially expandsPostgresRunStore.tsso the store can resolve residency and route reads/writes to the correct backing client.types.tsgrows the routing/residency types;NoopRunStore.tsis removed.internal-packages/run-engine): addsengine/controlPlaneResolver.tsand routes the per-system read paths (dequeue, enqueue, waitpoint, checkpoint, run-attempt, ttl, delayed-run, execution-snapshot, pending-version, debounce, batch) through the resolver/store instead of talking to a single Prisma client directly.engine/errors.ts,engine/types.ts, andengine/index.tsare extended to support injecting the store/resolver.Three fixes are included on top of the seam work:
c6cadd85f— routes read-your-writes to the owning store's writer, not its lagging replica, so an operation immediately reading back what it just wrote sees a consistent result.05c912e05— normalizes run-ops-generation Prisma errors to the control-plane error class at the store write boundary, soinstanceofchecks and theP2002→ 422 handling continue to work across the separately-generated run-ops Prisma client.88d12907f— resolves NEW-resident batches inApiBatchResultsPresenterby routing the batch read through the store, so a dedicated-DB batch resolves instead of returning 404.The change is heavily test-first: the bulk of the diff is new unit/integration coverage for the store routing, residency, and each run-engine system's control-plane resolver path.
Why
PR4 of the run-ops split stack (PR1–PR3 land the ClickHouse test-container and earlier plumbing). This PR is the read-path foundation: it adds the seam and read-routing but leaves the write path to route through the same seam in a later PR. Behavior-changing where the three fixes above touch existing read-your-writes / error-normalization / batch-resolution paths; otherwise additive (new store module, new resolver, injectable dependencies with existing single-client behavior preserved when no dedicated store is configured).
Tests
Extensive new vitest coverage under
run-store/src/*.test.ts(routing, residency, dual-schema select, cross-generation error normalization, read-after-write, idempotency dedup, mixed residency, waitpoint co-location) andrun-engine/src/engine/**/*.test.ts(per-systemcontrolPlaneResolvertests, injectability, block-edge residency, waitpoint read residency, trigger-create routing, lifecycle router). Testcontainers-backed; no mocks.Notes
Draft, stacked on #4114 (
runops/pr03-clickhouse-tc). Review that first; this diff is against it.Server-change / changeset note to be added at stack-assembly time.
🤖 Generated with Claude Code