feat(run-ops): webapp db topology, flags, and split-mode resolver wiring#4117
feat(run-ops): webapp db topology, flags, and split-mode resolver wiring#4117d-cs wants to merge 14 commits into
Conversation
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (10)
apps/webapp/test/v3/runOpsMigration/runEngineControlPlaneResolver.server.test.ts (1)
121-200: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winMissing test coverage for
resolveAuthenticatedEnv.This suite covers
resolveEnv,resolveWorkerVersion, andassertEnvExists, but notresolveAuthenticatedEnv— the one method in the adapter with custom (uncached) query logic. Adding a test asserting it resolves the authenticated shape (includinggit) and returnsnullfor a missing env would close the gap and would likely have caught the cache-bypass behavior flagged inrunEngineControlPlaneResolver.server.ts.apps/webapp/app/v3/taskRunHeartbeatFailed.server.ts (1)
56-64: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low valueOptional: defer
lockedWorkerresolution to where it's used.
envis needed for the early-return guard on every path, butlockedWorkeris only consumed in the completed-status branch (line 158). Resolving it here adds a resolver call (a DB read under split-OFF) forPENDING/DEQUEUED/EXECUTING/etc. that never use it. Moving theresolveRunLockedWorkercall into that branch avoids the unnecessary lookup.apps/webapp/app/v3/runOpsMigration/runOpsCascadeCleanup.server.ts (2)
118-275: 🩺 Stability & Availability | 🔵 TrivialNo logging/audit trail for a destructive cross-DB cleanup.
RunOpsCascadeCleanupServicedeletes rows across multiple writers with nologgercalls anywhere (compare to other.server.tsfiles in this codebase that log around risky operations). Given deletes are non-transactional and rely on "recovered by re-running" on crash, structured logging of the per-table counts returned bycleanupEnvironment/cleanupProject(or at least on entry/exit) would materially help operators diagnose partial-cleanup states and stray rows after a crash.
1-275: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winNo colocated test coverage for this destructive service.
This file isn't accompanied by a test in this cohort. Given it performs unconditional
deleteManycascades across control-plane and run-ops writers (and is deliberately not gated behindisSplitEnabled()), unit/integration coverage (e.g., verifying ordering doesn't throw, per-table counts, dedup-by-reference in single-DB mode) seems warranted before this ships. Want me to draft a Testcontainers-based test using the existing@internal/testcontainershelpers?apps/webapp/app/v3/runOpsMigration/runOpsMintKind.flipLatency.test.ts (1)
13-25: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueTest reimplements the cache wiring instead of exercising
resolveRunIdMintKinddirectly.
makeCachedFlagduplicatesresolveRunIdMintKind's internalmintCache.get/setlogic rather than importing and calling the real function. If the production caching wiring (key derivation, TTL source, override-forwarding) changes, this suite would keep passing against its own copy without catching the drift. Given the comment explicitly frames this as a documented "current-behavior lock," this is acceptable as-is, but consider testingresolveRunIdMintKinditself (with env vars and$replicamocked) if higher-fidelity coverage is later desired.apps/webapp/app/v3/runOpsMigration/types.ts (1)
11-25: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueUse
typeinstead ofinterfacefor these data shapes.
CrossSeamGuardInputandCrossSeamGuardDecisionare plain data shapes, not behavioral contracts implemented by collaborators, so per project convention they should be type aliases.♻️ Suggested fix
-export interface CrossSeamGuardInput { +export type CrossSeamGuardInput = { waitpointId: string; routeKind: UnblockRouteKind; treeOwnerResidency?: RunOpsResidency; isCrossTreeIdempotency?: boolean; hasLegacyParent?: boolean; -} +}; -export interface CrossSeamGuardDecision { +export type CrossSeamGuardDecision = { store: StoreTarget; /** Always the waitpoint's OWN classification, even when pinned to legacy. */ residency: RunOpsResidency; routeKind: UnblockRouteKind; pinnedReason?: "non-tree-owned" | "cross-tree-idempotency" | "legacy-parent-descendant"; -} +};As per coding guidelines: "Use types over interfaces for TypeScript".
Source: Coding guidelines
apps/webapp/app/v3/runOpsMigration/unblockRouteCatalog.ts (1)
10-17: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueUse
typeinstead ofinterfaceforUnblockRoute.
UnblockRouteis a plain data shape (route metadata record), not a behavioral contract.♻️ Suggested fix
-export interface UnblockRoute { +export type UnblockRoute = { id: string; kind: UnblockRouteKind; /** The relative source path, e.g. "internal-packages/run-engine/src/engine/index.ts". */ site: string; /** Enclosing method/symbol name — NEVER a line number. */ symbol: string; -} +};As per coding guidelines: "Use types over interfaces for TypeScript".
Source: Coding guidelines
apps/webapp/app/v3/runOpsMigration/crossSeamGuard.server.ts (1)
11-17: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
KNOWN_ROUTE_KINDSduplicates theUnblockRouteKindunion.This set must be manually kept in sync with
UnblockRouteKindintypes.ts; adding a new kind there without updating this set silently makesassertKnownRouteKindreject a valid kind (or vice versa if forgotten here). Consider deriving this from the type via asatisfies-checked array, or centralizing the literal list intypes.tsand building both the union and the set from it.apps/webapp/app/v3/runOpsMigration/readThrough.server.test.ts (1)
41-153: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd coverage for the unclassifiable-id → LEGACY fallback.
No test exercises the
UnclassifiableRunIdcatch branch inreadThrough.server.ts(defaults to"LEGACY"+logger.warn), which is a materially different behavior fromcrossSeamGuard.server.ts's loud rethrow for the same error type. A regression here would go undetected.✅ Suggested additional test
heteroPostgresTest( "ambiguous-length id falls back to LEGACY residency (new-then-legacy probe)", async ({ prisma14, prisma17 }) => { const AMBIGUOUS_RUN_ID = "run_" + "a".repeat(10); // neither cuid nor ksuid length const warn = vi.fn(); const result = await readThroughRun({ runId: AMBIGUOUS_RUN_ID, environmentId: "env_1", readNew: (c) => realRead(c, false), readLegacy: (c) => realRead(c, true), deps: { splitEnabled: true, newClient: prisma17 as unknown as PrismaReplicaClient, legacyReplica: prisma14 as unknown as PrismaReplicaClient, logger: { warn }, }, }); expect(result.source).toBe("legacy-replica"); expect(warn).toHaveBeenCalled(); } );apps/webapp/app/v3/runEngineHandlers.server.ts (1)
211-211: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse a named sentinel constant instead of scattered
""literals.
readRunForEventOrThrow/readRunForEventare called with a bare""placeholder forenvironmentIdin three places (runAttemptFailed, and twice in cachedRunCompleted). As per coding guidelines: "Use named constants for sentinel/placeholder values (for example,const UNSET_VALUE = "__unset__") instead of scattering raw string literals across comparisons."♻️ Suggested fix
// near eventReadDeps definition const NO_ENVIRONMENT_ID = ""; // residency is keyed on runId; environmentId is informational onlyThen replace the three
""call-site literals withNO_ENVIRONMENT_ID.Also applies to: 299-299, 330-330
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 548c81c5-37b0-432b-b866-589a9d484ed6
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (51)
.server-changes/run-ops-split-realtime-interlock.mdapps/webapp/CLAUDE.mdapps/webapp/app/db.server.tsapps/webapp/app/entry.server.tsxapps/webapp/app/env.server.tsapps/webapp/app/models/runtimeEnvironment.server.tsapps/webapp/app/v3/engineVersion.server.tsapps/webapp/app/v3/eventRepository/index.server.tsapps/webapp/app/v3/featureFlags.tsapps/webapp/app/v3/runEngine.server.tsapps/webapp/app/v3/runEngineHandlers.server.tsapps/webapp/app/v3/runEngineHandlersShared.server.tsapps/webapp/app/v3/runOpsMigration/controlPlaneCache.server.test.tsapps/webapp/app/v3/runOpsMigration/controlPlaneCache.server.tsapps/webapp/app/v3/runOpsMigration/controlPlaneResolver.server.tsapps/webapp/app/v3/runOpsMigration/crossSeamGuard.server.tsapps/webapp/app/v3/runOpsMigration/distinctDbSentinel.server.tsapps/webapp/app/v3/runOpsMigration/mintBatchFriendlyId.server.test.tsapps/webapp/app/v3/runOpsMigration/mintBatchFriendlyId.server.tsapps/webapp/app/v3/runOpsMigration/readThrough.server.test.tsapps/webapp/app/v3/runOpsMigration/readThrough.server.tsapps/webapp/app/v3/runOpsMigration/resolveInheritedMintKind.server.test.tsapps/webapp/app/v3/runOpsMigration/resolveInheritedMintKind.server.tsapps/webapp/app/v3/runOpsMigration/runEngineControlPlaneResolver.server.tsapps/webapp/app/v3/runOpsMigration/runOpsCascadeCleanup.server.tsapps/webapp/app/v3/runOpsMigration/runOpsMintKind.flipLatency.test.tsapps/webapp/app/v3/runOpsMigration/runOpsMintKind.server.test.tsapps/webapp/app/v3/runOpsMigration/runOpsMintKind.server.tsapps/webapp/app/v3/runOpsMigration/runOpsSplitReadGate.tsapps/webapp/app/v3/runOpsMigration/splitMode.server.tsapps/webapp/app/v3/runOpsMigration/types.tsapps/webapp/app/v3/runOpsMigration/unblockRouteCatalog.tsapps/webapp/app/v3/runStore.server.test.tsapps/webapp/app/v3/runStore.server.tsapps/webapp/app/v3/taskRunHeartbeatFailed.server.tsapps/webapp/package.jsonapps/webapp/test/findEnvironmentFromRun.readthrough.test.tsapps/webapp/test/routeLoaders.controlPlane.readthrough.test.tsapps/webapp/test/runDetailLoaders.controlPlane.readthrough.test.tsapps/webapp/test/runEngineHandlers.test.tsapps/webapp/test/runOpsCrossSeamGuard.test.tsapps/webapp/test/runOpsDbTopology.test.tsapps/webapp/test/runOpsMintCutover.test.tsapps/webapp/test/runOpsSplitMode.test.tsapps/webapp/test/runOpsSplitReadGate.test.tsapps/webapp/test/services.controlPlane.readthrough.test.tsapps/webapp/test/v3/runOpsMigration/controlPlaneRepoint.server.test.tsapps/webapp/test/v3/runOpsMigration/controlPlaneResolver.server.test.tsapps/webapp/test/v3/runOpsMigration/distinctDbSentinel.server.test.tsapps/webapp/test/v3/runOpsMigration/runEngineControlPlaneResolver.server.test.tsapps/webapp/vitest.config.ts
88d1290 to
d5610a9
Compare
413a945 to
99643f8
Compare
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
commit: |
460477f to
1af2bab
Compare
cdc4eb9 to
e0b35d5
Compare
f3248e0 to
fc39e05
Compare
8024e36 to
f9b9b0b
Compare
fc39e05 to
49667ee
Compare
f9b9b0b to
0937b15
Compare
49667ee to
f2c2b2c
Compare
0937b15 to
729daf1
Compare
f2c2b2c to
4ba7198
Compare
729daf1 to
bd6fc79
Compare
4ba7198 to
0d2f934
Compare
bd6fc79 to
a7e0846
Compare
…ngine wiring Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ape only Migration/drain is deferred, so residency is decided purely by id-shape (ownerEngine): 25-char cuid -> LEGACY, 27-char ksuid -> NEW, unclassifiable -> LEGACY. This is behavior-preserving in production, which never injected a custom isKnownMigrated and, with no migration, always saw the default false. - delete knownMigratedFilter.server.ts + its test - readThrough: drop the isKnownMigrated dep + migrated short-circuit; KEEP the unclassifiable->LEGACY new-then-legacy fallback - resolveInheritedMintKind: collapse to pure ownerEngine id-shape (no deps) - mintBatchFriendlyId: drop isKnownMigrated/isSplitEnabled from ResolveDeps - runEngineHandlersShared: drop isKnownMigrated from EventReadDeps/readRunForEvent (batch-write residency probe via newReplica.batchTaskRun.findFirst is untouched) - tests: delete injected-marker cases, keep pure id-shape assertions Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eration labels Add a pure unit test for ControlPlaneCache covering per-slot round-trips, null-vs-miss distinction, epoch-based invalidation, per-slot key isolation, bounded eviction, and TTL expiry. Add a testcontainer test for probeDistinctDatabases covering distinct clusters, same physical database (with reason), same-cluster-different-database, and fail-closed probe failure. Strip developer-enumeration labels from three existing test files (readThrough step numbers, runEngineHandlers Test-X comments) and rename the run-detail loader read-through test to drop the non-domain "shape 1" name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… deps apps/webapp/package.json declares @internal/run-ops-database (workspace) and @testcontainers/postgresql but the lockfile importer entry was never regenerated, so pnpm install --frozen-lockfile fails for the webapp. Regenerate the importer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Enabling RUN_OPS_SPLIT_ENABLED without REALTIME_BACKEND_NATIVE_ENABLED silently breaks realtime: Electric replicates only from the control-plane DB, so NEW-resident (ksuid) runs on the dedicated run-ops DB are invisible and every realtime subscription hangs. Add a boot-time interlock that refuses split mode in that misconfiguration, mirroring the existing distinct-DB data-loss sentinel. The check is a pure predicate (assertSplitRealtimeInterlock) run synchronously inside assertRunOpsSplitSentinel on the same eager-boot path, failing fast before the async DB probe and before any run-ops routing is wired. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n diagnostics - gate runOpsTopology splitEnabled on RUN_OPS_SPLIT_ENABLED so provisioning both DSNs before flipping the flag cannot open a second pool or route writes ahead of the distinct-DB sentinel - rethrow the original UnclassifiableRunId in the cross-seam guard so its value/valueLength keep reflecting the real waitpoint id - log run-found-but-environment-unresolved distinctly from missing-run - correct the RUN_OPS_DATABASE_URL doc comment (Prisma datasource, not the webapp runtime pool) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…by worker-mock fix)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uth-env through the cache-first resolver The ControlPlaneCache served env/org data with no invalidation, so admin/control-plane writes were only reflected after the TTL. Add two invalidation scopes to the cache (invalidateEnvironment for one env's slots; invalidateOrganization via a per-org epoch that env/authEnv values are stamped with, so all of an org's cached rows drop with no reverse index), expose them on the resolver, and call them at every write site that mutates cache-served data: pause/resume, archive, env/org concurrency + burst-factor, API-key regeneration, feature flags, API/batch rate limits, runs enable/disable, org + project delete, and stream-basin provisioning. Also extend the resolver's authenticated-env slot to carry `git` and make the run-engine adapter's resolveAuthenticatedEnv delegate to the cache-first, split-aware resolver instead of issuing its own $replica.findFirst, so it honors splitEnabled() and the cache like its siblings while still returning `git` and the deleted-project guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0d2f934 to
8b126c6
Compare
a7e0846 to
4119616
Compare
… OFF With the split OFF there is a single DB, so a run and its environment are co-located and there is no cross-seam FK/check to replace (matches main). Skip the always-on hot-path read in that branch; the split-ON branch is unchanged (cache-first, throws on a genuinely missing env). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Wires the run-ops split into the webapp: database topology, environment flags, split-mode gating, and the control-plane resolver/cache layer that the run-store and run-engine seams from the previous PR plug into.
apps/webapp/app/db.server.ts,env.server.ts,entry.server.tsx): adds the run-ops database clients/topology and the environment variables that configure and gate the split.apps/webapp/app/v3/runOpsMigration/): the webapp-side machinery —splitMode.server.ts,controlPlaneResolver.server.ts+controlPlaneCache.server.ts,readThrough.server.ts,crossSeamGuard.server.ts,distinctDbSentinel.server.ts, id-minting helpers (mintBatchFriendlyId,runOpsMintKind,resolveInheritedMintKind),runOpsCascadeCleanup.server.ts, the split read gate, and route/unblock catalogs.app/v3/runStore.server.ts,runEngine.server.ts,runEngineHandlers.server.ts+ newrunEngineHandlersShared.server.ts): points the webapp's store/engine construction at the resolver, and factors shared handler logic out so both seams use one path.runtimeEnvironment.server.ts,eventRepository/index.server.ts,taskRunHeartbeatFailed.server.ts,engineVersion.server.tsroute their run/environment lookups read-through the resolver.413a94511— interlocks split mode against the native realtime backend so the two aren't enabled in an incompatible combination (see.server-changes/run-ops-split-realtime-interlock.md).dc74c57fd— drops the earlier "known-migrated" read layer; residency is determined by id-shape only.Why
PR5 of the run-ops split stack. This is the webapp foundation layer: it stands up the DB topology, flags, and resolver/cache the rest of the stack depends on, and repoints webapp read paths through the resolver. Additive when the split is not enabled (existing single-DB behavior preserved behind flags); behavior-changing on the read-through paths and the realtime interlock.
Tests
New vitest coverage across
apps/webapp/test/and colocated*.server.test.tsfiles: db topology, split mode, split read gate, cross-seam guard, mint cutover / flip latency, control-plane cache, control-plane resolver, distinct-db sentinel, read-through loaders (route loaders, run-detail loaders,findEnvironmentFromRun), and the run-engine handlers. Testcontainers-backed; no mocks.pnpm-lock.yamlsynced for the two new webapp deps.Notes
Draft, stacked on #4116 (
runops/pr04-store-engine). Review that first; this diff is against it.Server-change / changeset note to be added at stack-assembly time.
🤖 Generated with Claude Code