Skip to content

Pass app-lifetime context to controller, orchestrator, repomanager#115

Merged
sbalabanov merged 1 commit into
mainfrom
fix-cache-goroutine-ctx
Jun 18, 2026
Merged

Pass app-lifetime context to controller, orchestrator, repomanager#115
sbalabanov merged 1 commit into
mainfrom
fix-cache-goroutine-ctx

Conversation

@sbalabanov

@sbalabanov sbalabanov commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Streaming handlers in the controller only watched the gRPC stream context, so an in-flight request kept running until the client disconnected even after the server began shutting down. This PR adds an app-lifetime context so handlers — and the async cache-write goroutine — can also be cancelled at server teardown.

Changes

  1. All three top-level constructors now accept an app-lifetime context as their first argument:

    • controller.NewController(appCtx, Params)
    • orchestrator.NewNativeOrchestrator(appCtx, Params)
    • repomanager.NewRepoManager(appCtx, Params)
  2. controller.linkRequestCtx(reqCtx) helper returns a context cancelled by either the request's stream context (client disconnect) or the controller's appCtx (server shutdown), backed by context.AfterFunc so the watcher handle is released by the deferred cancel — appCtx itself is never cancelled by this helper. Each streaming handler (GetChangedTargets, GetTargetGraph) now derives its working context from this helper and threads it through all downstream calls.

  3. Goroutines launched from handlers inherit the linked context. The per-job graph-fetch workers in GetChangedTargets are derived via context.WithCancel(ctx), so they abort on either client disconnect or app shutdown. The async cache-write goroutine instead passes c.appCtx directly to storage — the write must survive client disconnect (so detaching from the request context is the whole point) but must NOT outlive the server, and c.appCtx satisfies both with no extra plumbing. Per-operation deadlines are the storage backend's responsibility.

  4. `example/main.go` wires `appCtx` from `signal.NotifyContext(ctx, SIGINT, SIGTERM)` and propagates it to all three constructors.

Test plan

  • `go build ./...`
  • `go test ./controller/... ./orchestrator/... ./core/repomanager/...`
  • New tests cover both cancellation paths (request ctx, app ctx) and the AfterFunc-release semantics, including an end-to-end test that drives `GetChangedTargets` and asserts the cache-write goroutine's context is cancelled by `appCtx` only, not by request cancellation.
  • CI green

@sbalabanov sbalabanov requested review from a team as code owners June 12, 2026 19:07
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@sbalabanov sbalabanov force-pushed the fix-cache-goroutine-ctx branch 2 times, most recently from e0fc957 to 641a958 Compare June 12, 2026 19:41
@sbalabanov sbalabanov marked this pull request as draft June 12, 2026 19:41
@sbalabanov sbalabanov changed the title [controller] Use shutdown context for async cache write Pass app-lifetime context to controller, orchestrator, repomanager Jun 12, 2026
@sbalabanov sbalabanov force-pushed the fix-cache-goroutine-ctx branch 3 times, most recently from 4d8e23a to 1866682 Compare June 12, 2026 22:14
@sbalabanov sbalabanov marked this pull request as ready for review June 12, 2026 22:20
@sbalabanov sbalabanov force-pushed the fix-cache-goroutine-ctx branch from 1866682 to f7d9283 Compare June 15, 2026 20:30
@sbalabanov sbalabanov force-pushed the fix-cache-goroutine-ctx branch 2 times, most recently from 6bd1655 to 12fbcd7 Compare June 16, 2026 18:28
// is cancelled on shutdown. Per-operation deadlines are the storage
// backend's responsibility — the controller is backend-agnostic and
// must not encode any one implementation's I/O budget.
treehash1 := readTreehash(c.appCtx, c.storage, logger, request.GetFirstRevision())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the context here should be detached from existing one, so that when c.appctx is cancelled, the upload still continues on best effort

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

App context is cancelled when application shuts down as it is signalled by the host management platform. SIGKILL is very likely to follow soon after. How should "best effort" look like in this case?

@xytan0056 xytan0056 Jun 18, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking best effort of uploading results to survive both client disconnect and app shutdown -- nothing cancels it, upload whatever it can.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are few severe anti-patterns in this decision as I see of.
https://dave.cheney.net/2016/12/22/never-start-a-goroutine-without-knowing-how-it-will-stop

The async cache-write goroutine in GetChangedTargets used
context.Background(), so it had no way to honor server shutdown — a
rolling restart could leak in-flight cache writes past the point where
storage was being torn down. Streaming handlers also only watched the
gRPC stream context, so an in-flight request kept running until the
client disconnected even after the server began shutting down.

Make all three top-level constructors (NewController,
NewNativeOrchestrator, NewRepoManager) accept an appCtx context.Context
as their first argument. The controller routes its async cache write
through appCtx; the orchestrator and repomanager store it for future
fire-and-forget goroutines, with comments documenting the convention.

Add a controller.linkRequestCtx(reqCtx) helper that returns a context
cancelled by either the request's stream context (client disconnect) or
the controller's appCtx (server shutdown), backed by context.AfterFunc
so the watcher handle is released on the request's defer. Each
streaming handler now derives its working context from this helper and
threads it through downstream calls.

example/main.go wires appCtx to signal.NotifyContext on SIGINT/SIGTERM
and propagates it to all three constructors, so background work and
in-flight requests are cancelled at process shutdown rather than left
to leak.
@sbalabanov sbalabanov force-pushed the fix-cache-goroutine-ctx branch from 12fbcd7 to ed8a2e3 Compare June 18, 2026 21:07
@sbalabanov sbalabanov merged commit 4223bf4 into main Jun 18, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants