Skip to content

Wire persistent DCRCredentialStore into EmbeddedAuthServer#5196

Draft
tgrunnagle wants to merge 12 commits intodcr-3b_issue_5184from
dcr-3c_issue_5185
Draft

Wire persistent DCRCredentialStore into EmbeddedAuthServer#5196
tgrunnagle wants to merge 12 commits intodcr-3b_issue_5184from
dcr-3c_issue_5185

Conversation

@tgrunnagle
Copy link
Copy Markdown
Contributor

@tgrunnagle tgrunnagle commented May 5, 2026

DRAFT - not ready for review

Summary

  • Why: Phase 2 of the DCR story shipped an in-memory stub for the DCRCredentialStore that lived only in the runner package. Restarting (or scaling out to) an authserver dropped every RFC 7591 client registration on the floor and re-registered against the upstream on every boot, which is unworkable for the Datadog-style upstream demo and for any multi-replica deployment. This PR wires the persistent DCRCredentialStore introduced in earlier sub-issues (in-memory + Redis backends) into EmbeddedAuthServer so a Redis-backed authserver reuses already-registered clients across replicas and restarts.
  • What:
    • EmbeddedAuthServer.dcrStore is now typed against storage.DCRCredentialStore and is derived from the same storage.Storage value returned by createStorage, so a single storage_type: redis config toggles DCR persistence alongside the rest of authserver state. The storage.Storage interface embeds storage.DCRCredentialStore, promoting the previously-needed runtime type assertion to a compile-time guarantee.
    • The Phase 2 standalone in-memory DCRCredentialStore in pkg/authserver/runner/dcr_store.go is collapsed into a thin storageBackedStore adapter that delegates to storage.DCRCredentialStore and translates DCRResolution <-> DCRCredentials at the boundary. There is now exactly one persistence implementation per backend.
    • The constructor is split into the public NewEmbeddedAuthServer (creates storage) and an unexported newEmbeddedAuthServerWithStorage (owns the cleanup contract). Any error after entry closes the storage backend via a deferred cleanup gated on a named return error, so a crash-looping caller no longer leaks the Redis client connection pool / MemoryStorage cleanup goroutine on every restart.
    • buildPureOAuth2Config remains pure (unchanged signature, no ctx, no I/O); buildUpstreamConfigs is the boundary that consumes the resolver and overlays DCR-resolved credentials onto each upstream.
    • Adds DCRStore() accessor on EmbeddedAuthServer mirroring IDPTokenStorage / UpstreamTokenRefresher, used by integration tests to verify the resolver and the authserver write through the same backend.

This PR also lands the dependency stack that #5185 builds on (the persistent DCRCredentialStore types + memory backend, the Redis backend, the operator CRD surface for DCR, and the runner-side DCR resolver wiring). Each layer was developed and reviewed as a separate commit on this branch; commits are sequenced so each one builds and tests cleanly.

Closes #5185

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Notable test coverage added by this PR:

  • pkg/authserver/integration_dcr_restart_test.go (new) — TestEmbeddedAuthServer_DCRSurvivesRestart boots an EmbeddedAuthServer against a mock AS, captures the DCR store via the new DCRStore() accessor, closes the server, and asserts the persisted DCR row survives the first server's Close. Lives in package authserver_test to avoid the runner -> authserver import cycle. The full "boot, close, boot again, observe zero /register" scenario across a fresh constructor is documented as a gap (the production Redis path requires Sentinel, which miniredis does not speak); test docstring records the conditions under which it can be closed.
  • pkg/authserver/runner/embeddedauthserver_test.goTestBuildUpstreamConfigs_DCR exercises first-call registration + cache-hit on the second call (zero additional HTTP requests) and asserts the caller's RunConfig.Upstreams slice is never mutated. TestNewEmbeddedAuthServer_ClosesStorageOnError uses a closeTrackingStorage wrapper to verify the deferred-cleanup contract.
  • pkg/authserver/storage/memory_test.go, redis_test.go, redis_integration_test.go — coverage for the persistent DCRCredentialStore operations on both backends, including ScopesHash canonicalisation (sort + dedupe + newline join).

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

The CRD changes are additive: OAuth2UpstreamConfig.clientId becomes optional with a CEL constraint requiring exactly one of clientId or dcrConfig, and a new dcrConfig field is added. Existing MCPExternalAuthConfig / VirtualMCPServer resources that set clientId continue to validate unchanged.

Does this introduce a user-facing change?

Yes. Operators of OAuth2 upstreams can now configure RFC 7591 Dynamic Client Registration in the operator CRD via dcrConfig (with discoveryUrl or registrationEndpoint, plus optional initialAccessTokenRef, softwareId, softwareStatement) instead of statically configuring clientId + clientSecret. When the authserver is configured with storage_type: redis, DCR registrations persist across restarts and are shared across replicas; in single-replica memory mode, registrations live for the process lifetime as before.

Special notes for reviewers

  • This PR is the terminal task in the Phase 3 DCR DAG and pulls along the dependency stack from sub-issues 1 and 2 (persistent DCRCredentialStore types + memory + Redis backends), the operator CRD surface, and the Phase 2 resolver wiring. The size is above the usual 400-line / 10-file limit; each commit is self-contained and the stack reads top-to-bottom in commit order. Reviewers may prefer to walk the per-commit diffs.
  • The full "boot, close, boot again, zero /register" cross-constructor restart scenario is not exercised; closing it requires either miniredis-Sentinel emulation or a Docker-based Redis Sentinel cluster in the test harness. The wiring that the second boot would consume — the type of dcrStore being the same storage.DCRCredentialStore that authserver.New writes through — is verified at compile time by storage.Storage embedding storage.DCRCredentialStore and by TestEmbeddedAuthServer_DCRSurvivesRestart asserting the persistence boundary.
  • buildPureOAuth2Config was kept intentionally pure (no ctx, no I/O) to preserve the architectural gate established in Authserver DCR integration (Phase 2, Steps 2a-2g) #4978; the wiring change swaps the implementation passed into the resolver, not the call shape.
  • No secrets (client_secret, registration_access_token, initial_access_token, refresh tokens) appear as arguments to slog.* calls; the grep assertion from Authserver DCR integration (Phase 2, Steps 2a-2g) #4978 still applies.

tgrunnagle and others added 12 commits May 1, 2026 06:46
Implements Phase 2 steps 2d/2g of the DCR story (#5039):

- EmbeddedAuthServer now owns an in-memory DCRCredentialStore and calls
  resolveDCRCredentials for any OAuth2 upstream with DCRConfig. The
  resolved ClientSecret is overlaid on the built upstream.OAuth2Config
  after buildPureOAuth2Config (whose signature and body remain
  intentionally unchanged) so that RFC 7591-obtained credentials flow
  through the same execution path as file/env-resolved secrets.
- Each UpstreamRunConfig element is shallow-copied and its OAuth2
  sub-config is deep-copied before DCR resolution, preserving the
  caller's RunConfig.Upstreams slice per .claude/rules/go-style.md
  "Copy Before Mutating Caller Input".
- resolveDCRCredentials emits structured logs: Debug on cache hit with
  dcr_age_days, an additional Warn when the cached registration exceeds
  dcrStaleAgeThreshold (90 days), and Error with a "step" attribute
  identifying which phase failed on every error path.
- The /oauth/register handler upgrades its success log to Info with
  upstream, issuer, client_id, software_id, token_endpoint_auth_method,
  and scopes. SoftwareID is threaded through DCRRequest validation so
  incoming "software_id" is captured. A small helper guards against a
  nil embedded *fosite.Config (a legitimate test-only condition).
- isTransientNetworkError's permanent-4xx branch now emits a Warn with
  a DCR remediation hint before returning false unchanged. The
  MonitoredTokenSource gains an optional SetUpstreamContext setter so
  the upstream and client_id fields can be threaded into the log
  without breaking the existing NewMonitoredTokenSource contract.
- Integration tests exercise the full DCR boot path against a mock AS,
  verify the cache-hit short-circuit issues zero additional HTTP
  requests, and assert the caller's original RunConfig.Upstreams slice
  element is unchanged across both calls.

Address authserver DCR review feedback

Fixes from the CODE_REVIEW_ISSUES.md review of commit 71c4f43:

Critical
- Wire upstream/client_id into MonitoredTokenSource so the DCR remediation
  warning carries meaningful correlation fields. Promote the fields to
  constructor parameters (replacing SetUpstreamContext) to remove the
  unsynchronized writer and force callers to supply them at construction.
- Runner populates the new fields from the RemoteAuthConfig, preferring
  the DCR-cached client_id over the statically configured one.

High
- /oauth/register handler drops the redundant upstream attribute that
  mirrored issuer, and omits issuer when empty rather than emitting a
  bare issuer="".
- Resolver no longer logs each error branch and then returns; it now
  wraps failures in a DCRStepError and the boundary caller
  (buildUpstreamConfigs) emits a single slog.Error via LogDCRStepError.
- The DCR-resolved ClientSecret is applied through a new
  applyResolutionToOAuth2Config helper paired with applyResolution, so
  the DCR application sites live side-by-side and future call sites
  cannot silently drop the secret.

Medium
- Remove the Type==OAuth2 guard that duplicated needsDCR's nil check.
- Cap software_id to 256 characters and require printable ASCII in
  ValidateDCRRequest; expose MaxSoftwareIDLength.
- Add TestNewEmbeddedAuthServer_DCRBoot to drive the full constructor
  and assert EmbeddedAuthServer.dcrStore is populated after boot.
- Remove the nil-guard in Handler.issuer() and add TestHandler_issuer
  so a real wiring bug fails loudly instead of logging issuer="".
- Sanitize error strings before logging to strip URL query parameters
  that could plausibly carry tokens in a future refactor.

Preserve URL-trailing punctuation in DCR log sanitiser

The query-stripping regex in sanitizeErrorForLog matches `[^\s"']+`, so a
URL ending with sentence punctuation (`.`, `,`, `)`, `]`, etc.) pulls
that punctuation into the URL match. url.Parse then absorbs it into the
raw query, and the Strip + Reassemble step drops it along with the rest
of the query — mangling the surrounding prose.

Split the trailing run of terminal punctuation off the match before
parsing, and re-append it verbatim after the query is stripped. URL
matches without a query are returned untouched so the pass is
idempotent for URLs that are already clean. New test cases cover commas,
periods, closing parens, mixed runs, and a Go http.Client-style quoted
URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A misbehaving or malicious authorization server could echo arbitrary
content into the upstream's /register error response. handleHTTPResponse
read that body via io.ReadAll with no LimitReader, then embedded it
verbatim in the returned error — which downstream callers log. Cap the
read at 8 KiB (far larger than any conformant RFC 7591 error response)
so operator log volume cannot be inflated by a non-conformant upstream.

Addresses #5044 review finding F2 (HIGH).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The doc comment claimed remoteAuthLogContext mirrored the precedence of
remote.Handler.resolveClientCredentials, but the implementation skipped
the CachedCIMDClientID check entirely. For CIMD-authenticated workloads
the new DCR remediation Warn would have reported a stale or empty
client_id rather than the CIMD URL actually being sent on token refresh,
defeating the operator-correlation the field exists for.

Restore the documented precedence (CachedCIMDClientID >
CachedClientID > ClientID) and add a TestRemoteAuthLogContext case
covering the CIMD-wins path.

Addresses #5044 review findings F3 (HIGH) and F26
(MEDIUM, closed by the new test case).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
isTransientNetworkError previously emitted the cached-DCR-client
remediation Warn from inside its classifier on every permanent 4xx.
A tight Token() loop hitting the same condition would spam the same
record on every call before the workload's Unauthenticated status
propagated. The same branch also fired for non-DCR workloads, which
saw a remediation telling them to "delete cached credentials" they
never had.

Strip the side effect from the classifier and emit the Warn from
markAsUnauthenticated, which already gates the close-monitoring
transition through stopOnce. The Warn now fires at most once per
state transition, is suppressed when no client_id context is
available, and reads honestly about the variability ("if this
workload uses cached DCR or CIMD credentials they may be stale").

Addresses #5044 review finding F5 (MEDIUM).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related polish items in the authserver handlers package:

NewHandler did not validate that AuthorizationServerConfig (or its
embedded *fosite.Config) was non-nil. issuer() reached into the
config at request time, so a misconfigured caller would panic deep
inside an HTTP handler instead of failing at startup. Add the nil
check to the constructor and simplify issuer() to rely on the
constructor invariant. Pin the new invariant with
TestNewHandler_ErrorsOnNilConfig.

The /oauth/register success log was promoted to Info even though the
operation is neither long-running nor exceptional. Demote back to
Debug; an audit-log path is the right home for the audit signal if
that becomes a requirement.

Addresses #5044 review findings F6 and F7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundles ten review-feedback items in pkg/authserver/runner; each
addresses internal-API hygiene or doc-comment drift in the DCR
resolver, no behaviour change for callers.

Sanitizer hardening (F1, F19, F18, F23): sanitizeErrorForLog now
strips userinfo and fragment in addition to the query, since either
can carry credentials or tokens (https://user:pass@host,
implicit-flow #access_token=...). queryStrippingPattern matches
http/https case-insensitively per RFC 3986 §3.1 so HTTPS://...
cannot escape sanitisation. trimURLTrailingPunctuation switches to
strings.IndexByte to match the ASCII-only terminator set without
the rune-decoding overhead. Test cases added for each.

Resolver error API (F4, F13, F20): the dcrStepRegister panic-recovery
branch no longer emits a duplicate slog.Error; the captured stack
travels with the wrapped *dcrStepError to the boundary log so a
single panic produces a single record. DCRStepError /
LogDCRStepError lowercased to dcrStepError / logDCRStepError since
no caller lives outside the package. logDCRStepError now no-ops on
nil err so the unknown-step branch cannot fire on a missing failure.

Resolution helpers (F11, F12, F25): applyResolution renamed to
consumeResolution to communicate that it is a one-shot state
transition (clearing DCRConfig is unconditional), not an idempotent
defaulting step. The applyResolutionToOAuth2Config doc now states
the paired-call invariant explicitly without referencing a specific
test.

Lifecycle docs (F21, F22): the per-instance dcrStore vs.
process-wide dcrFlight asymmetry is now stated on both sides, and
EmbeddedAuthServer.Close documents the future-Close hook for a
backend with handles.

Inline rules-file rationale (F24): production comments no longer
cite .claude/rules/... by path; the principle is inlined.

Addresses #5044 review findings F1, F4, F11, F12,
F13, F18, F19, F20, F21, F22, F23, F24, F25.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements changes for issue #5040 (Phase 2 DCR CRD surface):

- Add DCRUpstreamConfig CRD type (discoveryUrl, registrationEndpoint,
  initialAccessTokenRef, softwareId, softwareStatement) and a new
  dcrConfig field on OAuth2UpstreamConfig so Kubernetes users can
  configure RFC 7591 Dynamic Client Registration on upstream providers.
- Make OAuth2UpstreamConfig.clientId optional and add CEL validation
  requiring exactly one of clientId or dcrConfig, and exactly one of
  discoveryUrl or registrationEndpoint inside dcrConfig. Mirror the
  checks at runtime via validateOAuth2DCRConfig for defense-in-depth.
- Wire the conversion in controllerutil/authserver.go so DCRConfig is
  mapped onto authserver.DCRUpstreamConfig. InitialAccessTokenRef is
  resolved to an env var (TOOLHIVE_UPSTREAM_DCR_INITIAL_ACCESS_TOKEN_*)
  populated from the referenced Secret, mirroring the ClientSecretRef
  pattern. Extract small helpers for env-var generation to keep
  cyclomatic complexity within lint limits.
- Regenerate zz_generated.deepcopy.go, CRD YAML manifests, and CRD API
  reference docs.
- Add table-driven validation tests covering DCR+ClientID conflict,
  both endpoints set, neither endpoint set, valid single-endpoint
  cases, and neither-auth configuration. Add conversion tests covering
  DCR discoveryUrl/registrationEndpoint paths and initial-access-token
  env var wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address code review feedback

Fixed issues from code review of the DCR CRD surface commit:

- CRITICAL: CEL markers contained a Unicode smart quote (U+201D) that
  gofmt's doc-comment formatter reintroduced on every lint-fix. Rewrote
  both markers to use CEL's size(...) > 0 idiom instead of `!= ''`, which
  sidesteps the typographic normalization entirely and keeps regeneration
  idempotent. Verified no U+2018-U+201F characters remain in source or CRDs.
- HIGH: buildUpstreamRunConfig now calls the exported
  mcpv1beta1.ValidateOAuth2DCRConfig before producing a RunConfig, so
  malformed ClientID/DCRConfig pairs that bypass admission fail at
  reconcile time rather than at authserver startup. Error propagation
  threaded through BuildAuthServerRunConfig; split OIDC and OAuth2
  branches into helpers to stay under the gocyclo limit.
- HIGH: Added table case exercising validateUpstreamProvider rejection
  of an OIDC-typed provider whose OAuth2Config carries a DCRConfig.
- MEDIUM: Added kubebuilder CEL XValidation on UpstreamProviderConfig
  enforcing oidcConfig/oauth2Config mutual exclusivity paired to the
  declared type, closing the silent-pod-failure YAML-apply gap.
- MEDIUM: Added MaxLength=255 to SoftwareID and MaxLength=4096 to
  SoftwareStatement to prevent unbounded input from inflating CRs
  beyond etcd object limits.
- MEDIUM: Pinned the "neither ClientID nor DCRConfig" error assertion to
  the scoped `oauth2Config:` prefix; added a regression case exercising
  the non-DCR OAuth2 path (ClientID only, DCRConfig nil); added a new
  TestBuildAuthServerRunConfig_InvalidDCR suite covering all four
  invalid DCR/ClientID pairings at the conversion layer.
- MEDIUM: Renamed UpstreamDCRInitialAccessTokenEnvVar to
  UpstreamDCRInitialAccessTokenEnvVarPrefix and expanded the godoc on
  both prefix constants to show the resolved <prefix>_<PROVIDER> form.

All task lint/lint-fix/license-check pass; regenerated CRDs and
deepcopy are idempotent; affected unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address iteration-2 review feedback

Polish items raised in the second review pass:

- MEDIUM: Trim duplicate upstream name from reconcile-time DCR validation
  errors. Added scopedFieldPath() helper in
  cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go so
  ValidateOAuth2DCRConfig prepends a dotted prefix only when one is
  given, and the conversion call site now passes an empty prefix so
  BuildAuthServerRunConfig's outer "upstream %q: %w" wrap is the only
  mention of the upstream name. Strengthened
  TestBuildAuthServerRunConfig_InvalidDCR to assert the upstream name
  appears exactly once in the error string.
- MEDIUM: Make the UpstreamProviderConfig CEL rule fail closed for
  unrecognized future provider types. Restructured the rule from a
  binary discriminator into a chain of equality checks ending in an
  explicit `false`, and updated the message to "type must be 'oidc'
  or 'oauth2'; ...". Added a contributor-facing doc comment reminding
  future authors to extend both the rule and validateUpstreamProvider
  when adding a new UpstreamProviderType.
- MEDIUM: Refresh the godoc on extractUpstreamSecretRefs to describe
  the actual invariants that hold post-CEL: OIDC providers can only
  return a clientSecretRef; OAuth2 providers can return both
  independently; other (currently unreachable) types return nil/nil.
  Cross-linked to the CEL rule and noted that BuildAuthServerRunConfig
  is the reconcile-time backstop callers should not rely on this
  helper to enforce.

Regenerated CRD YAMLs and crd-api.md prose. task lint, lint-fix,
license-check, and the affected unit tests pass; regeneration is
idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 sub-issue 1 of #5183. Define the persisted DCRCredentials value
type and the storage-level DCRCredentialStore interface in
pkg/authserver/storage/, and ship the in-process memory implementation
that single-replica deployments and unit tests use. The Redis backend
(sub-issue 2) and the wiring change (sub-issue 3) build on this.

DCRKey consolidation: chose option (a) from the issue — DCRKey and its
ScopesHash constructor move to pkg/authserver/storage/ so any future
backend hashes keys identically. The runner package keeps a
package-local type alias (type DCRKey = storage.DCRKey) and a var
binding for scopesHash so existing call sites compile unchanged; the
canonical form has a single source of truth.

DCRCredentials carries ClientSecretExpiresAt so the Redis backend can
drive a SetEX TTL without re-touching the value type or regenerating
mocks. The interface contract documents this as SHOULD honor when
backend-supported; MemoryStorage retains entries verbatim for the
process lifetime.

StoreDCRCredentials rejects nil creds and zero-valued Key.Issuer or
Key.RedirectURI with fosite.ErrInvalidRequest, matching the
StoreUpstreamTokens validation pattern. Stats reports
dcrCredentials count for parity with the other in-memory maps.

The runner-side DCRCredentialStore (Get/Put *DCRResolution) is left in
place as the thin adapter sub-issue 3 will swap. This sub-issue lands
the new storage-level interface, MemoryStorage implementation, and
regenerated mock without touching the wire-up.

DCR credentials are intentionally excluded from cleanupExpired:
RFC 7591 client registrations are long-lived and the authoritative
expiry signal is client_secret_expires_at, which the Redis backend
will honor as a SetEX TTL.
Implements the persistent Redis backend for DCR credentials behind the
DCRCredentialStore interface, so an authserver backed by Redis Sentinel
shares dynamic-client-registration state across replicas and survives
restarts. The wire format honors RFC 7591 client_secret_expires_at as a
Redis TTL when non-zero, falling back to a long-lived row when the
upstream did not assert an expiry.

Implements changes for issue #5184:
- Add KeyTypeDCR const and length-prefixed redisDCRKey helper that
  handles colons in RedirectURI without ambiguity
- Add RedisStorage.StoreDCRCredentials/GetDCRCredentials with JSON
  serialization, defensive copy via decode, and TTL derived from
  ClientSecretExpiresAt
- Add unit tests for redisDCRKey distinctness/determinism plus
  miniredis-backed tests for round-trip, overwrite, validation, defensive
  copy, TTL semantics, and concurrent access
- Add Redis Sentinel integration tests covering round-trip, distinct
  keys, overwrite, real-Redis TTL (-1 for never-expires), and concurrent
  access
Fixed issues from code review:
- MEDIUM: Drop input validation from RedisStorage.GetDCRCredentials so
  it matches MemoryStorage and the DCRCredentialStore interface contract
  (an unpopulated key is now a normal ErrNotFound miss). Fold the
  empty-key cases into TestRedisStorage_DCRCredentials_NotFound.
- MEDIUM: Apply pastExpiryDCRTTL = 1s when ClientSecretExpiresAt is in
  the past so already-expired DCR rows self-evict instead of living
  forever. Update docstring and rewrite the corresponding TTL subtest.
- MEDIUM: Assert no Redis row remains after each rejected
  StoreDCRCredentials call, mirroring the memory backend's
  Stats().DCRCredentials == 0 guard.
- LOW: Loosen the integration TTL upper-bound assertion by 1s so
  truncation/Redis second-granularity cannot flake the assertion.
- LOW: Drop the inaccurate SetEX rationale from StoreDCRCredentials'
  docstring (resolved as part of the past-expiry fix).
- LOW: Remove dcrIntegrationFixtureKey duplicate; integration tests
  now share dcrFixtureKey from redis_test.go (no build tag) so the
  canonical fixture has a single source of truth.
Type EmbeddedAuthServer.dcrStore against storage.DCRCredentialStore and
derive it from the same storage.Storage value returned by createStorage
via a single type assertion, so a Redis-backed authserver reuses
already-registered RFC 7591 clients across replicas and restarts
instead of re-registering at every boot.

Phase 2 left two parallel DCR stores: a runner-side in-memory map in
dcr_store.go and the storage-level interface added in sub-issue 1. This
collapses the runner-side implementation into a thin storageBackedStore
adapter that delegates to storage.DCRCredentialStore, leaving exactly
one persistence implementation per backend (storage.MemoryStorage and
storage.RedisStorage).

NewInMemoryDCRCredentialStore is preserved as a test helper that wraps
storage.NewMemoryStorage so existing resolver tests compile unchanged;
the standalone inMemoryDCRCredentialStore type and its map / RWMutex
are deleted. buildPureOAuth2Config is unchanged — the wiring change
swaps the implementation passed to the resolver, not the call shape.

Add TestEmbeddedAuthServer_DCRSurvivesRestart in
embeddedauthserver_test.go (next to TestNewEmbeddedAuthServer_DCRBoot)
covering the durable-restart case: boot, close, rebuild against the
same storage.MemoryStorage instance, assert the second resolve makes
zero AS requests. The integration_test.go file under pkg/authserver
would otherwise be the natural home, but it is in package authserver
and importing runner from there would cycle (runner already imports
authserver); the test docstring records this constraint.
Fixed issues from code review of #5185 wiring change:

- HIGH: Storage backend leaked on NewEmbeddedAuthServer error paths.
  Split the constructor into a public NewEmbeddedAuthServer that calls
  createStorage and an unexported newEmbeddedAuthServerWithStorage that
  owns the cleanup contract via a deferred Close gated on a named
  return error. Verified by TestNewEmbeddedAuthServer_ClosesStorageOnError
  using a closeTrackingStorage wrapper.

- MEDIUM: Comment claimed interface embedding that did not exist.
  Embed storage.DCRCredentialStore in the storage.Storage interface
  instead, promoting the runtime type assertion to a compile-time
  guarantee (the AC's explicitly preferred outcome). The dead error
  branch and its outdated comment are gone; mocks regenerated via
  task gen.

- MEDIUM: Test placement deviated from AC instruction. Moved
  TestEmbeddedAuthServer_DCRSurvivesRestart out of the runner package
  and into a new pkg/authserver/integration_dcr_restart_test.go in
  package authserver_test, so the test lives next to the other
  pkg/authserver integration tests without inducing the runner ->
  authserver import cycle. Added a small public DCRStore() accessor
  on EmbeddedAuthServer mirroring existing IDPTokenStorage /
  UpstreamTokenRefresher accessors.

- MEDIUM: Durable-restart not exercised end-to-end. Strengthened the
  restart test to go through NewEmbeddedAuthServer for the first boot
  (full constructor path with DCR), capture the storage via the new
  DCRStore() accessor, and assert the DCR row survives the first
  server's Close. The full "boot, close, boot again, observe zero
  /register" scenario remains a documented gap (the production Redis
  path requires Sentinel which miniredis does not speak); the gap and
  the conditions under which it can be closed are recorded in the test
  docstring per the review's accept-the-gap branch.
@tgrunnagle tgrunnagle changed the base branch from main to dcr-3b_issue_5184 May 5, 2026 15:48
@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label May 5, 2026
@tgrunnagle tgrunnagle force-pushed the dcr-3b_issue_5184 branch from 43ead51 to b0bf320 Compare May 6, 2026 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large PR: 600-999 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Persistent DCRCredentialStore: wire into EmbeddedAuthServer

1 participant