Skip to content

Add persistent DCRCredentialStore types and memory backend#5186

Open
tgrunnagle wants to merge 7 commits intomainfrom
dcr-3a_issue_5183
Open

Add persistent DCRCredentialStore types and memory backend#5186
tgrunnagle wants to merge 7 commits intomainfrom
dcr-3a_issue_5183

Conversation

@tgrunnagle
Copy link
Copy Markdown
Contributor

@tgrunnagle tgrunnagle commented May 4, 2026

Summary

  • Why: Phase 3 of the authserver-driven DCR story (Persistent DCRCredentialStore backends (memory + Redis) #4979 / parent Authserver-driven DCR for upstream OAuth 2.1 MCP servers #4976) needs the persisted credential store moved into pkg/authserver/storage/ so both MemoryStorage and the future Redis backend (sub-issue 2) can implement the same interface, and so the wiring change (sub-issue 3) has a single canonical type to swap to. Until that lands, every authserver replica re-registers its own DCR client on boot, no client_secret_expires_at TTL is honored at the storage layer, and the runner-side cache from Authserver DCR integration (Phase 2, Steps 2a-2g) #4978 has no production-shaped sibling.
  • What — adds the storage-layer DCRCredentials value type, the segregated DCRCredentialStore interface, and a MemoryStorage implementation. Production deployments will use the Redis backend introduced in sub-issue 2; the in-memory implementation lands here for tests and single-replica development. The wiring swap is intentionally out of scope.
    • New DCRCredentials value type in pkg/authserver/storage/types.go carrying the canonical DCRKey, ProviderName, the RFC 7591 client fields (ClientID, ClientSecret, TokenEndpointAuthMethod), the RFC 7592 management fields (RegistrationAccessToken, RegistrationClientURI), the upstream-resolved endpoints (AuthorizationEndpoint, TokenEndpoint), and CreatedAt / ClientSecretExpiresAt for the staleness and TTL signals.
    • New DCRCredentialStore interface (GetDCRCredentials, StoreDCRCredentials) added to the package's mockgen directive; mocks regenerated. The interface doc states the defensive-copy contract, the SHOULD-honor-TTL contract for backends with native TTL support, and the rationale for embedding DCRKey in the value type (self-describing blobs for Redis SCAN / admin tooling).
    • DCRKey and its ScopesHash constructor moved from pkg/authserver/runner/ into pkg/authserver/storage/ so any future backend hashes keys identically. The runner keeps a package-local type DCRKey = storage.DCRKey alias and call sites use storage.ScopesHash directly.
    • MemoryStorage gains a dcrCredentials map[DCRKey]*DCRCredentials guarded by the existing sync.RWMutex. StoreDCRCredentials defensively copies on write and rejects nil creds, an unpopulated Key (empty Issuer, RedirectURI, or ScopesHash), and missing RFC 7591 mandatory response fields (ClientID, AuthorizationEndpoint, TokenEndpoint); GetDCRCredentials defensively copies on read and returns wrapped ErrNotFound on miss. DCR entries are intentionally excluded from cleanupExpired (registrations are long-lived; Redis will TTL via SetEX). Stats exposes the new map's count for parity with the other in-memory maps.
    • Renamed the runner-side DCRCredentialStore (a runtime resolution cache for *DCRResolution) to lowercase dcrResolutionCache so it does not collide with the new storage.DCRCredentialStore during the Phase 3 migration. The runner-side cache is the thin adapter sub-issue 3 will swap.

Closes #5183

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Specifically:

  • pkg/authserver/storage/memory_test.go — round-trip on every DCRCredentials field including ClientSecretExpiresAt; distinct DCRKey values do not collide along the Issuer / RedirectURI / Scopes axes; overwrite semantics replace the prior entry; Get on an absent key returns wrapped ErrNotFound via errors.Is; defensive-copy assertions on both read and write paths; StoreDCRCredentials rejects nil creds and the seven invalid-input cases (empty Issuer / RedirectURI / ScopesHash / ClientID / AuthorizationEndpoint / TokenEndpoint, plus nil); Stats reports the DCRCredentials count; DCR entries survive cleanupExpired; concurrent fan-out (16 workers × 200 ops, -race) over overlapping and disjoint key spaces with a fail-fast deadline.
  • pkg/authserver/storage/memory_test.go (TestScopesHash_*) — canonical scopes-hash form exercised here as the single source of truth (permutation, duplicates, boundary-join collision, distinct-scopes inequality).
  • task gen rerun; pkg/authserver/storage/mocks/mock_storage.go regenerated for the new interface.

Special notes for reviewers

The branch is one commit on top of main. The runner-side dcrResolutionCache is renamed but functionally unchanged — it still holds *DCRResolution values and is consumed only by resolveDCRCredentials. Sub-issue 3 will swap the resolver to read/write storage.DCRCredentialStore and delete the runner-side cache; this PR keeps both side by side under non-colliding names so the intermediate state is unambiguous to reviewers and tooling.

The interface contract documents ClientSecretExpiresAt as SHOULD-honor (not MUST), with the in-memory backend retaining the field verbatim and the resolver re-checking on read. The Redis backend (sub-issue 2) plumbs the field into SetEX so an entry evicts before the upstream rejects the secret at the token endpoint.

🤖 Generated with Claude Code

@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label May 4, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@tgrunnagle tgrunnagle changed the base branch from main to issue_5040_dcr-2d May 4, 2026 19:46
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 4, 2026
@github-actions github-actions Bot dismissed their stale review May 4, 2026 19:47

PR size has been reduced below the XL threshold. Thank you for splitting this up!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up!

@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels May 4, 2026
@tgrunnagle tgrunnagle force-pushed the dcr-3a_issue_5183 branch from 17bd2ab to c0fed52 Compare May 4, 2026 19:47
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels May 4, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 96.22642% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.72%. Comparing base (c146093) to head (409a66e).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/authserver/storage/memory.go 94.59% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5186      +/-   ##
==========================================
- Coverage   67.75%   67.72%   -0.04%     
==========================================
  Files         607      607              
  Lines       62053    62090      +37     
==========================================
+ Hits        42044    42049       +5     
- Misses      16848    16878      +30     
- Partials     3161     3163       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tgrunnagle tgrunnagle force-pushed the issue_5040_dcr-2d branch from 64feb73 to ee7d4cc Compare May 5, 2026 14:32
@tgrunnagle tgrunnagle force-pushed the dcr-3a_issue_5183 branch from c0fed52 to cc92472 Compare May 5, 2026 15:50
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels May 5, 2026
Base automatically changed from issue_5040_dcr-2d to main May 6, 2026 14:24
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels May 6, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

Phase 3 sub-issue 1 of #5183. Define the persisted DCRCredentials value
type and the storage-level DCRCredentialStore interface in
pkg/authserver/storage/, and ship the in-process memory implementation
that single-replica deployments and unit tests use. The Redis backend
(sub-issue 2) and the wiring change (sub-issue 3) build on this.

DCRKey consolidation: chose option (a) from the issue — DCRKey and its
ScopesHash constructor move to pkg/authserver/storage/ so any future
backend hashes keys identically. The runner package keeps a
package-local type alias (type DCRKey = storage.DCRKey) and a var
binding for scopesHash so existing call sites compile unchanged; the
canonical form has a single source of truth.

DCRCredentials carries ClientSecretExpiresAt so the Redis backend can
drive a SetEX TTL without re-touching the value type or regenerating
mocks. The interface contract documents this as SHOULD honor when
backend-supported; MemoryStorage retains entries verbatim for the
process lifetime.

StoreDCRCredentials rejects nil creds and zero-valued Key.Issuer or
Key.RedirectURI with fosite.ErrInvalidRequest, matching the
StoreUpstreamTokens validation pattern. Stats reports
dcrCredentials count for parity with the other in-memory maps.

The runner-side DCRCredentialStore (Get/Put *DCRResolution) is left in
place as the thin adapter sub-issue 3 will swap. This sub-issue lands
the new storage-level interface, MemoryStorage implementation, and
regenerated mock without touching the wire-up.

DCR credentials are intentionally excluded from cleanupExpired:
RFC 7591 client registrations are long-lived and the authoritative
expiry signal is client_secret_expires_at, which the Redis backend
will honor as a SetEX TTL.
@tgrunnagle tgrunnagle force-pushed the dcr-3a_issue_5183 branch from cc92472 to 7aa7366 Compare May 6, 2026 14:33
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 6, 2026
@github-actions github-actions Bot dismissed their stale review May 6, 2026 14:33

PR size has been reduced below the XL threshold. Thank you for splitting this up!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up!

Copy link
Copy Markdown
Contributor Author

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-agent review summary

6 specialist agents reviewed the diff (Go style/API design, concurrency/locking, OAuth/RFC compliance & security, test coverage, review-friendliness, general quality). 8 findings reached consensus (score ≥ 7); 13 dropped to the appendix below.

Author note: I'm the PR author, so this review is posted as COMMENT (GitHub rejects APPROVE / REQUEST_CHANGES on your own PR). If a co-reviewer agrees with the HIGH/MEDIUM findings, a formal REQUEST_CHANGES is warranted.

Consensus summary

ID Finding Severity Score
F1 PR body advertises scope not present in the diff HIGH (process) 7
F2 Two DCRCredentialStore interfaces with the same name in sibling packages HIGH 9
F3 var scopesHash = storage.ScopesHash is a mutable runtime binding, not an alias MEDIUM 8
F4 StoreDCRCredentials validation is narrower than the doc contract claims MEDIUM 9
F5 "Bounded by upstream count" claim is incorrect when scopes change MEDIUM 8
F6 StoreDCRCredentials signature/naming inconsistent with sibling Storage methods MEDIUM 7
F7 Missing concurrent-access test for new DCR Store/Get methods LOW-MEDIUM 7
F8 PR is marked DRAFT but a review was requested MEDIUM (process) 7

F2–F6 are inline below. F1, F7, F8 are PR-level and addressed here:


F1 — PR body / diff mismatch (HIGH, process)

The Summary enumerates changes in cmd/thv-operator/api/v1beta1/, pkg/auth/monitored_token_source.go, pkg/oauthproto/dcr.go, pkg/runner/runner.go, and the v1beta1 CRD/operator paths. None of those files appear in the diff: the actual change is 7 files confined to pkg/authserver/storage/ and pkg/authserver/runner/. Reviewers will read the body, expect operator/runner-side wiring, and either think the diff is incomplete or burn time hunting for code that isn't here.

Recommendation: Rewrite the Summary so it describes only what is in the diff (the storage-package DCR types + memory backend + runner-side type aliases). Move the "follow-up stack" narrative to a forward-looking note in Special notes for reviewers, not to the Summary.

F7 — Missing concurrent-access test for new DCR Store/Get methods (LOW-MEDIUM)

TestMemoryStorage_ConcurrentAccess doesn't include the new methods, and all the new DCR tests run single-goroutine. A regression that, e.g., dropped the lock or returned an internal pointer instead of a clone would not be caught by the current suite under -race. The runner-side TestInMemoryDCRCredentialStore_ConcurrentAccess covers a different implementation.

Recommendation: Add a fan-out test in pkg/authserver/storage/memory_test.go — N goroutines doing StoreDCRCredentials with distinct keys, M doing GetDCRCredentials, with a timed wg.Wait() barrier per .claude/rules/testing.md. Pattern after the existing concurrent tests in this package or the runner-side TestInMemoryDCRCredentialStore_ConcurrentAccess.

F8 — PR is marked DRAFT (process)

Body opens with "# DRAFT - not ready for review". Combined with F1, this suggests the body was written before commits were dropped or rebased. The 555-line / 7-file diff is well within the 400-line / 10-file PR-size limit when generated/test files are excluded, so size isn't a blocker.

Recommendation: Once F1–F6 are addressed, take the PR out of draft and request reviewers.


Holistic assessment

Architecturally the change is sound: the storage interface is narrowly segregated (Get/Store only), the defensive-copy contract mirrors the existing UpstreamTokens pattern, the cross-replica/TTL contract is documented honestly (the in-memory backend's TTL ignorance is offset by the resolver re-checking ClientSecretExpiresAt on read at dcr.go:672), and the compile-time interface assertion (memory.go:1316) catches drift. The PR is well-scoped against the 400-line/10-file limit when generated and test files are excluded (~250 LOC of production code across 4 files).

The main holistic concern is the transitional state: after this PR lands there are two DCRCredentialStore interfaces with the same name in sibling packages (runner.DCRCredentialStore and storage.DCRCredentialStore). The PR body acknowledges this is intentional during sub-issue 2/3, but the chosen name collision is the most reviewer-hostile option — it makes every subsequent review of intermediate PRs require mental disambiguation. F2 below proposes the cheapest fix.

Dropped findings appendix (score < 7)

Title Severity Reason dropped
cloneDCRCredentials shallow copy brittle to future fields LOW Correct today; doc warns; reflection-test mitigation is optional
Silent overwrite when two upstreams collide on Key but differ in ProviderName LOW Improbable; a slog.Warn on collision would suffice
ExcludedFromCleanupExpired test uses misleading stale CreatedAt LOW Cosmetic: assertion is correct, just the comment misleads
Stats().DCRCredentials lacks positive-count assertion LOW One-line test addition; nice-to-have
ScopesHash empty-input non-emptiness not asserted LOW One assert.NotEmpty; nice-to-have
Inconsistent "DCR"/"dcr" casing in error hints LOW Cosmetic
Comment-only placeholder where deleted tests used to live LOW Style preference
Doc comment redundancy between canonical type and runner-side alias LOW Style preference
DCRCredentials.ClientSecret no String()/Marshal redaction LOW Defense-in-depth; today's code doesn't need it
DCRCredentials.ProviderName stored but never read LOW YAGNI; would be consumed by a future PR
Stats struct addition compat INFO Project uses keyed struct literals — non-issue
G117 nolint cites wrong upstream rule INFO Resolved — follows established repo convention per .golangci.yml
slog.Debug call inside RLock on miss path LOW Resolved — matches existing pattern (memory.go:345/490/545/...)

File-by-file findings (would-be inline comments)

GitHub's inline-comment API rejected the batched and per-comment posts with a pull_request_review_thread.base internal-error, so the inline findings are folded into the main body below. Each block names the exact file and line.

pkg/authserver/storage/types.go:263

F2 — Naming collision with runner.DCRCredentialStore (HIGH, score 9)

After this PR lands there are two interfaces sharing this name in sibling packages:

  • runner.DCRCredentialStore (pkg/authserver/runner/dcr_store.go:57) — Get/Put returning *DCRResolution, the in-process resolver cache.
  • storage.DCRCredentialStore (here) — GetDCRCredentials/StoreDCRCredentials returning *DCRCredentials, the persistent contract.

The runner package imports pkg/authserver/storage, so unqualified references inside runner/ are ambiguous to readers and grep tooling. The PR body documents this is intentional during sub-issue 2/3, but the same-name collision is the worst possible outcome for review-friendliness — it prevents the compiler/reader from telling them apart at a glance, and the two will inevitably drift before sub-issue 3 swaps them.

This is the cross-cutting .claude/rules/go-style.md "Avoid parallel types that drift" failure mode.

Recommendation: Rename the runner-side interface (the cheap path) to dcrResolutionCache (lowercase — only the resolver consumes it). Reserve the canonical DCRCredentialStore name for the storage package the future Redis backend will implement. Doing this in this PR costs one rename and removes ambiguity from every subsequent review.

pkg/authserver/storage/types.go:272

F6 — Method signature & naming inconsistent with sibling Storage methods (MEDIUM, score 7)

Two related drifts from the package's local conventions:

  1. Signature shape: Sibling methods take (ctx, key, value):

    • StorePendingAuthorization(ctx, state, pending)
    • StoreUpstreamTokens(ctx, sessionID, providerName, tokens)
    • StoreAccessTokenSession(ctx, signature, request)

    This method takes (ctx, creds) with the key embedded as creds.Key. The embedded Key doubles storage and creates a structural opportunity for caller/key inconsistency — a caller could plausibly construct &DCRCredentials{Key: keyA, ...} and expect Store to use a separate key parameter; the implementation insists on creds.Key, but only because the parameter list doesn't expose the alternative.

  2. Method names: Sibling methods omit the type suffix because the verb implies the subject — StoreAuthCode/GetAuthCode, StoreUpstreamTokens/GetUpstreamTokens, GetUser, GetClient. StoreDCRCredentials/GetDCRCredentials over-specifies.

Recommendation:

// Suggested
type DCRCredentialStore interface {
    GetDCR(ctx context.Context, key DCRKey) (*DCRCredentials, error)
    StoreDCR(ctx context.Context, key DCRKey, creds *DCRCredentials) error
}

...and drop DCRCredentials.Key accordingly. Worth fixing before the interface is exported into the wiring PR (sub-issue 3) since renaming after merge churns more files. If retaining the embedded Key is intentional for self-describing Redis serialisation, document the rationale on the interface so future readers see the reason for the inconsistency.

pkg/authserver/storage/memory.go:97

F5 — "Bounded by upstream count" claim is incorrect when scopes change (MEDIUM, score 8)

The comment asserts: "The map is bounded by the operator-configured upstream count, so unbounded growth is not a concern."

This is only true if (Issuer, RedirectURI, ScopesHash) is invariant per upstream over the process lifetime. ScopesHash changes whenever:

  • the operator widens/narrows the scope set on MCPServer.Spec.OAuthConfig.Scopes
  • the upstream's RFC 8414 scopes_supported rotates and the resolver re-derives the scope set

Each distinct ScopesHash produces a fresh entry; nothing evicts the prior entry (cleanupExpired deliberately skips this map per the comment just below). In a long-lived process where scope sets rotate, growth is upstream count × distinct-scope-sets-ever-registered, not upstream count. Each retained entry holds a ClientSecret, so this is also a (slow) sensitive-data accumulation.

Same false claim is also at pkg/authserver/runner/dcr_store.go:71.

Recommendation: Tighten the comment:

	// dcrCredentials maps DCRKey -> DCRCredentials for RFC 7591 Dynamic Client
	// Registration credentials. Entries are intentionally excluded from the
	// periodic cleanupExpired loop: DCR registrations are long-lived and the
	// authoritative expiry signal is RFC 7591 client_secret_expires_at, which
	// is honored at read time by callers (and by the future Redis backend's
	// SetEX TTL). Growth is bounded by the operator-configured upstream count
	// times the number of distinct scope sets ever registered against each
	// upstream during the process lifetime; for a stable configuration this
	// is the upstream count, but rotating scope sets accumulate stale entries
	// that survive until process restart. The Redis backend's SetEX TTL
	// mitigates this in production deployments.

pkg/authserver/storage/memory.go:1231

F4 — Validation is narrower than the documented contract (MEDIUM, score 9)

The doc says "callers must populate Key" but the implementation only validates Key.Issuer != "" and Key.RedirectURI != "".

  • Key.ScopesHash is silently accepted as empty even though the field doc on DCRKey (in types.go) says "Use ScopesHash() to compute this value — do NOT hash scopes by hand at call sites." A caller that forgets to populate ScopesHash gets a successful Store under the empty-string slot; a sibling that did compute ScopesHash(nil) (a non-empty digest) lands on a different slot. The two records refer to the same logical scope set but live in different cache cells.
  • ClientID, AuthorizationEndpoint, TokenEndpoint are RFC 7591 mandatory response fields. The struct doc says "All fields are populated from the upstream's DCR response." An empty value is a caller bug, not a real registration — but Store accepts it. A subsequent Get returns a working-looking record that fails downstream at the upstream's token endpoint, with no audit trail tying the failure back to the bad Store.

Violates .claude/rules/go-style.md "Validate Parsed Results" / "Constructor Validation: Fail Loudly on Invalid Input".

Recommendation: Tighten validation — reject empty Key.ScopesHash, ClientID, AuthorizationEndpoint, and TokenEndpoint with fosite.ErrInvalidRequest.WithHint(...) matching the existing pattern at lines 1235–1243. Leave ClientSecret permissive (RFC 7591 §2 public clients with none auth method legitimately have no secret).

pkg/authserver/runner/dcr.go:235

F3 — var scopesHash = storage.ScopesHash is a mutable runtime binding, not an alias (MEDIUM, score 8)

Unlike type DCRKey = storage.DCRKey in dcr_store.go:27 (a true compile-time type alias — sound, identity-preserving), this is a package-level variable whose value happens to be a function. Three concrete drawbacks:

  1. Mutable: any code in the runner package can reassign it at runtime. No compile-time guard.
  2. Indirection: extra hop on every call; tempts future contributors to add behaviour by reassigning "the local scopesHash".
  3. Tooling: gopls find-references, refactoring tools, and go doc all see the variable, not storage.ScopesHash. The canonical-source-of-truth motivation argues for visible qualification, not against it.

The alias semantics are also asymmetric with the type alias in the same package — readers see two different aliasing strategies for two related symbols.

Recommendation (pick one):

Option A — qualify call sites:

// at every call site in runner/dcr.go and runner/dcr_store_test.go
storage.ScopesHash(scopes)

Mechanical edit, removes the indirection, makes the storage dependency visible.

Option B — forwarding func (immutable, gets inlined):

func scopesHash(s []string) string { return storage.ScopesHash(s) }

Option A preferred — the entire reason ScopesHash moved to storage/ was to make it the single source of truth, so the qualifier is informational, not noise.

Eliminates the post-merge name collision between runner.DCRCredentialStore
and storage.DCRCredentialStore. The two interfaces serve different
purposes (runtime resolution cache vs persistent credential store), and
the same-name collision in sibling packages forces every reader and grep
search to mentally disambiguate during the Phase 3 migration. Renaming
the runner-side interface to lowercase dcrResolutionCache keeps the
canonical DCRCredentialStore name reserved for the storage package,
which is where the future Redis backend will plug in.

The rename is internal to pkg/authserver/runner — no external callers
exist. Renamed symbols:
  - DCRCredentialStore                -> dcrResolutionCache
  - inMemoryDCRCredentialStore        -> inMemoryDCRResolutionCache
  - NewInMemoryDCRCredentialStore     -> newInMemoryDCRResolutionCache
  - TestInMemoryDCRCredentialStore_*  -> TestInMemoryDCRResolutionCache_*

Addresses #5186 review F2 (HIGH, score 9): naming
collision with runner.DCRCredentialStore.
tgrunnagle added 5 commits May 6, 2026 09:34
Replace `var scopesHash = storage.ScopesHash` with direct calls to
storage.ScopesHash. The variable was a mutable runtime binding (any
package code could reassign it, no compile-time guard) and an extra
indirection layer for tooling and refactors. Qualifying call sites
makes the storage dependency visible and matches the type-alias
strategy already used for DCRKey in dcr_store.go.

Addresses #5186 review F3 (MEDIUM, score 8): mutable
runtime binding, not a true alias.
Reject empty Key.ScopesHash, ClientID, AuthorizationEndpoint, and
TokenEndpoint with fosite.ErrInvalidRequest, matching the broader
"required field" contract documented on the DCRCredentials struct.

The previous validation only checked Key.Issuer and Key.RedirectURI,
which silently accepted partial registrations: a caller forgetting to
populate ScopesHash would Store under the empty-string slot while a
sibling that did compute ScopesHash(nil) (a non-empty digest) would
land on a different cache cell, splitting one logical record across
two slots. Likewise, accepting an empty ClientID, AuthorizationEndpoint,
or TokenEndpoint would Store a working-looking record that fails
downstream at the upstream's token endpoint with no audit trail.

ClientSecret is left permissive because RFC 7591 §2 public clients
(token_endpoint_auth_method = "none") legitimately have no secret.

Tests: replaced the three-case nil/issuer/redirect_uri table with a
seven-case table that constructs a fully valid DCRCredentials and
isolates one missing field per subtest; updated the existing
RoundTrip / DistinctKeys / Overwrite / DefensiveCopy /
StoreCopyIsolates / ExcludedFromCleanupExpired tests to populate the
newly-required fields so they exercise the success path under the
tightened contract.

Addresses #5186 review F4 (MEDIUM, score 9):
validation narrower than the documented contract.
The previous comments asserted the cache is "bounded by the
operator-configured upstream count". This holds only when
(Issuer, RedirectURI, ScopesHash) is invariant per upstream over the
process lifetime. ScopesHash changes whenever the operator widens or
narrows the scope set on MCPServer.Spec.OAuthConfig.Scopes, or when an
upstream's RFC 8414 scopes_supported rotates and the resolver re-derives
the scope set. Each distinct ScopesHash produces a fresh entry; nothing
evicts the prior one (cleanupExpired deliberately skips this map), so
in a long-lived process with rotating scope sets growth is
upstream count × distinct-scope-sets-ever-registered, not upstream
count alone. Each retained entry holds a ClientSecret, so this is also
a (slow) sensitive-data accumulation. The Redis backend's SetEX TTL
mitigates this in production.

Updated parallel comments at memory.go:97 and runner/dcr_store.go:61
to state the actual bound and the production mitigation.

Addresses #5186 review F5 (MEDIUM, score 8):
"bounded by upstream count" claim incorrect when scopes change.
The DCRCredentialStore interface takes (ctx, creds) on Store rather
than the (ctx, key, value) shape used by sibling methods on Storage.
The asymmetry is intentional — embedding DCRKey in DCRCredentials.Key
makes the persisted blob self-describing, so a Redis SCAN, an admin
dump, or a cross-replica reconciliation path can identify a record's
logical cache slot from the value alone — but it diverges from the
package's "key + value" convention enough that future readers will
ask why.

Add a "Why the key is embedded" section to the interface doc that
states the rationale and points at the Memory backend's validation
contract for the populate-Key requirement.

Addresses #5186 review F6 (MEDIUM, score 7):
signature/naming inconsistent with sibling Storage methods. The
reviewer offered two paths — rename to GetDCR/StoreDCR + drop Key,
or document the rationale. Documenting was selected because the
embedded-Key shape is load-bearing for the Phase 3 Redis backend.
The existing TestMemoryStorage_ConcurrentAccess fan-out covers
authcodes, tokens, and clients but not the new DCRCredentialStore
methods, and every other DCR test runs single-goroutine. A regression
that dropped the RWMutex or returned an internal pointer instead of a
defensive copy would not be caught by the suite under -race.

Add a dedicated TestMemoryStorage_DCRCredentials_ConcurrentAccess that
fans out 16 workers × 200 ops, alternating Store / Get against an
overlapping key space (lock must serialise writes) and a disjoint key
space (reads must always hit). Bounded by a 10s fail-fast deadline so a
deadlock regression surfaces with a clear message rather than the
global test timeout. Mirrors the runner-side
TestInMemoryDCRResolutionCache_ConcurrentAccess pattern.

Addresses #5186 review F7 (LOW-MEDIUM, score 7):
missing concurrent-access test for new DCR Store/Get methods.
@tgrunnagle tgrunnagle marked this pull request as ready for review May 6, 2026 16:54
@tgrunnagle
Copy link
Copy Markdown
Contributor Author

Addressed all 8 findings from the multi-agent review (pullrequestreview-4237822982).

ID Finding Resolution
F1 (HIGH, process) PR body / diff mismatch Body rewritten to describe only what's in this PR; forward-looking notes moved to "Special notes for reviewers".
F2 (HIGH, score 9) Naming collision with runner.DCRCredentialStore Renamed runner-side interface to lowercase dcrResolutionCache (and inMemoryDCRResolutionCache / newInMemoryDCRResolutionCache). Storage-side DCRCredentialStore retains the canonical name for the future Redis backend. Commit 96eac7b4.
F3 (MEDIUM, score 8) var scopesHash is a mutable runtime binding Removed the var; runner call sites now invoke storage.ScopesHash directly. Commit 29d1c917.
F4 (MEDIUM, score 9) StoreDCRCredentials validation narrower than the doc contract Now also rejects empty Key.ScopesHash, ClientID, AuthorizationEndpoint, TokenEndpoint. ClientSecret left permissive (RFC 7591 §2 public clients). Table-driven test extended to cover all seven invalid-input cases. Commit 012e2b30.
F5 (MEDIUM, score 8) "Bounded by upstream count" comment incorrect when scopes change Both parallel comments (memory.go:97 and runner/dcr_store.go:61) now state growth is upstream count × distinct-scope-sets-ever-registered, with the Redis SetEX TTL mitigation called out. Commit 22ab153e.
F6 (MEDIUM, score 7) Signature/naming inconsistent with sibling Storage methods Documented the rationale on the interface — embedding DCRKey in DCRCredentials makes the persisted blob self-describing for Redis SCAN / admin tooling, which is load-bearing for sub-issue 2. Commit 72337f47.
F7 (LOW-MEDIUM, score 7) Missing concurrent-access test for new methods Added TestMemoryStorage_DCRCredentials_ConcurrentAccess — 16 workers × 200 ops over overlapping and disjoint key spaces, fail-fast deadline, passes under -race. Commit 409a66e0.
F8 (MEDIUM, process) PR marked DRAFT but review requested PR is now out of draft.

task lint-fix clean (0 issues); go test -count=1 -race ./pkg/authserver/... green across all packages.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels May 6, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Persistent DCRCredentialStore: types, interface, and in-memory backend

1 participant