Skip to content

Expose explicit primaryUpstreamProvider for Cedar authz on VirtualMCPServer#5199

Open
tgrunnagle wants to merge 6 commits intomainfrom
authz-provider_issue_5197
Open

Expose explicit primaryUpstreamProvider for Cedar authz on VirtualMCPServer#5199
tgrunnagle wants to merge 6 commits intomainfrom
authz-provider_issue_5197

Conversation

@tgrunnagle
Copy link
Copy Markdown
Contributor

@tgrunnagle tgrunnagle commented May 5, 2026

Summary

A VirtualMCPServer with multiple upstream IDPs configured on its embedded auth server has no way to point Cedar at a non-first upstream — the operator unconditionally binds PrimaryUpstreamProvider to whichever upstream happens to be listed first in the spec. This forces users to reorder the list as a side-channel for an authz decision and offers no opt-out beyond editing the upstream order. This PR exposes an explicit primaryUpstreamProvider field on the inline authz config so users can pin Cedar's claim source, while preserving existing first-upstream behavior when the field is empty so existing manifests continue to work unchanged. Behavior brings Cedar/authz to parity with the SubjectProviderName precedent already established on the token-exchange and AWS-STS strategies of MCPExternalAuthConfig.

Closes #5197

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

The new primaryUpstreamProvider field is optional with omitempty. Existing manifests without the field continue to use the first-upstream auto-binding fallback verbatim.

Changes

File Change
cmd/thv-operator/api/v1beta1/mcpserver_types.go Add optional PrimaryUpstreamProvider to shared InlineAuthzConfig (vMCP-only in practice, mirrors SubjectProviderName).
cmd/thv-operator/api/v1beta1/virtualmcpserver_types.go Add ConditionReasonAuthzUpstreamUnknown for the new rejection paths.
cmd/thv-operator/pkg/vmcpconfig/converter.go Replace unconditional first-upstream binding with explicit-then-fallback resolution; both branches normalize via authserver.ResolveUpstreamName.
cmd/thv-operator/controllers/virtualmcpserver_controller.go Reject explicit names that don't match any declared upstream and reject explicit names without an embedded AS (AuthServerConfigValidated=False, AuthzUpstreamUnknown); suppress the AuthzUpstreamSelectionWarning advisory when an explicit choice is set; advisory message now points users at the new field.
cmd/thv-operator/controllers/virtualmcpserver_controller_test.go Validator cases for explicit-match, explicit-suppresses-multi-upstream-advisory, explicit-no-match rejection, and explicit-without-AS rejection.
cmd/thv-operator/pkg/vmcpconfig/converter_test.go Converter cases for explicit single-upstream, explicit override of multi-upstream, normalization through ResolveUpstreamName ("" upstream + "default" explicit), and explicit-forwarded-without-AS contract.
deploy/charts/operator-crds/files/crds/*.yaml, templates/*.yaml, docs/operator/crd-api.md Regenerated CRD manifests and API reference for the new field.

Does this introduce a user-facing change?

Yes. Users authoring VirtualMCPServer manifests can now set spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider to explicitly choose which upstream IDP's access token claims Cedar evaluates. When unset, behavior is unchanged: the controller defaults to the first configured upstream and emits the existing AuthzUpstreamSelectionWarning advisory if multiple upstreams are declared. Setting the field to a name that does not match any declared upstream — or setting it without an embedded auth server — now rejects the spec at admission rather than failing silently at request time.

Implementation plan

Approved implementation plan

Sub-issue of #5146, addressing root cause 1: the operator auto-binds PrimaryUpstreamProvider to the first upstream provider's name without exposing a CRD field that lets users override or opt out.

  1. CRD type: Add PrimaryUpstreamProvider string to the shared InlineAuthzConfig (matches the SubjectProviderName precedent on token-exchange/AWS-STS strategies — also defined on shared types but documented as vMCP-only).
  2. Converter wiring: Replace the unconditional first-upstream assignment with explicit-then-fallback. Run ResolveUpstreamName on the user-supplied value too, so the field is normalized consistently with the auto-selected path.
  3. Validation (validateAuthzUpstreamAvailable): Reject (SpecValidationError, AuthServerConfigValidated=False) when an explicit PrimaryUpstreamProvider is set but does not match any entry in Spec.AuthServerConfig.UpstreamProviders after ResolveUpstreamName normalization on both sides. Suppress the AuthzUpstreamSelectionWarning advisory when the user has set the field explicitly. Leave the existing "embedded AS but zero upstreams" reject path unchanged. (Code review added: also reject when an explicit name is set with no embedded auth server at all.)
  4. Tests: Converter and validator cases for the new branches.
  5. Generated artifacts: task operator-generate, task operator-manifests, task -d cmd/thv-operator crdref-gen.
  6. Backward compatibility: Existing manifests without the new field keep current behavior — the converter's fallback branch is unchanged for that path.

Special notes for reviewers

  • The new field lives on the shared InlineAuthzConfig type (also reachable from MCPServer and MCPRemoteProxy) but is documented as vMCP-only and has no effect elsewhere. This matches the SubjectProviderName precedent. The alternative — introducing a vMCP-specific VirtualMCPInlineAuthzConfig and AuthzConfigRef — was rejected as a larger refactor with knock-on changes in the converter, controllerutil.GenerateOrUpdateInlineAuthzConfigMap, and tests. Happy to revisit if reviewers prefer the stricter typing.
  • Out of scope for this PR: the Cedar opaque-token fallback fix (root cause 3 in VirtualMCPServer Cedar authorization rejects all requests when upstream OIDC provider issues opaque access tokens #5146) lives in pkg/authz/authorizers/cedar/core.go and will be a separate change. This PR only makes the auto-binding overridable.
  • The second commit (Address code review feedback) reflects feedback from /review-code-for-issue and adds the "explicit name without embedded AS" rejection plus a converter test that actually exercises ResolveUpstreamName normalization (the previous test was misleading because ResolveUpstreamName is the identity function for non-empty input).

Adds an optional primaryUpstreamProvider field to the inline authz config
on VirtualMCPServer so users with multiple upstream IDPs can pin Cedar to
a non-first provider, instead of being silently bound to whichever
upstream happens to be listed first.

Changes for issue #5197:
- Add PrimaryUpstreamProvider to InlineAuthzConfig (shared type, vMCP-only
  in practice, mirroring the SubjectProviderName precedent on the token-
  exchange and AWS-STS strategies).
- Switch the converter from unconditional first-upstream binding to an
  explicit-then-fallback resolution; both branches normalize through
  authserver.ResolveUpstreamName.
- Reject the spec with AuthServerConfigValidated=False
  (AuthzUpstreamUnknown) when the explicit name does not match any
  declared upstream — Cedar would otherwise deny every request at runtime.
- Suppress the AuthzUpstreamSelectionWarning advisory when the user has
  set the field explicitly; the auto-selection it warns about is no
  longer happening.
- Extend converter and validator tests; regenerate CRD YAMLs and API
  docs.

Existing manifests without the new field keep current behavior — the
fallback branch is unchanged for that path.
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels May 5, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.72%. Comparing base (9572f6a) to head (47fb9af).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
...d/thv-operator/controllers/mcpserver_controller.go 28.57% 14 Missing and 1 partial ⚠️
...-operator/controllers/mcpremoteproxy_controller.go 46.66% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5199      +/-   ##
==========================================
+ Coverage   67.65%   67.72%   +0.06%     
==========================================
  Files         607      607              
  Lines       61982    62241     +259     
==========================================
+ Hits        41937    42153     +216     
- Misses      16883    16925      +42     
- Partials     3162     3163       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tgrunnagle tgrunnagle force-pushed the authz-provider_issue_5197 branch from f24fcbe to b5d2821 Compare May 5, 2026 19:55
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels May 5, 2026
Fixed issues from code review:
- MEDIUM: Reject explicit primaryUpstreamProvider when no embedded auth
  server is configured. The early-return direct-IdP branch in
  validateAuthzUpstreamAvailable now checks for a non-empty explicit
  name first and returns SpecValidationError with
  ConditionReasonAuthzUpstreamUnknown when set — closing the silent
  misconfiguration where the converter would forward an unresolvable
  name into Cedar config at runtime.
- MEDIUM: Update the converter block comment so it accurately describes
  both rejection paths (mismatch with declared upstreams AND explicit
  name without an embedded AS), keeping the comment synchronized with
  the validator's behavior per the go-style.md rule.
- MEDIUM: Replace the misleading "is normalized via ResolveUpstreamName"
  converter test with a case that actually exercises normalization
  (upstream Name:"" resolving to "default", user pinning to "default").
  Removes redundancy with the single-upstream-honored case and matches
  the test's claimed assertion.
- MEDIUM: Add validator test case for explicit primaryUpstreamProvider
  with no embedded auth server, locking the new rejection in.
@tgrunnagle tgrunnagle force-pushed the authz-provider_issue_5197 branch from b5d2821 to b1499ed Compare May 5, 2026 20:04
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels May 5, 2026
Copy link
Copy Markdown
Contributor Author

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test

Copy link
Copy Markdown
Contributor Author

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Agent Consensus Review

Agents consulted: kubernetes-operator, security, architecture, test-coverage, go-quality, review-friendliness, general/cross-cutting. Phase 4.5 codex cross-review skipped (CLI not installed).

Consensus Summary

# Finding Consensus Severity Action
F1 Field on shared InlineAuthzConfig is silent no-op on MCPServer/MCPRemoteProxy 10/10 MEDIUM Fix
F2 validateAuthzUpstreamAvailable size + 3× duplicated rejection sequence 10/10 MEDIUM Fix
F3 Helper in wrong package; converter inlines duplicate 9/10 MEDIUM Fix
F4 Converter forwards explicit name without AS — defense in depth 9/10 MEDIUM Fix
F5 ConfigMap-sourced authz bypasses validation, no tests, TODO not tracked 9/10 MEDIUM Fix
F6 Doc comment on validateAuthzUpstreamAvailable not updated for new behavior 8/10 MEDIUM Fix
F7 No test for clearing stale AuthzUpstreamUnknown condition 8/10 MEDIUM Fix
F8 Mutable matched bool — use slices.ContainsFunc 8/10 LOW Fix
F9 Stale comment "Leaving PrimaryUpstreamProvider empty…" in converter 8/10 LOW Fix
F10 ConditionReasonAuthzUpstreamUnknown collapses two distinct paths 7/10 LOW Fix
F11 Free-string field not pre-validated against DNS-label regex 7/10 LOW Fix
F12 kubebuilder doc omits inline-only constraint 7/10 LOW Fix
F13 Helper name verbose / misleading 7/10 LOW Fix
F14 PR description missing "where to focus" hint 7/10 LOW Body-only

Overall

The PR is well-scoped and addresses a real, documented user pain (Cedar evaluating against the wrong upstream IDP because PrimaryUpstreamProvider was hard-bound to whichever upstream was listed first). The core inline-authz fix is correct, idiomatic, and well-tested for the inline path; the converter switch + validator rejection branches form a sensible explicit-then-fallback shape, and the multi-upstream advisory is correctly suppressed when the user disambiguates. The SubjectProviderName precedent invoked in the PR description is genuine and the field/condition naming is consistent with it. Reviewing as draft → COMMENT-level event (also forced because the reviewer is the PR author).

The MEDIUM findings cluster around three architectural choices worth deciding before flipping out of draft. F1 (shared InlineAuthzConfig exposes the field on MCPServer and MCPRemoteProxy as a silent no-op) is the most user-visible — neither admission nor a status condition tells operators their override has no effect on those CRDs. F4 (converter forwards an explicit name even when no AS is configured) reduces defense in depth: the validator is the only thing standing between a misconfigured spec and Cedar's deny-every-request fail-closed behavior. F5 (ConfigMap-sourced authz silently bypasses the new admission validator) is the most future-incident-prone gap; the TODO is acknowledged but not tracked.

The remaining findings are mechanical: the validator function has tripled its rejection-sequence boilerplate (F2), the helper is in the wrong package and the converter inlines its logic (F3), one comment is now stale (F9), one doc block is missing the new behavior (F6), one mutable-bool loop violates the immutable-assignment rule (F8). F10–F13 are polish: a single condition reason for two distinct paths, no kubebuilder Pattern on the upstream-name string, doc could mention the inline-only constraint, and the helper name is verbose. None block correctness on the inline-authz happy path.

Documentation

  • cmd/thv-operator/controllers/virtualmcpserver_controller.go:569 — function doc on validateAuthzUpstreamAvailable does not describe the two new rejection branches (F6).
  • cmd/thv-operator/pkg/vmcpconfig/converter.go:199 — "Leaving PrimaryUpstreamProvider empty (no embedded AS or no upstreams)…" comment is no longer accurate after the explicit-forwarding switch (F9).
  • cmd/thv-operator/api/v1beta1/mcpserver_types.go:783 — kubebuilder marker doesn't mention the field is only honored when AuthzConfigRef.Type == 'inline' (F12).
  • F14 (PR-level): This PR exceeds the repo's stated 400 LOC / 10-file guideline because the bulk is generated CRD YAML + auto-generated docs. Per .claude/rules/pr-creation.md, large PRs are acceptable for generated code, but a one-sentence "Where to focus" hint under Special notes would convert the apparent 13-file review into the actual 4 hand-written files (controller, converter, two API types) plus 2 test files.

Generated with Claude Code

@tgrunnagle
Copy link
Copy Markdown
Contributor Author

Inline review detail (companion to the consensus review above)

GitHub's review-comment API is returning persistent 422 pull_request_review_thread.base "internal error" for every inline-comment attempt against this PR right now (verified with single-comment, bisected-half, and position-based payloads). Posting the per-line detail here so it isn't lost. Re-attempt with gh api repos/stacklok/toolhive/pulls/5199/reviews --method POST --input post.json later if you'd like the comments anchored to lines.

Pinned to head b1499ed66b7c8b1d5039f5bc709f61b7fb32baf7.


F1 · cmd/thv-operator/api/v1beta1/mcpserver_types.go:785 · 10/10 MEDIUM

Field on shared InlineAuthzConfig is a silent no-op on MCPServer / MCPRemoteProxy

By adding the field to the shared Go type, the new primaryUpstreamProvider now appears in the OpenAPI schema for MCPServer and MCPRemoteProxy (visible in the regenerated CRD YAMLs). The doc says "Ignored by MCPServer and MCPRemoteProxy", but neither admission validation nor a controller status condition rejects/warns when the field is set on those CRDs — kubectl apply succeeds with no signal that the value has no effect. Per .claude/rules/operator.md Status Condition Parity rule ("a gap means one type silently accepts invalid config that the other rejects") and go-style.md Interface Design / No silent no-ops.

Forward-compatibility risk: if the field is later wired up on those CRs, an existing manifest will silently change behavior at upgrade.

Recommendation (one of):

  1. Move the field onto a vMCP-only authz type so the schema doesn't even expose it on the other CRDs.
  2. Add a +kubebuilder:validation:XValidation rule on MCPServer / MCPRemoteProxy that rejects a non-empty value, with a message pointing at VirtualMCPServer.
  3. Add a status-warning condition on the MCPServer / MCPRemoteProxy reconcile when the field is set.

The SubjectProviderName precedent from MCPExternalAuthConfig is acknowledged in the PR description; the precedent itself is worth re-examining rather than being the bar.

Raised by: kubernetes-operator, security, architecture, review-friendliness, general


F12 · cmd/thv-operator/api/v1beta1/mcpserver_types.go:783 · 7/10 LOW

kubebuilder marker doc omits the inline-only constraint

The doc explains upstream-name semantics, the auto-default, the rejection condition, and which CRDs ignore the field. It does not mention that the field can only ever take effect when the parent AuthzConfigRef.Type == 'inline'. The XValidation on the parent gates inline set/unset, but a user reading just this field's kubectl explain output has no breadcrumb.

Recommendation: Add a sentence such as: "Only honored when the parent AuthzConfigRef.Type == 'inline'; on Type == 'configMap' this value is currently not loaded (see TODO in pkg/vmcpconfig/converter.go)." Then regenerate CRD YAML and docs/operator/crd-api.md.

Raised by: kubernetes-operator


F11 · cmd/thv-operator/api/v1beta1/mcpserver_types.go:784 · 7/10 LOW

Free-string upstream-name field is not pre-validated against the DNS-label regex

pkg/authserver/config.go defines upstreamNameRegex = ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ for upstream names (intentional, per the comment "to prevent storage key injection"). The new CRD field accepts any string and relies on the controller-side match-loop to fail closed. This works (a bogus value cannot match a declared upstream and produces the rejection), but the user gets a misleading "does not match any upstream" message instead of a clearer "name is structurally invalid" — and kubectl apply succeeds before the controller even sees the spec.

Recommendation: Add a +kubebuilder:validation:Pattern matching the same regex used on the upstream side. This pushes the well-formedness check from the controller into kube-apiserver admission so kubectl apply fails immediately on a malformed value.

	// +optional
	// +kubebuilder:validation:Pattern=`^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`
	PrimaryUpstreamProvider string `json:"primaryUpstreamProvider,omitempty"`

Raised by: security, architecture


F10 · cmd/thv-operator/api/v1beta1/virtualmcpserver_types.go:407 · 7/10 LOW

ConditionReasonAuthzUpstreamUnknown collapses two distinct rejection paths

The same reason is set for two different misconfigurations: (a) primaryUpstreamProvider set without an embedded AS, and (b) primaryUpstreamProvider does not match any declared upstream. The user-facing .Message differs and is helpful, but .Reason is what tooling typically filters on (alertmanager rules, dashboards, kubectl JSON queries) — coalescing the two cases under one reason makes alerting/routing rules less precise.

Recommendation: Split into two reasons such as AuthzPrimaryProviderRequiresAuthServer (no-AS case) and AuthzUpstreamUnknown (mismatch case). If the split is rejected for admission-API stability, expand the const's doc comment to enumerate both meanings explicitly so consumers know to also key off the message.

Raised by: kubernetes-operator, security


F3 · cmd/thv-operator/controllers/virtualmcpserver_controller.go:562 · 9/10 MEDIUM

Helper lives in wrong package; converter inlines a duplicate of the same logic

extractExplicitPrimaryUpstreamProvider is package-private to controllers/. The converter in pkg/vmcpconfig/converter.go:213-215 needs the same lookup but cannot import controllers (cycle), so it inlines the equivalent two-level nil-walk + non-empty check. Two parallel implementations of the same accessor must stay in sync; the planned configMap loader (TODO at converter.go:210) would then need to be applied in two places. Violates go-style.md "Avoid parallel types that drift" at the function level.

Recommendation: Move to api/v1beta1 as a method on *AuthzConfigRef, mirroring MCPGroupRef.GetName() at mcpserver_types.go:756-762:

func (r *AuthzConfigRef) ExplicitPrimaryUpstreamProvider() string {
    if r == nil || r.Inline == nil {
        return ""
    }
    return r.Inline.PrimaryUpstreamProvider
}

Then both the validator (here) and the converter (in vmcpconfig) call the same method. Eliminates the inline duplicate and gives the future configMap-aware variant one obvious place to live.

Raised by: architecture, go-quality, review-friendliness, general

F13 · same line · 7/10 LOW

Helper name extractExplicitPrimaryUpstreamProvider is verbose and extract is misleading

The helper is a 4-line nil-safe field accessor — "extract" implies parsing or transforming. Rename to explicitPrimaryUpstreamProvider, or — paired with F3 — convert to the *AuthzConfigRef method shown above.

Raised by: go-quality, review-friendliness


F6 · cmd/thv-operator/controllers/virtualmcpserver_controller.go:569 · 8/10 MEDIUM

Function doc on validateAuthzUpstreamAvailable not updated for the new behavior

The function-level comment block describes only the original behaviors (require an upstream when an embedded AS is configured, ignore direct-IdP, advisory on multi-upstream). After this PR the validator also rejects (a) explicit primaryUpstreamProvider set with no embedded AS, and (b) explicit name not matching any declared upstream. Per .claude/rules/go-style.md "Keep Comments Synchronized With Code", the doc should describe what the function does today.

Recommendation: Add 1-2 sentences. Suggested:

"When spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider is set explicitly, the validator additionally rejects (a) the direct-IdP case (no embedded AS) because the field is meaningless without an AS, and (b) any name that does not resolve to one of spec.authServerConfig.upstreamProviders."

Raised by: go-quality


F2 · cmd/thv-operator/controllers/virtualmcpserver_controller.go:628 (and 643-677, 695-724) · 10/10 MEDIUM

3× duplicated rejection sequence; function is now ~190 LOC

The same 7-line sequence — RemoveConditionsWithPrefix → INFO log → SetPhase(Failed)SetMessageSetAuthServerConfigValidatedCondition(Reason, msg, False)SetObservedGeneration → return &SpecValidationError{...} — appears in three rejection branches: the existing "no upstreams" branch (lines 643-677), the new "explicit provider without AS" branch (~609-640), and the new "explicit provider doesn't match" branch (lines 695-724).

Drift risk: a fix to one branch's condition reason or phase that misses the others; cognitive load on every cold reviewer who has to compare three near-duplicate blocks; future fourth/fifth rejection reasons compound the problem.

Recommendation: Extract a helper such as:

func rejectAuthzAdmission(
    ctx context.Context, vmcp *mcpv1beta1.VirtualMCPServer,
    statusManager virtualmcpserverstatus.StatusManager,
    reason, userMessage, errSummary string,
    extraLogFields ...any,
) *SpecValidationError {
    statusManager.RemoveConditionsWithPrefix(mcpv1beta1.ConditionTypeAuthzUpstreamSelectionWarning, []string{})
    log.FromContext(ctx).Info(errSummary, append([]any{"name", vmcp.Name, "namespace", vmcp.Namespace, "reason", reason}, extraLogFields...)...)
    statusManager.SetPhase(mcpv1beta1.VirtualMCPServerPhaseFailed)
    statusManager.SetMessage(userMessage)
    statusManager.SetAuthServerConfigValidatedCondition(reason, userMessage, metav1.ConditionFalse)
    statusManager.SetObservedGeneration(vmcp.Generation)
    return &SpecValidationError{Message: errSummary}
}

Each call site collapses to ~3 lines (compute message + delegate) and the pre-existing "no upstreams" branch becomes a fourth caller.

Raised by: architecture, go-quality, review-friendliness, general


F8 · cmd/thv-operator/controllers/virtualmcpserver_controller.go:687 · 8/10 LOW

Mutable matched bool conflicts with the "Immutable Variable Assignment" rule

The matched := false / for { if cond { matched = true; break } } / if !matched shape is exactly what .claude/rules/go-style.md ("Immutable Variable Assignment") asks contributors to avoid. slices.ContainsFunc is in the Go stdlib and reduces this to one immutable expression.

		matched := slices.ContainsFunc(
			vmcp.Spec.AuthServerConfig.UpstreamProviders,
			func(up mcpv1beta1.UpstreamProviderConfig) bool {
				return authserver.ResolveUpstreamName(up.Name) == resolved
			},
		)

Add "slices" to the import block if not already present.

Raised by: architecture, go-quality


F9 · cmd/thv-operator/pkg/vmcpconfig/converter.go:199 · 8/10 LOW

Comment block "Leaving PrimaryUpstreamProvider empty (no embedded AS or no upstreams)…" is no longer accurate post-PR

The pre-existing comment claims "no embedded AS" produces an empty PrimaryUpstreamProvider. After the new switch case, an explicit value is forwarded into the runtime config even when vmcp.Spec.AuthServerConfig == nil (the converter test "explicit primary provider without auth server is forwarded" explicitly locks this in). The new comment block below mentions the explicit override but doesn't retract this stale claim. Per go-style.md "Keep Comments Synchronized With Code".

Recommendation: Tighten the original sentence to describe the post-PR contract precisely. For example:

"Leaving PrimaryUpstreamProvider empty (no upstreams configured AND no explicit override) lets Cedar fall back to claims from the ToolHive-issued token. When the user has set primaryUpstreamProvider explicitly, it is forwarded verbatim — validateAuthzUpstreamAvailable rejects any combination where the runtime would not resolve."

Raised by: security, go-quality


F5 · cmd/thv-operator/pkg/vmcpconfig/converter.go:210 · 9/10 MEDIUM

ConfigMap-sourced authz silently bypasses the new validator + no test coverage + TODO not tracked

The extractExplicitPrimaryUpstreamProvider helper returns "" for Type == "configMap" (no inline). The new admission rejections therefore never fire for configMap users, even though the runtime ConfigMap loader at cmd/thv-operator/pkg/controllerutil/authz.go does unmarshal cedar.ConfigOptions.PrimaryUpstreamProvider from disk — so a user can ship an unresolvable name through that path and only see runtime denials. There's also no validator test pinning the documented "fall through to auto-select" behavior.

This TODO is also not linked to a tracking issue and is paired with another configMap TODO at line 192 (Cedar policies from ConfigMap), so the gap is broader than just this field.

Recommendation:

  1. Open a tracking issue covering both configMap TODOs and reference it from the comment.
  2. Add a validator test case with AuthzConfigRef{Type: "configMap", ConfigMap: ...} + multi-upstream that asserts no error + advisory IS emitted (locks the inline-only contract).
  3. Optional but valuable: emit an advisory condition on configMap-sourced authz when multi-upstream is configured, since the user has no way to override auto-select today and the parent issue's wrong-claim-source incident applies equally.

Raised by: kubernetes-operator, security, architecture, test-coverage


F4 · cmd/thv-operator/pkg/vmcpconfig/converter.go:213 · 9/10 MEDIUM

Converter forwards explicit primaryUpstreamProvider even when no AuthServerConfig is configured — single point of defense

This switch case unconditionally forwards Inline.PrimaryUpstreamProvider whenever it's non-empty — including when vmcp.Spec.AuthServerConfig == nil. The converter test "explicit primary provider without auth server is forwarded" explicitly locks this in as a defined contract. The validator at validateAuthzUpstreamAvailable rejects this combination, but if validation is ever bypassed (a future caller invoking Convert outside the reconcile flow — CLI dry-run, webhook, test harness, refactor reordering), the runtime would receive an unresolvable name with no upstream-token table to look it up in.

Cedar's resolveClaims in pkg/authz/authorizers/cedar/core.go does fail closed at runtime (returns an error → denies request), but that's "deny every request" rather than "fail loudly at admission" — the precise failure mode the validator was added to prevent. .claude/rules/security.md calls for defense in depth here, and go-style.md "Validate Parsed Results — successful parse ≠ valid for use" applies.

Recommendation (in order of preference):

  1. Add a defensive cross-check inside the converter that returns an error if the explicit name does not match a declared upstream (so the converter cannot produce an unresolvable value even if invoked outside the reconcile flow).
  2. Drop the value (fall through to auto-select / nothing) when the spec is internally inconsistent; preserve the validator as the primary user-facing fail-loud point.
  3. At minimum, document the precondition prominently on Converter.Convert per go-style.md "Document Architectural Constraints on Exported Functions".

Raised by: security, architecture, general


F7 · cmd/thv-operator/controllers/virtualmcpserver_controller_test.go:3718 · 8/10 MEDIUM

No test verifying the validator clears a stale AuthServerConfigValidated=False, reason=AuthzUpstreamUnknown condition after the spec is fixed

TestVirtualMCPServerValidateAuthzUpstreamAvailable_ClearsStaleWarning (just below at line 3801) covers the recovery path for the advisory AuthzUpstreamSelectionWarning condition. There is no parallel test for the new AuthServerConfigValidated=False, reason=AuthzUpstreamUnknown failure condition. A regression where the validator forgets to transition the condition back to True (or otherwise clear it) on a corrected spec would silently leave VMCPs stuck in Failed phase forever — the unit test for the rejection would still pass.

Recommendation: Add a test analogous to _ClearsStaleWarning that pre-populates the VMCP status with AuthServerConfigValidated=False, reason=AuthzUpstreamUnknown and Phase=Failed, then runs the validator with a corrected spec (e.g., primaryUpstreamProvider updated from "ping""okta" matching a real upstream), and asserts the condition transitions to True and the phase is no longer Failed. This locks the recoverability contract.

Raised by: test-coverage


Generated with Claude Code

tgrunnagle added 3 commits May 6, 2026 11:39
Addresses #5199 review comments:
- MEDIUM cmd/thv-operator/api/v1beta1/mcpserver_types.go:785 (F1):
  surface advisory condition AuthzPrimaryUpstreamProviderIgnored on
  MCPServer/MCPRemoteProxy reconcilers when primaryUpstreamProvider is
  set; the field has no embedded auth server to honor it.
- MEDIUM cmd/thv-operator/controllers/virtualmcpserver_controller.go:562
  (F3): move the explicit-provider lookup to AuthzConfigRef.
  ExplicitPrimaryUpstreamProvider() so the validator and the converter
  share one accessor.
- LOW mcpserver_types.go:784 (F11): add kubebuilder Pattern matching the
  upstream-name regex from pkg/authserver/config.go so kubectl apply
  rejects malformed values up-front.
- LOW mcpserver_types.go:783 (F12): doc the inline-only constraint and
  point at the configMap TODO in vmcpconfig/converter.go.
- LOW virtualmcpserver_controller.go:562 (F13): renamed via the move in
  F3 — the new method has the cleaner name.
Addresses #5199 review comments:
- MEDIUM cmd/thv-operator/controllers/virtualmcpserver_controller.go:628
  (F2): extract rejectAuthzAdmission to dedupe the boilerplate (clear
  advisory + log + Phase=Failed + condition + ObservedGeneration +
  return *SpecValidationError) used by the two new authz rejection
  branches.
- MEDIUM virtualmcpserver_controller.go:569 (F6): expand the function
  doc to describe the new explicit-primaryUpstreamProvider rejections
  alongside the original behaviors.
- LOW virtualmcpserver_controller.go:687 (F8): replace the
  matched-bool/break loop with slices.ContainsFunc.

The "no upstreams declared" branch (line 635) is intentionally not
folded into the helper because it returns a plain error (causing
requeue) rather than *SpecValidationError (no requeue) — preserving
the existing semantics for that branch.
Addresses #5199 review comments:
- MEDIUM cmd/thv-operator/pkg/vmcpconfig/converter.go:213 (F4): add a
  defensive cross-check so Convert refuses to produce an unresolvable
  PrimaryUpstreamProvider — even if invoked outside the reconcile
  flow (CLI dry-run, webhook, test harness). Errors when the explicit
  name has no embedded auth server, or doesn't match any declared
  upstream. Validator remains the primary user-facing fail-loud point.
- LOW converter.go:199 (F9): retract the stale "no embedded AS leaves
  PrimaryUpstreamProvider empty" claim and describe the post-PR
  contract precisely.
- LOW cmd/thv-operator/api/v1beta1/virtualmcpserver_types.go:407 (F10):
  introduce ConditionReasonAuthzPrimaryProviderRequiresAuthServer for
  the no-AS case and keep ConditionReasonAuthzUpstreamUnknown for the
  mismatch case, so dashboards/alerts can route the two
  misconfigurations separately. Updated the validator branch and the
  matching test to use the new reason.

Updated converter test "explicit primary provider without auth server
is forwarded" → "is rejected" to reflect the new fail-closed contract,
and added a sibling test for the upstream-mismatch case.
Addresses #5199 review comments:
- MEDIUM cmd/thv-operator/pkg/vmcpconfig/converter.go:210 (F5):
  - File tracking issue #5208 covering both configMap
    TODOs (policy load + primaryUpstreamProvider override) and
    reference it from the comments.
  - Add validator test
    TestVirtualMCPServerValidateAuthzUpstreamAvailable_ConfigMapFallThrough
    that pins the documented inline-only contract: configMap-sourced
    authz never trips the new admission rejections, the multi-upstream
    advisory still fires, and configMap users get auto-selection of
    the first upstream.
- MEDIUM cmd/thv-operator/controllers/virtualmcpserver_controller_test.go
  :3718 (F7): add
  TestVirtualMCPServerValidateAuthzUpstreamAvailable_ClearsStaleAuthzUnknown
  table with both new failure reasons. Mirrors the production
  reconcile order (validateAuthServerConfig → validateAuthzUpstreamAvailable)
  to lock in the recoverability contract: a corrected spec
  transitions AuthServerConfigValidated back to True and the stale
  rejection reason does not survive.
@tgrunnagle
Copy link
Copy Markdown
Contributor Author

Addressed all 13 findings from the consensus review across 4 commits.

Finding Severity Commit
F1 (silent no-op on MCPServer/MCPRemoteProxy) MEDIUM c15f615
F3 (helper to AuthzConfigRef method) MEDIUM c15f615
F11 (kubebuilder Pattern marker) LOW c15f615
F12 (inline-only doc) LOW c15f615
F13 (helper rename) LOW c15f615
F2 (extract rejectAuthzAdmission helper) MEDIUM da56b33
F6 (validator doc updated) MEDIUM da56b33
F8 (slices.ContainsFunc) LOW da56b33
F4 (defensive converter cross-check) MEDIUM 722853a
F9 (stale comment retracted) LOW 722853a
F10 (split ConditionReason) LOW 722853a
F5 (configMap test + tracking issue) MEDIUM 47fb9af
F7 (recovery test) MEDIUM 47fb9af

Notes:

  • F1 went with option 3 (status-warning condition) per maintainer choice — XValidation was the more surgical alternative but soft-signal was preferred to keep the shared InlineAuthzConfig type intact and avoid breaking existing manifests.
  • F4 went with option 1 (defensive cross-check returning error) — Convert now refuses to produce an unresolvable PrimaryUpstreamProvider even if invoked outside the reconcile flow.
  • F10 went with split into two reasonsAuthzPrimaryProviderRequiresAuthServer (no-AS case) and AuthzUpstreamUnknown (mismatch case).
  • F5 went with test + tracking issue only, not the optional advisory condition. Filed vMCP: Load authz config from ConfigMap (policies + primaryUpstreamProvider) #5208 covering both configMap TODOs (policy load + primaryUpstreamProvider override). Both TODOs in the converter now reference vMCP: Load authz config from ConfigMap (policies + primaryUpstreamProvider) #5208.
  • The "no upstreams declared" branch in validateAuthzUpstreamAvailable was intentionally not folded into the F2 helper — it returns a plain error (causing requeue) rather than *SpecValidationError (no requeue), preserving existing semantics.

@tgrunnagle tgrunnagle marked this pull request as ready for review May 6, 2026 20:41
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/M Medium PR: 300-599 lines changed size/L Large PR: 600-999 lines changed labels May 6, 2026
@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large PR: 600-999 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose explicit primaryUpstreamProvider for Cedar authz on VirtualMCPServer

1 participant