Summary
defaultMultiSessionFactory.RestoreSession at pkg/vmcp/session/factory.go:560-567 reconstructs an *auth.Identity from stored session metadata with only the Subject field populated. Token, UpstreamTokens, TokenType, Metadata, and PrincipalInfo are all zero-valued. That partial identity is then passed through makeBaseSession → initOneBackend → backend.NewHTTPConnector and captured by identityRoundTripper in pkg/vmcp/session/internal/backend/mcp_session.go:271.
For deployments configured with Redis-backed session storage and replicas > 1 (an opt-in multi-replica topology supported by the operator and documented in docs/arch/13-vmcp-scalability.md), this manifests as a cross-pod failover failure: after RestoreSession runs on a different pod, every backend tool call using upstreamInject / tokenExchange / aws_sts outgoing auth goes out with an empty UpstreamTokens map and returns 401.
Single-replica deployments with the default LocalSessionDataStorage never hit RestoreSession and are unaffected by the visible symptom.
Once #5323 (and its sibling for mcp_session.go's round-tripper) lands, the captured identity is no longer used to override req.Context(), so the visible failure mode goes away. But the RestoreSession code still constructs and propagates the partial identity to anything that reads it — a landmine for any future consumer (decorator, hook, telemetry, audit log) that assumes a non-nil *auth.Identity is fully populated.
This issue is filed to fix the underlying reconstruction itself, not just its current consumer.
Code reference
pkg/vmcp/session/factory.go:560-567:
// Reconstruct a minimal identity from stored metadata. The original bearer
// token is never persisted (only its HMAC-SHA256 hash is), so Token is empty.
// The security decorator is restored from the stored hash/salt below.
var identity *auth.Identity
if subject := storedMetadata[MetadataKeyIdentitySubject]; subject != "" {
identity = &auth.Identity{}
identity.Subject = subject
}
The comment acknowledges Token is intentionally empty. What the comment does not mention is that the resulting half-populated *auth.Identity is then handed to consumers that have no way to know it's partial.
Downstream propagation:
pkg/vmcp/session/factory.go:580 — makeBaseSession(ctx, id, identity, …)
pkg/vmcp/session/factory.go:446 — initOneBackend(ctx, b, identity, …)
pkg/vmcp/session/factory.go:235 — f.connector(bCtx, target, identity, sessionHint) (where connector is backend.NewHTTPConnector(registry), wired at factory.go:194)
pkg/vmcp/session/internal/backend/mcp_session.go:271 — identityRoundTripper{base: base, identity: identity} (captures the partial identity)
Why this is a separate concern from #5323 and the sibling
#5323 and the sibling issue for mcp_session.go's identityRoundTripper both fix the consumption of stale/partial captured identity — they teach the round-trippers to defer to the per-request identity on req.Context() instead of overriding it.
This issue is about production of the partial identity. Even with both round-tripper fixes in:
- The partial identity still flows through
makeBaseSession and initOneBackend. Any new code on those paths that reads identity.UpstreamTokens or identity.Token will silently get empty values.
- The
MakeSessionWithID path enforces a guard at pkg/vmcp/session/factory.go:368 that rejects bound sessions where identity.Token == "":
if (identity == nil || identity.Token == "") && !allowAnonymous {
return nil, fmt.Errorf("invalid session configuration: cannot create bound session ...")
}
RestoreSession bypasses that guard. The invariant "a bound session has a non-empty Token" — explicitly enforced for new sessions — is silently violated for restored ones.
- The
*auth.Identity type carries an immutability contract in its godoc (pkg/auth/identity.go:64: "MUST NOT be mutated after the Identity is placed in the request context") but no contract about field completeness. There's no compile-time or runtime signal that a returned *auth.Identity is partial.
Risk analysis
| Scenario |
Current behavior |
After #5323 + sibling |
After this fix |
| Tool call after cross-pod failover (Redis + replicas > 1, upstream auth) |
401 (captured empty identity wins) |
Works (context identity wins) |
Works |
Future code reads identity.UpstreamTokens from a defaultMultiSession or decorator |
Silently empty |
Silently empty |
Either gets nil identity (explicit) or correct identity |
Future code reads identity.Token (e.g., for a new outgoing-auth strategy) |
Silently empty |
Silently empty |
Same |
| Audit log / telemetry includes identity fields |
Reports incomplete fields silently |
Reports incomplete fields silently |
Same |
The pattern matches the project rule "Constructor Validation: Fail Loudly on Invalid Input" (.claude/rules/go-style.md) and the spirit of vMCP anti-pattern #8.
Reproduction
The existing e2e test at test/e2e/thv-operator/virtualmcp/virtualmcp_redis_session_test.go:149-243 ("Should allow a session established on pod A to be reconstructed on pod B") covers cross-pod restore but does not exercise upstream-auth tool calls after restore. Extend it:
- Configure a
VirtualMCPServer with replicas: 2, Redis session storage, and one or more MCPServerEntry backends using upstreamInject outgoing auth against an upstream IDP.
- Authenticate through pod A and exercise a tool call — succeeds.
- Kill pod A's vMCP instance (or force the session to evict from pod A's cache).
- Make the same tool call against pod B (which will trigger
RestoreSession).
- Expected: pod B's
TokenValidator.Middleware populates a fresh identity on req.Context(); the tool call succeeds.
- Actual (with current code): the partial identity reconstructed by
RestoreSession is captured by identityRoundTripper and overrides the fresh context; the upstream returns 401.
Suggested fix
Two options, depending on the team's preference.
Option A — Pass nil and preserve Subject metadata explicitly (recommended)
RestoreSession shouldn't fabricate an *auth.Identity. The Subject metadata can be preserved directly without round-tripping through a partial struct. makeBaseSession already guards all identity dereferences with if identity != nil checks (verified end-to-end: factory.go:491-493, identityRoundTripper.RoundTrip:67, RestoreHijackPrevention takes no identity arg).
// Don't fabricate a partial identity. The original bearer is not persisted;
// downstream consumers must read identity from req.Context() (populated by
// TokenValidator.Middleware on each incoming request).
baseSession, err := f.makeBaseSession(ctx, id, nil, filteredBackends, sessionHints)
if err != nil {
return nil, fmt.Errorf("RestoreSession: failed to rebuild backend connections: %w", err)
}
// Preserve subject metadata across restore for audit and future re-restores.
// makeBaseSession creates a fresh transport session, so the key isn't carried
// forward implicitly — re-write it from stored metadata.
if subject := storedMetadata[MetadataKeyIdentitySubject]; subject != "" {
baseSession.SetMetadata(MetadataKeyIdentitySubject, subject)
}
baseSession is *defaultMultiSession, which embeds transportsession.Session and exposes SetMetadata (already used at factory.go:597 for the hijack-prevention hash/salt). No API change.
Option B — Introduce an explicit RestoredIdentity type
If the Subject needs to remain accessible via the same type used in the live-session path, introduce a distinct type that signals "partial":
type RestoredIdentity struct {
Subject string
}
…and change the signatures that need to accept either. More invasive and adds a type that mostly only RestoreSession produces. Option A is preferred unless there's a concrete consumer that needs the Subject through the same code path as a live *auth.Identity.
Either way: add a doc-comment to auth.Identity clarifying that a non-nil *auth.Identity is always fully populated for its session-binding semantics, and that anonymous / partial states are represented by nil or a distinct type — not by a struct with empty fields.
Acceptance criteria
Severity
Related
Summary
defaultMultiSessionFactory.RestoreSessionatpkg/vmcp/session/factory.go:560-567reconstructs an*auth.Identityfrom stored session metadata with only theSubjectfield populated.Token,UpstreamTokens,TokenType,Metadata, andPrincipalInfoare all zero-valued. That partial identity is then passed throughmakeBaseSession→initOneBackend→backend.NewHTTPConnectorand captured byidentityRoundTripperinpkg/vmcp/session/internal/backend/mcp_session.go:271.For deployments configured with Redis-backed session storage and
replicas > 1(an opt-in multi-replica topology supported by the operator and documented indocs/arch/13-vmcp-scalability.md), this manifests as a cross-pod failover failure: afterRestoreSessionruns on a different pod, every backend tool call usingupstreamInject/tokenExchange/aws_stsoutgoing auth goes out with an emptyUpstreamTokensmap and returns 401.Single-replica deployments with the default
LocalSessionDataStoragenever hitRestoreSessionand are unaffected by the visible symptom.Once #5323 (and its sibling for
mcp_session.go's round-tripper) lands, the captured identity is no longer used to overridereq.Context(), so the visible failure mode goes away. But theRestoreSessioncode still constructs and propagates the partial identity to anything that reads it — a landmine for any future consumer (decorator, hook, telemetry, audit log) that assumes a non-nil*auth.Identityis fully populated.This issue is filed to fix the underlying reconstruction itself, not just its current consumer.
Code reference
pkg/vmcp/session/factory.go:560-567:The comment acknowledges
Tokenis intentionally empty. What the comment does not mention is that the resulting half-populated*auth.Identityis then handed to consumers that have no way to know it's partial.Downstream propagation:
pkg/vmcp/session/factory.go:580—makeBaseSession(ctx, id, identity, …)pkg/vmcp/session/factory.go:446—initOneBackend(ctx, b, identity, …)pkg/vmcp/session/factory.go:235—f.connector(bCtx, target, identity, sessionHint)(whereconnectorisbackend.NewHTTPConnector(registry), wired atfactory.go:194)pkg/vmcp/session/internal/backend/mcp_session.go:271—identityRoundTripper{base: base, identity: identity}(captures the partial identity)Why this is a separate concern from #5323 and the sibling
#5323 and the sibling issue for
mcp_session.go'sidentityRoundTripperboth fix the consumption of stale/partial captured identity — they teach the round-trippers to defer to the per-request identity onreq.Context()instead of overriding it.This issue is about production of the partial identity. Even with both round-tripper fixes in:
makeBaseSessionandinitOneBackend. Any new code on those paths that readsidentity.UpstreamTokensoridentity.Tokenwill silently get empty values.MakeSessionWithIDpath enforces a guard atpkg/vmcp/session/factory.go:368that rejects bound sessions whereidentity.Token == "":RestoreSessionbypasses that guard. The invariant "a bound session has a non-empty Token" — explicitly enforced for new sessions — is silently violated for restored ones.*auth.Identitytype carries an immutability contract in its godoc (pkg/auth/identity.go:64: "MUST NOT be mutated after the Identity is placed in the request context") but no contract about field completeness. There's no compile-time or runtime signal that a returned*auth.Identityis partial.Risk analysis
identity.UpstreamTokensfrom adefaultMultiSessionor decoratornilidentity (explicit) or correct identityidentity.Token(e.g., for a new outgoing-auth strategy)The pattern matches the project rule "Constructor Validation: Fail Loudly on Invalid Input" (
.claude/rules/go-style.md) and the spirit of vMCP anti-pattern #8.Reproduction
The existing e2e test at
test/e2e/thv-operator/virtualmcp/virtualmcp_redis_session_test.go:149-243("Should allow a session established on pod A to be reconstructed on pod B") covers cross-pod restore but does not exercise upstream-auth tool calls after restore. Extend it:VirtualMCPServerwithreplicas: 2, Redis session storage, and one or moreMCPServerEntrybackends usingupstreamInjectoutgoing auth against an upstream IDP.RestoreSession).TokenValidator.Middlewarepopulates a fresh identity onreq.Context(); the tool call succeeds.RestoreSessionis captured byidentityRoundTripperand overrides the fresh context; the upstream returns 401.Suggested fix
Two options, depending on the team's preference.
Option A — Pass
niland preserve Subject metadata explicitly (recommended)RestoreSessionshouldn't fabricate an*auth.Identity. The Subject metadata can be preserved directly without round-tripping through a partial struct.makeBaseSessionalready guards all identity dereferences withif identity != nilchecks (verified end-to-end:factory.go:491-493,identityRoundTripper.RoundTrip:67,RestoreHijackPreventiontakes no identity arg).baseSessionis*defaultMultiSession, which embedstransportsession.Sessionand exposesSetMetadata(already used atfactory.go:597for the hijack-prevention hash/salt). No API change.Option B — Introduce an explicit
RestoredIdentitytypeIf the Subject needs to remain accessible via the same type used in the live-session path, introduce a distinct type that signals "partial":
…and change the signatures that need to accept either. More invasive and adds a type that mostly only
RestoreSessionproduces. Option A is preferred unless there's a concrete consumer that needs the Subject through the same code path as a live*auth.Identity.Either way: add a doc-comment to
auth.Identityclarifying that a non-nil*auth.Identityis always fully populated for its session-binding semantics, and that anonymous / partial states are represented bynilor a distinct type — not by a struct with empty fields.Acceptance criteria
RestoreSessionno longer produces a half-populated*auth.Identityand is internally consistent with theToken-required invariant enforced for new sessions inpkg/vmcp/session/factory.go:368.MetadataKeyIdentitySubjectis preserved across restore (regression guard — verify it's set on the restored session when present instoredMetadata).auth.Identitygodoc states the field-completeness contract.test/e2e/thv-operator/virtualmcp/virtualmcp_redis_session_test.go: cross-pod restore followed by a tool call against a backend withupstreamInjectoutgoing auth succeeds end-to-end (depends on [vMCP] identityPropagatingRoundTripper captures stale identity, bypassing per-request upstream token refresh #5323 + sibling fix being in).pkg/vmcp/session/constructs an&auth.Identity{Subject: …}withTokenandUpstreamTokensunset.Severity
replicas > 1+ an upstream-auth strategy: bug. Every backend tool call after a cross-pod restore fails until a fresh MCP session is established. Worth filing asbugand pulling forward with [vMCP] identityPropagatingRoundTripper captures stale identity, bypassing per-request upstream token refresh #5323 + the sibling fix.Related
pkg/vmcp/client/client.go'sidentityPropagatingRoundTripper).pkg/vmcp/session/internal/backend/mcp_session.go'sidentityRoundTripper.docs/arch/13-vmcp-scalability.md— multi-replica deployment topology this affects.