You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The hijackPreventionDecorator in pkg/vmcp/session/internal/security/security.go binds an MCP session to an HMAC hash of the raw bearer-token bytes at session creation, and rejects any subsequent request whose token bytes don't hash to the same value. This treats every legitimate OAuth2 refresh-token grant (RFC 6749 §6) as a session hijack attempt — the new AT has different bytes, the hash mismatches, the session is terminated with ErrUnauthorizedCaller ("caller identity does not match session owner").
The MCP streamable-HTTP transport terminates sessions via HTTP 404, never via token rotation, and requires Authorization on every request "even if they are part of the same logical session." Conformant clients keep the same Mcp-Session-Id across token refreshes — and trigger this bug every time their AT rotates.
Reproduction
Reproduced end-to-end on a kind cluster with the in-repo reproducer at deploy/session-debug/. Key config in 03-vmcp-debug.yaml:
pkg/vmcp/session/factory.go:525 — construction site (PreventSessionHijacking).
pkg/vmcp/server/sessionmanager/factory.go:252-262 — the invocation site: the per-tool SDK handler closure captures the decorated session and calls capturedSess.CallTool, which is where validateCaller actually runs and where the session-termination side-effect is triggered.
The decorator overrides CallTool, ReadResource, GetPrompt. Initialization and listing are unaffected, which makes the failure intermittent from a UX perspective: the client looks healthy until the user actually invokes a tool.
Note: pkg/vmcp/server/adapter/handler_factory.go:97 calls backendClient.CallTool directly (no session decorator in the path) — this is a parallel handler factory used in non-session-managed flows, not the path under discussion. Mentioned to disambiguate.
Why this is a bug, not a defensible design
MCP transport semantics: streamable-HTTP transport (docs/arch/03-transport-architecture.md:141,497,515) treats Mcp-Session-Id as transport state independent of the Authorization header — the spec terminates sessions via HTTP 404, never via token rotation, and requires auth on every request even within a session. Returning Unauthorized: caller identity does not match session owner on a refreshed AT is not a defined session-termination signal.
OAuth conformance: RFC 6749 §6 + the OAuth Security BCP (RFC 9700 §4.14) require short ATs and refresh-token rotation as the default deployment model. Binding session validity to AT bytes is incompatible with that model.
Wrong primitive: if real sender-constraint is required, the standards are DPoP (RFC 9449) and mTLS (RFC 8705), both of which explicitly preserve the binding across refresh (RFC 9449 §5: refresh token MUST be bound to the same key; RFC 8705 §4: refresh token SHOULD be bound to the same certificate). Hashing token bytes provides neither sender-constraint nor refresh-safety.
Marginal protection given up is small: the only attack the current check blocks is "attacker obtained a session ID but not the user's token, and has their own valid token for a different identity." In practice, session ID and bearer token travel in the same HTTP request over the same TLS connection; capturing one almost always means capturing both. Authorization decisions and upstream-token selection are keyed off JWT claims per-request (pkg/auth/token.go:1203, authRoundTripper.Authenticate), not session-bound state.
Proposed fix
Bind sessions to the tsid claim in the signed JWT instead of raw token bytes.
tsid is set by the embedded authserver at pkg/authserver/server/session/session.go:57,116-123, lives in the signed JWT payload (unforgeable without the AS signing key), and is stable across refresh because fosite reuses the persisted authorize-session on refresh_token grants (pkg/authserver/server/handlers/token.go:19-28). Downstream code at pkg/auth/token.go:570-573,1184-1199 already treats it as the stable auth-session identifier (used to look up upstream tokens).
The reproducer in this issue confirmed same tsid and same sub across a refresh_token exchange while same bytes = False.
Replace boundTokenHash with boundSessionID (the tsid value).
Update both PreventSessionHijacking and RestoreHijackPrevention.
Drop the per-session salt + HMAC machinery (no longer needed; tsid is already an unforgeable signed claim).
Fallback for non-embedded-AS deployments (JWT from an external IdP without tsid): bind on iss + sub per OIDC Core §2 (sub is issuer-scoped, not global, so iss is required to disambiguate across multi-IdP deployments). Document explicitly. Use client_id only for audit logging, not for matching.
Keep the anonymous-session-upgrade prevention at security.go:111-119; it does not depend on the token-bytes hash and survives the fix.
Migration
The persisted session metadata schema changes (vmcp.token.hash / vmcp.token.salt → vmcp.session.tsid). Options for existing sessions on upgrade:
Invalidate on read: sessions without the new key are rejected, forcing one re-auth at deploy time. Simple.
Read both, write new: tolerate old-format sessions until they expire naturally. Smoother rollout.
After filing this issue we (a) verified the reproduction end-to-end on a kind cluster using deploy/session-debug/06-repro-session-bind.py (full script self-contained: DCR registration, PKCE auth, MCP initialize, refresh_token grant, tools/call with refreshed AT — the response body contains the exact caller identity does not match session owner text), and (b) consulted a three-expert panel (Go, OAuth, MCP) on the two open design points. Both came back unanimous.
Design change: drop the two-tier proposal, bind on iss + sub only
The issue originally proposed tsid claim (primary) with iss + sub (fallback). Investigation revealed two problems with the two-tier approach:
tsid is filtered out of *auth.Identity.Claims by filterInternalClaims() at pkg/auth/context.go:96-111 for a documented security reason ("exposing tsid widens the attack surface if a webhook receiver is compromised"). Using tsid at the security decorator would require adding a TokenSessionID field to *auth.Identity and populating it from raw claims pre-filter — touching three pkg/auth files for one consumer (violates .claude/rules/vmcp-anti-patterns.mdStart simple hardcoded registry #8).
External IdPs (Keycloak, Entra, Okta) don't emit tsid. For any non-embedded-AS deployment the "fallback" IS the primary path. A two-tier design that's "fallback-only" in production isn't worth the complexity.
iss + sub alone is threat-model adequate. The check defends against "attacker with stolen Mcp-Session-Id + their own token for a different identity" — iss + sub defeats that completely. The granularity tsid would buy (concurrent OAuth sessions for the same user) is same-user-vs-self, not a hijack, and is not in any MCP client's observed behavior.
Migration: invalidate on read, not rebind-on-first-use
The issue presented "invalidate on read" and "read both, write new" as equivalent options. The panel was unanimous that read-both-write-new is unsafe: the rebind-on-first-use window re-creates the exact attack the check exists to prevent — an attacker with a stolen session-id and their own valid token could claim an orphaned legacy session during the window. Invalidate-on-read fails closed: legacy-format sessions are rejected on restore, the client gets the same result.isError=true MCP error it already handles by re-initializing, one forced re-auth per active session at deploy.
Code-site clarification
The decorator's invocation point is pkg/vmcp/server/sessionmanager/factory.go:252-262 (the per-tool SDK handler closure that captures the decorated session and calls capturedSess.CallTool). This is where the bug's session-termination side effect is triggered and where the regression test will exercise the fix. pkg/vmcp/server/adapter/handler_factory.go:97 is a parallel (non-session-managed) path that bypasses the decorator and is not under discussion here.
Plan document
Full implementation plan with commit-level breakdown lives in vmcp-session-binding-fix-plan.md in the repo root (not committed; intended to drive the PR).
Summary
The
hijackPreventionDecoratorinpkg/vmcp/session/internal/security/security.gobinds an MCP session to an HMAC hash of the raw bearer-token bytes at session creation, and rejects any subsequent request whose token bytes don't hash to the same value. This treats every legitimate OAuth2 refresh-token grant (RFC 6749 §6) as a session hijack attempt — the new AT has different bytes, the hash mismatches, the session is terminated withErrUnauthorizedCaller("caller identity does not match session owner").The MCP streamable-HTTP transport terminates sessions via HTTP 404, never via token rotation, and requires
Authorizationon every request "even if they are part of the same logical session." Conformant clients keep the sameMcp-Session-Idacross token refreshes — and trigger this bug every time their AT rotates.Reproduction
Reproduced end-to-end on a kind cluster with the in-repo reproducer at
deploy/session-debug/. Key config in03-vmcp-debug.yaml:The reproducer script
deploy/session-debug/06-repro-session-bind.pyis self-contained:AT₁ + RT₁initializewithAT₁→ capturesMcp-Session-Id(session is now bound tohash(AT₁))tools/list→ 200 (sanity)POST /oauth/tokenwithgrant_type=refresh_token→AT₂. Confirmed: samesub, sametsid, different bytes.tools/call <real-tool>withAT₂and the sameMcp-Session-Id→ bug fires.Observed in the response body (HTTP 200, JSON-RPC
result.isError=true):{"jsonrpc":"2.0","id":3,"result":{"content":[ {"type":"text","text":"Unauthorized: caller identity does not match session owner"} ],"isError":true}}And in the vMCP logs:
The session is removed from storage; the client must re-establish.
Code site
pkg/vmcp/session/internal/security/security.go:65—hashTokenHMACs the raw bearer-token bytes.pkg/vmcp/session/internal/security/security.go:107—validateCallerperforms the comparison.pkg/vmcp/session/internal/security/security.go:145— the sole AT-hash comparison site.pkg/vmcp/session/internal/security/security.go:207-260—RestoreHijackPrevention(cross-pod state restore, Add RestoreHijackPrevention and RestoreSession interface stub #4405) has the same flaw.pkg/vmcp/session/factory.go:525— construction site (PreventSessionHijacking).pkg/vmcp/server/sessionmanager/factory.go:252-262— the invocation site: the per-tool SDK handler closure captures the decorated session and callscapturedSess.CallTool, which is wherevalidateCalleractually runs and where the session-termination side-effect is triggered.The decorator overrides
CallTool,ReadResource,GetPrompt. Initialization and listing are unaffected, which makes the failure intermittent from a UX perspective: the client looks healthy until the user actually invokes a tool.Note:
pkg/vmcp/server/adapter/handler_factory.go:97callsbackendClient.CallTooldirectly (no session decorator in the path) — this is a parallel handler factory used in non-session-managed flows, not the path under discussion. Mentioned to disambiguate.Why this is a bug, not a defensible design
docs/arch/03-transport-architecture.md:141,497,515) treatsMcp-Session-Idas transport state independent of theAuthorizationheader — the spec terminates sessions via HTTP 404, never via token rotation, and requires auth on every request even within a session. ReturningUnauthorized: caller identity does not match session owneron a refreshed AT is not a defined session-termination signal.pkg/auth/token.go:1203,authRoundTripper.Authenticate), not session-bound state.Proposed fix
Bind sessions to the
tsidclaim in the signed JWT instead of raw token bytes.tsidis set by the embedded authserver atpkg/authserver/server/session/session.go:57,116-123, lives in the signed JWT payload (unforgeable without the AS signing key), and is stable across refresh because fosite reuses the persisted authorize-session onrefresh_tokengrants (pkg/authserver/server/handlers/token.go:19-28). Downstream code atpkg/auth/token.go:570-573,1184-1199already treats it as the stable auth-session identifier (used to look up upstream tokens).same tsidandsame subacross arefresh_tokenexchange whilesame bytes = False.boundTokenHashwithboundSessionID(thetsidvalue).PreventSessionHijackingandRestoreHijackPrevention.tsidis already an unforgeable signed claim).tsid): bind oniss + subper OIDC Core §2 (subis issuer-scoped, not global, soissis required to disambiguate across multi-IdP deployments). Document explicitly. Useclient_idonly for audit logging, not for matching.security.go:111-119; it does not depend on the token-bytes hash and survives the fix.Migration
The persisted session metadata schema changes (
vmcp.token.hash/vmcp.token.salt→vmcp.session.tsid). Options for existing sessions on upgrade:Related
Update — implementation plan & design refinement
After filing this issue we (a) verified the reproduction end-to-end on a kind cluster using
deploy/session-debug/06-repro-session-bind.py(full script self-contained: DCR registration, PKCE auth, MCPinitialize,refresh_tokengrant,tools/callwith refreshed AT — the response body contains the exactcaller identity does not match session ownertext), and (b) consulted a three-expert panel (Go, OAuth, MCP) on the two open design points. Both came back unanimous.Design change: drop the two-tier proposal, bind on
iss + subonlyThe issue originally proposed
tsidclaim (primary) withiss + sub(fallback). Investigation revealed two problems with the two-tier approach:tsidis filtered out of*auth.Identity.ClaimsbyfilterInternalClaims()atpkg/auth/context.go:96-111for a documented security reason ("exposing tsid widens the attack surface if a webhook receiver is compromised"). Usingtsidat the security decorator would require adding aTokenSessionIDfield to*auth.Identityand populating it from raw claims pre-filter — touching three pkg/auth files for one consumer (violates.claude/rules/vmcp-anti-patterns.mdStart simple hardcoded registry #8).tsid. For any non-embedded-AS deployment the "fallback" IS the primary path. A two-tier design that's "fallback-only" in production isn't worth the complexity.iss + subalone is threat-model adequate. The check defends against "attacker with stolenMcp-Session-Id+ their own token for a different identity" —iss + subdefeats that completely. The granularitytsidwould buy (concurrent OAuth sessions for the same user) is same-user-vs-self, not a hijack, and is not in any MCP client's observed behavior.Migration: invalidate on read, not rebind-on-first-use
The issue presented "invalidate on read" and "read both, write new" as equivalent options. The panel was unanimous that read-both-write-new is unsafe: the rebind-on-first-use window re-creates the exact attack the check exists to prevent — an attacker with a stolen session-id and their own valid token could claim an orphaned legacy session during the window. Invalidate-on-read fails closed: legacy-format sessions are rejected on restore, the client gets the same
result.isError=trueMCP error it already handles by re-initializing, one forced re-auth per active session at deploy.Code-site clarification
The decorator's invocation point is
pkg/vmcp/server/sessionmanager/factory.go:252-262(the per-tool SDK handler closure that captures the decorated session and callscapturedSess.CallTool). This is where the bug's session-termination side effect is triggered and where the regression test will exercise the fix.pkg/vmcp/server/adapter/handler_factory.go:97is a parallel (non-session-managed) path that bypasses the decorator and is not under discussion here.Plan document
Full implementation plan with commit-level breakdown lives in
vmcp-session-binding-fix-plan.mdin the repo root (not committed; intended to drive the PR).