feat(vmcp): plumb operational.timeouts.default into request timeouts#5205
Open
thomasandal wants to merge 1 commit intostacklok:mainfrom
Open
feat(vmcp): plumb operational.timeouts.default into request timeouts#5205thomasandal wants to merge 1 commit intostacklok:mainfrom
thomasandal wants to merge 1 commit intostacklok:mainfrom
Conversation
…imeout The CRD field `VirtualMCPServer.spec.config.operational.timeouts.default` exists and is documented as "the default timeout for backend requests" but is dead config in v0.26.1: it's parsed (config.go:524), defaulted (defaults.go:54) and validated (validator.go:367) but never read by the request-path code. The actual ceiling is hardcoded at `pkg/vmcp/session/internal/backend/mcp_session.go:35` (`defaultBackendRequestTimeout = 30 * time.Second`) and applied on lines 298-304 to both the streamable-HTTP `http.Client.Timeout` and the mark3labs SDK's `WithHTTPTimeout`. Tool calls that take >30s upstream — long ClickHouse aggregations, deep Perplexity research, heavy GitHub searches — surface as 502 from any nginx in front of vmcp because vmcp closes the backend HTTP client before the response arrives. Plumb the existing config through: - `NewHTTPConnector` and `createMCPClient` accept a `requestTimeout time.Duration`; 0 falls back to the existing 30s constant so callers that pass nothing keep the old behavior. - New `session.WithBackendRequestTimeout` factory option mirrors the existing `WithBackendInitTimeout`. - `NewSessionFactory` pre-applies opts to a sentinel struct so the connector can be constructed with the configured timeout (option funcs are pure setters, so double-application is idempotent). - `cmd/vmcp/cli/serve.go` reads `cfg.Operational.Timeouts.Default` and passes it to `createSessionFactory`, which forwards it via the new option. Nil-safe — falls back to 30s when `Operational` or `Timeouts` is unset, preserving upstream defaults. SSE backends are unaffected: their stream lifetime is unbounded by design and `http.Client.Timeout` is intentionally omitted on that code path. All existing tests pass; updated four call sites in mcp_session_test.go and serve_test.go to pass 0 for the new timeout arg (preserves prior behavior).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The CRD field
VirtualMCPServer.spec.config.operational.timeouts.defaultis documented as "the default timeout for backend requests" but is currently dead config. It's defined (pkg/vmcp/config/config.go:524), defaulted (pkg/vmcp/config/defaults.go:54) and validated (pkg/vmcp/config/validator.go:367), but never read by any request-path code.The actual ceiling is hardcoded in two places:
Outbound — vmcp → backend MCPServer:
pkg/vmcp/session/internal/backend/mcp_session.go:35Applied at lines 298–304 to the streamable-http
http.Client.Timeoutand the mark3labs SDK'sWithHTTPTimeout.Inbound — vmcp's own http.Server:
pkg/vmcp/server/server.go:53,59Applied at lines 696–697 in
Server.Start.User-visible symptom: any tool call that takes >30s upstream returns 502 from any nginx in front of vmcp, even with
operational.timeouts.default: 120sset on the CR. Heavy ClickHouse aggregations, deep Perplexity research, and large GitHub searches are the common cases.I confirmed both layers fire by bisecting in production: 28s queries succeed; 35s queries 502'd until BOTH timeouts were lifted. mcp-clickhouse pod logs show the upstream completing the query in all cases — vmcp hangs up before the response can be relayed.
Change
Plumb
operational.timeouts.default(already atime.Duration-castable custom Duration type) through to both layers:pkg/vmcp/session/internal/backend/mcp_session.go—NewHTTPConnector(registry, requestTimeout)andcreateMCPClient(..., requestTimeout)accept atime.Duration.0falls back to the existing 30s constant. SSE backends are unaffected.pkg/vmcp/session/factory.go— newWithBackendRequestTimeout(d)factory option, mirrors the existingWithBackendInitTimeoutshape.NewSessionFactorypre-applies opts to a sentinel struct so the connector can be constructed with the configured timeout (option funcs are pure setters, double-application is idempotent).pkg/vmcp/server/server.go— newConfig.RequestTimeout. When > 0, overrides bothReadTimeoutandWriteTimeout. SSE GET routes still clear their per-request write deadline viatransportmiddleware.WriteTimeout(net/http: no way of manipulating timeouts in Handler golang/go#16100), so SSE behavior is unchanged.pkg/vmcp/cli/serve.go— readscfg.Operational.Timeouts.Default, passes it tocreateSessionFactory(forwarded via the new option) AND setsserverCfg.RequestTimeout. Nil-safe — defaults preserved whenOperationalorTimeoutsis unset.Tests in
pkg/vmcp/session/internal/backend/mcp_session_test.goandpkg/vmcp/cli/serve_test.gopass0for the new arg to preserve the prior behavior.Test plan
go build ./...— clean.go test ./pkg/vmcp/session/... ./pkg/vmcp/cli/... ./pkg/vmcp/server/...— all green.streamable_proxy.defaultRequestTimeout(already env-overridable viaTOOLHIVE_PROXY_REQUEST_TIMEOUT).Notes
requestTimeout/RequestTimeoutparameters default to0, which preserves the existing 30s constants. Existing callers that don't pass anything see no change.pkg/transport/proxy/streamable/streamable_proxy.go:3560s default (already overridable viaTOOLHIVE_PROXY_REQUEST_TIMEOUTenv on the proxy container). That's a separate hop for a separate PR if anyone wants symmetry.