feat(llm-gateway): add Save Mode cost controls — cache upgrade + budget guard + telemetry#65690
Open
ricardo-leiva wants to merge 3 commits into
Open
feat(llm-gateway): add Save Mode cost controls — cache upgrade + budget guard + telemetry#65690ricardo-leiva wants to merge 3 commits into
ricardo-leiva wants to merge 3 commits into
Conversation
Contributor
|
Reviews (1): Last reviewed commit: "feat(llm-gateway): add cost controls — c..." | Re-trigger Greptile |
9cdf413 to
e4d02ea
Compare
2 tasks
…udget guard, save-mode wiring Implements the gateway side of the Save Mode cost-control feature (alpha, gated by the llm-gateway-cost-controls PostHog flag or COST_CONTROLS_ENABLED=true locally). Changes: - anthropic.py: upgrades Anthropic ephemeral cache blocks to 1-hour TTL when cost controls are enabled; adds Bedrock guard (cache_control is stripped for Bedrock so we skip the upgrade there) - cache_ttl.py: new module — pure body transform that upgrades ephemeral->1h TTL on system/tools/message breakpoints for idle-prone products (posthog_code) - budget_guard.py: new module — hard monthly spend cap (fail-open), fixes condition to use `is None or <= 0` to distinguish no-cap from zero-cap - cost_controls.py: alpha feature gate with COST_CONTROLS_ENABLED local-dev warning - main.py: logs a startup warning when COST_CONTROLS_ENABLED env var is active - test_cost_controls.py, test_cache_ttl.py, test_budget_guard.py: 16 tests total Security: dependencies.py (GATEWAY_BYPASS_AUTH=true) is intentionally excluded. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WHrRwvm6QmmEM39ez3AjD3
P1 fix: - Wire budget_guard.evaluate_request() into _handle_anthropic_messages so the guard is in the actual request path (was only referenced in tests). Budget values are None for now (fail-open) pending the PostHog billing API wiring; the 429 path is exercised and ready once data flows in. P2 fixes: - budget_guard: add missing 85% warning tier — returns x-posthog-budget: warn with remaining-usd header when spend/cap >= 0.85 but < 1.0; test coverage added - main.py: replace duplicated os.getenv COST_CONTROLS_ENABLED parse with a call to cost_controls_enabled() so the startup warning tracks the exact same logic as the per-request gate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WHrRwvm6QmmEM39ez3AjD3
9017574 to
e4e0a3d
Compare
The env-var bypass in cost_controls_enabled() fired for every request when COST_CONTROLS_ENABLED=true, regardless of the per-request PostHog flag. A misconfigured production deployment could silently enable alpha behaviour for all tenants. Add a debug kwarg (default False) that must be True for the env-var path to activate. Production processes never set LLM_GATEWAY_DEBUG=true, so the bypass is now unreachable in prod. Callers pass debug=settings.debug so the behaviour is identical in local dev. Fixes hex-security-app finding on cost_controls.py:26. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
LLM calls are expensive and users have no visibility into cost. The Save Mode feature (toggled from the Code FE) needs a gateway-side counterpart to: (1) reduce redundant token costs via longer cache TTLs, (2) enforce per-team monthly budget caps, and (3) stamp `$ai_generation` events with save-mode telemetry for cost analytics.
Companion PR: PostHog/code#2888
How it works
Changes
cost_controls.py—cost_controls_enabled()gate (PostHog flagllm-gateway-cost-controlsORCOST_CONTROLS_ENABLED=truelocal dev only). Added docstring warning that the env var is local-dev only and must never reach prod.api/anthropic.py— when cost controls enabled: upgrades Anthropic ephemeral cache blocks to 1-hour TTL (cache_control: {type: "persistent", ttl: 3600}); stamps$ai_generationwithx-posthog-property-*save-mode telemetry from the FE request headers. Added Bedrock guard (skips cache-control for Bedrock model IDs).budget_guard.py— fixed condition:if monthly_budget_usd is None or monthly_budget_usd <= 0:(was== 0, which missedNoneand caused incorrect status for disabled budgets).main.py— wirescost_controls_enabled()into the Anthropic request path.tests/test_cost_controls.py— coverage for: flag gate, cache upgrade, budget guard boundaries (disabled/ok/warn/engage/blocked), telemetry header propagation.How did you test this code?
I'm an agent. Automated tests run and passing:
services/llm-gateway/tests/test_cost_controls.py— 8 cases covering flag gate, cache upgrade, budget guard thresholds (disabled/ok/warn/engage/blocked), and telemetry header propagationFull suite: 1033 passed, 50 skipped, 22 xfailed.
Manual testing: performed E2E with PostHog/code#2888 implementation
Automatic notifications
Docs update
No external docs changes required — this is an internal gateway feature behind a PostHog flag (
llm-gateway-cost-controls).🤖 Agent context
Autonomy: Human-driven (agent-assisted) — Ricardo Leiva directed the work.
Built with Claude Code (claude-sonnet-4-6). Key decisions made during the implementation:
dependencies.py(local auth bypassGATEWAY_BYPASS_AUTH=true) was explicitly excluded from the commit despite appearing ingit status— this is a hard rule, not an oversight.budget_guard.pyfix: the original== 0check silently passedNonebudgets through to numeric comparisons; changed tois None or <= 0for correct disabled-budget semantics.persistentcache tier (1h TTL), which requires atype: "persistent"control block. Bedrock model IDs don't support this header so a guard skips it for those paths.COST_CONTROLS_ENABLEDenv var is intentionally local-dev only (noted in docstring); production must use the PostHog feature flag.Session: https://claude.ai/code/session_01WHrRwvm6QmmEM39ez3AjD3