feat(llm-gateway): add Save Mode cost controls — cache upgrade + budget guard + telemetry#65690

Open
ricardo-leiva wants to merge 3 commits into PostHog:master from ricardo-leiva:feat/llm-gateway-cost-controls

Conversation

ricardo-leiva commented Jun 24, 2026

Problem

LLM calls are expensive and users have no visibility into cost. The Save Mode feature (toggled from the Code FE) needs a gateway-side counterpart to: (1) reduce redundant token costs via longer cache TTLs, (2) enforce per-team monthly budget caps, and (3) stamp `$ai_generation` events with save-mode telemetry for cost analytics.

Companion PR: PostHog/code#2888

How it works

[Image: screenshot taken 2026-06-23 at 10:29:15 PM]

Changes

  • cost_controls.py: adds the cost_controls_enabled() gate (PostHog flag llm-gateway-cost-controls, or COST_CONTROLS_ENABLED=true for local dev only). The docstring warns that the env var is local-dev only and must never reach prod.
  • api/anthropic.py: when cost controls are enabled, upgrades Anthropic ephemeral cache blocks to the 1-hour TTL (cache_control: {type: "ephemeral", ttl: "1h"}) and stamps $ai_generation with x-posthog-property-* save-mode telemetry from the FE request headers. Adds a Bedrock guard (skips the cache-control upgrade for Bedrock model IDs).
  • budget_guard.py: fixes the disabled-budget condition to if monthly_budget_usd is None or monthly_budget_usd <= 0: (was == 0, which missed None and reported an incorrect status for disabled budgets).
  • main.py: wires cost_controls_enabled() into the Anthropic request path.
  • tests/test_cost_controls.py: coverage for the flag gate, cache upgrade, budget guard boundaries (disabled/ok/warn/engage/blocked), and telemetry header propagation.
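
For reviewers who want a concrete picture of the telemetry stamping mentioned above, here is a minimal sketch. It is not the code in this PR. The function names, the `enabled` parameter, and the choice to use the lowercased header suffix as the property key are all assumptions made for illustration; only the `x-posthog-property-` prefix comes from the description.

```python
from typing import Mapping

# Prefix named in the PR description; the suffix is treated as the property name.
PROPERTY_HEADER_PREFIX = "x-posthog-property-"


def extract_telemetry_properties(headers: Mapping[str, str]) -> dict[str, str]:
    """Collect x-posthog-property-* headers into a flat properties dict (hypothetical helper)."""
    properties: dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered.startswith(PROPERTY_HEADER_PREFIX):
            key = lowered[len(PROPERTY_HEADER_PREFIX):]
            if key:  # ignore a bare prefix with no property name
                properties[key] = value
    return properties


def stamp_generation_event(
    event_properties: dict, headers: Mapping[str, str], enabled: bool
) -> dict:
    """Return a copy of the $ai_generation properties, with telemetry merged in when enabled."""
    if not enabled:
        return dict(event_properties)
    return {**event_properties, **extract_telemetry_properties(headers)}
```

With cost controls off, the event properties pass through unchanged, so the stamping stays invisible to teams outside the flag.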

How did you test this code?

I'm an agent. Automated tests ran and pass:

  • services/llm-gateway/tests/test_cost_controls.py — 8 cases covering flag gate, cache upgrade, budget guard thresholds (disabled/ok/warn/engage/blocked), and telemetry header propagation

Full suite: 1033 passed, 50 skipped, 22 xfailed.

Manual testing: performed end-to-end against the PostHog/code#2888 implementation.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Docs update

No external docs changes required — this is an internal gateway feature behind a PostHog flag (llm-gateway-cost-controls).

🤖 Agent context

Autonomy: Human-driven (agent-assisted) — Ricardo Leiva directed the work.

Built with Claude Code (claude-sonnet-4-6). Key decisions made during the implementation:

  • Security constraint: dependencies.py (local auth bypass GATEWAY_BYPASS_AUTH=true) was explicitly excluded from the commit despite appearing in git status — this is a hard rule, not an oversight.
  • budget_guard.py fix: the original == 0 check silently passed None budgets through to numeric comparisons; changed to is None or <= 0 for correct disabled-budget semantics.
  • Cache TTL upgrade: upgrades Anthropic ephemeral cache breakpoints to the 1-hour TTL by setting ttl: "1h" on the existing type: "ephemeral" cache_control block. cache_control is stripped for Bedrock, so a guard skips the upgrade for Bedrock model IDs.
  • Flag gate: COST_CONTROLS_ENABLED env var is intentionally local-dev only (noted in docstring); production must use the PostHog feature flag.
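
The cache TTL upgrade and Bedrock guard from the list above could look roughly like the sketch below. It assumes Anthropic's documented `cache_control` shape (`{"type": "ephemeral", "ttl": "1h"}`) and takes `is_bedrock` as a plain boolean, because the PR does not show how Bedrock model IDs are detected. The function names are hypothetical and this is not the contents of cache_ttl.py.

```python
import copy
from typing import Any


def _upgrade_block(block: Any) -> None:
    """Set ttl to "1h" on a block that already carries an ephemeral cache_control."""
    if not isinstance(block, dict):
        return
    cache_control = block.get("cache_control")
    if isinstance(cache_control, dict) and cache_control.get("type") == "ephemeral":
        cache_control["ttl"] = "1h"


def upgrade_cache_ttl(
    body: dict[str, Any], *, enabled: bool, is_bedrock: bool
) -> dict[str, Any]:
    """Return the request body with ephemeral breakpoints upgraded to the 1-hour TTL."""
    # Leave the body untouched when the gate is off or the request targets Bedrock.
    if not enabled or is_bedrock:
        return body
    upgraded = copy.deepcopy(body)  # keep the transform pure: never mutate the input
    system = upgraded.get("system")
    if isinstance(system, list):  # system may also be a plain string with no breakpoints
        for block in system:
            _upgrade_block(block)
    for tool in upgraded.get("tools") or []:
        _upgrade_block(tool)
    for message in upgraded.get("messages") or []:
        content = message.get("content") if isinstance(message, dict) else None
        if isinstance(content, list):
            for block in content:
                _upgrade_block(block)
    return upgraded
```

Blocks without an existing `cache_control` are left alone, so the sketch only lengthens breakpoints the caller already chose and never adds new ones.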

Session: https://claude.ai/code/session_01WHrRwvm6QmmEM39ez3AjD3

assign-reviewers-posthog Bot requested review from a team June 24, 2026 03:59
Comment thread services/llm-gateway/src/llm_gateway/cost_controls.py Outdated
greptile-apps Bot commented Jun 24, 2026

Reviews (1): Last reviewed commit: "feat(llm-gateway): add cost controls — c..." | Re-trigger Greptile

Comment thread services/llm-gateway/src/llm_gateway/budget_guard.py
Comment thread services/llm-gateway/src/llm_gateway/budget_guard.py
Comment thread services/llm-gateway/src/llm_gateway/main.py Outdated
ricardo-leiva force-pushed the feat/llm-gateway-cost-controls branch from 9cdf413 to e4d02ea June 24, 2026 04:16
Comment thread services/llm-gateway/src/llm_gateway/bedrock.py
ricardo-leiva and others added 2 commits June 23, 2026 22:43
…udget guard, save-mode wiring

Implements the gateway side of the Save Mode cost-control feature (alpha, gated by
the llm-gateway-cost-controls PostHog flag or COST_CONTROLS_ENABLED=true locally).

Changes:
- anthropic.py: upgrades Anthropic ephemeral cache blocks to 1-hour TTL when cost
  controls are enabled; adds Bedrock guard (cache_control is stripped for Bedrock
  so we skip the upgrade there)
- cache_ttl.py: new module — pure body transform that upgrades ephemeral->1h TTL
  on system/tools/message breakpoints for idle-prone products (posthog_code)
- budget_guard.py: new module — hard monthly spend cap (fail-open), fixes
  condition to use `is None or <= 0` to distinguish no-cap from zero-cap
- cost_controls.py: alpha feature gate with COST_CONTROLS_ENABLED local-dev warning
- main.py: logs a startup warning when COST_CONTROLS_ENABLED env var is active
- test_cost_controls.py, test_cache_ttl.py, test_budget_guard.py: 16 tests total

Security: dependencies.py (GATEWAY_BYPASS_AUTH=true) is intentionally excluded.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WHrRwvm6QmmEM39ez3AjD3
P1 fix:
- Wire budget_guard.evaluate_request() into _handle_anthropic_messages so the
  guard is in the actual request path (was only referenced in tests). Budget
  values are None for now (fail-open) pending the PostHog billing API wiring;
  the 429 path is exercised and ready once data flows in.

P2 fixes:
- budget_guard: add missing 85% warning tier — returns x-posthog-budget: warn
  with remaining-usd header when spend/cap >= 0.85 but < 1.0; test coverage added
- main.py: replace duplicated os.getenv COST_CONTROLS_ENABLED parse with a call
  to cost_controls_enabled() so the startup warning tracks the exact same logic
  as the per-request gate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WHrRwvm6QmmEM39ez3AjD3
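
As a rough illustration of the budget guard behaviour described in the two commit messages above (disabled when the cap is None or non-positive, fail-open, a warning tier at 85%, and a 429 once the cap is reached), a sketch might look like this. The `BudgetDecision` type, the `x-posthog-budget-remaining-usd` header name, the `blocked` header value, and the exact blocking threshold are assumptions. The "engage" tier from the test list is omitted because its threshold is not stated.

```python
from dataclasses import dataclass, field
from typing import Optional

WARN_RATIO = 0.85  # warning tier threshold taken from the commit message


@dataclass
class BudgetDecision:
    status: str  # "disabled", "ok", "warn", or "blocked"
    http_status: int = 200
    headers: dict[str, str] = field(default_factory=dict)


def evaluate_request(
    spend_usd: Optional[float], monthly_budget_usd: Optional[float]
) -> BudgetDecision:
    # A missing cap and a non-positive cap both mean the guard is disabled.
    # A bare `== 0` check would let None fall through to the division below.
    if monthly_budget_usd is None or monthly_budget_usd <= 0:
        return BudgetDecision(status="disabled")
    # Fail open: with no spend data we cannot justify blocking the request.
    if spend_usd is None:
        return BudgetDecision(status="ok")
    ratio = spend_usd / monthly_budget_usd
    if ratio >= 1.0:
        # Assumed blocking point; the commit only confirms that a 429 path exists.
        return BudgetDecision("blocked", 429, {"x-posthog-budget": "blocked"})
    if ratio >= WARN_RATIO:
        remaining = monthly_budget_usd - spend_usd
        return BudgetDecision(
            "warn",
            200,
            {
                "x-posthog-budget": "warn",
                # Hypothetical header name for the remaining-usd value.
                "x-posthog-budget-remaining-usd": f"{remaining:.2f}",
            },
        )
    return BudgetDecision(status="ok")
```

Because the budget values are None until the billing API is wired in, every request in this sketch currently resolves to `disabled`, which matches the fail-open note in the P1 fix.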
ricardo-leiva force-pushed the feat/llm-gateway-cost-controls branch from 9017574 to e4e0a3d June 24, 2026 04:43
Comment thread services/llm-gateway/src/llm_gateway/cost_controls.py Outdated
The env-var bypass in cost_controls_enabled() fired for every request
when COST_CONTROLS_ENABLED=true, regardless of the per-request PostHog
flag. A misconfigured production deployment could silently enable alpha
behaviour for all tenants.

Add a debug kwarg (default False) that must be True for the env-var path
to activate. Production processes never set LLM_GATEWAY_DEBUG=true, so
the bypass is now unreachable in prod. Callers pass debug=settings.debug
so the behaviour is identical in local dev.

Fixes hex-security-app finding on cost_controls.py:26.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
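
The debug-gated env-var bypass described in the commit message above can be sketched as follows. This is an illustration under assumptions, not the merged code: `flag_enabled` stands in for the per-request PostHog flag evaluation, and the `env` parameter exists only to make the example testable without touching the real process environment.

```python
import os
from typing import Mapping, Optional


def cost_controls_enabled(
    flag_enabled: bool,
    *,
    debug: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Gate for cost controls: the PostHog flag, or the env var in debug mode only."""
    if flag_enabled:
        return True
    environ = os.environ if env is None else env
    env_requested = environ.get("COST_CONTROLS_ENABLED", "").strip().lower() == "true"
    # The env-var path only activates when the caller explicitly passes debug=True,
    # so a stray COST_CONTROLS_ENABLED=true in production has no effect.
    return debug and env_requested
```

A caller would pass `debug=settings.debug` as the commit message describes, which keeps local-dev behaviour unchanged while closing the bypass for non-debug processes.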