
fix(router): increase inference validation token budget#432

Open
geelen wants to merge 1 commit into NVIDIA:main from geelen:codex/increase-inference-validation-token-budget

Conversation


@geelen geelen commented Mar 18, 2026

Summary

Increase the inference validation probe token budget from 1 to 32 so OpenAI-compatible backends that reject extremely small output budgets can still pass verification.

Related Issue

N/A

Changes

  • Increased the validation probe token budget from 1 to 32 for chat completions, completions, Anthropic messages, and responses probes
  • Updated the router-side validation test to expect the new probe budget
  • Updated the server-side inference verification test to match the new probe request shape
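The four probes listed above can be sketched as payload builders. This is a minimal illustration, not the actual router code: the function names and the `"ping"` prompt are assumptions; only the budget fields (`max_tokens`, and `max_output_tokens` for the Responses API) and the 1→32 change come from the PR.

```python
# Hypothetical sketch of the validation probes described above.
# Only the token budget (bumped from 1 to 32 in this PR) reflects the change;
# everything else (names, prompt text) is illustrative.

PROBE_MAX_TOKENS = 32  # was 1; some OpenAI-compatible backends reject tiny budgets


def chat_completions_probe(model: str) -> dict:
    """Minimal chat-completions request used only to verify the backend responds."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": PROBE_MAX_TOKENS,
    }


def completions_probe(model: str) -> dict:
    """Legacy completions-style probe."""
    return {"model": model, "prompt": "ping", "max_tokens": PROBE_MAX_TOKENS}


def anthropic_messages_probe(model: str) -> dict:
    """Anthropic Messages-style probe; max_tokens is required by that API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": PROBE_MAX_TOKENS,
    }


def responses_probe(model: str) -> dict:
    """Responses-style probe; the budget field is named max_output_tokens there."""
    return {"model": model, "input": "ping", "max_output_tokens": PROBE_MAX_TOKENS}
```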

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@geelen geelen requested a review from a team as a code owner March 18, 2026 09:53

github-actions bot commented Mar 18, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@github-actions

Thank you for your interest in contributing to OpenShell, @geelen.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions github-actions bot closed this Mar 18, 2026
geelen (Author) commented Mar 18, 2026

I have read the DCO document and I hereby sign the DCO.

@drew drew reopened this Mar 18, 2026
@github-actions github-actions bot closed this Mar 18, 2026
@drew drew requested a review from pimlock March 18, 2026 16:08
@pimlock pimlock reopened this Mar 18, 2026
@NVIDIA NVIDIA deleted a comment from github-actions bot Mar 18, 2026
@pimlock pimlock added the test:e2e Requires end-to-end coverage label Mar 18, 2026
geelen (Author) commented Mar 18, 2026

FYI I have now tested this against the particular endpoint and it does indeed pass validation automatically. Also the value of 32 was just plucked out of thin air, but seemed like a safe default (my endpoint returned 11 tokens in response).

pimlock (Collaborator) commented Mar 18, 2026

> FYI I have now tested this against the particular endpoint and it does indeed pass validation automatically. Also the value of 32 was just plucked out of thin air, but seemed like a safe default (my endpoint returned 11 tokens in response).

I think 32-ish makes sense and shouldn't meaningfully affect how long the probe response takes to come back. Flakiness, potential timeouts, etc. were a reason to include the --no-verify flag, so the check is not a blocker.

I just checked how openclaw does verification, and they also use 1 for max_tokens: https://github.com/openclaw/openclaw/blob/757c2cc2deb9a1157a0b5685eaff33bd4bb70485/src/commands/onboard-custom.ts#L269

Out of curiosity: what's the validation on the inference-api side? I'm assuming this is some kind of default that litellm is enforcing?
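For illustration, the kind of server-side check that would reject the old 1-token probe might look like the following. This is purely hypothetical: it is not taken from litellm or the inference API, and the 16-token floor is an invented number chosen only to show why a budget of 1 could fail while 32 passes.

```python
# Hypothetical backend-side validation; NOT actual litellm/inference-api code.
MIN_OUTPUT_TOKENS = 16  # assumed backend-enforced floor, invented for illustration


def validate_request(payload: dict) -> None:
    """Reject requests whose output-token budget is below an assumed minimum."""
    budget = payload.get("max_tokens") or payload.get("max_output_tokens")
    if budget is not None and budget < MIN_OUTPUT_TOKENS:
        raise ValueError(
            f"max_tokens must be >= {MIN_OUTPUT_TOKENS}, got {budget}"
        )


# The new 32-token probe clears such a floor; the old 1-token probe would not.
validate_request({"model": "m", "max_tokens": 32})  # accepted
```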

