
fix(router): increase inference validation token budget#432

Open
geelen wants to merge 1 commit into NVIDIA:main from geelen:codex/increase-inference-validation-token-budget

Conversation

geelen commented Mar 18, 2026

Summary

Increase the inference validation probe token budget from 1 to 32 so OpenAI-compatible backends that reject extremely small output budgets can still pass verification.
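The probe request itself isn't shown in this PR, but the change amounts to bumping one field in a minimal request body. A sketch, assuming the OpenAI-compatible chat completions payload shape (the function name and "ping" prompt are hypothetical, not from the router code):

```python
# Hypothetical sketch of a chat-completions validation probe.
# Payload fields follow the OpenAI-compatible chat completions API;
# the actual router implementation may differ.

PROBE_MAX_TOKENS = 32  # raised from 1; some backends reject tiny output budgets


def build_chat_probe(model: str) -> dict:
    """Build a minimal probe request used only to verify a backend responds."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": PROBE_MAX_TOKENS,
    }
```

The same budget constant would apply to the completions, Anthropic messages, and responses probes listed below.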

Related Issue

N/A

Changes

  • Increased the validation probe token budget from 1 to 32 for chat completions, completions, Anthropic messages, and responses probes
  • Updated the router-side validation test to expect the new probe budget
  • Updated the server-side inference verification test to match the new probe request shape

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@geelen geelen requested a review from a team as a code owner March 18, 2026 09:53
github-actions bot commented Mar 18, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

github-actions bot commented

Thank you for your interest in contributing to OpenShell, @geelen.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions github-actions bot closed this Mar 18, 2026
geelen (author) commented Mar 18, 2026

I have read the DCO document and I hereby sign the DCO.

@drew drew reopened this Mar 18, 2026
@github-actions github-actions bot closed this Mar 18, 2026
@drew drew requested a review from pimlock March 18, 2026 16:08
@pimlock pimlock reopened this Mar 18, 2026
@NVIDIA NVIDIA deleted a comment from github-actions bot Mar 18, 2026
@pimlock pimlock added the test:e2e Requires end-to-end coverage label Mar 18, 2026
geelen (author) commented Mar 18, 2026

FYI I have now tested this against the particular endpoint and it does indeed pass validation automatically. Also the value of 32 was just plucked out of thin air, but seemed like a safe default (my endpoint returned 11 tokens in response).

pimlock (collaborator) commented Mar 18, 2026

> FYI I have now tested this against the particular endpoint and it does indeed pass validation automatically. Also the value of 32 was just plucked out of thin air, but seemed like a safe default (my endpoint returned 11 tokens in response).

I think 32-ish makes sense and shouldn't impact response time too much. Flakiness and potential timeouts were a reason to include the --no-verify flag, so the check is not a blocker.

I just checked how openclaw does verification, and they also use 1 for max_tokens: https://github.com/openclaw/openclaw/blob/757c2cc2deb9a1157a0b5685eaff33bd4bb70485/src/commands/onboard-custom.ts#L269


Out of curiosity - what's the validation on the inference-api side? I'm assuming this is some kind of default that litellm is enforcing?

pimlock (collaborator) commented Mar 19, 2026

@geelen I did more research on this and tried different models; depending on the model, I got the error or not. I looped through all the models, and 5 was enough to pass the check for all of them.

I'd say let's update this to 5 and merge? This way the check would be faster, with less risk of running into a timeout (in case someone uses it with a super slow setup; it would have to be <1 tps).
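The model sweep described above could be sketched as follows. This is a hypothetical helper, not code from this repo; the `passes(model, budget)` callback stands in for one real probe request (a network call in practice) and is injected so the search itself stays testable:

```python
def min_passing_budget(models, passes, candidates=(1, 2, 5, 8, 16, 32)):
    """Return the smallest candidate budget accepted by every model.

    `passes(model, budget)` performs a single validation probe and
    returns True if the backend accepted that max_tokens value.
    Returns None if no candidate works for all models.
    """
    for budget in candidates:
        if all(passes(model, budget) for model in models):
            return budget
    return None
```

Under this sketch, the finding above corresponds to `min_passing_budget` returning 5 across the tested models.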
