fix: graceful fallback for newer model families in count_message_tokens() #235
ArivunidhiA wants to merge 1 commit into AgentOps-AI:main from
Conversation
…ns() Replace `raise KeyError` with a graceful fallback using the default token-per-message overhead (`tokens_per_message=3`, `tokens_per_name=1`) for models not in the hardcoded if/elif chain. This matches the format used by all modern OpenAI chat models since gpt-4-0613. Fixes AgentOps-AI#228, AgentOps-AI#231
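The behavior the commit describes can be sketched as a small standalone helper (this is an illustrative sketch, not tokencost's actual code; the `message_overheads` name and the model lists are assumptions):

```python
import logging

logger = logging.getLogger(__name__)

def message_overheads(model: str) -> tuple:
    """Return (tokens_per_message, tokens_per_name) for a chat model.

    Hypothetical helper sketching the fix: known legacy models keep their
    special-cased overheads; everything else gets the modern default plus a
    logged warning, instead of raising KeyError.
    """
    if model == "gpt-3.5-turbo-0301":
        # Legacy format: 4 tokens per message, name replaces role (-1).
        return 4, -1
    if model in ("gpt-3.5-turbo-0613", "gpt-4-0613"):
        return 3, 1
    # Fallback for newer/unknown families (gpt-5, gpt-4.1, gemini, ...).
    logger.warning(
        "model %s is not explicitly handled for token counting; "
        "using default token-per-message overhead.", model
    )
    return 3, 1
```

With this shape, `message_overheads("gpt-5-nano")` returns `(3, 1)` with a warning rather than raising.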
Pull request overview
This PR updates count_message_tokens() to avoid raising KeyError for unknown/newer chat model identifiers by falling back to the modern default message overhead, and expands tests to cover the new behavior (addressing #228 and #231).
Changes:
- Replace the unknown-model `KeyError` in `count_message_tokens()` with a warning + default overhead (`tokens_per_message=3`, `tokens_per_name=1`).
- Update existing tests that expected `KeyError` and add new coverage for newer/unknown model names and prefixes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `tokencost/costs.py` | Adds a graceful fallback path for unknown/new model token counting instead of raising KeyError. |
| `tests/test_costs.py` | Adjusts expectations for unknown models and adds new tests covering fallback behavior for newer model families. |
```python
# For newer model families (gpt-5, gpt-4.1, gpt-4.5, gemini, etc.)
# and any unknown models, use the modern default token format.
# All recent OpenAI chat models use tokens_per_message=3, tokens_per_name=1.
logger.warning(
    f"Warning: model {model} is not explicitly handled for token counting. "
    "Using default token-per-message overhead."
)
tokens_per_message = 3
tokens_per_name = 1
```
The new fallback block is intended to cover model families like `gpt-4.1` / `gpt-4.5-*`, but those strings currently match the earlier `elif "gpt-4" in model:` branch and get redirected to `gpt-4-0613`. That bypasses this fallback and can also force the wrong tiktoken encoding for `gpt-4.1`/`gpt-4.5` if they use a different encoding than `gpt-4-0613`. Consider handling `gpt-4.1` / `gpt-4.5` explicitly before the generic `"gpt-4" in model` check, or tightening that check (e.g., matching only `gpt-4` plus optional `-` suffixes).
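One way to tighten the generic check as suggested (a sketch; the helper name is made up) is to require a hyphen or end-of-string after `gpt-4`, so dotted families like `gpt-4.1` fall through to the fallback:

```python
import re

def is_base_gpt4(model: str) -> bool:
    # Match "gpt-4" itself or "gpt-4-<suffix>" (hyphen, not dot), so that
    # "gpt-4.1" and "gpt-4.5-preview" do NOT get redirected to gpt-4-0613.
    return re.match(r"^gpt-4(-|$)", model) is not None

print(is_base_gpt4("gpt-4"))            # → True
print(is_base_gpt4("gpt-4-0613"))       # → True
print(is_base_gpt4("gpt-4.1"))          # → False
print(is_base_gpt4("gpt-4.5-preview"))  # → False
```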
```python
logger.warning(
    f"Warning: model {model} is not explicitly handled for token counting. "
    "Using default token-per-message overhead."
)
```
`logger.warning(...)` already conveys severity; prefixing the message itself with "Warning:" makes logs noisier and inconsistent with other warnings in this module (some include the prefix, some don't). Consider removing the redundant "Warning:" prefix and keeping the message content focused on the condition and fallback being used.
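A quick standalone illustration of the point: the log record's level already renders as WARNING, so a "Warning:" prefix inside the message doubles up (the logger name and formatter here are examples, not the module's real configuration):

```python
import io
import logging

# Capture log output to a buffer so the rendered record is visible.
logger = logging.getLogger("tokencost.example")
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

logger.warning(
    "model %s is not explicitly handled for token counting; "
    "using default token-per-message overhead.", "gpt-5-nano"
)
print(buf.getvalue().strip())
# → WARNING: model gpt-5-nano is not explicitly handled for token counting; using default token-per-message overhead.
```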
| """Empty input (not a list) should raise an error during encoding""" | ||
| # Empty string is not a valid message list; the function will attempt | ||
| # to iterate over it and fail or produce 0 tokens. | ||
| # With the graceful fallback, an empty string iterates as zero messages. | ||
| result = count_message_tokens("", "") | ||
| # Empty string iterates as zero messages, plus 3 for reply priming | ||
| assert result == 3 | ||
|
|
||
|
|
This test now passes a str for messages (despite the function contract being `List[Dict[str, str]]`) and asserts a numeric result. That's brittle: an empty string happens to iterate to zero messages and returns 3, but any non-empty string will iterate characters and then crash at `message.items()`. It would be better to have `count_message_tokens()` validate that messages is a list of dicts and raise `TypeError` for invalid inputs, and update this test to assert that error instead of relying on the empty-string edge case.
| """Empty input (not a list) should raise an error during encoding""" | |
| # Empty string is not a valid message list; the function will attempt | |
| # to iterate over it and fail or produce 0 tokens. | |
| # With the graceful fallback, an empty string iterates as zero messages. | |
| result = count_message_tokens("", "") | |
| # Empty string iterates as zero messages, plus 3 for reply priming | |
| assert result == 3 | |
| """Empty input (not a list) should raise a TypeError.""" | |
| # Empty string is not a valid message list; count_message_tokens should | |
| # validate the input type and raise a TypeError for invalid inputs. | |
| with pytest.raises(TypeError): | |
| count_message_tokens("", "") |
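The validation the reviewer suggests could look like this (a sketch of the idea; `validate_messages` is a hypothetical helper, not part of tokencost):

```python
def validate_messages(messages) -> None:
    """Raise TypeError unless messages is a list of dicts."""
    if not isinstance(messages, list):
        raise TypeError(
            f"messages must be a list of dicts, got {type(messages).__name__}"
        )
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            raise TypeError(
                f"messages[{i}] must be a dict, got {type(message).__name__}"
            )

# A str (even "") is rejected up front instead of crashing later at
# message.items() or silently returning a count for a non-list input.
```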
```python
assert isinstance(result, int)
assert result == 4  # "Hello, world!" = 4 tokens in cl100k_base
```
The docstring/comment for this new test says the new models work "via cl100k_base fallback", but `count_string_tokens()` will use `tiktoken.encoding_for_model(model)` when available (no fallback). To keep this test accurate and resilient to future tiktoken model mappings, consider deriving the expected token count using the same encoding-resolution logic as the production function (try `encoding_for_model`, else `cl100k_base`) rather than hard-coding 4.
Suggested change:

```diff
-assert isinstance(result, int)
-assert result == 4  # "Hello, world!" = 4 tokens in cl100k_base
+import tiktoken
+try:
+    encoding = tiktoken.encoding_for_model(model)
+except KeyError:
+    encoding = tiktoken.get_encoding("cl100k_base")
+expected = len(encoding.encode(STRING))
+assert isinstance(result, int)
+assert result == expected
```
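The encoding-resolution order being suggested can be sketched without depending on tiktoken itself (the `MODEL_TO_ENCODING` table below is a stand-in assumption, not tiktoken's real mapping tables):

```python
# Stand-in for tiktoken's model-to-encoding tables (illustrative only).
MODEL_TO_ENCODING = {
    "gpt-4-0613": "cl100k_base",
    "gpt-4o": "o200k_base",
}

def resolve_encoding_name(model: str) -> str:
    # Mirror the suggested order: prefer the model-specific mapping
    # (like tiktoken.encoding_for_model), fall back to cl100k_base on KeyError.
    try:
        return MODEL_TO_ENCODING[model]
    except KeyError:
        return "cl100k_base"
```

The test's expected count then comes from whichever encoding this resolves for the model under test, instead of a hard-coded 4.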
Fixes #228, #231
Problem

`count_message_tokens()` raises `KeyError` for any model not in its hardcoded if/elif chain, including `gpt-5-nano`, `gpt-4.1`, `gpt-4.5-preview`, Gemini models, etc.

Fix

Replace `raise KeyError` with a graceful fallback using the same token overhead all modern OpenAI models use (`tokens_per_message=3`, `tokens_per_name=1`). A warning is logged for visibility. Encoding is already handled correctly by tiktoken's own fallback.

Before:

After:

Changes
133 tests pass. Zero new lint errors.