
fix: graceful fallback for newer model families in count_message_tokens() #235

Open

ArivunidhiA wants to merge 1 commit into AgentOps-AI:main from ArivunidhiA:fix/handle-new-model-families-228-231

Conversation

@ArivunidhiA

Fixes #228, #231

Problem

count_message_tokens() raises KeyError for any model not in its hardcoded if/elif chain — including gpt-5-nano, gpt-4.1, gpt-4.5-preview, Gemini models, etc.

Fix

Replace raise KeyError with a graceful fallback using the same token overhead all modern OpenAI models use (tokens_per_message=3, tokens_per_name=1). A warning is logged for visibility. Encoding is already handled correctly by tiktoken's own fallback.

Before:

```python
count_message_tokens([{"role": "user", "content": "Hello"}], "gpt-5-nano")
# KeyError: num_tokens_from_messages() is not implemented for model gpt-5-nano
```

After:

```python
count_message_tokens([{"role": "user", "content": "Hello"}], "gpt-5-nano")
# 8
```

Changes

  • `tokencost/costs

133 tests pass. Zero new lint errors.

…ns()

Replace raise KeyError with graceful fallback using default token-per-message
overhead (tokens_per_message=3, tokens_per_name=1) for models not in the
hardcoded if/elif chain. This matches the format used by all modern OpenAI
chat models since gpt-4-0613.

Fixes AgentOps-AI#228, AgentOps-AI#231
Copilot AI review requested due to automatic review settings March 22, 2026 06:46

Copilot AI left a comment


Pull request overview

This PR updates count_message_tokens() to avoid raising KeyError for unknown/newer chat model identifiers by falling back to the modern default message overhead, and expands tests to cover the new behavior (addressing #228 and #231).

Changes:

  • Replace the unknown-model KeyError in count_message_tokens() with a warning + default overhead (tokens_per_message=3, tokens_per_name=1).
  • Update existing tests that expected KeyError and add new coverage for newer/unknown model names and prefixes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

  • tokencost/costs.py: Adds a graceful fallback path for unknown/new model token counting instead of raising KeyError.
  • tests/test_costs.py: Adjusts expectations for unknown models and adds new tests covering fallback behavior for newer model families.


Comment on lines +263 to +271
```python
# For newer model families (gpt-5, gpt-4.1, gpt-4.5, gemini, etc.)
# and any unknown models, use the modern default token format.
# All recent OpenAI chat models use tokens_per_message=3, tokens_per_name=1.
logger.warning(
    f"Warning: model {model} is not explicitly handled for token counting. "
    "Using default token-per-message overhead."
)
tokens_per_message = 3
tokens_per_name = 1
```

Copilot AI Mar 22, 2026


The new fallback block is intended to cover model families like gpt-4.1 / gpt-4.5-*, but those strings currently match the earlier elif "gpt-4" in model: branch and get redirected to gpt-4-0613. That bypasses this fallback and can also force the wrong tiktoken encoding for gpt-4.1/gpt-4.5 if they use a different encoding than gpt-4-0613. Consider handling gpt-4.1 / gpt-4.5 explicitly before the generic "gpt-4" in model check, or tightening that check (e.g., only gpt-4 + optional - suffixes).

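One way to tighten the check the reviewer describes is to match `gpt-4` only when it is followed by a `-` suffix or the end of the string, so `gpt-4.1` and `gpt-4.5-preview` fall through to the fallback. This is an illustrative sketch; the helper name `is_legacy_gpt4` is hypothetical, not part of the tokencost codebase.

```python
# Sketch of the tightened model check suggested above:
# accept "gpt-4" or "gpt-4-<suffix>", but not "gpt-4.1" / "gpt-4.5-*" / "gpt-4o".
import re

def is_legacy_gpt4(model: str) -> bool:
    return re.fullmatch(r"gpt-4(-.*)?", model) is not None

assert is_legacy_gpt4("gpt-4")
assert is_legacy_gpt4("gpt-4-0613")
assert not is_legacy_gpt4("gpt-4.1")
assert not is_legacy_gpt4("gpt-4.5-preview")
```

With a check like this, newer `gpt-4.x` names reach the fallback path instead of being redirected to `gpt-4-0613`.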
Comment on lines +266 to 269
```python
logger.warning(
    f"Warning: model {model} is not explicitly handled for token counting. "
    "Using default token-per-message overhead."
)
```

Copilot AI Mar 22, 2026


logger.warning(...) already conveys severity; prefixing the message itself with "Warning:" makes logs noisier and inconsistent with other warnings in this module (some include the prefix, some don’t). Consider removing the redundant "Warning:" prefix and keeping the message content focused on the condition and fallback being used.

Comment on lines +97 to 105
"""Empty input (not a list) should raise an error during encoding"""
# Empty string is not a valid message list; the function will attempt
# to iterate over it and fail or produce 0 tokens.
# With the graceful fallback, an empty string iterates as zero messages.
result = count_message_tokens("", "")
# Empty string iterates as zero messages, plus 3 for reply priming
assert result == 3



Copilot AI Mar 22, 2026


This test now passes a str for messages (despite the function contract being List[Dict[str, str]]) and asserts a numeric result. That’s brittle: an empty string happens to iterate to zero messages and returns 3, but any non-empty string will iterate characters and then crash at message.items(). It would be better to have count_message_tokens() validate messages is a list of dicts and raise TypeError for invalid inputs, and update this test to assert that error instead of relying on the empty-string edge case.

Suggested change

```diff
-"""Empty input (not a list) should raise an error during encoding"""
-# Empty string is not a valid message list; the function will attempt
-# to iterate over it and fail or produce 0 tokens.
-# With the graceful fallback, an empty string iterates as zero messages.
-result = count_message_tokens("", "")
-# Empty string iterates as zero messages, plus 3 for reply priming
-assert result == 3
+"""Empty input (not a list) should raise a TypeError."""
+# Empty string is not a valid message list; count_message_tokens should
+# validate the input type and raise a TypeError for invalid inputs.
+with pytest.raises(TypeError):
+    count_message_tokens("", "")
```

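The input validation the reviewer proposes could look like the sketch below. The helper name `validate_messages` is hypothetical, shown only to illustrate the TypeError behavior the suggested test would assert.

```python
# Illustrative up-front validation for count_message_tokens(), as suggested
# above: reject anything that is not a list of dicts instead of letting an
# empty string slip through as "zero messages".
def validate_messages(messages):
    if not isinstance(messages, list) or not all(
        isinstance(m, dict) for m in messages
    ):
        raise TypeError(
            "messages must be a List[Dict[str, str]], "
            f"got {type(messages).__name__}"
        )

try:
    validate_messages("")  # a str is rejected instead of silently counting 3
except TypeError as e:
    print(e)  # messages must be a List[Dict[str, str]], got str
```

A valid list such as `[{"role": "user", "content": "Hello"}]` passes through unchanged.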
Comment on lines +370 to +371
```python
assert isinstance(result, int)
assert result == 4  # "Hello, world!" = 4 tokens in cl100k_base
```

Copilot AI Mar 22, 2026


The docstring/comment for this new test says the new models work “via cl100k_base fallback”, but count_string_tokens() will use tiktoken.encoding_for_model(model) when available (no fallback). To keep this test accurate and resilient to future tiktoken model mappings, consider deriving the expected token count using the same encoding-resolution logic as the production function (try encoding_for_model, else cl100k_base) rather than hard-coding 4.

Suggested change

```diff
-assert isinstance(result, int)
-assert result == 4  # "Hello, world!" = 4 tokens in cl100k_base
+import tiktoken
+try:
+    encoding = tiktoken.encoding_for_model(model)
+except KeyError:
+    encoding = tiktoken.get_encoding("cl100k_base")
+expected = len(encoding.encode(STRING))
+assert isinstance(result, int)
+assert result == expected
```

Successfully merging this pull request may close these issues.

  • num_tokens_from_messages() is not implemented for model gpt-5-nano