Implements a comprehensive CLI tool that addresses issue openai#473 by enabling token counting directly from the command line.

Features:
- Count tokens in files and directories
- Support for all OpenAI model encodings
- Recursive directory processing with glob patterns
- Multiple output formats (text, JSON, CSV)
- Per-file breakdowns and summary statistics
- Integration-friendly for CI/CD pipelines

Usage examples:

```shell
tiktoken count file.txt
tiktoken count --model gpt-4o document.txt
tiktoken count --recursive --glob "*.py" ./project/
tiktoken count --json --summary ./codebase/
```

Implementation details:
- Added tiktoken/cli.py with the full CLI implementation
- Updated setup.py with a console_scripts entry point
- Added comprehensive tests in tests/test_cli.py
- Created CLI.md with usage documentation and examples

Benefits:
- Quick token estimation for context window planning
- Cost estimation for API usage
- CI/CD integration for token budget enforcement
- Shell script integration
- No Python code required for basic token counting

Closes openai#473
Summary
Implements a comprehensive command-line interface for tiktoken that enables token counting directly from the shell, addressing the feature request in #473.
🎯 Problem Solved
Currently, tiktoken requires writing Python code to count tokens. This PR adds a `tiktoken` CLI command that makes token counting available directly from the shell.

✨ Features
Basic Usage
Directory Operations
```shell
tiktoken count --recursive ./src/
tiktoken count --glob "*.py" --recursive ./project/
```

Output Formats
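The PR lists text, JSON, and CSV output. Purely as an illustration, a `--json --summary` payload might take a shape like the following; the key names are assumptions, not taken from the PR, and the per-file counts reuse the Documentation Analysis example further down:

```python
import json

# Hypothetical shape of `tiktoken count --json --summary` output.
# Key names are assumed for illustration, not specified by the PR.
per_file = {
    "docs/README.md": 1250,
    "docs/API.md": 3420,
    "docs/GUIDE.md": 2150,
}
summary = {"files": per_file, "total_tokens": sum(per_file.values())}
print(json.dumps(summary, indent=2))
```

A machine-readable format like this is what makes the CI/CD and scripting use cases below practical.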
📝 Implementation
Files Added/Modified
tiktoken/cli.py (new)
setup.py (modified): registers `tiktoken = tiktoken.cli:main` as a console_scripts entry point, providing the `tiktoken` command after pip install
tests/test_cli.py (new)
CLI.md (new)
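The console_scripts wiring mentioned above could look like this setup.py fragment (a sketch; the other `setup()` arguments are elided):

```python
# setup.py fragment (sketch): expose tiktoken/cli.py's main() as a shell command.
from setuptools import setup

setup(
    name="tiktoken",
    entry_points={
        "console_scripts": [
            "tiktoken = tiktoken.cli:main",
        ],
    },
)
```

With this entry point, pip generates a `tiktoken` executable on install, so no wrapper script is needed.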
🎓 Use Cases
1. Context Window Estimation
```shell
$ tiktoken count --model gpt-4-turbo --recursive ./codebase/
Total tokens: 45,230
# Result: Fits in GPT-4 Turbo's 128k context
```

2. Cost Estimation
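Turning a token count into a cost estimate is simple arithmetic; a sketch using the 45,230-token figure above, where the per-token price is a made-up placeholder (check current pricing):

```python
# Cost estimate from a token count; the rate is an assumed placeholder, not a real price.
total_tokens = 45_230
usd_per_million_input_tokens = 10.00  # placeholder rate
cost_usd = total_tokens / 1_000_000 * usd_per_million_input_tokens
print(f"Estimated input cost: ${cost_usd:.2f}")
```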
3. CI/CD Integration
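A budget-enforcement step in CI could compare the count against a threshold. In the sketch below the count is hard-coded for illustration; in a real pipeline it would come from the tiktoken CLI (how the number is extracted depends on the output format, which is an assumption here):

```shell
# Fail the build when the token count exceeds a budget.
BUDGET=100000
TOKENS=45230  # in CI this would come from the tiktoken CLI's summary output
if [ "$TOKENS" -gt "$BUDGET" ]; then
  echo "Token budget exceeded: $TOKENS > $BUDGET"
  exit 1
fi
echo "Within budget: $TOKENS / $BUDGET"
```

Because the command exits nonzero when over budget, it can gate a merge like any other CI check.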
4. Documentation Analysis
```shell
$ tiktoken count --glob "*.md" --per-file --recursive ./docs/
docs/README.md: 1,250 tokens
docs/API.md: 3,420 tokens
docs/GUIDE.md: 2,150 tokens
```

🧪 Testing
Syntax Validation
✅ All Python files compile successfully
Test Suite
Integration Ready
📦 What's Included
🚀 Benefits
⚙️ Design Decisions
🎯 Closes
Closes #473