bug fixes by eeea2222 · Pull Request #507 · openai/tiktoken

eeea2222 · 2026-03-14T00:50:58Z

There is a CPU worker performance loss, and I fixed it.
I also found an unused placeholder and fixed that too.

- Fix is_special_token: reference self._special_tokens.values() instead of undefined self._special_token_values (AttributeError bug)
- Fix encode_to_numpy: add missing UnicodeEncodeError handling for surrogate pairs
- Fix encode_batch: make disallowed_special frozenset wrapping consistent with encode()
- Fix registry.py: replace assert with proper RuntimeError for python -O compatibility
- Fix _encode_only_native_bpe: rename misleading _unused_pat variable to pat
- Improve CPU utilization: use os.cpu_count() for default thread count in batch methods
- Fix typo in Rust doc comment (gauranteed -> guaranteed)
- Add tests for is_special_token, _MAX_THREADS default, and python -O compatibility

Co-authored-by: eeea2222 <209839587+eeea2222@users.noreply.github.com>
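To illustrate the "use os.cpu_count() for default thread count" item, here is a minimal sketch of a batch helper whose default worker count follows the machine's CPU count. The names `DEFAULT_THREADS` and `map_batch` are hypothetical and are not taken from tiktoken; the fallback to 1 when `os.cpu_count()` returns `None` is also an assumption, since the PR text does not state what `_MAX_THREADS` does in that case.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumption: os.cpu_count() may return None, so fall back to a single worker.
DEFAULT_THREADS = os.cpu_count() or 1


def map_batch(func, items, num_threads=None):
    """Apply `func` to every item using a thread pool (hypothetical helper)."""
    if num_threads is None:
        num_threads = DEFAULT_THREADS
    with ThreadPoolExecutor(num_threads) as pool:
        # pool.map preserves the input order of `items`.
        return list(pool.map(func, items))


lengths = map_batch(len, ["hello", "tiktoken", ""])
```

Whether more threads actually help depends on how much of the work releases the GIL, so a change like this is worth benchmarking on the target hardware.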


eeea2222 · 2026-03-14T00:52:25Z

Please review my bug fixes; they are very important, as you can see if you dive into them.

Copilot

Pull request overview

This PR aims to address performance/robustness issues in tiktoken by tuning default threading for batch operations, improving error handling when loading encodings, and cleaning up a few correctness/documentation issues.

Changes:

- Update batch encode/decode defaults to use a CPU-based thread count cap and add Unicode-surrogate fallback for encode_to_numpy.
- Replace assert-based constructor assumptions in the registry with explicit runtime errors (and add related tests).
- Fix a special-token check bug, remove an unused regex placeholder, and correct minor spelling/docs text.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


| File | Description |
| --- | --- |
| tiktoken/registry.py | Replaces `assert` on constructor loading with explicit error handling. |
| tiktoken/core.py | Adds `_MAX_THREADS`, updates batch defaults, adds `encode_to_numpy` Unicode fallback, fixes `is_special_token`, removes unused regex placeholder. |
| tests/test_misc.py | Adds tests for special-token detection, `_MAX_THREADS`, and running with `-O`. |
| src/lib.rs | Fixes spelling in a doc comment. |
| README.md | Adds an extra top-level heading describing the repo as a fork. |
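The registry change rests on the fact that `python -O` strips `assert` statements, while an explicit `raise` still runs. A small sketch of that difference, run in child interpreters; the snippets and the `run_optimized` helper are illustrative and are not the code in `tiktoken/registry.py`:

```python
import subprocess
import sys

# Illustrative snippets only; the error message is made up for this sketch.
SNIPPET_ASSERT = "assert False, 'not loaded'\nprint('assert skipped')"
SNIPPET_RAISE = "raise RuntimeError('not loaded')"


def run_optimized(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter started with the -O flag."""
    return subprocess.run(
        [sys.executable, "-O", "-c", code],
        capture_output=True,
        text=True,
    )


# Under -O the assert is removed, so execution continues to the print call.
assert_result = run_optimized(SNIPPET_ASSERT)
# The explicit raise is unaffected by -O and terminates the child with an error.
raise_result = run_optimized(SNIPPET_RAISE)
```

This is why invariants that must hold in production are usually checked with an explicit exception rather than `assert`.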


@@ -1,5 +1,7 @@
 # ⏳ tiktoken

+# **Tiktoken Fork for Test and Bug Fixes, CPU Performance Enhancement and More.**


+assert len(names) > 0
+assert "gpt2" in names


    def is_special_token(self, token: int) -> bool:
        assert isinstance(token, int)
-        return token in self._special_token_values
+        return token in self._special_tokens.values()
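The diff above swaps a lookup on an attribute the PR describes as undefined for a lookup on the values of `_special_tokens`. A stand-alone sketch of the failure and the fix, using a hypothetical `ToyEncoding` class rather than tiktoken's real `Encoding`, and assuming `_special_tokens` maps token text to integer IDs:

```python
from __future__ import annotations


class ToyEncoding:
    """Hypothetical stand-in for an encoding object; not tiktoken's class."""

    def __init__(self, special_tokens: dict[str, int]) -> None:
        # Assumed layout: token text -> token id.
        self._special_tokens = special_tokens

    def is_special_token_buggy(self, token: int) -> bool:
        # This attribute is never assigned here, so the lookup raises AttributeError.
        return token in self._special_token_values

    def is_special_token(self, token: int) -> bool:
        assert isinstance(token, int)
        # Membership test against the ids stored in the mapping.
        return token in self._special_tokens.values()


enc = ToyEncoding({"<|endoftext|>": 50256})
```

Note that `in` on `dict.values()` is a linear scan; precomputing a `frozenset` of the IDs would make the check constant-time, at the cost of one extra attribute.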


+        try:
+            buffer = self._core_bpe.encode_to_tiktoken_buffer(text, allowed_special)
+        except UnicodeEncodeError:
+            text = text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")
+            buffer = self._core_bpe.encode_to_tiktoken_buffer(text, allowed_special)
        return np.frombuffer(buffer, dtype=np.uint32)
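The fallback line in this diff round-trips the text through UTF-16 so that it can be encoded as UTF-8 afterwards. A short sketch of what that round trip does to surrogates, using only the standard library; `replace_lone_surrogates` is a hypothetical helper name, not a tiktoken function:

```python
def replace_lone_surrogates(text: str) -> str:
    """Return a version of `text` that can be encoded as UTF-8."""
    try:
        text.encode("utf-8")
        return text
    except UnicodeEncodeError:
        # Round trip through UTF-16: a valid high/low surrogate pair is joined
        # into a single code point, and unpaired surrogates become U+FFFD.
        return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


# A high/low pair written as two separate surrogate code points.
paired = replace_lone_surrogates("\ud83d\ude00")
# A high surrogate with no matching low surrogate.
lone = replace_lone_surrogates("a\ud83db")
```

Because unpaired surrogates are replaced, the fallback is lossy for such input: the resulting tokens no longer round-trip back to the original string.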



Copilot AI and others added 8 commits March 12, 2026 19:52

Add description for tiktoken fork

33b60fc

Update README.md

4492e42

Update README to include CPU performance enhancements

2e7e3d1

Added mention of CPU performance enhancements.

Enhance README with project details and improvements

d824f07

Updated README to include project description and enhancements.

Refactor README content and formatting

7b3728a

Updated README formatting and improved clarity.

Fix formatting issues in README.md

8b30769

Update README title for clarity and formatting

199a6f2

Copilot AI review requested due to automatic review settings March 14, 2026 00:50

Copilot started reviewing on behalf of eeea2222 March 14, 2026 00:51

Copilot AI reviewed Mar 14, 2026


Potential fix for pull request finding

23acf07

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

eeea2222 wants to merge 9 commits into openai:main from eeea2222:main
