Fix precision loss for large CQL timestamp values #636
base: master
Conversation
Co-authored-by: mykaul <[email protected]>
@copilot please complete the work on the PR
The PR is now complete. All changes have been implemented and tested: Changes made (commit fbab5cd):
Testing results:
The fix ensures that timestamps like "2300-01-01 00:00:00.001" can be correctly round-tripped through serialization/deserialization without losing precision.
cassandra/util.py (outdated)

```python
# Handle negative timestamps correctly
if remainder_ms < 0:
    remainder_ms += 1000
    timestamp_seconds -= 1
```
@copilot When can remainder be negative? Even for negative values it is positive:
```python
>>> -2500 % 1000
500
```
You're absolutely right! Python's modulo operator always returns a non-negative result when the divisor is positive. The negative check was unnecessary. Removed in commit 93ad0f2.
Python's modulo operator always returns a non-negative result when the divisor is positive, regardless of the dividend's sign. The check for remainder_ms < 0 is therefore unnecessary in both the Python and Cython implementations.

Co-authored-by: Lorak-mmk <[email protected]>
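The reviewer's point is easy to verify: Python's `%` (and `divmod`) always return a remainder with the same sign as the divisor, so with a positive divisor the remainder is never negative, even for negative dividends.

```python
# Python's % and divmod return a remainder with the sign of the divisor,
# so for a positive divisor the remainder is never negative.
print(-2500 % 1000)         # 500: positive even for a negative dividend
print(divmod(-2500, 1000))  # (-3, 500): quotient floors toward -infinity
print(2500 % 1000)          # 500
```

This is why the negative-remainder branch shown above is dead code on both the Python and Cython paths.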
Pre-review checklist
./docs/source/.

Description
CQL `timestamp` values (signed milliseconds from epoch) lose precision when they are further than ~300 years from the Unix epoch, due to floating-point conversions during serialization/deserialization. As a result, a timestamp retrieved from the database cannot always be used to query for the same row.

Problem
The driver was converting int64 milliseconds to floating-point values during serialization/deserialization, causing precision loss for timestamps more than ~300 years from the Unix epoch (beyond roughly 2^53 microseconds, the largest integer count a double can represent exactly at `datetime`'s microsecond resolution).
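A quick sketch of why doubles run out of integer precision: a double has a 53-bit significand, so above 2**53 adjacent integers collide onto the same float value, and timestamp arithmetic done in floating point can silently drop the low-order digits.

```python
# Integers are exactly representable as doubles only up to 2**53.
n = 2 ** 53
print(float(n) == float(n + 1))  # True: n + 1 (odd) rounds back to n
print(float(n) == float(n + 2))  # False: n + 2 (even) is representable
```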
Example scenario:
Root Causes
Float operations (`*1e3`, `/1e3`) were used to calculate milliseconds.

Solution
Implemented integer-only arithmetic throughout the timestamp conversion path:
Added new conversion functions:
- `datetime_from_timestamp_ms()` in `cassandra/util.py` (Python path)
- `datetime_from_timestamp_ms()` in `cassandra/cython_utils.pyx` (Cython path)

Updated DateType serialization/deserialization:
- `cassandra/cqltypes.py`: Changed `DateType.deserialize()` to call the new function directly on milliseconds (no division)
- `cassandra/cqltypes.py`: Changed `DateType.serialize()` to use integer operations (`*1000`, `//1000`) instead of float operations (`*1e3`, `/1e3`)
- `cassandra/deserializers.pyx`: Updated Cython deserializer to use the new integer-based function

Code cleanup:
- Removed the unnecessary negative-remainder check from `datetime_from_timestamp_ms()` - Python's modulo operator always returns non-negative results for positive divisors

Added comprehensive test coverage:
- `tests/unit/test_timestamp_precision.py` with 4 tests

Testing
The changes are minimal and surgical, affecting only the timestamp conversion logic without altering any other functionality.
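The integer-only conversion described above can be sketched as follows. `datetime_from_timestamp_ms` is the helper name the PR introduces, but the bodies here are illustrative, not the driver's actual code; `timestamp_ms_from_datetime` is a hypothetical name for the reverse direction.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def datetime_from_timestamp_ms(timestamp_ms):
    # divmod keeps everything in integers; with a positive divisor the
    # remainder is always non-negative, even for negative timestamps.
    seconds, remainder_ms = divmod(timestamp_ms, 1000)
    return EPOCH + timedelta(seconds=seconds, milliseconds=remainder_ms)

def timestamp_ms_from_datetime(dt):
    # Integer arithmetic on timedelta components; no float in the path.
    delta = dt - EPOCH
    return (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000

# Round trip for a timestamp ~330 years past the epoch keeps the 1 ms.
ms = 10413792000001  # 2300-01-01 00:00:00.001 UTC
dt = datetime_from_timestamp_ms(ms)
print(dt)                                    # 2300-01-01 00:00:00.001000+00:00
print(timestamp_ms_from_datetime(dt) == ms)  # True
```

Because `timedelta` stores days, seconds, and microseconds as exact integers, no value in this path ever passes through a double, which is what makes the round trip lossless at any distance from the epoch.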
Original prompt
CQL `timestamp` values lose precision due to some floating-point conversions along the way #532