Fix precision loss for large CQL timestamp values #636
base: master
Conversation
Co-authored-by: mykaul <[email protected]>
@copilot please complete the work on the PR
The PR is now complete. All changes have been implemented and tested: Changes made (commit fbab5cd):
Testing results:
The fix ensures that timestamps like "2300-01-01 00:00:00.001" can be correctly round-tripped through serialization/deserialization without losing precision.
cassandra/util.py (outdated)

```python
# Handle negative timestamps correctly
if remainder_ms < 0:
    remainder_ms += 1000
    timestamp_seconds -= 1
```
@copilot When can remainder be negative? Even for negative values it is positive:
```python
>>> -2500 % 1000
500
```
You're absolutely right! Python's modulo operator always returns a non-negative result when the divisor is positive. The negative check was unnecessary. Removed in commit 93ad0f2.
Python's modulo operator always returns a non-negative result when the divisor is positive, regardless of the dividend's sign. The check for remainder_ms < 0 is therefore unnecessary in both the Python and Cython implementations.

Co-authored-by: Lorak-mmk <[email protected]>
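The reviewer's point is easy to verify: Python's `%` (and `divmod`) always return a remainder with the same sign as the divisor, so with a positive divisor the remainder is never negative, even for negative dividends.

```python
# Python's % and divmod return a remainder with the sign of the divisor,
# so for a positive divisor the remainder is never negative.
print(-2500 % 1000)         # 500: positive even for a negative dividend
print(divmod(-2500, 1000))  # (-3, 500): quotient floors toward -infinity
print(2500 % 1000)          # 500
```

This is why the negative-remainder branch shown above is dead code on both the Python and Cython paths.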
Pre-review checklist
./docs/source/.

Description
CQL `timestamp` values (signed milliseconds from epoch) lose precision when they are further than ~300 years from the Unix epoch, due to floating-point conversions during serialization/deserialization. As a result, a timestamp retrieved from the database cannot always be used to query for the same row.

Problem
The driver was converting int64 milliseconds to floating-point values during serialization/deserialization, causing precision loss for timestamps more than ~300 years from the Unix epoch (beyond roughly 2^53 microseconds, the largest integer count a double can represent exactly at `datetime`'s microsecond resolution).
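A quick sketch of why doubles run out of integer precision: a double has a 53-bit significand, so above 2**53 adjacent integers collide onto the same float value, and timestamp arithmetic done in floating point can silently drop the low-order digits.

```python
# Integers are exactly representable as doubles only up to 2**53.
n = 2 ** 53
print(float(n) == float(n + 1))  # True: n + 1 (odd) rounds back to n
print(float(n) == float(n + 2))  # False: n + 2 (even) is representable
```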
Example scenario:
Root Causes
Float operations (`*1e3`, `/1e3`) were used to calculate milliseconds.

Solution
Implemented integer-only arithmetic throughout the timestamp conversion path:
Added new conversion functions:
- `datetime_from_timestamp_ms()` in `cassandra/util.py` (Python path)
- `datetime_from_timestamp_ms()` in `cassandra/cython_utils.pyx` (Cython path)

Updated DateType serialization/deserialization:
- `cassandra/cqltypes.py`: Changed `DateType.deserialize()` to call the new function directly on milliseconds (no division)
- `cassandra/cqltypes.py`: Changed `DateType.serialize()` to use integer operations (`*1000`, `//1000`) instead of float operations (`*1e3`, `/1e3`)
- `cassandra/deserializers.pyx`: Updated Cython deserializer to use the new integer-based function

Code cleanup:
- Removed the unnecessary negative-remainder check from `datetime_from_timestamp_ms()` - Python's modulo operator always returns non-negative results for positive divisors

Added comprehensive test coverage:
- `tests/unit/test_timestamp_precision.py` with 4 tests

Testing
The changes are minimal and surgical, affecting only the timestamp conversion logic without altering any other functionality.
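The integer-only conversion described above can be sketched as follows. `datetime_from_timestamp_ms` is the helper name the PR introduces, but the bodies here are illustrative, not the driver's actual code; `timestamp_ms_from_datetime` is a hypothetical name for the reverse direction.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def datetime_from_timestamp_ms(timestamp_ms):
    # divmod keeps everything in integers; with a positive divisor the
    # remainder is always non-negative, even for negative timestamps.
    seconds, remainder_ms = divmod(timestamp_ms, 1000)
    return EPOCH + timedelta(seconds=seconds, milliseconds=remainder_ms)

def timestamp_ms_from_datetime(dt):
    # Integer arithmetic on timedelta components; no float in the path.
    delta = dt - EPOCH
    return (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000

# Round trip for a timestamp ~330 years past the epoch keeps the 1 ms.
ms = 10413792000001  # 2300-01-01 00:00:00.001 UTC
dt = datetime_from_timestamp_ms(ms)
print(dt)                                    # 2300-01-01 00:00:00.001000+00:00
print(timestamp_ms_from_datetime(dt) == ms)  # True
```

Because `timedelta` stores days, seconds, and microseconds as exact integers, no value in this path ever passes through a double, which is what makes the round trip lossless at any distance from the epoch.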
Original prompt
CQL `timestamp` values lose precision due to some floating-point conversions along the way #532