
fix(catalog/memory): relax strict ms inequality in test_update_table#2395

Open
Kurtiscwright wants to merge 1 commit into apache:main from Kurtiscwright:fix/memory-catalog-test-update-table-flake

Conversation

@Kurtiscwright
Contributor

@Kurtiscwright Kurtiscwright commented May 1, 2026

While running make unit-test to verify the RC, the update_table test in /src/memory/catalog.rs was failing because it checks that table and updated_table have different last-updated timestamps down to the millisecond. On a fast enough machine the test can update updated_table in the same millisecond as table, causing the test to fail incorrectly.

Which issue does this PR close?

What changes are included in this PR?

  • Updates the last-updated-ms check between table and updated_table from a strict less-than to less-than-or-equal, since it's possible for both to be updated in the same millisecond

Are these changes tested?

  • Yes and they all pass.


// Assert the table doesn't contain the update yet
assert!(!table.metadata().properties().contains_key("key"));

Collaborator


How about just sleeping 2 ms here?

Allowing last_updated_ms to be unchanged does not make sense to me.

Contributor


+1

@toutane
Contributor

toutane commented May 5, 2026

Hey, one thing worth considering imho, the root issue is in TableMetadataBuilder::build(), which uses Utc::now() when no timestamp is set explicitly. An alternative that fixes it at the source rather than relaxing the assertion:

self.metadata.last_updated_ms = self.last_updated_ms.unwrap_or_else(|| {
    chrono::Utc::now()
        .timestamp_millis()
        .max(self.metadata.last_updated_ms + 1)
});

@Kurtiscwright
Contributor Author

@CTTY & @toutane thank you both for commenting. I will send out a follow-up commit later today. I agree philosophically with the proposal from @toutane, but I think both solutions leave an implementation detail that could be missed by future use cases. I don't like adding latency to something that is supposed to be fine-grained. I want to understand why the timestamp is not set explicitly, then propose a solution from there.

Otherwise, I think last_updated_ms should be renamed to something that expresses this built-in latency mechanism.

@Kurtiscwright
Contributor Author

@CTTY @toutane I traced the full call and checked the spec + Java implementation, and I feel strongly that we should keep the current <= approach.

The spec defines last-updated-ms as a wall-clock observation, not a logical clock.

From the Table Metadata Fields (https://iceberg.apache.org/spec/#table-metadata-fields) section:

│ last-updated-ms Timestamp in milliseconds from the unix epoch when the table was last updated. Each table metadata file should update this field just before writing.

The spec makes no monotonicity guarantee. Two updates within the same millisecond producing equal values is spec-compliant behavior.

The Java implementation has the same behavior.

In TableMetadata.Builder.build() (https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadata.java), the Java code does:

if (lastUpdatedMillis == null) {
    this.lastUpdatedMillis = System.currentTimeMillis();
}

No +1, no max(), no monotonicity enforcement. It is identical to what the Rust implementation does with Utc::now().timestamp_millis().

The Java project explicitly chose tolerance over strict ordering.

Issue #1109 (apache/iceberg#1109) documents a real-world case where lastUpdatedMillis ended up lower than a previous metadata-log entry due to multi-machine clock skew. The fix (PR #1110 (apache/iceberg#1110)) added a 1-minute tolerance with the comment:

// commits can happen concurrently from different machines.
// A tolerance helps us avoid failure for small clock skew
lastUpdatedMillis - last.timestampMillis() >= -ONE_MINUTE

Equal timestamps pass this validation trivially (difference = 0, which is >= -60000).

last-updated-ms is not used for any operational decision.

No TableRequirement variant checks last-updated-ms. Conflict detection uses UUID, schema ID, snapshot ref, spec ID, sort order ID, and field IDs. Metadata version ordering is determined by the filename (v1.metadata.json → v2.metadata.json), not timestamps.

Why not sleep(2ms) or max(now, previous + 1)?

  • sleep adds artificial latency to a field that the spec defines as informational. It also doesn't solve the problem for real multi-machine deployments where clock skew can exceed any fixed sleep.
  • max(now, previous + 1) silently mutates the timestamp away from wall-clock truth. Future callers of the builder would get a value that doesn't represent when the update actually happened, with no indication it was adjusted. The field's name and spec definition say "when was this last updated" — making it lie
    about that to satisfy a test assertion inverts the relationship between spec and test.

The original < assertion was over-specifying an invariant the system doesn't depend on. The <= correctly reflects the spec's semantics: the timestamp records when the update happened, and two updates in the same millisecond are valid.

@toutane
Contributor

toutane commented May 6, 2026

Hey @Kurtiscwright, thanks for the explanation, if this is how Java does it then I'm fine with it.

@blackmwk
Contributor

blackmwk commented May 6, 2026

│ [quotes @Kurtiscwright's full analysis above: the spec semantics of last-updated-ms, the Java implementation's behavior, the 1-minute tolerance from apache/iceberg#1110, and the case against sleep/max]

Then how about removing the assertion? Since it's not used for any purpose, in theory anything can happen due to clock skew.

Contributor

@sungwy sungwy left a comment


I ran into the same issue and @mbutrovich kindly pointed out that this PR was already in progress (#2409)

This change LGTM

@Kurtiscwright
Contributor Author

│ [quotes the analysis above in full]
│
│ Then how about removing the assertion? Since it's not used for any purpose, in theory anything can happen due to clock skew.

While it's correct that clock skew across hosts in a distributed system can produce any combination of results, I think that's the wrong takeaway. This test validates the worst case on a single host: an update request landing in the same millisecond as another update or create call for the same table. That gives us some assurance we haven't created a scenario where an update can appear to happen before another update, or before the table was created, in a single-node environment.

The Java example gives us historical background: last-updated-ms isn't used for ordering validation but is an informative field, so long as clock skew only causes a new update to land within one minute of the previous update. I think we keep the test, as it's not wrong, and after my research yesterday I think the <= assertion makes the test stronger.

@Kurtiscwright
Contributor Author

@sungwy @blackmwk @toutane thank you all for the additional reviews, discussion and feedback!

While running make unit-test for verifying the RC the update_table test
in /src/memory/catalog.rs was failing because it checks for the table
and updated_table to have different updated timestamps down to the ms,
but on a fast enough machine the test can update the updated_table in
the same ms as the table causing the test to incorrectly fail.
@Kurtiscwright Kurtiscwright force-pushed the fix/memory-catalog-test-update-table-flake branch from 8edf32d to efa0ab6 on May 6, 2026 20:21
@CTTY
Collaborator

CTTY commented May 6, 2026

After a second thought, I think we should just remove the assertion.

Due to clock skew, it doesn't make sense to use last_updated_ms to confirm whether the table has been updated. A newer last_updated_ms can technically be smaller than the previous ts.

The existing test uses metadata_location and snapshot_log to confirm the table is updated, and it is already sufficient:

        assert_ne!(table.metadata_location(), updated_table.metadata_location());

        assert!(
            table.metadata().metadata_log().len() < updated_table.metadata().metadata_log().len()
        );


Development

Successfully merging this pull request may close these issues:

Fix Memory/Catalog update_table Test Flakiness