Skip to content

Conversation

@rionmonster
Copy link
Contributor

@rionmonster rionmonster commented Dec 27, 2025

Purpose

Linked issue: close #2262

Per Issue #2262, this pull request addresses a race condition that could sometimes occur and result in the IcebergRewriteITCase.testLogTableCompaction test case failing (particularly during CI builds).

Brief change log

This change updates the testLogTableCompaction test with an additional call to the existing assertReplicaStatus() function to ensure that the associated tiering job had fully processed the previous rewrites to avoid the race condition prior to the subsequent assertions.

Tests

The IcebergRewriteITCase.testLogTableCompaction was initially updated to use an iterative approach (e.g., repeat 50 times) as mentioned in the original issue to reproduce the issue. After this was repeatably reproducible, the proposed fix was introduced to verify the test would repeatedly pass through all of the iterations successfully and repeatably.

API and Format

N/A

Documentation

N/A

Reviewer(s) Requested

@swuferhong (as original reporter), @beryllw (as original author)

checkFileStatusInIcebergTable(t1, 3, false);

// Ensure tiering job has fully processed the previous writes
assertReplicaStatus(t1Bucket, i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about it. If the files count already is 3 in iceberg, shouldn't it also mean that the tiering has already tiered all records since we only write 3 records.

Copy link
Contributor Author

@rionmonster rionmonster Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luoyuxia

I agree that it’s a bit odd. The additional replica assertion seems to alleviate any race conditions within this specific test case that we were seeing arise within the failing CI instances (at least based on my local testing to reproduce the issue).

It may simply be a testing artifact as opposed to a legitimate issue, but if it doesn’t seem like a fix, we can explore a few other avenues. It seemed like one of those common race conditions, so leveraging an existing function to help seemed like a decent approach. Happy to explore some additional avenues though, if we feel that we need a bit more exhaustive checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rionmonster I tried to reporduce it in my local. But fails. I think it'll be better to explore the root cause for I'm afraid of it is caused by another critical issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luoyuxia

Good to know! It seemed to be pretty consistent regarding pass/fail on my end, but clearly there’s something else at play.

I’ll do some more exploration and see what I find. Thanks for the feedback!

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rionmonster Thanks for the pr. Only one question

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unstable test IcebergRewriteITCase.testLogTableCompaction

2 participants