Fix reachMinorInterval() starvation for cold partitions by lintingbin · Pull Request #4179 · apache/amoro

lintingbin · 2026-04-13T11:29:41Z

What changes were proposed in this pull request?

reachMinorInterval() uses the table-level lastMinorOptimizingTime, which is frequently reset by high-traffic partitions. This causes partitions with fewer small files (below the minor trigger file count threshold) to never get minor optimized, even when their files accumulate over time.

Root Cause

The lastMinorOptimizingTime is a single table-level timestamp shared across all partitions. When a high-traffic partition triggers minor optimization, the timestamp is reset for the entire table. Cold partitions (with few files, not reaching minor-trigger-file-count) rely on reachMinorInterval() to trigger optimization, but the interval check never passes because the table-level timestamp keeps getting refreshed.
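The starvation described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not Amoro's code: the class name, the method `coldPartitionPasses`, the fixed planning period, and the idea that the hot partition triggers on every planning round are all hypothetical. Only the shared table-level timestamp and the interval comparison are taken from the description.

```java
import java.util.concurrent.TimeUnit;

final class SharedTimestampStarvation {

  /**
   * Counts how often the interval check passes for a cold partition.
   * All parameters are illustrative; none of them are Amoro configuration names.
   */
  static int coldPartitionPasses(
      long planPeriodMs, long minorLeastIntervalMs, int rounds, boolean hotPartitionActive) {
    long lastMinorOptimizingTime = 0L; // one timestamp shared by the whole table
    int passes = 0;
    for (int round = 1; round <= rounds; round++) {
      long planTime = round * planPeriodMs;
      // Cold partition: too few files, so it can only rely on the interval check.
      if (planTime - lastMinorOptimizingTime > minorLeastIntervalMs) {
        passes++;
        lastMinorOptimizingTime = planTime;
      }
      // Hot partition: triggers by file count and resets the shared timestamp.
      if (hotPartitionActive) {
        lastMinorOptimizingTime = planTime;
      }
    }
    return passes;
  }

  public static void main(String[] args) {
    long tenMinutes = TimeUnit.MINUTES.toMillis(10);
    long oneHour = TimeUnit.HOURS.toMillis(1);
    int roundsPerDay = 144; // one planning round every 10 minutes
    System.out.println(coldPartitionPasses(tenMinutes, oneHour, roundsPerDay, true));
    System.out.println(coldPartitionPasses(tenMinutes, oneHour, roundsPerDay, false));
  }
}
```

With the hot partition active, the elapsed time seen by the cold partition never exceeds one planning period, so the check never passes. Without it, the check passes periodically.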

Solution

Added a cross-day fallback mechanism in reachMinorInterval():

When minorLeastInterval is less than one day (which is the typical configuration, default is 1 hour), and the last minor optimization happened on a different calendar day, reachMinorInterval() returns true.
This ensures every partition gets at least one minor optimization attempt per day, preventing starvation of cold partitions.
The guard minorLeastInterval < 1 day avoids semantic conflict when users intentionally set a longer interval (e.g., 2 days).
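The three points above can be sketched as a standalone method. This is a minimal illustration, not the patch itself: the parameter names, the explicit `ZoneId` argument, and the strict `>` comparison in the normal path are assumptions, and the actual change in CommonPartitionEvaluator may determine the calendar day differently.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.concurrent.TimeUnit;

final class CrossDayFallbackSketch {
  private static final long ONE_DAY_MS = TimeUnit.DAYS.toMillis(1);

  static boolean reachMinorInterval(
      long planTimeMs, long lastMinorOptimizingTimeMs, long minorLeastIntervalMs, ZoneId zone) {
    // Normal path: enough time has elapsed since the last minor optimization.
    if (planTimeMs - lastMinorOptimizingTimeMs > minorLeastIntervalMs) {
      return true;
    }
    // Guard: leave intervals of one day or longer untouched.
    if (minorLeastIntervalMs >= ONE_DAY_MS) {
      return false;
    }
    // Fallback: the last minor optimization happened on a different calendar day.
    LocalDate lastDay = Instant.ofEpochMilli(lastMinorOptimizingTimeMs).atZone(zone).toLocalDate();
    LocalDate planDay = Instant.ofEpochMilli(planTimeMs).atZone(zone).toLocalDate();
    return !planDay.equals(lastDay);
  }

  public static void main(String[] args) {
    ZoneId utc = ZoneId.of("UTC");
    long oneHour = TimeUnit.HOURS.toMillis(1);
    long last = Instant.parse("2026-04-13T23:50:00Z").toEpochMilli();
    long plan = Instant.parse("2026-04-14T00:10:00Z").toEpochMilli();
    // Only 20 minutes have elapsed, but midnight was crossed: prints true.
    System.out.println(reachMinorInterval(plan, last, oneHour, utc));
  }
}
```

Because the fallback compares calendar days rather than elapsed time, the result depends on which time zone defines "day"; the sketch makes that choice an explicit argument.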

Also added Javadoc to SELF_OPTIMIZING_MINOR_TRIGGER_INTERVAL to document the expected range and the cross-day fallback behavior.

How was this patch tested?

Existing TestOptimizingEvaluator tests pass. The change is minimal and additive: it only adds a fallback path that activates when the normal interval check fails and a day boundary has been crossed.

Closes #4055

reachMinorInterval() uses table-level lastMinorOptimizingTime which is frequently reset by high-traffic partitions. This causes partitions with fewer small files (below minor trigger file count threshold) to never get minor optimized, even when their files accumulate over time.

Add a cross-day fallback: when minorLeastInterval is less than one day and the last minor optimizing happened on a different calendar day, reachMinorInterval() returns true. This ensures every partition gets at least one minor optimization attempt per day.

Also add Javadoc to SELF_OPTIMIZING_MINOR_TRIGGER_INTERVAL clarifying that it should be less than one day for the cross-day fallback to work.

Closes apache#4055



…-iceberg tables on Iceberg 1.7.2 (#4182)

[AMORO-4163][ams] Fix CommitFailedException when loading legacy mixed-iceberg tables on Iceberg 1.7.x

Found while investigating the CI failure in #4179.

Iceberg 1.7.x introduced a breaking change in HadoopTableOperations.commit(): it now uses reference equality (==) to compare the `base` argument against the cached currentMetadata. Previously, newTableOperations() called ops.current() to obtain the current metadata, but versionAndMetadata() inside commit() may refresh the internal state and return a different object instance. When the two references differ, commit() throws CommitFailedException("Cannot commit changes based on stale table metadata") even though the metadata content is identical, causing table loading to fail.

Fix: replace ops.current() with ops.refresh() so that the returned TableMetadata reference is the same object stored in ops' internal cache. When commit() then calls versionAndMetadata(), it finds the version unchanged on disk and returns the same cached reference, satisfying the reference-equality check.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
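The difference between reference equality and content equality that this commit message relies on can be shown with a toy model. Everything here is hypothetical: `ToyMetadata` and `ToyOps` are illustrative stand-ins, not Iceberg classes, and the sketch only demonstrates why a content-equal but distinct instance fails an `==` check while the cached instance passes.

```java
import java.util.Objects;

final class ReferenceEqualityCommitSketch {

  /** Toy stand-in for table metadata; equal content does not imply the same instance. */
  static final class ToyMetadata {
    final int version;

    ToyMetadata(int version) {
      this.version = version;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ToyMetadata && ((ToyMetadata) o).version == version;
    }

    @Override
    public int hashCode() {
      return Objects.hash(version);
    }
  }

  /** Toy stand-in for table operations that caches a single metadata instance. */
  static final class ToyOps {
    private ToyMetadata cached = new ToyMetadata(1);

    /** Hypothetical: hands out the very instance held in the cache. */
    ToyMetadata refresh() {
      return cached;
    }

    /** Rejects any base that is not the cached instance, even if its content is equal. */
    void commit(ToyMetadata base) {
      if (base != cached) {
        throw new IllegalStateException("Cannot commit changes based on stale table metadata");
      }
      cached = new ToyMetadata(base.version + 1);
    }
  }

  static boolean commitSucceeds(ToyOps ops, ToyMetadata base) {
    try {
      ops.commit(base);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    ToyOps ops = new ToyOps();
    ToyMetadata copy = new ToyMetadata(1); // same content, different object
    System.out.println(copy.equals(ops.refresh())); // content-equal: true
    System.out.println(commitSucceeds(ops, copy)); // different reference: false
    System.out.println(commitSucceeds(ops, ops.refresh())); // cached reference: true
  }
}
```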

codecov-commenter · 2026-04-20T03:35:22Z

Codecov Report

❌ Patch coverage is 9.09091% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 22.69%. Comparing base (b7f7de3) to head (0a43f0e).
⚠️ Report is 4 commits behind head on master.

Files with missing lines	Patch %	Lines
...moro/optimizing/plan/CommonPartitionEvaluator.java	9.09%	8 Missing and 2 partials ⚠️

❗ There is a different number of reports uploaded between BASE (b7f7de3) and HEAD (0a43f0e).

HEAD has 1 fewer upload than BASE

Flag	BASE (b7f7de3)	HEAD (0a43f0e)
core	1	0

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #4179      +/-   ##
============================================
- Coverage     29.75%   22.69%   -7.06%     
+ Complexity     4258     2621    -1637     
============================================
  Files           677      461     -216     
  Lines         54744    42563   -12181     
  Branches       6968     6002     -966     
============================================
- Hits          16288     9659    -6629     
+ Misses        37246    32065    -5181     
+ Partials       1210      839     -371

Flag	Coverage Δ
core	`?`
trino	`22.69% <9.09%> (?)`


github-actions · 2026-05-21T00:25:32Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@amoro.apache.org list. Thank you for your contributions.

lintingbin and others added 3 commits April 13, 2026 19:28

Remove extra blank line in TableProperties for style consistency

0cb2b13

fix: restore blank line after SELF_OPTIMIZING_MINOR_TRIGGER_INTERVAL …

4e4a405

…to pass spotless check

lintingbin mentioned this pull request Apr 15, 2026

[AMORO-4163][ams] Fix CommitFailedException when loading legacy mixed-iceberg tables on Iceberg 1.7.2 #4182

Merged

8 tasks

Merge branch 'master' into fix-4055-reachMinorInterval

0a43f0e

github-actions Bot added the module:ams-server Ams server module label Apr 16, 2026

lintingbin force-pushed the fix-4055-reachMinorInterval branch from 767b166 to 0a43f0e Compare April 20, 2026 03:28

github-actions Bot removed the module:ams-server Ams server module label Apr 20, 2026

Merge branch 'master' into fix-4055-reachMinorInterval

66c2e90

github-actions Bot added the stale label May 21, 2026

lintingbin wants to merge 5 commits into apache:master from lintingbin:fix-4055-reachMinorInterval


2 participants
