@navinko navinko commented Jan 11, 2026

What changes were proposed in this pull request?

Avoid collecting keys in memory during parallel OM table processing.

Please describe your PR in detail:

  • The new implementation keeps the iterator thread pool but removes the value-executor pool and the in-memory batching.
  • Each table iterator is now owned by a single worker thread and scans only its assigned key range.
  • Validated that the single-threaded iterators work as-is with ByteArrayCodec.
  • A separate PR will replace ByteArrayCodec with CodecBufferCodec under ParallelTableOperation:
    https://issues.apache.org/jira/browse/HDDS-14155
  • Added a unit test case to validate the new flow.
  • Addressed findbugs comments.
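
For reviewers who want a mental model of the flow described above, here is a minimal sketch of the "one iterator per worker, one key range per iterator, no batching" pattern. It is not the code in this PR and not the ParallelTableIteratorOperation API: the class `RangeScanSketch`, its methods, and the `TreeMap` standing in for the RocksDB-backed table are all hypothetical, chosen only so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.BiConsumer;

/** Hypothetical sketch: each worker owns one iterator over one key range. */
public class RangeScanSketch {

  /**
   * Scans the ranges [bounds[i], bounds[i+1]) in parallel, one worker per range.
   * Entries are handed to the handler as they are read; nothing is collected
   * into an intermediate list or batch. The handler must be thread-safe.
   */
  static long scan(NavigableMap<String, String> table, List<String> bounds,
      BiConsumer<String, String> handler) {
    int workers = Math.max(1, bounds.size() - 1);
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    LongAdder processed = new LongAdder();
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (int i = 0; i + 1 < bounds.size(); i++) {
        final String start = bounds.get(i);
        final String end = bounds.get(i + 1);
        futures.add(pool.submit(() -> {
          // The view and its iterator are created and used by this thread only.
          for (Map.Entry<String, String> e
              : table.subMap(start, true, end, false).entrySet()) {
            handler.accept(e.getKey(), e.getValue());
            processed.increment();
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for the worker and surface any failure
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("scan interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("worker failed", e.getCause());
    } finally {
      pool.shutdown();
    }
    return processed.sum();
  }

  /** Builds 100 keys, splits them into four ranges, returns the processed count. */
  static long demo() {
    NavigableMap<String, String> table = new TreeMap<>();
    for (int i = 0; i < 100; i++) {
      table.put(String.format("key-%04d", i), "value-" + i);
    }
    List<String> bounds = List.of(
        "key-0000", "key-0025", "key-0050", "key-0075", "key-0100");
    return scan(table, bounds, (k, v) -> { /* per-key work goes here */ });
  }

  public static void main(String[] args) {
    System.out.println("Total keys processed: " + demo());
  }
}
```

The map is only read while the workers run, so sharing it across threads is safe in this sketch. A real table scan would instead open a separate iterator per range and close it on the same thread.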

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14400

How was this patch tested?

CI:
https://github.com/navinko/ozone/actions/runs/20884674236
Validated with JUnit tests, then tested the flow by populating fileTable with data and verifying parallel processing of individual tables in both debug and normal modes.
bash-5.1$ ozone debug ldb --db=/data/metadata/om.db scan --column_family=fileTable --count
23916

Recon log

Run mode, with 21627 keys uploaded to fileTable, followed by reprocessing.

2026-01-10T15:42:37.870472510Z 2026-01-10 15:42:37,870 [ReconTaskThread-0] INFO tasks.ReconTaskControllerImpl: Task OmTableInsightTask started execution on thread ReconTaskThread-0
2026-01-10T15:42:37.870724094Z 2026-01-10 15:42:37,870 [ReconTaskThread-0] INFO tasks.OmTableInsightTask: OmTableInsightTask: Starting reprocess
2026-01-10T15:42:37.878627094Z 2026-01-10 15:42:37,878 [ReconTaskThread-0] INFO tasks.OmTableInsightTask: OmTableInsightTask: Processing table dTokenTable sequentially (non-String keys)
2026-01-10T15:42:37.888022094Z 2026-01-10 15:42:37,887 [ReconTaskThread-0] INFO util.ParallelTableIteratorOperation: OmTableInsightTask: Parallel iteration completed - Total keys processed: 2
2026-01-10T15:42:37.888184677Z 2026-01-10 15:42:37,888 [ReconTaskThread-0] INFO tasks.OmTableInsightTask: OmTableInsightTask: Processing table s3SecretTable sequentially (non-String keys)
2026-01-10T15:42:37.899993135Z 2026-01-10 15:42:37,899 [ReconTaskThread-0] INFO util.ParallelTableIteratorOperation: OmTableInsightTask: Parallel iteration completed - Total keys processed: 3
2026-01-10T15:42:37.944590219Z 2026-01-10 15:42:37,944 [ReconTaskThread-0] INFO util.ParallelTableIteratorOperation: OmTableInsightTask: Parallel iteration completed - Total keys processed: 21627
2026-01-10T15:42:37.947238802Z 2026-01-10 15:42:37,947 [ReconTaskThread-0] INFO tasks.OmTableInsightTask: OmTableInsightTask: Reprocess completed in 76 ms
2026-01-10T15:42:37.947249094Z 2026-01-10 15:42:37,947 [ReconTaskThread-0] INFO tasks.ReconTaskControllerImpl: Task OmTableInsightTask completed execution

navinko commented Jan 11, 2026

Hi @swamirishi,
As suggested, I created a new PR: Avoid collecting keys in memory during parallel OM table processing #9624.
Kindly review.
