BR: Fix GC blkID race causing stale CORRUPTED data #416

Merged: koujl merged 2 commits into eBay:stable/v4.x from koujl:br-crpt on Jun 1, 2026
@koujl (Contributor) commented May 27, 2026

During a PG snapshot, PGBlobIterator captures blob pbas in cur_blob_list_. If GC runs before the actual reads, every blob in the shard is moved to a new blkID. The read then returns stale data from the old location, verify_blob fails, and the blob is incorrectly marked as CORRUPTED.

Fix: on verify_blob failure, re-check the index table. If the pbas changed, signal STALE_BLKID, refresh the entire shard's blob list, and trigger a NuRaft resend.
Add an unbounded recursive retry to load_blob_data: it re-reads the latest index and reloads the blob if the blkID has changed, so that the correct data is returned.
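
The retry described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FakeStore`, `LoadStatus`, `LoadResult`, the `before_read` hook, and the signatures of `verify_blob` and `load_blob_data` are all hypothetical stand-ins, and the checksum is a plain `std::hash`. The sketch also uses a loop where the description mentions recursion. The idea it shows is that a failed verification only counts as corruption when the index still points at the blkID that was read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins; these are NOT the real HomeObject/HomeStore types.
using BlobId = std::uint64_t;
using BlkId = std::uint64_t;

struct FakeStore {
    std::map<BlobId, BlkId> index;                // index table: blob -> current blkID
    std::map<BlkId, std::string> blocks;          // block contents keyed by blkID
    std::map<BlobId, std::size_t> expected_hash;  // checksum recorded at write time
    std::function<void()> before_read;            // one-shot hook simulating GC racing a read

    std::string read(BlkId blk) {
        if (before_read) {
            auto hook = std::move(before_read);
            before_read = nullptr;  // fire the hook only once
            hook();
        }
        auto it = blocks.find(blk);
        return it == blocks.end() ? std::string{} : it->second;
    }
};

enum class LoadStatus { OK, CORRUPTED };

struct LoadResult {
    LoadStatus status;
    std::string data;
};

// Assumed check: the data matches the checksum recorded for this blob.
bool verify_blob(const FakeStore& s, BlobId id, const std::string& data) {
    return std::hash<std::string>{}(data) == s.expected_hash.at(id);
}

// Read from the cached blkID; if verification fails, consult the index again.
// A changed blkID means the cached location was stale, so retry from the new one.
// An unchanged blkID means the data itself is bad.
LoadResult load_blob_data(FakeStore& s, BlobId id, BlkId cached_blk) {
    assert(s.index.count(id) == 1);  // the blob must be present in the index
    for (;;) {
        std::string data = s.read(cached_blk);
        if (verify_blob(s, id, data)) { return {LoadStatus::OK, data}; }
        BlkId latest = s.index.at(id);
        if (latest == cached_blk) { return {LoadStatus::CORRUPTED, {}}; }
        cached_blk = latest;  // GC moved the blob: reload from its new location
    }
}
```

In this sketch the loop has no retry limit, which matches the "infinite" wording of the description: it ends only when a read verifies or when the index confirms that the blkID just read is still current.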

Comment thread src/lib/homestore_backend/pg_blob_iterator.cpp
@codecov-commenter commented May 29, 2026

Codecov Report

❌ Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (stable/v4.x@708a3ef).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lib/homestore_backend/pg_blob_iterator.cpp | 66.66% | 2 Missing and 3 partials ⚠️ |
| ...lib/homestore_backend/snapshot_receive_handler.cpp | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@              Coverage Diff               @@
##             stable/v4.x     #416   +/-   ##
==============================================
  Coverage               ?   53.36%           
==============================================
  Files                  ?       36           
  Lines                  ?     5350           
  Branches               ?      670           
==============================================
  Hits                   ?     2855           
  Misses                 ?     2198           
  Partials               ?      297           

@koujl koujl marked this pull request as ready for review May 29, 2026 09:58
@xiaoxichen (Collaborator) previously approved these changes May 29, 2026

lgtm

@koujl koujl requested a review from yuwmao June 1, 2026 01:18
Comment thread src/lib/homestore_backend/pg_blob_iterator.cpp Outdated
During a PG snapshot, PGBlobIterator captures blob pbas in cur_blob_list_.
If GC runs before the actual reads, every blob in the shard is moved to a
new blkID. The read then returns stale data from the old location,
verify_blob fails, and the blob is incorrectly marked as CORRUPTED.

Add an unbounded recursive retry to load_blob_data: it re-reads the latest
index and reloads the blob if the blkID has changed, so that the correct
data is returned.
@JacksonYao287 (Collaborator)

LGTM. pls rebase this PR and update conan version

@koujl koujl merged commit 6cabe0f into eBay:stable/v4.x Jun 1, 2026
26 checks passed
@koujl koujl deleted the br-crpt branch June 1, 2026 10:01