HDDS-15209. Stabilize TestRocksDBCheckpointDiffer#testDifferWithDB for RocksDB layout variance#10363
Open
arunsarin85 wants to merge 5 commits into
Open
HDDS-15209. Stabilize TestRocksDBCheckpointDiffer#testDifferWithDB for RocksDB layout variance#10363arunsarin85 wants to merge 5 commits into
arunsarin85 wants to merge 5 commits into
Conversation
Comment on lines
+1064
to
+1072
| List<String> baselineDiffFileNames = | ||
| differ.getSSTDiffList( | ||
| new DifferSnapshotVersion(src, 0, fullTableToLookUp), | ||
| new DifferSnapshotVersion(snap, 0, fullTableToLookUp), | ||
| null, fullTableToLookUp, true) | ||
| .orElse(Collections.emptyList()) | ||
| .stream() | ||
| .map(SstFileInfo::getFileName) | ||
| .collect(Collectors.toList()); |
Contributor
There was a problem hiding this comment.
Thanks for the patch @arunsarin85. Constructing the baselineDiffFileNames(from which expectedDiffFiles are derived) using the same method differ.getSSTDiffList() that the actualNames is constructed changes the intention of the test.
This test is meant to verify the correctness of differ.getSSTDiffList() so the expected diff list needs to be built independently of differ.getSSTDiffList()
Maybe a better approach would be:
Pseudocode:
setup:
open active DB with auto-compactions disabled. cfOpts.setDisableAutoCompactions(true);
write_and_checkpoint():
// First snapshot
Insert keys
snap-1 = create_checkpoint()
snap_1_files = sst_file_list(snap_1) // grouped by key, file, directorytable column families
// For the rest of the snapshots
Insert keys
// Before taking the snapshot flush all txns, track all uncompacted SST files, run manual compaction then take a snapshot
activeRocksDB.get().flush()
snap_n_files = sst_file_list(activeRocksDB) // grouped by key, file, directorytable column families
compactOptions.setBottommostLevelCompaction(kForce)
activeRocksDB.get().compactRange(key/file/directoryTable, null, null, compactOptions)
snap-n = create_checkpoint()
build_expected_diffs_between_adjacent_snaps():
expected_diff_between_snap-2_snap-1 = snap_2_files - snap_1_files
expected_diff_between_snap-3_snap-2 = snap_3_files - snap_2_files
...
expected_diff_between_snap-n_snap-n-1 = snap_n_files - snap_n-1_files
diffAllSnapshots:
build_expected_diffs_between_adjacent_snaps()
for each snap_n and snap_prev from (snap_1 .. snap_n-1);
actual_diff = differ.getSSTDiffList(snap_n, snap_prev)
expected_diff = expected_diff_between_snap-n_snap-n-1 + expected_diff_between_snap-n-1_snap-n-2 + .... + expected_diff_between_snap-prev+1_expected_diff_between_snap-prev
assert expected_diff == actual_diff
| "000024.sst", "000026.sst", "000029.sst")); | ||
| List<String> backupFiles = | ||
| sstPathStream.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList()); | ||
| assertEquals(7, backupFiles.size(), "Unexpected compaction SST backup count"); |
Contributor
There was a problem hiding this comment.
Asserting the count here may still be flaky depending on how RocksDB compacts on different platforms. A more concrete check can be
- Get the diff SST files between the last snapshot and the first snapshot using
differ.getSSTDiffList(). (The correctness ofgetSSTDiffList()should already be verified indiffAllSnapshots(). See comment below) - Check whether these diff list of files are in the last snapshot.
- Otherwise they should be in SST backup directory (they should not be pruned as they will correspond to the leaf node in the compaction DAG).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Please describe your PR in detail:
TestRocksDBCheckpointDiffer#testDifferWithDB was flaky on CI because RocksDB does not always assign the same SST file numbers or compaction shape across runs and OSes. The test mixed timing (DAG not fully updated), hard-coded SST names for diff expectations and for SST backup links, and a misleading backup assertion where the filesystem listing was passed as expected and compared to a static list.
This change makes the test derive what it can from the same code under test (getSSTDiffList, compaction node map, backup dir contents) instead of pinning to one successful run’s numeric ids. It keeps the intent: compaction tracking runs, DAG-based diffs behave consistently across column-family subsets, and compaction-input backups exist and correspond to tracked SSTs.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15209
How was this patch tested?
flaky-test-check workflow with submodule=rocksdb-checkpoint-differ, test-class=org.apache.ozone.rocksdiff.TestRocksDBCheckpointDiffer, test-name=testDifferWithDB
https://github.com/arunsarin85/ozone/actions/runs/26268683979