Fix race condition in JournalVirtualDev truncation (#603)#888
Open
OmDoshi13 wants to merge 2 commits into
Open
Fix race condition in JournalVirtualDev truncation (#603)#888OmDoshi13 wants to merge 2 commits into
OmDoshi13 wants to merge 2 commits into
Conversation
Concurrent truncate() and read/write operations on JournalVirtualDev::Descriptor had no synchronization, causing crashes when truncate() erased chunks from m_journal_chunks while readers iterated the same vector. Changes: - Add mutable std::shared_mutex m_chunk_mutex to Descriptor; truncate() and append_chunk() acquire unique_lock, all reader methods acquire shared_lock, and process_pwrite_offset() now acquires shared_lock to protect the pwrite path against concurrent truncation - Change m_data_start_offset from plain off_t to std::atomic<off_t> with release/acquire semantics, eliminating undefined behaviour from unlocked reads in tail_offset() and alloc_next_append_blk() - Make update_data_start_offset() private; add public init_data_start_offset() (acquires unique_lock) for safe use from log_dev.cpp during init/recovery - Update log_dev.cpp and test_journal_vdev.cpp to call init_data_start_offset() Fixes eBay#603
process_pwrite_offset() now holds shared_lock on m_chunk_mutex (added in the eBay#603 fix). The LOGTRACEMOD call at line 319 was invoking to_string(), which also acquires shared_lock on the same mutex from the same thread. std::shared_mutex does not allow recursive same-thread locking — this would deadlock whenever journalvdev TRACE logging is enabled. Replace to_string() with to_string_nolock() since the lock is already held.
77fdc51 to
0c58654
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the concurrency bug reported in #603 where
truncate()modifiedm_journal_chunksandm_data_start_offsetconcurrently with read and write operations, with no synchronization, causing crashes when a chunk was erased mid-iteration.mutable std::shared_mutex m_chunk_mutextoDescriptor; writers (truncate,append_chunk) holdunique_lock, all reader methods holdshared_lock, andprocess_pwrite_offset(the pwrite path) now also holdsshared_lockm_data_start_offsetfromoff_ttostd::atomic<off_t>(release/acquire) to eliminate UB from unlocked reads intail_offset()andalloc_next_append_blk()update_data_start_offset()private; add publicinit_data_start_offset()(acquiresunique_lock) for use bylog_dev.cppduring init/recoveryFiles changed
src/lib/device/journal_vdev.hppm_data_start_offsetatomic, addinit_data_start_offset()src/lib/device/journal_vdev.cppinit_data_start_offset()impl, use.store()/.load()src/lib/logstore/log_dev.cppupdate_data_start_offset()→init_data_start_offset()src/tests/test_journal_vdev.cppTest plan
test_journal_vdevpasses (compile fix at line 119 verified)ctest --output-on-failureinside Docker buildRelated
Closes #603