fix(profiler): close onThreadEnd teardown race#552
Conversation
Block SIGPROF/SIGVTALRM around unregisterThread + ProfiledThread::release in onThreadEnd so a signal cannot sample a partially-torn-down thread. Also move updateThreadName before the SignalBlocker scope to keep JVMTI allocation outside the masked region.
T-05 and T-09 were crash-only guards: SIG_IGN discarded every signal before any code under test ran. Replace with counting handlers that call currentSignalSafe() and assert the handler was actually invoked. Remove the per-worker SigGuard+SIG_IGN from t09_worker so injector signals reach the handler instead of being discarded.
|
CI Test ResultsRun: #26586407194 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-05-28 16:19:54 UTC |
There was a problem hiding this comment.
Pull request overview
This PR closes a profiler thread teardown race by ensuring profiling signals cannot observe a ProfiledThread after teardown has begun but before engine unregistration and TLS release complete.
Changes:
- Moves thread-name update before TLS release in
Profiler::onThreadEnd(). - Wraps CPU/wall engine unregistration and
ProfiledThread::release()in aSignalBlocker. - Strengthens teardown safety tests by installing real SIGVTALRM handlers and asserting invocation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
ddprof-lib/src/main/cpp/profiler.cpp |
Blocks profiling signals across engine unregistration and TLS release during thread end cleanup. |
ddprof-lib/src/test/cpp/thread_teardown_safety_ut.cpp |
Updates T-05 and T-09 to exercise currentSignalSafe() from installed signal handlers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What does this PR do?:
Closes a signal-safety gap in
Profiler::onThreadEnd()where aSIGPROForSIGVTALRMdelivered between
ProfiledThread::release()and_wall_engine->unregisterThread(tid)couldcall
currentSignalSafe()and dereference a freedProfiledThread(SIGSEGV inJavaThread::thread_main_inneron JDK 25.0.3).Also upgrades T-05 and T-09 in
thread_teardown_safety_ut.cppfrom crash-only guards(signals discarded via
SIG_IGN) to active guards: a real handler now callscurrentSignalSafe()and asserts the handler was actually invoked.Motivation:
SIGSEGV crash reproduced on JDK 25.0.3. The teardown sequence in
onThreadEnd()had a window betweenProfiledThread::release()freeing TLS andunregisterThread()removing the thread from the wall-clock filter; a profiling signalin that window could sample the freed thread.
Additional Notes:
updateThreadName()is kept before theSignalBlockerscope because JVMTIGetThreadInfoallocates memory. Under ASAN, JVMTI allocation acquires the sameallocator lock a signal handler would try to acquire, causing deadlock.
ProfiledThreadis still live at that point, so concurrent signals are safe.
libraryPatcher_linux.cpp(unregister_and_release,PROF-14603).
How to test the change?:
thread_teardown_safety_utT-05 and T-09 now exercise the handler path and will fail(rather than silently pass) if signals are discarded or the handler is never invoked.
./gradlew :ddprof-lib:testDebug(Linux).For Datadog employees:
credentials of any kind, I've requested a review from
@DataDog/security-design-and-guidance.