Skip to content

fix(profiler): close onThreadEnd teardown race#552

Merged
jbachorik merged 3 commits into
mainfrom
worktree-fix+onthread-end-teardown-gap
May 28, 2026
Merged

fix(profiler): close onThreadEnd teardown race#552
jbachorik merged 3 commits into
mainfrom
worktree-fix+onthread-end-teardown-gap

Conversation

@jbachorik
Copy link
Copy Markdown
Collaborator

@jbachorik jbachorik commented May 28, 2026

What does this PR do?:
Closes a signal-safety gap in Profiler::onThreadEnd() where a SIGPROF or SIGVTALRM
delivered between ProfiledThread::release() and _wall_engine->unregisterThread(tid) could
call currentSignalSafe() and dereference a freed ProfiledThread (SIGSEGV in
JavaThread::thread_main_inner on JDK 25.0.3).

Also upgrades T-05 and T-09 in thread_teardown_safety_ut.cpp from crash-only guards
(signals discarded via SIG_IGN) to active guards: a real handler now calls
currentSignalSafe() and asserts the handler was actually invoked.

Motivation:
SIGSEGV crash reproduced on JDK 25.0.3. The teardown sequence in
onThreadEnd() had a window between ProfiledThread::release() freeing TLS and
unregisterThread() removing the thread from the wall-clock filter; a profiling signal
in that window could sample the freed thread.

Additional Notes:

  • updateThreadName() is kept before the SignalBlocker scope because JVMTI
    GetThreadInfo allocates memory. Under ASAN, JVMTI allocation acquires the same
    allocator lock a signal handler would try to acquire, causing deadlock. ProfiledThread
    is still live at that point, so concurrent signals are safe.
  • Pattern mirrors the prior art in libraryPatcher_linux.cpp (unregister_and_release,
    PROF-14603).

How to test the change?:

  • thread_teardown_safety_ut T-05 and T-09 now exercise the handler path and will fail
    (rather than silently pass) if signals are discarded or the handler is never invoked.
  • Existing T-01 through T-10 cover the broader teardown contract.
  • On debug builds: ./gradlew :ddprof-lib:testDebug (Linux).

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-14674

jbachorik added 2 commits May 28, 2026 10:41
Block SIGPROF/SIGVTALRM around unregisterThread + ProfiledThread::release
in onThreadEnd so a signal cannot sample a partially-torn-down thread.
Also move updateThreadName before the SignalBlocker scope to keep JVMTI
allocation outside the masked region.
T-05 and T-09 were crash-only guards: SIG_IGN discarded every signal
before any code under test ran. Replace with counting handlers that call
currentSignalSafe() and assert the handler was actually invoked.

Remove the per-worker SigGuard+SIG_IGN from t09_worker so injector
signals reach the handler instead of being discarded.
@jbachorik jbachorik added the AI label May 28, 2026
@datadog-datadog-prod-us1
Copy link
Copy Markdown
Contributor

datadog-datadog-prod-us1 Bot commented May 28, 2026

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 4 Pipeline jobs failed

DataDog/java-profiler | gtest-asan-amd64   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). LeakSanitizer detected memory leaks: Direct leak of 808 bytes and indirect leaks of 262144 and 32768 bytes in thread_teardown_safety_ut.

DataDog/java-profiler | gtest-asan-arm64   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). Memory leak detected: 295720 byte(s) leaked in 3 allocation(s) in thread_teardown_safety_ut.cpp at line 110

DataDog/java-profiler | gtest-tsan-amd64   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). Segmentation fault during execution of test binary.

View all 4 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 76a6744 | Docs | Datadog PR Page | Give us feedback!

@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented May 28, 2026

CI Test Results

Run: #26586407194 | Commit: 32cac2c | Duration: 12m 31s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-05-28 16:19:54 UTC

@jbachorik jbachorik changed the title fix(profiler): close onThreadEnd teardown race (PROF-14674) fix(profiler): close onThreadEnd teardown race May 28, 2026
@jbachorik jbachorik marked this pull request as ready for review May 28, 2026 16:02
@jbachorik jbachorik requested a review from a team as a code owner May 28, 2026 16:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR closes a profiler thread teardown race by ensuring profiling signals cannot observe a ProfiledThread after teardown has begun but before engine unregistration and TLS release complete.

Changes:

  • Moves thread-name update before TLS release in Profiler::onThreadEnd().
  • Wraps CPU/wall engine unregistration and ProfiledThread::release() in a SignalBlocker.
  • Strengthens teardown safety tests by installing real SIGVTALRM handlers and asserting invocation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
ddprof-lib/src/main/cpp/profiler.cpp Blocks profiling signals across engine unregistration and TLS release during thread end cleanup.
ddprof-lib/src/test/cpp/thread_teardown_safety_ut.cpp Updates T-05 and T-09 to exercise currentSignalSafe() from installed signal handlers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@zhengyu123 zhengyu123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine.

@jbachorik jbachorik merged commit 810fc4b into main May 28, 2026
107 checks passed
@jbachorik jbachorik deleted the worktree-fix+onthread-end-teardown-gap branch May 28, 2026 19:43
@github-actions github-actions Bot added this to the 1.44.0 milestone May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants