Add evalution timeout for expressions by jpbempel · Pull Request #11741 · DataDog/dd-trace-java

jpbempel · 2026-06-25T09:51:37Z

What Does This Do

Refactor TimeoutChecker to be an interface. The former concrete class
is now an interface with a static create(Config, Duration) factory
that returns either CpuTimeoutChecker or WallTimeoutChecker based on a new config flag
(DD_INTERNAL_DYNAMIC_INSTRUMENTATION_TIMEOUT_CHECKER_MODE). Wall clock time can be problematic in case of long GC pauses or sleep or constraint resource (container with less than a CPU). A cpu time driven timeout is more resilient for this. But by default we are using wall clock time for the time being and the cpu time is just an alternative.

Refactor Expression evaluation API: All Expression/BooleanExpression implementations now accept a new EvalContext class that bundles ValueReferenceResolver + TimeoutChecker into a single context object. That way we can check the timeout in any expression.

NB: for some special expressions (contains on strings or List) we have a per item count limit as we cannot introduce the timeout check for those invocations

new config option for the evaluation timeout
(DD_DYNAMIC_INSTRUMENTATION_EVALUATION_TIMEOUT), default is 50ms

The ValueScript json serialization used for tests was revisited to use a visitor.

Tests updated/added across all expression tests,
plus new smoke test (LogProbesIntegrationTest) and a configurable TimeoutCheckerTest.

Motivation

Conditions and expression values (log templates and span decorations) are not capped by a timeout to avoid spending too much time for complex expressions.

Additional Notes

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any other useful labels
Avoid using close, fix, or any linking keywords when referencing an issue
Use solves instead, and assign the PR milestone to the issue
Update the CODEOWNERS file on source file addition, migration, or deletion
Update public documentation with any new configuration flags or behaviors
Add your completed PR to the merge queue by commenting /merge. You can also:
- Customize the commit message associated with the merge with /merge --commit-message "..."
- Remove your PR from the merge queue with /merge -c
- Skip all merge queue checks with /merge -f --reason "reason"; please use this judiciously, as some checks do not run at the PR-level (note: the PR still needs to be mergeable, this will only skip the pre-merge build)
- Get more information in this doc

Jira ticket: [PROJ-IDENT]

pr-commenter · 2026-06-25T10:14:42Z

Debugger benchmarks

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
ci_job_date	1782998010	1782998357
end_time	2026-07-02T13:14:57	2026-07-02T13:20:42
git_branch	master	jpbempel/cond-eval-timeout
git_commit_sha	`d3398ea`	`17c6669`
start_time	2026-07-02T13:13:31	2026-07-02T13:19:17

See matching parameters

	Baseline	Candidate
ci_job_id	1824705449	1824705449
ci_pipeline_id	122352419	122352419
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
git_commit_date	1782997228	1782997228

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics.

See unchanged results

scenario	Δ mean agg_http_req_duration_min	Δ mean agg_http_req_duration_p50	Δ mean agg_http_req_duration_p75	Δ mean agg_http_req_duration_p99	Δ mean throughput
scenario:noprobe	unstable [-57.514µs; +77.921µs] or [-18.612%; +25.216%]	unstable [-74.599µs; +107.653µs] or [-20.967%; +30.257%]	unstable [-163.349µs; +386.500µs] or [-43.770%; +103.565%]	unstable [-347.825µs; +619.817µs] or [-27.813%; +49.562%]	same
scenario:basic	same	same	same	unstable [-36.736µs; +265.806µs] or [-3.602%; +26.062%]	unstable [-251.251op/s; +129.300op/s] or [-10.050%; +5.172%]
scenario:loop	same	same	same	same	same

Request duration reports for reports

gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (355.799 µs) : 300, 412
.   : milestone, 356,
basic (297.133 µs) : 290, 304
.   : milestone, 297,
loop (8.982 ms) : 8977, 8987
.   : milestone, 8982,
section candidate
noprobe (372.326 µs) : 263, 481
.   : milestone, 372,
basic (300.866 µs) : 289, 313
.   : milestone, 301,
loop (8.978 ms) : 8970, 8986
.   : milestone, 8978,

baseline results

Scenario	Request median duration [CI 0.99]
noprobe	355.799 µs [299.968 µs, 411.631 µs]
basic	297.133 µs [290.296 µs, 303.97 µs]
loop	8.982 ms [8.977 ms, 8.987 ms]

candidate results

Scenario	Request median duration [CI 0.99]
noprobe	372.326 µs [263.289 µs, 481.364 µs]
basic	300.866 µs [288.873 µs, 312.859 µs]
loop	8.978 ms [8.97 ms, 8.986 ms]

dd-octo-sts · 2026-06-25T10:17:49Z

🟡 Java Benchmark SLOs — Performance SLO warning (near threshold)

Suite	Status
Startup	🟡 warning

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results

Scenario	Candidate	master	Δ (95% CI of mean)
startup:insecure-bank:iast:Agent	13.96 s	14.01 s	[-1.3%; +0.6%] (no difference)
startup:insecure-bank:tracing:Agent	12.95 s	12.98 s	[-1.1%; +0.5%] (no difference)
startup:petclinic:appsec:Agent	16.33 s	16.67 s	[-6.4%; +2.3%] (no difference)
startup:petclinic:iast:Agent	16.82 s	16.48 s	[-2.4%; +6.6%] (no difference)
startup:petclinic:profiling:Agent	16.75 s	16.81 s	[-1.6%; +0.8%] (no difference)
startup:petclinic:sca:Agent	16.76 s	16.66 s	[-0.4%; +1.6%] (no difference)
startup:petclinic:tracing:Agent	15.56 s	16.05 s	[-7.0%; +0.9%] (no difference)

Commit: 17c66699 · CI Pipeline · Benchmarking Platform UI

Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

dd-octo-sts · 2026-06-26T14:28:43Z

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23714d72cd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-26T14:32:14Z

+  public Duration getTimeOut() {
+    return null;


Return the configured timeout from CPU checker

When DD_INTERNAL_DYNAMIC_INSTRUMENTATION_TIMEOUT_CHECKER_MODE=CPU is enabled and an expression actually times out, ExpressionHelper.checkTimeout calls checker.getTimeOut().toMillis() to build the EvaluationException; returning null here turns that path into a NullPointerException instead of the expected timeout error, so CPU mode reports unexpected/null evaluation errors rather than a proper timeout.

Useful? React with 👍 / 👎.

Conditions and expression values (log templates and span decorations) are not capped by a timeout to avoid spending too much time for complex expressions. Refactor TimeoutChecker to be an interface. The former concrete class is now an interface with a static create(Config, Duration) factory that returns either CpuTimeoutChecker or WallTimeoutChecker based on a new config flag (DD_INTERNAL_DYNAMIC_INSTRUMENTATION_TIMEOUT_CHECKER_MODE). Wall clock time can be problematic in case of long GC pauses or sleep or constraint resource (container with less than a CPU). A cpu time driven timeout is more resilient for this. But by default we are using wall clock time for the time being and the cpu time is just an alternative. Refactor Expression evaluation API: All Expression/BooleanExpression implementations now accept a new EvalContext class that bundles ValueReferenceResolver + TimeoutChecker into a single context object. That way we can check the timeout in any expression. NB: for some special expressions (contains on strings or List) we have a per item count limit as we cannot introduce the timeout check for those invocations new config option for the evaluation timeout (DD_DYNAMIC_INSTRUMENTATION_EVALUATION_TIMEOUT), default is 50ms The ValueScript json serialization used for tests was revisited to use a visitor. Tests updated/added across all expression tests, plus new smoke test (LogProbesIntegrationTest) and a configurable TimeoutCheckerTest.

datadog-datadog-prod-us1 · 2026-07-01T21:09:15Z

🎯 Code Coverage (details)
• Patch Coverage: 51.07%
• Overall Coverage: 56.91% (-0.04%)

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 17c6669 | Docs | Datadog PR Page | Give us feedback!}

evanchooly

a few minor nits/questions but lgtm

evanchooly · 2026-07-02T13:58:47Z


-  private final long start;
-  private final Duration timeOut;
+  String CPU = "CPU";


gross. enum? 😄

The config is coming from Config and put an enum in Config is complicated for nothing. so the config is a String.

evanchooly · 2026-07-02T14:17:39Z

    ComparisonExpression expression =
        new ComparisonExpression(new NumericValue(1, INT), ValueExpression.UNDEFINED, EQ);
-    assertFalse(expression.evaluate(NoopResolver.INSTANCE));
+    assertFalse(expression.evaluate(new EvalContext(NoopResolver.INSTANCE, null)));


why the inconsistent creation of EvalContexts? some are factory calls, others ctors.

I have not put a helper method for all creation of EvalContext depending on the arguments

evanchooly · 2026-07-02T14:20:15Z

+  private List<Integer> largeList = new ArrayList<>();
+
+  {
+    for (int i = 0; i < 1_000_001; i++) {


I know this is "just a test class" but a proper, named constructor would be cleaner here.

initialization code is closer to the field
for static field it would be the same.
the named constructor does not bring any value here.

evanchooly · 2026-07-02T14:29:46Z


+  @Test
+  void testLargeList() {
+    List<Integer> largeList = new ArrayList<>();


because code golfing is fun, you could use Collections.nCopies(1_000_001, 0) here.

But that aside, I'd consider using the max constant above to make sure the test stays in sync with the max value.

this could be interesting for optimization, but unfortunately What I am testing here is the fact that iterating over the list is taking too long and hit the timeout. With the nCopies it will be harder :D

evanchooly · 2026-07-02T14:35:44Z

            // from line: System.out.println("fullMethod");
            // to line: + String.join(",", argVar);
-            .where(MAIN_CLASS_NAME, 88, 97)
+            .where(MAIN_CLASS_NAME, 95, 104)


didn't we add a way to annotate/comment the line in the source and have the test find it to avoid having to maintain line numbers like this?

I have something for unit tests, but for integrations tests it's harder to inject. Not worth the pain as I am modifying rarely this

dd-octo-sts · 2026-07-02T15:23:27Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-07-02T15:23:31Z

View all feedbacks in Devflow UI.

2026-07-02 15:23:30 UTC ℹ️ Start processing command /merge

2026-07-02 15:23:36 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 2h (p90).

2026-07-02 17:24:13 UTC ❌ MergeQueue: The build pipeline has timeout

The merge request has been interrupted because the build 1624692089015068691 took longer than expected. The current limit for the base branch 'master' is 120 minutes.

jpbempel force-pushed the jpbempel/cond-eval-timeout branch 4 times, most recently from 09e48fe to 23714d7 Compare June 26, 2026 08:19

jpbempel marked this pull request as ready for review June 26, 2026 14:28

jpbempel requested review from a team as code owners June 26, 2026 14:28

jpbempel requested review from PerfectSlayer and evanchooly and removed request for a team June 26, 2026 14:28

chatgpt-codex-connector Bot reviewed Jun 26, 2026

View reviewed changes

jpbempel added type: enhancement Enhancements and improvements comp: debugger Dynamic Instrumentation labels Jun 26, 2026

jpbempel added 2 commits July 1, 2026 22:50

fix CpuTimeoutChecker

b500d56

jpbempel force-pushed the jpbempel/cond-eval-timeout branch from 0f02315 to 09404b0 Compare July 1, 2026 20:50

fix config options

255d479

jpbempel force-pushed the jpbempel/cond-eval-timeout branch from 09404b0 to 255d479 Compare July 2, 2026 08:51

jpbempel added 2 commits July 2, 2026 11:12

fix supported config

b22a840

update supported config

17c6669

evanchooly approved these changes Jul 2, 2026

View reviewed changes

jpbempel enabled auto-merge July 2, 2026 15:20

jpbempel added this pull request to the merge queue Jul 2, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 2, 2026

Uh oh!

Conversation

jpbempel commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What Does This Do

Motivation

Additional Notes

Contributor Checklist

Uh oh!

pr-commenter Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Debugger benchmarks

Parameters

Summary

Uh oh!

dd-octo-sts Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🟡 Java Benchmark SLOs — Performance SLO warning (near threshold)

Uh oh!

dd-octo-sts Bot commented Jun 26, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

datadog-datadog-prod-us1 Bot commented Jul 1, 2026 • edited by datadog-prod-us1-3 Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

evanchooly left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jpbempel Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dd-octo-sts Bot commented Jul 2, 2026

Uh oh!

gh-worker-devflow-routing-ef8351 Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jpbempel commented Jun 25, 2026 •

edited

Loading

pr-commenter Bot commented Jun 25, 2026 •

edited

Loading

dd-octo-sts Bot commented Jun 25, 2026 •

edited

Loading

datadog-datadog-prod-us1 Bot commented Jul 1, 2026 •

edited by datadog-prod-us1-3 Bot

Loading

jpbempel Jul 2, 2026 •

edited

Loading

gh-worker-devflow-routing-ef8351 Bot commented Jul 2, 2026 •

edited

Loading