Add evalution timeout for expressions#11741
Conversation
Debugger benchmarksParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics. See unchanged results
Request duration reports for reportsgantt
title reports - request duration [CI 0.99] : candidate=None, baseline=None
dateFormat X
axisFormat %s
section baseline
noprobe (355.799 µs) : 300, 412
. : milestone, 356,
basic (297.133 µs) : 290, 304
. : milestone, 297,
loop (8.982 ms) : 8977, 8987
. : milestone, 8982,
section candidate
noprobe (372.326 µs) : 263, 481
. : milestone, 372,
basic (300.866 µs) : 289, 313
. : milestone, 301,
loop (8.978 ms) : 8970, 8986
. : milestone, 8978,
|
🟡 Java Benchmark SLOs — Performance SLO warning (near threshold)
PR vs. master results
Commit: Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion. |
09e48fe to
23714d7
Compare
|
Hi! 👋 Thanks for your pull request! 🎉 To help us review it, please make sure to:
If you need help, please check our contributing guidelines. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 23714d72cd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| public Duration getTimeOut() { | ||
| return null; |
There was a problem hiding this comment.
Return the configured timeout from CPU checker
When DD_INTERNAL_DYNAMIC_INSTRUMENTATION_TIMEOUT_CHECKER_MODE=CPU is enabled and an expression actually times out, ExpressionHelper.checkTimeout calls checker.getTimeOut().toMillis() to build the EvaluationException; returning null here turns that path into a NullPointerException instead of the expected timeout error, so CPU mode reports unexpected/null evaluation errors rather than a proper timeout.
Useful? React with 👍 / 👎.
Conditions and expression values (log templates and span decorations) are not capped by a timeout to avoid spending too much time for complex expressions. Refactor TimeoutChecker to be an interface. The former concrete class is now an interface with a static create(Config, Duration) factory that returns either CpuTimeoutChecker or WallTimeoutChecker based on a new config flag (DD_INTERNAL_DYNAMIC_INSTRUMENTATION_TIMEOUT_CHECKER_MODE). Wall clock time can be problematic in case of long GC pauses or sleep or constraint resource (container with less than a CPU). A cpu time driven timeout is more resilient for this. But by default we are using wall clock time for the time being and the cpu time is just an alternative. Refactor Expression evaluation API: All Expression/BooleanExpression implementations now accept a new EvalContext class that bundles ValueReferenceResolver + TimeoutChecker into a single context object. That way we can check the timeout in any expression. NB: for some special expressions (contains on strings or List) we have a per item count limit as we cannot introduce the timeout check for those invocations new config option for the evaluation timeout (DD_DYNAMIC_INSTRUMENTATION_EVALUATION_TIMEOUT), default is 50ms The ValueScript json serialization used for tests was revisited to use a visitor. Tests updated/added across all expression tests, plus new smoke test (LogProbesIntegrationTest) and a configurable TimeoutCheckerTest.
0f02315 to
09404b0
Compare
|
🎯 Code Coverage (details) 🔗 Commit SHA: 17c6669 | Docs | Datadog PR Page | Give us feedback! |
09404b0 to
255d479
Compare
evanchooly
left a comment
There was a problem hiding this comment.
a few minor nits/questions but lgtm
|
|
||
| private final long start; | ||
| private final Duration timeOut; | ||
| String CPU = "CPU"; |
There was a problem hiding this comment.
The config is coming from Config and put an enum in Config is complicated for nothing. so the config is a String.
| ComparisonExpression expression = | ||
| new ComparisonExpression(new NumericValue(1, INT), ValueExpression.UNDEFINED, EQ); | ||
| assertFalse(expression.evaluate(NoopResolver.INSTANCE)); | ||
| assertFalse(expression.evaluate(new EvalContext(NoopResolver.INSTANCE, null))); |
There was a problem hiding this comment.
why the inconsistent creation of EvalContexts? some are factory calls, others ctors.
There was a problem hiding this comment.
I have not put a helper method for all creation of EvalContext depending on the arguments
| private List<Integer> largeList = new ArrayList<>(); | ||
|
|
||
| { | ||
| for (int i = 0; i < 1_000_001; i++) { |
There was a problem hiding this comment.
I know this is "just a test class" but a proper, named constructor would be cleaner here.
There was a problem hiding this comment.
initialization code is closer to the field
for static field it would be the same.
the named constructor does not bring any value here.
|
|
||
| @Test | ||
| void testLargeList() { | ||
| List<Integer> largeList = new ArrayList<>(); |
There was a problem hiding this comment.
because code golfing is fun, you could use Collections.nCopies(1_000_001, 0) here.
But that aside, I'd consider using the max constant above to make sure the test stays in sync with the max value.
There was a problem hiding this comment.
this could be interesting for optimization, but unfortunately What I am testing here is the fact that iterating over the list is taking too long and hit the timeout. With the nCopies it will be harder :D
| // from line: System.out.println("fullMethod"); | ||
| // to line: + String.join(",", argVar); | ||
| .where(MAIN_CLASS_NAME, 88, 97) | ||
| .where(MAIN_CLASS_NAME, 95, 104) |
There was a problem hiding this comment.
didn't we add a way to annotate/comment the line in the source and have the test find it to avoid having to maintain line numbers like this?
There was a problem hiding this comment.
I have something for unit tests, but for integrations tests it's harder to inject. Not worth the pain as I am modifying rarely this
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
The merge request has been interrupted because the build 1624692089015068691 took longer than expected. The current limit for the base branch 'master' is 120 minutes. |
What Does This Do
Refactor TimeoutChecker to be an interface. The former concrete class
is now an interface with a static create(Config, Duration) factory
that returns either CpuTimeoutChecker or WallTimeoutChecker based on a new config flag
(
DD_INTERNAL_DYNAMIC_INSTRUMENTATION_TIMEOUT_CHECKER_MODE). Wall clock time can be problematic in case of long GC pauses or sleep or constraint resource (container with less than a CPU). A cpu time driven timeout is more resilient for this. But by default we are using wall clock time for the time being and the cpu time is just an alternative.Refactor Expression evaluation API: All Expression/BooleanExpression implementations now accept a new EvalContext class that bundles ValueReferenceResolver + TimeoutChecker into a single context object. That way we can check the timeout in any expression.
NB: for some special expressions (contains on strings or List) we have a per item count limit as we cannot introduce the timeout check for those invocations
new config option for the evaluation timeout
(
DD_DYNAMIC_INSTRUMENTATION_EVALUATION_TIMEOUT), default is 50msThe ValueScript json serialization used for tests was revisited to use a visitor.
Tests updated/added across all expression tests,
plus new smoke test (LogProbesIntegrationTest) and a configurable TimeoutCheckerTest.
Motivation
Conditions and expression values (log templates and span decorations) are not capped by a timeout to avoid spending too much time for complex expressions.
Additional Notes
Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issue/merge. You can also:/merge --commit-message "..."/merge -c/merge -f --reason "reason"; please use this judiciously, as some checks do not run at the PR-level (note: the PR still needs to be mergeable, this will only skip the pre-merge build)Jira ticket: [PROJ-IDENT]