feat: Gc benchmarking #421
Draft: stanbrub wants to merge 22 commits into `deephaven:main` from `stanbrub:gc-benchmarking`
22 commits (all by stanbrub):

- `9d490f5` Disabled some benchmarks and scaled
- `47f066f` Scaled up basic math combo
- `dea74d7` Merge branch 'deephaven:main' into gc-benchmarking
- `15cf1f4` Added a Local Parquet Generator as opposed to going through Kafka
- `8604111` Added local parquet generator and 1st training test
- `83b1c11` Added more train benchmarks. Improved Local Parquet Generator
- `c552c01` Revert BasicMathCombo
- `62aa96a` Revert BasicMathCombo
- `f78ca22` Reverted scale and disabled for pre-train standard tests used for pre…
- `e5412e7` Parallelized local parquet. worked around directory link failures
- `ff4d891` Added 1st pass at benchmark even retrieval with JFR
- `f35ab4f` Merge branch 'deephaven:main' into gc-benchmarking
- `25629cc` Added jfr events
- `254cca0` Merge branch 'deephaven:main' into gc-benchmarking
- `528c365` Added UGP events
- `bd5ff02` Rescaled only static trained for 120 secs
- `75449bb` Updated adhoc for local parquet env variables
- `ec2d95e` Open up dh data dir so local parquet can work
- `a402a54` More logging for benchmark runs
- `4cf8357` Scaling back AggBy because of system lockup
- `8507794` Restrict the number of parquet threads and memory for the runner
- `c0b5e7a` Fixed NaturalJoin OOM
`src/it/java/io/deephaven/benchmark/tests/train/AggByTrainTest.java` (new file, 41 additions, 0 deletions)

```java
/* Copyright (c) 2026-2026 Deephaven Data Labs and Patent Pending */
package io.deephaven.benchmark.tests.train;

import org.junit.jupiter.api.*;

/**
 * Training tests for the aggBy table operations that do aggregations (e.g. sum, std, min/max, var, avg). See
 * <code>TrainTestRunner</code> for more information.
 */
public class AggByTrainTest {
    final TrainTestRunner runner = new TrainTestRunner(this);

    void setup(double rowFactor) {
        // Register the "timed" source table; rowFactor scales its size (see TrainTestRunner)
        runner.tables(rowFactor, "timed");

        // Deephaven Python setup query defining the aggregations shared by both tests
        var setupStr = """
        from deephaven import agg

        aggs = [
            agg.sum_('Sum=num1'), agg.std('Std=num2'), agg.min_('Min=num1'), agg.max_('Max=num2'),
            agg.avg('Avg=num1'), agg.var('Var=num2'), agg.count_('num1')
        ]
        """;
        runner.addSetupQuery(setupStr);
    }

    @Test
    void aggBy0Groups() {
        setup(572);
        var q = "timed.agg_by(aggs)";
        runner.test("AggBy- No Groups", 1, q, "num1", "num2");
    }

    @Test
    void aggBy2Groups() {
        setup(66);
        var q = "timed.agg_by(aggs, by=['key1', 'key2'])";
        runner.test("AggBy- 2 Groups 10K Unique Combos", 10100, q, "key1", "key2", "num1", "num2");
    }

}
```
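To make concrete what `timed.agg_by(aggs, by=['key1', 'key2'])` produces per group, here is a plain-Java reference model of the same seven aggregations. It is a sketch, not Deephaven code: the class and record names are hypothetical, keys are assumed to be `String` and values `double`, `std`/`var` are assumed to use the sample (n - 1) definition, and `agg.count_('num1')` is assumed to name the output column that holds each group's row count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reference model of agg_by(aggs, by=['key1', 'key2']); not Deephaven's implementation.
public class AggByReference {

    // One input row; column names mirror the benchmark's "timed" table (types assumed).
    public record Row(String key1, String key2, double num1, double num2) {}

    // One output row: the key columns plus the seven aggregations from the setup query.
    public record Result(String key1, String key2, double sum, double std, double min, double max,
                         double avg, double var, long count) {}

    public static List<Result> aggBy(List<Row> rows) {
        // Group rows by their (key1, key2) combination, keeping first-seen order.
        Map<List<String>, List<Row>> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(List.of(r.key1(), r.key2()), k -> new ArrayList<>()).add(r);
        }
        List<Result> out = new ArrayList<>();
        for (var e : groups.entrySet()) {
            List<Row> g = e.getValue();
            long n = g.size();
            double sum1 = 0, sum2 = 0;
            double min1 = Double.POSITIVE_INFINITY, max2 = Double.NEGATIVE_INFINITY;
            for (Row r : g) {
                sum1 += r.num1();                 // Sum=num1 and Avg=num1
                min1 = Math.min(min1, r.num1());  // Min=num1
                max2 = Math.max(max2, r.num2());  // Max=num2
                sum2 += r.num2();                 // mean of num2, needed for Var/Std
            }
            double mean2 = sum2 / n;
            double sq = 0;
            for (Row r : g) {
                sq += (r.num2() - mean2) * (r.num2() - mean2);
            }
            // Sample variance (n - 1 denominator) is an assumption; NaN for single-row groups.
            double var2 = n > 1 ? sq / (n - 1) : Double.NaN;
            out.add(new Result(e.getKey().get(0), e.getKey().get(1), sum1, Math.sqrt(var2),
                    min1, max2, sum1 / n, var2, n));
        }
        return out;
    }
}
```

With no `by` columns (the `aggBy0Groups` case) every row falls into a single group, which matches the expected row count of 1 passed to `runner.test`.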
Review comment: Is there a reason we can't use the timestamp from the file? I have a few worries about doing a rowset calculation as part of the benchmark (to come up with `ii`).
For the actual test benchmarks, where there is no `select`, we would also prefer more or bigger parquet files, to avoid the overhead of going through the merge data structures. We might even get away with symlinks so that the data simply repeats itself.
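A minimal sketch of the symlink idea from the comment above, using only standard `java.nio.file` APIs: one parquet file is exposed N times in a directory as `part-<i>.parquet` links, so a directory-based reader would see the data repeated without any merge. The class name, the file naming scheme, and the fall-back-to-copy behavior are hypothetical; the fallback is there because links can fail on some filesystems (one commit in this PR mentions working around directory link failures). Whether Deephaven's parquet reader follows such links is not established here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: repeat a single parquet file `copies` times inside targetDir.
public class ParquetRepeater {

    /**
     * Creates part-0.parquet .. part-(copies-1).parquet in targetDir, each a symlink to source.
     * Falls back to a real copy when the filesystem refuses symlinks.
     *
     * @return how many entries were created as symlinks (the rest are copies)
     */
    public static int repeat(Path source, Path targetDir, int copies) throws IOException {
        Files.createDirectories(targetDir);
        Path absSource = source.toAbsolutePath();  // absolute target so links resolve from any dir
        int linked = 0;
        for (int i = 0; i < copies; i++) {
            Path entry = targetDir.resolve("part-" + i + ".parquet");
            Files.deleteIfExists(entry);  // removes a stale link itself, never its target
            try {
                Files.createSymbolicLink(entry, absSource);
                linked++;
            } catch (UnsupportedOperationException | IOException e) {
                // e.g. no symlink privilege, or a mounted volume that rejects links
                Files.copy(absSource, entry, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return linked;
    }
}
```

Links keep disk usage flat as the repeat count grows; the copy fallback trades that away only where links are unavailable.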
Reply: The "train" benchmarks don't use scale factors, so that section of code is never hit; it is only used when we merge tables to simulate larger data sets. For the nightly runs, that merge happens BEFORE the `select` into memory, which is not part of the measurement. For the "train" benchmarks, we read timestamps directly from the parquet file(s), and only when the benchmark uses them (as in rollingtime).
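A minimal sketch of why a row index such as `ii` is needed at all on the merge path. It assumes, hypothetically, that when a base table is stacked `scaleFactor` times each row gets the timestamp `base + ii * step`, with `ii` its position across all copies, so that time keeps increasing across copy boundaries instead of restarting with each repeated file. The class, method names, and formula are illustrative assumptions, not the PR's code.

```java
// Hypothetical sketch: timestamps derived from the global row index ii across merged copies.
public class RowIndexTimestamps {

    /** Global row position of localRow within copy `copy` when copies are stacked by a merge. */
    public static long globalIndex(int copy, long rowsPerCopy, long localRow) {
        return copy * rowsPerCopy + localRow;
    }

    /** Synthetic timestamp that grows with ii, so it never jumps back at a copy boundary. */
    public static long timestampNanos(long baseNanos, long stepNanos, long ii) {
        return baseNanos + ii * stepNanos;
    }

    /** Every timestamp for scaleFactor stacked copies of a rowsPerCopy-row table. */
    public static long[] allTimestamps(int scaleFactor, int rowsPerCopy, long baseNanos, long stepNanos) {
        long[] ts = new long[scaleFactor * rowsPerCopy];
        for (int copy = 0; copy < scaleFactor; copy++) {
            for (int row = 0; row < rowsPerCopy; row++) {
                long ii = globalIndex(copy, rowsPerCopy, row);
                ts[(int) ii] = timestampNanos(baseNanos, stepNanos, ii);
            }
        }
        return ts;
    }
}
```

Under this assumption the per-row `ii` work only exists when a scale factor triggers merges, and per the reply it runs before the `select`, outside the measured section; with no merge, timestamps come straight from the parquet file.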