
feat: add gasburner gas throughput benchmark#3108

Draft
chatton wants to merge 1 commit into cian/bench-refactor from cian/bench-gasburner

Conversation


@chatton chatton commented Feb 25, 2026

Overview

Adds a TestGasBurner benchmark to the SpamoorSuite that measures gas throughput
using a deterministic gasburner workload. Reports seconds_per_gigagas (lower is better)
to the benchmark dashboard.

Builds on #3107 which introduced the benchmark suite infrastructure.

  • gasburner_test.go: spins up gasburner spammer, measures steady-state gas window
  • measureGasThroughput / waitForMetricTarget helpers in metrics.go
  • CI job to run the gasburner benchmark and publish results to gh-pages

@github-actions

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Feb 25, 2026, 12:18 PM |


claude bot commented Feb 25, 2026

Claude finished @chatton's task in 2m 10s


Code Review

  • Read changed files
  • Review gasburner_test.go
  • Review metrics.go
  • Review benchmark.yml workflow changes
  • Check existing patterns for consistency
  • Compile review feedback

Overall this is a clean addition that follows the established patterns from the SpamoorSuite infrastructure. A few issues worth addressing before merging.


test/e2e/benchmark/gasburner_test.go

1. Hardcoded PR tag is fragile

gasburner_test.go:22

`rethTag: "pr-142",`

Both benchmark tests pin to PR-branch Docker tags ("pr-140" in smoke, "pr-142" here). When those PRs are merged/deleted, these tags disappear and the benchmarks silently break. Consider pinning to a semver tag or a commit SHA, or at minimum add a comment pointing to the upstream PR so maintainers know what to update.

2. Missing ev-node span assertion

gasburner_test.go:73-86

evNodeSpans := s.collectServiceTraces(e, "ev-node-gasburner")
evRethSpans := s.collectServiceTraces(e, "ev-reth")
// ...
assertSpanNames(t, evRethSpans, []string{...}, "ev-reth")
// ^ no assertSpanNames for evNodeSpans

evNodeSpans are collected (and collectServiceTraces already asserts at least one trace exists), but unlike TestSpamoorSmoke, no specific span names are asserted for the ev-node service. If ev-node tracing regresses, this test won't catch it, even though the spans feed into the benchmark output. Consider adding an assertSpanNames call mirroring the smoke test's ev-node assertions.
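The shape of such a check can be sketched with a stand-in helper (missingSpans is hypothetical; the suite's real assertSpanNames presumably fails the test directly):

```go
package main

// missingSpans is a minimal stand-in for the suite's assertSpanNames helper:
// it reports which expected span names never appeared in the collected
// traces. In the real test, evNodeSpans and the smoke test's expected
// ev-node span list would go through assertSpanNames instead.
func missingSpans(got, want []string) []string {
	seen := make(map[string]bool, len(got))
	for _, name := range got {
		seen[name] = true
	}
	var missing []string
	for _, name := range want {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	return missing
}
```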

3. Warmup threshold may not cover wallet preparation

gasburner_test.go:48

`const warmupTxs = 50`

The config creates 500 wallets ("max_wallets": 500) and deploys a contract. With 50 transactions as the warmup sentinel, there's a risk the measurement window starts before all wallets are funded and the contract is deployed. The comment explains the intent clearly, but if spamoor batches wallet-prep differently this could introduce noise. Consider bumping the warmup sentinel closer to max_wallets (e.g. 500) or documenting why 50 is sufficient.

4. Map alignment inconsistency

gasburner_test.go:31

`"throughput":         1000,`

There's extra whitespace aligning this field differently from the others. gofmt typically doesn't align map literals; running make lint-fix should resolve this.


test/e2e/benchmark/metrics.go

5. Poll errors silently swallowed in waitForMetricTarget

metrics.go:74-79

v, err := poll()
if err == nil && v >= target {
    t.Logf("metric %s reached %.0f (target %.0f)", name, v, target)
    return
}
time.Sleep(2 * time.Second)

When poll() returns an error the function silently retries. After a timeout, the fatal message only reports that the target wasn't reached, with no indication of whether the cause was errors or just slow progress. A single t.Logf on error would help significantly when diagnosing CI failures.

6. No context cancellation in polling loops

metrics.go:73,90

for time.Now().Before(deadline) {
    // ...
    time.Sleep(2 * time.Second)
}

Both waitForMetricTarget and requireHTTP use a wall-clock deadline loop with time.Sleep, so they don't respect Go's test context cancellation. If the test is cancelled externally (e.g. via t.Context() being done), the goroutine keeps sleeping. A select { case <-ctx.Done(): ...; case <-time.After(interval): } pattern would be cleaner and correctly handles cancellation.

7. Silent fallback hides degenerate result

metrics.go:55-57

if elapsed == 0 {
    elapsed = 1 * time.Second
}

When firstGasBlock == lastGasBlock, elapsed is 0 and falls back to 1 second without any warning. This makes the reported gigagas/sec value meaningless (it would just be the gas of a single block). Consider adding a t.Logf warning and/or a require that the block range spans more than one block.

8. Sequential RPC calls for block scanning

metrics.go:32-45

for i := startBlock + 1; i <= endBlock; i++ {
    header, err := client.HeaderByNumber(ctx, new(big.Int).SetUint64(i))

The block range is scanned sequentially with one RPC call per block. For a benchmark that produces endBlock - startBlock blocks during a 10k-tx run this could mean hundreds of round-trips. Not a correctness issue, but worth noting if scan latency ever becomes a pain point.


.github/workflows/benchmark.yml

9. gasburner-benchmark timeout may be tight

benchmark.yml:80

`timeout-minutes: 30`

The test itself has two waitForMetricTarget calls each with a 5-minute timeout (total ≤10min), plus Docker setup, binary build, and teardown. 30 minutes should be comfortable but keep an eye on it once the job actually runs on a shared runner.

10. fail-on-alert: false for gasburner

benchmark.yml:195

`fail-on-alert: false`

This is consistent with the Spamoor benchmark. Appropriate for a new benchmark without an established baseline, but consider switching to true after a few runs have set a meaningful baseline.

11. Duplicated Reset local gh-pages branch steps

The `git branch -D gh-pages || true` step appears three times in the publish job. This works, but a reusable composite step or a shell function would reduce duplication if more benchmarks are added. Not a blocker.


Summary

| Severity | Finding |
| --- | --- |
| Moderate | Hardcoded unstable `rethTag: "pr-142"` |
| Moderate | No ev-node span assertions in TestGasBurner |
| Moderate | Poll errors silently swallowed in waitForMetricTarget |
| Minor | Warmup threshold (50) vs wallet count (500) mismatch risk |
| Minor | No context cancellation in polling loops |
| Minor | elapsed == 0 fallback needs a warning log |
| Nit | Map alignment inconsistency (gofmt) |

@chatton chatton marked this pull request as draft February 25, 2026 12:28

coderabbitai bot commented Feb 25, 2026

Important

Review skipped: draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

