

feat: adding spammoor test to benchmark#3105

Open
chatton wants to merge 8 commits into main from cian/experiment-with-granularspans

Conversation

@chatton
Contributor

@chatton chatton commented Feb 24, 2026

Overview

Uploads the spamoor test results to https://ev.xyz/dev/bench/ at per-span granularity.

Summary by CodeRabbit

  • New Features

    • Block Executor and Spamoor trace benchmarks are now collected and published automatically.
    • Tests can optionally emit stable JSON benchmark outputs for external analysis.
  • Chores

    • CI workflow updated to run multi-job benchmark aggregation with artifact uploads and conditional publishing.
    • Improved gh-pages reset and unconditional result-storage steps to reduce race conditions.
  • Tests

    • Added per-operation timing aggregation and optional benchmark output in e2e smoke tests.

@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Restructures benchmark CI into parallel jobs (evm-benchmark, spamoor-benchmark, block-executor) with artifact uploads and a consolidated publish-benchmarks job that aggregates and conditionally publishes results; adds per-operation span aggregation and JSON benchmark export from E2E tests (BENCH_JSON_OUTPUT).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Benchmark Workflow Orchestration (.github/workflows/benchmark.yml) | Adds a pull_request trigger, splits benchmarking into multiple jobs (evm-benchmark, spamoor-benchmark, block-executor), uploads artifacts per job, adds a publish-benchmarks job that downloads artifacts, resets local gh-pages before publishing, and conditionally auto-pushes/saves benchmark data depending on event type. |
| E2E Smoke Test, JSON Export (test/e2e/evm_spamoor_smoke_test.go) | Adds an optional benchmark JSON output path controlled by BENCH_JSON_OUTPUT; when set, writes ev-node span aggregates to the specified JSON file after trace collection. |
| Span Aggregation & Reporting (test/e2e/evm_test_common.go) | Introduces spanStats and aggregateSpanStats for per-operation count/total/min/max aggregation, refactors printTraceReport to use the aggregator, and adds writeTraceBenchmarkJSON to emit a stable customSmallerIsBetter JSON benchmark file. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant EVM as evm-benchmark job
    participant SPAM as spamoor-benchmark job
    participant BE as block-executor job
    participant ART as Artifact Storage
    participant PUB as publish-benchmarks job
    GH->>EVM: start evm-benchmark
    GH->>SPAM: start spamoor-benchmark
    GH->>BE: start block-executor
    EVM->>ART: upload evm-benchmark artifact
    SPAM->>ART: upload spamoor-bench artifact
    BE->>ART: upload block-executor artifact
    ART->>PUB: artifacts available
    PUB->>ART: download artifacts
    PUB->>PUB: reset local gh-pages, process and store benchmarks
    PUB->>GH: conditional push/save to gh-pages
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • alpe

Poem

🐰
I hopped through CI with a grin so wide,
Split the jobs and sent their logs to hide,
Spans I counted, JSON neat and small,
Artifacts gathered in a cozy stall,
Now benchmarks sing — rejoice, let's stride! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is incomplete; it only states the goal without providing context, background, rationale, or explaining the changes in the workflow architecture and new jobs. | Expand the description to explain the workflow changes, the new jobs (spamoor-benchmark, publish-benchmarks), artifact handling, and the motivation for this restructuring. |
| Title check | ❓ Inconclusive | The title is vague and partially related; it mentions 'spammoor' and 'benchmark' but lacks clarity about the main change: multi-job benchmark aggregation, artifact handling, and results publishing. | Clarify the title to reflect the primary change, such as 'ci: restructure benchmark workflow with multi-job aggregation and publishing' or similar. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cian/experiment-with-granularspans


@github-actions
Contributor

github-actions bot commented Feb 24, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Feb 24, 2026, 4:25 PM |

@claude
Contributor

claude bot commented Feb 24, 2026

Claude finished @chatton's task in 2m 1s.


Code Review

  • Gather context and read changed files
  • Review .github/workflows/benchmark.yml
  • Review test/e2e/evm_test_common.go
  • Review test/e2e/evm_spamoor_smoke_test.go
  • Post final review

Overall this is a clean, well-scoped PR. The fan-out/fan-in pattern for benchmarks is the right approach for avoiding gh-pages race conditions. A few issues to consider:

.github/workflows/benchmark.yml

if: always() on publish steps may produce confusing failures

All store steps in publish-benchmarks use if: always(). If the spamoor-benchmark job fails (e.g. test flake, Docker unavailable), the artifact won't be uploaded. The Download Spamoor benchmark results step will succeed vacuously, but Store Spamoor Trace result will then fail because test/e2e/spamoor_bench.json doesn't exist — yielding a cryptic action error rather than a clear "artifact not found" message.

Consider conditional execution for the spamoor store step:

```yaml
- name: Store Spamoor Trace result
  if: needs.spamoor-benchmark.result == 'success'
  uses: benchmark-action/github-action-benchmark@...
```

And update the needs condition at the job level to allow partial success:

```yaml
publish-benchmarks:
  needs: [evm-benchmark, spamoor-benchmark]
  if: always()
  runs-on: ubuntu-latest
```

needs doesn't include if: always() on the publish job itself

As written, if spamoor-benchmark fails, publish-benchmarks won't run at all (the default needs behavior is to skip on upstream failure). If the intent is to always publish whatever results are available, the job also needs if: always().

test/e2e/evm_test_common.go

aggregateSpanStats refactor is clean — good separation of concerns. No issues with the aggregation logic; the first-span initialisation (min: d, max: d) and subsequent comparisons are correct.

writeTraceBenchmarkJSON — only avg is emitted

The current implementation only writes average values per span operation. The benchmarkEntry struct supports a single Value float64, so this is a format constraint, not a bug. The last commit message (ci: only emit avg in benchmark JSON to reduce alert noise) confirms this is intentional.

One minor nit: t.Fatalf for the os.WriteFile error is appropriate in test infrastructure but will abort the entire test run if disk I/O fails. A t.Errorf + early return would be less disruptive if you want benchmark output failures to be non-fatal. Low priority given this is test code.

File permissions on benchmark JSON output

```go
os.WriteFile(outputPath, data, 0644)
```

0644 is fine for a benchmark output file — readable by owner/group/other, writable only by owner.

test/e2e/evm_spamoor_smoke_test.go

Redundant nil-guard at call site

```go
// line 169-171
if outputPath := os.Getenv("BENCH_JSON_OUTPUT"); outputPath != "" {
    writeTraceBenchmarkJSON(t, "SpamoorSmoke", toTraceSpans(evNodeSpans), outputPath)
}
```

writeTraceBenchmarkJSON already returns early if outputPath == "" (line 948-950 of evm_test_common.go). The guard here is harmless but redundant. Either pattern is fine — the explicit env-var check at the call site arguably makes the intent clearer to readers.

Only ev-node spans are exported

evRethSpans are not written to the benchmark JSON. Given the PR focus is ev-node performance, this seems intentional — worth a brief comment to make it explicit.

Summary

| Area | Severity | Finding |
| --- | --- | --- |
| benchmark.yml | Medium | publish-benchmarks will silently fail when the spamoor-benchmark artifact is missing; add if: always() to the job plus a conditional step |
| evm_test_common.go | Low | t.Fatalf for benchmark write errors is test-aborting; consider t.Errorf |
| evm_spamoor_smoke_test.go | Nit | Redundant env-var guard before calling writeTraceBenchmarkJSON |
| evm_spamoor_smoke_test.go | Nit | No comment explaining why only ev-node spans are exported |

The core logic and architecture are sound. The main actionable item is the always() / conditional publishing concern in the workflow.

@codecov

codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.90%. Comparing base (67e18bd) to head (c1da279).

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #3105      +/-   ##
==========================================
- Coverage   60.91%   60.90%   -0.02%
==========================================
  Files         113      113
  Lines       11617    11617
==========================================
- Hits         7077     7075       -2
- Misses       3742     3743       +1
- Partials      798      799       +1
```
| Flag | Coverage Δ |
| --- | --- |
| combined | 60.90% <ø> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

```yaml
go test -bench=BenchmarkProduceBlock -benchmem -run='^$' \
  ./block/internal/executing/... > block_executor_output.txt
- name: Store Block Executor benchmark result
- name: Store Block Executor result
```
Contributor Author


Because there are now multiple jobs pushing benchmarks, to avoid race conditions we gather up all the results and then push them sequentially, one by one.
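That gather-then-push flow might look roughly like the sketch below. This is not the PR's actual workflow; the job names, artifact paths, and action versions are assumptions, and only one of the sequential store steps is shown:

```yaml
publish-benchmarks:
  needs: [evm-benchmark, spamoor-benchmark, block-executor]
  if: always()  # publish whatever results are available
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/download-artifact@v4  # gather all benchmark artifacts
    # One store step per artifact, run sequentially: a single job pushing
    # to gh-pages avoids the race between concurrent benchmark jobs.
    - name: Store Spamoor Trace result
      if: needs.spamoor-benchmark.result == 'success'
      uses: benchmark-action/github-action-benchmark@v1
      with:
        tool: customSmallerIsBetter
        output-file-path: spamoor-bench/spamoor_bench.json
        auto-push: ${{ github.event_name != 'pull_request' }}
```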

@chatton chatton changed the title from "[WIP] feat: adding spammoor test to benchmark" to "feat: adding spammoor test to benchmark" on Feb 24, 2026
Contributor

@github-actions github-actions bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Spamoor Trace Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

| Benchmark suite | Current: f4bf8e6 | Previous: f4a09a2 | Ratio |
| --- | --- | --- | --- |
| SpamoorSmoke - ForcedInclusionRetriever.RetrieveForcedIncludedTxs (max) | 44 us | 20 us | 2.20 |
| SpamoorSmoke - Sequencer.SubmitBatchTxs (max) | 208 us | 132 us | 1.58 |
| SpamoorSmoke - Store.NewBatch (max) | 106 us | 70 us | 1.51 |

This comment was automatically generated by workflow using github-action-benchmark.
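The alert logic behind that table is a simple ratio comparison against the configured threshold (1.50, i.e. alert-threshold: '150%'). A tiny conceptual sketch, not github-action-benchmark's actual code:

```go
package main

import "fmt"

// regression reports the current/previous ratio and whether it exceeds
// the alert threshold, conceptually mirroring how the rows above were
// flagged. For customSmallerIsBetter data, a larger ratio is worse.
func regression(current, previous, threshold float64) (float64, bool) {
	ratio := current / previous
	return ratio, ratio > threshold
}

func main() {
	// ForcedInclusionRetriever (max): 44 us now vs 20 us before.
	ratio, alert := regression(44, 20, 1.50)
	fmt.Printf("%.2f %v\n", ratio, alert) // 2.20 true
}
```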

@chatton chatton marked this pull request as ready for review February 24, 2026 15:56
```yaml
save-data-file: ${{ github.event_name != 'pull_request' }}
github-token: ${{ secrets.GITHUB_TOKEN }}
alert-threshold: '150%'
fail-on-alert: false
```
Contributor Author


For now this fails on nothing; I'm just working on getting results pushed and visible.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/evm_test_common.go`:
- Around line 937-981: The avg computation in writeTraceBenchmarkJSON uses
s.total.Microseconds() which truncates sub-microsecond precision; change it to a
float-based conversion (e.g. use s.total.Seconds()*1e6 or
float64(s.total.Nanoseconds())/1e3) and divide by float64(s.count) so avg keeps
fractional microsecond values; update the avg assignment that references s.total
and s.count accordingly.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4bf8e6 and c1da279.

📒 Files selected for processing (1)
  • test/e2e/evm_test_common.go

Comment on lines +937 to +981
```go
// benchmarkEntry matches the customSmallerIsBetter format for github-action-benchmark.
type benchmarkEntry struct {
	Name  string  `json:"name"`
	Unit  string  `json:"unit"`
	Value float64 `json:"value"`
}

// writeTraceBenchmarkJSON aggregates spans and writes a customSmallerIsBetter JSON file.
// If outputPath is empty, the function is a no-op.
func writeTraceBenchmarkJSON(t testing.TB, label string, spans []traceSpan, outputPath string) {
	t.Helper()
	if outputPath == "" {
		return
	}
	m := aggregateSpanStats(spans)
	if len(m) == 0 {
		t.Logf("WARNING: no span stats to write for %s", label)
		return
	}

	// sort by name for stable output
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)

	var entries []benchmarkEntry
	for _, name := range names {
		s := m[name]
		avg := float64(s.total.Microseconds()) / float64(s.count)
		entries = append(entries,
			benchmarkEntry{Name: fmt.Sprintf("%s - %s (avg)", label, name), Unit: "us", Value: avg},
		)
	}

	data, err := json.MarshalIndent(entries, "", " ")
	if err != nil {
		t.Fatalf("failed to marshal benchmark JSON: %v", err)
	}
	if err := os.WriteFile(outputPath, data, 0644); err != nil {
		t.Fatalf("failed to write benchmark JSON to %s: %v", outputPath, err)
	}
	t.Logf("wrote %d benchmark entries to %s", len(entries), outputPath)
}
```
Contributor


⚠️ Potential issue | 🟡 Minor

Avoid microsecond truncation to preserve small-span accuracy.

Duration.Microseconds() truncates sub-micro values, which can collapse fast spans to 0. Use a float conversion to keep fractional microseconds.

🔧 Suggested fix

```diff
-		avg := float64(s.total.Microseconds()) / float64(s.count)
+		avg := (float64(s.total) / float64(time.Microsecond)) / float64(s.count)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```go
// benchmarkEntry matches the customSmallerIsBetter format for github-action-benchmark.
type benchmarkEntry struct {
	Name  string  `json:"name"`
	Unit  string  `json:"unit"`
	Value float64 `json:"value"`
}

// writeTraceBenchmarkJSON aggregates spans and writes a customSmallerIsBetter JSON file.
// If outputPath is empty, the function is a no-op.
func writeTraceBenchmarkJSON(t testing.TB, label string, spans []traceSpan, outputPath string) {
	t.Helper()
	if outputPath == "" {
		return
	}
	m := aggregateSpanStats(spans)
	if len(m) == 0 {
		t.Logf("WARNING: no span stats to write for %s", label)
		return
	}

	// sort by name for stable output
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)

	var entries []benchmarkEntry
	for _, name := range names {
		s := m[name]
		avg := (float64(s.total) / float64(time.Microsecond)) / float64(s.count)
		entries = append(entries,
			benchmarkEntry{Name: fmt.Sprintf("%s - %s (avg)", label, name), Unit: "us", Value: avg},
		)
	}

	data, err := json.MarshalIndent(entries, "", " ")
	if err != nil {
		t.Fatalf("failed to marshal benchmark JSON: %v", err)
	}
	if err := os.WriteFile(outputPath, data, 0644); err != nil {
		t.Fatalf("failed to write benchmark JSON to %s: %v", outputPath, err)
	}
	t.Logf("wrote %d benchmark entries to %s", len(entries), outputPath)
}
```
