Conversation
📝 Walkthrough

Restructures benchmark CI into parallel jobs (evm-benchmark, spamoor-benchmark, block-executor) with artifact uploads and a consolidated publish-benchmarks job that aggregates and conditionally publishes results; adds per-operation span aggregation and JSON benchmark export from E2E tests (BENCH_JSON_OUTPUT).

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @chatton's task in 2m 1s. Code Review:
Overall this is a clean, well-scoped PR. The fan-out/fan-in pattern for benchmarks is the right approach for avoiding gh-pages race conditions. A few issues to consider:
| Area | Severity | Finding |
|---|---|---|
| benchmark.yml | Medium | publish-benchmarks will silently fail when spamoor-benchmark artifact is missing; add `if: always()` to job + conditional step |
| evm_test_common.go | Low | t.Fatalf for benchmark write errors is test-aborting; consider t.Errorf |
| evm_spamoor_smoke_test.go | Nit | Redundant env-var guard before calling writeTraceBenchmarkJSON |
| evm_spamoor_smoke_test.go | Nit | No comment explaining why only ev-node spans are exported |
The core logic and architecture are sound. The main actionable item is the always() / conditional publishing concern in the workflow.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@ Coverage Diff @@
##             main    #3105      +/-   ##
==========================================
- Coverage   60.91%   60.90%   -0.02%
==========================================
  Files         113      113
  Lines       11617    11617
==========================================
- Hits         7077     7075       -2
- Misses       3742     3743       +1
- Partials      798      799       +1
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
```diff
          go test -bench=BenchmarkProduceBlock -benchmem -run='^$' \
            ./block/internal/executing/... > block_executor_output.txt
-     - name: Store Block Executor benchmark result
+     - name: Store Block Executor result
```
Because there are now multiple jobs pushing benchmarks, to avoid race conditions we gather up all the results, then push them sequentially, one by one.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Spamoor Trace Benchmarks'.
The benchmark result of this commit is worse than the previous result, exceeding the 1.50 threshold.
| Benchmark suite | Current: f4bf8e6 | Previous: f4a09a2 | Ratio |
|---|---|---|---|
| SpamoorSmoke - ForcedInclusionRetriever.RetrieveForcedIncludedTxs (max) | 44 us | 20 us | 2.20 |
| SpamoorSmoke - Sequencer.SubmitBatchTxs (max) | 208 us | 132 us | 1.58 |
| SpamoorSmoke - Store.NewBatch (max) | 106 us | 70 us | 1.51 |
This comment was automatically generated by workflow using github-action-benchmark.
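The alert above boils down to a current/previous ratio check against the configured threshold (1.50, i.e. the workflow's `alert-threshold: '150%'`). A minimal illustrative sketch of that logic, not the actual github-action-benchmark implementation (type and function names here are invented):

```go
package main

import "fmt"

// result holds one smaller-is-better measurement in microseconds.
type result struct {
	name              string
	current, previous float64
}

// regressions returns entries whose current/previous ratio exceeds
// the alert threshold (1.50 corresponds to "150%").
func regressions(results []result, threshold float64) []result {
	var out []result
	for _, r := range results {
		if r.previous > 0 && r.current/r.previous > threshold {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	results := []result{
		{"ForcedInclusionRetriever.RetrieveForcedIncludedTxs (max)", 44, 20},
		{"Sequencer.SubmitBatchTxs (max)", 208, 132},
		{"Store.NewBatch (max)", 106, 70},
	}
	for _, r := range regressions(results, 1.50) {
		fmt.Printf("%s: ratio %.2f\n", r.name, r.current/r.previous)
	}
}
```

Running this over the table's values flags all three suites, matching the ratios reported above.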
```yaml
          save-data-file: ${{ github.event_name != 'pull_request' }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          alert-threshold: '150%'
          fail-on-alert: false
```
For now we're not failing on anything; we're just working on getting results pushed and visible.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/evm_test_common.go`:
- Around line 937-981: The avg computation in writeTraceBenchmarkJSON uses
s.total.Microseconds() which truncates sub-microsecond precision; change it to a
float-based conversion (e.g. use s.total.Seconds()*1e6 or
float64(s.total.Nanoseconds())/1e3) and divide by float64(s.count) so avg keeps
fractional microsecond values; update the avg assignment that references s.total
and s.count accordingly.
```go
// benchmarkEntry matches the customSmallerIsBetter format for github-action-benchmark.
type benchmarkEntry struct {
	Name  string  `json:"name"`
	Unit  string  `json:"unit"`
	Value float64 `json:"value"`
}

// writeTraceBenchmarkJSON aggregates spans and writes a customSmallerIsBetter JSON file.
// If outputPath is empty, the function is a no-op.
func writeTraceBenchmarkJSON(t testing.TB, label string, spans []traceSpan, outputPath string) {
	t.Helper()
	if outputPath == "" {
		return
	}
	m := aggregateSpanStats(spans)
	if len(m) == 0 {
		t.Logf("WARNING: no span stats to write for %s", label)
		return
	}

	// sort by name for stable output
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)

	var entries []benchmarkEntry
	for _, name := range names {
		s := m[name]
		avg := float64(s.total.Microseconds()) / float64(s.count)
		entries = append(entries,
			benchmarkEntry{Name: fmt.Sprintf("%s - %s (avg)", label, name), Unit: "us", Value: avg},
		)
	}

	data, err := json.MarshalIndent(entries, "", "  ")
	if err != nil {
		t.Fatalf("failed to marshal benchmark JSON: %v", err)
	}
	if err := os.WriteFile(outputPath, data, 0644); err != nil {
		t.Fatalf("failed to write benchmark JSON to %s: %v", outputPath, err)
	}
	t.Logf("wrote %d benchmark entries to %s", len(entries), outputPath)
}
```
Avoid microsecond truncation to preserve small-span accuracy.
Duration.Microseconds() truncates sub-micro values, which can collapse fast spans to 0. Use a float conversion to keep fractional microseconds.
🔧 Suggested fix

```diff
- avg := float64(s.total.Microseconds()) / float64(s.count)
+ avg := (float64(s.total) / float64(time.Microsecond)) / float64(s.count)
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// benchmarkEntry matches the customSmallerIsBetter format for github-action-benchmark.
type benchmarkEntry struct {
	Name  string  `json:"name"`
	Unit  string  `json:"unit"`
	Value float64 `json:"value"`
}

// writeTraceBenchmarkJSON aggregates spans and writes a customSmallerIsBetter JSON file.
// If outputPath is empty, the function is a no-op.
func writeTraceBenchmarkJSON(t testing.TB, label string, spans []traceSpan, outputPath string) {
	t.Helper()
	if outputPath == "" {
		return
	}
	m := aggregateSpanStats(spans)
	if len(m) == 0 {
		t.Logf("WARNING: no span stats to write for %s", label)
		return
	}

	// sort by name for stable output
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)

	var entries []benchmarkEntry
	for _, name := range names {
		s := m[name]
		avg := (float64(s.total) / float64(time.Microsecond)) / float64(s.count)
		entries = append(entries,
			benchmarkEntry{Name: fmt.Sprintf("%s - %s (avg)", label, name), Unit: "us", Value: avg},
		)
	}

	data, err := json.MarshalIndent(entries, "", "  ")
	if err != nil {
		t.Fatalf("failed to marshal benchmark JSON: %v", err)
	}
	if err := os.WriteFile(outputPath, data, 0644); err != nil {
		t.Fatalf("failed to write benchmark JSON to %s: %v", outputPath, err)
	}
	t.Logf("wrote %d benchmark entries to %s", len(entries), outputPath)
}
```
Overview
Uploads the spamoor test results to https://ev.xyz/dev/bench/ at a per-span level.
Summary by CodeRabbit
- New Features
- Chores
- Tests