feat(vortex-bench): wire SpatialBench into the bench orchestrator#8607

Open
HarukiMoriarty wants to merge 3 commits into develop from nemo/spatial-wire-vx-bench

Conversation

@HarukiMoriarty

@HarukiMoriarty HarukiMoriarty commented Jun 26, 2026

Contributor

Summary

Wires SpatialBench into the vx-bench / bench-orchestrator pipeline so it can be run end-to-end like the other benchmarks (datagen → Parquet → Vortex conversion → query). It builds on the WKB datagen landed in #8598.

Command to run the benchmark:

uv run --project bench-orchestrator vx-bench run spatialbench --engine duckdb --format parquet,vortex --opt scale-factor=N --queries 1,2,3,4,5,6,7,8,9 --iterations 3

Limitations

  • DuckDB-only. For now, SpatialBench queries use DuckDB-specific ST_* spatial SQL functions that DataFusion does not yet provide. The restriction is expressed as a single ad-hoc entry: BENCHMARK_ENGINES = { SPATIALBENCH: {DUCKDB} }.

  • No dictionary encoding / compaction on the WKB column. WKB geometry blobs are large and effectively unique, so running the dictionary builder over them balloons memory (tens of GB) for zero compression gain. The normal compaction path is preserved for every other column on every other benchmark.

  • Queries 10, 11, and 12 time out because DuckDB has poor spatial-index support, so they are left out of the run command and the results below.
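
To illustrate the engine restriction in the first bullet, here is a minimal Python sketch of how a per-benchmark allow-list could filter engine/format combinations. Only the `BENCHMARK_ENGINES` mapping shape comes from this PR; the enum members, the `OTHER` placeholder, the function name `targets_from_axes_sketch`, and the meaning of the second return value are assumptions made for illustration and may not match the orchestrator's actual code.

```python
from enum import Enum
from itertools import product


class Engine(Enum):
    DUCKDB = "duckdb"
    DATAFUSION = "datafusion"


class Benchmark(Enum):
    SPATIALBENCH = "spatialbench"
    OTHER = "other"  # hypothetical placeholder for any unrestricted benchmark


# Benchmarks missing from this mapping are assumed to allow every engine.
BENCHMARK_ENGINES = {Benchmark.SPATIALBENCH: {Engine.DUCKDB}}


def targets_from_axes_sketch(engine, format, benchmark=None):
    """Expand comma-separated axes into (engine, format) pairs.

    Returns the runnable pairs plus the pairs skipped because the
    benchmark does not support that engine (an assumed convention).
    """
    engines = [Engine(name.strip()) for name in engine.split(",")]
    formats = [name.strip() for name in format.split(",")]
    allowed = BENCHMARK_ENGINES.get(benchmark)  # None means "no restriction"

    targets, skipped = [], []
    for eng, fmt in product(engines, formats):
        if allowed is not None and eng not in allowed:
            skipped.append(f"{eng.value}:{fmt}")
        else:
            targets.append((eng, fmt))
    return targets, skipped
```

Under these assumptions, requesting `duckdb,datafusion` for SpatialBench would keep only the DuckDB targets and report the DataFusion ones as skipped, while other benchmarks would be unaffected.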

Performance

SF=1.0

┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query ┃ duckdb:parquet (base) ┃   duckdb:vortex ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ 1     │                39.1ms │  14.6ms (0.37x) │
│ 2     │                72.1ms │  24.9ms (0.35x) │
│ 3     │                57.7ms │  20.1ms (0.35x) │
│ 4     │               113.1ms │  70.6ms (0.62x) │
│ 5     │               354.2ms │ 288.4ms (0.81x) │
│ 6     │               169.5ms │  91.6ms (0.54x) │
│ 7     │               156.7ms │  71.5ms (0.46x) │
│ 8     │               196.5ms │  80.3ms (0.41x) │
│ 9     │                20.3ms │  18.7ms (0.92x) │
└───────┴───────────────────────┴─────────────────┘

SF=3

┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query ┃ duckdb:parquet (base) ┃   duckdb:vortex ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ 1     │                50.7ms │  31.5ms (0.62x) │
│ 2     │               126.2ms │  60.3ms (0.48x) │
│ 3     │                71.9ms │  42.9ms (0.60x) │
│ 4     │               539.9ms │  64.9ms (0.12x) │
│ 5     │               948.7ms │ 874.5ms (0.92x) │
│ 6     │               656.2ms │ 121.7ms (0.19x) │
│ 7     │               256.6ms │ 232.1ms (0.90x) │
│ 8     │               273.8ms │ 244.6ms (0.89x) │
│ 9     │                35.7ms │  27.9ms (0.78x) │
└───────┴───────────────────────┴─────────────────┘

SF=10

┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query ┃ duckdb:parquet (base) ┃   duckdb:vortex ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ 1     │               158.6ms │ 114.4ms (0.72x) │
│ 2     │               255.1ms │ 219.3ms (0.86x) │
│ 3     │               229.2ms │ 181.5ms (0.79x) │
│ 4     │               184.5ms │ 134.3ms (0.73x) │
│ 5     │                 3.30s │   3.08s (0.93x) │
│ 6     │               476.4ms │ 348.9ms (0.73x) │
│ 7     │               918.2ms │ 961.2ms (1.05x) │
│ 8     │               980.6ms │ 926.7ms (0.94x) │
│ 9     │                33.7ms │  33.6ms (1.00x) │
└───────┴───────────────────────┴─────────────────┘
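
The parenthesized factor in the duckdb:vortex column appears to be the Vortex time divided by the Parquet baseline, so values below 1.00x mean Vortex was faster. A small Python sketch of that arithmetic follows; the helper name `format_cell` is hypothetical and the formatting is only a guess at how the table cells are rendered.

```python
def format_cell(candidate_ms, base_ms):
    """Render a timing with its ratio to the baseline, e.g. '14.6ms (0.37x)'."""
    ratio = candidate_ms / base_ms  # < 1.0 means faster than the baseline
    return f"{candidate_ms:.1f}ms ({ratio:.2f}x)"


# Query 1 at SF=1.0: 14.6 ms on Vortex against 39.1 ms on Parquet.
print(format_cell(14.6, 39.1))
# Query 7 at SF=10: 961.2 ms on Vortex against 918.2 ms on Parquet.
print(format_cell(961.2, 918.2))
```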

@codspeed-hq

codspeed-hq Bot commented Jun 26, 2026

Merging this PR will improve performance by 11.93%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
✅ 1594 untouched benchmarks
⏩ 4 skipped benchmarks [1]

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation bitwise_not_vortex_buffer_mut[128] 273.6 ns 244.4 ns +11.93%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing nemo/spatial-wire-vx-bench (8fa6ee0) with develop (0a45777)

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@myrrc myrrc requested a review from AdamGS June 29, 2026 13:27
Comment thread vortex-bench/src/spatialbench/datagen/wkb.rs Outdated

@myrrc myrrc left a comment

Contributor

LGTM

Base automatically changed from nemo/spatial-wkb to develop June 30, 2026 20:57
@HarukiMoriarty HarukiMoriarty requested a review from a team June 30, 2026 20:57
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
@HarukiMoriarty HarukiMoriarty force-pushed the nemo/spatial-wire-vx-bench branch from fdb0872 to b362551 Compare June 30, 2026 21:10
HarukiMoriarty and others added 2 commits July 1, 2026 10:09
DuckDB's GEOMETRY only accepts little-endian (NDR) WKB, but the externally
sourced zone table (Overture Maps via spatialbench-cli) is big-endian, so the
vortex lane failed spatial queries with "Only little-endian WKB is supported".
Re-encode geometry columns to little-endian during the parquet->vortex
conversion so the vortex file stores canonical little-endian WKB; columns that
are already little-endian pass through without a copy.

Also drop the best-effort warn around zone generation so data-generation
failures propagate.

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
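
The re-encoding described in the commit message above can be sketched in Python as follows. This is an illustration only, not the PR's Rust implementation: it handles 2D geometries with the standard WKB type codes 1 to 7, the name `to_little_endian_wkb` is hypothetical, and the pass-through fast path assumes nested parts share the top-level byte order.

```python
import struct

POINT, LINESTRING, POLYGON = 1, 2, 3
# MultiPoint, MultiLineString, MultiPolygon, GeometryCollection
COLLECTION_TYPES = {4, 5, 6, 7}


def _copy_u32(buf, pos, src, out):
    """Read a uint32 in source order, append it little-endian, advance."""
    (value,) = struct.unpack_from(src + "I", buf, pos)
    out.extend(struct.pack("<I", value))
    return value, pos + 4


def _copy_points(buf, pos, src, count, out):
    """Copy `count` 2D points (two float64 each) as little-endian."""
    coords = struct.unpack_from(f"{src}{2 * count}d", buf, pos)
    out.extend(struct.pack(f"<{2 * count}d", *coords))
    return pos + 16 * count


def _reencode(buf, pos, out):
    """Re-encode one geometry at buf[pos]; return the offset just past it."""
    order = buf[pos]
    if order not in (0, 1):
        raise ValueError(f"invalid WKB byte-order flag: {order}")
    src = "<" if order == 1 else ">"  # 1 = NDR (little), 0 = XDR (big)
    out.append(1)  # output is always little-endian
    gtype, pos = _copy_u32(buf, pos + 1, src, out)
    if gtype == POINT:
        return _copy_points(buf, pos, src, 1, out)
    if gtype == LINESTRING:
        count, pos = _copy_u32(buf, pos, src, out)
        return _copy_points(buf, pos, src, count, out)
    if gtype == POLYGON:
        rings, pos = _copy_u32(buf, pos, src, out)
        for _ in range(rings):
            count, pos = _copy_u32(buf, pos, src, out)
            pos = _copy_points(buf, pos, src, count, out)
        return pos
    if gtype in COLLECTION_TYPES:
        parts, pos = _copy_u32(buf, pos, src, out)
        for _ in range(parts):  # each part carries its own byte-order flag
            pos = _reencode(buf, pos, out)
        return pos
    raise ValueError(f"unsupported WKB geometry type: {gtype}")


def to_little_endian_wkb(wkb):
    """Return `wkb` as little-endian WKB, unchanged if it already is."""
    if wkb[:1] == b"\x01":
        return wkb  # pass through without copying
    out = bytearray()
    end = _reencode(wkb, 0, out)
    if end != len(wkb):
        raise ValueError("trailing bytes after WKB geometry")
    return bytes(out)
```

A big-endian point such as `struct.pack(">BIdd", 0, 1, 1.5, -2.0)` would come back with a leading `0x01` flag and byte-swapped fields, while input that is already little-endian is returned as the same object.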
@HarukiMoriarty HarukiMoriarty enabled auto-merge (squash) July 1, 2026 14:12

def targets_from_axes(engine: str, format: str) -> tuple[list[BenchmarkTarget], list[str]]:
def targets_from_axes(
engine: str, format: str, benchmark: Benchmark | None = None

Contributor

I don't know much about this, but why is this needed?

///
/// For SpatialBench (`skip_binary_dict`), the geometry blobs are large and
/// unique, so the dictionary builder balloons memory (tens of GB) for zero gain.
fn write_options_for(

Contributor

This is clunky, but I'm not sure I have a better way of doing it right now :/

Comment on lines +129 to +131
// Generate into a scratch dir so the CLI's `zone.parquet` name can't collide with the base
// tables, then move the produced parts into place as `zone_{part}.parquet`.
// Start from an empty scratch dir (clear any leftover from an interrupted run).

Contributor

We already have code that handles idempotent datagen.

Labels

changelog/feature A new feature

3 participants