feat(vortex-bench): wire SpatialBench into the bench orchestrator #8607
HarukiMoriarty wants to merge 3 commits into
a0218b2 to fdb0872
Merging this PR will improve performance by 11.93%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 273.6 ns | 244.4 ns | +11.93% |
Comparing nemo/spatial-wire-vx-bench (8fa6ee0) with develop (0a45777)
Footnotes
- 4 benchmarks were skipped, so the baseline results were used instead.
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
fdb0872 to b362551
DuckDB's GEOMETRY only accepts little-endian (NDR) WKB, but the externally sourced zone table (Overture Maps via spatialbench-cli) is big-endian, so the vortex lane failed spatial queries with "Only little-endian WKB is supported".

Re-encode geometry columns to little-endian during the parquet->vortex conversion so the vortex file stores canonical little-endian WKB; columns that are already little-endian pass through without a copy.

Also drop the best-effort warn around zone generation so data-generation failures propagate.

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
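To make the byte-order fix in the commit message above concrete, here is a minimal Python sketch of re-encoding a single WKB value as little-endian. It is not the PR's implementation (the conversion in the PR happens in the parquet->vortex path); the names `to_little_endian` and `_reencode` are hypothetical, and the sketch assumes ISO WKB (XY, Z, M, ZM type codes) without EWKB flags or SRIDs. Unlike the zero-copy pass-through the commit describes, this version always walks the value and only then returns the original object when nothing changed.

```python
import struct

# Coordinates per point for ISO WKB type codes: XY, XYZ (1000s), XYM (2000s), XYZM (3000s).
_DIMS = {0: 2, 1: 3, 2: 3, 3: 4}


def _reencode(buf: bytes, pos: int, out: bytearray) -> int:
    """Append the geometry starting at buf[pos] to `out` as NDR; return the end offset."""
    if buf[pos] not in (0, 1):
        raise ValueError(f"bad WKB byte-order marker {buf[pos]}")
    src = "<" if buf[pos] == 1 else ">"  # 1 = little-endian (NDR), 0 = big-endian (XDR)
    pos += 1

    def copy(fmt: str) -> tuple:
        # Read fixed-width integers in the source order, write them little-endian.
        nonlocal pos
        vals = struct.unpack_from(src + fmt, buf, pos)
        out.extend(struct.pack("<" + fmt, *vals))
        pos += struct.calcsize("<" + fmt)
        return vals

    out.append(1)  # every geometry, nested ones included, carries its own marker
    (gtype,) = copy("I")
    kind, dims = gtype % 1000, _DIMS.get(gtype // 1000)
    if dims is None:
        raise ValueError(f"unsupported WKB type code {gtype}")

    def points(n: int) -> None:
        # Swap coordinates as 8-byte integers so NaN bit patterns survive unchanged.
        copy(f"{n * dims}Q")

    if kind == 1:  # Point
        points(1)
    elif kind == 2:  # LineString
        points(copy("I")[0])
    elif kind == 3:  # Polygon: ring count, then one point list per ring
        for _ in range(copy("I")[0]):
            points(copy("I")[0])
    elif kind in (4, 5, 6, 7):  # Multi* and GeometryCollection: nested geometries
        for _ in range(copy("I")[0]):
            pos = _reencode(buf, pos, out)
    else:
        raise ValueError(f"unsupported WKB geometry kind {kind}")
    return pos


def to_little_endian(wkb: bytes) -> bytes:
    """Return `wkb` as little-endian WKB, reusing the input object if it already is."""
    out = bytearray()
    if _reencode(wkb, 0, out) != len(wkb):
        raise ValueError("trailing bytes after WKB geometry")
    return wkb if out == wkb else bytes(out)
```

Nested geometries are walked recursively because each member of a Multi* or GeometryCollection value has its own byte-order marker, so checking only the first byte of the blob would not be enough to call a value little-endian.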
def targets_from_axes(engine: str, format: str) -> tuple[list[BenchmarkTarget], list[str]]:
def targets_from_axes(
    engine: str, format: str, benchmark: Benchmark | None = None
I don't know much about this, but why is this needed?
///
/// For SpatialBench (`skip_binary_dict`), the geometry blobs are large and
/// unique, so the dictionary builder balloons memory (tens of GB) for zero gain.
fn write_options_for(
this is clunky, not sure I have a better way of doing that right now :/
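For context on the "zero gain" claim in the doc comment above, a rough back-of-the-envelope sketch in Python follows. It uses a deliberately simplified size model (distinct values stored once plus a fixed-width code per row), which is an assumption for illustration and not how Vortex accounts for sizes; `dictionary_saves_space` and `code_bytes` are hypothetical names.

```python
def dictionary_saves_space(values: list[bytes], code_bytes: int = 4) -> bool:
    """Estimate whether dictionary-encoding `values` would beat storing them plainly.

    Simplified model: the dictionary holds each distinct value once, and every
    row stores a `code_bytes`-wide index into it.
    """
    plain = sum(len(v) for v in values)
    encoded = sum(len(v) for v in set(values)) + code_bytes * len(values)
    return encoded < plain
```

Under this model a column of all-distinct blobs always loses: the dictionary is as large as the plain data, and the per-row codes are pure overhead, while the builder still has to hold every distinct value in memory.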
// Generate into a scratch dir so the CLI's `zone.parquet` name can't collide with the base
// tables, then move the produced parts into place as `zone_{part}.parquet`.
// Start from an empty scratch dir (clear any leftover from an interrupted run).
We already have code that handles idempotent datagen.
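The scratch-directory approach described in the diff comment above can be sketched in Python as follows. This is only an illustration of the pattern, not the PR's code: `generate_zone_parts` and the `run_cli` callback are hypothetical stand-ins for the real CLI invocation, and the sketch assumes parts are numbered from 0 in sorted filename order.

```python
import shutil
from pathlib import Path
from typing import Callable


def generate_zone_parts(
    scratch: Path, dest: Path, run_cli: Callable[[Path], None]
) -> list[Path]:
    """Generate zone files in `scratch`, then move them to `dest` as zone_{part}.parquet."""
    # Start from an empty scratch dir, clearing leftovers from an interrupted run.
    shutil.rmtree(scratch, ignore_errors=True)
    scratch.mkdir(parents=True)

    # Hypothetical callback standing in for the CLI; any error it raises propagates.
    run_cli(scratch)

    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for part, src in enumerate(sorted(scratch.rglob("*.parquet"))):
        target = dest / f"zone_{part}.parquet"
        shutil.move(str(src), str(target))
        moved.append(target)

    shutil.rmtree(scratch)
    return moved
```

Because the CLI only ever writes inside `scratch`, a file it names `zone.parquet` cannot overwrite a base table with the same name in `dest`.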
Summary
Wires SpatialBench into the vx-bench/bench-orchestrator pipeline so it can be run end-to-end like the other benchmarks (datagen → Parquet → Vortex conversion → query). It builds on the WKB datagen landed in #8598.

Running command:
Limitation
- DuckDB-only. For now, SpatialBench queries use DuckDB-specific ST_* spatial SQL functions that DataFusion does not provide yet. This is expressed as a single ad-hoc entry in BENCHMARK_ENGINES = { SPATIALBENCH: {DUCKDB} }.
- No dictionary encoding / compaction on the WKB column. WKB geometry blobs are large and effectively unique, so running the dictionary builder over them balloons memory (tens of GB) for zero compression gain. The normal compaction path is preserved for every other column on every other benchmark.
- Queries 10, 11, and 12 time out, simply because DuckDB's spatial index support is poor.
Performance
SF=1.0
SF=3
SF=10