perf: collect nested struct addresses once in field-major append #3661

Draft
andygrove wants to merge 3 commits into apache:main from andygrove:perf/struct-collect-addrs-once

@andygrove
Member

Summary

  • In append_struct_fields_field_major, the first pass now collects nested struct addresses and sizes alongside the null bitmap
  • The per-field second pass uses these pre-collected addresses via point_to() instead of re-reading from parent row pointer arrays (read_row_at!) and calling get_struct() for every field of every row
  • Same optimization applied to the Binary, Utf8, Decimal128, nested Struct, and List/Map field cases
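
The two-pass shape described above can be sketched on a toy model. This is not the Comet code: `ParentRow`, `locate_nested`, `collect_once`, and `append_field_major` are hypothetical names, fields are assumed to be fixed-width `i64` values, and a borrowed slice stands in for the cached address and size that the real code hands to point_to().

```rust
/// Hypothetical stand-in for a parent row: the struct column is either null
/// or holds a nested struct made of fixed-width i64 fields.
struct ParentRow {
    nested: Option<Vec<i64>>,
}

/// Stand-in for the per-row lookup (parent row read plus get_struct).
fn locate_nested(row: &ParentRow) -> Option<&[i64]> {
    row.nested.as_deref()
}

/// First pass: exactly one lookup per row. Records validity (the null bitmap)
/// and the location of each nested struct side by side.
fn collect_once(rows: &[ParentRow]) -> (Vec<bool>, Vec<&[i64]>) {
    let empty: &[i64] = &[];
    let mut valid = Vec::with_capacity(rows.len());
    let mut cached = Vec::with_capacity(rows.len());
    for row in rows {
        match locate_nested(row) {
            Some(fields) => {
                valid.push(true);
                cached.push(fields);
            }
            None => {
                valid.push(false);
                cached.push(empty);
            }
        }
    }
    (valid, cached)
}

/// Second pass: field-major loop that only touches the cached locations.
/// Returns one output column per field.
fn append_field_major(rows: &[ParentRow], num_fields: usize) -> Vec<Vec<Option<i64>>> {
    let (valid, cached) = collect_once(rows);
    let mut columns = Vec::with_capacity(num_fields);
    for field in 0..num_fields {
        let mut column = Vec::with_capacity(rows.len());
        for row in 0..rows.len() {
            if valid[row] {
                column.push(Some(cached[row][field]));
            } else {
                column.push(None);
            }
        }
        columns.push(column);
    }
    columns
}
```

In this sketch `locate_nested` runs once per row inside `collect_once`, and the inner loop of `append_field_major` never goes back to the parent rows.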

Rationale

Previously, for a struct with F fields and N rows, the code performed N*F pointer dereferences into the parent row address/size arrays plus N*F get_struct() calls, each of which goes through get_offset_and_len (an i64 read followed by bit manipulation). After this change, the parent row reads and get_struct() calls happen only N times in total, all in the first pass, and the second pass makes cheap point_to() calls on the cached addresses.
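
To make the N*F versus N count concrete, here is a rough sketch of the decode step and of how often it runs in each scheme. The packed layout (offset in the upper 32 bits, length in the lower 32 bits) is an assumption modelled on Spark's UnsafeRow encoding, and `pack`, `decode_offset_and_len`, `count_decodes_naive`, and `count_decodes_cached` are hypothetical helpers, not functions from this PR.

```rust
/// Pack an offset and a length into one 64-bit word.
/// Assumed layout: offset in the upper 32 bits, length in the lower 32 bits.
fn pack(offset: u32, len: u32) -> i64 {
    (((offset as u64) << 32) | (len as u64)) as i64
}

/// Decode a packed word back into (offset, length).
fn decode_offset_and_len(word: i64) -> (usize, usize) {
    let bits = word as u64;
    let offset = (bits >> 32) as usize;
    let len = (bits & 0xFFFF_FFFF) as usize;
    (offset, len)
}

/// Old shape: decode inside the field-major loop, so N * F decodes.
fn count_decodes_naive(words: &[i64], num_fields: usize) -> usize {
    let mut calls = 0;
    for _field in 0..num_fields {
        for &word in words {
            let _location = decode_offset_and_len(word);
            calls += 1;
        }
    }
    calls
}

/// New shape: decode once per row up front, then reuse the cached pairs.
fn count_decodes_cached(words: &[i64], num_fields: usize) -> usize {
    let mut calls = 0;
    let cached: Vec<(usize, usize)> = words
        .iter()
        .map(|&word| {
            calls += 1;
            decode_offset_and_len(word)
        })
        .collect();
    for _field in 0..num_fields {
        for &(_offset, _len) in &cached {
            // The real code would call point_to() here with the cached values.
        }
    }
    calls
}
```

With 3 rows and 5 fields, the naive count is 15 decodes and the cached count is 3. The trade-off is two extra N-element buffers for the cached addresses and sizes.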

Test plan

  • cargo clippy --all-targets --workspace -- -D warnings passes
  • Existing struct row-to-columnar tests cover these code paths

In append_struct_fields_field_major, the first pass now collects nested
struct addresses and sizes alongside the null bitmap. The per-field
second pass uses these pre-collected addresses instead of re-reading
from the parent row pointer arrays and calling get_struct for every
field of every row.

For a struct with F fields and N rows, this reduces parent row pointer
dereferences from N*F to N, and get_struct calls from N*F to N.
@andygrove andygrove marked this pull request as draft March 11, 2026 00:29