perf: use direct pointer reads in SparkUnsafeObject accessors by andygrove · Pull Request #3658 · apache/datafusion-comet

andygrove · 2026-03-10T23:59:51Z

Summary

Replace from_raw_parts + try_into().unwrap() + from_le_bytes() with direct pointer dereferences (*(addr as *const T)) in all SparkUnsafeObject trait accessors (get_byte, get_short, get_int, get_long, get_float, get_double, get_date, get_timestamp)
Apply the same optimization to SparkUnsafeArray::new element count read
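
For readers who want to see the two patterns side by side, here is a minimal sketch. The function names `get_long_via_slice` and `get_long_direct` are hypothetical and are not the actual Comet accessors; the sketch only assumes a raw base pointer plus a byte offset, as the summary describes.

```rust
use std::convert::TryInto;
use std::slice;

/// Old pattern (sketch): build a byte slice, convert it to a fixed-size
/// array, then decode it as a little-endian i64.
///
/// # Safety
/// `base + offset` must point to at least 8 readable bytes.
unsafe fn get_long_via_slice(base: *const u8, offset: usize) -> i64 {
    let bytes: &[u8] = unsafe { slice::from_raw_parts(base.add(offset), 8) };
    i64::from_le_bytes(bytes.try_into().unwrap())
}

/// New pattern (sketch): dereference a typed pointer directly.
/// This reads in native byte order, so it matches the old pattern
/// only on little-endian targets.
///
/// # Safety
/// `base + offset` must point to at least 8 readable bytes AND be
/// aligned for `i64`; a misaligned dereference is undefined behavior.
unsafe fn get_long_direct(base: *const u8, offset: usize) -> i64 {
    unsafe { *(base.add(offset) as *const i64) }
}
```

The direct form is shorter, but it moves an alignment requirement onto the caller that the slice-based form did not have.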

Rationale

The previous from_raw_parts + try_into().unwrap() + from_le_bytes pattern added three layers of abstraction per read that compiled to unnecessary overhead. These accessors are on the hottest path in row-to-columnar conversion used by JVM shuffle.

Replace slice construction + try_into().unwrap() + from_le_bytes() with direct pointer dereferences in all SparkUnsafeObject trait accessors. Both SparkUnsafeRow and SparkUnsafeArray guarantee natural alignment for field access: UnsafeRow fields are at 8-byte aligned offsets (bitset width is multiple of 8, each slot is 8 bytes, JVM allocates aligned memory), and UnsafeArray elements are at naturally aligned offsets (header is 8-byte aligned, elements are at element_size stride). This eliminates three layers of abstraction per read (from_raw_parts, try_into().unwrap(), from_le_bytes) on the hottest path in row-to-columnar conversion.
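
The UnsafeRow alignment argument above can be checked with a little arithmetic. The sketch below assumes the layout used by Spark's UnsafeRow (a null bitset of one 64-bit word per 64 fields, followed by one 8-byte slot per field); the helper names are hypothetical and not taken from the Comet code.

```rust
/// Width in bytes of the null-tracking bitset: one 8-byte word per
/// 64 fields, rounded up. Always a multiple of 8.
fn bitset_width_in_bytes(num_fields: usize) -> usize {
    ((num_fields + 63) / 64) * 8
}

/// Byte offset of field `index` from the start of the row. Each field
/// occupies one 8-byte slot after the bitset, so every offset is a
/// multiple of 8.
fn field_offset(num_fields: usize, index: usize) -> usize {
    bitset_width_in_bytes(num_fields) + index * 8
}
```

Since both terms are multiples of 8, a field is 8-byte aligned whenever the row's base address is. The same reasoning does not automatically carry over to array elements, whose base address and element stride can be smaller than 8 bytes.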

SparkUnsafeArray elements may not be naturally aligned (e.g., i64 elements at 4-byte-aligned offsets). Use read_unaligned() instead of direct pointer dereferences for all multi-byte accessors. This is still faster than the original from_raw_parts + try_into + from_le_bytes chain while being safe for both SparkUnsafeRow (always aligned) and SparkUnsafeArray (potentially unaligned).
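
A minimal sketch of the read_unaligned() approach follows. The function name `get_long_unaligned` is hypothetical, and the `i64::from_le` call is an assumption added here to keep the little-endian semantics of the original from_le_bytes chain; the actual change may not include it.

```rust
/// Read an i64 at `base + offset` with no alignment requirement.
///
/// # Safety
/// `base + offset` must point to at least 8 readable bytes.
unsafe fn get_long_unaligned(base: *const u8, offset: usize) -> i64 {
    // read_unaligned copies the bytes without assuming the address is
    // a multiple of align_of::<i64>().
    let raw = unsafe { (base.add(offset) as *const i64).read_unaligned() };
    // Interpret the stored bytes as little-endian (a no-op on
    // little-endian targets).
    i64::from_le(raw)
}
```

For example, a value written at byte offset 3 of a buffer can be read back with `get_long_unaligned(buf.as_ptr(), 3)`, where a plain dereference at that address would be undefined behavior.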

mbutrovich · 2026-03-11T01:09:09Z

The current code compiles down to a single load anyway (ldr x9, [x0]), so I suspect we won't see a performance difference, but I think what you're proposing is more readable.

mbutrovich self-requested a review March 11, 2026 00:10

andygrove marked this pull request as draft March 11, 2026 00:53

andygrove changed the title from "perf: use direct aligned pointer reads in SparkUnsafeObject accessors" to "perf: use direct pointer reads in SparkUnsafeObject accessors" Mar 11, 2026

Merge branch 'main' into perf/spark-unsafe-aligned-reads

4274dcc

andygrove wants to merge 3 commits into apache:main from andygrove:perf/spark-unsafe-aligned-reads
