Skip to content

PatchedArray: basics and wiring#7002

Open
a10y wants to merge 17 commits intodevelopfrom
aduffy/patched-array
Open

PatchedArray: basics and wiring#7002
a10y wants to merge 17 commits intodevelopfrom
aduffy/patched-array

Conversation

@a10y
Copy link
Contributor

@a10y a10y commented Mar 17, 2026

Summary

This is the first PR in a series addressing the PatchedArray RFC: vortex-data/rfcs#27

This PR adds a new PatchedArray array variant, which is slated to only be used with FastLanes array types. The design is largely documented in the RFC, but briefly

  • Wraps another child array
  • Holds a buffer of offsets and patch values. They are both sorted by chunk/lane so there is efficient random as well as bulk access for CUDA

We are able to pushdown the following at reduce time:

  • Filters pushes through to the child, and we trim the patch indices to the range of chunks covered by the filter
  • Slicing that reduces the chunk range
  • Compare always pushes through to the child
  • Take pushes through to the inner then constructs a hashmap of the patches

There will be follow ups to add the wiring into CUDA and to update how BitPacked and ALP arrays are written.

Testing

There are unit tests for all of the reducers and kernels

@codspeed-hq
Copy link

codspeed-hq bot commented Mar 17, 2026

Merging this PR will degrade performance by 15.51%

❌ 2 regressed benchmarks
✅ 1104 untouched benchmarks
⏩ 1522 skipped benchmarks1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation bitwise_not_vortex_buffer_mut[1024] 477.2 ns 535.6 ns -10.89%
Simulation bitwise_not_vortex_buffer_mut[128] 317.8 ns 376.1 ns -15.51%

Comparing aduffy/patched-array (6b5c2f8) with develop (b524763)

Open in CodSpeed

Footnotes

  1. 1522 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@a10y a10y added the changelog/feature A new feature label Mar 18, 2026
@a10y a10y marked this pull request as ready for review March 18, 2026 21:21
@a10y a10y requested a review from joseph-isaacs March 18, 2026 21:21
Comment on lines +51 to +53
pub(super) indices: BufferHandle,
/// patch values corresponding to the indices. The ptype is specified by `values_ptype`.
pub(super) values: BufferHandle,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think we should reconsider this.

This array (+ constant) should be used to replace the sparse array so I think we really want to store things other than PTypes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this array is well-suited for non-primitive types. How many lanes does a List<Struct{u32,utf8}> have? I was tempted to actually put this into the vortex-fastlanes crate tbh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've updated to use child array for values, which should give us some flexibility to change things in the future

@a10y a10y force-pushed the aduffy/patched-array branch from c3d7434 to 6c1d7aa Compare March 19, 2026 13:24
@a10y a10y force-pushed the aduffy/patched-array branch from 860cc64 to 042394b Compare March 24, 2026 19:17
@a10y a10y force-pushed the aduffy/patched-array branch 4 times, most recently from df8dbf5 to 862d814 Compare March 26, 2026 19:54
@a10y
Copy link
Contributor Author

a10y commented Mar 27, 2026

@claude review please

@claude

This comment was marked as resolved.

@a10y
Copy link
Contributor Author

a10y commented Mar 27, 2026

@claude i have addressed your main comments. please give another review.

@claude

This comment was marked as resolved.

a10y added 11 commits March 27, 2026 10:32
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
We call append_to_builder on the inner first, then just do a single pass
and overwrite what it just wrote.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
a10y added 2 commits March 27, 2026 10:32
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/patched-array branch from daf258f to 578d83d Compare March 27, 2026 14:35
a10y added 2 commits March 27, 2026 11:12
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y
Copy link
Contributor Author

a10y commented Mar 27, 2026

@claude please review

@claude
Copy link
Contributor

claude bot commented Mar 27, 2026

Claude finished @a10y's task in 5m 26s —— View job


PR Review: PatchedArray basics and wiring

Tasks

  • Gather context and understand the request
  • Read core array implementation files
  • Read compute kernel implementations
  • Read vtable implementation
  • Review tests and compat fixtures
  • Post comprehensive review feedback

Overall this is a well-structured implementation of a new array type with good test coverage. The core transpose algorithm and the architecture are sound. I found a few bugs that affect sliced arrays (non-zero offset), and some style notes.

Bugs

1. take.rs: Incorrect index arithmetic for sliced arrays (offset > 0) Fix this →

In take_map, the hashmap is built with keys as inner-relative indices (chunk * 1024 + patch_idx), but the take indices are user-relative (0-indexed from the start of the PatchedArray). When offset > 0, these two coordinate systems diverge and all patch lookups miss:

// Current (buggy):
let index = chunk * 1024 + patch_idx as usize;
if index >= offset && index < offset + len {
    index_map.insert(index, patch_value);  // key in [offset, offset+len)
}
// ...
let index = index.as_();   // user-visible index in [0, len)
index_map.get(&index);     // always misses when offset > 0

Fix: store index - offset as the hashmap key.

Additionally, array.inner.take(indices.clone()) at line 34 uses the user-relative indices directly against inner, which is chunk-aligned (starts at chunk_start * 1024). When offset > 0, user index i corresponds to inner[i + offset], not inner[i]. The indices fed to inner's take should be adjusted by adding offset.

Neither case is caught by the existing tests since make_patched_array only creates arrays with offset = 0.


2. compare.rs: Missing upper-bound check in ApplyPatches::apply Fix this →

The lower-bound guard if bit_index < self.offset { continue; } is present, but there is no corresponding upper-bound guard. For a sliced array (e.g., offset = 5, len = 100 within a 1024-element chunk), a patch at chunk-position 200 is within the lane_offsets range but has bit_index = 200 - 5 = 195 >= 100 = len. Calling self.bits.set(195) on a BitBufferMut of length 100 would be out of bounds.

Compare apply_patches_primitive in vtable/mod.rs which correctly checks both bounds:

if index < offset || index >= offset + len {
    continue;
}

The fix is to add a corresponding upper-bound check in ApplyPatches::apply.


3. vtable/mod.rs append_to_builder: Wrong output slice when offset > 0 Fix this →

array.inner.append_to_builder(builder, ctx)? appends inner.len() elements (chunk-aligned), which can exceed array.len() when offset > 0. However:

let len = array.len();
array.inner.append_to_builder(builder, ctx)?;
// ...
let trailer = output.len() - len;  // wrong: should be output.len() - inner.len()
apply_patches_primitive::<V>(
    &mut output[trailer..],  // only `len` elements, indexed with chunk-global coords
    offset,
    len,
    ...
);

output[trailer..] has len elements, but apply_patches_primitive writes at positions chunk * 1024 + chunk_idx which range from offset to offset + len - 1. When offset > 0 these positions exceed len, causing an out-of-bounds write.

The execute path handles this correctly by operating on the full inner buffer and then slicing after; append_to_builder should do the same (operate on the inner.len()-sized window, then discard the offset prefix).


Nits / Style

  • filter.rs line 28-30: indexing slices[0] and slices[slices.len()-1] is fine but could use .first() and .last() for clarity, as previously suggested.

  • seek_to_lane (array.rs line 121-122): Uses assert! (panicking) rather than returning VortexResult. Given this is pub(crate) and called in hot paths, this is likely intentional, but worth a comment clarifying the precondition.

  • slice_chunks doesn't shrink indices/values: After slicing to a narrow chunk range, the full indices and values buffers are retained (only lane_offsets is sliced). The bounds checks in the apply functions ensure correctness, but this can retain significant memory. A comment explaining this tradeoff would help future readers.

  • compare.rs test_pos_neg_zero: The expected result (BoolArray::from_indices(10, [7])) documents that -0.0 == 0.0 is false under NativeValue ordering. This is intentional but worth a brief comment in the test noting this is different from IEEE 754 ==.

  • transpose #[allow(clippy::cognitive_complexity)]: The function could be split into a counting phase and a scatter phase to improve readability without changing the algorithm.

a10y added 2 commits March 27, 2026 11:29
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants