Fix list-of-struct nested projection#2
Conversation
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant improvement by fixing nested projection for list<struct> types. This is achieved by adding a new shredded list layout (vortex.list) and extending get_item and the list reader to support this new structure. The implementation appears robust and is accompanied by thorough tests, including a new integration test that validates the end-to-end workflow. While the overall approach is sound, I've identified several areas with significant code duplication, particularly in get_item.rs, which could be refactored to enhance maintainability. My review includes suggestions for these refactorings.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively addresses nested projection for list<struct> types by introducing a new shredded list layout (vortex.list) and extending get_item to support it. The changes are comprehensive, including updates to the layout system, expression evaluation, and the addition of thorough tests. The implementation of the new list reader and writer strategies appears solid.
My review includes a couple of suggestions for refactoring in get_item.rs to reduce code duplication in the evaluate and execute methods, which would improve long-term maintainability. Overall, this is a great contribution that significantly enhances Vortex's capabilities for handling nested data.
d5a8576 to
87d9553
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
The pull request significantly enhances the handling of nested projections, particularly for List<Struct> types, by introducing a shredded layout (vortex.list). The new Map expression is a well-designed addition, seamlessly integrating into the expression simplification and evaluation framework. The vortex-layout changes correctly implement the ListLayout and its associated reader and writer strategies, which is crucial for efficient storage and retrieval of shredded list data. Furthermore, the updates in vortex-datafusion and vortex-duckdb ensure consistent application of expression optimization, leveraging the new Map expression's simplification capabilities. The new test cases provide thorough validation of the core functionality.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively addresses the nested projection for list<struct> types by introducing a new vortex.list shredded layout and a map expression. The changes correctly extend get_item and list reader pushdown logic to support this functionality. Comprehensive tests have been added to validate the new behavior and optimizations. The integration of the new ListStrategy into the TableStrategy and the strategic use of optimize_recursive across different modules ensure that the new simplification rules are applied, leading to a robust and efficient solution.
6439697 to
f63b38e
Compare
6a2a8f7 to
2973c55
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements for handling nested projections on list-of-struct and fixed-size-list-of-struct types. It achieves this by introducing a new GetItemList expression and a vortex.list layout encoding, which allows for shredding list arrays into their components (validity, offsets, elements) for more efficient processing and projection pushdown. The changes are well-tested with new unit and integration tests, and the dtype calculation logic in scan builders has been updated to account for the new simplification rules. The explicit marking of GetItemList as unstable and non-serializable is a clear indication of its current internal optimization role.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements for handling nested projections on lists of structs by implementing a shredded layout (vortex.list) and a new GetItemList expression. The changes correctly propagate nullability and enable pushdown optimizations for these complex types. New tests validate the functionality for both variable-size and fixed-size lists of structs.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements for handling nested projections involving lists of structs and fixed-size lists of structs. The core changes include a new GetItemList expression to specifically manage projections on list elements, and updates to the GetItem expression's simplify method to transform nested get_item calls into get_item_list when appropriate. The new ListLayout and ListStrategy provide robust mechanisms for reading and writing these complex data types, including improved split planning for FixedSizeList elements. Furthermore, the ScanBuilder and GpuScanBuilder have been updated to ensure that DType calculations correctly account for these simplified projections, preventing errors with nested field access. The added test cases thoroughly validate the new functionality and ensure proper behavior for various list-of-struct scenarios.
c9770e8 to
0b4b579
Compare
a7a6b7d to
da77c0c
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant feature: nested projection for lists of structs. This is achieved through a new get_item_list expression and a corresponding ListLayout. The changes are extensive and well-implemented, including improved split planning for lists and fixed-size lists.
A major refactoring from fields() to unmasked_fields() across the codebase enhances clarity regarding how struct validity is handled. Additionally, there are notable improvements in error handling, with panics being replaced by VortexResult in several places.
The PR also brings substantial additions to CUDA support, with new kernels for dictionary decoding and new benchmarks. The introduction of new fuzzing targets for compressed encodings is a great step towards improving the robustness of the library.
Overall, this is a high-quality contribution with well-thought-out changes and good test coverage. I have a couple of minor suggestions for code quality and documentation correctness.
| let ends_array = match ends_ptype { | ||
| PType::U8 => { | ||
| let ends_typed: Vec<u8> = ends | ||
| .iter() | ||
| .map(|&e| u8::try_from(e).vortex_expect("end value fits in u8")) | ||
| .collect(); | ||
| PrimitiveArray::new(Buffer::copy_from(ends_typed), Validity::NonNullable).into_array() | ||
| } | ||
| PType::U16 => { | ||
| let ends_typed: Vec<u16> = ends | ||
| .iter() | ||
| .map(|&e| u16::try_from(e).vortex_expect("end value fits in u16")) | ||
| .collect(); | ||
| PrimitiveArray::new(Buffer::copy_from(ends_typed), Validity::NonNullable).into_array() | ||
| } | ||
| PType::U32 => { | ||
| let ends_typed: Vec<u32> = ends | ||
| .iter() | ||
| .map(|&e| u32::try_from(e).vortex_expect("end value fits in u32")) | ||
| .collect(); | ||
| PrimitiveArray::new(Buffer::copy_from(ends_typed), Validity::NonNullable).into_array() | ||
| } | ||
| PType::U64 => { | ||
| PrimitiveArray::new(Buffer::copy_from(ends), Validity::NonNullable).into_array() | ||
| } | ||
| _ => unreachable!("Only unsigned integer types are valid for ends"), | ||
| }; |
There was a problem hiding this comment.
The logic for creating primitive arrays for different unsigned integer types is a bit repetitive. You can make this more concise by using PrimitiveArray::from_iter and avoiding the intermediate Vec collection. This will improve readability and maintainability.
let ends_array = match ends_ptype {
PType::U8 => PrimitiveArray::from_iter(
ends.iter()
.map(|&e| u8::try_from(e).vortex_expect("end value fits in u8")),
)
.into_array(),
PType::U16 => PrimitiveArray::from_iter(
ends.iter()
.map(|&e| u16::try_from(e).vortex_expect("end value fits in u16")),
)
.into_array(),
PType::U32 => PrimitiveArray::from_iter(
ends.iter()
.map(|&e| u32::try_from(e).vortex_expect("end value fits in u32")),
)
.into_array(),
PType::U64 => PrimitiveArray::from(ends).into_array(),
_ => unreachable!("Only unsigned integer types are valid for ends"),
};| } | ||
|
|
||
| impl<T: DeviceRepr + Send + Sync + 'static> DeviceBuffer for CudaDeviceBuffer<T> { | ||
| /// Returns the number of elements in the buffer of type T. |
There was a problem hiding this comment.
The documentation for len() is now incorrect after the implementation change. It states that it returns the number of elements, but the implementation now correctly returns the number of bytes to conform to the DeviceBuffer trait. The doc comment should be updated to reflect this.
| /// Returns the number of elements in the buffer of type T. | |
| /// Returns the number of bytes in the buffer. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant feature enhancement by enabling nested projection for list<struct> and fixed_size_list<struct> types. This is achieved through a new internal GetItemList expression and a simplification rule on GetItem. The changes are comprehensive, affecting expression analysis, execution, and the scan builder to accommodate the new simplification path. A new vortex.list layout with reader/writer implementations and improved split planning is also a key part of this update. The added regression tests are thorough and validate the new functionality well. The code is of high quality, and I have one suggestion for a minor refactoring to reduce code duplication in the GetItemList execution logic for better maintainability.
| match input.dtype() { | ||
| DType::List(..) => { | ||
| let list = input.execute::<ListViewArray>(args.ctx)?; | ||
| let struct_elems = list.elements().clone().execute::<StructArray>(args.ctx)?; | ||
|
|
||
| let field = struct_elems.unmasked_field_by_name(field_name)?.clone(); | ||
| let field = match struct_elems.dtype().nullability() { | ||
| Nullability::NonNullable => field, | ||
| Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?, | ||
| }; | ||
|
|
||
| ListViewArray::try_new( | ||
| field, | ||
| list.offsets().clone(), | ||
| list.sizes().clone(), | ||
| list.validity().clone(), | ||
| )? | ||
| .into_array() | ||
| .execute(args.ctx) | ||
| } | ||
| DType::FixedSizeList(..) => { | ||
| let list = input.execute::<FixedSizeListArray>(args.ctx)?; | ||
| let struct_elems = list.elements().clone().execute::<StructArray>(args.ctx)?; | ||
|
|
||
| let field = struct_elems.unmasked_field_by_name(field_name)?.clone(); | ||
| let field = match struct_elems.dtype().nullability() { | ||
| Nullability::NonNullable => field, | ||
| Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?, | ||
| }; | ||
|
|
||
| FixedSizeListArray::try_new( | ||
| field, | ||
| list.list_size(), | ||
| list.validity().clone(), | ||
| list.len(), | ||
| )? | ||
| .into_array() | ||
| .execute(args.ctx) | ||
| } | ||
| _ => Err(vortex_err!( | ||
| "Expected list scope for GetItemList execution, got {}", | ||
| input.dtype() | ||
| )), | ||
| } |
There was a problem hiding this comment.
The logic for handling DType::List and DType::FixedSizeList is very similar, particularly the part that applies the struct's validity mask to the projected field. To improve maintainability and reduce code duplication, consider refactoring the common logic.
For example, you could extract a helper function to apply the mask from the struct elements:
fn apply_struct_mask(field: crate::Array, struct_elems: &StructArray) -> VortexResult<crate::Array> {
match struct_elems.dtype().nullability() {
Nullability::NonNullable => Ok(field),
Nullability::Nullable => crate::compute::mask(&field, &struct_elems.validity_mask()?.not()),
}
}Using this helper in both match arms would make the execute method cleaner. You might find opportunities for further generalization to handle both list types in a more unified manner.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant feature: nested projection for list<struct> and fixed_size_list<struct> types. This is achieved by adding a new internal expression GetItemList and a corresponding ListLayout. The changes are well-implemented, with improved split planning and necessary adjustments to the scan builder logic. The addition of regression tests is also a great touch. My review focuses on a few opportunities to improve code clarity and enforce invariants.
| match &layout.dtype { | ||
| DType::List(..) => 2 + validity_children, // offsets + elements | ||
| DType::FixedSizeList(..) => 1 + validity_children, // elements | ||
| _ => 0, | ||
| } |
There was a problem hiding this comment.
The build function for ListLayout ensures that the dtype is always a List or FixedSizeList. Therefore, the _ arm in this match should be unreachable. Using unreachable! would make this invariant explicit and prevent silent failures if the layout is ever constructed incorrectly.
| match &layout.dtype { | |
| DType::List(..) => 2 + validity_children, // offsets + elements | |
| DType::FixedSizeList(..) => 1 + validity_children, // elements | |
| _ => 0, | |
| } | |
| match &layout.dtype { | |
| DType::List(..) => 2 + validity_children, // offsets + elements | |
| DType::FixedSizeList(..) => 1 + validity_children, // elements | |
| _ => unreachable!("ListLayout only supports List and FixedSizeList dtypes, got {}", layout.dtype), | |
| } |
| let idx = match self.dtype() { | ||
| DType::List(..) => { | ||
| if self.dtype().is_nullable() { | ||
| 2 | ||
| } else { | ||
| 1 | ||
| } | ||
| } | ||
| DType::FixedSizeList(..) => { | ||
| if self.dtype().is_nullable() { | ||
| 1 | ||
| } else { | ||
| 0 | ||
| } | ||
| } | ||
| _ => return Err(vortex_err!("Expected list dtype, got {}", self.dtype())), | ||
| }; |
There was a problem hiding this comment.
The logic to determine the index of the elements child can be simplified by using the boolean-to-integer cast of is_nullable().
let idx = match self.dtype() {
DType::List(..) => 1 + self.dtype().is_nullable() as usize,
DType::FixedSizeList(..) => self.dtype().is_nullable() as usize,
_ => return Err(vortex_err!("Expected list dtype, got {}", self.dtype())),
};| let offsets_slice_u64: VortexResult<Vec<u64>> = | ||
| match offsets.ptype() { | ||
| ptype if ptype.is_unsigned_int() => vortex_dtype::match_each_unsigned_integer_ptype!(ptype, |T| { | ||
| Ok(offsets | ||
| .as_slice::<T>() | ||
| .iter() | ||
| .map(|&v| v.to_u64()) | ||
| .collect()) | ||
| }), | ||
| ptype if ptype.is_signed_int() => { | ||
| vortex_dtype::match_each_signed_integer_ptype!( | ||
| ptype, | ||
| |T| { | ||
| offsets | ||
| .as_slice::<T>() | ||
| .iter() | ||
| .map(|&v| { | ||
| u64::try_from(v).map_err(|_| { | ||
| vortex_err!( | ||
| "List offsets must be convertible to u64" | ||
| ) | ||
| }) | ||
| }) | ||
| .collect() | ||
| } | ||
| ) | ||
| } | ||
| other => Err(vortex_err!( | ||
| "List offsets must be an integer type, got {other}" | ||
| )), | ||
| }; |
There was a problem hiding this comment.
830efb6 to
cc87962
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant feature by enabling nested projections on lists of structs through a new GetItemList expression. The implementation is comprehensive, adding new layouts, readers, and writers for list types, and correctly updating the scan builders to handle expression simplification. The code is well-structured and includes good test coverage for the new functionality. I've identified a few opportunities for refactoring to improve code clarity and reduce duplication, which are detailed in the specific comments.
| let field = struct_elems.unmasked_field_by_name(field_name)?.clone(); | ||
| let field = match struct_elems.dtype().nullability() { | ||
| Nullability::NonNullable => field, | ||
| Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?, | ||
| }; |
There was a problem hiding this comment.
The logic to get the projected field and apply the struct's validity mask can be simplified by using the existing StructArray::field_by_name method. This helper function encapsulates this exact logic, making the code more concise and consistent with other parts of the codebase.
| let field = struct_elems.unmasked_field_by_name(field_name)?.clone(); | |
| let field = match struct_elems.dtype().nullability() { | |
| Nullability::NonNullable => field, | |
| Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?, | |
| }; | |
| let field = struct_elems.field_by_name(field_name)?; |
| let field = struct_elems.unmasked_field_by_name(field_name)?.clone(); | ||
| let field = match struct_elems.dtype().nullability() { | ||
| Nullability::NonNullable => field, | ||
| Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?, | ||
| }; |
There was a problem hiding this comment.
Similar to the DType::List branch, this block can be simplified by using the StructArray::field_by_name helper method to get the projected field while correctly handling the struct's nullability.
| let field = struct_elems.unmasked_field_by_name(field_name)?.clone(); | |
| let field = match struct_elems.dtype().nullability() { | |
| Nullability::NonNullable => field, | |
| Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?, | |
| }; | |
| let field = struct_elems.field_by_name(field_name)?; |
| match dtype { | ||
| DType::List(_, list_nullability) => { | ||
| let offsets_reader = self.offsets()?.clone(); | ||
| let elements_reader = self.elements()?.clone(); | ||
| let row_range_clone = row_range.clone(); | ||
| let element_expr = element_expr.clone(); | ||
|
|
||
| Ok(Box::pin(async move { | ||
| let row_len = usize::try_from(row_range_clone.end - row_range_clone.start) | ||
| .vortex_expect("row range must fit in usize"); | ||
|
|
||
| let offsets_row_range = row_range_clone.start..row_range_clone.end + 1; | ||
| let offsets_len = row_len + 1; | ||
| let offsets_fut = offsets_reader.projection_evaluation( | ||
| &offsets_row_range, | ||
| &root(), | ||
| MaskFuture::new_true(offsets_len), | ||
| )?; | ||
|
|
||
| let (offsets, validity) = try_join!(offsets_fut, async move { | ||
| match validity_fut { | ||
| Some(v) => v.await.map(Some), | ||
| None => Ok(None), | ||
| } | ||
| })?; | ||
|
|
||
| let offsets = offsets.to_primitive(); | ||
| let offsets_slice = offsets.as_slice::<u64>(); | ||
| let base = *offsets_slice.first().unwrap_or(&0u64); | ||
| let end = *offsets_slice.last().unwrap_or(&base); | ||
|
|
||
| let elements_row_range = base..end; | ||
| let elements_len = usize::try_from(end - base) | ||
| .vortex_expect("element range must fit in usize"); | ||
| let elements = elements_reader.projection_evaluation( | ||
| &elements_row_range, | ||
| &element_expr, | ||
| MaskFuture::new_true(elements_len), | ||
| )?; | ||
|
|
||
| let elements = elements.await?; | ||
|
|
||
| let normalized_offsets = vortex_array::arrays::PrimitiveArray::from_iter( | ||
| offsets_slice.iter().map(|v| *v - base), | ||
| ) | ||
| .into_array(); | ||
|
|
||
| let validity = match (list_nullability, validity) { | ||
| (Nullability::NonNullable, _) => { | ||
| vortex_array::validity::Validity::NonNullable | ||
| } | ||
| (Nullability::Nullable, Some(v)) => { | ||
| vortex_array::validity::Validity::Array(v) | ||
| } | ||
| (Nullability::Nullable, None) => vortex_array::validity::Validity::AllValid, | ||
| }; | ||
|
|
||
| Ok(ListArray::try_new(elements, normalized_offsets, validity)?.into_array()) | ||
| })) | ||
| } | ||
| DType::FixedSizeList(_, list_size, list_nullability) => { | ||
| let elements_reader = self.elements()?.clone(); | ||
| let row_range_clone = row_range.clone(); | ||
| let element_expr = element_expr.clone(); | ||
|
|
||
| Ok(Box::pin(async move { | ||
| let row_len_u64 = row_range_clone.end - row_range_clone.start; | ||
| let row_len = | ||
| usize::try_from(row_len_u64).vortex_expect("row range must fit in usize"); | ||
|
|
||
| let list_size_u64 = u64::from(list_size); | ||
| let element_start = row_range_clone | ||
| .start | ||
| .checked_mul(list_size_u64) | ||
| .ok_or_else(|| vortex_err!("FixedSizeList element start overflow"))?; | ||
| let element_end = row_range_clone | ||
| .end | ||
| .checked_mul(list_size_u64) | ||
| .ok_or_else(|| vortex_err!("FixedSizeList element end overflow"))?; | ||
|
|
||
| let elements_row_range = element_start..element_end; | ||
| let elements_len = usize::try_from(element_end - element_start) | ||
| .vortex_expect("element range must fit in usize"); | ||
| let elements_fut = elements_reader.projection_evaluation( | ||
| &elements_row_range, | ||
| &element_expr, | ||
| MaskFuture::new_true(elements_len), | ||
| )?; | ||
|
|
||
| let (elements, validity) = try_join!(elements_fut, async move { | ||
| match validity_fut { | ||
| Some(v) => v.await.map(Some), | ||
| None => Ok(None), | ||
| } | ||
| })?; | ||
|
|
||
| let validity = match (list_nullability, validity) { | ||
| (Nullability::NonNullable, _) => { | ||
| vortex_array::validity::Validity::NonNullable | ||
| } | ||
| (Nullability::Nullable, Some(v)) => { | ||
| vortex_array::validity::Validity::Array(v) | ||
| } | ||
| (Nullability::Nullable, None) => vortex_array::validity::Validity::AllValid, | ||
| }; | ||
|
|
||
| Ok( | ||
| FixedSizeListArray::try_new(elements, list_size, validity, row_len)? | ||
| .into_array(), | ||
| ) | ||
| })) | ||
| } | ||
| _ => Err(vortex_err!("Expected list dtype, got {}", dtype)), | ||
| } |
There was a problem hiding this comment.
There's some repeated logic for handling validity_fut and constructing the final Validity in both branches of the match dtype. This could be refactored to reduce duplication.
You could move the async move block to wrap the whole function body and resolve validity_fut once at the beginning. Then, inside the match, you would only need to await the list-specific futures (offsets_fut or elements_fut). This would make the code cleaner and avoid repeating the validity handling logic.
| pub fn dtype(&self) -> VortexResult<DType> { | ||
| self.projection.return_dtype(self.layout_reader.dtype()) | ||
| // NOTE: `GetItem` may simplify into `GetItemList` for list-of-struct projections. | ||
| // To avoid rejecting valid nested projections (like `items.a`) we must simplify before | ||
| // validating the return dtype. | ||
| // | ||
| // Also, `row_idx` support is provided by `RowIdxLayoutReader`, so use the same reader | ||
| // enrichment as `prepare`. | ||
| let layout_reader = Arc::new(RowIdxLayoutReader::new( | ||
| self.row_offset, | ||
| self.layout_reader.clone(), | ||
| self.session.clone(), | ||
| )); | ||
| let projection = self.projection.optimize_recursive(layout_reader.dtype())?; | ||
| projection.return_dtype(layout_reader.dtype()) | ||
| } |
There was a problem hiding this comment.
The logic for creating the RowIdxLayoutReader and simplifying the projection is duplicated between this dtype method and the prepare method. This could be extracted into a private helper method to avoid code duplication and ensure consistency.
For example, you could have a prepare_projection_and_reader(&self) method that returns the simplified projection, its dtype, and the enriched layout reader. Both dtype() and prepare() could then call this helper.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and valuable feature: enabling nested projection for list<struct> and fixed_size_list<struct>. The implementation is well-designed, introducing a new get_item_list expression and a simplify rule on get_item to handle these projections effectively. The addition of a new ListLayout and improvements to split planning for FixedSizeList are also great enhancements. The code is clean, well-structured, and includes thorough regression tests. I have one suggestion regarding a potential missed optimization for List split planning.
| // TODO(a10y): Variable-size lists can only be split "naturally" based on elements, | ||
| // but mapping element splits back to row ranges requires offsets (or precomputed | ||
| // row->element boundary metadata) that we don't have at split-planning time. | ||
| DType::List(..) => {} |
There was a problem hiding this comment.
The PR description mentions improving split planning for List by splitting via offsets. However, this branch of register_splits is currently empty.
While the TODO comment correctly points out the difficulty of mapping element splits back to rows, we could still implement splitting based on the chunk boundaries of the offsets array itself. A split in the offsets array at index i would correspond to a split in the list array at row i.
Was this intentionally omitted? It seems like a useful optimization that would align with the PR's goals.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for nested projections on list<struct> and fixed_size_list<struct> types, which is a great enhancement. This is primarily achieved by adding a new internal get_item_list expression and implementing a pushdown mechanism in the new ListLayout. The changes also include improvements to split planning for list types and add comprehensive regression tests. The implementation is solid and the changes are well-tested. I have one minor suggestion to simplify the code in the new ListReader.
| for offsets_split in offsets_splits { | ||
| if offsets_split <= row_range.start { | ||
| continue; | ||
| } | ||
|
|
||
| let row_split = offsets_split - 1; | ||
| if row_split > row_range.start && row_split < row_range.end { | ||
| splits.insert(row_split); | ||
| } | ||
| } |
There was a problem hiding this comment.
The initial check if offsets_split <= row_range.start is redundant. The offsets_splits are generated for an offsets_range which starts at row_range.start, so offsets_split will always be >= row_range.start. The only case where the condition is true is offsets_split == row_range.start, which results in row_split = row_range.start - 1. This is already correctly handled by the subsequent if row_split > row_range.start check. Removing the redundant condition will simplify the loop.
for offsets_split in offsets_splits {
let row_split = offsets_split - 1;
if row_split > row_range.start && row_split < row_range.end {
splits.insert(row_split);
}
}|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively enables nested projection for list<struct> and fixed_size_list<struct> types. The introduction of the internal get_item_list expression is a clever and pragmatic solution to handle this without a full map expression implementation. The changes are well-integrated across the expression system, layout management, and scanning logic. The new ListLayout with its reader and writer components is well-designed and correctly handles component shredding. The improvements to split planning for list types are also a valuable addition. The added unit tests are comprehensive and provide good coverage for the new functionality. Overall, this is a high-quality contribution that significantly enhances Vortex's capabilities for handling nested data.
6c6326a to
3690d75
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements by enabling nested projection for list<struct> and fixed_size_list<struct> types. This is achieved through the addition of an internal get_item_list expression and enhanced split planning for list layouts. The changes are well-implemented, including new layout components, updated expression simplification logic, and comprehensive unit tests to validate the new functionality. The refactoring in scan_builder.rs also improves code organization.
Summary
Tests
Local (Windows):
Notes:
References