Fix list-of-struct nested projection by godnight10061 · Pull Request #2 · godnight10061/vortex

godnight10061 · 2026-01-20T07:47:48Z

Summary

Enable nested projection for list and fixed_size_list via internal get_item_list and list layout pushdown.
Improve split planning:
- FixedSizeList: map element splits back to row splits.
- List: map offsets splits back to row splits.
Add unit regressions for scan dtype simplification (list and fixed_size_list).

Tests

Local (Windows):

cargo +nightly fmt --all --check
cargo clippy --locked --workspace --all-targets --exclude vortex-test-e2e-cuda --exclude vortex-cuda --exclude vortex-cub --exclude vortex-nvcomp --exclude vortex-bench --exclude vortex-python --exclude vortex-duckdb --exclude vortex-fuzz --exclude duckdb-bench --exclude lance-bench --exclude datafusion-bench --exclude random-access-bench --exclude compress-bench -- -D warnings
cargo nextest run --locked --workspace --all-features --no-fail-fast --exclude vortex-test-e2e-cuda --exclude vortex-cuda --exclude vortex-cub --exclude vortex-nvcomp --exclude vortex-bench --exclude vortex-python --exclude vortex-duckdb --exclude vortex-fuzz --exclude duckdb-bench --exclude lance-bench --exclude datafusion-bench --exclude random-access-bench --exclude compress-bench

Notes:

setup-rust installs protoc; if running manually, ensure protoc is available on PATH (or PROTOC set).

References

Mirrors feat[layout]: list-of-struct nested projection vortex-data/vortex#6012

godnight10061 · 2026-01-20T09:38:19Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant improvement by fixing nested projection for list<struct> types. This is achieved by adding a new shredded list layout (vortex.list) and extending get_item and the list reader to support this new structure. The implementation appears robust and is accompanied by thorough tests, including a new integration test that validates the end-to-end workflow. While the overall approach is sound, I've identified several areas with significant code duplication, particularly in get_item.rs, which could be refactored to enhance maintainability. My review includes suggestions for these refactorings.

godnight10061 · 2026-01-20T11:19:33Z

/gemini review

gemini-code-assist

Code Review

This pull request effectively addresses nested projection for list<struct> types by introducing a new shredded list layout (vortex.list) and extending get_item to support it. The changes are comprehensive, including updates to the layout system, expression evaluation, and the addition of thorough tests. The implementation of the new list reader and writer strategies appears solid.

My review includes a couple of suggestions for refactoring in get_item.rs to reduce code duplication in the evaluate and execute methods, which would improve long-term maintainability. Overall, this is a great contribution that significantly enhances Vortex's capabilities for handling nested data.

godnight10061 · 2026-01-21T03:48:36Z

/gemini review

gemini-code-assist

Code Review

The pull request significantly enhances the handling of nested projections, particularly for List<Struct> types, by introducing a shredded layout (vortex.list). The new Map expression is a well-designed addition, seamlessly integrating into the expression simplification and evaluation framework. The vortex-layout changes correctly implement the ListLayout and its associated reader and writer strategies, which is crucial for efficient storage and retrieval of shredded list data. Furthermore, the updates in vortex-datafusion and vortex-duckdb ensure consistent application of expression optimization, leveraging the new Map expression's simplification capabilities. The new test cases provide thorough validation of the core functionality.

godnight10061 · 2026-01-21T04:08:40Z

/gemini review

gemini-code-assist

Code Review

This pull request effectively addresses the nested projection for list<struct> types by introducing a new vortex.list shredded layout and a map expression. The changes correctly extend get_item and list reader pushdown logic to support this functionality. Comprehensive tests have been added to validate the new behavior and optimizations. The integration of the new ListStrategy into the TableStrategy and the strategic use of optimize_recursive across different modules ensure that the new simplification rules are applied, leading to a robust and efficient solution.

godnight10061 · 2026-01-23T07:40:35Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces significant improvements for handling nested projections on list-of-struct and fixed-size-list-of-struct types. It achieves this by introducing a new GetItemList expression and a vortex.list layout encoding, which allows for shredding list arrays into their components (validity, offsets, elements) for more efficient processing and projection pushdown. The changes are well-tested with new unit and integration tests, and the dtype calculation logic in scan builders has been updated to account for the new simplification rules. The explicit marking of GetItemList as unstable and non-serializable is a clear indication of its current internal optimization role.

godnight10061 · 2026-01-23T08:18:22Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces significant improvements for handling nested projections on lists of structs by implementing a shredded layout (vortex.list) and a new GetItemList expression. The changes correctly propagate nullability and enable pushdown optimizations for these complex types. New tests validate the functionality for both variable-size and fixed-size lists of structs.

godnight10061 · 2026-01-23T11:43:42Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces significant improvements for handling nested projections involving lists of structs and fixed-size lists of structs. The core changes include a new GetItemList expression to specifically manage projections on list elements, and updates to the GetItem expression's simplify method to transform nested get_item calls into get_item_list when appropriate. The new ListLayout and ListStrategy provide robust mechanisms for reading and writing these complex data types, including improved split planning for FixedSizeList elements. Furthermore, the ScanBuilder and GpuScanBuilder have been updated to ensure that DType calculations correctly account for these simplified projections, preventing errors with nested field access. The added test cases thoroughly validate the new functionality and ensure proper behavior for various list-of-struct scenarios.

godnight10061 · 2026-01-25T12:30:08Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant feature: nested projection for lists of structs. This is achieved through a new get_item_list expression and a corresponding ListLayout. The changes are extensive and well-implemented, including improved split planning for lists and fixed-size lists.

A major refactoring from fields() to unmasked_fields() across the codebase enhances clarity regarding how struct validity is handled. Additionally, there are notable improvements in error handling, with panics being replaced by VortexResult in several places.

The PR also brings substantial additions to CUDA support, with new kernels for dictionary decoding and new benchmarks. The introduction of new fuzzing targets for compressed encodings is a great step towards improving the robustness of the library.

Overall, this is a high-quality contribution with well-thought-out changes and good test coverage. I have a couple of minor suggestions for code quality and documentation correctness.

gemini-code-assist · 2026-01-25T12:33:38Z

+    let ends_array = match ends_ptype {
+        PType::U8 => {
+            let ends_typed: Vec<u8> = ends
+                .iter()
+                .map(|&e| u8::try_from(e).vortex_expect("end value fits in u8"))
+                .collect();
+            PrimitiveArray::new(Buffer::copy_from(ends_typed), Validity::NonNullable).into_array()
+        }
+        PType::U16 => {
+            let ends_typed: Vec<u16> = ends
+                .iter()
+                .map(|&e| u16::try_from(e).vortex_expect("end value fits in u16"))
+                .collect();
+            PrimitiveArray::new(Buffer::copy_from(ends_typed), Validity::NonNullable).into_array()
+        }
+        PType::U32 => {
+            let ends_typed: Vec<u32> = ends
+                .iter()
+                .map(|&e| u32::try_from(e).vortex_expect("end value fits in u32"))
+                .collect();
+            PrimitiveArray::new(Buffer::copy_from(ends_typed), Validity::NonNullable).into_array()
+        }
+        PType::U64 => {
+            PrimitiveArray::new(Buffer::copy_from(ends), Validity::NonNullable).into_array()
+        }
+        _ => unreachable!("Only unsigned integer types are valid for ends"),
+    };


The logic for creating primitive arrays for different unsigned integer types is a bit repetitive. You can make this more concise by using PrimitiveArray::from_iter and avoiding the intermediate Vec collection. This will improve readability and maintainability.

let ends_array = match ends_ptype { PType::U8 => PrimitiveArray::from_iter( ends.iter() .map(|&e| u8::try_from(e).vortex_expect("end value fits in u8")), ) .into_array(), PType::U16 => PrimitiveArray::from_iter( ends.iter() .map(|&e| u16::try_from(e).vortex_expect("end value fits in u16")), ) .into_array(), PType::U32 => PrimitiveArray::from_iter( ends.iter() .map(|&e| u32::try_from(e).vortex_expect("end value fits in u32")), ) .into_array(), PType::U64 => PrimitiveArray::from(ends).into_array(), _ => unreachable!("Only unsigned integer types are valid for ends"), };

gemini-code-assist · 2026-01-25T12:33:38Z

 }

 impl<T: DeviceRepr + Send + Sync + 'static> DeviceBuffer for CudaDeviceBuffer<T> {
    /// Returns the number of elements in the buffer of type T.


The documentation for len() is now incorrect after the implementation change. It states that it returns the number of elements, but the implementation now correctly returns the number of bytes to conform to the DeviceBuffer trait. The doc comment should be updated to reflect this.

Suggested change

/// Returns the number of elements in the buffer of type T.

/// Returns the number of bytes in the buffer.

godnight10061 · 2026-01-25T12:38:36Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant feature enhancement by enabling nested projection for list<struct> and fixed_size_list<struct> types. This is achieved through a new internal GetItemList expression and a simplification rule on GetItem. The changes are comprehensive, affecting expression analysis, execution, and the scan builder to accommodate the new simplification path. A new vortex.list layout with reader/writer implementations and improved split planning is also a key part of this update. The added regression tests are thorough and validate the new functionality well. The code is of high quality, and I have one suggestion for a minor refactoring to reduce code duplication in the GetItemList execution logic for better maintainability.

gemini-code-assist · 2026-01-25T12:40:25Z

+        match input.dtype() {
+            DType::List(..) => {
+                let list = input.execute::<ListViewArray>(args.ctx)?;
+                let struct_elems = list.elements().clone().execute::<StructArray>(args.ctx)?;
+
+                let field = struct_elems.unmasked_field_by_name(field_name)?.clone();
+                let field = match struct_elems.dtype().nullability() {
+                    Nullability::NonNullable => field,
+                    Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?,
+                };
+
+                ListViewArray::try_new(
+                    field,
+                    list.offsets().clone(),
+                    list.sizes().clone(),
+                    list.validity().clone(),
+                )?
+                .into_array()
+                .execute(args.ctx)
+            }
+            DType::FixedSizeList(..) => {
+                let list = input.execute::<FixedSizeListArray>(args.ctx)?;
+                let struct_elems = list.elements().clone().execute::<StructArray>(args.ctx)?;
+
+                let field = struct_elems.unmasked_field_by_name(field_name)?.clone();
+                let field = match struct_elems.dtype().nullability() {
+                    Nullability::NonNullable => field,
+                    Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?,
+                };
+
+                FixedSizeListArray::try_new(
+                    field,
+                    list.list_size(),
+                    list.validity().clone(),
+                    list.len(),
+                )?
+                .into_array()
+                .execute(args.ctx)
+            }
+            _ => Err(vortex_err!(
+                "Expected list scope for GetItemList execution, got {}",
+                input.dtype()
+            )),
+        }


The logic for handling DType::List and DType::FixedSizeList is very similar, particularly the part that applies the struct's validity mask to the projected field. To improve maintainability and reduce code duplication, consider refactoring the common logic.

For example, you could extract a helper function to apply the mask from the struct elements:

fn apply_struct_mask(field: crate::Array, struct_elems: &StructArray) -> VortexResult<crate::Array> { match struct_elems.dtype().nullability() { Nullability::NonNullable => Ok(field), Nullability::Nullable => crate::compute::mask(&field, &struct_elems.validity_mask()?.not()), } }

Using this helper in both match arms would make the execute method cleaner. You might find opportunities for further generalization to handle both list types in a more unified manner.

godnight10061 · 2026-01-25T15:48:47Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant feature: nested projection for list<struct> and fixed_size_list<struct> types. This is achieved by adding a new internal expression GetItemList and a corresponding ListLayout. The changes are well-implemented, with improved split planning and necessary adjustments to the scan builder logic. The addition of regression tests is also a great touch. My review focuses on a few opportunities to improve code clarity and enforce invariants.

gemini-code-assist · 2026-01-25T15:51:53Z

+        match &layout.dtype {
+            DType::List(..) => 2 + validity_children, // offsets + elements
+            DType::FixedSizeList(..) => 1 + validity_children, // elements
+            _ => 0,
+        }


The build function for ListLayout ensures that the dtype is always a List or FixedSizeList. Therefore, the _ arm in this match should be unreachable. Using unreachable! would make this invariant explicit and prevent silent failures if the layout is ever constructed incorrectly.

Suggested change

match &layout.dtype {

DType::List(..) => 2 + validity_children, // offsets + elements

DType::FixedSizeList(..) => 1 + validity_children, // elements

_ => 0,

}

match &layout.dtype {

DType::List(..) => 2 + validity_children, // offsets + elements

DType::FixedSizeList(..) => 1 + validity_children, // elements

_ => unreachable!("ListLayout only supports List and FixedSizeList dtypes, got {}", layout.dtype),

}

gemini-code-assist · 2026-01-25T15:51:53Z

+        let idx = match self.dtype() {
+            DType::List(..) => {
+                if self.dtype().is_nullable() {
+                    2
+                } else {
+                    1
+                }
+            }
+            DType::FixedSizeList(..) => {
+                if self.dtype().is_nullable() {
+                    1
+                } else {
+                    0
+                }
+            }
+            _ => return Err(vortex_err!("Expected list dtype, got {}", self.dtype())),
+        };


The logic to determine the index of the elements child can be simplified by using the boolean-to-integer cast of is_nullable().

let idx = match self.dtype() { DType::List(..) => 1 + self.dtype().is_nullable() as usize, DType::FixedSizeList(..) => self.dtype().is_nullable() as usize, _ => return Err(vortex_err!("Expected list dtype, got {}", self.dtype())), };

gemini-code-assist · 2026-01-25T15:51:54Z

+                                        let offsets_slice_u64: VortexResult<Vec<u64>> =
+                                            match offsets.ptype() {
+                                                ptype if ptype.is_unsigned_int() => vortex_dtype::match_each_unsigned_integer_ptype!(ptype, |T| {
+                                                    Ok(offsets
+                                                        .as_slice::<T>()
+                                                        .iter()
+                                                        .map(|&v| v.to_u64())
+                                                        .collect())
+                                                }),
+                                                ptype if ptype.is_signed_int() => {
+                                                    vortex_dtype::match_each_signed_integer_ptype!(
+                                                        ptype,
+                                                        |T| {
+                                                        offsets
+                                                            .as_slice::<T>()
+                                                            .iter()
+                                                            .map(|&v| {
+                                                                u64::try_from(v).map_err(|_| {
+                                                                    vortex_err!(
+                                                                        "List offsets must be convertible to u64"
+                                                                    )
+                                                                })
+                                                            })
+                                                            .collect()
+                                                        }
+                                                    )
+                                                }
+                                                other => Err(vortex_err!(
+                                                    "List offsets must be an integer type, got {other}"
+                                                )),
+                                            };


This block for converting list offsets to Vec<u64> is quite complex and deeply nested. To improve readability and maintainability, consider extracting this logic into a separate helper function, for example fn offsets_to_u64(offsets: &PrimitiveArray) -> VortexResult<Vec<u64>>.

godnight10061 · 2026-01-29T07:40:00Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant feature by enabling nested projections on lists of structs through a new GetItemList expression. The implementation is comprehensive, adding new layouts, readers, and writers for list types, and correctly updating the scan builders to handle expression simplification. The code is well-structured and includes good test coverage for the new functionality. I've identified a few opportunities for refactoring to improve code clarity and reduce duplication, which are detailed in the specific comments.

gemini-code-assist · 2026-01-29T07:42:54Z

+                let field = struct_elems.unmasked_field_by_name(field_name)?.clone();
+                let field = match struct_elems.dtype().nullability() {
+                    Nullability::NonNullable => field,
+                    Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?,
+                };


The logic to get the projected field and apply the struct's validity mask can be simplified by using the existing StructArray::field_by_name method. This helper function encapsulates this exact logic, making the code more concise and consistent with other parts of the codebase.

Suggested change

let field = struct_elems.unmasked_field_by_name(field_name)?.clone();

let field = match struct_elems.dtype().nullability() {

Nullability::NonNullable => field,

Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?,

};

let field = struct_elems.field_by_name(field_name)?;

gemini-code-assist · 2026-01-29T07:42:54Z

+                let field = struct_elems.unmasked_field_by_name(field_name)?.clone();
+                let field = match struct_elems.dtype().nullability() {
+                    Nullability::NonNullable => field,
+                    Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?,
+                };


Similar to the DType::List branch, this block can be simplified by using the StructArray::field_by_name helper method to get the projected field while correctly handling the struct's nullability.

Suggested change

let field = struct_elems.unmasked_field_by_name(field_name)?.clone();

let field = match struct_elems.dtype().nullability() {

Nullability::NonNullable => field,

Nullability::Nullable => mask(&field, &struct_elems.validity_mask()?.not())?,

};

let field = struct_elems.field_by_name(field_name)?;

gemini-code-assist · 2026-01-29T07:42:54Z

+        match dtype {
+            DType::List(_, list_nullability) => {
+                let offsets_reader = self.offsets()?.clone();
+                let elements_reader = self.elements()?.clone();
+                let row_range_clone = row_range.clone();
+                let element_expr = element_expr.clone();
+
+                Ok(Box::pin(async move {
+                    let row_len = usize::try_from(row_range_clone.end - row_range_clone.start)
+                        .vortex_expect("row range must fit in usize");
+
+                    let offsets_row_range = row_range_clone.start..row_range_clone.end + 1;
+                    let offsets_len = row_len + 1;
+                    let offsets_fut = offsets_reader.projection_evaluation(
+                        &offsets_row_range,
+                        &root(),
+                        MaskFuture::new_true(offsets_len),
+                    )?;
+
+                    let (offsets, validity) = try_join!(offsets_fut, async move {
+                        match validity_fut {
+                            Some(v) => v.await.map(Some),
+                            None => Ok(None),
+                        }
+                    })?;
+
+                    let offsets = offsets.to_primitive();
+                    let offsets_slice = offsets.as_slice::<u64>();
+                    let base = *offsets_slice.first().unwrap_or(&0u64);
+                    let end = *offsets_slice.last().unwrap_or(&base);
+
+                    let elements_row_range = base..end;
+                    let elements_len = usize::try_from(end - base)
+                        .vortex_expect("element range must fit in usize");
+                    let elements = elements_reader.projection_evaluation(
+                        &elements_row_range,
+                        &element_expr,
+                        MaskFuture::new_true(elements_len),
+                    )?;
+
+                    let elements = elements.await?;
+
+                    let normalized_offsets = vortex_array::arrays::PrimitiveArray::from_iter(
+                        offsets_slice.iter().map(|v| *v - base),
+                    )
+                    .into_array();
+
+                    let validity = match (list_nullability, validity) {
+                        (Nullability::NonNullable, _) => {
+                            vortex_array::validity::Validity::NonNullable
+                        }
+                        (Nullability::Nullable, Some(v)) => {
+                            vortex_array::validity::Validity::Array(v)
+                        }
+                        (Nullability::Nullable, None) => vortex_array::validity::Validity::AllValid,
+                    };
+
+                    Ok(ListArray::try_new(elements, normalized_offsets, validity)?.into_array())
+                }))
+            }
+            DType::FixedSizeList(_, list_size, list_nullability) => {
+                let elements_reader = self.elements()?.clone();
+                let row_range_clone = row_range.clone();
+                let element_expr = element_expr.clone();
+
+                Ok(Box::pin(async move {
+                    let row_len_u64 = row_range_clone.end - row_range_clone.start;
+                    let row_len =
+                        usize::try_from(row_len_u64).vortex_expect("row range must fit in usize");
+
+                    let list_size_u64 = u64::from(list_size);
+                    let element_start = row_range_clone
+                        .start
+                        .checked_mul(list_size_u64)
+                        .ok_or_else(|| vortex_err!("FixedSizeList element start overflow"))?;
+                    let element_end = row_range_clone
+                        .end
+                        .checked_mul(list_size_u64)
+                        .ok_or_else(|| vortex_err!("FixedSizeList element end overflow"))?;
+
+                    let elements_row_range = element_start..element_end;
+                    let elements_len = usize::try_from(element_end - element_start)
+                        .vortex_expect("element range must fit in usize");
+                    let elements_fut = elements_reader.projection_evaluation(
+                        &elements_row_range,
+                        &element_expr,
+                        MaskFuture::new_true(elements_len),
+                    )?;
+
+                    let (elements, validity) = try_join!(elements_fut, async move {
+                        match validity_fut {
+                            Some(v) => v.await.map(Some),
+                            None => Ok(None),
+                        }
+                    })?;
+
+                    let validity = match (list_nullability, validity) {
+                        (Nullability::NonNullable, _) => {
+                            vortex_array::validity::Validity::NonNullable
+                        }
+                        (Nullability::Nullable, Some(v)) => {
+                            vortex_array::validity::Validity::Array(v)
+                        }
+                        (Nullability::Nullable, None) => vortex_array::validity::Validity::AllValid,
+                    };
+
+                    Ok(
+                        FixedSizeListArray::try_new(elements, list_size, validity, row_len)?
+                            .into_array(),
+                    )
+                }))
+            }
+            _ => Err(vortex_err!("Expected list dtype, got {}", dtype)),
+        }


There's some repeated logic for handling validity_fut and constructing the final Validity in both branches of the match dtype. This could be refactored to reduce duplication.

You could move the async move block to wrap the whole function body and resolve validity_fut once at the beginning. Then, inside the match, you would only need to await the list-specific futures (offsets_fut or elements_fut). This would make the code cleaner and avoid repeating the validity handling logic.

gemini-code-assist · 2026-01-29T07:42:54Z

    pub fn dtype(&self) -> VortexResult<DType> {
-        self.projection.return_dtype(self.layout_reader.dtype())
+        // NOTE: `GetItem` may simplify into `GetItemList` for list-of-struct projections.
+        // To avoid rejecting valid nested projections (like `items.a`) we must simplify before
+        // validating the return dtype.
+        //
+        // Also, `row_idx` support is provided by `RowIdxLayoutReader`, so use the same reader
+        // enrichment as `prepare`.
+        let layout_reader = Arc::new(RowIdxLayoutReader::new(
+            self.row_offset,
+            self.layout_reader.clone(),
+            self.session.clone(),
+        ));
+        let projection = self.projection.optimize_recursive(layout_reader.dtype())?;
+        projection.return_dtype(layout_reader.dtype())
    }


The logic for creating the RowIdxLayoutReader and simplifying the projection is duplicated between this dtype method and the prepare method. This could be extracted into a private helper method to avoid code duplication and ensure consistency.

For example, you could have a prepare_projection_and_reader(&self) method that returns the simplified projection, its dtype, and the enriched layout reader. Both dtype() and prepare() could then call this helper.

godnight10061 · 2026-01-29T08:50:07Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant and valuable feature: enabling nested projection for list<struct> and fixed_size_list<struct>. The implementation is well-designed, introducing a new get_item_list expression and a simplify rule on get_item to handle these projections effectively. The addition of a new ListLayout and improvements to split planning for FixedSizeList are also great enhancements. The code is clean, well-structured, and includes thorough regression tests. I have one suggestion regarding a potential missed optimization for List split planning.

gemini-code-assist · 2026-01-29T08:53:07Z

+            // TODO(a10y): Variable-size lists can only be split "naturally" based on elements,
+            // but mapping element splits back to row ranges requires offsets (or precomputed
+            // row->element boundary metadata) that we don't have at split-planning time.
+            DType::List(..) => {}


The PR description mentions improving split planning for List by splitting via offsets. However, this branch of register_splits is currently empty.

While the TODO comment correctly points out the difficulty of mapping element splits back to rows, we could still implement splitting based on the chunk boundaries of the offsets array itself. A split in the offsets array at index i would correspond to a split in the list array at row i.

Was this intentionally omitted? It seems like a useful optimization that would align with the PR's goals.

godnight10061 · 2026-01-29T13:23:02Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for nested projections on list<struct> and fixed_size_list<struct> types, which is a great enhancement. This is primarily achieved by adding a new internal get_item_list expression and implementing a pushdown mechanism in the new ListLayout. The changes also include improvements to split planning for list types and add comprehensive regression tests. The implementation is solid and the changes are well-tested. I have one minor suggestion to simplify the code in the new ListReader.

gemini-code-assist · 2026-01-29T13:25:15Z

+                for offsets_split in offsets_splits {
+                    if offsets_split <= row_range.start {
+                        continue;
+                    }
+
+                    let row_split = offsets_split - 1;
+                    if row_split > row_range.start && row_split < row_range.end {
+                        splits.insert(row_split);
+                    }
+                }


The initial check if offsets_split <= row_range.start is redundant. The offsets_splits are generated for an offsets_range which starts at row_range.start, so offsets_split will always be >= row_range.start. The only case where the condition is true is offsets_split == row_range.start, which results in row_split = row_range.start - 1. This is already correctly handled by the subsequent if row_split > row_range.start check. Removing the redundant condition will simplify the loop.

for offsets_split in offsets_splits { let row_split = offsets_split - 1; if row_split > row_range.start && row_split < row_range.end { splits.insert(row_split); } }

godnight10061 · 2026-01-29T19:36:20Z

/gemini review

gemini-code-assist

Code Review

This pull request effectively enables nested projection for list<struct> and fixed_size_list<struct> types. The introduction of the internal get_item_list expression is a clever and pragmatic solution to handle this without a full map expression implementation. The changes are well-integrated across the expression system, layout management, and scanning logic. The new ListLayout with its reader and writer components is well-designed and correctly handles component shredding. The improvements to split planning for list types are also a valuable addition. The added unit tests are comprehensive and provide good coverage for the new functionality. Overall, this is a high-quality contribution that significantly enhances Vortex's capabilities for handling nested data.

godnight10061 · 2026-01-30T09:13:20Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces significant improvements by enabling nested projection for list<struct> and fixed_size_list<struct> types. This is achieved through the addition of an internal get_item_list expression and enhanced split planning for list layouts. The changes are well-implemented, including new layout components, updated expression simplification logic, and comprehensive unit tests to validate the new functionality. The refactoring in scan_builder.rs also improves code organization.

godnight10061 added the fix Bug fixes label Jan 20, 2026

gemini-code-assist Bot reviewed Jan 20, 2026

View reviewed changes

Comment thread vortex-array/src/expr/exprs/get_item.rs Outdated

Comment thread vortex-array/src/expr/exprs/get_item.rs Outdated

Comment thread vortex-layout/src/layouts/list/mod.rs Outdated

Comment thread vortex-layout/src/layouts/list/writer.rs

gemini-code-assist Bot reviewed Jan 20, 2026

View reviewed changes

Comment thread vortex-array/src/expr/exprs/get_item.rs Outdated

Comment thread vortex-array/src/expr/exprs/get_item.rs Outdated

godnight10061 force-pushed the fix-list-of-struct-projection branch from d5a8576 to 87d9553 Compare January 21, 2026 02:35

gemini-code-assist Bot reviewed Jan 21, 2026

View reviewed changes

godnight10061 force-pushed the develop branch from 6439697 to f63b38e Compare January 23, 2026 05:41

godnight10061 force-pushed the fix-list-of-struct-projection branch from 6a2a8f7 to 2973c55 Compare January 23, 2026 07:12

godnight10061 added the changelog/fix Changelog: bug fix label Jan 23, 2026

gemini-code-assist Bot reviewed Jan 23, 2026

View reviewed changes

Comment thread vortex-layout/src/layouts/list/reader.rs Outdated

gemini-code-assist Bot reviewed Jan 23, 2026

View reviewed changes

godnight10061 force-pushed the develop branch from c9770e8 to 0b4b579 Compare January 23, 2026 13:04

godnight10061 force-pushed the fix-list-of-struct-projection branch 2 times, most recently from a7a6b7d to da77c0c Compare January 24, 2026 03:12

gemini-code-assist Bot reviewed Jan 25, 2026

View reviewed changes

godnight10061 changed the base branch from develop to develop-upstream January 25, 2026 12:37

godnight10061 changed the base branch from develop-upstream to develop January 25, 2026 12:37

gemini-code-assist Bot reviewed Jan 25, 2026

View reviewed changes

godnight10061 force-pushed the fix-list-of-struct-projection branch 3 times, most recently from 830efb6 to cc87962 Compare January 29, 2026 03:33

gemini-code-assist Bot reviewed Jan 29, 2026

View reviewed changes

godnight10061 added 15 commits January 30, 2026 15:49

Fix nested projection for list-of-struct columns

bb8ae25

Add unit regression for list-of-struct get_item

76d6958

Fix lints in list layout and projections

54489f3

Refactor list projection internals

7b7a679

Improve ListReader split planning for FixedSizeList

52d850e

Use get_item_list for list-of-struct projection

083f674

Address list-of-struct projection review

f83bc28

Fix scan dtype for list-of-struct projection

db2d278

Add unit regression for list-of-struct scan dtype

e58cf9a

Remove redundant GetItemList scope access special-case

875645c

Use unmasked_field_by_name in GetItemList

28db7d4

Refactor ScanBuilder projection prep

7bf74f8

Refactor GetItemList field projection

fdd2b4b

Improve list split planning via offsets

0d79be1

Simplify list offsets split mapping

8222856

godnight10061 force-pushed the fix-list-of-struct-projection branch from 6c6326a to 3690d75 Compare January 30, 2026 09:12

Fix list projection after VTable change

3690d75

gemini-code-assist Bot reviewed Jan 30, 2026

View reviewed changes

	/// Returns the number of elements in the buffer of type T.
	/// Returns the number of bytes in the buffer.

Conversation

godnight10061 commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tests

References

Uh oh!

godnight10061 commented Jan 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

godnight10061 commented Jan 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

godnight10061 commented Jan 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

godnight10061 commented Jan 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

godnight10061 commented Jan 23, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

godnight10061 commented Jan 23, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

godnight10061 commented Jan 23, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

godnight10061 commented Jan 25, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

godnight10061 commented Jan 20, 2026 •

edited

Loading