feat(rust/sedona-raster-gdal): add RS_FromPath#831
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new GDAL-backed raster UDF that constructs out-of-db rasters from GDAL-openable paths, and wires it into SedonaContext for availability.
Changes:
- Introduced
RS_FromPath-style UDF (rs_frompath) backed by GDAL, plus a Criterion benchmark. - Added out-db raster construction helper (
append_as_outdb_raster) and centralized nodata (de)serialization helpers. - Registered GDAL UDFs during
SedonaContextinitialization and added SQL reference docs.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| rust/sedona/src/context.rs | Registers all GDAL UDFs into SedonaContext during initialization. |
| rust/sedona-raster-gdal/src/utils.rs | Adds out-db raster builder path, refactors nodata conversion usage, expands tests. |
| rust/sedona-raster-gdal/src/rs_frompath.rs | Implements the path-based raster UDF and its unit tests. |
| rust/sedona-raster-gdal/src/lib.rs | Exposes the new UDF and provides all_gdal_udfs(). |
| rust/sedona-raster-gdal/src/gdal_dataset_provider.rs | Refactors nodata setting to shared helper. |
| rust/sedona-raster-gdal/src/gdal_common.rs | Adds shared helpers for reading/setting band nodata bytes. |
| rust/sedona-raster-gdal/benches/rs_frompath.rs | Adds benchmarks for the new UDF using test fixtures. |
| rust/sedona-raster-gdal/Cargo.toml | Adds deps needed for the UDF + registers the benchmark target. |
| docs/reference/sql/rs_frompath.qmd | Adds SQL reference documentation for RS_FromPath. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@james-willis please review too |
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you for working on this!
| let len = path_array.len(); | ||
| let mut builder = RasterBuilder::new(len); | ||
| for i in 0..len { | ||
| if path_array.is_null(i) { | ||
| builder.append_null()?; | ||
| } else { | ||
| let path = path_array.value(i); | ||
| append_as_outdb_raster(gdal, path, &mut builder)?; | ||
| } | ||
| } |
There was a problem hiding this comment.
This works, although our iteration elsewhere is generally for item_opt in path_array { ... }
|
|
||
| match &args[0] { | ||
| ColumnarValue::Scalar(_) => { | ||
| let scalar = datafusion_common::ScalarValue::try_from_array(&result, 0)?; | ||
| Ok(ColumnarValue::Scalar(scalar)) | ||
| } | ||
| ColumnarValue::Array(_) => Ok(ColumnarValue::Array(result)), | ||
| } |
There was a problem hiding this comment.
| match &args[0] { | |
| ColumnarValue::Scalar(_) => { | |
| let scalar = datafusion_common::ScalarValue::try_from_array(&result, 0)?; | |
| Ok(ColumnarValue::Scalar(scalar)) | |
| } | |
| ColumnarValue::Array(_) => Ok(ColumnarValue::Array(result)), | |
| } | |
| executor.finish(result) |
|
|
||
| let paths = args[0].cast_to(&DataType::Utf8, None)?.into_array(1)?; |
There was a problem hiding this comment.
| let paths = args[0].cast_to(&DataType::Utf8, None)?.into_array(1)?; | |
| let executor = WkbBytesExecutor::new(arg_types, args); | |
| let paths = args[0].cast_to(&DataType::Utf8, None)?.into_array(executor.num_iterations())?; |
The rest of this loop suggests this can handle both scalars and columns but I think as written this will fail for anything except a scalar or one row table
| fn assert_raster_dimensions( | ||
| result: &ColumnarValue, | ||
| expected_len: usize, | ||
| width: u64, | ||
| height: u64, | ||
| ) { | ||
| match result { |
There was a problem hiding this comment.
Not your problem for this PR, but we really need support for rasters in the ScalarUdfTester so that these tests can be more compact.
In the meantime this could be moved to sedona_testing::rasters which has a few asserters along these lines.
| /// Append a raster source path as a single out-db raster to the provided [`RasterBuilder`]. | ||
| pub fn append_as_outdb_raster(gdal: &Gdal, path: &str, builder: &mut RasterBuilder) -> Result<()> { | ||
| let gdal_path = normalize_outdb_source_path(path); | ||
| let dataset = gdal | ||
| .open_ex_with_options( | ||
| &gdal_path, | ||
| DatasetOptions { | ||
| open_flags: GDAL_OF_RASTER | GDAL_OF_READONLY, | ||
| ..Default::default() | ||
| }, | ||
| ) |
There was a problem hiding this comment.
I am guessing this does blocking IO. Should we make any attempt to mitigate this or implement RS_FromPath as an async UDF to get the auto batch size adjustment from DataFusion for these types of functions?
| result_large_err = "allow" | ||
|
|
||
| [dependencies] | ||
| arrow = { workspace = true } |
There was a problem hiding this comment.
Can we use a targeted subcrate for whatever uses this import?
| ); | ||
|
|
||
| for udf in sedona_raster_gdal::all_gdal_udfs() { | ||
| out.ctx.register_udf(udf.into()); |
There was a problem hiding this comment.
Can this use the same pattern as the other crates that provide scalar functions? (e.g., out.register_scalar_kernels(sedona_raster_gdal::register::scalar_kernels().into_iter())?;). I suppose if we make it an async UDF we'll need the pattern you have here.
zhangfengcdt
left a comment
There was a problem hiding this comment.
We may want to align the registration the same way as other UDFs via scalar_kernels() from a register module inside sedona-raster-gdal as @paleolimbot already points out, otherwise, LTGM!
Summary
RS_FromPathfor constructing out-db rasters from GDAL-openable pathssedona-raster-gdalandSedonaContextminimally for this standalone child PRTesting
cargo test -p sedona-raster-gdal rs_from_path --no-default-featurescargo clippy -p sedona-raster-gdal --no-default-features -- -D warningscargo test -p sedona-raster-gdal -p sedona --libcargo clippy -p sedona-raster-gdal -p sedona --lib -- -D warnings