Skip to content

feat(rust/sedona-raster-gdal): add RS_FromPath#831

Open
Kontinuation wants to merge 6 commits into
apache:mainfrom
Kontinuation:pr-c-rs-frompath
Open

feat(rust/sedona-raster-gdal): add RS_FromPath#831
Kontinuation wants to merge 6 commits into
apache:mainfrom
Kontinuation:pr-c-rs-frompath

Conversation

@Kontinuation
Copy link
Copy Markdown
Member

@Kontinuation Kontinuation commented May 11, 2026

Summary

  • add RS_FromPath for constructing out-db rasters from GDAL-openable paths
  • wire sedona-raster-gdal and SedonaContext minimally for this standalone child PR
  • keep other GDAL-backed raster functions out of scope for follow-up PRs

Testing

  • cargo test -p sedona-raster-gdal rs_from_path --no-default-features
  • cargo clippy -p sedona-raster-gdal --no-default-features -- -D warnings
  • cargo test -p sedona-raster-gdal -p sedona --lib
  • cargo clippy -p sedona-raster-gdal -p sedona --lib -- -D warnings

@Kontinuation Kontinuation requested a review from Copilot May 12, 2026 16:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new GDAL-backed raster UDF that constructs out-of-db rasters from GDAL-openable paths, and wires it into SedonaContext for availability.

Changes:

  • Introduced RS_FromPath-style UDF (rs_frompath) backed by GDAL, plus a Criterion benchmark.
  • Added out-db raster construction helper (append_as_outdb_raster) and centralized nodata (de)serialization helpers.
  • Registered GDAL UDFs during SedonaContext initialization and added SQL reference docs.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
rust/sedona/src/context.rs Registers all GDAL UDFs into SedonaContext during initialization.
rust/sedona-raster-gdal/src/utils.rs Adds out-db raster builder path, refactors nodata conversion usage, expands tests.
rust/sedona-raster-gdal/src/rs_frompath.rs Implements the path-based raster UDF and its unit tests.
rust/sedona-raster-gdal/src/lib.rs Exposes the new UDF and provides all_gdal_udfs().
rust/sedona-raster-gdal/src/gdal_dataset_provider.rs Refactors nodata setting to shared helper.
rust/sedona-raster-gdal/src/gdal_common.rs Adds shared helpers for reading/setting band nodata bytes.
rust/sedona-raster-gdal/benches/rs_frompath.rs Adds benchmarks for the new UDF using test fixtures.
rust/sedona-raster-gdal/Cargo.toml Adds deps needed for the UDF + registers the benchmark target.
docs/reference/sql/rs_frompath.qmd Adds SQL reference documentation for RS_FromPath.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rust/sedona-raster-gdal/src/rs_frompath.rs Outdated
Comment thread rust/sedona-raster-gdal/src/utils.rs
Comment thread rust/sedona/src/context.rs Outdated
Comment thread rust/sedona-raster-gdal/src/rs_frompath.rs Outdated
Comment thread rust/sedona-raster-gdal/src/gdal_common.rs
@Kontinuation Kontinuation marked this pull request as ready for review May 13, 2026 02:53
@jiayuasu
Copy link
Copy Markdown
Member

@james-willis please review too

Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

Comment on lines +68 to +77
let len = path_array.len();
let mut builder = RasterBuilder::new(len);
for i in 0..len {
if path_array.is_null(i) {
builder.append_null()?;
} else {
let path = path_array.value(i);
append_as_outdb_raster(gdal, path, &mut builder)?;
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, although our iteration elsewhere is generally for item_opt in path_array { ... }

Comment on lines +80 to +87

match &args[0] {
ColumnarValue::Scalar(_) => {
let scalar = datafusion_common::ScalarValue::try_from_array(&result, 0)?;
Ok(ColumnarValue::Scalar(scalar))
}
ColumnarValue::Array(_) => Ok(ColumnarValue::Array(result)),
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match &args[0] {
ColumnarValue::Scalar(_) => {
let scalar = datafusion_common::ScalarValue::try_from_array(&result, 0)?;
Ok(ColumnarValue::Scalar(scalar))
}
ColumnarValue::Array(_) => Ok(ColumnarValue::Array(result)),
}
executor.finish(result)

Comment on lines +64 to +65

let paths = args[0].cast_to(&DataType::Utf8, None)?.into_array(1)?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let paths = args[0].cast_to(&DataType::Utf8, None)?.into_array(1)?;
let executor = WkbBytesExecutor::new(arg_types, args);
let paths = args[0].cast_to(&DataType::Utf8, None)?.into_array(executor.num_iterations())?;

The rest of this loop suggests this can handle both scalars and columns but I think as written this will fail for anything except a scalar or one row table

Comment on lines +116 to +122
fn assert_raster_dimensions(
result: &ColumnarValue,
expected_len: usize,
width: u64,
height: u64,
) {
match result {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your problem for this PR, but we really need support for rasters in the ScalarUdfTester so that these tests can be more compact.

In the meantime this could be moved to sedona_testing::rasters which has a few asserters along these lines.

Comment on lines +107 to +117
/// Append a raster source path as a single out-db raster to the provided [`RasterBuilder`].
pub fn append_as_outdb_raster(gdal: &Gdal, path: &str, builder: &mut RasterBuilder) -> Result<()> {
let gdal_path = normalize_outdb_source_path(path);
let dataset = gdal
.open_ex_with_options(
&gdal_path,
DatasetOptions {
open_flags: GDAL_OF_RASTER | GDAL_OF_READONLY,
..Default::default()
},
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing this does blocking IO. Should we make any attempt to mitigate this or implement RS_FromPath as an async UDF to get the auto batch size adjustment from DataFusion for these types of functions?

Comment thread rust/sedona-raster-gdal/Cargo.toml Outdated
result_large_err = "allow"

[dependencies]
arrow = { workspace = true }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a targeted subcrate for whatever uses this import?

Comment thread rust/sedona/src/context.rs Outdated
);

for udf in sedona_raster_gdal::all_gdal_udfs() {
out.ctx.register_udf(udf.into());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use the same pattern as the other crates that provide scalar functions? (e.g., out.register_scalar_kernels(sedona_raster_gdal::register::scalar_kernels().into_iter())?;). I suppose if we make it an async UDF we'll need the pattern you have here.

Copy link
Copy Markdown
Member

@zhangfengcdt zhangfengcdt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to align the registration the same way as other UDFs via scalar_kernels() from a register module inside sedona-raster-gdal as @paleolimbot already points out, otherwise, LTGM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants