feat(raster-gdal): port indb-raster loader utilities to canonical N-D schema#818
Closed
james-willis wants to merge 6 commits into
Replaces apache#787's 2D-only band schema with the canonical N-D schema: spatial_dims/spatial_shape at the raster level; bands carry dim_names, source_shape, nullable view, outdb_uri, outdb_format, plus the non-nullable data buffer.

Removes nodata_value, storage_type, outdb_url, and outdb_band_id. Each one is encodable in the new schema:
- storage_type ↔ outdb_uri.is_null() (null = InDb, set = OutDbRef).
- outdb_url ↔ outdb_uri (renamed, same string).
- outdb_band_id ↔ encoded inside outdb_uri (#band=N or GDAL native subdataset URI), parsed only inside the GDAL format driver.
- nodata_value ↔ typed nodata: Binary (a null row means "no nodata").

Top-level adds spatial_dims: List<Utf8View> and spatial_shape: List<Int64>; the nullable view is List<Struct<source_axis, start, step, steps: Int64>>, where a null row encodes the canonical identity view.

Note: intermediate commits in this PR are not expected to build; only the PR tip is CI-green. The trait, reader/builder, RS_* migration, and GDAL loader port land in subsequent commits.
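A rough sketch of how the removed fields could fold into `outdb_uri`, for readers who want to see the mapping concretely. `LegacyBand`, `to_outdb_uri`, `split_band_fragment`, and `is_indb` here are hypothetical illustration helpers, not the PR's `parse_outdb_source`; only the `#band=N` form is handled and GDAL native subdataset URIs are ignored.

```rust
/// Legacy per-band fields from the 2D-only schema (field names follow the
/// PR text; the struct itself is a stand-in for illustration).
pub struct LegacyBand {
    pub outdb_url: Option<String>,
    pub outdb_band_id: Option<u32>,
}

/// Hypothetical helper: fold the legacy pair into a single `outdb_uri`.
/// `None` stands for a null row, which the new schema reads as an in-db band.
pub fn to_outdb_uri(legacy: &LegacyBand) -> Option<String> {
    let url = legacy.outdb_url.as_ref()?;
    Some(match legacy.outdb_band_id {
        Some(band) => format!("{}#band={}", url, band),
        None => url.clone(),
    })
}

/// Hypothetical inverse: split a trailing `#band=N` fragment off the URI.
/// A URI without a parseable fragment is returned unchanged with `None`.
pub fn split_band_fragment(uri: &str) -> (&str, Option<u32>) {
    if let Some((base, frag)) = uri.rsplit_once("#band=") {
        if let Ok(band) = frag.parse::<u32>() {
            return (base, Some(band));
        }
    }
    (uri, None)
}

/// storage_type is no longer stored: a null `outdb_uri` means InDb.
pub fn is_indb(outdb_uri: Option<&str>) -> bool {
    outdb_uri.is_none()
}
```

Keeping the band selector inside the URI string means the schema itself stays format-agnostic, which matches the PR's statement that the fragment is parsed only inside the GDAL format driver.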
RasterRef and BandRef accessors over the canonical N-D schema: spatial_dims/spatial_shape, transform, crs, num_bands, band(i), and band-level dim_names, source_shape, shape (visible, derived from view), view, data_type, nodata, outdb_uri, outdb_format, nd_buffer, contiguous_data returning Cow<[u8]>. validate_view enforces all view rules including i64-overflow on start + (steps-1)*step. NdBuffer exposes raw buffer + shape + byte strides + offset for zero-copy access (numpy / Arrow C Data Interface boundary); VIEW → byte strides happens inside nd_buffer(). Adds BandRef::is_2d() default method as the gate GDAL-backed paths use to refuse N-D input cleanly: true iff dim_names == ["y","x"] over the identity view.
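To make the overflow rule concrete, here is a minimal sketch of checking `start + (steps-1)*step` with checked arithmetic. `ViewEntry`, `last_index`, and `validate_entry` are illustrative names, and the bounds rules (non-negative in-range indices, `steps >= 1`) are assumptions; the PR's `validate_view` may enforce a different or larger rule set.

```rust
/// One view entry, mirroring the PR's
/// `Struct<source_axis, start, step, steps: Int64>` (illustrative type).
#[derive(Debug, Clone, Copy)]
pub struct ViewEntry {
    pub source_axis: i64,
    pub start: i64,
    pub step: i64,
    pub steps: i64,
}

/// Last source index the entry touches: `start + (steps - 1) * step`.
/// Returns `None` if any intermediate result overflows i64.
pub fn last_index(e: &ViewEntry) -> Option<i64> {
    let span = e.steps.checked_sub(1)?.checked_mul(e.step)?;
    e.start.checked_add(span)
}

/// Assumed rule set, for illustration only: the axis must exist, `steps`
/// must be at least 1, and both the first and last index must be in bounds.
pub fn validate_entry(e: &ViewEntry, source_shape: &[i64]) -> Result<(), String> {
    if e.source_axis < 0 || e.source_axis as usize >= source_shape.len() {
        return Err(format!("bad source_axis {}", e.source_axis));
    }
    if e.steps < 1 {
        return Err(format!("steps must be >= 1, got {}", e.steps));
    }
    let len = source_shape[e.source_axis as usize];
    let last = last_index(e)
        .ok_or_else(|| "start + (steps-1)*step overflows i64".to_string())?;
    let in_bounds = |i: i64| i >= 0 && i < len;
    if !in_bounds(e.start) || !in_bounds(last) {
        return Err(format!(
            "view [{}..={}] leaves axis of length {}",
            e.start, last, len
        ));
    }
    Ok(())
}
```

Checking the last index with `checked_mul` / `checked_add` before any bounds comparison keeps a hostile or corrupted view from wrapping around into an apparently valid range.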
… reader/builder + RS_* migration

View-aware Arrow reader (RasterStructArray, BandRefImpl) with corruption-surgery (negative steps, bad source_axis, length mismatch) that round-trips an ArrowError. Builder exposes start_raster / start_band for full N-D plus start_raster_2d / start_band_2d for legacy 2D, with identity-view default written as a null view row. finish_raster validates each band's visible shape against the raster's spatial_shape along the spatial dims.

All 33 RS_* functions migrated mechanically; outputs on 2D inputs are byte-identical to apache#787. RS_BandPath keeps its existing inline fragment-stripping (format-agnostic display, untouched by the GDAL parser). Test helpers in sedona-testing rewritten on the N-D builder API.
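The shape check that `finish_raster` is described as performing could look roughly like the sketch below. `check_band_against_raster` is a hypothetical free function, not the builder's actual method, and it assumes that bands may carry extra non-spatial dims (for example `time`) as long as every spatial dim is present with the matching extent.

```rust
/// Illustrative check: every raster-level spatial dim must appear in the
/// band's dim names, and the band's visible extent along it must match.
pub fn check_band_against_raster(
    spatial_dims: &[&str],
    spatial_shape: &[i64],
    band_dim_names: &[&str],
    band_shape: &[i64],
) -> Result<(), String> {
    if spatial_dims.len() != spatial_shape.len() {
        return Err("spatial_dims and spatial_shape differ in length".to_string());
    }
    if band_dim_names.len() != band_shape.len() {
        return Err("band dim_names and shape differ in length".to_string());
    }
    for (dim, expected) in spatial_dims.iter().zip(spatial_shape) {
        // Locate the spatial dim among the band's dims.
        let pos = band_dim_names
            .iter()
            .position(|name| name == dim)
            .ok_or_else(|| format!("band is missing spatial dim {:?}", dim))?;
        if band_shape[pos] != *expected {
            return Err(format!(
                "dim {:?}: band has {}, raster expects {}",
                dim, band_shape[pos], expected
            ));
        }
    }
    Ok(())
}
```

Under these assumptions a `["time", "y", "x"]` band of shape `[3, 4, 5]` passes against a raster with `spatial_dims = ["y", "x"]` and `spatial_shape = [4, 5]`, while a `["y", "x"]` band of shape `[4, 6]` is rejected.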
Reads outdb_uri + parse_outdb_source instead of apache#787's storage_type / outdb_url / outdb_band_id triplet. Each GDAL-backed SQL function gates on BandRef::is_2d() at entry and returns an Execution error on N-D input. VSI normalization, the dataset cache, and RasterIO bodies are byte-for-byte unchanged from apache#787 - only the schema-read sites move. In-db reads use BandRef::contiguous_data() and require Cow::Borrowed so MEM datasets can point at the StructArray's backing buffer without copying; for is_2d identity views this always holds. Tests rebuilt to use RasterBuilder directly. Adds an N-D rejection test for raster_ref_to_gdal_mem and the VRT path, plus an end-to-end
`raster_ref_to_gdal_mem` previously returned a `Result<Dataset>` and
guarded against `BandRef::contiguous_data()` returning `Cow::Owned`
with a runtime tripwire ("Internal: contiguous_data must be borrowed
for is_2d bands; got owned"). The check was correct — handing GDAL a
pointer into a `Vec<u8>` that drops at the end of the iteration would
dangle — but it ties an internal invariant ("`is_2d` ⇒ Borrowed") to
incidental properties of today's reader. Any future copy path in the
reader (compression, BinaryView block-boundary stitching, alignment
fix-up, sliced/broadcast/transposed views from apache#813 / apache#750) would
detonate the tripwire on perfectly valid 2-D rasters.
Change: return `Result<(Dataset, Vec<Vec<u8>>)>`. On `Cow::Borrowed`
the GDAL band still points directly at the StructArray buffer
(zero-copy). On `Cow::Owned` we move the `Vec<u8>` out of the Cow
without copying — the reader's existing materialization is the only
allocation — and stash it in the returned vector. The caller (the
provider in `gdal_dataset_provider.rs`) parks it in a new
`RasterDataset::_owned_band_bytes` field that lives as long as the
MEM dataset that holds the pointers.
`raster_ref_to_gdal_empty` discards the always-empty vector.
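The ownership hand-off described above can be sketched without GDAL. `band_pointers` below is a hypothetical stand-in for the part of `raster_ref_to_gdal_mem` that decides where each band's bytes live; it returns raw pointers in place of a `Dataset`, and the caller is assumed to keep the returned `Vec<Vec<u8>>` alive for as long as those pointers are used.

```rust
use std::borrow::Cow;

/// For each band, record a pointer to its bytes. Borrowed bands point into
/// the caller's backing buffer; owned bands are moved into `owned` so their
/// heap allocation outlives this function.
pub fn band_pointers<'a>(bands: Vec<Cow<'a, [u8]>>) -> (Vec<*const u8>, Vec<Vec<u8>>) {
    let mut ptrs = Vec::with_capacity(bands.len());
    let mut owned = Vec::new();
    for band in bands {
        match band {
            // Zero-copy: the pointer targets the original buffer.
            Cow::Borrowed(bytes) => ptrs.push(bytes.as_ptr()),
            Cow::Owned(bytes) => {
                // Moving a Vec moves only its (ptr, len, cap) header; the
                // heap allocation stays put, so the pointer remains valid
                // while the Vec is held in `owned`.
                ptrs.push(bytes.as_ptr());
                owned.push(bytes);
            }
        }
    }
    (ptrs, owned)
}
```

Returning the owned buffers alongside the pointers pushes the lifetime decision to the caller, which is the role the PR assigns to the `_owned_band_bytes` field.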
… schema

Reintroduces `append_as_indb_raster` and `dataset_to_indb_raster` (deleted on the N-D-schema branch because they used the legacy `BandMetadata`/`StorageType`/`band_data_writer` API). Bands are written against the canonical `["y", "x"]` 2-D schema using `start_raster_2d` / `start_band_2d` and `band_data_writer().append_value`. The full test suite (single-band GeoTIFF, uint64 / int64 / uint16 nodata round-trips, multi-band GeoTIFF, per-band MEM nodata, multi-raster append) is ported with assertions rewritten against the new `RasterRef` / `BandRef` accessors and `is_indb()`.
Force-pushed from 21235c3 to fed97ea.
Folded into #749. The loader () has been restored verbatim and the GDAL backend now uses the compatibility shim to stay close to main's API surface, so a separate port is no longer needed.
Summary
Reintroduces `append_as_indb_raster` and `dataset_to_indb_raster` on top of the canonical N-D raster schema. These helpers were deleted on #749 (`jw/nd-raster-type`) because the version on `main` (added in #811) targets the legacy `BandMetadata` / `StorageType` / `band_data_writer` API that the N-D port removes. This PR ports the helpers and their full test suite to the new builder + reader API so the deletion in #749 isn't permanent.
- Writes go through `start_raster_2d` / `start_band_2d` and `band_data_writer().append_value`; bands are emitted as canonical `["y", "x"]` 2-D bands with the identity view.
- Full test suite (single-band GeoTIFF, `UInt64` / `Int64` / `UInt16` nodata round-trips, multi-band GeoTIFF, per-band MEM nodata, multi-raster append) ported to the new `RasterRef` / `BandRef` accessors and `is_indb()`.
Stacking
Test plan
- `cargo build -p sedona-raster-gdal`
- `cargo test -p sedona-raster-gdal` (51 passed, including 7 new `utils::tests::*`)
- `cargo clippy -p sedona-raster-gdal --all-targets -- -D warnings`
- `cargo fmt --all --check`