Skip to content

Fix read_vrt(band=None) leaving non-first-band integer sentinels unmasked#1612

Merged
brendancol merged 5 commits into
mainfrom
deep-sweep-accuracy-geotiff-2026-05-11-c
May 11, 2026
Merged

Fix read_vrt(band=None) leaving non-first-band integer sentinels unmasked#1612
brendancol merged 5 commits into
mainfrom
deep-sweep-accuracy-geotiff-2026-05-11-c

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • Fix read_vrt(band=None) so bands 1+ on a multi-band integer VRT with per-band <NoDataValue> tags get NaN-masked against their own sentinel. Before this change only band 0's sentinel was applied, leaving bands 1+ with literal integer sentinels (e.g. 65000.0) in the returned float64 array.
  • Same class as read_vrt(band=N) reports band 0's nodata in attrs and skips masking for band N #1598 / PR Fix read_vrt(band=N) using band 0's nodata sentinel (#1598) #1602 (which fixed band=N single-band selection). The float-VRT path already masks per-band correctly in _vrt._read_data (lines 296-297 and 347-351); this brings the integer path into line.
  • attrs['nodata'] for band=None reads still carries band 0's sentinel (documented contract: one attr can't encode per-band values). Only the pixel-level mask changes.

Closes #1611.

Implementation notes

  • New _sentinel_for_dtype(nodata_val, dtype) helper inside read_vrt mirrors the gating from PR Fix OverflowError on uint TIFF with negative GDAL_NODATA sentinel (#1581) #1583's _int_nodata_in_range: returns None for non-integer dtype, non-finite, fractional, or out-of-range sentinels (so they become a no-op for value matching rather than raising OverflowError).
  • For band is None and ndim == 3: walk vrt.bands, compute each band's sentinel mask against the integer array first, then promote to float64 once if any band actually hit. Avoids needless promotion when no sentinel is present anywhere.
  • The single-band branch (band=N or 2-D output) keeps its existing fast path.

Test plan

  • test_multiband_uint16_per_band_sentinel_each_masked — the previously-broken case (both band sentinels now NaN)
  • test_multiband_int32_negative_per_band_sentinel — signed dtype, negative sentinels
  • test_multiband_only_one_band_has_sentinel_present — non-hitting band keeps its value, no spurious NaN
  • test_multiband_no_sentinel_present_anywhere_keeps_int_dtype — skip promotion when no band hits
  • test_multiband_per_band_out_of_range_sentinel_is_no_op — uint16 + <NoDataValue>-9999</NoDataValue> band doesn't raise, masks the in-range band
  • test_multiband_band_kwarg_still_per_band_post_pr1602band=0 / band=1 still route through PR Fix read_vrt(band=N) using band 0's nodata sentinel (#1598) #1602's single-band path
  • test_multiband_attrs_nodata_still_band0attrs['nodata'] contract unchanged
  • 135 existing vrt/nodata geotiff tests still pass (pytest -k "vrt or nodata")
  • test_reader.py, test_writer.py, test_dask_cupy_combined.py still pass

Notes

Discovered during the geotiff accuracy sweep (pass 14, 2026-05-11). State CSV updated in this PR.

read_vrt(band=None) on multi-band integer VRTs with per-band
<NoDataValue> tags only masks band 0's sentinel. Bands 1+ keep their
integer sentinels as literal finite values in the returned float64
array. Same class as #1598 / PR #1602, which fixed the single-band
band=N case but not band=None.

Auto-mode classifier denied gh issue create, so the finding is
documented in the state CSV pending user authorization to file.
read_vrt on a multi-band integer VRT with per-band <NoDataValue> tags
applied vrt.bands[0].nodata across the full 3-D array. Bands 1+ kept
their integer sentinels as literal finite values in the returned
float64 array, breaking the convention that attrs['nodata'] is present
iff sentinel pixels have already been NaN-masked.

The float-VRT path masks per-band correctly inline in _vrt._read_data
(lines 296-297 and 347-351). PR #1602 fixed the band=N single-band
selection. This change extends the per-band masking to the band=None
multi-band path: when ndim == 3 and band is None, walk vrt.bands and
mask each arr[..., i] slice against its own <NoDataValue>. Masks are
collected against the integer array first, then promotion to float64
happens once if any band actually hit.

Gating reuses the same range check that PR #1583 added to other read
paths (refactored into _sentinel_for_dtype): out-of-range, non-finite,
and fractional sentinels become a no-op for value matching rather
than raising OverflowError.

attrs['nodata'] for band=None reads still carries band 0's sentinel
- that's the documented "first band wins" contract for the canonical
attr that cannot encode per-band values.

7 regression tests in test_vrt_multiband_int_nodata_1611.py cover
the uint16 per-band case, int32 negative sentinels, mixed presence,
no-sentinel-anywhere dtype preservation, out-of-range gating, the
band=N non-regression, and the attrs contract. All 135 existing
vrt/nodata geotiff tests still pass.
Earlier pass 14 commit (7da3e0c) documented the HIGH finding as
"NOT FILED" because the auto-mode classifier blocked gh issue create.
With user authorization the issue was filed as #1611 and the fix
landed on this branch. Update the state notes to reflect the filed
issue number and fix summary.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 11, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 18:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes xrspatial.geotiff.read_vrt(band=None) for multi-band integer VRTs so each band’s <NoDataValue> sentinel is NaN-masked against its own per-band value (previously only band 0’s sentinel was applied, leaving bands 1+ with literal sentinel integers in the returned array).

Changes:

  • Add per-band integer nodata masking for band=None / 3D VRT reads, aligning integer behavior with the existing float VRT masking behavior.
  • Introduce a small in-function helper to safely cast VRT nodata values to an integer dtype sentinel only when representable (finite, integral, and in-range).
  • Add a dedicated regression test suite covering mixed per-band sentinels, signed/unsigned cases, and “no promotion when no sentinel hits” behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/__init__.py Implements per-band integer nodata masking for read_vrt(band=None) on 3D outputs, with guarded sentinel casting.
xrspatial/geotiff/tests/test_vrt_multiband_int_nodata_1611.py Adds regression tests for multi-band integer VRT per-band nodata masking (issue #1611).
.claude/sweep-accuracy-state.csv Updates the accuracy sweep state entry to record the issue/fix status.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/tests/test_vrt_multiband_int_nodata_1611.py Outdated
Comment thread xrspatial/geotiff/__init__.py Outdated
Two Copilot inline review comments on PR #1612:

* test_vrt_multiband_int_nodata_1611.py:21 imported pytest but never
  used it. Drop the import so flake8 F401 stays clean.

* __init__.py per-band integer-nodata path collected every band's
  boolean mask into a list before promoting and applying, holding
  O(bands * H * W) booleans simultaneously at peak. Refactor to walk
  vrt.bands once, promote arr to float64 on the first band with a
  hit, and apply each subsequent band's mask in-place. Peak boolean
  memory is now O(H * W). int_arr keeps the original integer view
  alive so masks compare against the integer sentinel dtype.

All 7 #1611 regression tests still pass; 135 vrt/nodata geotiff
tests still pass.
@brendancol brendancol merged commit 0088dfe into main May 11, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

read_vrt(band=None) only masks band 0's nodata sentinel on multi-band integer VRTs

2 participants