Vectorize mode-resampling for COG overview generation by brendancol · Pull Request #1526 · xarray-contrib/xarray-spatial

brendancol · 2026-05-08T20:40:36Z

Summary

Replaces the per-pixel np.unique double loop in _block_reduce_2d(method='mode') with a vectorized sort-and-count over the (oh, ow, 4) block tensor.
Measured on a 1024x1024 uint8 input: 1008 ms (prior) -> 20 ms (new), about 50x faster (locally reproduced as 1037 ms -> 27 ms, ~39x).
Output is bit-exact identical to the prior implementation, verified by a reference copy of the old loop in the new test file.
Tie-break semantics preserved: when two values appear equally often the smaller value wins, because sorting groups equal values and np.argmax returns the leftmost max-count index.
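
For context, the prior per-block behaviour can be sketched roughly as below. This is a minimal reconstruction, not the code from `_writer.py`: the function name `mode_2x2_loop` is hypothetical, and cropping odd dimensions to an even size is an assumption based on the test fragments in this PR. `np.unique` returns values in ascending order, so `np.argmax` over the counts picks the smallest value on a tie.

```python
import numpy as np


def mode_2x2_loop(arr):
    """Hypothetical reference: mode of each 2x2 block via np.unique."""
    # Assumption: trailing odd row/column is dropped before blocking.
    h2 = (arr.shape[0] // 2) * 2
    w2 = (arr.shape[1] // 2) * 2
    oh, ow = h2 // 2, w2 // 2
    out = np.empty((oh, ow), dtype=arr.dtype)
    for i in range(oh):
        for j in range(ow):
            block = arr[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            # vals is sorted ascending; counts[k] is how often vals[k] occurs.
            vals, counts = np.unique(block, return_counts=True)
            # argmax returns the first maximum, i.e. the smallest tied value.
            out[i, j] = vals[np.argmax(counts)]
    return out


demo = np.array([[1, 2, 7, 7],
                 [2, 1, 7, 3]], dtype=np.uint8)
result = mode_2x2_loop(demo)
print(result)  # [[1 7]]: left block ties 1 vs 2 (1 wins), right block is mostly 7
```

The two Python-level loops and the per-block `np.unique` call are what make this path slow on large rasters.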

Implementation

After reshaping each 2x2 block to a row of 4 cells:

Sort along the last axis so equal values are contiguous.
For each of the 4 positions, count how many cells in the block equal the value at that position (a small fixed loop of length 4, each iteration vectorized over oh * ow).
np.argmax picks the leftmost position with the highest count, which after sorting is the smallest tied value.
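
The three steps above can be sketched as follows. This is an illustrative version under stated assumptions rather than the merged implementation: the name `mode_2x2_vectorized` is hypothetical, and odd trailing rows or columns are assumed to be cropped before blocking.

```python
import numpy as np


def mode_2x2_vectorized(arr):
    """Hypothetical sketch: sort-and-count mode over 2x2 blocks."""
    # Assumption: trailing odd row/column is dropped before blocking.
    h2 = (arr.shape[0] // 2) * 2
    w2 = (arr.shape[1] // 2) * 2
    oh, ow = h2 // 2, w2 // 2

    # (h2, w2) -> (oh, 2, ow, 2) -> (oh, ow, 2, 2) -> (oh, ow, 4)
    blocks = (arr[:h2, :w2]
              .reshape(oh, 2, ow, 2)
              .transpose(0, 2, 1, 3)
              .reshape(oh, ow, 4))

    # Step 1: sort each block so equal values are contiguous and ascending.
    blocks = np.sort(blocks, axis=-1)

    # Step 2: for each of the 4 positions, count cells equal to that value.
    counts = np.empty((oh, ow, 4), dtype=np.intp)
    for k in range(4):
        counts[..., k] = (blocks == blocks[..., k:k + 1]).sum(axis=-1)

    # Step 3: argmax returns the leftmost maximum, which after sorting
    # is the smallest value among those tied for the highest count.
    best = np.argmax(counts, axis=-1)
    return np.take_along_axis(blocks, best[..., None], axis=-1)[..., 0]


demo = np.array([[1, 2, 7, 7],
                 [2, 1, 7, 3]], dtype=np.uint8)
print(mode_2x2_vectorized(demo))  # [[1 7]]
```

The only Python-level loop left has a fixed length of 4, so the cost is dominated by a handful of whole-array NumPy operations instead of one `np.unique` call per output pixel.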

Test plan

pytest xrspatial/geotiff/tests/test_mode_overview_perf.py -x -q (48 passed)
pytest xrspatial/geotiff/tests/test_cog.py xrspatial/geotiff/tests/test_sparse_cog.py -x -q (30 passed)
Bit-exact match against the prior reference for random uint8/uint16/int16/int32/uint32/int64 inputs at sizes including 17x19, 100x101, 64x65.
Hand-crafted tie-break cases (two-way tie, three-way tie, three-of-a-kind, all-same).
Sanity guard: 1024x1024 uint8 path completes in under 100 ms.

Replace the per-pixel double loop in `_block_reduce_2d(method='mode')` with a vectorized sort-and-count over the (oh, ow, 4) block tensor. On a 1024x1024 uint8 input the reference implementation took ~1037 ms; the vectorized path runs in ~27 ms (about 39x faster). Output is bit-exact identical to the prior implementation. Tie-break semantics ("lowest value wins" on equal counts) are preserved because sorting brings equal values adjacent and `np.argmax` returns the leftmost (smallest) position when counts tie. Adds tests/test_mode_overview_perf.py with bit-exact comparison against a copy of the old reference for randomized inputs across uint8/uint16/int16/int32/uint32/int64 and odd dimensions, hand-crafted tie-break cases, and a 100 ms sanity guard on a 1024^2 input.

Copilot

Pull request overview

This PR optimizes GeoTIFF/COG overview generation by replacing the previous per-block np.unique loop used for method='mode' downsampling with a vectorized NumPy approach, aiming to drastically reduce runtime while preserving the prior tie-break behavior.

Changes:

Replaced _block_reduce_2d(..., method='mode') implementation with a vectorized sort-and-count approach over 2x2 blocks.
Added correctness tests that compare output bit-for-bit against a reference implementation and cover key tie-break cases.
Added a performance-oriented test intended to guard against regressions in the mode-resampling path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `xrspatial/geotiff/_writer.py` | Implements the new vectorized mode resampling logic for 2x2 block reduction. |
| `xrspatial/geotiff/tests/test_mode_overview_perf.py` | Adds reference-based correctness tests, tie-break tests, and a runtime budget check for mode resampling. |


+def test_perf_under_100ms_on_1024sq_uint8():
+    rng = np.random.default_rng(seed=0)
+    arr = rng.integers(0, 16, size=(1024, 1024), dtype=np.uint8)
+    # Warmup
+    _block_reduce_2d(arr, 'mode')
+    t0 = time.perf_counter()
+    out = _block_reduce_2d(arr, 'mode')
+    elapsed = time.perf_counter() - t0
+    assert out.shape == (512, 512)
+    assert elapsed < 0.1, (
+        f"mode resampling took {elapsed*1000:.1f} ms (threshold 100 ms)"
+    )


+    rng = np.random.default_rng(seed=42)
+    info = np.iinfo(dtype)
+    # Use a small categorical-style range so ties happen often.
+    lo = max(info.min, 0)


+    h2 = (shape[0] // 2) * 2
+    w2 = (shape[1] // 2) * 2
+    if h2 == 0 or w2 == 0:
+        return


github-actions bot added the `performance` label (PR touches performance-sensitive code) May 8, 2026

brendancol requested a review from Copilot May 8, 2026 22:11

Copilot started reviewing on behalf of brendancol May 8, 2026 22:12

Copilot AI reviewed May 8, 2026


brendancol merged commit 5d06bda into xarray-contrib:main May 9, 2026
15 checks passed

brendancol merged 1 commit into xarray-contrib:main from brendancol:perf/mode-overview-vectorize