Skip to content

Run float-predictor encode through an ngjit row loop#1556

Merged
brendancol merged 2 commits intomainfrom
worktree-agent-abb34600a9c01560d
May 11, 2026
Merged

Run float-predictor encode through an ngjit row loop#1556
brendancol merged 2 commits intomainfrom
worktree-agent-abb34600a9c01560d

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Closes #1554.

The predictor=3 (floating-point) encode path in _compression.py was calling its per-row kernel from a Python for loop, paying numba dispatch overhead on every row. The decode path was already moved into a single @ngjit wrapper (_fp_predictor_decode_rows); this does the same on the encode side by adding _fp_predictor_encode_rows. The per-row kernel is left in place so the change is additive.

Numbers

1024x1024 float32 deflate write (single thread, this machine):

predictor before after
1 (none) 16 ms 16 ms
2 (horizontal) 29 ms 29 ms
3 (float) 50 ms 14 ms

Predictor=3 was 2.5x slower than predictor=2. It is now a hair faster.

Verification

  • 1024x1024 float32 deflate+predictor=3 write produces SHA256 e573bccaa356d6cf7355976680c86387cd7c277ce6329c616273e3c4cbf5388f both before and after the change. Output bytes are identical for the same input.
  • pytest xrspatial/geotiff/tests/ -k "predictor or fp_predictor or float" runs 151 passed, 1 skipped (the new timing test).
  • pytest xrspatial/geotiff/tests/ runs 933 passed, 7 skipped (1 new), 7 deselected. The 7 deselected are TestPalette matplotlib deepcopy failures unrelated to this change.

Test plan

  • Confirm output bytes match pre-fix encoder for 1024x1024 float32 deflate+predictor=3
  • Run the targeted predictor/float test selector
  • Run the full geotiff test suite
  • Add a value-exact round-trip test for the 1024x1024 case
  • Add a skipped 2x timing regression test (run manually)

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 10, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 03:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the GeoTIFF floating-point predictor (TIFF predictor=3) encode path by moving the per-row dispatch loop into a single Numba-compiled wrapper, matching the already-optimized decode path. It also adds regression tests to ensure the refactor remains lossless and to provide an opt-in timing guard.

Changes:

  • Add @ngjit _fp_predictor_encode_rows to run the predictor=3 row loop inside Numba and call it from fp_predictor_encode.
  • Add a large (1024x1024) value-exact round-trip test for predictor=3 with deflate compression.
  • Add a skipped timing regression test intended for manual performance checks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_compression.py Moves predictor=3 encode row dispatch into an @ngjit wrapper to remove Python per-row overhead.
xrspatial/geotiff/tests/test_predictor_fp_write_1313.py Adds a large lossless round-trip test and a manual timing regression test for predictor=3 encoding.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/tests/test_predictor_fp_write_1313.py
Comment thread xrspatial/geotiff/tests/test_predictor_fp_write_1313.py Outdated
- The "bit-for-bit equal" claim now actually checks bits: assert dtype
  matches and compare ``arr.tobytes() == out_arr.tobytes()`` so the
  test catches signed-zero drift, NaN payload changes, and any other
  bit-level divergence that ``np.testing.assert_array_equal`` would
  silently mask.
- Convert the timing test from an unconditional ``@pytest.mark.skip``
  to the existing ``XRSPATIAL_RUN_PERF_TESTS=1`` env-var gate used by
  ``test_streaming_write_parallel.test_streaming_write_perf_sanity``.
  Developers can now run it with one env var instead of editing the
  decorator.
@brendancol brendancol merged commit 39322c3 into main May 11, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Float-predictor encode runs row loop in Python, costing ~30 ms per 1024x1024 float32 write

2 participants