Bounds-check IFD count to prevent DoS on crafted TIFFs #1527

Merged — brendancol merged 2 commits into xarray-contrib:main on May 9, 2026
Conversation
`parse_ifd` previously trusted an entry's `count` field unconditionally
and passed it straight into `struct.unpack_from(f'{bo}{count}{fmt}',
...)`. A crafted TIFF with `count=5e7` on a LONG tag and a value
pointer aimed at a multi-MB padding region forced the parser to
materialize a ~1.4 GB tuple of Python ints before any caller-side
validation ran.
This change adds two guards inside the per-entry loop:
1. Reject `count > MAX_IFD_ENTRY_COUNT` (10_000_000) for tags that are
not pixel-data offset/byte-count arrays. STRIP_OFFSETS,
STRIP_BYTE_COUNTS, TILE_OFFSETS, TILE_BYTE_COUNTS, and COLORMAP are
exempt because their counts legitimately scale with image size.
2. For non-inline values, verify that
`value_offset + count * type_size <= len(data)` before calling
`_read_value`.
Both checks raise `ValueError` before any allocation. Threat model is
untrusted TIFF input (web download, fsspec source, third-party catalog,
user upload).
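The combined effect of the two guards can be sketched as follows. This is a minimal illustration, not the actual `parse_ifd` implementation: the tag codes and element sizes are standard TIFF 6.0 values, but the helper name `check_entry` and its signature are invented here.

```python
MAX_IFD_ENTRY_COUNT = 10_000_000  # cap introduced in this PR (lowered later in review)

# Tags whose counts legitimately scale with image size (TIFF 6.0 codes):
# StripOffsets, StripByteCounts, ColorMap, TileOffsets, TileByteCounts
PIXEL_ARRAY_TAGS = {273, 279, 320, 324, 325}

# Bytes per TIFF field type (type code -> element size), abbreviated
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 12: 8}

def check_entry(tag, type_code, count, value_offset, data_len, inline):
    """Raise ValueError before any allocation if an IFD entry looks hostile."""
    type_size = TYPE_SIZES[type_code]
    # Guard 1: hard cap on count for tags that are not pixel-data arrays
    if tag not in PIXEL_ARRAY_TAGS and count > MAX_IFD_ENTRY_COUNT:
        raise ValueError(f"IFD tag {tag}: count {count} exceeds {MAX_IFD_ENTRY_COUNT}")
    # Guard 2: a non-inline value must lie entirely within the buffer
    if not inline and value_offset + count * type_size > data_len:
        raise ValueError(f"IFD tag {tag}: value range extends past end of data")
```

For example, `check_entry(256, 4, 50_000_000, 26, 250_000_000, False)` — the `count=5e7` LONG attack on ImageWidth (tag 256) — raises on the count cap even though the claimed value range fits inside the file, while the same count on an exempt tag like TileOffsets (324) only has to pass the bounds check.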
Tests added in test_parse_ifd_bounds.py:
- count over the cap on a non-pixel tag is rejected
- value range past EOF is rejected
- pixel-array tag with a huge count is allowed
- legal small IFDs still parse
- TileByteCounts SHORT path exercised
Pull request overview
This PR hardens the GeoTIFF header/IFD parser against memory-DoS attacks by validating IFD entry count values and referenced value ranges before decoding tag payloads.
Changes:
- Add `MAX_IFD_ENTRY_COUNT` and enforce a hard cap for non–pixel-array tags in `parse_ifd`.
- Add bounds checking for non-inline IFD value pointers to ensure `value_offset + count * type_size` stays within `len(data)`.
- Add a new test module covering count cap enforcement, EOF bounds checks, and pixel-tag exemptions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `xrspatial/geotiff/_header.py` | Adds the count hard-cap for non-pixel tags and value-range bounds checks before `_read_value` / `struct.unpack_from`. |
| `xrspatial/geotiff/tests/test_parse_ifd_bounds.py` | Introduces tests for the new guards and intended exemptions. |
Tighten the IFD-entry caps and clean up test scaffolding:

- Lower `MAX_IFD_ENTRY_COUNT` from 10_000_000 to 100_000. Even at the lower bound a malformed entry can still force ~3 MB of Python tuple allocation for LONG values; 100K is well above any legitimate metadata count and bounds the worst case to ~30 MB.
- Add a `MAX_IFD_ENTRY_BYTES = 256 KiB` cap on `count * type_size` for non-pixel tags. Catches large-itemsize abuse (e.g. DOUBLE arrays) where the count alone passes but the bytes are excessive. The default caps are now independently useful: the count cap stops integer-array attacks, the byte cap stops large-element-size attacks.
- Remove the dead `ptr < 0` branch on the value-range check; `ptr` is unpacked via unsigned `I`/`Q` so it can never be negative.
- Reformat the `_build_tiff` signature in the test to one-arg-per-line so the closing paren lines up cleanly (the prior layout tripped E128/E124).
- Replace `test_huge_pixel_array_count_allowed` (which allocated 40 MB and forced a 10M-int tuple in CI) with monkeypatch-based tests that exercise the cap logic with tiny payloads. Add a second test that proves the byte cap fires at default cap values, so the production thresholds stay covered.
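Under the revised caps, the non-pixel-tag check becomes two independent tests. Again a sketch: the real guard lives inside `parse_ifd` and may be shaped differently, and `check_caps` is a name invented here.

```python
MAX_IFD_ENTRY_COUNT = 100_000          # revised count cap
MAX_IFD_ENTRY_BYTES = 256 * 1024       # 256 KiB byte cap on count * type_size

PIXEL_ARRAY_TAGS = {273, 279, 320, 324, 325}  # exempt pixel-data array tags

def check_caps(tag, count, type_size):
    """Apply both caps to an IFD entry; pixel-data array tags are exempt."""
    if tag in PIXEL_ARRAY_TAGS:
        return  # counts here legitimately scale with image size
    if count > MAX_IFD_ENTRY_COUNT:
        raise ValueError(f"count {count} exceeds {MAX_IFD_ENTRY_COUNT}")
    if count * type_size > MAX_IFD_ENTRY_BYTES:
        raise ValueError(f"{count * type_size} bytes exceeds {MAX_IFD_ENTRY_BYTES}")
```

A DOUBLE array with `count=40_000` illustrates why both caps matter: it passes the count cap (40,000 < 100,000) but trips the byte cap (40,000 × 8 = 320,000 bytes > 262,144), which is exactly the large-itemsize abuse the second cap targets.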
brendancol added a commit to brendancol/xarray-spatial that referenced this pull request on May 9, 2026:
`parse_all_ifds` walks the IFD chain via `next_ifd_offset` and deduplicates with a `seen` set of offsets. That catches cycles, but not long acyclic chains: a crafted BigTIFF can chain millions of distinct IFD offsets that each point at small valid IFDs scattered through a sparse multi-GB file, forcing O(N) memory in attacker-controlled N.

This change adds a `MAX_IFDS = 256` ceiling and raises `ValueError` once the chain hits it. 256 is generous: real-world COGs carry the full-resolution IFD plus a handful of overview levels and (optionally) per-band masks, so they sit well under 64 even for deep pyramids. The cycle-detection `seen` set is preserved untouched.

Threat model is untrusted TIFF input (web download, fsspec source, third-party catalog, user upload). Counterpart to S2 (xarray-contrib#1527), which bounded per-IFD entry counts.

Tests in test_ifd_chain_cap.py:
- chain past the cap is rejected with a clear `ValueError`
- chain at MAX_IFDS - 1 still parses
- chain at MAX_IFDS hits the cap (off-by-one boundary)
- error message names MAX_IFDS and the threat
- big-endian chains hit the same cap
- a real COG with overview levels parses unaffected
brendancol added a commit that referenced this pull request on May 9, 2026:
* Cap IFD chain length in parse_all_ifds to prevent DoS

  (Same commit message as the `MAX_IFDS = 256` commit quoted above: cap the `next_ifd_offset` walk in `parse_all_ifds`, raise `ValueError` once the chain hits the ceiling, and preserve the cycle-detection `seen` set untouched.)

* Address Copilot review on #1530

  - Remove unused TAG_IMAGE_LENGTH import (F401).
  - Fix the test_chain_at_boundary_passes docstring: the summary said "Exactly MAX_IFDS is allowed" but the body and code show MAX_IFDS - 1 is the largest accepted chain. Reword the summary to match.

* Harmonize MAX_IFDS check with MAX_IFD_ENTRY_COUNT semantics

  Switch parse_all_ifds from `len(ifds) >= MAX_IFDS` to `len(ifds) > MAX_IFDS` so the cap matches the `> MAX_IFD_ENTRY_COUNT` convention used elsewhere in this module: "MAX = N" reads as "up to and including N is allowed". Update the boundary test to assert MAX_IFDS parses cleanly and MAX_IFDS + 1 raises.
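The capped chain walk might look like the sketch below, using the final `> MAX_IFDS` convention from the harmonization commit. `read_ifd` is a hypothetical callback standing in for the module's actual IFD reader; the real `parse_all_ifds` has a different signature.

```python
MAX_IFDS = 256  # generous ceiling: real COGs sit well under 64 IFDs

def walk_ifd_chain(read_ifd, first_offset):
    """Follow next_ifd_offset links, detecting cycles and capping chain length.

    `read_ifd(offset)` returns (ifd, next_offset); next_offset == 0 ends the chain.
    """
    ifds, seen = [], set()
    offset = first_offset
    while offset:
        if offset in seen:            # cycle detection, preserved from before
            break
        seen.add(offset)
        ifd, offset = read_ifd(offset)
        ifds.append(ifd)
        if len(ifds) > MAX_IFDS:      # up to and including MAX_IFDS is allowed
            raise ValueError(
                f"IFD chain exceeds MAX_IFDS={MAX_IFDS}; "
                "refusing possibly hostile TIFF input"
            )
    return ifds
```

Because the check runs after each append, memory is bounded at MAX_IFDS + 1 small IFD records regardless of how many distinct offsets the attacker chains.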
Summary
Addresses security audit finding S2: `parse_ifd` in `xrspatial/geotiff/_header.py` previously trusted an entry's `count` field unconditionally and passed it straight into `struct.unpack_from(f'{bo}{count}{fmt}', ...)`. A crafted multi-MB TIFF with `count=5e7` on a LONG tag and a value pointer aimed at a region of valid file bytes forced the parser to materialize a ~1.4 GB tuple of Python ints before any caller-side validation ran.

Threat model

Untrusted TIFF input: web download, fsspec source, third-party catalog, user upload. An attacker only needs to ship a file large enough to cover the supposed value range (`count * type_size` bytes) to weaponize the `count` field.

Fix

Two guards added inside the per-entry loop in `parse_ifd`:

- A `MAX_IFD_ENTRY_COUNT = 10_000_000` cap on `count` for tags that are not pixel-data offset/byte-count arrays. `STRIP_OFFSETS`, `STRIP_BYTE_COUNTS`, `TILE_OFFSETS`, `TILE_BYTE_COUNTS`, and `COLORMAP` are exempt because their counts legitimately scale with image dimensions (a real high-resolution tiled COG can have millions of tiles).
- A check that `value_offset + count * type_size <= len(data)` holds before calling `_read_value`.

Both checks raise `ValueError` before any large allocation runs.

Test plan
- `pytest xrspatial/geotiff/tests/test_parse_ifd_bounds.py -x -q` (5 new tests)
- `pytest xrspatial/geotiff/tests/test_header.py xrspatial/geotiff/tests/test_reader.py -x -q` (34 passed)
- Pre-existing `test_features.py` failures are unrelated to this change
- The crafted repro (`count=5e7` LONG on `ImageWidth` against a 250 MB padded file) now raises `ValueError` with no large allocation
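The repro in the last bullet can be miniaturized. The sketch below builds a classic little-endian TIFF with one hostile `ImageWidth` entry and runs it through a toy guarded reader — illustrative only, not the project's `parse_ifd`, with the 250 MB padding shrunk to a few bytes:

```python
import struct

def build_crafted_tiff(count=50_000_000):
    """Classic little-endian TIFF: one IFD, one LONG entry with a huge count."""
    header = b"II" + struct.pack("<HI", 42, 8)          # magic 42, IFD at offset 8
    # 12-byte entry: tag=256 (ImageWidth), type=4 (LONG), count, value pointer
    entry = struct.pack("<HHII", 256, 4, count, 26)     # pointer just past the IFD
    ifd = struct.pack("<H", 1) + entry + struct.pack("<I", 0)  # 1 entry, no next IFD
    return header + ifd + b"\x00" * 64                  # tiny stand-in for the padding

def read_first_entry_guarded(data, max_count=10_000_000):
    """Toy reader: apply the count cap before unpacking the value array."""
    ifd_offset, = struct.unpack_from("<I", data, 4)
    tag, type_code, count, ptr = struct.unpack_from("<HHII", data, ifd_offset + 2)
    if count > max_count:  # guard fires before materializing `count` LONGs
        raise ValueError(f"tag {tag}: count {count} exceeds cap {max_count}")
    return struct.unpack_from(f"<{count}I", data, ptr)
```

With the default hostile count the reader raises `ValueError` without any large allocation; with a benign `count=2` it returns a 2-tuple read from the zero padding.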