Escape XML special characters in write_vrt output (#1607) #1610

Merged: brendancol merged 1 commit into main from deep-sweep-security-geotiff-2026-05-11-b-01 on May 11, 2026

@brendancol
Contributor

Summary

  • write_vrt in xrspatial/geotiff/_vrt.py built its XML output through plain f-strings, so a CRS WKT or source filename containing any of the five characters that XML reserves predefined entities for (& < > " ') either broke the document or injected extra elements when the VRT was read back.
  • Verified injection: a WKT of </SRS><Metadata><MDI key="AREA_OR_POINT">Point</MDI></Metadata><SRS> flipped the parsed raster_type from "area" to "point" on a round trip.
  • Fix routes every text slot through xml.sax.saxutils.escape / quoteattr via new _xml_text / _xml_attr helpers. Numeric fields stay as int/float literal interpolation because they cannot carry markup.
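
A minimal sketch of the failure and the fix, for readers who want to reproduce it outside the library. The helper names `xml_text` and `xml_attr` are stand-ins written for this illustration, not the PR's actual `_xml_text` / `_xml_attr` bodies, and the tiny `<VRTDataset>` wrapper is an assumed simplification of the real VRT layout. Only `xml.sax.saxutils.escape` and `quoteattr` come from the PR description.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def xml_text(value):
    """Escape a value for use as element text (stand-in helper)."""
    # escape() handles & < >; the extra mapping also covers both quote styles.
    return escape(str(value), {'"': "&quot;", "'": "&apos;"})


def xml_attr(value):
    """Return a value as an attribute literal, surrounding quotes included."""
    return quoteattr(str(value))


# The injection payload quoted in the summary above.
payload = '</SRS><Metadata><MDI key="AREA_OR_POINT">Point</MDI></Metadata><SRS>'

# Plain f-string interpolation: the payload closes <SRS> and adds <Metadata>.
unsafe = f"<VRTDataset><SRS>{payload}</SRS></VRTDataset>"
# Escaped interpolation: the payload stays inside <SRS> as ordinary text.
safe = f"<VRTDataset><SRS>{xml_text(payload)}</SRS></VRTDataset>"

injected = ET.fromstring(unsafe).find("Metadata") is not None
preserved = ET.fromstring(safe).find("SRS").text == payload

# Attribute slots take the quoted form directly, with no extra quotes added.
source = f"<SourceFilename relativeToVRT={xml_attr(1)}>{xml_text('a&b.tif')}</SourceFilename>"
```

In the unescaped document the parser sees a real `<Metadata>` child, which is how the payload could change the parsed raster type. In the escaped document the same string comes back unchanged as the text of `<SRS>`.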

Closes #1607.

Test plan

  • pytest xrspatial/geotiff/tests/test_vrt_xml_escape_1607.py (4 new tests: WKT round-trip with each predefined entity, the headline injection case, source filename with &, well-formedness of the written XML)
  • pytest xrspatial/geotiff/tests/test_vrt_write.py xrspatial/geotiff/tests/test_vrt_band_nodata_1598.py xrspatial/geotiff/tests/test_vrt_int_nodata_1564.py — all 23 existing VRT tests still pass
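
The round-trip checks listed above might look roughly like the following. This is a sketch under stated assumptions: `build_srs_element` is a hypothetical stand-in for the writer's SRS slot, not `write_vrt` itself, and the WKT strings are made-up fragments rather than values from the real test file.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five characters that have predefined XML entities.
SPECIAL_CHARS = ["&", "<", ">", '"', "'"]


def build_srs_element(wkt):
    # Hypothetical stand-in for the writer: escape the text, then interpolate.
    return f"<VRTDataset><SRS>{escape(wkt)}</SRS></VRTDataset>"


def round_trips(wkt):
    # ET.fromstring raises ParseError if the document is not well-formed.
    root = ET.fromstring(build_srs_element(wkt))
    return root.find("SRS").text == wkt


# One made-up WKT fragment per special character.
results = {ch: round_trips(f'PROJCS["name {ch} suffix"]') for ch in SPECIAL_CHARS}
```

Each entry in `results` records whether the string read back from the parsed document equals the string that went in, which covers both well-formedness and content preservation in one check.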

Found by the deep-sweep security audit (geotiff module, 2026-05-11).

write_vrt in _vrt.py emitted the caller-supplied crs_wkt and the source
filenames into the VRT XML via plain f-strings. A value containing one of
the predefined XML entities (& < > " ') either broke the document or
opened the door to element injection: a WKT carrying
"</SRS><Metadata><MDI key='AREA_OR_POINT'>Point</MDI></Metadata><SRS>"
flipped the parsed VRT's raster_type from "area" to "point" on a round
trip.

Route every text slot through xml.sax.saxutils.escape / quoteattr via
new _xml_text and _xml_attr helpers. Numeric fields (offsets, sizes,
pixel scales) stay as int/float literal interpolation because they
cannot carry markup.

Added test_vrt_xml_escape_1607.py covering: WKT round-trip with each
predefined entity; the headline injection case (raster_type stays
"area"); ampersand inside a source filename; XML well-formedness of
the written bytes.

@github-actions bot added the performance label (PR touches performance-sensitive code) on May 11, 2026
@brendancol requested a review from Copilot on May 11, 2026 18:49
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens the GeoTIFF VRT writer (write_vrt) against malformed XML and XML-injection by ensuring caller-controlled strings (CRS WKT and source filenames) are properly XML-escaped before being embedded into the generated VRT, addressing issue #1607.

Changes:

  • Added XML escaping/quoting helpers and applied them to all text/attribute interpolation points in write_vrt.
  • Added regression tests covering XML special characters, the reported injection payload, filename escaping, and XML well-formedness.
  • Updated the security sweep state record to reflect the #1607 finding and fix.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| xrspatial/geotiff/_vrt.py | Escapes/quotes caller-controlled strings when writing VRT XML to prevent breakage/injection. |
| xrspatial/geotiff/tests/test_vrt_xml_escape_1607.py | Adds regression tests ensuring escaped output round-trips correctly and remains well-formed. |
| .claude/sweep-security-state.csv | Records the #1607 MEDIUM finding and its remediation details. |

@brendancol merged commit 0682de9 into main on May 11, 2026
16 checks passed

Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

write_vrt: XML special characters in crs_wkt and source filenames are emitted unescaped

2 participants