Escape XML special characters in write_vrt output (#1607) #1610
Merged
write_vrt in _vrt.py emitted the caller-supplied crs_wkt and the source filenames into the VRT XML via plain f-strings. A value containing any of the five characters that have predefined XML entities (& < > " ') either broke the document or allowed element injection: a WKT carrying "</SRS><Metadata><MDI key='AREA_OR_POINT'>Point</MDI></Metadata><SRS>" flipped the parsed VRT's raster_type from "area" to "point" on a round trip.

The fix routes every text slot through xml.sax.saxutils.escape / quoteattr via new _xml_text and _xml_attr helpers. Numeric fields (offsets, sizes, pixel scales) stay as int/float literal interpolation because they cannot carry markup.

Added test_vrt_xml_escape_1607.py covering:
- WKT round-trip with each predefined entity
- the headline injection case (raster_type stays "area")
- ampersand inside a source filename
- XML well-formedness of the written bytes
Pull request overview
This PR hardens the GeoTIFF VRT writer (write_vrt) against malformed XML and XML-injection by ensuring caller-controlled strings (CRS WKT and source filenames) are properly XML-escaped before being embedded into the generated VRT, addressing issue #1607.
Changes:
- Added XML escaping/quoting helpers and applied them to all text/attribute interpolation points in write_vrt.
- Added regression tests covering XML special characters, the reported injection payload, filename escaping, and XML well-formedness.
- Updated the security sweep state record to reflect the #1607 finding and fix.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| xrspatial/geotiff/_vrt.py | Escapes/quotes caller-controlled strings when writing VRT XML to prevent breakage/injection. |
| xrspatial/geotiff/tests/test_vrt_xml_escape_1607.py | Adds regression tests ensuring escaped output round-trips correctly and remains well-formed. |
| .claude/sweep-security-state.csv | Records the #1607 MEDIUM finding and its remediation details. |
Summary
- write_vrt in xrspatial/geotiff/_vrt.py built its XML output through plain f-strings, so a CRS WKT or source filename carrying any of the characters with predefined XML entities (& < > " ') either broke the document or injected extra elements when the VRT was read back.
- A WKT carrying </SRS><Metadata><MDI key="AREA_OR_POINT">Point</MDI></Metadata><SRS> flipped the parsed raster_type from "area" to "point" on a round trip.
- Route every text slot through xml.sax.saxutils.escape / quoteattr via new _xml_text / _xml_attr helpers. Numeric fields stay as int/float literal interpolation because they cannot carry markup.

Closes #1607.
Test plan
- pytest xrspatial/geotiff/tests/test_vrt_xml_escape_1607.py (4 new tests: WKT round-trip with each predefined entity, the headline injection case, source filename with &, well-formedness of the written XML)
- pytest xrspatial/geotiff/tests/test_vrt_write.py xrspatial/geotiff/tests/test_vrt_band_nodata_1598.py xrspatial/geotiff/tests/test_vrt_int_nodata_1564.py (all 23 existing VRT tests still pass)

Found by the deep-sweep security audit (geotiff module, 2026-05-11).
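For illustration, the round-trip checks in the test plan could take roughly this shape. This is a sketch using only the standard library, not the contents of test_vrt_xml_escape_1607.py; the round_trip helper, the label attribute, and the sample values are hypothetical, and the real tests presumably go through write_vrt and the VRT reader instead.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# The five characters that have predefined XML entities.
PREDEFINED = ["&", "<", ">", '"', "'"]


def round_trip(value):
    """Write value as element text and as an attribute, then parse both back.

    Hypothetical helper: the 'label' attribute is not part of the VRT schema.
    """
    doc = (
        f"<VRTDataset><SRS label={quoteattr(value)}>"
        f"{escape(value)}</SRS></VRTDataset>"
    )
    # Parsing the encoded bytes doubles as a well-formedness check:
    # ET.fromstring raises ParseError on malformed XML.
    srs = ET.fromstring(doc.encode("utf-8")).find("SRS")
    return srs.text, srs.get("label")


# Each special character should survive unchanged in both positions.
results = {ch: round_trip(f"EPSG{ch}4326") for ch in PREDEFINED}

# An ampersand inside a filename should come back as a literal ampersand.
filename_text, _ = round_trip("tiles/a&b.tif")
```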