Cross-backend attrs drop: dask and GPU read paths discard pass-through TIFF tag attrs #1548

@brendancol

Bug

open_geotiff returns different attrs keys depending on which backend handles the read. The eager numpy path (xrspatial/geotiff/__init__.py:411-510) populates several pass-through attrs from the file's TIFF tags. The dask path (__init__.py:1333-1342) and the GPU path (__init__.py:1881-1888) only set a small subset.

Keys dropped on dask read:

  • x_resolution, y_resolution, resolution_unit (TIFF resolution tags)
  • extra_tags (raw pass-through tag list)
  • image_description (tag 270)
  • extra_samples (tag 338, alpha indication)

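Keys dropped on GPU read (at least these, per the observed output below):

  • x_resolution, y_resolution, resolution_unit (TIFF resolution tags)
  • nodata (tracked separately in #1542)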

Reproducer

import numpy as np, xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff

arr = np.random.random((128, 128)).astype(np.float32)
da = xr.DataArray(arr, dims=['y', 'x'], attrs={
    'x_resolution': 300.0,
    'y_resolution': 300.0,
    'resolution_unit': 'inch',
})
path = '/tmp/attrs_test.tif'
to_geotiff(da, path, compression='deflate', tiled=True, tile_size=64, nodata=-1.0)

np_da = open_geotiff(path)
dk_da = open_geotiff(path, chunks=64)
gpu_da = open_geotiff(path, gpu=True)

print('numpy attrs   :', sorted(np_da.attrs.keys()))
print('dask_cpu attrs:', sorted(dk_da.attrs.keys()))
print('cupy attrs    :', sorted(gpu_da.attrs.keys()))
print('x_resolution: np=', np_da.attrs.get('x_resolution'),
      'dk=', dk_da.attrs.get('x_resolution'),
      'gpu=', gpu_da.attrs.get('x_resolution'))
print('nodata: np=', np_da.attrs.get('nodata'),
      'dk=', dk_da.attrs.get('nodata'),
      'gpu=', gpu_da.attrs.get('nodata'))

Observed:

numpy attrs   : ['nodata', 'resolution_unit', 'transform', 'x_resolution', 'y_resolution']
dask_cpu attrs: ['nodata', 'transform']
cupy attrs    : ['transform']

x_resolution: np= 300.0 dk= None gpu= None
nodata: np= -1.0 dk= -1.0 gpu= None

Why this matters

to_geotiff reads attrs['extra_tags'] and the friendly resolution attrs (x_resolution, y_resolution, resolution_unit) when reconstructing the output file. If those attrs are dropped on read, a write, read, write round trip loses metadata. Downstream code that branches on attrs['x_resolution'] or attrs['nodata'] silently behaves differently depending on which backend is active.

Issue #1542 (PR #1547) fixes the GPU nodata drop. This issue covers the broader class of attrs that both the dask and GPU read paths still drop.

Expected

All four backends (numpy, cupy, dask+numpy, dask+cupy) should populate the same attrs keys for the same input file. The eager numpy attrs set is the canonical reference.
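
The expectation above can be checked mechanically by diffing each backend's attrs keys against the eager numpy result. The following is a minimal sketch, not code from xrspatial: `missing_attr_keys` is a hypothetical helper name, and it works on plain dicts so it runs without a GeoTIFF file or a GPU. The sample dicts reuse the keys from the observed output; the `transform` value is a placeholder.

```python
def missing_attr_keys(reference, others):
    """Map each backend name to the sorted reference keys it lacks.

    `reference` is the canonical attrs dict (eager numpy read);
    `others` maps a backend label to that backend's attrs dict.
    """
    ref_keys = set(reference)
    return {name: sorted(ref_keys - set(attrs)) for name, attrs in others.items()}


# Keys taken from the observed output; the transform value is a placeholder.
numpy_attrs = {
    'nodata': -1.0,
    'resolution_unit': 'inch',
    'transform': None,
    'x_resolution': 300.0,
    'y_resolution': 300.0,
}
dask_attrs = {'nodata': -1.0, 'transform': None}
cupy_attrs = {'transform': None}

report = missing_attr_keys(
    numpy_attrs, {'dask_cpu': dask_attrs, 'cupy': cupy_attrs}
)
for backend, missing in report.items():
    print(backend, 'is missing', missing)
```

In a real test, the three dicts would come from `open_geotiff(path).attrs`, `open_geotiff(path, chunks=64).attrs`, and `open_geotiff(path, gpu=True).attrs`, and the assertion would be that every list in the report is empty.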

Fix sketch

Factor the attrs population out of the eager numpy branch into a helper _populate_attrs_from_geo_info(geo_info, attrs) and call it from all three read paths so they cannot diverge again. Add a 4-backend equivalence test.

Audit pass

Found in geotiff backend parity sweep on 2026-05-09. Reproduced cleanly on this host on commit c41dfa6.
