Skip to content

fix: implement soft-delete for ZipStore (issue #828)#4107

Open
mohammadZuherJaserAsad wants to merge 3 commits into
zarr-developers:mainfrom
mohammadZuherJaserAsad:fix-issue-828-1
Open

fix: implement soft-delete for ZipStore (issue #828)#4107
mohammadZuherJaserAsad wants to merge 3 commits into
zarr-developers:mainfrom
mohammadZuherJaserAsad:fix-issue-828-1

Conversation

@mohammadZuherJaserAsad

Copy link
Copy Markdown

Summary

Fixes #828 — implements soft-delete for ZipStore so that delete() and delete_dir() no longer raise NotImplementedError.

The ZIP format has no native entry-removal API (a known CPython limitation: python/cpython#51067). This PR uses the soft-delete approach suggested by @dcherian in the issue: overwrite the entry with an empty byte sentinel (b"") and filter it out in all read, exists, and list paths.

Changes

src/zarr/storage/_zip.py

  • supports_deletes = True (was False)
  • Added module-level _SOFT_DELETE_SENTINEL = b"" constant
  • delete(key): writes sentinel to zip entry; no-op if key is absent
  • delete_dir(prefix): collects live keys under prefix, calls delete() on each
  • _get(): reads all bytes first; returns None if content equals sentinel
  • exists(): returns False for entries whose content equals sentinel
  • list(): deduplicates entries (duplicate names can appear after overwrites) and skips sentinel entries
  • list_dir(): consumes filtered list() output for consistency

tests/test_store/test_zip.py

  • 7 new targeted tests covering: basic delete, no-op cases, sibling-key safety, double-delete, delete_dir, and list() filtering
  • test_api_integration updated: removed two pytest.raises(NotImplementedError) blocks that are no longer correct

tests/test_store/test_stateful.py

  • Removed two pytest.skip(reason="ZipStore does not support delete") blocks

Design notes

  • Why soft-delete? The only alternative that preserves persistence is rebuilding the entire zip on every delete, which is O(n) in archive size. The maintainer's own suggestion in the issue was the sentinel approach; prior PRs zipstore delitems via override #1184 and Implement soft delete for zip store #2838 validated it.
  • Duplicate zip entries: When an entry is overwritten (including soft-deleted), namelist() returns the name twice. NameToInfo tracks the last-written entry, so reading always returns the sentinel. list() uses a seen set to yield each name at most once.
  • Lock re-entrancy: delete_dir() holds the RLock while calling list_prefix() and then delete() (which also acquires it). This works because threading.RLock is re-entrant for the same thread.
  • set_if_not_exists edge case: After a soft-delete the name is still in namelist(), so set_if_not_exists will not re-write it. This matches the documented semantics and existing zarr usage patterns.

Testing

pytest tests/test_store/test_zip.py -v
pytest tests/test_store/test_stateful.py -v -k "not slow_hypothesis"

@github-actions github-actions Bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jun 27, 2026
@codecov

codecov Bot commented Jun 27, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 45.00000% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.12%. Comparing base (b9d3964) to head (fbe80ce).
⚠️ Report is 41 commits behind head on main.

Files with missing lines Patch % Lines
src/zarr/storage/_zip.py 45.00% 22 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4107      +/-   ##
==========================================
- Coverage   93.53%   85.12%   -8.41%     
==========================================
  Files          88       90       +2     
  Lines       11894    15090    +3196     
==========================================
+ Hits        11125    12846    +1721     
- Misses        769     2244    +1475     
Files with missing lines Coverage Δ
src/zarr/storage/_zip.py 82.94% <45.00%> (-14.74%) ⬇️

... and 41 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@d-v-b

d-v-b commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

@mkitti

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs release notes Automatically applied to PRs which haven't added release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implementing ZipStore's __delitem__ via overwrite

2 participants