Skip to content

refactor(storage)!: sharpen Get and List contracts#122

Merged
sbalabanov merged 1 commit into
mainfrom
pointer
Jun 18, 2026
Merged

refactor(storage)!: sharpen Get and List contracts#122
sbalabanov merged 1 commit into
mainfrom
pointer

Conversation

@sbalabanov

Copy link
Copy Markdown
Contributor

Summary

Two contract clarifications on the Storage interface, both addressing abstraction leaks where the documented behavior diverged from what the bundled implementations actually did. The interface is meant to fit cleanly behind backends as different as in-memory maps, local disk, S3, GCS, and remote RPC services — these changes make that easier and remove dead defensive code from callers.

1. Get returns DownloadResponse by value

Before:

Get(ctx context.Context, req DownloadRequest) (*DownloadResponse, error)

After:

// On success the returned DownloadResponse.ReadCloser is non-nil and the
// caller owns closing it. Returns NotFoundError when the blob is not found.
Get(ctx context.Context, req DownloadRequest) (DownloadResponse, error)

The pointer carried no information: absence is signaled by NotFoundError, not by a nil response. Every caller defensively checked resp == nil and resp.ReadCloser == nil — both dead code under any conforming implementation. The tightened contract eliminates four resp == nil checks and three resp.ReadCloser == nil checks across controller/, core/storage/graphreader.go, and core/storage/changedtargetsreader.go.

This also lines up with CLAUDE.md's style guidance: "Value types over pointers — prefer value types for structs, configs, and return values. Use (T, bool) to signal absence instead of *T. Pointers only when mutation or shared ownership is needed."

2. List is now literal-string-prefix, not "directory prefix"

Before:

// List returns the keys of all blobs under the given directory prefix.
List(ctx context.Context, dir string) ([]string, error)

After:

// List returns all keys whose name starts with the given prefix, semantically
// equivalent to filtering the full key namespace by strings.HasPrefix(key, prefix).
//
// Implementations MUST treat prefix as a literal string prefix and MUST NOT
// interpret it as a directory path. Callers control segment boundaries by
// including a trailing "/" in their prefix: List(ctx, "foo") matches both
// "foo/bar" and "foo-bar", while List(ctx, "foo/") matches only the former.
List(ctx context.Context, prefix string) ([]string, error)

The original wording was ambiguous and the two bundled implementations diverged. memstorage did strings.HasPrefix (a pure string match); disk did filepath.Walk on a real subdirectory. The two contracts agreed only when the caller passed a full slash-segmented path ending in / — which the sole production caller (storageCache.FloorKey) happens to do, but a future backend like S3 (whose natural primitive is ListObjects(prefix)) couldn't be plugged in without subtly different behavior.

The new contract picks the primitive that all plausible backends can satisfy without contortion. disk.List is updated to honor it: it walks the longest slash-bounded path-prefix and filters by strings.HasPrefix, preserving cheap walks for the common case while correctly returning sibling keys for partial-segment prefixes.

3. Interface-level doc on Storage

// Storage is an abstract interface for remote data storage.
//
// Keys are opaque strings; the interface has no concept of paths, directories,
// or segments. Any structure (e.g. "/"-delimited paths) is a convention of the
// caller, and implementations MUST NOT impose path semantics of their own.
type Storage interface { ... }

Breaking change

Out-of-tree Storage implementations need to update:

  • Get to return DownloadResponse by value, and return the zero value alongside any error (including NotFoundError).
  • List to do literal-string-prefix matching.

All in-tree implementations (memstorage, disk), callers (controller, orchestrator, core/itg/cache), mocks, and tests are updated in this PR. The only production caller of List (storageCache.FloorKey) already passes a slash-terminated prefix, so its behavior is unchanged.

Test plan

  • go test ./... green for all affected packages
  • make build green
  • make test green — 14/14 Bazel targets pass
  • Added three subtests in core/storage/disk/disk_test.go (TestStorage_List) that lock in literal-prefix behavior and would have failed under the old directory-walk implementation:
    • partial-segment prefix matches sibling keys
    • trailing slash delimits segment
    • top-level partial prefix without slash returns matching keys at any depth

🤖 Generated with Claude Code

@sbalabanov sbalabanov requested review from a team as code owners June 17, 2026 22:43
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment thread core/storage/storage.go
Get(ctx context.Context, req DownloadRequest) (*DownloadResponse, error)
// Get downloads a blob from the storage. On success the returned DownloadResponse.ReadCloser
// is non-nil and the caller owns closing it. Returns NotFoundError when the blob is not found.
Get(ctx context.Context, req DownloadRequest) (DownloadResponse, error)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we still leave this a pointer?

  1. this is a breaking change downstream for uber
  2. *DownloadResponse contains io.ReadCloser that's already a pointer. if return by value, on error it returns DownloadResponse{} with a zero-value ReadCloser==nil. The caller still has to check resp.ReadCloser != nil

@sbalabanov sbalabanov Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want to accumulate a tech debt at this phase. It is not in prod yet and we should bring the API in accordance.

The caller still has to check resp.ReadCloser != nil

For internal interfaces the better approach is to use static checks (Go does not have them) or conventions rather than explicit checks which in well designed application are always dead branch code.
https://medium.com/javarevisited/avoid-verbose-null-checks-b3f11afbfcc9

The interface should contract what object is returned in what conditions. Almost always a non-nil error means that the value object is undefined, means the client should perform no checks and do not use it in any capacity.

Two contract clarifications on the Storage interface, both addressing
abstraction leaks where the documented behavior diverged from what
implementations actually did.

Get returns DownloadResponse by value, not by pointer. The pointer
carried no extra information — absence is signaled by NotFoundError, not
by a nil response — yet every caller defensively checked `resp == nil`
and `resp.ReadCloser == nil`, both dead code under any conforming
implementation. The interface now documents that on a nil error the
ReadCloser is non-nil and caller-owned, and the redundant checks are
removed.

List has literal-string-prefix semantics instead of "directory prefix".
The original wording was ambiguous and the two bundled implementations
diverged: memstorage did strings.HasPrefix while disk did filepath.Walk
on a real subdirectory, so the contracts only agreed when the caller
passed a full slash-segmented path ending in "/". The interface now
mandates strings.HasPrefix semantics — the natural primitive for
S3/GCS/in-memory backends — and disk.List walks the longest
slash-bounded prefix and filters by HasPrefix, preserving cheap walks
for the common case while honoring partial-segment prefixes.

Storage gains an interface-level comment stating keys are opaque and
the interface has no concept of paths or directories; any structure is
a caller convention.

Breaking change: out-of-tree implementations of Storage must update Get
to return a value and List to do literal-prefix matching. All in-tree
implementations (memstorage, disk), callers, mocks, and tests are
updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sbalabanov sbalabanov merged commit 22d4264 into main Jun 18, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants