Binary cache improvements#523
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR documents and persists ChangesBatch narinfo and cache-info flow
Transfer activity text
Sequence Diagram(s)sequenceDiagram
participant Store
participant HttpBinaryCacheStore
participant CmdServe
participant NarInfoDiskCacheImpl
Store->>HttpBinaryCacheStore: queryPathInfos(paths, callback)
HttpBinaryCacheStore->>CmdServe: POST /get-narinfos-v1 with hashPart() lines
CmdServe-->>HttpBinaryCacheStore: concatenated narinfo blocks
HttpBinaryCacheStore->>NarInfoDiskCacheImpl: upsert narinfo fields
HttpBinaryCacheStore-->>Store: callback(results)
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (1)
src/libstore/remote-store.cc (1)
229-229: 🗄️ Data Integrity & Integration | 🔵 TrivialTrack the remaining
partialClosureprotocol gap.
RemoteStore::querySubstitutablePathInfos()still dropsSubstitutablePathInfo::partialClosure; if this API remains supported, this needs a worker-protocol extension or an explicit compatibility note.I can help draft the protocol-compatible implementation or open a follow-up issue.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/libstore/remote-store.cc` at line 229, RemoteStore::querySubstitutablePathInfos() still ignores SubstitutablePathInfo::partialClosure, so update the remote-store worker protocol or add an explicit compatibility note to account for it. Use the existing querySubstitutablePathInfos() flow and the SubstitutablePathInfo/partialClosure handling point to either transmit this field end-to-end or clearly document that it is unsupported for now, removing the FIXME once the chosen behavior is implemented.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@doc/manual/source/protocols/nix-cache-info.md`:
- Around line 64-68: Add a language tag to the example fence in the
nix-cache-info documentation so the Markdown lint rule is satisfied; update the
fenced block that contains the StoreDir/WantMassQuery/Priority/GetNarInfosV1
example to use a text code fence, keeping the example content unchanged.
In `@src/libstore/include/nix/store/nar-info-disk-cache.hh`:
- Around line 31-41: `CacheInfo` currently drops the validated `StoreDir`, which
prevents `HttpBinaryCacheStore::init()` from checking cached metadata against
the active store root before using `WantMassQuery` and `GetNarInfosV1`. Update
`CacheInfo` and the related cache-hit path so the parsed `nix-cache-info`
retains `storeDir` alongside `fields`, then have `HttpBinaryCacheStore::init()`
compare it with the current store directory and refetch or invalidate the cache
when they differ; keep `BinaryCacheStore::parseNixCacheInfo()` as the source of
validation.
In `@src/libstore/include/nix/store/nar-info.hh`:
- Around line 20-29: The new partialClosure field is only present in the narinfo
struct, but the JSON serialization path still drops it. Update
UnkeyedNarInfo::toJSON() and UnkeyedNarInfo::fromJSON() to include
partialClosure in the JSON round-trip, matching the existing handling for the
other narinfo fields so the hint survives serialization and deserialization.
In `@src/libstore/misc.cc`:
- Around line 302-306: The opaque path handling in the `DerivedPath::Opaque`
visitor and the related `pathsToQuery`/`unknown` logic should respect
`useSubstitutes` and the availability of `subs` before queueing anything. Update
the code in `maybeQueryPathInfo`/`pathsToQuery` processing so missing opaque
paths are only queued when substitutes are enabled and there is at least one
substituter to query, otherwise ensure they are treated as `unknown` immediately
and never fetched from substituters.
- Around line 319-323: The concurrent substituter query in
queryMissing()/queryPathInfos handling currently lets a single
sub->queryPathInfos() failure abort the whole lookup, which breaks the old
fallback behavior. Update the forEachAsync loop around the Store substituter
queries to handle failures per substituter: keep disabled substituters skipped
as before, catch/report errors from each sub->queryPathInfos() call, and
continue querying the remaining stores so tryFallback semantics are preserved.
Use the existing queryMissing, forEachAsync, and sub->queryPathInfos symbols to
locate the fix.
- Around line 328-338: The aggregation in misc.cc is double-counting substitute
results when multiple substituters return the same path. Update the loop over
infos in the substitution handling path so narSize and downloadSize are only
added once per unique path, matching the deduplication already done by
willSubstitute; use the existing path key in the infos iteration (and the
NarInfo cast) to track which paths have already been counted before adding
sizes.
In `@src/nix/main.cc`:
- Around line 495-496: The shutdown stats logging in Finally should not call
getFileTransfer(), because that helper can lazily create a new curlFileTransfer
during teardown and report the wrong stats. Update the Finally lambda near
printStats/debug to read the request count from the existing singleton instance
only, or guard against a missing instance without allocating a new one, using
getFileTransfer and curlFileTransfer as the relevant symbols to adjust.
In `@src/nix/serve.cc`:
- Around line 30-41: makeNarInfo currently lets computeFSClosure() failures
abort narinfo generation, turning an optional partialClosure hint into a hard
error. Update makeNarInfo in serve.cc so closure computation is best-effort:
catch or otherwise handle failures from store.computeFSClosure(info.path,
closure), leave ni.partialClosure empty when it cannot be computed, and still
return the rest of the NarInfo response for /foo.narinfo and /get-narinfos-v1.
- Around line 317-324: The POST handling in RequestState::body for the
/get-narinfos-v1 path currently buffers the entire upload without any limit, so
add an early size cap before appending upload_data. Update the request-handling
logic around the RequestState allocation and the upload_data_size check to
reject oversized bodies as soon as they exceed the allowed maximum, and return
an error response instead of continuing to buffer in memory.
---
Nitpick comments:
In `@src/libstore/remote-store.cc`:
- Line 229: RemoteStore::querySubstitutablePathInfos() still ignores
SubstitutablePathInfo::partialClosure, so update the remote-store worker
protocol or add an explicit compatibility note to account for it. Use the
existing querySubstitutablePathInfos() flow and the
SubstitutablePathInfo/partialClosure handling point to either transmit this
field end-to-end or clearly document that it is unsupported for now, removing
the FIXME once the chosen behavior is implemented.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a18fca0d-0e40-467a-84fd-9a40a835375f
📒 Files selected for processing (20)
doc/manual/source/protocols/nix-cache-info.mdsrc/libstore-tests/nar-info-disk-cache.ccsrc/libstore/binary-cache-store.ccsrc/libstore/filetransfer.ccsrc/libstore/http-binary-cache-store.ccsrc/libstore/include/nix/store/binary-cache-store.hhsrc/libstore/include/nix/store/filetransfer.hhsrc/libstore/include/nix/store/http-binary-cache-store.hhsrc/libstore/include/nix/store/nar-info-disk-cache.hhsrc/libstore/include/nix/store/nar-info.hhsrc/libstore/include/nix/store/path-info.hhsrc/libstore/include/nix/store/store-api.hhsrc/libstore/misc.ccsrc/libstore/nar-info-disk-cache.ccsrc/libstore/nar-info.ccsrc/libstore/remote-store.ccsrc/libstore/store-api.ccsrc/nix/main.ccsrc/nix/serve.cctests/functional/binary-cache.sh
💤 Files with no reviewable changes (1)
- tests/functional/binary-cache.sh
This allows the client to start prefetching .narinfo files faster.
This allows servers to advertise optional endpoints. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…in one HTTP call This adds an optional endpoint /get-narinfos-v1 to binary caches that allows querying multiple narinfos in one HTTP call, which may reduce latency and overhead for non-static caches like cache.flakehub.com. To expose this, Store now has a virtual method queryPathInfos() that HttpBinaryCacheStore overrides. queryMissing() has been rewritten to try to batch paths together. Note: queryMissing() now tries all substituters for a store path concurrently rather than sequentially, unlike querySubstitutablePathInfosAsync(). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Instead of a typed "Features" set, store all non-standard nix-cache-info fields (everything other than StoreDir/WantMassQuery/Priority) verbatim in a std::map<std::string, std::string> on CacheInfo and BinaryCacheStore. This keeps the format forward-compatible: a newer server can advertise a capability that an older client persists in its disk cache without understanding it (e.g. a future BloomFilter field). It also makes the endpoint for fetching multiple narinfos server-configurable: the server advertises "GetNarInfosV1: /get-narinfos-v1", and the client POSTs to whatever path that field specifies, falling back to per-path .narinfo fetches when it is absent. The disk cache stores the field map as JSON in the BinaryCaches table. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
08da7ee to
9934bb0
Compare
We don't use querySubstitutablePathInfos() anymore.
The /get-narinfos-v1 endpoint now returns a stream of newline-delimited JSON objects (one store path's metadata per line) rather than concatenated .narinfo text. Since this is a new interface, only the V2 JSON format is supported. The JSON serialisation of NarInfo gains a "partialClosure" field, and a keyed NarInfo::toJSON()/fromJSON() that includes the store path (as a "path" field) so each object is self-describing. The single /<hash>.narinfo endpoint still serves the legacy text format. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Motivation
This makes two improvements to the binary cache protocol:
PartialClosureto .narinfo files, which lets a store path declare not just its direct references but also some or all of its indirect references. This letsqueryMissing()traverse the closure graph faster, especially for high-latency binary caches./get-narinfos-v1that lets a binary cache query many store paths in one HTTP call. This is intended for dynamic binary caches (like FlakeHub/Attic) where there may be some non-trivial overhead to each HTTP call (as opposed to HTTP/2 calls to a static CDN like cache.nixos.org).Some timings (against a
nix-serveon localhost with a 100ms latency), showing wall clock time and number of HTTP requests:Context
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.
Summary by CodeRabbit
New Features
Bug Fixes