Content-address the Docker BuildKit cache#28482
Conversation
…isses The core/full/e2e image builds read/write a single mutable registry cache ref (:cache-main), overwritten by every main push. Under heavy concurrent main builds this misses ~70% of the time on no-dep-change runs (the cache key is stable; the shared mutable ref is unreliable), adding 60-140s to the critical path per miss. Key the cache by a hash of the dependency inputs instead: - cache-from: :cache-deps-<hash> (content tag) then :cache-main (fallback) - cache-to (only when pushing to GHCR): - push to main: :cache-deps-<hash> + :cache-main (keep fallback warm) - pull_request: :cache-pr-<n> only (don't pollute the shared content tag) Cache strings are computed in the strategy steps and emitted as outputs rather than nested inline. e2e cache-from stays gated on should-push (its fork path uses the plain docker driver, which can't import registry cache). No non-cache behavior changes; fork/artifact path is unaffected.
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #28482 +/- ##
=======================================
Coverage 73.74% 73.75%
=======================================
Files 1541 1541
Lines 132184 132231 +47
Branches 15784 15793 +9
=======================================
+ Hits 97484 97528 +44
+ Misses 33734 33714 -20
- Partials 966 989 +23
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Prototype for PLA-80. Draft — needs a real-CI A/B before merge.
Problem
The core/full/e2e image builds read/write a single mutable registry cache ref
:cache-main, overwritten by everymainpush. Sampling 20 recentmainruns, the install layer misses ~70% of the time on no-dep-change commits (6–7s hit vs 76–139s miss), adding 60–140s to the critical path per miss.The cache key is provably stable —
pnpm deploy/pack.js output is byte-deterministic, and both hits and misses changed 0 cache-key inputs. The misses alternate within the hour, far finer than the dailycleanup-ghcrjob, pointing at the single mutable shared ref being unreliable under Ghost main's frequent, cancel-happy concurrency (likely BuildKitmode=maxre-export thinning the shared tag after a cache hit).Change
Key the cache by a hash of the dependency inputs instead of one shared tag:
:cache-deps-<hash>(content tag, tried first) →:cache-main(fallback)main::cache-deps-<hash>+:cache-main(keeps the fallback warm for first-build-with-new-deps)pull_request::cache-pr-<n>only (PR builds don't pollute the shared content tag)<hash>=hashFiles('pnpm-lock.yaml','Dockerfile.production')(core/full) /+ 'e2e/Dockerfile.e2e'(e2e). Cache strings are computed in thestrategysteps and emitted as outputs rather than nested inline ternaries.Each dependency state gets its own immutable cache tag, so a dep-stable build reliably hits regardless of concurrent main pushes.
Expected impact
~70% misses → consistent hits → ~60–130s off the critical path on most builds (dep-stable is the common case).
Notes / things to verify on a real run
mode=max"thinning on re-export" is unproven. Writing both:cache-deps-<hash>and:cache-mainon main is intentional, but whether the second export thins the first must be confirmed viaBUILDKIT_PROGRESS=plainlogs. Core/full build steps already set plain progress; consider adding it to the e2e build step for the first verification run.cleanup-ghcr.ymlchange. Its age cutoff already prunes oldcache-deps-*tags. Edge case (follow-up): a dependency set stable longer than the cutoff would have its tag pruned and fall back to:cache-main— still correct, just a miss.actionlintclean (only pre-existing style-level SC2086 notes consistent with the rest of the file).