feat: add SOCI indexing plugin and bakery ci publish orchestrator#555

Open
ianpittwood wants to merge 55 commits into main from feature/soci-images

@ianpittwood ianpittwood commented May 28, 2026

Contributor

Summary

  • Adds a new soci builtin plugin that wraps soci convert / soci push and an in-module ctr image pull helper to materialize images in containerd. The plugin probes containerd namespaces (default, moby) and supports both non-standalone (default) and --standalone modes via SociOptions in bakery.yaml.
  • Standalone mode now performs a real OCI-layout round-trip. soci convert --standalone only accepts a local OCI image layout (tar/directory) for source and destination — it never touches a registry. SociConvertWorkflow._run_standalone bridges the registry on both ends with oras and needs no containerd: oras cp --to-oci-layout (registry → layout) → soci convert --standalone --format oci-dir → push by digest with oras cp --from-oci-layout (the converted layout is written untagged) → scratch cleanup. OrasCopy gained --from-oci-layout / --to-oci-layout flags to support this.
  • Splits OrasMergeWorkflow into three composable primitives (OrasIndexCreateWorkflow, OrasIndexCopyWorkflow, OrasIndexCleanupWorkflow) and rebuilds the existing merge workflow on top of them to preserve back-compat with bakery oras merge.
  • Adds bakery ci publish, which composes oras index-create → optional SOCI convert (gated by --enable-soci) → oras index-copy → cleanup. bakery ci merge is preserved as a thin alias that delegates with --no-enable-soci.
  • Adds a setup-soci composite action and wires an enable-soci workflow input + Publish step into bakery-build-native.yml. bakery-build.yml (QEMU) is explicitly deferred with an in-file note since it has no merge phase to interleave SOCI into.
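
The phase ordering described in the bullets above can be sketched as follows. This is only an illustration of the composition: `plan_publish_phases`, `plan_merge_phases`, and the phase labels are hypothetical names, not the identifiers used in `cli/ci.py`.

```python
def plan_publish_phases(enable_soci: bool) -> list[str]:
    """Return the ordered phases a publish run would execute."""
    phases = ["oras index-create"]
    if enable_soci:
        # SOCI conversion is optional and sits between create and copy.
        phases.append("soci convert")
    phases += ["oras index-copy", "cleanup"]
    return phases


def plan_merge_phases() -> list[str]:
    """The back-compat alias delegates to publish with SOCI disabled."""
    return plan_publish_phases(enable_soci=False)
```

Modelling `merge` as `publish` with SOCI switched off keeps a single code path for both commands, which is what the "thin alias" wording suggests.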

The implementation follows the spec at docs/superpowers/specs/2026-05-18-soci-indexing-design.md (local-only) and the task-by-task plan at docs/superpowers/plans/2026-05-18-soci-indexing.md (local-only).

Test plan

  • Full bakery test suite passes (no regressions in oras / soci / dgoss / ci suites)
  • `bakery soci --help` and `bakery soci convert --help` show the expected flags
  • `bakery ci --help` lists both `merge` (back-compat alias) and `publish` (new orchestrator)
  • Pre-commit hooks clean across all changed files
  • Standalone mode verified end-to-end: ran the real `SociConvertWorkflow._run_standalone` against a throwaway local registry — oras pulls the source into an OCI layout, `soci convert --standalone` produces a zTOC, and the `-soci` index is pushed back. Confirmed via `oras manifest fetch` that the published index contains the SOCI artifact (`artifactType` includes `soci`).
  • Manual: build a small image, run `bakery ci publish --enable-soci --dry-run` and confirm the soci convert / push commands are produced as expected
  • Manual: full run against a single image (e.g. `connect-content-init`) with `enable-soci: true` in a sibling repo's workflow — verify the published manifest via `oras manifest fetch` and confirm SOCI metadata is present
  • Manual: confirm layer-digest stability across `soci convert` (spec verification item #3)
  • Per-product opt-in (e.g. flipping `tool: soci, enabled: true` on a Connect image and `enable-soci: true` in `images-connect`'s production workflow) lands as a follow-up PR in that repo, not here.
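
The `oras manifest fetch` check from the test plan could be automated roughly as below. This is a sketch under assumptions: it expects the SOCI artifact to appear as a descriptor in the index's `manifests` array with an `artifactType` containing `soci`, and `index_has_soci_artifact` is a hypothetical helper, not part of bakery.

```python
import json


def index_has_soci_artifact(index_json: str) -> bool:
    """Return True if any descriptor in the index advertises a SOCI artifactType."""
    index = json.loads(index_json)
    for descriptor in index.get("manifests", []):
        # artifactType is optional on descriptors, so tolerate its absence.
        artifact_type = descriptor.get("artifactType") or ""
        if "soci" in artifact_type.lower():
            return True
    return False
```

Feeding it the JSON printed by `oras manifest fetch <ref>` for the published `-soci` index gives a yes/no answer that a CI step can assert on.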

🤖 Generated with Claude Code

@ianpittwood ianpittwood force-pushed the feature/soci-images branch from db0ba57 to e779f7d Compare May 28, 2026 22:52
@github-actions

github-actions Bot commented May 28, 2026

Test Results

1 656 tests  +52   1 656 ✅ +52   9m 39s ⏱️ + 1m 22s
    1 suites ± 0       0 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit 91a77d5. ± Comparison against base commit 99222b6.

♻️ This comment has been updated with latest results.

ianpittwood added a commit that referenced this pull request May 29, 2026
The CI runner reports a fixed terminal width of 80 columns and emits
rich-styled help output, which causes typer/rich to wrap long option
names like `--enable-soci/--no-enable-soci` across rows with
embedded ANSI escapes. That defeats substring assertions like
`"--enable-soci" in result.stdout` even though the option is present.

Pass `COLUMNS=200`, `TERM=dumb`, and `NO_COLOR=1` via the
CliRunner's `env` argument so the rendered help is wide enough to
keep flag names on a single line and unstyled enough that no escape
codes get interleaved with the option text. Applied to all four
help-output assertions in `test_ci_publish.py` and `test_cli.py`.

Fixes the `test_publish_command_flags_present` failure on PR #555 CI.
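
A minimal sketch of the fix described in this commit message. `HELP_ENV`, `invoke_help`, and `assert_flag_listed` are hypothetical helper names; `runner` is assumed to be a `typer.testing.CliRunner` (or any object whose `invoke` accepts an `env` keyword), and `app` is the Typer application under test.

```python
# Environment that keeps rendered help wide and free of ANSI styling.
HELP_ENV = {"COLUMNS": "200", "TERM": "dumb", "NO_COLOR": "1"}


def invoke_help(runner, app, *command: str):
    """Invoke `<command> --help` through a CliRunner-style object with HELP_ENV applied."""
    return runner.invoke(app, [*command, "--help"], env=HELP_ENV)


def assert_flag_listed(result, flag: str) -> None:
    """Plain substring check; only reliable once wrapping and styling are disabled."""
    assert flag in result.stdout, f"{flag} missing from help output"
```

A test would then call `invoke_help(runner, app, "ci", "publish")` and pass the result to `assert_flag_listed`. Keeping the environment in one constant means every help-output assertion gets the same rendering conditions.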
ianpittwood added a commit that referenced this pull request Jun 1, 2026
@ianpittwood ianpittwood force-pushed the feature/soci-images branch from 87e9953 to d35d4be Compare June 1, 2026 15:54
ianpittwood added a commit that referenced this pull request Jun 1, 2026
@ianpittwood ianpittwood force-pushed the feature/soci-images branch from d35d4be to e411f75 Compare June 1, 2026 16:13
@ianpittwood ianpittwood marked this pull request as ready for review June 1, 2026 16:54
@ianpittwood ianpittwood requested a review from bschwedler as a code owner June 1, 2026 16:54
@ianpittwood

Contributor Author

Commit 939587f MUST be dropped before merge.

Copilot AI left a comment

Pull request overview

Adds a SOCI indexing plugin and a new bakery ci publish orchestrator that composes oras index-create → optional SOCI convert → oras index-copy. Splits the existing OrasMergeWorkflow into composable primitives, adds a setup-soci composite GitHub Action, and wires an enable-soci input through bakery-build-native.yml. The legacy bakery ci merge is preserved as a thin alias delegating to publish with SOCI disabled.

Changes:

  • New soci builtin plugin (SociConvert/SociPush/ContainerdImagePull/SociConvertWorkflow) with namespace probing and standalone mode.
  • Refactor OrasMergeWorkflow into OrasIndexCreateWorkflow + OrasIndexCopyWorkflow primitives; add bakery ci publish that orchestrates create → soci → copy.
  • Add setup-soci composite action and an enable-soci input to bakery-build-native.yml; update consumers to call bakery ci publish with the SOCI flag.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `setup-soci/action.yml` | New composite action installing soci CLI and exposing the containerd socket. |
| `.github/workflows/bakery-build-native.yml` | Adds enable-soci input, Setup SOCI step, and switches merge step to bakery ci publish. |
| `.github/workflows/bakery-build.yml` | Adds an explanatory note that SOCI is intentionally not wired into the QEMU workflow. |
| `posit-bakery/pyproject.toml` | Registers SociPlugin as a builtin entry point. |
| `posit-bakery/posit_bakery/plugins/builtin/soci/__init__.py` | New SociPlugin with execute/results and the bakery soci convert CLI. |
| `posit-bakery/posit_bakery/plugins/builtin/soci/soci.py` | SOCI command wrappers and the convert workflow with namespace probing. |
| `posit-bakery/posit_bakery/plugins/builtin/soci/options.py` | SociOptions config model with merge semantics. |
| `posit-bakery/posit_bakery/plugins/builtin/oras/oras.py` | Splits merge workflow into OrasIndexCreateWorkflow and OrasIndexCopyWorkflow; reuses them from OrasMergeWorkflow. |
| `posit-bakery/posit_bakery/cli/ci.py` | Adds bakery ci publish; converts bakery ci merge into an alias delegating with --no-enable-soci. |
| `posit-bakery/test/...` | New unit tests for soci plugin/options/commands/workflow, oras index primitives, ci publish/merge alias; updates to existing ci merge tests. |


Comment thread .github/workflows/bakery-build-native.yml
Comment thread .github/workflows/bakery-build.yml
Comment thread posit-bakery/posit_bakery/cli/ci.py
Comment thread posit-bakery/posit_bakery/cli/ci.py Outdated
Comment thread posit-bakery/posit_bakery/cli/ci.py Outdated
@bschwedler bschwedler requested review from jforest and jspiewak June 3, 2026 13:27
Comment thread .github/workflows/bakery-build-native.yml Outdated
@jspiewak

jspiewak commented Jun 3, 2026

I suggest updating to SOCI 0.13.0 and using the new --standalone to avoid needing to rely upon containerd.
See https://github.com/rstudio/aws-main-services/pull/1932

@ianpittwood

Contributor Author

> I suggest updating to SOCI 0.13.0 and using the new --standalone to avoid needing to rely upon containerd. See rstudio/aws-main-services#1932

I tried it out, but --standalone would require us to export the image each time which turns into a storage issue for us.

@jspiewak

jspiewak commented Jun 3, 2026

> > I suggest updating to SOCI 0.13.0 and using the new --standalone to avoid needing to rely upon containerd. See rstudio/aws-main-services#1932
>
> I tried it out, but --standalone would require us to export the image each time which turns into a storage issue for us.

Can you say more? I don't have enough of a picture to understand the optimization here.

@ianpittwood

Contributor Author

> > > I suggest updating to SOCI 0.13.0 and using the new --standalone to avoid needing to rely upon containerd. See rstudio/aws-main-services#1932
> >
> > I tried it out, but --standalone would require us to export the image each time which turns into a storage issue for us.
>
> Can you say more? I don't have enough of a picture to understand the optimization here.

I guess we can since now I'm remembering we do a mostly incremental pull/merge/push process rather than parallel. I could see exporting the tarball costing us 30GB+ for some image constructions though. We would need to:

  • Pull both platform builds. We build multiplatform images on native architectures to avoid the speed hit for emulation. It also allows us to test on native architectures.
  • Merge the individual platforms together.
  • Export the image to a tarball. This at least doubles the disk usage.
  • Run the SOCI conversion in standalone, push it to temp.
  • Push/copy images to final destinations.
  • Clear local tarball and image copies.

When running on expanded runners, I think we get 100GB of disk storage. If that is the case, we should be able to do standalone mode. We just need to be very diligent about clean up after each image is processed.
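
The disk-budget reasoning above can be turned into a preflight check. This is a rough sketch only: `has_disk_headroom` and `free_bytes_at` are hypothetical helpers, and the default of two copies reflects the "at least doubles" estimate rather than a measured figure.

```python
import shutil


def has_disk_headroom(image_bytes: int, free_bytes: int, copies: int = 2) -> bool:
    """True if `copies` full copies of the image fit in the available space."""
    return image_bytes * copies <= free_bytes


def free_bytes_at(path: str = ".") -> int:
    """Free space on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free
```

Under that two-copy estimate a 30 GB image fits on a 100 GB runner while a 60 GB image does not, which is why cleanup after each image matters.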

@jspiewak

jspiewak commented Jun 3, 2026

ianpittwood added a commit that referenced this pull request Jun 5, 2026
@ianpittwood ianpittwood force-pushed the feature/soci-images branch from 945478a to 2e407ab Compare June 5, 2026 17:55
ianpittwood added a commit that referenced this pull request Jun 8, 2026
@ianpittwood ianpittwood force-pushed the feature/soci-images branch from 0748df3 to dbe2902 Compare June 8, 2026 17:08
ianpittwood added a commit that referenced this pull request Jun 8, 2026
@ianpittwood ianpittwood force-pushed the feature/soci-images branch from dbe2902 to 79d625d Compare June 8, 2026 17:12
ianpittwood and others added 28 commits June 8, 2026 15:44
In standalone mode `soci convert --standalone` only accepts a local OCI
image layout (tar/directory) for both source and destination — it never
touches a registry. The previous standalone path passed the temp-registry
refs straight to `soci convert --standalone`, which fails with
"failed to access input <ref>: no such file or directory".

Rework `SociConvertWorkflow._run_standalone` to bridge the registry on both
ends with oras (no containerd/ctr needed):

  1. oras cp --to-oci-layout <source_ref> <scratch>/src   (registry -> layout)
  2. soci convert --standalone --format oci-dir <src> <out>
  3. read <out>/index.json digest (soci writes the result untagged)
  4. oras cp --from-oci-layout <out>@<digest> <source_ref>-soci (layout -> registry)
  5. remove the scratch layouts (success and failure)

Supporting changes:
- OrasCopy gains --from-oci-layout / --to-oci-layout flags.
- SociConvertWorkflow gains an oras_bin field; SociPlugin.execute resolves
  it via find_oras_bin and passes it through.
- _build_convert takes optional source/destination/output_format overrides.
- destination_ref stays a registry ref in both modes; fix stale docstring
  that claimed standalone refs are filesystem paths.

Verified end-to-end: the real workflow pulls, converts, and pushes a
`-soci` index (with the SOCI artifact) to a local registry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
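
The numbered round-trip in the commit message above can be sketched as command builders plus a digest reader. The argument shapes mirror the commit message; the function names, and the assumption that the converted layout's `index.json` lists exactly one manifest, are illustrative and not taken from `SociConvertWorkflow`.

```python
import json
from pathlib import Path


def pull_command(source_ref: str, scratch: Path) -> list[str]:
    # Step 1: registry -> local OCI layout.
    return ["oras", "cp", "--to-oci-layout", source_ref, str(scratch / "src")]


def convert_command(scratch: Path) -> list[str]:
    # Step 2: file-to-file conversion; no registry or containerd involved.
    return [
        "soci", "convert", "--standalone", "--format", "oci-dir",
        str(scratch / "src"), str(scratch / "out"),
    ]


def read_layout_digest(layout_dir: Path) -> str:
    # Step 3: the converted layout is untagged, so recover its digest from index.json.
    index = json.loads((layout_dir / "index.json").read_text())
    manifests = index["manifests"]
    if len(manifests) != 1:
        raise ValueError(f"expected exactly one manifest, found {len(manifests)}")
    return manifests[0]["digest"]


def push_command(source_ref: str, scratch: Path, digest: str) -> list[str]:
    # Step 4: local OCI layout -> registry, addressed by digest.
    return [
        "oras", "cp", "--from-oci-layout",
        f"{scratch / 'out'}@{digest}", f"{source_ref}-soci",
    ]
```

Step 5 would wrap the whole sequence in `try`/`finally` and remove the scratch directory (for example with `shutil.rmtree`) so the layouts are cleaned up on both success and failure.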
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The bakery ci merge -> publish refactor dropped the post-copy manifest
inspection the old merge ran with `docker buildx imagetools inspect`.
Re-add an existence check using ORAS, which is faster and more reliable.

Adds OrasManifestFetch (`oras manifest fetch --descriptor`) and an
OrasIndexVerifyWorkflow primitive, then wires a Phase 4 verify step into
`bakery ci publish` that fails the command if any destination tag does
not resolve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
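
A minimal sketch of the existence check described above, assuming a non-zero exit from `oras manifest fetch --descriptor` means the tag does not resolve. `tag_resolves` and `unresolved_tags` are hypothetical names, not the `OrasManifestFetch` / `OrasIndexVerifyWorkflow` implementations.

```python
import subprocess
from typing import Callable, Iterable


def tag_resolves(ref: str, oras_bin: str = "oras") -> bool:
    """True if `oras manifest fetch --descriptor <ref>` exits successfully."""
    result = subprocess.run(
        [oras_bin, "manifest", "fetch", "--descriptor", ref],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def unresolved_tags(
    refs: Iterable[str],
    resolves: Callable[[str], bool] = tag_resolves,
) -> list[str]:
    """Return every destination ref that does not resolve."""
    return [ref for ref in refs if not resolves(ref)]
```

A verify phase would fail the command whenever `unresolved_tags(...)` is non-empty. Injecting `resolves` keeps the check testable without a registry.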
Remove the redundant enable-soci flag layer and rely entirely on the
per-image/variant `soci` options (`enabled: true`) to control SOCI
conversion. The publish command now always runs the convert phase; the
SOCI plugin already skips targets whose resolved options have
enabled=False, so the CLI/workflow flag was a redundant gate.

- Drop the --enable-soci/--no-enable-soci option from `bakery ci publish`
  and make Phase 2 (SOCI convert) unconditional.
- Drop enable_soci from the `bakery ci merge` alias.
- Remove the enable-soci workflow input; run Setup SOCI unconditionally
  and stop passing a SOCI flag to `bakery ci publish`.
- Update tests and the manual test plan to reflect config-driven SOCI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SociPlugin.execute resolved the soci/ctr/oras binaries eagerly and
unconditionally, before the dry-run-aware workflow ran. find_bin raises
when a tool is absent, so `bakery ci publish --dry-run` aborted with
BakeryToolNotFoundError on any host lacking those tools — even though a
dry run executes nothing and should only log the commands it would run.

In dry-run mode, tolerate a missing tool and fall back to the bare name
for the logged command; keep the resolved path when the tool is present
so dry-run output stays accurate. Outside dry-run, a missing tool
remains a hard error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
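
The dry-run fallback described above can be sketched with `shutil.which`. `ToolNotFoundError` is a hypothetical stand-in for `BakeryToolNotFoundError`, and `resolve_tool` is an illustrative helper rather than bakery's `find_bin`.

```python
import shutil


class ToolNotFoundError(RuntimeError):
    """Hypothetical stand-in for BakeryToolNotFoundError."""


def resolve_tool(name: str, dry_run: bool) -> str:
    """Resolve a tool to its path; in dry-run, fall back to the bare name."""
    path = shutil.which(name)
    if path is not None:
        # Keep the resolved path so logged commands stay accurate.
        return path
    if dry_run:
        # Nothing executes in a dry run, so the bare name is enough for logging.
        return name
    raise ToolNotFoundError(f"{name} not found on PATH")
```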
execute() hard-resolved soci, ctr, and oras for every run, but standalone
mode bridges the registry with oras and never touches containerd, while
the containerd-backed path uses ctr but not oras. A real standalone
conversion therefore aborted with BakeryToolNotFoundError on hosts without
ctr installed, even though ctr is never invoked.

Only require a tool when an eligible target will actually execute it:
soci is always required, ctr only when a containerd-backed target is
present, oras only when a standalone target is present. Unused tools (and
all tools under --dry-run) fall back to the bare name instead of aborting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
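
The per-mode tool requirements described above reduce to a small rule. In this sketch, `required_tools` is a hypothetical helper that takes one `standalone` flag per eligible target.

```python
def required_tools(standalone_flags: list[bool]) -> set[str]:
    """Given one `standalone` flag per eligible target, return the tools a run needs."""
    tools = {"soci"}  # always required
    if any(standalone_flags):
        # Standalone targets bridge the registry with oras.
        tools.add("oras")
    if any(not flag for flag in standalone_flags):
        # Containerd-backed targets pull with ctr.
        tools.add("ctr")
    return tools
```

Any tool outside the returned set would fall back to its bare name instead of aborting the run.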
ci publish passes every config target to soci.execute, but with dev and
matrix versions included that spans many versions and streams — far more
than the targets present in a given set of build metadata. Only targets
with merge sources get a temp ref in the index-create phase, so every
other SOCI-enabled target arrived at execute() with no source ref and was
reported as "SOCI convert failed: no source ref provided", flipping the
whole run to a failure even though the in-scope targets converted fine.

A SOCI-enabled target with no source ref is simply not part of this run
(no merge sources / build metadata for it), so skip it like a disabled
target rather than surfacing a spurious conversion failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
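
The skip rule described above, sketched with a hypothetical `Target` tuple and `plan_soci` helper. The real plugin works on bakery's own target objects, so the field names here are assumptions.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    name: str
    soci_enabled: bool
    source_ref: Optional[str]  # temp ref from index-create, if any


def plan_soci(targets: list[Target]) -> tuple[list[Target], list[Target]]:
    """Split targets into those to convert and those to skip silently."""
    to_convert, skipped = [], []
    for target in targets:
        if target.soci_enabled and target.source_ref:
            to_convert.append(target)
        else:
            # Disabled, or enabled but not part of this run: skip, do not fail.
            skipped.append(target)
    return to_convert, skipped
```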
The merge->publish refactor dropped the --image-name filter that PR #566
added, breaking the per-image merge fan-out: bakery-build-native.yml still
invokes `bakery ci publish --image-name "^${IMAGE_NAME}$"`, so every per-image
publish job failed with "no such option".

Restore the regex --image-name option on `publish` (wired into
BakeryConfigFilter) and forward it from the `merge` alias. Also scope publish
to the targets actually present in the provided metadata files rather than all
config.targets, so a single set of files no longer drags in every other
version/variant for each phase to re-skip.

Harden BuildMetadata.image_tags/image_ref against a null image.name (return
[]/None instead of crashing on None.split), and have get_merge_sources skip
metadata with no resolvable ref rather than emitting a null source.

Bring the two stale merge filter scenarios in line with the new publish flow:
pass --temp-registry (required by Phase 1 index-create) and give the
multi-image testdata valid image.name + digest so the filtered target merges.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
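
A sketch of the null-hardening and the anchored name filter described above. It assumes `image.name` is a comma-separated list of tags (as in buildx metadata files) and that the filter uses `re.search`. Both helper functions are illustrative, not the `BuildMetadata` or `BakeryConfigFilter` code.

```python
import re
from typing import Optional


def image_tags(image_name: Optional[str]) -> list[str]:
    """Split a comma-separated image.name into tags; tolerate a missing value."""
    if not image_name:
        return []
    return [tag.strip() for tag in image_name.split(",") if tag.strip()]


def matches_image_name(name: str, pattern: str) -> bool:
    """Regex filter; callers anchor the pattern (e.g. ^name$) for exact matches."""
    return re.search(pattern, name) is not None
```

The anchors matter for the per-image fan-out: without them, a filter for one image name would also select every image whose name merely contains it.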
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a `sudo` field and `_sudo_prefix` property to the `SociCommand` base
class so that both `SociConvert` and `SociPush` prepend `["sudo", "-n"]`
when `sudo=True`, mirroring the same capability added to `ContainerdImagePull`.
Adds SociPrivilegeError and resolve_sudo_prefix() to determine the correct
command prefix ([] or ["sudo", "-n"]) for containerd-touching commands,
raising on a real run when neither root nor passwordless sudo is available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
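
The prefix selection described above could look roughly like this. `PrivilegeError` stands in for `SociPrivilegeError`, `sudo_prefix` is a hypothetical signature, and returning the sudo prefix during a dry run is an assumption based on the "raising on a real run" wording.

```python
import subprocess


class PrivilegeError(RuntimeError):
    """Hypothetical stand-in for SociPrivilegeError."""


def passwordless_sudo_available() -> bool:
    """True if `sudo -n true` succeeds without prompting for a password."""
    try:
        return subprocess.run(["sudo", "-n", "true"], capture_output=True).returncode == 0
    except FileNotFoundError:
        return False


def sudo_prefix(is_root: bool, can_sudo: bool, dry_run: bool = False) -> list[str]:
    """Pick the command prefix for containerd-touching commands."""
    if is_root:
        return []
    if can_sudo or dry_run:
        # Assumption: a dry run only logs commands, so it never raises here.
        return ["sudo", "-n"]
    raise PrivilegeError("need root or passwordless sudo to reach containerd")
```

On POSIX hosts a caller could pass `os.geteuid() == 0` for `is_root` and `passwordless_sudo_available()` for `can_sudo`.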
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The workflow now uses the fixed "default" namespace, so the
candidate_namespaces field on SociOptions is no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Within the `if not standalone and not dry_run:` guard, `dry_run` is
always `False`, making `dry_run=dry_run` a misleading no-op. Use the
default (`dry_run=False`) instead for clarity.
…ment gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Containerd-backed SOCI conversion required substantial per-user and
per-environment setup (running containerd, an accessible root-owned
socket, sudo) that made it unreliable in practice. The standalone
(oras-based, file-to-file) path is foolproof, so make it the only mode.

Remove the `--soci-mode` flag from `bakery soci convert` and
`bakery ci publish`, along with the containerd machinery it selected:
`SociModeEnum`, `ContainerdImagePull`, `SociPush`, `find_ctr_bin`,
`resolve_sudo_prefix`/`SociPrivilegeError`, and the sudo/namespace
fields on `SociCommand`/`SociConvertWorkflow`. `SociConvert` now always
emits `convert --standalone`, and the workflow's `run()` is just the
oras pull -> soci convert -> oras push round-trip. Binary resolution no
longer requires `ctr`.

Drop the now-vestigial containerd socket step from the setup-soci
action and update its description.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ianpittwood ianpittwood force-pushed the feature/soci-images branch from f8f6c0e to 91a77d5 Compare June 8, 2026 21:55
@ianpittwood ianpittwood requested a review from bschwedler June 8, 2026 22:04