feat(marketplace): GPU (CUDA) plugin bundle variants + bundle-based demo image#638
feat(marketplace): GPU (CUDA) plugin bundle variants + bundle-based demo image#638staging-devin-ai-integration[bot] wants to merge 10 commits into
Conversation
…emo image Add optional per-plugin CUDA bundle variants alongside the canonical CPU bundle, end-to-end across schema, install client, release pipeline, and the CPU demo image. - marketplace.rs: add backward-compatible variants[] to manifests (empty => byte-identical to pre-variant CPU manifests). - marketplace_installer.rs: resolve bundle by explicit accelerator or CUDA auto-detect (libcuda probe), falling back to the CPU bundle. - build_registry.py / verify_bundles.py: --accelerator passes layer a cuda variant onto the published CPU manifest (append-only, immutable); CUDA NEEDED/RUNPATH deps allowlisted for cuda bundles only. - CI: build-marketplace-cuda job on the self-hosted Ada GPU runner builds + publishes <id>-<ver>-cuda-bundle.tar.zst to the per-plugin release. - Dockerfile.demo: fetch sha256-pinned prebuilt CPU marketplace bundles instead of rebuilding 11 plugins from source; drop the from-source sherpa-onnx copy (bundles vendor their own libs) and the torch-based Helsinki conversion (pull pre-converted model tarballs). Signed-off-by: streamkit-devin <devin@streamkit.dev>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #638 +/- ##
==========================================
+ Coverage 85.22% 85.24% +0.01%
==========================================
Files 249 249
Lines 75412 75765 +353
Branches 2437 2314 -123
==========================================
+ Hits 64272 64584 +312
- Misses 11134 11175 +41
Partials 6 6
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| @@ -803,7 +817,7 @@ impl PluginInstaller { | |||
| } | |||
There was a problem hiding this comment.
📝 Info: Accelerator marker keyed only by id+version; switching variants requires manual uninstall
The install directory and download cache are keyed by plugin id + version only, so a CPU and CUDA build of the same version cannot coexist. The new .accelerator marker plus the mismatch check in handle_bundle_install (marketplace_installer.rs:811-826) correctly turns a variant switch into an explicit error rather than a silent no-op. Legacy bundles installed before this marker existed read as None and skip the check, so a machine that gains a GPU and auto-detects cuda will get the CPU error message only after the marker is present; for pre-marker installs the variant simply stays whatever was first installed. This is the intended behavior per the marker doc comment, noted here for reviewer awareness of the UX (no automatic re-install on accelerator change).
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
| features=() | ||
| if grep -qE '^[[:space:]]*cuda[[:space:]]*=' "${plugin_dir}/Cargo.toml"; then | ||
| features=(--features cuda) | ||
| fi | ||
|
|
||
| echo "Building CUDA plugin: ${plugin} ${features[*]:-}" | ||
| ( | ||
| cd "${plugin_dir}" | ||
| CARGO_TARGET_DIR="${target_dir}" cargo build --release "${features[@]}" | ||
| ) |
There was a problem hiding this comment.
🚩 helsinki/whisper CUDA builds rely on toolkit + dynamic CUDA loading on self-hosted runner
build_official_plugins_cuda.sh builds plugins with --features cuda only when their Cargo.toml declares a cuda = feature. whisper (whisper-rs/cuda) and helsinki (candle-core/cuda etc.) both qualify. These require an NVIDIA CUDA toolkit/nvcc at build time, which the build-marketplace-cuda job assumes is preinstalled on the self-hosted runner (only stated in a comment, not enforced). Candle/cudarc and whisper.cpp CUDA generally dlopen libcudart/libcublas/libcudnn at runtime, which verify_bundles.py --accelerator cuda permits via CUDA_DEP_ALLOWLIST. No code defect, but the CUDA build correctness depends entirely on runner provisioning that CI does not validate.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
| # libfdk-aac2 is in Debian's non-free component. | ||
| # sherpa-onnx / ONNX Runtime are NOT installed here: each sherpa-backed | ||
| # bundle vendors its own libs (RUNPATH=$ORIGIN). |
There was a problem hiding this comment.
🚩 Demo image now depends on marketplace bundles being fully self-contained
The runtime stage no longer installs sherpa-onnx/onnxruntime into /usr/local/lib (previously copied from kokoro-builder), relying on each sherpa-backed bundle vendoring its own libs with RUNPATH=$ORIGIN. whisper's ort (onnxruntime) for Silero VAD and helsinki's Candle are expected to statically link / self-contain. This matches what external marketplace users install, so if a bundle is not self-contained it is a marketplace-wide regression rather than demo-specific. Confirmed whisper uses ort and helsinki uses Candle (not CTranslate2), so no CTranslate2 runtime dependency is missing from the slimmed runtime image. Worth a one-time validation that the demo container actually loads all 12 plugins.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
|
|
||
| # Variant passes (e.g. cuda) layer onto an already-published registry, so | ||
| # copy the full existing tree forward; untouched plugins must stay present | ||
| # and only the variant-updated manifests are overwritten below. | ||
| if accelerator != "cpu": | ||
| if not args.existing_registry: | ||
| print( | ||
| "ERROR: --accelerator other than 'cpu' requires --existing-registry", | ||
| file=sys.stderr, | ||
| ) | ||
| return 1 | ||
| src_plugins = existing_registry_path / "plugins" | ||
| if src_plugins.exists(): | ||
| shutil.copytree(src_plugins, registry_out / "plugins", dirs_exist_ok=True) |
There was a problem hiding this comment.
📝 Info: CUDA registry pass correctly preserves full tree, index, and pubkey
The cuda accelerator pass copies the existing CPU registry's plugins/ tree forward (build_registry.py:466-475), overwrites only variant-updated manifests, then regenerates index.json from existing_registry (loaded from the CPU registry) and copies streamkit.pub from the checked-out repo default path (build_registry.py:749-751). Since the CUDA job runs actions/checkout, the default docs/public/registry/streamkit.pub exists, so the merged registry uploaded with overwrite=true is complete. The append-only immutability check correctly diffs only non-variants fields for variant passes and carries variants forward on CPU reuse passes. Logic verified as consistent.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
publish-registry depends on build-marketplace-cuda, which is skipped when build_cuda=false. A skipped dependency fails the implicit success() gate, so CPU registry publishing was silently skipped too. Gate publish-registry on the CPU build succeeding and the CUDA build not failing. Signed-off-by: streamkit-devin <devin@streamkit.dev>
| let accelerator = self.resolve_accelerator(request.accelerator.as_deref()); | ||
| let bundle = manifest.resolve_bundle(accelerator.as_deref()).ok_or_else(|| { | ||
| InstallError::Other(anyhow!("Plugin manifest missing required `bundle` section")) | ||
| })?; | ||
| if let Some(requested) = accelerator.as_deref() { | ||
| tracing::info!( | ||
| plugin_id = %manifest.id, | ||
| requested_accelerator = requested, | ||
| selected_accelerator = bundle.accelerator, | ||
| "Resolved plugin bundle variant" | ||
| ); | ||
| } | ||
| let bundle_url = self |
There was a problem hiding this comment.
📝 Info: Accelerator auto-detection installs CPU bundle for CPU-only plugins (expected)
resolve_accelerator in apps/skit/src/marketplace_installer.rs:958-971 can return Some("cuda") purely from runtime auto-detection (cuda_runtime_available) even for plugins that declare no CUDA variant. PluginManifest::resolve_bundle (apps/skit/src/marketplace.rs:261-282) then falls back to the canonical CPU bundle when no matching variant exists, so the install still succeeds with the CPU artifact. This is the intended graceful fallback, but note the emitted log line Resolved plugin bundle variant requested=cuda selected=cpu will appear for every install on a CUDA host, which could be confusing during debugging.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Acknowledged — the CPU-fallback behavior is intended. The log wording (requested=cuda selected=cpu on a CUDA host for CPU-only plugins) is cosmetic; happy to soften it to a debug-level line or note "no cuda variant; using cpu" if you'd prefer, but leaving as-is for now to keep this PR scoped.
build_registry.py requires --signing-key, but the CUDA registry variant step omitted it, so argparse would abort the GPU registry build (and, via the publish-registry guard, block CPU publishing too). Signed-off-by: streamkit-devin <devin@streamkit.dev>
…switch Addresses review findings on the marketplace GPU bundle work: - Make build-marketplace-cuda best-effort (continue-on-error) so a failed GPU job no longer blocks CPU per-plugin releases or registry publishing; simplify the publish-registry guard to gate solely on the CPU build. - Guard the empty Bash array expansion in build_official_plugins_cuda.sh so feature-less plugins build under set -u on Bash < 4.4. - Embed manifest.json in CUDA variant bundles for parity with CPU bundles. - Record the activated accelerator in an installed bundle and return a clear 'already installed as <variant>' error instead of a silent no-op when a request would switch the variant for an already-installed version. Signed-off-by: streamkit-devin <devin@streamkit.dev>
| features=() | ||
| if grep -qE '^[[:space:]]*cuda[[:space:]]*=' "${plugin_dir}/Cargo.toml"; then | ||
| features=(--features cuda) | ||
| fi | ||
|
|
||
| echo "Building CUDA plugin: ${plugin} ${features[*]:-}" | ||
| ( | ||
| cd "${plugin_dir}" | ||
| CARGO_TARGET_DIR="${target_dir}" cargo build --release ${features[@]+"${features[@]}"} |
There was a problem hiding this comment.
📝 Info: CUDA-capable plugins relying on sherpa must keep sherpa linkage or the cuda build aborts the whole batch
build_official_plugins_cuda.sh:64-74 errors out (exit 1, aborting the entire script under set -e) for any plugin that declares the cuda accelerator but has neither a cuda Cargo feature nor libsherpa-onnx-c-api.so linkage. Today all six cuda-declared plugins satisfy this (whisper/helsinki have a cuda feature; kokoro/matcha/sensevoice/vad link sherpa), so the batch builds fine. But this is a sharp edge: if a future cuda-declared plugin drops sherpa linkage or its cuda feature, the guard aborts the whole CUDA build (not just that plugin), silently producing zero cuda variants while the job stays green due to continue-on-error.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Resolved in 6b4b1e8. build_official_plugins_cuda.sh now fails loudly for any cuda-declared plugin that ends up as a plain CPU build: after compiling, if the plugin has no cuda Cargo feature, it readelf-checks the produced artifact for libsherpa-onnx-c-api.so linkage (the EP-agnostic sherpa runtime that build_registry.py packages against GPU sherpa libs). If neither a cuda feature nor sherpa linkage is present, it errors out rather than packaging an unaccelerated .so as the cuda variant.
… variant build Asset types (e.g. Slint's) declared in plugin.yml were dropped by build_manifest, so manifest.json never carried them. The server only learns asset types from a plugin.yml beside the .so, so both raw-extraction consumers (the demo image) and real marketplace installs lost the type. - build_manifest now includes a non-empty assets block (omitted when empty, keeping asset-less manifests byte-identical for the append-only check). - build_bundle embeds a plugin.yml beside the entrypoint so bundles are self-describing, matching what the installer writes post-download. - Bump slint 0.4.0 -> 0.5.0 so the corrected, assets-bearing manifest can be published under the append-only registry; inject the source plugin.yml into the demo's pinned (pre-self-describing) slint bundle in the interim. - build_official_plugins_cuda.sh fails loudly when a cuda-declared plugin has neither a cuda Cargo feature nor sherpa linkage (was silently packaging a CPU .so as the cuda variant). - CUDA variant pass verifies the existing manifest signature before re-signing, matching the CPU reuse path. - Mirror the assets field in check_registry_versions.py. Signed-off-by: streamkit-devin <devin@streamkit.dev>
| fn resolve_accelerator(&self, requested: Option<&str>) -> Option<String> { | ||
| if let Some(value) = requested.map(str::trim).filter(|value| !value.is_empty()) { | ||
| return Some(value.to_string()); | ||
| } | ||
| if let Some(value) = | ||
| self.default_accelerator.as_deref().map(str::trim).filter(|value| !value.is_empty()) | ||
| { | ||
| return Some(value.to_string()); | ||
| } | ||
| if cuda_runtime_available() { | ||
| return Some("cuda".to_string()); | ||
| } | ||
| None | ||
| } |
There was a problem hiding this comment.
🟡 Plugin installs default to GPU builds on machines that only have the graphics driver, causing failures
The installer treats the mere presence of the NVIDIA driver stub (libcuda.so.1 probed by cuda_runtime_available() at apps/skit/src/marketplace_installer.rs:2153) as a signal to auto-select the GPU plugin build whenever no accelerator is configured, so a machine that has the driver but not the CUDA runtime ends up installing a GPU build it cannot load.
Impact: On systems with an NVIDIA driver but no CUDA toolkit, installing a GPU-capable plugin now silently picks the CUDA bundle and fails to load, whereas it previously installed and ran the working CPU build.
Auto-detection heuristic and fallback path
resolve_accelerator (apps/skit/src/marketplace_installer.rs:995-1008) returns, in order, the explicit request, the configured default_accelerator, and finally "cuda" when cuda_runtime_available() is true. The default config value is None (apps/skit/src/config.rs:594), so auto-detection is the effective default. cuda_runtime_available() only dlopens libcuda.so.1, which is present whenever the NVIDIA driver is installed, but the CUDA bundles (e.g. whisper built with --features cuda) additionally need libcudart/libcublas/libcudnn at load time — these are permitted as NEEDED deps by verify_bundles.py but are not vendored into the bundle. resolve_bundle only falls back to CPU when a plugin has no matching variant (apps/skit/src/marketplace.rs:261-282); for cuda-capable plugins it returns the CUDA variant, which then fails dlopen on a runtime-less host.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
This is already addressed on the current head: CUDA_RUNTIME_LIBS probes both libcuda.so.1 (driver stub) and libcudart.so (runtime), and cuda_runtime_available() only returns true when the full stack is loadable — driver-only hosts fall back to the CPU bundle. The comment's citation of a libcuda.so.1-only probe reflects an older revision.
| if not verify_existing_signature( | ||
| existing["manifest_path"], | ||
| existing["signature_path"], | ||
| public_key_path, | ||
| ): | ||
| print( | ||
| f"ERROR: {plugin_id}@{plugin_version} — existing manifest " | ||
| f"signature verification failed before attaching the " | ||
| f"'{accelerator}' variant. The manifest may have been " | ||
| f"modified after signing; revert the changes or bump the " | ||
| f"version to trigger re-signing.", | ||
| file=sys.stderr, | ||
| ) | ||
| return 1 | ||
|
|
||
| manifest_dir = registry_out / "plugins" / plugin_id / plugin_version | ||
| manifest_path = manifest_dir / "manifest.json" | ||
| write_json(manifest_path, would_be_manifest) | ||
| sign_manifest(manifest_path, signing_key) |
There was a problem hiding this comment.
🚩 Variant pass re-signs an already-published manifest, changing its bytes
The cuda variant pass in build_registry.py:560-605 rebuilds, re-signs, and overwrites an existing published manifest.json to append the variants list. The append-only guard only checks that non-variant core fields are unchanged, but the manifest.json bytes and its minisig DO change when the first variant is attached. Clients that cached the prior manifest/signature will see a different (still validly signed) manifest on next fetch. The PR docs describe this as intended ("layers a cuda variant onto the already-published CPU manifest ... append-only; a published variant is immutable"), so the manifest-mutation-on-first-variant is by design — flagging only so reviewers confirm cache/CDN invalidation is handled for the published registry.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
…eview findings Key install dirs by id+version+accelerator so CPU and CUDA variants of the same version coexist and switching never silently no-ops; removes the .accelerator marker and its fail-open path. resolve_accelerator is probed once and threaded through download_bundle. ActivePluginRecord now tracks the activated accelerator. Build/registry: extract shared manifest_builder (build_manifest + SHERPA libs) so the builder and append-only guard cannot drift; normalize accelerator case in reuse/dedup; drop the dead variants fallback. Whisper bundle build pins SOURCE_DATE_EPOCH + -march=x86-64 for portable ggml. Demo pins ggml-tiny-q5_1.bin sha256 and injects a version-matched slint plugin.yml. Signed-off-by: streamkit-devin <devin@streamkit.dev>
| # whisper.cpp/ggml enables GGML_NATIVE (-march=native) by default, which can | ||
| # emit AVX/AVX512 that SIGILLs on older CPUs than the build runner. The | ||
| # published bundle must run on any x86-64 host, so pin a portable baseline | ||
| # (SOURCE_DATE_EPOCH disables ggml's native-default upstream). | ||
| if [ "${plugin}" = "whisper" ]; then | ||
| export SOURCE_DATE_EPOCH=1 | ||
| export CFLAGS="-O3 -pipe -fPIC -march=x86-64 -mtune=generic" | ||
| export CXXFLAGS="-O3 -pipe -fPIC -march=x86-64 -mtune=generic" | ||
| fi |
There was a problem hiding this comment.
🚩 Demo now consumes prebuilt whisper 0.3.0 bundle that predates the portability flags
This PR adds portable build flags (SOURCE_DATE_EPOCH=1, -march=x86-64 -mtune=generic) for whisper in scripts/marketplace/build_official_plugins.sh:49-53, mirroring what the now-deleted whisper-builder stage in Dockerfile.demo used to do. However, the demo now pulls the already-published whisper-0.3.0 bundle (Dockerfile.demo:154 pins WHISPER_BUNDLE_SHA256=79212cfa..., matching docs/public/registry/plugins/whisper/0.3.0/manifest.json). Because the registry is append-only, whisper 0.3.0 is reused (never rebuilt) so the new portability flags do NOT apply to the bundle the demo ships. If the published 0.3.0 bundle was compiled with ggml's default GGML_NATIVE (-march=native) on a newer build host, the demo image could SIGILL on older CPUs — a regression from the prior demo which always compiled whisper portably from source. Worth confirming how the published 0.3.0 bundle was actually built (and whether a whisper version bump is needed to rebuild it portably).
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Correct, and worth calling out explicitly: the portability flags here are forward-looking only. Because the registry is append-only, whisper-0.3.0 is reused (never rebuilt), so the bundle the demo actually ships still reflects however 0.3.0 was originally compiled. These flags only take effect the next time whisper's version is bumped and the bundle is rebuilt.
Making the currently shipped 0.3.0 bundle portable would require a whisper version bump to trigger a portable rebuild, which is a separate decision I don't want to make unilaterally inside this PR. Flagging to the maintainer: do you want a whisper version bump in this PR to rebuild 0.3.0 portably, or track it as a follow-up? Until then the fix prevents the regression for all future bundles but does not retroactively fix 0.3.0.
| "assets": plugin.get("assets") or None, | ||
| } |
There was a problem hiding this comment.
📝 Info: build_manifest now emits assets; only safe because slint was version-bumped
scripts/marketplace/manifest_builder.py:60 adds "assets": plugin.get("assets") or None to the canonical manifest, which the old build_manifest/build_manifest_from_plugin in build_registry.py/check_registry_versions.py did not emit. The registry is append-only: any plugin that already has a committed manifest at its current version AND declares assets in plugin.yml would now produce a differing would-be manifest and fail the immutability/version checks. The only plugin with assets is slint, and it was bumped 0.4.0→0.5.0 in this PR (Cargo.toml + plugin.yml + official-plugins.json), so no committed manifest collides. This is correct for the current tree but is a latent constraint: any future plugin that gains assets without a version bump will trip the append-only guard.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Acknowledged — this is correct for the current tree (slint is the only plugin with assets and it was bumped 0.4.0→0.5.0, so no committed manifest collides) and the latent constraint is intentional: a plugin that gains assets without a version bump should trip the append-only guard, since changing a published manifest's content without a version bump is exactly what that guard exists to catch. The author would bump the version, same as for any other manifest-affecting metadata change.
| build-marketplace-cuda: | ||
| name: Build Marketplace CUDA Variants | ||
| needs: [build-marketplace] | ||
| if: ${{ inputs.build_cuda }} | ||
| # GPU variants are best-effort: a failure on the (potentially flaky) | ||
| # self-hosted runner must never regress CPU registry publishing or per-plugin | ||
| # releases. continue-on-error keeps the overall workflow green so downstream | ||
| # CPU jobs still run; the failure is still visible on this job. | ||
| continue-on-error: true | ||
| # Self-hosted NVIDIA (Ada) runner: CUDA toolkit + nvcc must be preinstalled. | ||
| # Adjust the labels to match your runner registration if they differ. | ||
| runs-on: [self-hosted, linux, x64, gpu] |
There was a problem hiding this comment.
🚩 Release workflow always invokes the CUDA job, which requires the self-hosted GPU runner
build_cuda defaults to true (.github/workflows/marketplace-build.yml:14-18) and marketplace-release.yml:18 calls the reusable workflow without passing build_cuda, so build-marketplace-cuda (which targets runs-on: [self-hosted, linux, x64, gpu]) always runs on release. continue-on-error: true makes the job's conclusion success even if its steps fail, and publish-registry/create-releases use always()/job-success semantics, so a failing GPU build degrades gracefully. However, if no runner with those labels is registered, the job stays queued indefinitely rather than failing — and downstream jobs (needs: [..., build-marketplace-cuda]) wait for it to complete, stalling the entire release. This is acceptable given the workflow is manual (workflow_dispatch) and the only caller, but it does hard-couple releases to GPU runner availability.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Agreed, and by design for now: the workflow is workflow_dispatch-only with a single caller, so coupling releases to GPU-runner availability is acceptable. If no runner with the [self-hosted, linux, x64, gpu] labels is registered the job stays queued rather than failing, which would stall downstream jobs — the mitigation is operational (ensure the runner is registered, or pass build_cuda=false to skip it). If we later want releases to be fully decoupled from GPU availability we can add a timeout or split the CUDA build into its own non-blocking workflow; tracking that as a follow-up rather than expanding this PR.
| })?; | ||
| } | ||
|
|
||
| if tokio::fs::try_exists(&bundles_root).await.unwrap_or(false) { | ||
| let bundles_root_real = | ||
| plugin_paths::ensure_existing_dir_under(&base_real, &bundles_root, "bundles").await?; | ||
| let mut entries = tokio::fs::read_dir(&bundles_root_real).await?; | ||
| if entries.next_entry().await?.is_none() { | ||
| let _ = tokio::fs::remove_dir(&bundles_root_real).await; | ||
| } | ||
| } | ||
| remove_dir_if_empty(&base_real, &version_dir, "version").await?; |
There was a problem hiding this comment.
📝 Info: Uninstall of an accelerator variant orphans sibling variant bundle dirs (no active record)
remove_active_record_and_bundle (apps/skit/src/server/plugins.rs:660-684) now removes only the accelerator-specific bundle dir and leaves sibling variants under the version dir (per the comment). Since there is exactly one active record per plugin id (active/{plugin_id}.json), uninstalling the active variant deletes the record while a previously-extracted sibling variant's bundle dir remains on disk with no record referencing it. The installer's bundle_dir.exists() fast-path (marketplace_installer.rs:829) will reuse it on a later reinstall, so this is intentional caching rather than a leak per se, but operators may be surprised that uninstall leaves variant bundle directories behind under bundles/<id>/<version>/.
(Refers to lines 660-684)
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Intentional, and a direct consequence of the re-keying: sibling variant bundle dirs under bundles/<id>/<version>/ are kept as a cache so reinstalling a previously-extracted variant hits the bundle_dir.exists() fast-path instead of re-downloading. remove_dir_if_empty still reaps the <version> and <id> dirs once the last variant is gone, so there's no unbounded leak — only an extracted-but-inactive variant lingers. If operators find the leftover dirs surprising we could add a --prune-variants flag to uninstall, but I'd keep the cache-by-default behavior; tracking the optional flag as a follow-up.
Signed-off-by: streamkit-devin <devin@streamkit.dev>
| manifest = strip_none(manifest) | ||
| if variants: | ||
| ordered: dict = {} | ||
| for key, value in manifest.items(): | ||
| ordered[key] = value | ||
| if key == "bundle": | ||
| ordered["variants"] = variants | ||
| manifest = ordered |
There was a problem hiding this comment.
📝 Info: New ValueError guard fails loud on variants-without-bundle
manifest_builder.py:63-68 now raises ValueError when variants is truthy but bundle_block is None. I traced all callers: in scripts/marketplace/build_registry.py the CPU reuse path passes variants=existing_manifest.get("variants") which is falsy for the common case (never triggers), and the variant pass always supplies a non-None existing_manifest.get("bundle") together with merged_variants. This converts a previously silent data-loss case (variants dropped because they anchor after the bundle key) into an explicit failure — a safe improvement, not a regression.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Guarded in 026a083: build_manifest now raises ValueError when variants is passed without a bundle block, so a future caller can't silently lose variants.
| })?; | ||
| } | ||
|
|
||
| if tokio::fs::try_exists(&bundles_root).await.unwrap_or(false) { | ||
| let bundles_root_real = | ||
| plugin_paths::ensure_existing_dir_under(&base_real, &bundles_root, "bundles").await?; | ||
| let mut entries = tokio::fs::read_dir(&bundles_root_real).await?; | ||
| if entries.next_entry().await?.is_none() { | ||
| let _ = tokio::fs::remove_dir(&bundles_root_real).await; | ||
| } | ||
| } | ||
| remove_dir_if_empty(&base_real, &version_dir, "version").await?; | ||
| remove_dir_if_empty(&base_real, &bundles_root, "bundles").await?; |
There was a problem hiding this comment.
📝 Info: Accelerator-keyed bundle removal is consistent with installer layout
The refactored remove_active_record_and_bundle (apps/skit/src/server/plugins.rs:660-685) computes version_dir/<accelerator> for new records and version_dir for legacy (empty-accelerator) records, then prunes empty parents via remove_dir_if_empty. I verified this matches the installer, which writes bundles to bundles/<id>/<version>/<accelerator>/ (marketplace_installer.rs:1211-1215) and stores the accelerator on the active record (activate_bundle at 1301-1309). The integration test test_uninstall_marketplace_plugin_removes_bundle sets accelerator: "cpu" but stages the bundle directly under bundles/gain/1.0.0/ (no accelerator subdir); it still passes only because unload_plugin(remove_file=true) deletes the entrypoint .so (plugins.rs:875-885), emptying the version dir so remove_dir_if_empty removes it. The accelerator-keyed removal itself is a no-op in that test's layout — not a production bug, but the test does not actually exercise the new accelerator-subdir removal path.
(Refers to lines 660-685)
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Good catch — fixed in 9684702: test_uninstall_marketplace_plugin_removes_bundle now stages the bundle under bundles/gain/1.0.0/cpu/ (matching the installer's accelerator-keyed layout) and additionally asserts the empty version dir is pruned, so the accelerator-subdir removal path is actually exercised. Verified the test passes locally.
| FROM bundle-deps AS slint-bundle | ||
| ARG SLINT_PLUGIN_VERSION=0.4.0 | ||
| ARG SLINT_BUNDLE_SHA256=2b33767cdd35b9eeba3bec4e1801aa020a59ef9a61c54b28f7c5d71fa087034d | ||
|
|
||
| WORKDIR /bundle | ||
| COPY plugins/native/slint/plugin.yml plugin.yml.src | ||
| RUN set -eux; \ | ||
| url="https://github.com/streamer45/streamkit/releases/download/plugin-slint-v${SLINT_PLUGIN_VERSION}/slint-${SLINT_PLUGIN_VERSION}-bundle.tar.zst"; \ | ||
| curl --proto '=https' --tlsv1.2 -fsSL -o slint-bundle.tar.zst "$url"; \ | ||
| echo "${SLINT_BUNDLE_SHA256} slint-bundle.tar.zst" | sha256sum -c -; \ | ||
| mkdir -p extracted; \ | ||
| tar --zstd -xf slint-bundle.tar.zst -C extracted; \ | ||
| test -f extracted/libslint.so; \ | ||
| sed "s/^version:.*/version: ${SLINT_PLUGIN_VERSION}/" plugin.yml.src > extracted/plugin.yml |
There was a problem hiding this comment.
🔍 Demo image pins slint 0.4.0 while the repo bumps slint to 0.5.0
Dockerfile.demo pins SLINT_PLUGIN_VERSION=0.4.0 (line 426) and injects a version-matched plugin.yml, while plugins/native/slint/{Cargo.toml,plugin.yml} and official-plugins.json bump slint to 0.5.0. This is intentional per the in-file comments (the 0.5.0 self-describing bundle is not yet published, so the demo keeps the last published bundle and injects plugin.yml), but it means the demo image will lag the registry until someone updates the pin. Confirming the 0.4.0 bundle + SLINT_BUNDLE_SHA256 remain downloadable is worthwhile before merge, since the demo build will hard-fail on a sha mismatch or 404.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Verified: downloaded plugin-slint-v0.4.0/slint-0.4.0-bundle.tar.zst from the release and its sha256 matches the pinned SLINT_BUNDLE_SHA256 (2b33767c…). The 0.4.0 pin is intentional until a self-describing 0.5.0 bundle is published; the in-file comment flags updating the pin at that point.
…uard variant manifests Signed-off-by: streamkit-devin <devin@streamkit.dev>
| if [ "${plugin}" = "whisper" ]; then | ||
| export SOURCE_DATE_EPOCH=1 | ||
| export CFLAGS="-O3 -pipe -fPIC -march=x86-64 -mtune=generic" | ||
| export CXXFLAGS="-O3 -pipe -fPIC -march=x86-64 -mtune=generic" | ||
| fi | ||
| CARGO_TARGET_DIR="${target_dir}" cargo build --release ${features[@]+"${features[@]}"} |
There was a problem hiding this comment.
📝 Info: Whisper CFLAGS scoped to subshell in CUDA build script
scripts/marketplace/build_official_plugins_cuda.sh:61-65 exports SOURCE_DATE_EPOCH/CFLAGS/CXXFLAGS only inside the ( cd ... ) subshell, so they do not leak to subsequent plugins in the loop. For whisper has_cuda_feature is 1, so --features cuda is used and the final sherpa-linkage guard (lines 75-86) is correctly skipped. The portable-baseline flags affect only host C/C++ ggml kernels, not nvcc GPU kernels, matching the CPU build script's behavior.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
Signed-off-by: streamkit-devin <devin@streamkit.dev>
| assert!(!tokio::fs::try_exists(&record_path).await.unwrap(), "Active record should be removed"); | ||
| assert!( | ||
| !tokio::fs::try_exists(&bundle_dir).await.unwrap(), | ||
| "Bundle directory should be removed" | ||
| "Accelerator variant directory should be removed" | ||
| ); | ||
| assert!( | ||
| !tokio::fs::try_exists(&version_dir).await.unwrap(), | ||
| "Empty version directory should be pruned" | ||
| ); |
There was a problem hiding this comment.
📝 Info: Uninstall test correctly exercises accelerator-keyed layout
The updated test_uninstall_marketplace_plugin_removes_bundle places the entrypoint under bundles/gain/1.0.0/cpu/ and sets accelerator: "cpu" on the record. This matches the production logic in apps/skit/src/server/plugins.rs:666-685: a non-empty accelerator resolves bundle_dir = version_dir/cpu, which is removed, after which remove_dir_if_empty prunes the now-empty version_dir and bundles_root. The two new assertions (variant dir removed, empty version dir pruned) are consistent with this flow. No behavioral gap found in the incremental change.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
…lugins Expose the active accelerator on PluginSummary (from active records on restart and from the installer on fresh installs), add an accelerator selector (auto-detect/cpu/cuda) to the marketplace details pane wired to InstallPluginRequest.accelerator, and show the active variant in the installed plugins list. Signed-off-by: streamkit-devin <devin@streamkit.dev>
Summary
variants[]array next to the canonicalbundle. When empty, manifests are byte-identical to today's CPU-only ones, so existing clients/tooling are unaffected. The install client resolves a bundle by explicitaccelerator(REST/configdefault_accelerator) or CUDA auto-detect, always falling back to the CPUbundlewhen no GPU match exists. Auto-detect probes bothlibcuda.so.1(driver) andlibcudart.so(runtime) so a driver-only host doesn't pick a CUDA bundle it can't load.id + version + accelerator. Installs now extract tobundles/<id>/<version>/<accelerator>/instead ofbundles/<id>/<version>/. This lets CPU and CUDA builds of the same version coexist, makes variant switching deterministic (never a silent no-op), and removes the fragile.acceleratormarker file (and its fail-open path where a missing marker +install_models=truereturned success without fetching the requested variant).ActivePluginRecordnow records the activated accelerator (#[serde(default)], empty for pre-existing records). Existingbundles/<id>/<version>/installs become invisible after upgrade and are simply re-fetched into the new layout; uninstall reaps empty<version>/<id>dirs.build_registry.py --accelerator cudalayers acudavariant onto the already-published CPU manifest (immutable).verify_bundles.py --accelerator cudaallowlists CUDANEEDED/RUNPATHdeps that the strict CPU gate rejects. Shared manifest/registry logic (build_manifest,SHERPA_*_LIBS) is now inscripts/marketplace/manifest_builder.pyso the builder, the append-only guard, and the bundle verifier can't drift. Accelerator comparison is case-normalized to match the Rusteq_ignore_ascii_caseresolver.build-marketplace-cudabuilds CUDA plugins, regenerates + minisigns the variant manifests, and uploads<id>-<ver>-cuda-bundle.tar.zstto the same per-plugin release. CPU bundle naming is unchanged.Dockerfile.democonsumes the marketplace instead of rebuilding from source. All 11 plugins are fetched as sha256-pinned prebuilt CPU bundles (the artifacts users install), making the demo a regression surface for published bundles. The whisper bundle build now pinsSOURCE_DATE_EPOCH+-march=x86-64 -mtune=genericso future ggml builds are portable;ggml-tiny-q5_1.binis now sha256-pinned in the demo; the slint stage injects a version-matchedplugin.ymlfor the pinned 0.4.0 bundle.Review & Validation
id+version+acceleratorinstall path is acceptable; existingbundles/<id>/<version>/installs are re-fetched (not migrated) after upgrade.Dockerfile.demomatchdocs/public/registry/plugins/<id>/<version>/manifest.json.[self-hosted, linux, x64, gpu]) and that the CUDA sherpa-onnx archive / minisign secret are available.Notes
whisper-0.3.0bundle the demo ships is reused under append-only and is not retroactively rebuilt. Making the shipped bundle portable requires a whisper version bump — flagged on the review thread for a maintainer decision.verify_bundles.py --accelerator cudaonly. Publishing + minisign signing happen in CI on the Ada runner.#9(copytree per variant pass) is intentionally left as a full-tree copy:publish-registryassembles the registry wholesale (rm -rf docs/public/registry && cp -R dist/registry/*), so the variant pass output must contain every plugin's manifest — copying only affected plugins would drop CPU-only plugins. The carried files are small JSON manifests + signatures (bundles live in releases), so cost is O(manifest count).Link to Devin session: https://staging.itsdev.in/sessions/3fe4bb10d20147bc9384def4bc81694b
Requested by: @streamer45
Devin Review
e0bba4f