feat(test-harness): add parallel execution, better error reporting, and lint fixes #2944
markphelps wants to merge 16 commits into main from mphelps/test-harness-parallel-and-error-reporting
Conversation
…nd lint fixes

Add a `--concurrency` flag (default 4) to run model builds/tests in parallel using errgroup with a configurable semaphore. All three commands (`run`, `build`, `schema-compare`) support parallel execution with:

- Pre-allocated result slices to preserve manifest ordering
- Per-model progress lines under a mutex
- Quiet mode that captures build output instead of streaming, to avoid interleaved output in parallel mode
- Per-model isolated work directories so models sharing a repo don't clobber each other's setup commands

Improve error reporting:

- The final error message now includes per-model failure details (error message or failing test names) instead of just listing model names
- Setup commands run with `set -euo pipefail` so missing tools (e.g. `yq`) are caught immediately instead of silently producing empty files
- Validate that `cog.yaml` is non-empty after setup, to catch silent failures

Fix all 30 golangci-lint issues across the test-harness:

- copyloopvar: remove loop variable copies made unnecessary by Go 1.22
- errcheck: handle intentionally ignored errors
- gocritic: use `0o` octal literals, rewrite if-else chains as switch
- gosec: annotate intentional HTTP calls to known APIs
- misspell: fix British/American spelling
- modernize: use `strings.SplitSeq`, `maps.Copy`, built-in `min`
I'm Bonk, and I've done a quick review of your PR.
The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.
I found one issue in the sequential execution path where test failures could be silently ignored.
I've posted my review. The key issue is in the sequential execution path.
…odel manifest

Add a `requires_tools` field to the manifest model schema so models can declare CLI tools needed for setup commands. The harness checks that the tools exist on PATH before running setup, and errors with a clear message listing the missing tools.

Improve setup error reporting: both quiet and non-quiet modes now capture stderr and include it in the error message, so users see the actual failure instead of a downstream error from an empty `cog.yaml`.

Add manifest-replicate.yaml with 15 Replicate production models.
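A PATH check like the one described here can be done with `exec.LookPath`; this is a minimal sketch (`checkRequiredTools` is a hypothetical name, not necessarily the harness's):

```go
package main

import (
	"fmt"
	"os/exec"
)

// checkRequiredTools returns an error listing every declared tool that
// is not found on PATH, so setup fails with a clear message instead of
// a confusing downstream error from an empty cog.yaml.
func checkRequiredTools(tools []string) error {
	var missing []string
	for _, t := range tools {
		if _, err := exec.LookPath(t); err != nil {
			missing = append(missing, t)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required tools on PATH: %v", missing)
	}
	return nil
}

func main() {
	fmt.Println(checkRequiredTools(nil)) // prints <nil>: nothing required, nothing missing
}
```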
@markphelps Bonk workflow was cancelled. To retry, trigger Bonk again.
I'm Bonk, and I've done a quick review of your PR.
The PR adds parallel execution support, improved error reporting, and lint fixes to the test-harness tool.
I found that the same issue from the previous review exists in two more files: the sequential execution paths in `build.go` and `schema_compare.go` also lack pass/fail feedback, inconsistent with the parallel mode, which shows "+ passed" or "x FAILED".
I reviewed the PR files and confirmed the same issue from the previous review exists in `build.go` and `schema_compare.go` as well.
Replace the quiet/buffered approach with a `prefixWriter` that prepends each output line with the model name, like docker-compose:

```
[flux-schnell ] Step 3/12: RUN pip install ...
[chatterbox   ] Building image cog-harness-chatterbox:test
[flux-schnell ] Successfully built abc123
```

Output streams in real-time even in parallel mode. Sequential mode (`--concurrency 1`) streams without prefixes, as before.
I'm Bonk, and I've done a quick review of your PR.
The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.
The same issue from previous reviews still exists in all three files: the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED". This contradicts the PR description claiming --concurrency 1 preserves real-time feedback behavior.
The same issue from previous reviews still exists in all three files.
Print `===` banners with timing info at each phase boundary so users can see where time is being spent:

```
=== Preparing flux-schnell...
=== Building flux-schnell (timeout 900s)...
=== Build complete (62.3s)
=== Predict test 1/1: generate image from prompt (timeout 900s)...
=== Predict test 1/1 PASSED (48.7s)
```

In parallel mode these are prefixed with the model name, like all other output.
I'm Bonk, and I've done a quick review of your PR.
The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.
The same issue from previous reviews still exists in all three files: the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED". This contradicts the PR description claiming --concurrency 1 preserves real-time feedback behavior.
I reviewed the three changed Go files to verify whether previous feedback had been addressed. Finding the same issue still existed (missing pass/fail feedback in sequential execution mode across all three files), I posted a batch GitHub review with code suggestions to fix the inconsistent behavior.
Expose the existing `SkipSchemaValidation` option as a CLI flag. This is useful when building models that fail schema validation due to import-time side effects (e.g. accessing GPU/weights files), or when using a remote Docker context where the legacy schema-validation container cannot access local resources.

Also add a `skip_schema_validation` field to the test harness manifest so models can opt out of schema validation per-model.
I'm Bonk, and I've done a quick review of your PR.
The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.
The same issue from previous reviews still exists in all three files: the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED". This contradicts the PR description claiming --concurrency 1 preserves real-time feedback behavior.
Review posted successfully with suggestions for all three files.
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
`cog predict` has a default `--setup-timeout` of 300s, which kills the container if model setup (weight downloads) takes longer. Pass the model's timeout value as the setup timeout so large models have enough time to download weights during the first run.
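Forwarding the manifest timeout might look like this sketch (`predictArgs` and `timeoutSecs` are hypothetical names, not the harness's actual code):

```go
package main

import (
	"fmt"
	"strconv"
)

// predictArgs builds the cog predict argument list, forwarding the
// model's manifest timeout as --setup-timeout so weight downloads on
// first run are not killed by cog's 300s default.
func predictArgs(timeoutSecs int) []string {
	return []string{"predict", "--setup-timeout", strconv.Itoa(timeoutSecs)}
}

func main() {
	fmt.Println(predictArgs(900)) // prints [predict --setup-timeout 900]
}
```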
I'm Bonk, and I've done a quick review of your PR.
The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.
The same issue from previous reviews still exists in all three files: the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED". This contradicts the PR description claiming --concurrency 1 preserves real-time feedback behavior.
Review posted successfully with suggestions for all three files.
The sequential code paths (`--concurrency 1`) were missing per-model status output after each build/run/compare, unlike the parallel paths. Now both modes print consistent pass/fail/skip feedback.
LGTM |
LGTM |
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
…ttps://github.com/replicate/cog into mphelps/test-harness-parallel-and-error-reporting

* 'mphelps/test-harness-parallel-and-error-reporting' of https://github.com/replicate/cog:
  Bump version to 0.18.0 (#2946)
LGTM |
…runner

- Replace `errgroup.WithContext` with `WaitGroup.Go` + a semaphore channel; the goroutines never returned errors, so the errgroup semantics were misleading and `g.Wait()` was silently discarded
- Extract a generic `runModels[T]` helper to deduplicate the parallel/sequential loop that was copy-pasted across build.go, run.go, and schema_compare.go
- Use `singleflight.Group` in `cloneRepo` instead of holding a mutex during the entire git clone operation, which serialized all parallel clones
- Capture `cloneRepo` stderr into a buffer instead of writing directly to `os.Stderr`, preventing interleaved output in parallel mode
- Add `--concurrency` flag validation (reject values < 1 to prevent a panic)
- Split `modelOutput` into `modelLoggers` + `modelOutput` so `RunModel`/`BuildModel` don't allocate capture buffers they never use
- Cap the `prefixWriter` line buffer at 64 KiB to prevent unbounded growth from long lines without newlines (e.g. progress bars)
- Truncate `runSetupCommands` error output to the last 2000 bytes, matching the pattern already used in `buildModelWithEnv`
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
LGTM |
Improve schema-compare output so differences are immediately readable:

- Replace the terse one-line-per-path diff format with unified-diff-style output showing actual JSON values, with `--- static` / `+++ runtime` headers and `@@ path @@` hunks (like git diff)
- Sort diff keys alphabetically for deterministic output
- Suppress docker build log streaming during schema-comparison builds (`quietBuildModelWithEnv`) so build output no longer intermingles with the diff report; build logs are still captured for error messages
- Add spacing around diff blocks in the console report for readability
- Add table-driven tests for the new `jsonDiff` format
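The sorted-keys, unified-diff-style rendering could be sketched as follows. `formatDiff` is a hypothetical helper over pre-computed per-path differences, not the PR's `jsonDiff` itself:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatDiff renders per-path (static, runtime) value pairs in a
// unified-diff style. Sorting the paths first makes the report
// deterministic across runs, which keeps test output stable.
func formatDiff(diffs map[string][2]string) string {
	paths := make([]string, 0, len(diffs))
	for p := range diffs {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	var b strings.Builder
	b.WriteString("--- static\n+++ runtime\n")
	for _, p := range paths {
		fmt.Fprintf(&b, "@@ %s @@\n-%s\n+%s\n", p, diffs[p][0], diffs[p][1])
	}
	return b.String()
}

func main() {
	fmt.Print(formatDiff(map[string][2]string{
		"components.schemas.Input.properties.seed": {
			`{"type":"integer"}`,
			`{"type":"integer","default":42}`,
		},
	}))
}
```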
Looking at the current code, I can see the PR has been updated with a refactored runner. The previous issues about missing pass/fail feedback appear to be resolved.
LGTM
Summary
- `--concurrency` flag (default 4) to run model builds/tests in parallel across the `run`, `build`, and `schema-compare` commands
- `set -euo pipefail` and empty-`cog.yaml` detection to catch missing tools (e.g. `yq`) immediately

Parallel execution
Models are dispatched via `errgroup` with `SetLimit(concurrency)`. Each model gets its own isolated work directory (copied from the shared clone) so setup commands like `script/select.sh` don't clobber each other. In parallel mode, build output is captured rather than streamed to avoid interleaving. `--concurrency 1` preserves the original sequential behavior with real-time streaming.

Error reporting
Before:
After:
Setup hardening
Setup commands now run with `set -euo pipefail`, so a missing `yq` (or any other tool) fails fast instead of silently producing an empty `cog.yaml` that later fails with a confusing `python_version is required` error.
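One way to wrap setup commands in this prologue is a sketch like the following. `runSetup` is a hypothetical helper (and the missing-tool name is made up); the actual harness also captures and truncates output for error messages.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runSetup wraps a setup script in the hardened bash prologue: with
// pipefail, a missing tool anywhere in a pipeline surfaces as a non-zero
// exit instead of being masked by a successful downstream command.
func runSetup(script string) error {
	cmd := exec.Command("bash", "-c", "set -euo pipefail; "+script)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("setup failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	fmt.Println(runSetup("true") == nil) // prints true: harmless command succeeds
	// A pipeline whose left side is a missing tool now fails fast
	// instead of producing an empty output file:
	fmt.Println(runSetup("no-such-tool-xyz 2>/dev/null | cat") != nil)
}
```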
Lint fixes

Resolved all 30 golangci-lint issues: copyloopvar, errcheck, gocritic (octal literals, if-else chains), gosec (SSRF annotations), misspell, modernize (`SplitSeq`, `maps.Copy`, built-in `min`).