
feat(test-harness): add parallel execution, better error reporting, and lint fixes#2944

Open
markphelps wants to merge 16 commits into main from
mphelps/test-harness-parallel-and-error-reporting

Conversation

@markphelps

Summary

  • Add --concurrency flag (default 4) to run model builds/tests in parallel across run, build, and schema-compare commands
  • Improve error reporting so the final error message includes per-model failure details instead of just listing names
  • Harden setup command execution with set -euo pipefail and empty cog.yaml detection to catch missing tools (e.g. yq) immediately
  • Fix all 30 golangci-lint issues across the test-harness codebase

Parallel execution

Models are dispatched via errgroup with SetLimit(concurrency). Each model gets its own isolated work directory (copied from the shared clone) so setup commands like script/select.sh don't clobber each other. In parallel mode, build output is captured rather than streamed to avoid interleaving. --concurrency 1 preserves the original sequential behavior with real-time streaming.
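The dispatch pattern above can be sketched with the standard library alone; `errgroup` with `SetLimit(n)` behaves like a counting semaphore, which is what this illustrative version uses. `processModel` and `runAll` are stand-in names, not the PR's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// processModel stands in for the per-model build/test work.
func processModel(name string) string {
	return "built " + name
}

// runAll dispatches models with at most `concurrency` in flight,
// writing results into a pre-allocated slice to preserve manifest order.
func runAll(models []string, concurrency int) []string {
	results := make([]string, len(models))
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for i, m := range models {
		wg.Add(1)
		go func(i int, m string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done
			results[i] = processModel(m)
		}(i, m)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runAll([]string{"flux-schnell", "chatterbox"}, 4))
}
```

Each goroutine writes only its own slice index, so no mutex is needed for the results themselves.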

Error reporting

Before:

Error: 15 model(s) failed: flux-schnell, flux-dev-lora, flux-fill-dev, ...

After:

Error: 15 model(s) failed:
  x flux-schnell: Preparation failed: setup command "script/select.sh schnell" failed: exit status 127
  x flux-dev-lora: Build failed: signal: killed
  x chatterbox:
      test "text-to-speech synthesis": cog predict exited with error: exit status 1

Setup hardening

Setup commands now run with set -euo pipefail so a missing yq (or any other tool) fails fast instead of silently producing an empty cog.yaml that later fails with a confusing python_version is required error.
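The wrapping can be sketched as follows; `runSetup` is an illustrative name, not the harness's actual function:

```go
package main

import (
	"fmt"
	"os/exec"
)

// runSetup executes a setup command under bash strict mode so any
// failing stage of a pipeline (e.g. a missing yq) propagates a
// non-zero exit instead of being swallowed.
func runSetup(command string) error {
	cmd := exec.Command("bash", "-c", "set -euo pipefail\n"+command)
	return cmd.Run()
}

func main() {
	// Without pipefail, `false | cat` exits 0; with it, the failure
	// of the first stage is surfaced.
	err := runSetup("false | cat")
	fmt.Println("error:", err != nil)
}
```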

Lint fixes

Resolved all 30 golangci-lint issues: copyloopvar, errcheck, gocritic (octal literals, if-else chains), gosec (SSRF annotations), misspell, modernize (SplitSeq, maps.Copy, built-in min).

…nd lint fixes

Add --concurrency flag (default 4) to run model builds/tests in parallel
using errgroup with a configurable semaphore. All three commands (run,
build, schema-compare) support parallel execution with:

- Pre-allocated result slices to preserve manifest ordering
- Per-model progress lines under a mutex
- Quiet mode that captures build output instead of streaming to avoid
  interleaved output in parallel mode
- Per-model isolated work directories so models sharing a repo don't
  clobber each other's setup commands

Improve error reporting:
- Final error message now includes per-model failure details (error
  message or failing test names) instead of just listing model names
- Setup commands run with 'set -euo pipefail' so missing tools (e.g.
  yq) are caught immediately instead of silently producing empty files
- Validate cog.yaml is non-empty after setup to catch silent failures

Fix all 30 golangci-lint issues across the test-harness:
- copyloopvar: remove unnecessary Go <1.22 loop variable copies
- errcheck: handle intentionally ignored errors
- gocritic: use 0o octal literals, rewrite if-else chains as switch
- gosec: annotate intentional HTTP calls to known APIs
- misspell: fix British/American spelling
- modernize: use strings.SplitSeq, maps.Copy, built-in min

@ask-bonk ask-bonk bot left a comment


I'm Bonk, and I've done a quick review of your PR.

The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.

I found one issue in the sequential execution path where test failures could be silently ignored.

Comment thread tools/test-harness/cmd/run.go Outdated
@ask-bonk

ask-bonk bot commented Apr 15, 2026

I've posted my review. The key issue is in the sequential execution path in run.go (lines 111-116) where there's no status feedback showing whether tests pass or fail - only the initial "Running..." message is printed. This is inconsistent with the parallel mode which shows "x FAILED" or "+ passed" status updates, and with the PR description claiming --concurrency 1 preserves real-time feedback behavior.


…odel manifest

Add requires_tools field to manifest model schema so models can declare
CLI tools needed for setup commands. The harness checks tools exist on
PATH before running setup and errors with a clear message listing the
missing tools.
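A PATH check like the one described above is a one-liner per tool with `exec.LookPath`; `checkTools` is an illustrative name for the pre-setup validation:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// checkTools returns an error naming every declared tool missing from
// PATH, so the user sees all of them at once rather than one per run.
func checkTools(tools []string) error {
	var missing []string
	for _, t := range tools {
		if _, err := exec.LookPath(t); err != nil {
			missing = append(missing, t)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required tools: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	fmt.Println(checkTools([]string{"sh", "definitely-not-a-real-tool"}))
}
```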

Improve setup error reporting: both quiet and non-quiet modes now
capture stderr and include it in the error message so users see the
actual failure instead of a downstream error from an empty cog.yaml.

Add manifest-replicate.yaml with 15 Replicate production models.
@ask-bonk

ask-bonk bot commented Apr 15, 2026

@markphelps Bonk workflow was cancelled.

View workflow run · To retry, trigger Bonk again.


@ask-bonk ask-bonk bot left a comment


I'm Bonk, and I've done a quick review of your PR.

The PR adds parallel execution support, improved error reporting, and lint fixes to the test-harness tool.

I found the same issue from the previous review exists in two more files: the sequential execution paths in build.go and schema_compare.go also lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED".

Comment thread tools/test-harness/cmd/build.go Outdated
Comment thread tools/test-harness/cmd/schema_compare.go Outdated
@ask-bonk

ask-bonk bot commented Apr 15, 2026

I reviewed the PR files and confirmed the same issue from the previous review exists in build.go and schema_compare.go - their sequential execution paths lack pass/fail feedback (unlike the parallel mode). I posted a batch review with suggestions to add this feedback to both files.


Replace the quiet/buffered approach with a prefixWriter that prepends
each output line with the model name, like docker-compose:

  [flux-schnell        ] Step 3/12: RUN pip install ...
  [chatterbox          ] Building image cog-harness-chatterbox:test
  [flux-schnell        ] Successfully built abc123

Output streams in real-time even in parallel mode. Sequential mode
(--concurrency 1) streams without prefixes as before.

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.

The same issue from previous reviews still exists in all three files: the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED". This contradicts the PR description claiming --concurrency 1 preserves real-time feedback behavior.

Comment thread tools/test-harness/cmd/build.go Outdated
Comment thread tools/test-harness/cmd/run.go Outdated
Comment thread tools/test-harness/cmd/schema_compare.go Outdated
@ask-bonk

ask-bonk bot commented Apr 15, 2026

The same issue from previous reviews still exists in all three files (build.go, run.go, and schema_compare.go): the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode. I've posted a batch review with suggestions for adding this missing feedback.


Print === banners with timing info at each phase boundary so users can
see where time is being spent:

  === Preparing flux-schnell...
  === Building flux-schnell (timeout 900s)...
  === Build complete (62.3s)
  === Predict test 1/1: generate image from prompt (timeout 900s)...
  === Predict test 1/1 PASSED (48.7s)

In parallel mode these are prefixed with the model name like all other
output.
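The banner pattern above fits naturally with `defer`: a helper prints the opening line and returns a closure that prints the elapsed time. This is a hedged sketch, not the PR's exact code:

```go
package main

import (
	"fmt"
	"time"
)

// banner prints a "=== phase..." line and returns a func that prints
// the elapsed time, suitable for `defer done()` at a phase boundary.
func banner(phase string) func() {
	fmt.Printf("=== %s...\n", phase)
	start := time.Now()
	return func() {
		fmt.Printf("=== %s complete (%.1fs)\n", phase, time.Since(start).Seconds())
	}
}

func main() {
	done := banner("Building flux-schnell")
	time.Sleep(10 * time.Millisecond) // stand-in for the build
	done()
}
```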

@ask-bonk ask-bonk bot left a comment


I'm Bonk, and I've done a quick review of your PR.

The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.

The same issue from previous reviews still exists in all three files: the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED". This contradicts the PR description claiming --concurrency 1 preserves real-time feedback behavior.

Comment thread tools/test-harness/cmd/build.go Outdated
Comment thread tools/test-harness/cmd/run.go Outdated
Comment thread tools/test-harness/cmd/schema_compare.go Outdated
@ask-bonk

ask-bonk bot commented Apr 15, 2026

I reviewed the three changed Go files to verify whether previous feedback had been addressed. Finding the same issue still existed (missing pass/fail feedback in sequential execution mode across all three files), I posted a batch GitHub review with code suggestions to fix the inconsistent behavior.


Expose the existing SkipSchemaValidation option as a CLI flag. This is
useful when building models that fail schema validation due to import-time
side effects (e.g. accessing GPU/weights files) or when using a remote
Docker context where the legacy schema validation container cannot
access local resources.

Also add skip_schema_validation field to the test harness manifest so
models can opt out of schema validation per-model.

@ask-bonk ask-bonk bot left a comment


I'm Bonk, and I've done a quick review of your PR.

The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.

The same issue from previous reviews still exists in all three files: the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED". This contradicts the PR description claiming --concurrency 1 preserves real-time feedback behavior.

Comment thread tools/test-harness/cmd/build.go Outdated
Comment thread tools/test-harness/cmd/run.go Outdated
Comment thread tools/test-harness/cmd/schema_compare.go Outdated
@ask-bonk

ask-bonk bot commented Apr 15, 2026

Review posted successfully with suggestions for all three files.


Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
cog predict has a default --setup-timeout of 300s which kills the
container if model setup (weight downloads) takes longer. Pass the
model's timeout value as the setup timeout so large models have
enough time to download weights during first run.

@ask-bonk ask-bonk bot left a comment


I'm Bonk, and I've done a quick review of your PR.

The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.

The same issue from previous reviews still exists in all three files: the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED". This contradicts the PR description claiming --concurrency 1 preserves real-time feedback behavior.

Comment thread tools/test-harness/cmd/build.go Outdated
Comment thread tools/test-harness/cmd/run.go Outdated
Comment thread tools/test-harness/cmd/schema_compare.go Outdated
@ask-bonk

ask-bonk bot commented Apr 15, 2026

Review posted successfully with suggestions for all three files.



@ask-bonk ask-bonk bot left a comment


I'm Bonk, and I've done a quick review of your PR.

The PR adds parallel execution support, improved error reporting, setup hardening with set -euo pipefail, and lint fixes to the test-harness tool.

The same issue from previous reviews still exists in all three files: the sequential execution paths lack pass/fail feedback, inconsistent with the parallel mode which shows "+ passed" or "x FAILED". This contradicts the PR description claiming --concurrency 1 preserves real-time feedback behavior.

Comment thread tools/test-harness/cmd/build.go Outdated
Comment thread tools/test-harness/cmd/run.go Outdated
Comment thread tools/test-harness/cmd/schema_compare.go Outdated
@ask-bonk

ask-bonk bot commented Apr 15, 2026

Review posted successfully with suggestions for all three files.


The sequential code paths (--concurrency 1) were missing per-model
status output after each build/run/compare, unlike the parallel paths.
Now both modes print consistent pass/fail/skip feedback.
@ask-bonk

ask-bonk bot commented Apr 15, 2026

LGTM


@ask-bonk

ask-bonk bot commented Apr 16, 2026

LGTM


Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
…ttps://github.com/replicate/cog into mphelps/test-harness-parallel-and-error-reporting

* 'mphelps/test-harness-parallel-and-error-reporting' of https://github.com/replicate/cog:
  Bump version to 0.18.0 (#2946)
@ask-bonk

ask-bonk bot commented Apr 16, 2026

LGTM


…runner

- Replace errgroup.WithContext with WaitGroup.Go + semaphore channel;
  goroutines never returned errors so errgroup semantics were misleading
  and g.Wait() was silently discarded
- Extract generic runModels[T] helper to deduplicate the parallel/sequential
  loop that was copy-pasted across build.go, run.go, and schema_compare.go
- Use singleflight.Group in cloneRepo instead of holding a mutex during the
  entire git clone operation, which serialized all parallel clones
- Capture cloneRepo stderr into a buffer instead of writing directly to
  os.Stderr, preventing interleaved output in parallel mode
- Add --concurrency flag validation (reject values < 1 to prevent panic)
- Split modelOutput into modelLoggers + modelOutput so RunModel/BuildModel
  don't allocate capture buffers they never use
- Cap prefixWriter line buffer at 64 KiB to prevent unbounded growth from
  long lines without newlines (e.g. progress bars)
- Truncate runSetupCommands error output to last 2000 bytes, matching the
  pattern already used in buildModelWithEnv
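The singleflight change deduplicates concurrent clones of the same repo without serializing clones of different repos. A minimal stdlib approximation of what `singleflight.Group` provides (the PR uses the real `golang.org/x/sync/singleflight` package; this sketch also caches results, which suits clones):

```go
package main

import (
	"fmt"
	"sync"
)

type call struct {
	done chan struct{}
	val  string
	err  error
}

type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn at most once per key; concurrent callers with the same
// key wait for and share the first call's result.
func (g *group) Do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		<-c.done // another goroutine is already cloning this repo
		return c.val, c.err
	}
	c := &call{done: make(chan struct{})}
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	close(c.done)
	return c.val, c.err
}

func main() {
	var g group
	clones := 0
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("replicate/cog", func() (string, error) {
				clones++ // in real code the git clone happens here
				return "/tmp/clone", nil
			})
		}()
	}
	wg.Wait()
	fmt.Println("clones:", clones) // fn ran once; the rest waited
}
```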
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
@ask-bonk

ask-bonk bot commented Apr 16, 2026

LGTM


@markphelps markphelps marked this pull request as ready for review April 16, 2026 16:02
@markphelps markphelps requested a review from a team as a code owner April 16, 2026 16:02
Improve schema-compare output so differences are immediately readable:

- Replace terse one-line-per-path diff format with unified-diff style
  output showing actual JSON values with --- static / +++ runtime
  headers and @@ path @@ hunks (like git diff)
- Sort diff keys alphabetically for deterministic output
- Suppress docker build log streaming during schema comparison builds
  (quietBuildModelWithEnv) so build output no longer intermingles with
  the diff report; build logs are still captured for error messages
- Add spacing around diff blocks in console report for readability
- Add table-driven tests for the new jsonDiff format
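The diff format described above can be sketched as follows, assuming the schemas have been flattened to path→value maps; `jsonDiff` here is illustrative, not the PR's exact code:

```go
package main

import (
	"fmt"
	"sort"
)

// jsonDiff emits git-style headers and one @@ path @@ hunk per
// differing key, sorted alphabetically for deterministic output.
func jsonDiff(static, runtime map[string]string) []string {
	keys := map[string]bool{}
	for k := range static {
		keys[k] = true
	}
	for k := range runtime {
		keys[k] = true
	}
	sorted := make([]string, 0, len(keys))
	for k := range keys {
		sorted = append(sorted, k)
	}
	sort.Strings(sorted)

	out := []string{"--- static", "+++ runtime"}
	for _, k := range sorted {
		s, sok := static[k]
		r, rok := runtime[k]
		if sok && rok && s == r {
			continue // identical values produce no hunk
		}
		out = append(out, fmt.Sprintf("@@ %s @@", k))
		if sok {
			out = append(out, "-"+s)
		}
		if rok {
			out = append(out, "+"+r)
		}
	}
	return out
}

func main() {
	d := jsonDiff(
		map[string]string{"components.schemas.Input.properties.seed.type": "integer"},
		map[string]string{"components.schemas.Input.properties.seed.type": "number"},
	)
	for _, line := range d {
		fmt.Println(line)
	}
}
```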
@ask-bonk

ask-bonk bot commented Apr 16, 2026

Looking at the current code, I can see the PR has been updated with a refactored runModels helper function in root.go that handles both sequential and parallel execution. The sequential path now properly prints status lines after each model completes, and all three commands (build, run, schema-compare) check for failures after processing.

The previous issues about missing pass/fail feedback in sequential mode appear to be resolved:

  • runModels in root.go:142-148 now prints status lines for both modes
  • Each command checks results and returns errors via formatFailureSummary

LGTM

