
chore: upgrade skillgym to 0.8.0 #493

Merged: thymikee merged 1 commit into main from codex/upgrade-skillgym-0-8 on May 11, 2026.


@thymikee
Member

Summary

Upgrade SkillGym from 0.6.0 to 0.8.0 and update the benchmark suite/docs for the v0.8 API.

Details:

  • Use the v0.8 `Case` type export instead of `TestCase`.
  • Adopt soft source-read guardrail assertions with deferred explain questions.
  • Document v0.8 repeat/retry/explain workflow notes.

Touched files: 4. Scope stayed within SkillGym tooling/tests/docs.

Known gaps:

  • The full SkillGym run had 2 stochastic agent-output failures; both passed on a targeted rerun.
  • Notes for SkillGym maintainers: the `TestCase` export was removed, `skillgym run --help` behavior, a SIGINT/SIGTERM listener warning during parallel runs, and the v0.8 changelog is missing from the tag.

Validation

  • `pnpm format`
  • `pnpm check:tooling`
  • `pnpm test:skillgym` (140/142 executions passed; both failed case/runner pairs passed on rerun)
  • `pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --case same-session-mutations-serial --runner claude-haiku`
  • `pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --case batch-inline-step-schema-positionals --runner codex-mini`

@github-actions

PR Preview Action v1.8.1


🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-493/

Built to branch gh-pages at 2026-05-11 09:31 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@thymikee thymikee merged commit 076f0c0 into main May 11, 2026
18 checks passed
@thymikee thymikee deleted the codex/upgrade-skillgym-0-8 branch May 11, 2026 15:09