Popular repositories
- skillsbench (Public): SkillsBench evaluates how well skills work and how effective agents are at using them
Repositories
Showing 10 of 18 repositories
- benchflow (Public): Framework for creating high-fidelity and complex RL environments and evaluation tasks
- skillsbench-leaderboard (Public)
- skillsbench (Public): SkillsBench evaluates how well skills work and how effective agents are at using them
- posttrainarena (Public)
- mini-swe-agent (Public, forked from SWE-agent/mini-swe-agent): The 100-line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo, but scores >74% on SWE-bench verified!
- benchmarks (Public)
- skillsbench-trajectories (Public)
- mockflow (Public)
- agent-client-protocol (Public, forked from agentclientprotocol/agent-client-protocol): A protocol for connecting any editor to any agent