@simonrosenberg
Collaborator

@simonrosenberg simonrosenberg commented Dec 16, 2025

Summary

This PR is now a no-op branch sync (it no longer contains the SWE-bench report standardization changes).

Status

The report model + GAIA/commit0/SWT report updates were moved to a clean branch and PR:

If you want #170 closed, let me know and I can close it.

simonrosenberg and others added 3 commits December 16, 2025 18:26
- Remove unused imports in build_images.py
- Fix whitespace formatting in eval_infer.py

Co-authored-by: openhands <[email protected]>
- Move ensure_docker_running() to benchmarks/utils/docker.py
- Unify Docker connectivity check for both local and remote (DOCKER_HOST) cases
- Remove unnecessary dockerd startup logic that required root/privileged mode
- Users running locally should have Docker already running
- Remote evaluation workflow polls DOCKER_HOST sidecar until ready
- Clearer error messages guide users on how to fix Docker connectivity issues

Co-authored-by: openhands <[email protected]>
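
For readers unfamiliar with the change, here is a minimal sketch of what a unified connectivity check along these lines could look like. It is not the code from this PR: only the name `ensure_docker_running` and the `DOCKER_HOST` variable come from the commit message above. The `probe`, `retries`, and `delay_s` parameters, the helper `_docker_info_ok`, and the error wording are assumptions made for illustration.

```python
import os
import subprocess
import time
from typing import Callable, Mapping, Optional


def _docker_info_ok(env: Mapping[str, str]) -> bool:
    """Hypothetical helper: True if `docker info` succeeds with this environment."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            env=dict(env),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Docker CLI missing, or the daemon did not answer in time.
        return False
    return result.returncode == 0


def ensure_docker_running(
    env: Optional[Mapping[str, str]] = None,
    probe: Callable[[Mapping[str, str]], bool] = _docker_info_ok,
    retries: int = 1,      # assumption: optional polling, e.g. for a sidecar
    delay_s: float = 0.0,  # assumption: pause between polls
) -> None:
    """Check that a Docker daemon is reachable; never try to start one.

    The same probe covers both cases: the Docker CLI itself honors
    DOCKER_HOST, so only the error message differs.
    """
    env = os.environ if env is None else env
    for attempt in range(retries):
        if probe(env):
            return
        if attempt + 1 < retries:
            time.sleep(delay_s)

    docker_host = env.get("DOCKER_HOST")
    if docker_host:
        raise RuntimeError(
            f"Cannot reach the Docker daemon at DOCKER_HOST={docker_host}. "
            "Check that the remote daemon or sidecar is up and reachable."
        )
    raise RuntimeError(
        "Cannot reach a local Docker daemon. "
        "Start Docker (for example Docker Desktop or the docker service) and retry."
    )
```

In this sketch a local caller would invoke `ensure_docker_running()` once, while a remote workflow could pass something like `retries=30, delay_s=2.0` to wait for the sidecar. Injecting `probe` keeps the function testable without a real daemon.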
@OpenHands OpenHands deleted a comment from openhands-ai bot Dec 26, 2025
@OpenHands OpenHands deleted a comment from openhands-ai bot Dec 26, 2025
@OpenHands OpenHands deleted a comment from openhands-ai bot Dec 26, 2025
@simonrosenberg simonrosenberg force-pushed the feature/swtbench-workflow-sync branch from 17af0e4 to 8d19438 Compare December 26, 2025 15:06
@openhands-ai

openhands-ai bot commented Dec 26, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #170 at branch `feature/swtbench-workflow-sync`

Feel free to include any additional details that might help me get this PR into a better state.



3 participants