What Are We Actually Benchmarking in Robot Manipulation?

Public release package for the paper "What Are We Actually Benchmarking in Robot Manipulation?"

Project website: https://ripl.github.io/manipulation_benchmark_audit/

Purpose

This repository contains lightweight public artifacts for the manipulation benchmark audit diagnostics. It is a curated release layer: CSV/JSON/YAML/MD result files, claim mappings, and lightweight support scripts, not a dump of internal training/evaluation workspaces.

Layout

.
├── shortcut_solvability/
├── statistical_significance/
├── creeping_overfitting/
├── data_source_dependency/
├── leaderboards/
├── provenance/
├── scripts/
├── CLAIMS.md
├── SHA256SUMS
├── public_manifest.json
├── LICENSE
└── README.md
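The `SHA256SUMS` file in the layout above can be used to check that a downloaded copy of the release is intact. The following is a minimal sketch, not a script shipped in this repository. It assumes `SHA256SUMS` follows the conventional `sha256sum` output format (one `<hex digest>  <relative path>` pair per line) and that paths are relative to the repository root. The function names `parse_sums`, `sha256_of`, and `verify_release` are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_sums(text):
    """Parse sha256sum-style lines into {relative_path: hex_digest}.

    Assumption: each line is '<digest>  <path>' or '<digest> *<path>'.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        # Drop the separator padding and the optional binary-mode '*'.
        name = name.strip().lstrip("*")
        entries[name] = digest.lower()
    return entries


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large result files are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_release(root, sums_name="SHA256SUMS"):
    """Return a list of (path, problem) tuples; an empty list means all files match."""
    root = Path(root)
    expected = parse_sums((root / sums_name).read_text())
    problems = []
    for name, digest in expected.items():
        target = root / name
        if not target.is_file():
            problems.append((name, "missing"))
        elif sha256_of(target) != digest:
            problems.append((name, "mismatch"))
    return problems
```

Calling `verify_release(".")` from the repository root would then list any missing or altered files. If the checksum file uses a different layout, `parse_sums` would need to be adapted.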

Included Diagnostics

shortcut_solvability/: LIBERO and CALVIN DINO+MLP/task-id shortcut-solvability summaries, configs, and compact per-trial/per-sequence outcomes.
statistical_significance/: LIBERO Goal five-policy 5k shared-instance outcome rows, policy summaries, pairwise-disagreement summary, aggregate leaderboard significance-category CSVs, cutoff reference code, and shared init-state/config provenance.
creeping_overfitting/: SimplerEnv fixed-grid and Protocol A-E rows/summaries, CALVIN resampled-pose and fresh-sequence rows/summaries, and LIBERO Layer 2 summaries.
data_source_dependency/: scripted-demo WidowX data-source-dependency summaries and official 4 x 24 grid trial outcomes.
leaderboards/: copied public leaderboard CSV snapshots and the Sam official-protocol previous-SOTA exports used for the significance category tables.
provenance/: best-effort package/environment provenance and checkpoint identity manifests, with unrecoverable exact fields marked unknown.
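Because the `statistical_significance/` outcomes are recorded on shared instances, two policies can be compared with a paired test on the instances where they disagree. The sketch below uses an exact two-sided McNemar test as one common choice. It is an illustration only and is not necessarily the procedure or cutoff used in the paper or in the included reference code. The function name `mcnemar_exact` and the list-based input format are assumptions; the released CSV column names are not assumed here.

```python
from math import comb


def mcnemar_exact(outcomes_a, outcomes_b):
    """Exact two-sided McNemar test on paired binary outcomes.

    outcomes_a, outcomes_b: equal-length sequences of 0/1 successes,
    aligned so index i refers to the same evaluation instance.
    Returns (a_only, b_only, p_value).
    """
    if len(outcomes_a) != len(outcomes_b):
        raise ValueError("outcome sequences must be paired and equal length")
    # Only discordant pairs carry information about which policy is better.
    a_only = sum(1 for a, b in zip(outcomes_a, outcomes_b) if a and not b)
    b_only = sum(1 for a, b in zip(outcomes_a, outcomes_b) if b and not a)
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0
    k = min(a_only, b_only)
    # Under the null, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)
```

For example, with outcomes `[1, 1, 1, 1, 0, 0]` and `[1, 0, 0, 0, 0, 1]` there are 3 instances where only the first policy succeeds and 1 where only the second does, giving a p-value of 0.625, so this small sample would not separate the two policies.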

Exclusions

This release intentionally excludes model weights, datasets, rollout videos, full rollout directories, full observations, per-step action traces, simulator caches, conda environments, containers, raw logs, third-party source checkouts, Git metadata, browser state, and credential files or other credential material. Checkpoint identity metadata is included without binary payloads. By policy, private paths, hostnames, job IDs, and W&B links do not block release as long as they are free of credentials, but this package keeps them to a minimum.

Contact

For questions, please open an issue in this repository or email Tianchong Jiang at tianchongj@ttic.edu.
