A platform for reproducible world model research and evaluation.
Installation · Quick Start · Environments · Solvers & Baselines · Documentation · Citation
stable-worldmodel provides a single, unified interface for the three stages of world model research — collecting data, training, and evaluating with model-predictive control — across a large suite of standardized environments. It ships with reference implementations of common baselines and planning solvers so research code can stay focused on the contribution that matters: the model and the objective.
## Installation

From PyPI:

```shell
pip install stable-worldmodel
```

From source (development):

```shell
git clone https://github.com/galilai-group/stable-worldmodel
cd stable-worldmodel
uv venv --python=3.10 && source .venv/bin/activate
uv sync --all-extras --group dev
```

Datasets and checkpoints are stored under `$STABLEWM_HOME` (defaults to `~/.stable_worldmodel/`). Override the variable to point at your preferred storage location.
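For example, to keep everything on a larger disk (the path below is illustrative):

```shell
# Redirect stable-worldmodel's dataset/checkpoint cache to another volume
export STABLEWM_HOME="$HOME/scratch/stable_worldmodel"
mkdir -p "$STABLEWM_HOME"
```

Add the `export` line to your shell profile to make the override persistent.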
The library is in active development. APIs may change between minor versions.
## Quick Start

```python
import stable_worldmodel as swm
from stable_worldmodel.policy import WorldModelPolicy, PlanConfig
from stable_worldmodel.solver import CEMSolver

# 1. Collect a dataset
world = swm.World("swm/PushT-v1", num_envs=8)
world.set_policy(your_expert_policy)
world.collect("data/pusht_demo.lance", episodes=100, seed=0)

# 2. Load it and train your world model (format is autodetected)
dataset = swm.data.load_dataset("data/pusht_demo.lance", num_steps=16)
world_model = ...  # your model

# 3. Evaluate with model-predictive control
solver = CEMSolver(model=world_model, num_samples=300)
policy = WorldModelPolicy(solver=solver, config=PlanConfig(horizon=10))
world.set_policy(policy)
results = world.evaluate(episodes=50)
print(f"Success Rate: {results['success_rate']:.1f}%")
```

Reference implementations are provided in `scripts/train/`: `prejepa.py` reproduces DINO-WM, and `gcivl.py` implements several goal-conditioned RL baselines.
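The `your_expert_policy` placeholder in the Quick Start can be any policy object. Its exact interface isn't spelled out here, so the sketch below assumes a policy is a callable mapping a batch of observations to a batch of actions, one row per vectorized environment; the class name and signature are illustrative.

```python
import numpy as np

class RandomPolicy:
    """Hypothetical stand-in for `your_expert_policy`: samples uniform actions.

    Assumes the policy protocol is observations-in, actions-out, batched
    over the `num_envs` vectorized environments.
    """

    def __init__(self, action_dim, low=-1.0, high=1.0, seed=0):
        self.action_dim = action_dim
        self.low, self.high = low, high
        self.rng = np.random.default_rng(seed)

    def __call__(self, observations):
        # One action row per environment in the batch.
        batch = len(observations)
        return self.rng.uniform(self.low, self.high, size=(batch, self.action_dim))
```

A real expert would replace the uniform sampling with scripted or learned control; the batched call convention is the part that matters for `world.set_policy`.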
*Figure: GPU utilization for DINO-WM trained on Push-T with a DINOv2-Small backbone.*
Recording, loading, and conversion all go through a small format registry. Pick the backend that matches your trade-off, or register your own.
| Format | On-disk layout | Best for |
|---|---|---|
| `lance` | LanceDB table (episode-contiguous flat rows) | default — append-friendly, fast indexed reads |
| `hdf5` | single `.h5` file (one dataset per column) | portable single-file artifact |
| `folder` | `.npz` columns + one JPEG per step | inspection, partial-key streaming |
| `video` | `.npz` columns + one MP4 per episode (decord) | long episodes, compact image storage |
| `lerobot` | `lerobot://<repo_id>` (read-only adapter) | training/eval directly on LeRobot Hub datasets |
```python
world.collect("data/pusht.lance", episodes=100)                  # default: lance
world.collect("data/pusht_video", episodes=100, format="video")  # mp4 episodes

ds = swm.data.load_dataset("data/pusht.lance", num_steps=16)     # autodetect

swm.data.convert("data/pusht.lance", "data/pusht_video",
                 dest_format="video", fps=30)                    # one-shot migration
```

Every writer accepts a `mode` kwarg (`'append'` (default), `'overwrite'`, `'error'`); re-running `world.collect` extends the existing dataset rather than failing.
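The registry described above follows a common decorator-registration pattern. The sketch below shows the idea generically; the names (`register_format`, `get_writer`, the `jsonl` backend) are illustrative, not stable-worldmodel's actual hook.

```python
# Generic format-registry sketch: writer classes register under a string
# key, and callers look the backend up by name at load/record time.
WRITERS = {}

def register_format(name):
    def decorator(cls):
        WRITERS[name] = cls
        return cls
    return decorator

@register_format("jsonl")
class JsonlWriter:
    """Toy backend illustrating the writer protocol (mode handling included)."""

    def __init__(self, path, mode="append"):
        if mode not in ("append", "overwrite", "error"):
            raise ValueError(f"unknown mode: {mode}")
        self.path, self.mode, self.rows = path, mode, []

    def append(self, row):
        self.rows.append(row)

def get_writer(name, path, **kwargs):
    # Raises KeyError for unregistered formats.
    return WRITERS[name](path, **kwargs)
```

Registering a custom backend then amounts to decorating a class with the registry hook, after which it is addressable by its string key like the built-in formats.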
## Environments

Environments are pulled from the DeepMind Control Suite, Gymnasium classic control, OGBench, Craftax, the Arcade Learning Environment (100+ Atari games), and classical world model benchmarks (Two-Room, PushT). Each ships with both visual and physical factors of variation for robustness studies. Adding a new environment only requires conforming to the Gymnasium interface.
| Environment ID | # FoV |
|---|---|
| swm/PushT-v1 | 16 |
| swm/TwoRoom-v1 | 17 |
| swm/OGBCube-v0 | 11 |
| swm/OGBScene-v0 | 12 |
| swm/HumanoidDMControl-v0 | 7 |
| swm/CheetahDMControl-v0 | 7 |
| swm/HopperDMControl-v0 | 7 |
| swm/ReacherDMControl-v0 | 8 |
| swm/WalkerDMControl-v0 | 8 |
| swm/AcrobotDMControl-v0 | 8 |
| swm/PendulumDMControl-v0 | 6 |
| swm/CartpoleDMControl-v0 | 6 |
| swm/BallInCupDMControl-v0 | 9 |
| swm/FingerDMControl-v0 | 10 |
| swm/ManipulatorDMControl-v0 | 8 |
| swm/QuadrupedDMControl-v0 | 7 |
| swm/CartPoleControl-v1 | 10 |
| swm/MountainCarControl-v0 | 5 |
| swm/MountainCarContinuousControl-v0 | 4 |
| swm/AcrobotControl-v1 | 11 |
| swm/PendulumControl-v1 | 9 |
| swm/FetchReach-v3 | 8 |
| swm/FetchPush-v3 | 11 |
| swm/FetchSlide-v3 | 11 |
| swm/FetchPickAndPlace-v3 | 11 |
| swm/CraftaxClassicPixels-v1 | — |
| swm/CraftaxClassicSymbolic-v1 | — |
| swm/CraftaxPixels-v1 | — |
| swm/CraftaxSymbolic-v1 | — |
| ALE/* (100+ Atari games) | — |
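Conforming to the Gymnasium interface means implementing `reset` and `step` with the standard return conventions (observation/info from `reset`; the 5-tuple from `step`). A minimal duck-typed sketch with toy dynamics, omitting any swm-specific registration:

```python
import numpy as np

class PointEnv:
    """Toy 2-D point environment following the Gymnasium step/reset protocol."""

    def reset(self, seed=None, options=None):
        self._rng = np.random.default_rng(seed)
        self._pos = np.zeros(2, dtype=np.float32)
        return self._pos.copy(), {}  # observation, info

    def step(self, action):
        # Move, clip to the unit box, terminate once near the corner goal (1, 1).
        self._pos = np.clip(self._pos + np.asarray(action, dtype=np.float32), 0.0, 1.0)
        dist = float(np.linalg.norm(self._pos - 1.0))
        terminated = dist < 0.05
        reward = -dist
        # observation, reward, terminated, truncated, info
        return self._pos.copy(), reward, terminated, False, {}
```

A real environment would also declare `observation_space` and `action_space` (and, for this library, its factors of variation), but the two methods above are the core contract.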
## Solvers & Baselines

| Solver | Type |
|---|---|
| Cross-Entropy Method (CEM) | Sampling |
| Improved CEM (iCEM) | Sampling |
| Model Predictive Path Integral (MPPI) | Sampling |
| Predictive Sampling | Sampling |
| Gradient Descent (SGD, Adam) | Gradient |
| Projected Gradient Descent (PGD) | Gradient |
| Augmented Lagrangian | Constrained Opt |
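For orientation, the cross-entropy method at the top of the table fits a Gaussian over action sequences and repeatedly refits it to the lowest-cost samples under the model. A generic sketch of that loop (not the library's `CEMSolver`; names and defaults are illustrative):

```python
import numpy as np

def cem_plan(cost_fn, horizon, action_dim, iters=5, num_samples=300,
             elite_frac=0.1, seed=0):
    """Cross-entropy method over action sequences of shape (horizon, action_dim)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    n_elite = max(1, int(elite_frac * num_samples))
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = rng.normal(mean, std, size=(num_samples, horizon, action_dim))
        costs = np.array([cost_fn(seq) for seq in samples])
        # Refit the Gaussian to the lowest-cost (elite) sequences.
        elites = samples[np.argsort(costs)[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # planned action sequence
```

In MPC, `cost_fn` rolls the candidate actions through the world model and scores the predicted trajectory; only the first planned action is executed before replanning.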
| Baseline | Type |
|---|---|
| DINO-WM | JEPA |
| PLDM | JEPA |
| LeWM | JEPA |
| GCBC | Behaviour Cloning |
| GCIVL | RL |
| GCIQL | RL |
After installation, the `swm` command is available for inspecting datasets, environments, and checkpoints without writing code:

```shell
swm datasets                    # list cached datasets
swm inspect pusht_expert_train  # inspect a specific dataset
swm envs                        # list all registered environments
swm fovs PushT-v1               # show factors of variation for an environment
swm checkpoints                 # list available model checkpoints
```

## Documentation

The full documentation lives at galilai-group.github.io/stable-worldmodel, with API references, tutorials, and guides.
## Citation

```bibtex
@misc{maes_lelidec2026swm-1,
  title         = {stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation},
  author        = {Lucas Maes and Quentin Le Lidec and Dan Haramati and
                   Nassim Massaudi and Damien Scieur and Yann LeCun and
                   Randall Balestriero},
  year          = {2026},
  eprint        = {2602.08968},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2602.08968},
}
```

Open an issue — happy to help.