This page provides examples as a quickstart guide to get you up and running with RLightning.
Each example maintains its own virtual environment under `examples/<project_name>/.venv`.
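Because each example is self-contained, the same setup pattern applies throughout. A generic sketch, with `<project_name>` as a placeholder:

```shell
cd examples/<project_name>   # e.g. openvla_ppo, openpi_ppo, wbc_tracking
uv sync                      # creates and populates this example's .venv
source .venv/bin/activate    # activate the example's environment
```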
PPO-based fine-tuning of OpenVLA (a 7B vision-language-action model) on ManiSkill manipulation tasks.
1. Environment setup
```bash
cd examples/openvla_ppo
uv sync
```

Note: Python 3.11 is required (the pre-built flash_attn wheel only supports cp311). Pin the version in `pyproject.toml`:

```toml
requires-python = "==3.11.*"
```

2. Download model weights
Install the download tool and fetch the checkpoint (users in mainland China can set `HF_ENDPOINT` to a mirror for faster downloads):
```bash
uv pip install huggingface_hub
export HF_ENDPOINT="https://hf-mirror.com"  # optional, for mainland China
.venv/bin/huggingface-cli download gen-robot/openvla-7b-rlvla-warmup \
  --local-dir /data/ckpts/gen-robot/openvla-7b-rlvla-warmup
```

The default config expects the checkpoint at `/data/ckpts/gen-robot/openvla-7b-rlvla-warmup`. To use a different path, update `model_path` and `tokenizer_path` in `conf/policy/openvla_ppo.yaml`.
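Overriding the checkpoint location is just a YAML edit. A sketch of the relevant fragment (only the two key names come from the text above; the path is a placeholder and the surrounding structure of the file is assumed):

```yaml
# conf/policy/openvla_ppo.yaml -- fragment; other keys omitted
model_path: /my/ckpts/openvla-7b-rlvla-warmup       # placeholder path
tokenizer_path: /my/ckpts/openvla-7b-rlvla-warmup   # placeholder path
```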
3. Download simulation assets
ManiSkill built-in assets (bridge table scene and WidowX robot):
```bash
source .venv/bin/activate
python -m mani_skill.utils.download_asset bridge_v2_real2sim -y
python -m mani_skill.utils.download_asset widowx250s -y
```

Custom scene assets (carrot/plate objects and table overlay backgrounds):

```bash
cd examples/openvla_ppo/maniskill
../.venv/bin/hf download --repo-type dataset RLinf/maniskill_assets --local-dir ./assets
```

4. Launch training
| Script | Mode | Use Case |
|---|---|---|
| `launch_train_ppo_sync.sh` | Single-GPU sync | Simplest, quick validation |
| `launch_train_ppo_ddp.sh` | DDP | 2 trainers + 1 eval worker |
| `launch_train_ppo_colocate_ddp_x8.sh` | Colocated DDP x8 | Large-scale multi-GPU |
Single-GPU quick start:
```bash
bash launch_train_ppo_sync.sh
```

The script auto-detects the GPU count, starts a Ray cluster, and launches training.
PPO-based fine-tuning of OpenPI (π₀/π₀.₅) vision-language-action models on the LIBERO manipulation benchmark.
1. Setup LIBERO
Clone LIBERO into `.venv/LIBERO` for an editable install (required because assets are not included when installing from git):
```bash
cd examples/openpi_ppo
uv venv .venv
bash scripts/setup_libero.sh
```

2. Environment setup
```bash
cd examples/openpi_ppo
uv sync
```

3. Setup OpenPI
Apply the transformers library patches required for OpenPI PyTorch models:
```bash
cd examples/openpi_ppo
bash scripts/setup_openpi.sh
```

This also downloads OpenPI assets (tokenizer, etc.) and resolves the pynvml / nvidia-ml-py conflict.
4. Download model weights
Install the download tool and fetch the checkpoint (users in mainland China can set `HF_ENDPOINT` to a mirror for faster downloads):
```bash
uv pip install huggingface_hub
export HF_ENDPOINT="https://hf-mirror.com"  # optional, for mainland China
.venv/bin/huggingface-cli download RLinf/RLinf-Pi0-LIBERO-Spatial-Object-Goal-SFT \
  --local-dir /data/ckpts/RLinf/RLinf-Pi0-LIBERO-Spatial-Object-Goal-SFT
```

The default config expects the checkpoint at `/data/ckpts/RLinf/RLinf-Pi0-LIBERO-Spatial-Object-Goal-SFT`. To use a different path, update `model_path` and `tokenizer_path` in `conf/policy/openpi_ppo.yaml`.
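As with the OpenVLA example, pointing at a different checkpoint is a small YAML edit. A sketch of the relevant fragment (only the two key names come from the text above; the path is a placeholder and the surrounding structure of the file is assumed):

```yaml
# conf/policy/openpi_ppo.yaml -- fragment; other keys omitted
model_path: /my/ckpts/RLinf-Pi0-LIBERO-SFT       # placeholder path
tokenizer_path: /my/ckpts/RLinf-Pi0-LIBERO-SFT   # placeholder path
```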
5. Launch training
| Script | Mode | Use Case |
|---|---|---|
| `launch_train_ppo_sync.sh` | Single-GPU sync | Simplest, quick validation |
| `launch_train_ppo_sync_tiny.sh` | Single-GPU tiny | Reduced batch size, fast iteration |
| `launch_train_ppo_sync_ddp.sh` | DDP (8 GPUs) | Multi-GPU distributed |
| `launch_train_ppo_sync_tiny_ddp.sh` | DDP tiny (8 GPUs) | Multi-GPU with reduced batch size |
Single-GPU quick start:
```bash
cd RLightning
bash examples/openpi_ppo/launch_train_ppo_sync.sh
```

The script auto-detects the GPU count, starts a Ray cluster, and launches training.
Humanoid whole-body control (WBC) motion tracking using a Unitree robot in IsaacLab simulation.
Note: This example requires an NVIDIA GPU with an Isaac Sim compatible driver, and it has the heaviest environment requirements of all the examples.
1. Download robot assets
```bash
cd RLightning
bash examples/wbc_tracking/setup.sh
```

This downloads the Unitree robot URDF model to `examples/wbc_tracking/assets/unitree_description/`.
2. Initialize git submodules
```bash
git submodule update --init --recursive
```

Training depends on `third_party/rsl_rl`, so this step must be completed before `uv sync`.
3. Environment setup
```bash
cd examples/wbc_tracking
uv sync
```

The dependencies are heavy: `rlightning[dev, isaaclab, mujoco, humanoid]` plus `rsl-rl`.
4. Download and process motion data
```bash
cd RLightning
source examples/wbc_tracking/.venv/bin/activate
```

4.1 Download the LAFAN motion capture dataset:

```bash
python -m rlightning.humanoid.utils.download.download_lafan
```

4.2 Retarget the motions to the Unitree robot:

```bash
PYTHONPATH=$PWD/examples python -m wbc_tracking.retarget_lafan --f-path .data/lafan1
```

4.3 Convert to the WBC tracking task format:

```bash
PYTHONPATH=$PWD/examples python -m wbc_tracking.motion_converter --input-dir .data/lafan1/retargeted
```

Processed data is saved under `.data/lafan1/retargeted/wbc_tracking/`.
5. Launch training
| Script | Description |
|---|---|
| `launch.sh` | Single-node, multi-process |
| `launch_local.sh` | Local, no Ray |
| `launch_ddp.sh` | Single-node DDP |
| `launch_multi_node.sh` | Multi-node distributed |
| `launch_multi_node_ddp_x8.sh` | Multi-node DDP x8 |
```bash
cd RLightning
bash examples/wbc_tracking/launch.sh
```

Note: For the multi-node scripts, start the Ray cluster manually on each node first. See the Ray documentation.
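A typical manual Ray cluster bring-up is sketched below; the port choice and the head-node IP are placeholders, and the exact flags for your deployment should come from the Ray documentation:

```shell
# On the head node (6379 is a common choice of port; adjust as needed)
ray start --head --port=6379

# On each worker node, join using the head node's address (placeholder IP)
ray start --address=10.0.0.1:6379

# Check from any node that all nodes have joined
ray status

# Tear the cluster down after training
ray stop
```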