
Quickstart

This guide provides a few examples to get you up and running with RLightning.

Each example maintains its own virtual environment under examples/<project_name>/.venv.

OpenVLA PPO

PPO-based fine-tuning of OpenVLA (a 7B vision-language-action model) on ManiSkill manipulation tasks.

1. Environment setup

cd examples/openvla_ppo
uv sync

Note

Python 3.11 is required (the flash_attn pre-built wheel only supports cp311). Pin the version in pyproject.toml:

requires-python = "==3.11.*"

2. Download model weights

Install the download tool and fetch the checkpoint (China mainland users can set HF_ENDPOINT for acceleration):

uv pip install huggingface_hub

export HF_ENDPOINT="https://hf-mirror.com"   # optional, for China mainland

.venv/bin/huggingface-cli download gen-robot/openvla-7b-rlvla-warmup \
  --local-dir /data/ckpts/gen-robot/openvla-7b-rlvla-warmup

The default config expects the checkpoint at /data/ckpts/gen-robot/openvla-7b-rlvla-warmup. To use a different path, update model_path and tokenizer_path in conf/policy/openvla_ppo.yaml.
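For example, an override for a custom checkpoint location could look like this (a hypothetical fragment; the paths are placeholders, and the exact key nesting in conf/policy/openvla_ppo.yaml should be verified in your checkout):

```yaml
# Hypothetical paths -- point these at wherever you stored the checkpoint.
model_path: /my/ckpts/gen-robot/openvla-7b-rlvla-warmup
tokenizer_path: /my/ckpts/gen-robot/openvla-7b-rlvla-warmup
```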

3. Download simulation assets

ManiSkill built-in assets (bridge table scene and WidowX robot):

source .venv/bin/activate
python -m mani_skill.utils.download_asset bridge_v2_real2sim -y
python -m mani_skill.utils.download_asset widowx250s -y

Custom scene assets (carrot/plate objects and table overlay backgrounds):

cd examples/openvla_ppo/maniskill
../.venv/bin/hf download --repo-type dataset RLinf/maniskill_assets --local-dir ./assets

4. Launch training

| Script | Mode | Use Case |
| --- | --- | --- |
| launch_train_ppo_sync.sh | Single-GPU sync | Simplest, quick validation |
| launch_train_ppo_ddp.sh | DDP | 2 trainers + 1 eval worker |
| launch_train_ppo_colocate_ddp_x8.sh | Colocated DDP x8 | Large-scale multi-GPU |

Single-GPU quick start:

bash launch_train_ppo_sync.sh

The script auto-detects GPU count, starts a Ray cluster, and launches training.
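The GPU detection can be approximated by hand if you want to sanity-check what the script will see (a hedged sketch, not the script's actual code):

```shell
# Count GPUs visible to the driver; prints 0 if nvidia-smi is unavailable.
NUM_GPUS="$(nvidia-smi --list-gpus 2>/dev/null | wc -l | tr -d ' ')"
echo "detected ${NUM_GPUS} GPU(s)"
```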

OpenPI PPO (LIBERO)

PPO-based fine-tuning of OpenPI (π₀/π₀.₅) vision-language-action models on the LIBERO manipulation benchmark.

1. Setup LIBERO

Clone LIBERO into .venv/LIBERO for an editable install (required because its assets are not included when installing from git):

cd examples/openpi_ppo
uv venv .venv
bash scripts/setup_libero.sh

2. Environment setup

cd examples/openpi_ppo
uv sync

3. Setup OpenPI

Apply the transformers library patches required for OpenPI PyTorch models:

cd examples/openpi_ppo
bash scripts/setup_openpi.sh

This also downloads OpenPI assets (tokenizer, etc.) and resolves the pynvml / nvidia-ml-py conflict.

4. Download model weights

Install the download tool and fetch the checkpoint (China mainland users can set HF_ENDPOINT for acceleration):

uv pip install huggingface_hub

export HF_ENDPOINT="https://hf-mirror.com"   # optional, for China mainland

.venv/bin/huggingface-cli download RLinf/RLinf-Pi0-LIBERO-Spatial-Object-Goal-SFT \
  --local-dir /data/ckpts/RLinf/RLinf-Pi0-LIBERO-Spatial-Object-Goal-SFT

The default config expects the checkpoint at /data/ckpts/RLinf/RLinf-Pi0-LIBERO-Spatial-Object-Goal-SFT. To use a different path, update model_path and tokenizer_path in conf/policy/openpi_ppo.yaml.
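As with the OpenVLA example, the override could look like this (a hypothetical fragment; the paths are placeholders, and the exact key nesting in conf/policy/openpi_ppo.yaml should be verified in your checkout):

```yaml
# Hypothetical paths -- point these at wherever you stored the checkpoint.
model_path: /my/ckpts/RLinf/RLinf-Pi0-LIBERO-Spatial-Object-Goal-SFT
tokenizer_path: /my/ckpts/RLinf/RLinf-Pi0-LIBERO-Spatial-Object-Goal-SFT
```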

5. Launch training

| Script | Mode | Use Case |
| --- | --- | --- |
| launch_train_ppo_sync.sh | Single-GPU sync | Simplest, quick validation |
| launch_train_ppo_sync_tiny.sh | Single-GPU tiny | Reduced batch size, fast iteration |
| launch_train_ppo_sync_ddp.sh | DDP (8 GPUs) | Multi-GPU distributed |
| launch_train_ppo_sync_tiny_ddp.sh | DDP tiny (8 GPUs) | Multi-GPU with reduced batch size |

Single-GPU quick start:

cd RLightning
bash examples/openpi_ppo/launch_train_ppo_sync.sh

The script auto-detects GPU count, starts a Ray cluster, and launches training.

WBC Tracking (IsaacLab + RSL_RL)

Humanoid whole-body control (WBC) motion tracking using a Unitree robot in IsaacLab simulation.

Note

Prerequisite: an NVIDIA GPU with an Isaac Sim-compatible driver. This example has the most demanding environment requirements of all the examples.

1. Download robot assets

cd RLightning
bash examples/wbc_tracking/setup.sh

This downloads the Unitree robot URDF model to examples/wbc_tracking/assets/unitree_description/.

2. Initialize git submodules

git submodule update --init --recursive

Training depends on third_party/rsl_rl, so this step must be completed before running uv sync.

3. Environment setup

cd examples/wbc_tracking
uv sync

The dependencies are heavy: rlightning[dev, isaaclab, mujoco, humanoid] plus rsl-rl.

4. Download and process motion data

cd RLightning
source examples/wbc_tracking/.venv/bin/activate

4.1 Download the LAFAN1 motion capture dataset:

python -m rlightning.humanoid.utils.download.download_lafan

4.2 Retarget motions to the Unitree robot:

PYTHONPATH=$PWD/examples python -m wbc_tracking.retarget_lafan --f-path .data/lafan1

4.3 Convert to WBC tracking task format:

PYTHONPATH=$PWD/examples python -m wbc_tracking.motion_converter --input-dir .data/lafan1/retargeted

Processed data is saved under .data/lafan1/retargeted/wbc_tracking/.

5. Launch training

| Script | Description |
| --- | --- |
| launch.sh | Single-node, multi-process |
| launch_local.sh | Local, no Ray |
| launch_ddp.sh | Single-node DDP |
| launch_multi_node.sh | Multi-node distributed |
| launch_multi_node_ddp_x8.sh | Multi-node DDP x8 |

Single-node quick start:

cd RLightning
bash examples/wbc_tracking/launch.sh

Note

For multi-node scripts, start the Ray cluster manually on each node first. See Ray documentation.
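The manual startup the note describes can be sketched as follows (HEAD_IP and the port are placeholders for your own cluster, not values taken from the launch scripts; see the Ray documentation for the full set of `ray start` options):

```shell
# Placeholder head-node address -- replace with your own.
HEAD_IP="${HEAD_IP:-10.0.0.1}"
PORT="${PORT:-6379}"

# On the head node:
#   ray start --head --port="$PORT"
#
# On every worker node, join the cluster:
#   ray start --address="$HEAD_IP:$PORT"

# Echo the worker join command so it can be copied to the other nodes:
echo "ray start --address=${HEAD_IP}:${PORT}"
```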