- PyTorch NGC 25.05/25.06/25.08
  - ✅ 25.05 (CUDA 12.8)
  - ✅ 25.06 (CUDA 12.9)
  - ⚠️ 25.08 (CUDA 13.0)
- CPU 64 cores / fGPU 8.0 / RAM 256GB per Node
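
The image list above can be turned into a small pre-flight check. The sketch below is illustrative only and not part of the repository: `ngc_status` is a hypothetical helper, and reading the release tag from `NVIDIA_PYTORCH_VERSION` assumes the NGC image exports that variable.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of torchtitan-scripts-only): map an NGC PyTorch
# release tag to the status shown in the list above.
ngc_status() {
  case "$1" in
    25.05|25.06) echo "tested" ;;   # marked ✅ in the list
    25.08)       echo "caution" ;;  # marked ⚠️ in the list
    *)           echo "unknown" ;;  # not covered by the list
  esac
}

# Assumption: the NGC image exports NVIDIA_PYTORCH_VERSION (e.g. "25.06").
tag="${NVIDIA_PYTORCH_VERSION:-unset}"
echo "NGC PyTorch ${tag}: $(ngc_status "${tag}")"
```

Any tag outside the three listed releases is reported as `unknown` rather than guessed.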
# ❗ NOTE: run these commands inside a vFolder (NFS)
git clone https://github.com/lablup/torchtitan-scripts-only
cd torchtitan-scripts-only
bash setup_all.sh

This script reads the BACKENDAI_* environment variables and sets up all nodes in the cluster.
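
Because the setup depends on the BACKENDAI_* variables, it can help to inspect them before running anything. The following is a minimal sketch, not taken from the repository: `list_backendai_env` is a hypothetical helper, and which variables `setup_all.sh` actually reads is not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every exported variable whose name starts with
# BACKENDAI_, sorted by name.
list_backendai_env() {
  env | grep '^BACKENDAI_' | sort
}

vars="$(list_backendai_env)"
if [ -n "${vars}" ]; then
  echo "${vars}"
else
  # Nothing matched, so this shell is probably not inside a Backend.AI session.
  echo "No BACKENDAI_* variables found; is this a Backend.AI session?" >&2
fi
```

If the list is empty, the setup script presumably has nothing to work with, so checking first avoids a confusing failure later.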
bash pdsh_run_fsdp_auto.sh

You can also test this script with a small debug model (~100M params):
bash pdsh_run_fsdp_auto_debug.sh