Fine‑Tuning GR00T‑N1.5: From Human Demonstrations to Distributed Imitation Learning
This tutorial walks through fine‑tuning the GR00T‑N1.5 vision‑language‑action (VLA) model: collecting human demonstrations, annotating and massively augmenting the data with DLC, running distributed imitation learning, and validating the resulting policy through a server‑client DSW setup, complete with code snippets, resource specs, and visual examples.
Overview
This guide demonstrates an end‑to‑end pipeline for fine‑tuning the VLA model GR00T‑N1.5‑3B using the RobotLearningLab public dataset. The workflow includes human demonstration collection, dataset annotation, large‑scale synthetic data generation with DLC, distributed augmentation via Ray, data merging and conversion to Lerobot format, distributed imitation learning, and a server‑client closed‑loop evaluation.
Environment Setup
Launch a DSW instance with a Docker image that contains Isaac Lab 2.2 and its dependencies:
dsw-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-training-algorithm/isaac-sim:isaaclab220-nb4-v7-20250916
Mount the public dataset at /mnt/RobotLearningLab_Dataset to avoid repeated downloads. Recommended instance types (Beijing region) are:
ecs.ebmgn8is.32xlarge
ecs.gn8is-8x.32xlarge
ecs.ebmgn8te.32xlarge
ecs.ebmgn9t.48xlarge
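Once the instance is running, it can help to confirm that the GPU and the dataset mount are both visible from inside the container. A minimal check, assuming PyTorch is available in the Isaac Lab image and the dataset is mounted at the path recommended above:
import os
import torch

# GPU visibility inside the DSW container (assumes PyTorch ships with the image)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# Dataset mount recommended earlier in this guide
print("Mounted dataset entries:", os.listdir("/mnt/RobotLearningLab_Dataset")[:5])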
Human Demonstration
Start a VNC server inside the DSW container:
/opt/TurboVNC/bin/vncserver :0 -geometry 3840x2160
From a local terminal, create an SSH tunnel to the DSW instance (replace <DSW_IP> and <DSW_PORT> with the instance's public IP and port), then point a local VNC viewer at localhost:5900 to reach the desktop:
ssh -L 5900:127.0.0.1:5900 root@<DSW_IP> -p <DSW_PORT>
Record demonstrations with the Isaac Lab script (the teleoperation device can be a spacemouse or keyboard):
mkdir -p /mnt/data/isaac_tmp/nb4/datasets
cd /workspace/RobotLearningLab && ./isaaclab.sh -p usecase/scripts/record_demos.py \
--task Isaac-Stack-Cube-Galbot-Left-Arm-RmpFlow-Rel-v0 \
--teleop_device keyboard \
--dataset_file /mnt/data/isaac_tmp/nb4/datasets/dataset.hdf5 \
--num_demos 10
Key bindings during demonstration:
Reset: R
Toggle gripper: K
Move X axis: W/S
Move Y axis: A/D
Move Z axis: Q/E
Rotate X axis: Z/X
Rotate Y axis: T/G
Rotate Z axis: C/V
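Before annotating, it is worth confirming how many demonstrations were actually recorded. A minimal h5py check, assuming the Isaac Lab recorder's robomimic-style layout with one demo_* group under data (adjust the path if you recorded elsewhere):
import h5py

with h5py.File("/mnt/data/isaac_tmp/nb4/datasets/dataset.hdf5", "r") as f:
    demos = sorted(k for k in f["data"].keys() if k.startswith("demo"))
    print(f"Recorded {len(demos)} demonstrations")
    # Peek at the contents of the first demo (actions, observations, ...)
    for name, item in f["data"][demos[0]].items():
        shape = getattr(item, "shape", "(group)")
        print(f"  {name}: {shape}")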
Data Annotation
Annotate the collected dataset.hdf5 to add sub‑task labels:
output_path_str=$EXTERNAL_STORAGE_PATH/datasets
annotate_command="cd $ROBOT_LEARNING_LAB_PATH && \
./isaaclab.sh -p usecase/scripts/annotate_demos.py \
--task Isaac-Stack-Cube-Galbot-Left-Arm-RmpFlow-Abs-Mimic-v0 \
--device cuda \
--auto \
--input_file $output_path_str/dataset.hdf5 \
--output_file $output_path_str/dataset_annotate.hdf5 \
--headless"
!$annotate_command
Large‑Scale Data Augmentation
Generate synthetic trajectories with DLC:
generate_command="cd $ROBOT_LEARNING_LAB_PATH && \
./isaaclab.sh -p usecase/scripts/generate_dataset.py \
--task Isaac-Stack-Cube-Galbot-Left-Arm-RmpFlow-Abs-Mimic-v0 \
--device cuda \
--num_envs 10 \
--generation_num_trials 10000 \
--input_file $output_path_str/dataset_annotate.hdf5 \
--output_file $output_path_str/dataset_generate.hdf5 \
--headless"
!$generate_command
Important flags:
--num_envs 10: run 10 parallel environments.
--generation_num_trials 10000: target 10,000 successful trajectories.
--device cuda: GPU acceleration.
--headless: no GUI.
Distributed Augmentation with Ray
Create a Ray task that distributes the generation across multiple nodes:
/workspace/RobotLearningLab/isaaclab.sh -p /mnt/data/isaac_tmp/nb4/datasets/ray_isaac_new.py \
--command "cd /workspace/RobotLearningLab && \
./isaaclab.sh -p /mnt/data/isaac_tmp/nb4/datasets/generate_dataset_ray.py \
--task Isaac-Stack-Cube-Galbot-Left-Arm-RmpFlow-Abs-Mimic-v0 \
--device cuda \
--num_envs 10 \
--generation_num_trials 625 \
--input_file /mnt/data/isaac_tmp/nb4/datasets/dataset_annotate.hdf5 \
--output_file /mnt/data/isaac_tmp/nb4/datasets/dataset_generate.hdf5 \
--headless" \
--gpu 1 --cpu 10 --memory 80 --num_per_worker 8
Resource layout per worker (a conceptual sketch of the dispatch pattern follows the list):
CPU: 10 cores
Memory: 80 GB
GPU: 1
Tasks per worker: 8
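The ray_isaac_new.py dispatcher referenced above ships with the workspace and is not reproduced here. Conceptually, it wraps the generation command in a Ray remote task and fans it out across workers, roughly like the hypothetical sketch below (the resource numbers mirror the layout above; the helper name and per-shard output files are illustrative, not the actual script):
import subprocess
import ray

ray.init(address="auto")  # attach to the existing Ray cluster

@ray.remote(num_gpus=1, num_cpus=10)
def run_generation(shard_id: int) -> int:
    # Each task runs one headless Isaac Lab generation process and writes
    # its own output shard; the shards are merged in the next section.
    cmd = (
        "cd /workspace/RobotLearningLab && "
        "./isaaclab.sh -p /mnt/data/isaac_tmp/nb4/datasets/generate_dataset_ray.py "
        "--task Isaac-Stack-Cube-Galbot-Left-Arm-RmpFlow-Abs-Mimic-v0 "
        "--device cuda --num_envs 10 --generation_num_trials 625 "
        "--input_file /mnt/data/isaac_tmp/nb4/datasets/dataset_annotate.hdf5 "
        f"--output_file /mnt/data/isaac_tmp/nb4/datasets/dataset_generate_{shard_id}.hdf5 "
        "--headless"
    )
    return subprocess.run(cmd, shell=True).returncode

# 8 tasks per worker, matching --num_per_worker above
exit_codes = ray.get([run_generation.remote(i) for i in range(8)])
print("Exit codes:", exit_codes)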
Data Processing
Merge the HDF5 shards produced by DLC, replay trajectories to generate videos, and convert the dataset to Lerobot joint‑space format.
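merge_hdf5_datasets.py in the workspace is the authoritative merge tool; for orientation, here is a minimal h5py sketch of the same idea, assuming each shard stores demos under a robomimic-style data/demo_* layout:
import glob
import h5py

shards = sorted(glob.glob("/mnt/data/isaac_tmp/nb4/datasets/*_*.hdf5"))
with h5py.File("merged_dataset.hdf5", "w") as out:
    out.create_group("data")
    idx = 0
    for shard in shards:
        with h5py.File(shard, "r") as src:
            for demo in src["data"].keys():
                # Copy every demo group under a new, globally unique name.
                # (Group attributes/metadata handling is omitted for brevity.)
                src.copy(src["data"][demo], out["data"], name=f"demo_{idx}")
                idx += 1
    print(f"Merged {idx} demos from {len(shards)} shards")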
# Merge successful shards
python merge_hdf5_datasets.py --input_files $(ls $EXTERNAL_STORAGE_PATH/datasets/*_*.hdf5) --output_file merged_dataset.hdf5
# Replay for video generation
replay_command="cd $ROBOT_LEARNING_LAB_PATH && \
./isaaclab.sh -p usecase/scripts/replay_demos_with_camera.py \
--task Isaac-Stack-Cube-Galbot-Left-Arm-Image-Based-v0 \
--dataset_file $output_path_str/dataset_generate.hdf5 \
--num_envs 10 --video --video_path $output_path_str \
--camera_view_list ego left_wrist right_wrist --headless"
!$replay_command
# Convert to Lerobot format
convert_cmd="cd $ROBOT_LEARNING_LAB_PATH && \
./isaaclab.sh -p benchmarks/gr00t/convert_hdf5_to_lerobot_joint_space.py \
--data_root $output_path \
--hdf5_filename dataset_generate.hdf5 \
--hdf5_file_path $output_path/dataset_generate.hdf5 \
--lerobot_data_dir $output_path/lerobot_joint_space"
!$convert_cmd
Distributed Imitation Learning
Fine‑tune the GR00T‑N1.5‑3B checkpoint on the augmented Lerobot dataset using two GPUs:
cd /mnt/data/isaac_tmp/nb4/Isaac-GR00T && \
export WANDB_MODE=offline && NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 \
python scripts/gr00t_finetune.py \
--base_model_path /mnt/data/isaac_tmp/nb4/GR00T-N1.5-3B \
--dataset-path /mnt/data/isaac_tmp/nb4/datasets/lerobot_joint_space \
--num-gpus 2 \
--batch-size 2 \
--output-dir /mnt/data/isaac_tmp/nb4/datasets/joint_space_2_2 \
--max-steps 40000 \
--data-config galbot_joint_space \
--video-backend decord \
--no-tune-visual
Key flags:
--base_model_path: path to the pre‑trained GR00T‑N1.5‑3B checkpoint.
--dataset-path: directory containing the augmented Lerobot data.
--num-gpus 2: number of GPUs for training (adjustable).
--no-tune-visual: keep the visual encoder frozen.
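After training, the server‑client evaluation below points at a checkpoint-40000 directory. A small helper to locate the newest checkpoint under the output directory, assuming the usual Hugging Face Trainer checkpoint-<step> naming used by the fine‑tuning script:
from pathlib import Path

# Output directory passed as --output-dir above
output_dir = Path("/mnt/data/isaac_tmp/nb4/datasets/joint_space_2_2")
checkpoints = sorted(output_dir.glob("checkpoint-*"),
                     key=lambda p: int(p.name.split("-")[-1]))
# The latest checkpoint is what the inference server loads in the next section.
print("Latest checkpoint:", checkpoints[-1] if checkpoints else "none found yet")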
Closed‑Loop Evaluation (Server‑Client DSW)
Server side – launch an inference server that hosts the fine‑tuned model:
cd /mnt/data/isaac_tmp/nb4/Isaac-GR00T && \
python gr00t_inference_server.py --port 5555 \
--model_path /mnt/data/isaac_tmp/nb4/checkpoint-40000 \
--data_config galbot_joint_space
Obtain the private IP of the server instance (example 10.0.0.207) for the client to connect:
PRI_IP=$(ifconfig eth1 | grep 'inet ' | awk '{print $2}') && echo "My private IP is: $PRI_IP"
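Before launching the client, a quick check that the server port is reachable from the client instance can save a failed run; a minimal sketch using the example IP and port above:
import socket

server = ("10.0.0.207", 5555)  # replace with your server's private IP
try:
    with socket.create_connection(server, timeout=5):
        print(f"Inference server reachable at {server[0]}:{server[1]}")
except OSError as exc:
    print(f"Cannot reach {server[0]}:{server[1]}: {exc}")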
Client side – run an inference client that connects to the server and executes the stacked‑cube task:
cd /workspace/RobotLearningLab && ./isaaclab.sh -p benchmarks/gr00t/gr00t_inference_client.py \
--server_port 5555 --server_host 10.0.0.207 \
--num_total_experiments 100 --num_success_steps 8 \
--policy_type joint_space \
--task Isaac-Stack-Cube-Galbot-Left-Arm-Joint-Position-Image-Based-v0
The client visualizes the robot performing the task under the guidance of the fine‑tuned policy.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.