Step-by-Step Guide: Train a LeRobot Robotic Arm from Scratch on GPUFree
This tutorial walks you through renting a GPUFree RTX 4090 cloud instance, uploading your LeRobot dataset, launching training from a lightweight Flask web UI, shutting the server down automatically, and downloading the trained model, with code snippets and practical tips along the way.
Preparation
Make sure the dataset directory exists on your local machine, e.g. ~/.cache/huggingface/lerobot/mytest/so100_test. The instance's system disk is limited to 30 GB, so a larger data disk is recommended for big datasets.
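Before renting anything, it can help to confirm that the dataset folder is really there and to see how much space it takes. The following is a minimal sketch, not part of the original service: the helper names dataset_size_gb and check_dataset are hypothetical, and the 30 GB figure is simply the system disk limit quoted above.

```python
from pathlib import Path

SYSTEM_DISK_LIMIT_GB = 30  # system disk size quoted in this guide


def dataset_size_gb(dataset_dir: Path) -> float:
    """Total size of all files under dataset_dir, in GB (10**9 bytes)."""
    total = sum(p.stat().st_size for p in dataset_dir.rglob("*") if p.is_file())
    return total / 1e9


def check_dataset(dataset_dir: Path, limit_gb: float = SYSTEM_DISK_LIMIT_GB) -> str:
    """Return a short status string: missing, too large, or ok."""
    if not dataset_dir.is_dir():
        return f"missing: {dataset_dir}"
    size = dataset_size_gb(dataset_dir)
    if size >= limit_gb:
        return f"too large for the system disk: {size:.2f} GB"
    return f"ok: {size:.2f} GB"


if __name__ == "__main__":
    # Example path from this guide; adjust to your own dataset name.
    path = Path.home() / ".cache/huggingface/lerobot/mytest/so100_test"
    print(check_dataset(path))
```

If the script reports a size close to the limit, that is a good reason to attach a data disk when creating the instance.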
Rent a GPUFree cloud instance
Register or log in at https://www.gpufree.cn/market.
Select an RTX 4090 instance (≈ ¥1.38 per hour).
Choose an Ubuntu 20.04 or 22.04 image with Miniconda or PyTorch pre‑installed.
Recharge about ¥10 and create the instance.
Start the instance with GPU enabled for training; use the “no‑GPU” mode for non‑training operations.
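As a rough budgeting aid, the hourly price above can be turned into run time. This is only an arithmetic sketch using the approximate ¥1.38 per hour quoted in this guide; the function names are hypothetical and actual billing rules on GPUFree may differ.

```python
RATE_YUAN_PER_HOUR = 1.38  # approximate RTX 4090 price quoted in this guide


def affordable_hours(balance_yuan: float, rate: float = RATE_YUAN_PER_HOUR) -> float:
    """How many GPU hours a given balance buys at a flat hourly rate."""
    return balance_yuan / rate


def training_cost(hours: float, rate: float = RATE_YUAN_PER_HOUR) -> float:
    """Cost in yuan of keeping the GPU instance running for `hours`."""
    return hours * rate


if __name__ == "__main__":
    print(f"¥10 buys about {affordable_hours(10):.1f} GPU hours")
    print(f"A 3-hour run costs about ¥{training_cost(3):.2f}")
```

At this rate a ¥10 recharge covers roughly seven hours of GPU time, which is why switching to the "no-GPU" mode for uploads and downloads is worthwhile.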
Web UI for remote training
The official LeRobot image on GPUFree provides a Flask web service that handles dataset upload, training launch, and model download. It offers:
One‑click upload of a local dataset folder.
Selection of the training algorithm (ACT, Pi0.5, SmolVLA, etc.) and training launch.
Automatic shutdown of the instance after training to avoid extra charges.
Download the zipped model directly from the browser.
Start the Flask service
Create a JupyterLab notebook or a Python file named remote_train.ipynb (or remote_train.py) and run the following code. The service listens on port 7001, which GPUFree maps to a public address.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""LeRobot remote training platform"""
import os, time, subprocess, threading, shutil, re
from flask import Flask, request, jsonify, send_file, render_template_string
from werkzeug.utils import secure_filename

app = Flask(__name__)

# Working folders, relative to the directory the service is started from
UPLOAD_FOLDER = './upload_temp'
DOWNLOAD_FOLDER = './output'
DOWNLOAD_TEMP_FOLDER = './download_temp'
MAX_CONTENT_LENGTH = 10*1024*1024*1024  # 10 GiB upload limit

app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.config['DOWNLOAD_FOLDER'] = DOWNLOAD_FOLDER
app.config['MAX_CONTENT_LENGTH'] = MAX_CONTENT_LENGTH

def ensure_directories():
    """Create the upload, output, and download folders if they are missing."""
    for folder in [UPLOAD_FOLDER, DOWNLOAD_FOLDER, DOWNLOAD_TEMP_FOLDER]:
        # exist_ok=True makes a separate os.path.exists() check unnecessary
        os.makedirs(folder, exist_ok=True)

# ... (Flask routes for /list_datasets, /start_training, /download_model) ...
Upload the collected dataset
In the UI click “Upload”, select the hidden folder ~/.cache/huggingface/lerobot/.../so100_test (press Ctrl+H to show hidden files). Upload speed is about 3 MiB/s; the data is stored on the instance under /root/data/upload_temp.
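With the roughly 3 MiB/s upload speed mentioned above, large datasets take a while. The sketch below estimates the wait and checks the size against the service's 10 GiB MAX_CONTENT_LENGTH; the function names are hypothetical and the speed is only the approximate figure from this guide.

```python
MIB = 1024 * 1024
MAX_CONTENT_LENGTH = 10 * 1024 * MIB  # 10 GiB, same limit as the Flask config


def upload_minutes(size_bytes: int, speed_mib_per_s: float = 3.0) -> float:
    """Estimated upload time in minutes at a constant transfer speed."""
    return size_bytes / (speed_mib_per_s * MIB) / 60


def fits_upload_limit(size_bytes: int, limit: int = MAX_CONTENT_LENGTH) -> bool:
    """True if a single upload request of this size stays within the limit."""
    return size_bytes <= limit


if __name__ == "__main__":
    one_gib = 1024 * MIB
    print(f"1 GiB takes about {upload_minutes(one_gib):.1f} minutes")
    print("fits limit:", fits_upload_limit(one_gib))
```

Since the GPU is not needed while uploading, this is a step where the "no-GPU" mode can keep costs down.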
Start training
Select the uploaded dataset, choose an algorithm (e.g., ACT), optionally adjust parameters, and click “Start Training”. Real‑time logs are displayed. When training finishes the model is zipped and placed in /root/data/download_temp. Enable “auto‑shutdown” to terminate the instance automatically.
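The training step boils down to running a subprocess, collecting its output for the log view, and optionally powering off afterwards. The following is a minimal sketch of that pattern, not the service's actual code: run_and_log and shutdown_instance are hypothetical names, and the use of `shutdown -h now` is an assumption (GPUFree may expose its own shutdown mechanism).

```python
import subprocess
import sys
import threading


def run_and_log(cmd, log_lines, on_finish=None):
    """Run cmd, append each output line to log_lines, then call on_finish(returncode)."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:  # streams output line by line while the process runs
        log_lines.append(line.rstrip("\n"))
    returncode = proc.wait()
    if on_finish is not None:
        on_finish(returncode)
    return returncode


def shutdown_instance(returncode):
    """Assumption: a plain OS shutdown run as root is enough to stop billing."""
    subprocess.run(["shutdown", "-h", "now"], check=False)


if __name__ == "__main__":
    logs = []
    # Stand-in command; a real run would pass the training command instead.
    demo_cmd = [sys.executable, "-c", "print('step 1'); print('done')"]
    worker = threading.Thread(target=run_and_log, args=(demo_cmd, logs))
    worker.start()
    worker.join()
    print(logs)
```

To enable auto-shutdown in this sketch, pass on_finish=shutdown_instance. Running the worker in a background thread keeps the web UI responsive while training is in progress.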
Download the model
The right‑hand side of the UI lists the zipped model file. Click it to download, unzip it locally, and pass the unzipped directory to --policy.path when running the policy on the robot, for example:
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
--robot.id=my_awesome_follower_arm \
--display_data=false \
--dataset.repo_id=${HF_USER}/eval_so100 \
--dataset.single_task="Put lego brick into the transparent box" \
--policy.path={your_unzipped_directory}
Key implementation details
The Flask service defines the following routes (simplified):
/list_datasets (GET) – returns a JSON list of uploaded datasets with validation.
/start_training (POST) – receives JSON with dataset, algorithm, and optional parameters, launches the training subprocess, and stores process handles and logs.
/download_model (GET) – packages the latest output directory into a zip archive and returns the file.
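The logic behind the first two routes can be sketched as plain functions that a Flask handler would call. This is an illustration under assumptions, not the original implementation: the function names, the JSON keys ("dataset", "algorithm", "parameters"), and the validation rule (a folder counts as a dataset if it contains meta/info.json) are all hypothetical, and the lerobot-train flags shown should be checked against the installed LeRobot version.

```python
from pathlib import Path


def list_datasets(upload_root):
    """Return names of uploaded folders that look like LeRobot datasets."""
    root = Path(upload_root)
    if not root.is_dir():
        return []
    return sorted(
        d.name
        for d in root.iterdir()
        # Assumed validation rule: the dataset ships a meta/info.json file
        if d.is_dir() and (d / "meta" / "info.json").is_file()
    )


def build_train_command(payload, upload_root="./upload_temp", output_root="./output"):
    """Turn the /start_training JSON payload into an argument list."""
    dataset = payload.get("dataset")
    algorithm = payload.get("algorithm")
    if not dataset or not algorithm:
        raise ValueError("'dataset' and 'algorithm' are required")
    cmd = [
        "lerobot-train",
        f"--dataset.repo_id=local/{dataset}",  # placeholder repo id for a local dataset
        f"--dataset.root={Path(upload_root) / dataset}",
        f"--policy.type={algorithm}",
        f"--output_dir={Path(output_root) / (algorithm + '_' + dataset)}",
    ]
    # Optional overrides are passed through as --key=value flags
    for key, value in (payload.get("parameters") or {}).items():
        cmd.append(f"--{key}={value}")
    return cmd
```

Building the command as a list rather than a shell string avoids quoting problems when it is handed to subprocess. In a real service the dataset name should also be sanitized (for example with secure_filename) before it is joined to a path.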
Utility functions ensure_directories, extract_output_dir, and pack_model handle directory creation, parsing of the training command for the --output_dir argument, and creation of a timestamped zip archive.
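The two less obvious helpers might look roughly like the following. This is a hedged sketch that reuses the names extract_output_dir and pack_model from the description above; the signatures, the regular expression, and the archive naming scheme are assumptions rather than the original code.

```python
import re
import shutil
import time
from pathlib import Path


def extract_output_dir(command: str):
    """Return the value of --output_dir in a training command, or None."""
    match = re.search(r"--output_dir[=\s]+(\S+)", command)
    return match.group(1) if match else None


def pack_model(output_dir, dest_dir="./download_temp"):
    """Zip output_dir into dest_dir with a timestamped name; return the zip path."""
    stamp = time.strftime("%Y%m%d_%H%M%S")
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    base_name = Path(dest_dir) / f"{Path(output_dir).name}_{stamp}"
    # make_archive appends ".zip" and stores paths relative to output_dir
    return shutil.make_archive(str(base_name), "zip", root_dir=output_dir)
```

For example, extract_output_dir("lerobot-train --policy.type=act --output_dir=outputs/train/act_so100") yields "outputs/train/act_so100", which pack_model can then archive. On the local machine the downloaded file can be unpacked with any zip tool or with Python's shutil.unpack_archive.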
Summary
Using a GPUFree RTX 4090 instance reduces training time dramatically compared with local CPU training. The web UI and the ~500‑line Flask service allow dataset upload, training launch, automatic shutdown, and model download without manual Linux configuration. The entire workflow, from preparing the dataset to running lerobot-record with the downloaded policy, fits within a few hundred lines of Python and, at roughly ¥1.38 per hour, a few yuan of cloud cost.
ShiZhen AI
Tech blogger with over 10 years of experience at leading tech firms, AI efficiency and delivery expert focusing on AI productivity. Covers tech gadgets, AI-driven efficiency, and leisure— AI leisure community. 🛰 szzdzhp001
