How X‑R1’s New Open‑Source 0.5B/1.5B/3B Models Enable LoRA and Chinese Inference

The X‑R1 release introduces fully open‑source 0.5B, 1.5B and 3B models with one‑click training scripts, LoRA fine‑tuning support, Chinese inference capabilities, detailed reward‑curve visualizations, and quick‑start instructions for both CUDA and Ascend platforms.


Release Highlights

On February 16, 2025 X‑R1 added LoRA support; on February 15 a Chinese training dataset was released; on February 13 the X‑R1‑3B model became publicly available with Colab inference; on February 12 the X‑R1‑1.5B configuration, Weights & Biases logs, and model files were released.

Fully Open‑Source Model Sizes

The X‑R1 project now provides standard training scripts for the 0.5B, 1.5B and 3B variants via the R1‑Zero script, which can be launched with a single command:

bash ./scripts/run_x_r1_zero.sh

All training data are publicly available, and a Colab notebook is provided for quick testing.

Training Without SFT

On four 3090 (24 GB) GPUs, GRPO trained the 0.5B/1.5B/3B models without any supervised fine‑tuning, and the resulting models show strong inference and format‑following abilities, as the 3B model's reward curve demonstrates.
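As a rough illustration of what GRPO optimizes in this no‑SFT setup, the sketch below shows a format reward and an accuracy reward in the R1‑Zero style; the <think>/<answer> tag names and function signatures are assumptions for illustration, not code from the X‑R1 repository.

# Hypothetical R1-Zero-style reward functions; GRPO scores sampled completions
# with rewards like these instead of learning from supervised labels.
import re

def format_reward(completion: str) -> float:
    # 1.0 if the completion follows the assumed <think>...</think><answer>...</answer> format
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    # 1.0 if the text inside <answer> matches the reference answer exactly
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0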

Chinese Inference Adaptation

The X‑R1‑3B‑CN checkpoint is released, showing that the 3B model can learn to reason over Chinese math problems with high accuracy. Example inference output follows the expected format and produces correct answers.

ACCELERATE_LOG_LEVEL=info \
accelerate launch \
  --config_file recipes/zero3.yaml \
  --num_processes=3 \
  src/x_r1/grpo.py \
  --config recipes/examples/mathcn_zero_3B_config.yaml \
  > ./output/mathcn_3B_sampling.log 2>&1
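For a quick look at the checkpoint's Chinese reasoning output, a plain Hugging Face transformers call is enough; the sketch below is a minimal example, and the local model path and sample question are placeholders rather than official values.

# Minimal inference sketch with transformers; model_path is a placeholder
# for wherever the X-R1-3B-CN weights were downloaded.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/X-R1-3B-CN"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

prompt = "小明有3个苹果，又买了5个，一共有几个苹果？"  # sample Chinese math question
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))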

Reward Curve

Training on four 3090 GPUs for roughly 16 hours with 7,500 Chinese math examples produced the reward curve shown below.

LoRA Fine‑Tuning

LoRA training is now supported on a single 3090 (24 GB) for 7B models. A minimal example command is:

ACCELERATE_LOG_LEVEL=info \
accelerate launch \
  --config_file recipes/zero1.yaml \
  --num_processes=1 \
  src/x_r1/grpo.py \
  --config recipes/X_R1_zero_0dot5B_peft_config.yaml \
  > ./output/x_r1_test_sampling.log 2>&1

In the *.yaml configuration file, add LoRA parameters such as:

lora_r: 32
lora_target_modules: ["q_proj","v_proj","k_proj","embed_tokens"]
lora_alpha: 8
lora_dropout: 0.0
bias: "none"
use_peft: true
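For reference, these YAML fields correspond roughly to a peft LoraConfig; how X‑R1's training code actually builds the config is an assumption here, but the field‑to‑argument mapping would look like this:

# Approximate mapping of the YAML LoRA fields onto a peft LoraConfig
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                                                           # lora_r
    lora_alpha=8,                                                   # lora_alpha
    lora_dropout=0.0,                                               # lora_dropout
    bias="none",                                                    # bias
    target_modules=["q_proj", "v_proj", "k_proj", "embed_tokens"],  # lora_target_modules
    task_type="CAUSAL_LM",
)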

Quick Start

With CUDA ≥ 12.4 available, the environment, including flash‑attn, can be set up in minutes:

conda create -n xr1 python=3.11
conda activate xr1
pip install -r requirements.txt
pip install flash-attn
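A quick sanity check that the CUDA build of PyTorch and flash‑attn are importable (assuming torch is pulled in by requirements.txt) can save a failed training run:

# Optional post-install check: report versions and CUDA availability
import torch
import flash_attn

print("torch:", torch.__version__, "cuda available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)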

The model also runs successfully on Huawei Ascend 910B.

Future Plans

X‑R1 will support LLM‑As‑a‑Judge evaluation.

Vertical domain adaptations (e.g., medical) will be provided via the R1‑Zero framework.

Standard benchmark results on MATH500 and AIMO will be added.

Open‑Source Repository

Repository: https://github.com/dhcode-cpp/X-R1