
How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models

dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens. The release includes the full models, intermediate checkpoints, and detailed training pipelines for the research community.

Xiaohongshu Tech REDtech

Overview

dots.llm1 is a large‑scale Mixture‑of‑Experts (MoE) language model released by the Humane Intelligence Lab (hi lab). It contains 142 billion total parameters, activates 14 billion per token, and after training on 11.2 T high‑quality tokens reaches performance comparable to Qwen2.5‑72B.

Model Details

Parameters: 142 B total, 14 B active.

MoE configuration: 128 routed experts with the top 6 activated per token ("6‑in‑128"), plus 2 always‑active shared experts.

Training data: 11.2 T tokens from Common Crawl and proprietary web crawl, filtered and de‑duplicated.

Training efficiency: Interleaved 1F1B pipeline with All‑to‑All overlap and optimized grouped GEMM, yielding ~14 % forward and ~6.7 % backward speed‑ups on H800 GPUs.
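The 6‑in‑128 routing above works as follows: for each token, a router scores all 128 routed experts, only the top 6 run, their outputs are combined with softmax‑normalized gate weights, and the 2 shared experts always contribute. A minimal NumPy sketch of this pattern — experts reduced to toy linear maps, all names and sizes illustrative rather than hi lab's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_routed, n_shared, top_k = 16, 128, 2, 6

# Toy "experts": each reduced to a single linear map for illustration.
routed_experts = rng.normal(size=(n_routed, d_model, d_model)) * 0.02
shared_experts = rng.normal(size=(n_shared, d_model, d_model)) * 0.02
router_w = rng.normal(size=(d_model, n_routed)) * 0.02

def moe_layer(x):
    """x: (d_model,) hidden state for one token."""
    logits = x @ router_w                     # score all 128 routed experts
    top = np.argsort(logits)[-top_k:]         # keep only the top 6
    gates = np.exp(logits[top])
    gates /= gates.sum()                      # softmax over the selected 6
    out = sum(g * (x @ routed_experts[i]) for g, i in zip(gates, top))
    for s in shared_experts:                  # shared experts always fire
        out = out + x @ s
    return out

y = moe_layer(rng.normal(size=d_model))
```

Because only 6 of the 128 routed experts (plus the 2 shared ones) execute per token, a 142 B‑parameter model does the compute of roughly a 14 B‑parameter dense model — exactly the total‑vs‑active split in the spec above.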

Training Procedure

The pre‑training uses a decoder‑only Transformer inspired by DeepSeek, with a WSD (warmup–stable–decay) learning‑rate schedule, batch sizes scaled from 64 M to 128 M tokens, and two fine‑tuning stages (base and instruct) that bring the model on par with Qwen2.5‑72B on multilingual, math, code, and alignment benchmarks.
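A WSD schedule differs from the usual cosine decay by holding the learning rate flat for most of training and only decaying near the end — which is part of what makes intermediate checkpoints convenient starting points for continued pre‑training. A minimal sketch, where all step fractions and rates are illustrative placeholders, not the paper's values:

```python
def wsd_lr(step, total_steps, peak_lr=3e-4,
           warmup_frac=0.01, decay_frac=0.1, final_lr=3e-5):
    """Warmup-Stable-Decay: linear warmup, long flat plateau,
    then a linear decay over the final fraction of training.
    Hyperparameters here are illustrative, not dots.llm1's."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup_steps:                       # warmup
        return peak_lr * step / max(warmup_steps, 1)
    if step < decay_start:                        # stable plateau
        return peak_lr
    frac = (step - decay_start) / max(total_steps - decay_start, 1)
    return peak_lr + (final_lr - peak_lr) * frac  # final decay
```

Checkpoints saved during the long plateau all share the same learning rate, so training can be resumed from any of them without re‑deriving where a cosine curve left off.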

Open‑Source Release

hi lab provides the final Instruct model, the base model, intermediate checkpoints every 1 T tokens, and detailed hyper‑parameters, enabling continued pre‑training, annealing, long‑document training, or supervised fine‑tuning. Model and code are hosted on Hugging Face and GitHub.

Resources

Model repository: https://huggingface.co/rednote-hilab; code is also available on GitHub.

Tags: Mixture of Experts, open-source, large language model, AI research, Training Efficiency
Written by

Xiaohongshu Tech REDtech

Official account of the Xiaohongshu tech team, sharing technical innovations and engineering insights.
