What’s New in Qwen2? A Deep Dive into the Latest Open‑Source LLMs
Qwen2 introduces five new pre‑trained and instruction‑tuned LLM sizes, expands multilingual training to 27 languages, boosts code and math abilities, supports up to 128K context tokens, and achieves leading benchmark results across NLU, code, math, and safety, with detailed model specs and evaluation data provided.
Introduction
Qwen2 is the next major upgrade of the Qwen series, offering five model sizes ranging from 0.5B to 72B parameters. The models are released both as pre‑trained checkpoints and instruction‑tuned variants, and are publicly available on Hugging Face and ModelScope.
Model Specifications
The five sizes are:
Qwen2‑0.5B (0.49B total parameters, 0.35B non‑embedding)
Qwen2‑1.5B (1.54B total, 1.31B non‑embedding)
Qwen2‑7B (7.07B total, 5.98B non‑embedding)
Qwen2‑57B‑A14B (57.41B total, 56.32B non‑embedding)
Qwen2‑72B (72.71B total, 70.21B non‑embedding)
All five models use grouped‑query attention (GQA). Tied input/output embeddings are enabled for the two smallest models and disabled for the larger ones. Context length defaults to 32K tokens for pre‑trained models, with instruction‑tuned variants supporting up to 128K tokens (Qwen2‑7B‑Instruct and Qwen2‑72B‑Instruct reach 128K via YARN or Dual Chunk Attention).
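For readers who want to verify these specifications themselves, the sketch below pulls the public configs from Hugging Face and prints the relevant fields. The Qwen/Qwen2-* repository names and the transformers config attributes are assumed from the public model hub rather than quoted from the announcement.

```python
from transformers import AutoConfig

# Inspect the released configs (repo names assumed from Hugging Face).
for name in ["Qwen/Qwen2-0.5B", "Qwen/Qwen2-1.5B", "Qwen/Qwen2-7B"]:
    cfg = AutoConfig.from_pretrained(name)
    print(
        name,
        "| GQA:", cfg.num_key_value_heads < cfg.num_attention_heads,
        "| tied embeddings:", cfg.tie_word_embeddings,
        "| max positions:", cfg.max_position_embeddings,
    )
```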
Multilingual Training
Beyond Chinese and English, Qwen2 adds high‑quality training data for 27 additional languages spanning Western Europe, Eastern Europe, the Middle East, East Asia, Southeast Asia, and South Asia. Training also reduces the probability of unintended code‑switching, improving language consistency in multilingual prompts.
Evaluation
Compared with the previous Qwen1.5 series, Qwen2‑72B shows substantial gains across a wide range of benchmarks, surpassing leading open‑source models such as Llama‑3‑70B as well as Qwen1.5‑110B, the largest model of the previous generation. The evaluation covers natural language understanding, knowledge, code, mathematics, and multilingual capabilities.
Instruction‑tuned models were further refined with large‑scale supervised fine‑tuning, reward‑model training, and online DPO, using automated methods to obtain high‑quality instruction data (e.g., rejection sampling for math, code execution feedback, back‑translation for creative writing, scalable oversight for role‑play). These steps improve code, math, reasoning, instruction following, and alignment with human values while minimizing manual labeling.
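To make the rejection‑sampling step concrete, here is a toy sketch of how such a filter might work for math data: sample several candidate solutions per prompt and keep only those whose final answer matches the reference. The sample_fn callable and the extract_final_answer parser are hypothetical stand‑ins for illustration, not part of the actual Qwen2 pipeline.

```python
def extract_final_answer(solution: str) -> str:
    # Hypothetical parser: take whatever follows the last "Answer:" tag.
    return solution.rsplit("Answer:", 1)[-1].strip()

def rejection_sample_math(prompt: str, reference: str, sample_fn, n: int = 16) -> list[str]:
    """Keep only sampled solutions whose final answer matches the
    reference answer; survivors become supervised fine-tuning examples."""
    kept = []
    for _ in range(n):
        solution = sample_fn(prompt)  # sample_fn: hypothetical model call
        if extract_final_answer(solution) == reference:
            kept.append(solution)
    return kept
```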
Safety testing on four categories of multilingual harmful queries (illegal activity, fraud, pornography, privacy violence) shows Qwen2‑72B‑Instruct matching GPT‑4 and significantly outperforming Mixtral‑8x22B, with the gap reported as statistically significant under a p‑value analysis.
Highlights
Code & Mathematics
By integrating lessons from CodeQwen1.5, Qwen2 achieves notable improvements in programming language tasks and mathematical problem solving, especially for the 72B‑Instruct variant.
Long‑Context Processing
All Instruct models are trained on 32K context and extended to longer contexts (up to 128K) using YARN or Dual Chunk Attention. Needle‑in‑a‑Haystack experiments show flawless information extraction at 128K tokens for Qwen2‑72B‑Instruct, while the smaller Instruct models handle 64K or 32K tokens depending on size.
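The Qwen2 model cards on Hugging Face describe enabling the longer window by adding a YaRN rope_scaling entry to config.json; the snippet below applies the same idea programmatically. The exact keys and the 4.0 factor (stretching the 32K training window toward 128K) follow my reading of those cards, so treat this as a sketch under those assumptions rather than the canonical setup.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Static YaRN scaling: factor 4.0 stretches the 32K training window
# to roughly 128K positions (keys mirror the model-card config.json edit).
config = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct", config=config, torch_dtype="auto", device_map="auto"
)
```

One caveat worth repeating from the cards: static YaRN rescales positions at every input length, which can degrade quality on short texts, so it is advisable to enable it only when long contexts are actually needed, typically under an inference engine such as vLLM.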
Safety
Extensive multilingual safety benchmarks reveal Qwen2‑72B‑Instruct’s competitive performance against top commercial models.
Usage and Availability
The models are open‑source under Apache 2.0 (all sizes except Qwen2‑72B, which remains under the Qianwen License). Users can access model cards for detailed usage instructions, feature lists, and metrics.
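As a taste of what those model cards cover, the following is the standard transformers chat workflow applied to an instruction‑tuned checkpoint. It assumes the public Qwen/Qwen2-7B-Instruct repository and enough GPU memory for a 7B model; consult the cards themselves for the authoritative instructions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what is new in Qwen2."},
]
# Render the chat template, generate, then strip the prompt tokens.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
))
```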
Future Directions
The team plans to train larger models, explore scaling laws, and extend Qwen2 to multimodal capabilities, incorporating vision and speech understanding in upcoming releases.
Citation
@article{qwen2,
title={Qwen2 Technical Report},
year={2024}
}