AntAngelMed: 6.1B‑Activated MoE Model Tops Three Medical Benchmarks
AntAngelMed, a 100 B‑parameter Mixture‑of‑Experts (MoE) medical LLM that activates only 6.1 B parameters at inference, matches the performance of a 40 B dense model, exceeds 200 tokens/s inference speed, and ranks first on HealthBench, MedAIBench, and MedBench, backed by a three‑stage training pipeline and extensive efficiency optimizations.
Ant Group Health, in collaboration with the Zhejiang Provincial Health Information Center and Zhejiang AnZhenEr Medical AI Technology Co., has released AntAngelMed, billed as the largest and most capable open‑source medical language model.
The model builds on Ling‑flash‑2.0 and adopts a Mixture‑of‑Experts (MoE) architecture with 100 B total parameters, but only 6.1 B are activated at inference, delivering performance roughly equivalent to a 40 B dense model while achieving inference speeds over 200 tokens/s.
Across the three leading medical benchmark suites—HealthBench, MedAIBench, and MedBench—AntAngelMed consistently ranks first, surpassing all open‑source models and many top closed‑source alternatives. It shows especially strong gains on the challenging HealthBench‑Hard subset and dominates the 36‑dataset, ~700 k‑sample MedBench collection.
The training pipeline consists of three stages. Continuous pre‑training on a large, high‑quality medical corpus (encyclopedias, web text, academic publications) injects deep domain and world knowledge. Supervised fine‑tuning (SFT) uses a multi‑source instruction dataset that blends general tasks (math, coding, logic) with medical scenarios (patient‑doctor Q&A, diagnostic reasoning, safety/ethics) to sharpen clinical performance. Finally, reinforcement learning (RL) with the GRPO algorithm and a task‑specific reward model refines behavior, emphasizing empathy, clear structure, safety boundaries, and evidence‑based reasoning to reduce hallucinations.
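The GRPO stage above replaces a learned value network with group‑relative scoring: several responses to the same prompt are sampled, and each response's reward is normalized against the mean and standard deviation of its own group. A minimal sketch of that advantage computation (the reward values below are illustrative, not from the article):

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: each sampled response is scored
    against the mean/std of its own group, so no value network is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one prompt, scored by a reward model
# (scores are made up for illustration).
advs = grpo_advantages([0.9, 0.4, 0.4, 0.1])
```

Responses scoring above their group's mean get positive advantages and are reinforced; below-mean responses are penalized, which is how the reward model's preferences for empathy, structure, and evidence-based reasoning propagate into the policy.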
AntAngelMed inherits Ling‑flash‑2.0’s advanced design and applies the Ling Scaling Laws together with a 1/32‑activation MoE layout. Core components (expert granularity, shared‑expert ratio, attention balance, sigmoid routing without an auxiliary loss, the MTP layer, QK‑Norm, and Partial‑RoPE) are fully optimized, yielding up to a 7× efficiency boost over an equally sized dense model. With only 6.1 B active parameters, it matches the capability of a 40 B dense model.
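Sigmoid routing without an auxiliary loss, mentioned above, scores each expert independently in (0, 1) rather than softmax‑normalizing across experts. A minimal sketch of top‑k selection under that scheme (the expert count, logits, and renormalization step are illustrative assumptions, not AntAngelMed's actual router):

```python
import math

def route_token(logits, top_k=2):
    """Sigmoid router: each expert gets an independent score in (0, 1);
    the top-k experts process the token, and their renormalized scores
    become the mixing weights. No auxiliary load-balancing loss is added."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]  # (expert_id, weight)

# Router logits for one token over four experts (toy values).
picks = route_token([-1.0, 2.0, 0.5, 1.5], top_k=2)
```

Because sigmoid scores do not compete through a softmax, dropping the auxiliary loss avoids the gradient interference it introduces while still allowing balanced expert usage.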
Inference has been further accelerated with FP8 quantization combined with EAGLE3 speculative‑decoding optimizations. Under 32‑way concurrency, throughput surpasses the FP8‑only baseline. On H20 hardware the model runs three times faster than a 36 B dense counterpart, and with YaRN extrapolation it supports a 128 K context length, delivering up to a 7× relative speedup for longer outputs. Benchmarks on HumanEval, GSM8K, and Math‑500 show speed improvements of 71 %, 45 %, and up to 94 % respectively, balancing speed and stability.
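Per‑tensor FP8 quantization of the kind described above rescales a tensor so its largest magnitude maps to the E4M3 representable maximum (±448) before rounding to the low‑precision grid. A simplified sketch (integer rounding stands in for the real 3‑bit mantissa; per‑channel scales and the actual E4M3 bit encoding are omitted):

```python
def fp8_e4m3_quantize(values, fp8_max=448.0):
    """Per-tensor FP8 quantization sketch: pick a scale so the largest
    magnitude maps to the E4M3 maximum (448), quantize, then dequantize
    to see the precision loss."""
    amax = max(abs(v) for v in values) or 1.0
    scale = fp8_max / amax
    # Real FP8 rounds to a 3-bit-mantissa float; integer rounding after
    # scaling is a crude stand-in for that precision loss.
    quant = [round(v * scale) for v in values]
    dequant = [q / scale for q in quant]
    return quant, dequant, scale

# Toy weight values; the largest maps exactly to 448 after scaling.
q, d, s = fp8_e4m3_quantize([0.1, -2.0, 3.5])
```

The halved memory traffic relative to FP16 is what raises throughput, and speculative decoding (EAGLE3) stacks on top by letting a small draft model propose several tokens that the full model verifies in one pass.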
Developers and small‑to‑mid‑size enterprises in the medical AI field can now access the model for free. The trial demo, model weights, and source code are available at:
https://modelscope.cn/studios/MedAIBase/AntAngelMed
https://modelscope.cn/models/MedAIBase/AntAngelMed
https://github.com/MedAIBase/AntAngelMed