AntAngelMed: 6.1B‑Activated MoE Model Tops Three Medical Benchmarks
AntAngelMed, a 100 B‑parameter Mixture‑of‑Experts (MoE) medical LLM that activates only 6.1 B parameters at inference, matches the performance of a 40 B dense model, exceeds 200 tokens/s inference speed, and ranks first on HealthBench, MedAIBench, and MedBench, backed by a three‑stage training pipeline and extensive efficiency optimizations.
Ant Group Health, in collaboration with the Zhejiang Provincial Health Information Center and Zhejiang AnZhenEr Medical AI Technology Co., has released AntAngelMed, billed as the largest and most capable open‑source medical language model.
The model builds on Ling‑flash‑2.0 and adopts a Mixture‑of‑Experts (MoE) architecture with 100 B total parameters, but only 6.1 B are activated at inference, delivering performance roughly equivalent to a 40 B dense model while achieving inference speeds over 200 tokens/s.
Across the three leading medical benchmark suites—HealthBench, MedAIBench, and MedBench—AntAngelMed consistently ranks first, surpassing all open‑source models and many top closed‑source alternatives. It shows especially strong gains on the challenging HealthBench‑Hard subset and dominates the 36‑dataset, ~700 k‑sample MedBench collection.
The training pipeline consists of three stages. Continuous pre‑training on a large, high‑quality medical corpus (encyclopedias, web text, academic publications) injects deep domain and world knowledge. Supervised fine‑tuning (SFT) uses a multi‑source instruction dataset that blends general tasks (math, coding, logic) with medical scenarios (patient‑doctor Q&A, diagnostic reasoning, safety/ethics) to sharpen clinical performance. Finally, reinforcement learning (RL) with the GRPO algorithm and a task‑specific reward model refines behavior, emphasizing empathy, clear structure, safety boundaries, and evidence‑based reasoning to reduce hallucinations.
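The GRPO stage above replaces a learned value network with group‑relative scoring: several responses to the same prompt are sampled, and each response's reward is normalized against the mean and standard deviation of its own group. A minimal sketch of that advantage computation (the reward values below are illustrative, not from the article):

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: each sampled response is scored
    against the mean/std of its own group, so no value network is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one prompt, scored by a reward model
# (scores are made up for illustration).
advs = grpo_advantages([0.9, 0.4, 0.4, 0.1])
```

Responses scoring above their group's mean get positive advantages and are reinforced; below-mean responses are penalized, which is how the reward model's preferences for empathy, structure, and evidence-based reasoning propagate into the policy.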
AntAngelMed inherits Ling‑flash‑2.0’s advanced design and applies the Ling Scaling Laws together with a 1/32‑activation MoE layout. Core components (expert granularity, shared‑expert ratio, attention balance, sigmoid routing without an auxiliary loss, the MTP layer, QK‑Norm, and Partial‑RoPE) are fully optimized, yielding up to a 7× efficiency boost over an equally sized dense model. With only 6.1 B active parameters, it matches the capability of a 40 B dense model.
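Sigmoid routing without an auxiliary loss, mentioned above, scores each expert independently in (0, 1) rather than softmax‑normalizing across experts. A minimal sketch of top‑k selection under that scheme (the expert count, logits, and renormalization step are illustrative assumptions, not AntAngelMed's actual router):

```python
import math

def route_token(logits, top_k=2):
    """Sigmoid router: each expert gets an independent score in (0, 1);
    the top-k experts process the token, and their renormalized scores
    become the mixing weights. No auxiliary load-balancing loss is added."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]  # (expert_id, weight)

# Router logits for one token over four experts (toy values).
picks = route_token([-1.0, 2.0, 0.5, 1.5], top_k=2)
```

Because sigmoid scores do not compete through a softmax, dropping the auxiliary loss avoids the gradient interference it introduces while still allowing balanced expert usage.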
Inference has been further accelerated with FP8 quantization combined with EAGLE3 speculative‑decoding optimizations. Under 32‑way concurrency, throughput surpasses the FP8‑only baseline. On H20 hardware the model runs three times faster than a 36 B dense counterpart, and with YaRN extrapolation it supports a 128 K context length, delivering up to a 7× relative speedup for longer outputs. Benchmarks on HumanEval, GSM8K, and Math‑500 show speed improvements of 71 %, 45 %, and up to 94 % respectively, balancing speed and stability.
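Per‑tensor FP8 quantization of the kind described above rescales a tensor so its largest magnitude maps to the E4M3 representable maximum (±448) before rounding to the low‑precision grid. A simplified sketch (integer rounding stands in for the real 3‑bit mantissa; per‑channel scales and the actual E4M3 bit encoding are omitted):

```python
def fp8_e4m3_quantize(values, fp8_max=448.0):
    """Per-tensor FP8 quantization sketch: pick a scale so the largest
    magnitude maps to the E4M3 maximum (448), quantize, then dequantize
    to see the precision loss."""
    amax = max(abs(v) for v in values) or 1.0
    scale = fp8_max / amax
    # Real FP8 rounds to a 3-bit-mantissa float; integer rounding after
    # scaling is a crude stand-in for that precision loss.
    quant = [round(v * scale) for v in values]
    dequant = [q / scale for q in quant]
    return quant, dequant, scale

# Toy weight values; the largest maps exactly to 448 after scaling.
q, d, s = fp8_e4m3_quantize([0.1, -2.0, 3.5])
```

The halved memory traffic relative to FP16 is what raises throughput, and speculative decoding (EAGLE3) stacks on top by letting a small draft model propose several tokens that the full model verifies in one pass.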
Developers and small‑to‑mid‑size enterprises in the medical AI field can now access the model for free. The trial demo, model weights, and source code are available at:
https://modelscope.cn/studios/MedAIBase/AntAngelMed
https://modelscope.cn/models/MedAIBase/AntAngelMed
https://github.com/MedAIBase/AntAngelMed