What Do the Latest AIIA FactTesting Benchmarks Reveal About China’s Large Language Models?

At its 14th plenary meeting in Nanjing, the AIIA released the Q1 2025 results of its FactTesting benchmark, which has evaluated more than 200 large models to date. Baidu’s Wenxin 4.5 and Wenxin X1 led in basic and reasoning capabilities respectively, and the alliance outlined an expanded multimodal and agent testing roadmap for the year.

Baidu Geek Talk

The China Artificial Intelligence Industry Development Alliance (AIIA) continuously tracks large‑model and intelligent‑agent advancements. Since 2024 it has built the “FactTesting” benchmark, completing six monitoring rounds and testing more than 200 open‑source and closed‑source models. In 2025 the scope expands to multimodal understanding, text‑to‑image, text‑to‑video, and early autonomous‑agent evaluation.

Q1 2025 Benchmark Release

On 9 April 2025, at the 14th AIIA plenary meeting in Nanjing, the Q1 2025 FactTesting results were announced. Wei Kai, head of the overall group, presented the findings.

Basic Capability Rankings

Wenxin 4.5 from Baidu topped the basic‑capability scores.

Reasoning Capability Rankings

Wenxin X1 from Baidu achieved the highest reasoning scores.

Model Details

Wenxin 4.5 is Baidu’s next‑generation native multimodal foundation model. By jointly modeling multiple modalities it delivers strong multimodal comprehension, improved language abilities, reduced hallucinations, and enhanced logical reasoning and code generation.

Wenxin X1 offers stronger understanding, planning, reflection, and evolution capabilities and supports multimodal input. Baidu describes it as the first deep‑thinking model able to use tools autonomously. It excels in Chinese knowledge Q&A, literary creation, document writing, everyday dialogue, logical reasoning, complex calculations, and tool invocation.

Both models are freely available on the Wenxin Yiyan website (https://yiyan.baidu.com).

Future Outlook

2025 is positioned as a year of comprehensive iteration for large‑model technology. Baidu plans to increase investments in AI, data centers, and cloud infrastructure to build the next generation of smarter models.

