
What Do the Latest AIIA FactTesting Benchmarks Reveal About China’s Large Language Models?

At the AIIA’s 14th plenary meeting in Nanjing, the FactTesting benchmark released its Q1 2025 results, evaluating over 200 large models and highlighting Baidu’s Wenxin 4.5 and Wenxin X1 as leaders in basic and reasoning capabilities, while outlining the expanded multimodal and agent testing roadmap for the year.

Baidu Geek Talk

Apr 16, 2025

The China Artificial Intelligence Industry Development Alliance (AIIA) continuously tracks large‑model and intelligent‑agent advancements. Since 2024 it has built the “FactTesting” benchmark, completing six monitoring rounds and testing more than 200 open‑source and closed‑source models. In 2025 the scope expands to multimodal understanding, text‑to‑image, text‑to‑video, and early autonomous‑agent evaluation.

Q1 2025 Benchmark Release

On 9 April 2025, at the 14th AIIA plenary meeting in Nanjing, the Q1 2025 FactTesting results were announced. Wei Kai, head of the overall group, presented the findings.

Basic Capability Rankings

Wenxin 4.5 from Baidu topped the basic‑capability scores.

Reasoning Capability Rankings

Wenxin X1 from Baidu achieved the highest reasoning scores.

Model Details

Wenxin 4.5 is Baidu’s next‑generation native multimodal foundation model. By jointly modeling multiple modalities it delivers strong multimodal comprehension, improved language abilities, reduced hallucinations, and enhanced logical reasoning and code generation.

Wenxin X1 offers stronger understanding, planning, reflection, and evolution capabilities and supports multimodal input. Baidu describes it as the first deep‑thinking model to use tools autonomously. It excels in Chinese knowledge Q&A, literary creation, document writing, everyday dialogue, logical reasoning, complex calculations, and tool invocation.

Both models are freely available on the Wenxin Yiyan website (https://yiyan.baidu.com).

Future Outlook

2025 is positioned as a year of comprehensive iteration for large‑model technology. Baidu plans to increase investments in AI, data centers, and cloud infrastructure to build the next generation of smarter models.

Related Reading

New PaddlePaddle 3.0 framework release: accelerating large‑model innovation.

Wenxin X1 now open to enterprise users.

Paper on Baidu’s ad recommendation system in the large‑model era.

DeepSeek‑VL2 multimodal model algorithm analysis.

Case study of a rapid‑growth app attracting 20 k users on launch day.

