Tag

AI benchmarking


DataFunTalk
Jun 9, 2025 · Artificial Intelligence

Can AI Models Pass the Chinese Math Gaokao? A Fair, Objective Test

The author conducts a transparent, objective assessment of several large language models on the 2025 Chinese national math exam, converting all questions to LaTeX, applying strict Gaokao scoring rules, and revealing each model's strengths and weaknesses across single‑choice, multiple‑choice, and fill‑in‑the‑blank items.

AI benchmarking · Gaokao · large language models
7 min read
DataFunSummit
May 4, 2023 · Artificial Intelligence

LLM Ranking Arena: Elo‑Based Competitive Evaluation of Open‑Source Chatbots

A recent study by the LMSYS organization introduces an Elo‑rated, 1v1 battle arena for large language models, ranking open‑source chatbots like Vicuna, Koala, and ChatGLM, while discussing the limitations of traditional benchmarks and the advantages of crowd‑sourced, scalable evaluation.
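For readers unfamiliar with Elo ratings, the pairwise update behind such a ranking can be sketched as follows. This is a minimal illustration of the standard Elo formula, not the study's actual code; the function names, the K-factor of 32, and the starting rating of 1000 are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one battle.

    score_a is 1.0 if A wins, 0.0 if A loses, and 0.5 for a tie.
    The K-factor (assumed 32 here) controls how far one result moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


# Hypothetical battle: two models start at 1000 and model A wins the vote.
model_a, model_b = update_elo(1000.0, 1000.0, score_a=1.0)
```

Because each update depends only on the two current ratings and the outcome, new models and new votes can be added incrementally, which is what makes a crowd-sourced arena scale.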

AI benchmarking · Chatbot Arena · Elo rating
7 min read