Meituan Technology Team
Aug 28, 2025 · Artificial Intelligence
How Meeseeks Redefines LLM Instruction-Following Evaluation
Meeseeks, a new benchmark released by Meituan’s M17 team, systematically evaluates large language models’ instruction‑following ability with a three‑tier framework, multi‑round self‑correction, and extensive real‑world data, revealing performance gaps among models such as OpenAI o‑series, Claude, DeepSeek and Qwen2.5.
AI · LLM evaluation · Meeseeks
13 min read
