Tag

benchmark cheating


DataFunTalk
Apr 7, 2025 · Artificial Intelligence

Llama 4 Open‑Source Release Marred by Performance Failures and Alleged Training‑Data Cheating

Meta's newly released Llama 4 quickly became controversial: alleged internal leaks describe training‑data cheating and benchmark over‑optimization, and its code‑generation performance reportedly fails to match even older models, prompting resignations and widespread criticism from the AI community.

AI model performance · Llama 4 · Meta AI
Java Tech Enthusiast
Feb 22, 2025 · Artificial Intelligence

Grok‑3 Evaluation Controversy and Community Reactions

Three days after Grok‑3’s launch, OpenAI staff accused xAI of inflating the model’s benchmark scores by reporting “cons@64” results, which take the consensus of 64 sampled answers per question, a practice critics say unfairly skews comparisons with single‑shot scores for models like o3‑mini. Meanwhile, developers have already begun experimenting with the model in simple games.
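
To illustrate why a consensus score and a single-shot score are not directly comparable, here is a minimal sketch that assumes cons@64 means a majority vote over 64 sampled answers. The model is a hypothetical stand-in (`sample_answer`, with an assumed 40% chance of a correct answer and four equally likely wrong answers); none of these names or numbers come from the article or from any real benchmark.

```python
import random
from collections import Counter


def cons_at_k(answers):
    """Majority vote: return the most frequent answer among the k samples."""
    return Counter(answers).most_common(1)[0][0]


def sample_answer(rng, p_correct):
    """Hypothetical model: correct with probability p_correct,
    otherwise one of four wrong answers chosen uniformly."""
    if rng.random() < p_correct:
        return "right"
    return rng.choice(["wrong_a", "wrong_b", "wrong_c", "wrong_d"])


def compare(p_correct=0.4, k=64, trials=200, seed=0):
    """Score the same simulated model two ways over `trials` questions."""
    rng = random.Random(seed)
    single_hits = 0
    cons_hits = 0
    for _ in range(trials):
        answers = [sample_answer(rng, p_correct) for _ in range(k)]
        # Single-shot: only the first sampled answer counts.
        single_hits += answers[0] == "right"
        # cons@k: the majority answer across all k samples counts.
        cons_hits += cons_at_k(answers) == "right"
    return single_hits / trials, cons_hits / trials


single_acc, cons_acc = compare()
print(f"single-shot accuracy: {single_acc:.2f}")
print(f"cons@64 accuracy:     {cons_acc:.2f}")
```

Under these assumptions the same simulated model scores far higher with majority voting than with one attempt, which is why critics object to placing a cons@64 number next to a single-shot number.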

AI · Grok 3 · OpenAI