Meta AI VP Responds to Llama 4 Controversies and Allegations of Benchmark Manipulation
Meta AI Vice President Ahmad Al‑Dahle addressed recent criticisms of the newly released Llama 4 model, denying claims of training on benchmark test sets, attributing quality variations to ongoing post‑release optimization, and facing internal concerns that have reportedly led to staff resignations and calls for transparency.
In the early hours of today, Meta AI department Vice President Ahmad Al‑Dahle posted a response to the recent controversy surrounding the release of the Llama 4 large model.
Regarding the issue of "inconsistent model quality across different services," Al‑Dahle explained that because the model was shipped as soon as it was ready, Meta expects public deployments to need several days of tuning and adjustment, and the team will continue fixing bugs.
Concerning the accusation that Llama 4 "cheated by training on the test set," Al‑Dahle called it baseless, stating the team would never do such a thing and that the quality differences users see stem from deployments that have not yet been stabilized.
Llama 4 was officially launched on April 6, touted as a native multimodal mixture‑of‑experts (MoE) model family whose largest member has roughly 2 trillion total parameters, with Meta claiming benchmark wins over DeepSeek V3. However, user feedback has been overwhelmingly negative, with many questioning its real performance.
According to posts on the 1Point3Acres (一亩三分地) forum, despite repeated training runs, Llama 4's internal results fell well short of open‑source state of the art. Meta had reportedly set an internal release deadline of end of April, and leadership allegedly suggested mixing benchmark test sets into the post‑training data to produce superficially acceptable results.
"Mixing benchmark test sets during post‑training" refers to blending benchmark evaluation data into the model's post‑training corpus. Because the model is then exposed to the very items it will later be scored on, critics regard the practice as test‑set contamination: it can inflate benchmark numbers without improving the model's real capability.
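One common way researchers screen for this kind of contamination is to check n‑gram overlap between the training corpus and the benchmark's test items. The sketch below is a minimal, generic illustration of that idea; the function names and example texts are hypothetical, not part of any Meta or benchmark tooling.

```python
# Minimal sketch of benchmark-contamination screening via word-level
# n-gram overlap. All names and example data here are illustrative.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word-level n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(train_docs, test_docs, n: int = 8) -> float:
    """Fraction of test documents sharing at least one n-gram
    with the training corpus."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    if not test_docs:
        return 0.0
    hits = sum(1 for doc in test_docs if ngrams(doc, n) & train_grams)
    return hits / len(test_docs)

# Toy example: one of two "test" items shares a 5-gram with training data.
train = ["the quick brown fox jumps over the lazy dog today"]
test = ["a quick brown fox jumps over the lazy dog again",
        "completely unrelated sentence here"]
rate = contamination_rate(train, test, n=5)  # 0.5
```

Real contamination audits are more involved (normalization, fuzzy matching, per‑example reporting), but the core signal is the same: unusually high overlap between training data and test items suggests the benchmark no longer measures generalization.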
The original poster, who claimed to have worked on the model, declared they could not accept Meta's approach, said they had submitted their resignation, and demanded that their name be removed from the Llama 4 technical report. They also alleged that a Meta AI vice president resigned for the same reason, and noted that Joelle Pineau, head of Meta's AI research, has announced she will leave the company, with May 30 as her last day.
A Meta employee named Licheng Yu, whose research focuses on computer vision and natural language processing and who has published at top conferences, replied in the comments: the team has taken the defect feedback on board and hopes to improve the next version, but he denied any overfitting to test sets and pushed back on the cheating claim.
Licheng Yu (虞立成) is a research scientist manager at Meta who was involved in the release of the Llama 3.2 multimodal models (11B and 90B) and led the text‑plus‑image reinforcement learning stage of Llama 4, covering the 17B‑active‑parameter × 128‑expert (Maverick) and 17B × 16‑expert (Scout) configurations.
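The "17B × 128" notation describes an MoE layout: 17B parameters are active per token, while the 128 experts make the total parameter count much larger. A rough way to see the relationship is the sketch below; the shared/expert split and all numbers are hypothetical illustrations, not official Llama 4 figures.

```python
# Illustrative MoE parameter accounting. The shared/expert sizes below
# are made-up numbers for demonstration, not real Llama 4 values.

def moe_params(shared_b: float, expert_b: float,
               num_experts: int, top_k: int):
    """Return (total, active-per-token) parameter counts in billions.

    shared_b : parameters every token uses (attention, embeddings, ...)
    expert_b : parameters in one feed-forward expert
    top_k    : experts routed per token
    """
    total = shared_b + num_experts * expert_b
    active = shared_b + top_k * expert_b
    return total, active

# Hypothetical 17B-active config: 16 experts, one expert routed per token.
total, active = moe_params(shared_b=12.0, expert_b=5.0,
                           num_experts=16, top_k=1)
# total = 92.0 B, active = 17.0 B
```

The point of the arithmetic: a model advertised by its active parameter count (17B) can have a far larger total footprint, and adding experts (128 instead of 16) grows the total without changing the per‑token compute.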
Source: APPSO