Llama 3 Unveiled: 8B & 70B Models Set New SOTA Across Benchmarks

Meta has announced the open‑source Llama 3 series (8B and 70B parameters), detailing its decoder‑only Transformer architecture, its 15‑trillion‑token multilingual training corpus, benchmark wins over comparably sized competitors, a limited 8K context window, and upcoming cloud and web‑based deployments.

NewBeeNLP

Meta has officially released Llama 3, offering two parameter sizes—8 billion and 70 billion—both as open‑source, instruction‑tuned models. The announcement highlights that the 8B version outperforms Gemma 7B and Mistral 7B Instruct on benchmarks such as MMLU, GPQA, and HumanEval, while the 70B model surpasses Claude 3 Sonnet and Google Gemini Pro 1.5.

Achieving SOTA but Limited to an 8K Context Window

The models use a classic decoder‑only Transformer architecture with a 128K token vocabulary. Training consumed 15 trillion tokens drawn from publicly available sources, with 5% non‑English data covering over 30 languages. Compared with Llama 2, Llama 3’s data volume is seven times larger, and its code portion is four times larger.

To improve inference efficiency, Meta incorporated Grouped‑Query Attention (GQA) and trained on packed sequences of up to 8,192 tokens, applying an attention mask so that self‑attention never crosses document boundaries. Across a wide range of tasks, including general knowledge (MMLU), graduate‑level reasoning (GPQA), coding (HumanEval), and mathematics (GSM‑8K, MATH), both Llama 3 variants achieve new state‑of‑the‑art results, often beating models of similar scale.
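The two ideas mentioned above can be sketched together: in GQA, several query heads share one key/value head (shrinking the KV cache at inference time), and the mask restricts each token to earlier tokens within its own document. A toy NumPy illustration (head counts and shapes are made up for the example, not Llama 3's actual dimensions):

```python
import numpy as np

def gqa_attention(q, k, v, doc_ids):
    """Toy grouped-query attention over one packed sequence.

    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Several query heads share each key/value head, which is what
    shrinks the KV cache. doc_ids marks which document each token
    belongs to, so attention never crosses a document boundary.
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads          # query heads per KV head

    # allowed[i, j] = True when token i may attend to token j:
    # j is not in the future, and j lies in the same document as i.
    causal = np.tril(np.ones((seq, seq), dtype=bool))
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    allowed = causal & same_doc

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                      # shared KV head for this query head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        scores = np.where(allowed, scores, -np.inf)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)   # softmax over allowed positions
        out[h] = w @ v[kv]
    return out
```

With `n_kv_heads == n_q_heads` this degenerates to standard multi‑head attention; with `n_kv_heads == 1` it becomes multi‑query attention, so GQA sits between the two.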

Beyond standard benchmarks, Meta evaluated Llama 3 on a custom 1,800‑item test set covering 12 real‑world use cases such as code generation, reasoning, writing, and summarization. The models outperformed Llama 2, Claude 3 Sonnet, Mistral Medium, and even GPT‑3.5 on these tasks, and showed strong performance on higher‑order datasets like AGIEval, BIG‑Bench, and ARC‑Challenge.

The primary limitation noted is the 8K context window, which lags behind newer models offering tens or hundreds of thousands of tokens. Nonetheless, experts such as Matt Shumer remain optimistic that the open‑source community will quickly extend the window length.

Llama Gets an Official Web Interface

Both the base and instruction versions of Llama 3 are now downloadable from Hugging Face, and cloud providers—including Microsoft Azure, Google Cloud, Amazon AWS, and NVIDIA NIM—plan to host the models. Hardware support from Intel, NVIDIA, AMD, and Qualcomm is also announced.
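For readers pulling the instruct checkpoints from Hugging Face, a minimal sketch of the single‑turn prompt format those checkpoints expect (based on the special tokens published in the model card; in practice, `tokenizer.apply_chat_template` in the `transformers` library assembles this for you):

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt using Llama 3's chat special tokens.

    The instruct checkpoints delimit each turn with header tokens and
    end it with <|eot_id|>; generation continues after the final
    assistant header.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```

Getting this template wrong is a common source of degraded output when running open checkpoints locally, which is why delegating to the tokenizer's built‑in chat template is usually the safer route.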

Meta released a web‑based interface named “Meta AI” that provides conversational and image‑generation capabilities. The chat feature works without registration, while the drawing tool requires a login. The web app currently runs simple Python snippets (text output only) and does not yet support Chinese input or file uploads.

One More Thing

Hours before Meta’s official announcement, Microsoft Azure briefly listed the Llama 3 8B Instruct model, and Replicate’s pricing page for Llama 3 appeared before being taken down. Both premature listings were quickly pulled, and the community can now experiment freely with the officially released open‑source models.

Source: 量子位 (QbitAI)

References:

https://ai.meta.com/blog/meta-llama-3/

https://about.fb.com/news/2024/04/meta-ai-assistant-built-with-llama-3/

https://huggingface.co/meta-llama/Meta-Llama-3-70B

Tags: Open Source, Large Language Model, Benchmark, Meta AI, Llama 3