Tagged articles
2 articles
Page 1 of 1
Data Party THU
Data Party THU
Aug 11, 2025 · Artificial Intelligence

What Sets the Latest LLMs Apart? A Deep Dive into V3, OLMo, Gemma, Mistral, Llama 4 and More

This article systematically compares the architectures of recent large language models—including DeepSeek V3/R1, OLMo 2, Gemma 3, Mistral Small 3.1, Llama 4, Qwen 3, SmolLM 3 and Kimi 2—highlighting innovations such as MLA, MoE, post‑norm, sliding‑window attention, NoPE and optimizer choices, with diagrams and code examples to illustrate their impact on efficiency and performance.

ComparisonLLMMLA
0 likes · 12 min read
What Sets the Latest LLMs Apart? A Deep Dive into V3, OLMo, Gemma, Mistral, Llama 4 and More
DevOps
DevOps
Apr 7, 2025 · Artificial Intelligence

Meta Llama 4 Scout, Maverick, and Behemoth: Architecture, NoPE Innovation, and Training Advances

The article introduces Meta's newly open‑sourced Llama 4 series—including Scout with a 1 billion‑token context window, Maverick with 400 billion parameters, and the upcoming Behemoth teacher model—detailing their expert‑mix architecture, the NoPE positional‑encoding removal, training pipelines, performance benchmarks, and infrastructure improvements for large‑scale AI research.

AI researchContext WindowLlama 4
0 likes · 8 min read
Meta Llama 4 Scout, Maverick, and Behemoth: Architecture, NoPE Innovation, and Training Advances