Tagged articles

LLaMA

15 articles · Page 1 of 1

Apr 9, 2026 · Industry Insights

Meta Unveils First ‘Super‑Intelligent’ Model – Implications for Open‑Source AI and the Talent War

Meta’s debut of a ‘super‑intelligent’ large model, led by Scale AI founder Alexandr Wang, signals a strategic shift toward open‑source AI development and intensifies the competition for top talent, reshaping the industry’s roadmap toward AGI.

AGILLaMAMeta

0 likes · 5 min read

Meta Unveils First ‘Super‑Intelligent’ Model – Implications for Open‑Source AI and the Talent War

DataFunTalk

Jul 20, 2025 · Artificial Intelligence

Why Meta’s AI Pioneer Yang Li‑kun Is Being Marginalized: Power Struggles Behind the Scenes

The article examines how Meta’s CEO Mark Zuckerberg’s aggressive talent‑buying and commercial focus have sidelined Turing‑award winner Yang Li‑kun, detailing the restructuring of Meta’s AI labs, the clash over research directions, and the broader dilemma of balancing academic innovation with business imperatives in the AI industry.

AI industryAI researchArtificial Intelligence

0 likes · 14 min read

Why Meta’s AI Pioneer Yang Li‑kun Is Being Marginalized: Power Struggles Behind the Scenes

AI Frontier Lectures

Jul 11, 2025 · Artificial Intelligence

How Llama Evolved: From Llama‑1 to Llama‑3 – Architecture, Data, and Performance Insights

This article provides a comprehensive technical analysis of Meta's Llama series, tracing the evolution from Llama‑1 through Llama‑2 to Llama‑3, detailing model architectures, training data pipelines, optimization methods, benchmark results, and the broader impact on the open‑source AI community.

AI researchLLaMALarge Language Models

0 likes · 25 min read

How Llama Evolved: From Llama‑1 to Llama‑3 – Architecture, Data, and Performance Insights

Sohu Tech Products

Jun 18, 2025 · Artificial Intelligence

Master LLaMA Factory Fine‑Tuning: Key Parameter Settings & Memory Optimization

This tutorial walks through LLaMA‑Factory fine‑tuning by explaining how to choose learning rate, epochs, batch size, cutoff length, LoRA rank, and validation split, and shows how to estimate and reduce GPU memory usage with techniques like gradient accumulation, liger_kernel, and DeepSpeed.

AIDeepSpeedLLaMA

0 likes · 25 min read

Master LLaMA Factory Fine‑Tuning: Key Parameter Settings & Memory Optimization

Code Mala Tang

Mar 31, 2025 · Artificial Intelligence

Unlocking LLM Power: A Hands‑On Guide to Function Calling with Mistral, Llama, and Qwen

This tutorial explains how large language models can use function calling to access real‑time data, walks through setting up a Flask endpoint, demonstrates integration with Mistral Small, Llama 3.2‑1B, and Qwen models, and provides complete Python code examples for end‑to‑end execution.

APIFunction CallingLLM

0 likes · 10 min read

Unlocking LLM Power: A Hands‑On Guide to Function Calling with Mistral, Llama, and Qwen

DataFunTalk

Jul 26, 2024 · Artificial Intelligence

Llama 3: Open‑source Large Language Model Technical Report and Evaluation

This comprehensive technical report details the development, architecture, training methodology, extensive benchmark evaluations, safety measures, and inference optimizations of Meta's open‑source Llama 3 large language model series, covering models up to 405 billion parameters and supporting multilingual, multimodal, and tool‑use capabilities.

AILLaMALarge Language Model

0 likes · 115 min read

Llama 3: Open‑source Large Language Model Technical Report and Evaluation

Baobao Algorithm Notes

May 21, 2024 · Artificial Intelligence

How to Pre‑train a 20M‑Parameter LLaMA‑3 Mini Model with Hugging Face Trainer

This step‑by‑step guide shows how to use Hugging Face's Trainer API to pre‑train an ultra‑small LLaMA‑3 model (under 20 M parameters) on the TinyStories dataset, covering model configuration, tokenizer setup, data preprocessing, collators, training arguments, and inference results.

Hugging FaceLLaMALanguage Model

0 likes · 27 min read

How to Pre‑train a 20M‑Parameter LLaMA‑3 Mini Model with Hugging Face Trainer

Sohu Tech Products

Apr 24, 2024 · Artificial Intelligence

Evolution, Architecture, Training Data, Methods, and Performance of Meta's Llama Series (Llama 1, 2, 3)

Meta's Llama series has progressed from the 7‑65B Llama‑1 in early 2023 to the 8B and 70B Llama‑3 in 2024, scaling token counts from 1 T to over 15 T, adopting decoder‑only Transformers with RMSNorm, SwiGLU, RoPE and GQA, and adding supervised fine‑tuning, RLHF and DPO, resulting in state‑of‑the‑art benchmark performance and a vibrant open‑source ecosystem.

AILLaMALarge Language Models

0 likes · 25 min read

Evolution, Architecture, Training Data, Methods, and Performance of Meta's Llama Series (Llama 1, 2, 3)

DeWu Technology

Mar 13, 2024 · Artificial Intelligence

Extending Context Length in LLaMA Models: Structures, Challenges, and Techniques

The article reviews LLaMA’s Transformer and RoPE architecture, explains why its context windows (4K‑128K tokens) are limited, and evaluates industry‑proven extension techniques—including linear, NTK‑aware, and YaRN interpolation plus LongLoRA sparse attention—while addressing memory and quadratic‑cost challenges and presenting a KubeAI workflow for fine‑tuning and deployment.

AILLaMALongLoRA

0 likes · 17 min read

Extending Context Length in LLaMA Models: Structures, Challenges, and Techniques

Huawei Cloud Developer Alliance

Dec 14, 2023 · Artificial Intelligence

Unlocking LLaMA: Key Innovations, Architecture Insights, and MindSpore Inference Guide

This article reviews the LLaMA large‑language‑model series, covering its background, architectural innovations such as Add&Norm, SwiGLU, and RoPE, a known reversal‑curse bug, and provides step‑by‑step MindSpore Transformers code for model configuration, inference, and pipeline usage while previewing the upcoming LLaMA‑2 session.

AILLaMALarge Language Models

0 likes · 6 min read

Unlocking LLaMA: Key Innovations, Architecture Insights, and MindSpore Inference Guide

Baobao Algorithm Notes

Oct 25, 2023 · Artificial Intelligence

How Mixed Data Shapes LLaMA SFT: Scaling Trends, Conflict Zones, and the DMT Remedy

This article investigates how mixing data from mathematical reasoning, code generation, and general instruction-following tasks influences supervised fine‑tuning of LLaMA models, revealing distinct scaling curves, resource‑dependent performance conflicts, and a two‑stage DMT strategy that mitigates catastrophic forgetting while boosting overall capability.

DMT StrategyData ScalingLLaMA

0 likes · 14 min read

How Mixed Data Shapes LLaMA SFT: Scaling Trends, Conflict Zones, and the DMT Remedy

Ant R&D Efficiency

Sep 25, 2023 · Artificial Intelligence

Running LLaMA 7B Model Locally on a Single Machine

This guide shows how to download, convert, 4‑bit quantize, and run Meta’s 7‑billion‑parameter LLaMA model on a single 16‑inch Apple laptop using Python, torch, and the llama.cpp repository, demonstrating that the quantized model fits in memory and generates responses quickly, with optional scaling to larger models.

7B modelAILLaMA

0 likes · 5 min read

Running LLaMA 7B Model Locally on a Single Machine

Alimama Tech

Sep 12, 2023 · Artificial Intelligence

Megatron-LLaMA: High-Performance Large Language Model Training Framework

Megatron-LLaMA is an open‑source high‑performance training framework for LLaMA models, offering tensor, pipeline, and sequence parallelism, an overlapped optimizer, and near‑linear scalability, achieving up to 176% speedup on 32 GPUs and robust performance even with limited network bandwidth.

DeepSpeedGPU OptimizationLLaMA

0 likes · 10 min read

Megatron-LLaMA: High-Performance Large Language Model Training Framework

DaTaobao Tech

Sep 11, 2023 · Artificial Intelligence

Large Language Model Upgrade Paths and Architecture Selection

This article analyzes upgrade paths of major LLMs—ChatGLM, LLaMA, Baichuan—detailing performance, context length, and architectural changes, then examines essential capabilities, data cleaning, tokenizer and attention design, and offers practical guidance for balanced scaling and efficient model construction.

BaichuanChatGLMData preprocessing

0 likes · 32 min read

Large Language Model Upgrade Paths and Architecture Selection

Network Intelligence Research Center (NIRC)

May 10, 2023 · Artificial Intelligence

How LLaMA Preprocesses Training Data with CCNet Before Model Training

Before training large language models like LLaMA, MetaAI applies a multi‑stage CCNet pipeline that crawls web data, stores it in WET format, deduplicates paragraphs, detects and filters languages using fastText, and further refines content by similarity to Wikipedia and citation‑based linear models.

CCNetData preprocessingDeduplication

0 likes · 7 min read

How LLaMA Preprocesses Training Data with CCNet Before Model Training