DeWu Technology
Mar 13, 2024 · Artificial Intelligence
Extending Context Length in LLaMA Models: Structures, Challenges, and Techniques
The article reviews LLaMA's Transformer architecture and RoPE positional encoding, explains why its context window (4K to 128K tokens, depending on the variant) is limited, and evaluates industry-proven extension techniques, including linear, NTK-aware, and YaRN interpolation as well as LongLoRA sparse attention, while addressing memory and quadratic attention-cost challenges and presenting a KubeAI workflow for fine-tuning and deployment.
AI · Context Extension · LLaMA
17 min read