Llama 2: Open Foundation and Fine‑Tuned Chat Models – Ghost Attention, RLHF Results, and Safety Evaluation
This article summarizes the Llama 2 series: the Ghost Attention technique for keeping system‑message instructions in force across multi‑turn dialogs, RLHF and human evaluation results, safety work spanning pre‑training data analysis and benchmark assessments, and model release details.
Llama 2 is an openly released family of large language models and fine‑tuned chat variants; the article provides links to the original arXiv paper and code repositories.
Ghost Attention (GAtt) is presented as a simple training trick inspired by context distillation that keeps the model attentive to system‑message instructions throughout multi‑turn conversations, preventing the RLHF model from forgetting initial directives after a few turns.
The authors describe how synthetic system‑message data are constructed by merging user‑assistant dialogues with varied constraints (e.g., hobbies, languages, personas) and how the training loss is zeroed on earlier‑turn tokens during fine‑tuning, so the model learns only from the latest reply while still conditioning on the instruction.
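The data construction and loss masking described above can be sketched as follows. This is a minimal, hypothetical illustration, not the Llama 2 codebase: the tokenizer, function name, and dialogue format are all assumptions made for clarity.

```python
# Hypothetical sketch of GAtt-style data construction and loss masking.
# ToyTokenizer and build_gatt_example are illustrative names, not from
# the actual Llama 2 implementation.

class ToyTokenizer:
    """Whitespace tokenizer standing in for a real BPE tokenizer."""
    def encode(self, text):
        return text.split()

def build_gatt_example(system_msg, turns, tokenizer):
    """Return (tokens, loss_mask) for one multi-turn dialogue.

    `turns` is a list of (user, assistant) string pairs. The system
    message is kept only in the first user turn (it was prepended to
    every turn when sampling the synthetic data), and the training
    loss is zeroed on all tokens except the final assistant reply.
    """
    tokens, loss_mask = [], []
    for i, (user, assistant) in enumerate(turns):
        # Drop the system message from every turn after the first.
        prompt = f"{system_msg} {user}" if i == 0 else user
        u = tokenizer.encode(prompt)
        a = tokenizer.encode(assistant)
        last_turn = i == len(turns) - 1
        tokens += u + a
        # Loss only on the final assistant reply; 0 masks a token out.
        loss_mask += [0] * len(u) + [1 if last_turn else 0] * len(a)
    return tokens, loss_mask
```

In this sketch, earlier assistant replies still appear in the context (so the model sees a consistent conversation) but contribute no gradient, which matches the paper's idea of keeping the instruction active without retraining on stale turns.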
RLHF Results show that GAtt enables dialogs to stay consistent for over 20 turns before reaching the maximum context length, and quantitative analyses indicate improved attention activation on system messages compared with baseline models.
Model‑based evaluation uses reward models to select the best RLHF checkpoint, while human evaluation with over 4,000 prompts compares Llama 2‑Chat against other open‑source models and closed‑source baselines (ChatGPT, PaLM). Results show that Llama 2‑Chat outperforms most open models and approaches proprietary systems in helpfulness and safety.
The safety section details pre‑training data analysis, bias and toxicity measurements (TruthfulQA, ToxiGen, BOLD), and language distribution, highlighting that the dataset is primarily English with minimal harmful content.
Benchmark evaluations of truthfulness, toxicity, and bias are reported, showing that Llama 2 improves truthfulness over Llama 1 but exhibits mixed results on toxicity and bias, partly due to less aggressive data filtering.
In conclusion, the authors summarize the architectural and optimization choices (RoPE positional embeddings, RMSNorm, SwiGLU activations, the AdamW optimizer), training scale (up to 2 trillion tokens, context length 4096), and safety‑aligned fine‑tuning methods, emphasizing responsible release and future work.
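Of the components listed above, RMSNorm is the simplest to illustrate: unlike LayerNorm, it skips mean subtraction and the bias term, normalizing only by the root mean square. A simplified NumPy sketch (not the actual Llama 2 implementation; the function name is illustrative):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMS normalization: scale x to unit root-mean-square along the
    last axis (no mean subtraction, no bias), then apply a learned
    per-dimension gain `weight`. `eps` guards against division by zero."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```

Dropping the mean-centering step makes the operation cheaper than LayerNorm while, per the original RMSNorm paper, preserving training stability at scale.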
The article also lists URLs for various Llama 2 model checkpoints (7B, 13B, 70B, chat variants) and defines key terminology such as Red Teaming, PPO, RMSNorm, Ghost Attention, and others.