Tagged articles

9 articles

Page 1 of 1

May 11, 2026 · Artificial Intelligence

How Anthropic Identified the Root Cause of AI Self‑Preservation Misalignment and Cut Its Occurrence to Zero

Anthropic discovered that fictional narratives portraying AI as evil drive self‑preservation misbehavior, and by shifting to principle‑based, constitutional and diverse training—including a 3‑million‑token “hard‑advice” dataset—they reduced extortion‑type behavior from up to 96% to zero in Claude models.

AI AlignmentAnthropicClaude

0 likes · 6 min read

How Anthropic Identified the Root Cause of AI Self‑Preservation Misalignment and Cut Its Occurrence to Zero

Data Party THU

Oct 28, 2025 · Artificial Intelligence

Can Low‑Quality Data Cause Irreversible ‘Brain Rot’ in Large Language Models?

Researchers from Texas A&M and UT Austin demonstrate that prolonged pre‑training on low‑quality, short‑form web content causes large language models to suffer irreversible cognitive decline—manifested as attention loss, broken reasoning chains, and personality distortion—highlighting data quality as a critical training‑time safety issue.

Artificial IntelligenceCognitive SafetyData Quality

0 likes · 7 min read

Can Low‑Quality Data Cause Irreversible ‘Brain Rot’ in Large Language Models?

AI2ML AI to Machine Learning

Oct 20, 2025 · Artificial Intelligence

nanochat Source Code Deep Dive: Data Prep, Model Design, Training & Evaluation

This article revisits nanochat's core components, detailing the preparation of diverse training datasets, the scaling calculations for tokens and parameters, the model's MQA and KV‑cache design, the full training pipeline with gradient accumulation and mixed‑precision, cost breakdown, inference optimizations, evaluation tasks, and identified limitations with suggested improvements.

KV cacheLLMMQA

0 likes · 9 min read

nanochat Source Code Deep Dive: Data Prep, Model Design, Training & Evaluation

AI Frontier Lectures

Jul 11, 2025 · Artificial Intelligence

How Llama Evolved: From Llama‑1 to Llama‑3 – Architecture, Data, and Performance Insights

This article provides a comprehensive technical analysis of Meta's Llama series, tracing the evolution from Llama‑1 through Llama‑2 to Llama‑3, detailing model architectures, training data pipelines, optimization methods, benchmark results, and the broader impact on the open‑source AI community.

AI researchLLaMALarge Language Models

0 likes · 25 min read

How Llama Evolved: From Llama‑1 to Llama‑3 – Architecture, Data, and Performance Insights

DataFunSummit

Feb 25, 2025 · Artificial Intelligence

Collecting High-Quality LLM Training Data and Custom Model Training Guide

This article explains what constitutes high‑quality LLM training data, why large datasets are essential, outlines the step‑by‑step process for collecting, preprocessing, and fine‑tuning models, and highlights the best data sources—including web content, books, code repositories, and news—while noting available free datasets.

LLMWeb Scrapingai

0 likes · 9 min read

Collecting High-Quality LLM Training Data and Custom Model Training Guide

Xiaohongshu Tech REDtech

Jul 29, 2024 · Artificial Intelligence

Scaling Laws for Dense Retrieval: Empirical Study of Model Size, Training Data, and Annotation Quality

The award‑winning study shows that dense retrieval performance follows precise power‑law scaling with model size, training data quantity, and annotation quality, introduces contrast entropy for evaluation, validates joint scaling formulas on MS MARCO and T2Ranking, and uses cost models to guide budget‑optimal resource allocation.

Model Sizeannotation qualitycontrast entropy

0 likes · 13 min read

Scaling Laws for Dense Retrieval: Empirical Study of Model Size, Training Data, and Annotation Quality

Sohu Tech Products

Apr 24, 2024 · Artificial Intelligence

Evolution, Architecture, Training Data, Methods, and Performance of Meta's Llama Series (Llama 1, 2, 3)

Meta's Llama series has progressed from the 7‑65B Llama‑1 in early 2023 to the 8B and 70B Llama‑3 in 2024, scaling token counts from 1 T to over 15 T, adopting decoder‑only Transformers with RMSNorm, SwiGLU, RoPE and GQA, and adding supervised fine‑tuning, RLHF and DPO, resulting in state‑of‑the‑art benchmark performance and a vibrant open‑source ecosystem.

LLaMALarge Language ModelsModel architecture

0 likes · 25 min read

Evolution, Architecture, Training Data, Methods, and Performance of Meta's Llama Series (Llama 1, 2, 3)

Top Architect

Apr 12, 2023 · Artificial Intelligence

Data‑Centric AI Perspective on GPT Models: Training, Inference, and Maintenance

This article examines how large language models such as GPT‑1 through GPT‑4 succeed largely due to high‑quality, large‑scale training data, and explains the Data‑centric AI framework—training data development, inference data development, and data maintenance—while discussing prompt engineering, data‑driven improvements, and future trends in AI.

Data‑Centric AIGPTLarge Language Models

0 likes · 19 min read

Data‑Centric AI Perspective on GPT Models: Training, Inference, and Maintenance

ITPUB

Dec 13, 2021 · Artificial Intelligence

How Data Augmentation Boosts Machine Learning When Data Is Scarce

This article explains how data augmentation can alleviate overfitting by artificially expanding limited training sets, outlines common transformation techniques for images, text, and audio, and discusses the method's benefits, practical applications, and inherent limitations for machine‑learning practitioners.

Computer VisionDeep Learningdata augmentation

0 likes · 6 min read

How Data Augmentation Boosts Machine Learning When Data Is Scarce