Tagged articles

scaling laws

69 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Jun 27, 2026 · Artificial Intelligence

Why We Should Be Cautious About Scaling Laws in Deep Learning

The article reviews the history, theory, and empirical findings of scaling laws for neural language models, compares the Kaplan and Chinchilla formulations, discusses data‑limited regimes and fitting subtleties, and highlights why careful interpretation and resource allocation are essential for reliable predictions.

Data Efficiency · Kaplan · Language Models

0 likes · 26 min read

21CTO

Jun 27, 2026 · Artificial Intelligence

Lilian Weng’s Deep Dive Overturns Three Years of Large‑Model Scaling Law Assumptions

In a ten‑thousand‑word analysis, former OpenAI safety VP Lilian Weng retraces the history of model scaling laws from Kaplan’s 2020 formulation, demonstrates how DeepMind’s Chinchilla overturns the original parameter‑to‑data ratio, uncovers two critical bugs in the Chinchilla paper, and warns that the impending 2026‑2028 data wall makes naïve scaling of parameters and compute unsustainable.

AI training · chinchilla · data wall

0 likes · 10 min read

PaperAgent

Jun 26, 2026 · Artificial Intelligence

Lilian Weng’s Deep Dive into Scaling Laws for Large‑Model Training

The article explains how scaling laws serve as a budget guide for training large language models, comparing Kaplan’s and Chinchilla’s findings, illustrating optimal parameter‑token trade‑offs, and highlighting the impact of data quality and duplication on model performance.

Compute Budget · Data Quality · Kaplan

0 likes · 9 min read

AI Architecture Hub

Jun 15, 2026 · Artificial Intelligence

Build Your Own LLM from Scratch: The 5 Essential Stages Behind GPT and Claude

This guide breaks down the complete workflow for building a large language model—from tokenization and pre‑training to data curation, scaling laws, alignment via RLHF/DPO, and robust evaluation—showing why architecture is less critical than data, scaling, and engineering.

AI Engineering · Data preprocessing · LLM training

0 likes · 12 min read

SuanNi

Jun 14, 2026 · Artificial Intelligence

How HRM-Text-1B Beats Scaling Laws with 0.1% Data and Hundreds‑Fold Compute Savings

HRM-Text-1B, a brain‑inspired hierarchical language model, achieves strong benchmark scores while using only 0.1% of the training tokens of comparable models, cutting compute costs by 96‑432× through a novel H/L module architecture, MagicNorm stabilization, and a focused instruction‑response training objective.

Efficient Pretraining · HRM-Text · Hierarchical Architecture

0 likes · 9 min read

Top Architect

Jun 13, 2026 · Artificial Intelligence

What Is an Inference Large Language Model? A Visual Guide

The article explains inference‑type large language models, how they differ from traditional models by breaking questions into reasoning steps, the shift from training‑time to test‑time compute, scaling‑law insights, validation techniques, proposal‑distribution tricks, and the detailed training pipeline of DeepSeek‑R1, while also discussing failed experiments and future directions.

DeepSeek-R1 · inference models · large language models

0 likes · 20 min read

Machine Learning Algorithms & Natural Language Processing

Jun 12, 2026 · Artificial Intelligence

The Next Frontier for Large‑Scale LLM Agents: 17 Must‑Read Papers on Self‑Evolving Harnesses

This article surveys 17 recent core papers that explore how the system‑level harness surrounding large‑model agents can be automatically generated, evolved, and audited, covering topics such as system boundaries, failure‑driven improvement, memory and skill optimization, source‑level rewriting, scaling laws, aging, and safety.

Agent Memory · Harness Engineering · LLM Agents

0 likes · 18 min read

Machine Heart

Jun 9, 2026 · Artificial Intelligence

Why Standard Vision‑Language Models + Scale Data Beat Specialized 3D Vision Designs (VLM³)

Meta’s VLM³ demonstrates that a plain vision‑language model, when trained on large‑scale data with simple camera‑focal‑length and pixel‑space normalization, matches or surpasses expert 3D vision models across monocular depth estimation, object‑level understanding, pixel‑matching and camera‑pose tasks, eliminating the need for task‑specific architectures, loss functions, data augmentations or regression formulations.

3D Vision · Depth Estimation · Meta

0 likes · 6 min read

PaperAgent

Jun 9, 2026 · Artificial Intelligence

Why Small Models Can Never Match Large Models, Even with Unlimited Data

The article analyzes scaling laws and synthetic experiments to show that, due to power‑law data distributions and interference, some tasks remain unreachable for small models even with infinite data, a finding confirmed on real LLMs such as OLMo.

interference · large language models · model capacity

0 likes · 10 min read

Alimama Tech

May 28, 2026 · Artificial Intelligence

13 KDD'26 Papers from Taobao: Scaling Laws, World Models and New AI Paradigms

The article highlights thirteen Taobao‑group papers accepted at KDD 2026, covering large‑model scaling laws, end‑to‑end generative recommendation, CTR prediction, interactive recommendation agents, LLM‑based pricing, robust auto‑bidding, two‑stage auctions, generative world models, multi‑attribution conversion, uplift modeling and long‑term causal estimation for e‑commerce systems.

CTR Prediction · KDD 2026 · Recommendation Systems

0 likes · 29 min read

Data Party THU

May 11, 2026 · Artificial Intelligence

How a 1930‑Era AI Model Without Any Computer Knowledge Learned to Write Python

The talkie‑1930‑13b language model, trained exclusively on English texts published before 1931, surprisingly understands historical events, solves Python coding problems, and exhibits scaling‑law behavior, prompting a detailed comparison with its modern twin talkie‑web‑13b and an analysis of training pipelines, memory categories, and common deployment pitfalls.

AI memory · LLM · Python code generation

0 likes · 10 min read

DataFunTalk

May 10, 2026 · Artificial Intelligence

How AI Is Powering One‑Person Billion‑Dollar Startups and Multi‑Agent Software Collaboration

In a Code with Claude interview, Anthropic co‑founders Dario and Daniela Amodei explain how exponential AI growth—evidenced by an 80× revenue surge—creates compute bottlenecks, drives a shift to multi‑agent collaboration, and forces product teams to rethink development through scaling laws and Amdahl's Law.

Amdahl's Law · Compute Bottleneck · Multi-Agent Systems

0 likes · 26 min read

Data Party THU

May 2, 2026 · Artificial Intelligence

Finally, Researchers Uncover Deep Learning’s “Newton’s Law”

A new collaborative paper from top universities proposes a unified “Learning Mechanics” framework for deep learning, outlining five research strands—from solvable idealized models and extreme limits to empirical scaling laws and hyper‑parameter theory—while drawing analogies to classical physics and highlighting ten open challenges.

deep learning · hyperparameter theory · learning mechanics

0 likes · 16 min read

Machine Heart

Apr 26, 2026 · Artificial Intelligence

Has Deep Learning Discovered Its Own “Newton’s Law”?

A new collaborative paper titled “There Will Be a Scientific Theory of Deep Learning” proposes a unified “Learning Mechanics” framework that connects solvable idealized models, tractable limits, empirical scaling laws, hyperparameter theory, and universal representation behavior, aiming to give deep learning a first‑principles scientific foundation.

deep learning · hyperparameters · learning mechanics

0 likes · 14 min read

Machine Learning Algorithms & Natural Language Processing

Apr 21, 2026 · Artificial Intelligence

How a 22‑Year‑Old Reverse‑Engineered Mythos into OpenMythos Using MoE and DeepSeek‑Inspired Attention

OpenMythos re‑creates the Claude Mythos architecture as a Recurrent‑Depth Transformer with MoE routing, achieving comparable performance to larger Transformers while using roughly half the parameters, and demonstrates systematic generalization and depth extrapolation through looped inference in latent space.

AI Architecture · Looped Language Models · Mixture of Experts

0 likes · 6 min read

Machine Heart

Apr 7, 2026 · Artificial Intelligence

How Qianxun Raised ¥3 B in 30 Days: AI‑Powered Robotics Secrets

Qianxun Intelligent secured ¥3 billion in funding within a month, leveraged a scaling‑law data engine and the Spirit v1.5 VLA model to achieve breakthrough robot performance, and demonstrated the commercial loop through deployments at JD.com retail and CATL battery lines.

Embodied AI · Qianxun Intelligent · data collection

0 likes · 12 min read

Lao Guo's Learning Space

Apr 2, 2026 · Artificial Intelligence

Large Model Pretraining and Fine‑Tuning: A 2026 Technical Guide from Scaling Laws to Post‑Training Revolution

This article explains the full lifecycle of large language models in 2026, covering pretraining fundamentals, the limits of classic Scaling Laws, data‑centric advances, fine‑tuning strategies, RLHF, DPO, and the emerging post‑training methods GRPO, DAPO and RLVR, with concrete benchmarks and cost analyses.

DAPO · DPO · GRPO

0 likes · 17 min read

DataFunSummit

Mar 29, 2026 · Artificial Intelligence

How Code Intelligence Is Evolving: From Foundation Models to Repository‑Level Agents

This article reviews the rapid evolution of code intelligence, covering the history of code foundation models, reinforcement‑learning optimizations, scaling‑law insights, the LoopCoder architecture, rigorous multi‑level evaluation suites, and the emergence of repository‑level code agents, while highlighting open‑source contributions such as Qwen‑Coder.

code evaluation · code-intelligence · reinforcement learning

0 likes · 15 min read

Alimama Tech

Mar 26, 2026 · Industry Insights

How Alibaba’s Large User Model (LUM) Boosted CTR by 4.5% and Scaled to Billions of Parameters

The article analyzes the evolution from traditional modular recommendation models to a generative Large User Model (LUM), detailing its three‑stage paradigm, tokenization, training objectives, scaling‑law findings, offline and online experiments, and the AI‑infra innovations that enabled a 4.5% CTR lift in production.

CTR Prediction · Recommendation Systems · generative modeling

0 likes · 18 min read

Machine Learning Algorithms & Natural Language Processing

Mar 24, 2026 · Artificial Intelligence

Jensen Huang Claims AGI Is Already Achieved, Ilya Is Wrong, Programmers to Reach 1 B

In a candid Lex Fridman interview, Nvidia CEO Jensen Huang asserts that AGI has already been realized, disputes Ilya Sutskever’s data‑limit claim, predicts a billion programmers, outlines scaling‑law dynamics, token‑priced AI services, data‑center energy strategies, and his hands‑on management philosophy for the AI era.

AGI · AI management · Data Centers

0 likes · 37 min read

SuanNi

Mar 4, 2026 · Artificial Intelligence

How to Fit Large Language Models into Cars and Robots: A Hardware‑Aware Scaling Law

This article presents a hardware‑aware co‑design framework for edge‑deployed large language models, revealing a scaling law that balances model accuracy and inference latency, and demonstrates how Pareto‑optimal architectures can be discovered quickly using roofline analysis and systematic search on devices like NVIDIA Jetson Orin.

AI inference · Pareto optimization · Roofline Model

0 likes · 15 min read

Machine Learning Algorithms & Natural Language Processing

Feb 23, 2026 · Artificial Intelligence

System Engineering Behind Billions of Parameters: Insider Training Details from Seven Top AI Labs

This article systematically dissects the engineering decisions behind frontier large‑language‑model training—covering architecture choices, attention variants, optimizer evolution, data‑curation strategies, scaling‑law insights, and post‑training SFT/RL pipelines—based on open‑source reports from seven leading AI laboratories.

Mixture of Experts · Model Training · large language models

0 likes · 26 min read

Top Architect

Feb 14, 2026 · Artificial Intelligence

Why Test‑Time Compute Is the Next Breakthrough for Large Language Models

The article explains how inference‑oriented large language models shift the focus from training‑time resources to test‑time computation, detailing scaling laws, verification techniques, reinforcement‑learning pipelines such as DeepSeek‑R1, and methods for distilling reasoning abilities into smaller, consumer‑grade models.

Prompt Engineering · inference compute · large language models

0 likes · 19 min read

AI Cyberspace

Jan 18, 2026 · Artificial Intelligence

Understanding Supervised, Unsupervised, Self‑Supervised, Semi‑Supervised, and Reinforcement Learning for Large Language Model Training

The article explains various learning paradigms (supervised, unsupervised, self‑supervised, semi‑supervised, and reinforcement), describes dataset types and quality considerations, outlines preprocessing steps like filtering, deduplication, and tokenization, and discusses scaling laws linking model size, data volume, and compute resources, with concrete examples and code.

Data preprocessing · Model Training · machine learning

0 likes · 26 min read

JD Tech

Jan 13, 2026 · Artificial Intelligence

Mastering Large Language Models: Transformers, Scaling Laws, and MoE Explained

This extensive guide walks readers through the fundamentals of large language models, covering transformer architecture, pre‑training and fine‑tuning techniques, scaling laws, emergent abilities, mixture‑of‑experts designs, and practical comparisons, providing clear explanations, code snippets, and visual illustrations for deep learning practitioners.

Mixture of Experts · emergent abilities · fine-tuning

0 likes · 47 min read

Architects Research Society

Jan 3, 2026 · Artificial Intelligence

2026: The Year AI Shifts from Scaling Hype to Practical, Small‑Model Innovation

The article forecasts that by 2026 AI will move away from sheer scale‑driven breakthroughs toward more usable, smaller models, world‑model learning, robust agents, and physical integration, emphasizing practical utility, augmentation of human work, and new job opportunities.

AI · Augmentation · physical AI

0 likes · 7 min read

AI2ML AI to Machine Learning

Dec 29, 2025 · Artificial Intelligence

How Brin’s Return Powers Google’s First ‘Sword’: The TPU Hardware Revolution

The article examines Google’s AI resurgence after Sergey Brin’s comeback, detailing the evolution of TPU hardware from v1 to v7, the strategic focus on algorithmic efficiency, comparisons with Nvidia’s B200, the role of JAX/XLA, and how these advances create a powerful competitive moat for Google’s AI infrastructure.

AI hardware · Google TPU · JAX

0 likes · 8 min read

PaperAgent

Dec 22, 2025 · Artificial Intelligence

Can Budget‑Aware Tool Use Unlock Scalable AI Agents? A Deep Dive

This article analyzes recent Google research on test‑time scaling and agentization, introducing budget‑aware tool use and the BATS framework, presenting experimental results across 180 configurations, uncovering scaling laws, and offering a predictive model for optimal multi‑agent architectures.

AI Agents · BATS framework · LLM Tool Use

0 likes · 7 min read

Tencent Technical Engineering

Dec 1, 2025 · Artificial Intelligence

Do Machines Really Think? Inside Deep Reasoning, Scaling Laws & RLHF for LLMs

This article examines whether large language models truly think, explores the origins of deep reasoning through transformer architectures and scaling laws, reviews chain‑of‑thought and its variants, and analyzes how reinforcement learning from human feedback—including PPO, DPO, and GRPO—helps internalise step‑by‑step reasoning while pointing to future directions such as atomic thought, hierarchical models, and training‑free in‑context knowledge bases.

AI alignment · Chain-of-Thought · LLM

0 likes · 35 min read

AntTech

Nov 11, 2025 · Artificial Intelligence

Breaking the Efficiency Wall: Ant Group’s Bailing Model Paves the Way to AGI

At CNCC 2025, Ant Group’s Vice President Zhou Jun outlined the Bailing large model’s five‑layer architecture, hybrid linear attention, Ling Scaling Law, and novel training algorithms that dramatically cut costs and latency, achieving state‑of‑the‑art performance on math and code benchmarks while promoting open‑source collaboration toward AGI.

AGI · Mixture of Experts · Multimodal AI

0 likes · 8 min read

Wu Shixiong's Large Model Academy

Oct 22, 2025 · Artificial Intelligence

Mastering LLM Training: A Step‑by‑Step Blueprint from Data to Alignment

This guide walks through the complete end‑to‑end process of training a large language model from scratch, covering data collection, cleaning, tokenization, pre‑training objectives and engineering, post‑training alignment methods, scaling laws, over‑fitting mitigation, and gradient‑stability techniques.

LLM · alignment · gradient stability

0 likes · 9 min read

Data Party THU

Oct 21, 2025 · Artificial Intelligence

Can Linear‑Time LSTMs Beat Transformers? Scaling Laws Reveal the Answer

The paper presents a systematic scaling‑law study of the linear‑time xLSTM architecture versus quadratic‑time Transformers, evaluating parameter‑data loss surfaces, optimal model size under equal FLOP budgets, and inference latency components, and shows that xLSTM consistently offers better cost‑effectiveness across diverse contexts and budgets.

Inference Optimization · Linear Time Complexity · Model Efficiency

0 likes · 11 min read

21CTO

Sep 29, 2025 · Artificial Intelligence

Why Open‑Source Is the Key to China’s AI Future, According to Li Kaifu

Li Kaifu argues that open‑source large‑model ecosystems are essential for China to close the AI gap with the United States, highlighting DeepSeek’s impact, shifting scaling laws, and the emerging role of AI‑to‑AI teaching as the next development frontier.

China AI · artificial-intelligence · large language models

0 likes · 4 min read

Architects' Tech Alliance

Sep 23, 2025 · Artificial Intelligence

Why AI Chips Need High‑Speed Networks: From Scaling Laws to DPU Evolution

This report analyzes how the convergence of Moore's law slowdown and large‑model scaling laws creates a feedback loop between compute power and intelligence, driving the emergence of AI‑specific chips, high‑speed networking, and DPU architectures that together reshape modern AI infrastructure.

AI chips · DPU · High-Speed Networking

0 likes · 25 min read

Fighter's World

Aug 15, 2025 · Artificial Intelligence

Why GPT‑5 Is Still Far From AGI Yet Near Scalable Profitability

The article analyzes GPT‑5’s release, its unified multi‑model architecture with a real‑time router, improved reasoning, coding and tool‑use capabilities, reduced hallucinations, and how these technical shifts reshape AI commercialization, investment logic, competition and enterprise adoption.

AI commercialization · Agentic AI · GPT-5

0 likes · 20 min read

Data Party THU

Aug 5, 2025 · Artificial Intelligence

Why State Space Models May Outperform Transformers: A Deep Dive

The article provides a comprehensive technical analysis of state space models (SSM) versus Transformers, covering their core mechanisms, three essential design factors, training efficiency, scaling behavior, tokenization debates, and experimental evidence that highlights the trade‑offs and potential advantages of SSMs in modern AI systems.

Mamba · State Space Model · Tokenization

0 likes · 21 min read

Data Thinking Notes

Jul 30, 2025 · Artificial Intelligence

Tracing the Evolution of Large Language Models: Key Papers and Breakthroughs

This article reviews the most influential papers in large language model research since 2017, covering foundational works such as the Transformer, GPT‑3, BERT, scaling laws, and recent innovations like FlashAttention, Mamba, and QLoRA, highlighting their core contributions and impact on AI development.

AI research · Model Optimization · Transformer

0 likes · 28 min read

Kuaishou Large Model

Jun 20, 2025 · Artificial Intelligence

How OneRec Revolutionizes Short-Video Recommendations with End-to-End Generative AI

OneRec, an end-to-end generative recommendation system from Kuaishou, uses an encoder-decoder architecture, reward-based preference alignment, and reinforcement learning to dramatically improve video recommendation efficiency, boosting user engagement and reducing operational costs while exhibiting scaling-law behavior comparable to that of large language models.

Efficiency · Generative AI · Kuaishou

0 likes · 18 min read

Kuaishou Tech

Jun 20, 2025 · Artificial Intelligence

How OneRec Redefines Recommendation with End‑to‑End Generative Modeling and RL Alignment

The OneRec system from Kuaishou replaces traditional cascade recommendation pipelines with an encoder‑decoder architecture, leverages reward‑based preference alignment via reinforcement learning, achieves ten‑fold FLOPs gains, cuts operational costs by 90%, and delivers significant user‑engagement improvements across short‑video and local‑service scenarios.

Kuaishou · OneRec · generative modeling

0 likes · 17 min read

AntTech

Jun 18, 2025 · Artificial Intelligence

How Ant Group’s Bailing Models Push Toward AGI with MoE and Multimodal Innovations

In a detailed AICon talk, Ant Group’s Bailing team leader Zhou Jun outlines their latest large‑model training techniques, MoE architecture optimizations, multimodal breakthroughs, open‑source releases, and the strategic roadmap needed to turn AI into a ubiquitous, “scan‑code‑level” everyday assistant.

AI Infrastructure · Mixture of Experts · Multimodal AI

0 likes · 25 min read

AI Cyberspace

May 20, 2025 · Artificial Intelligence

Why SuperNode and SuperPOD Are Critical for Scaling AI Models

This article explains the scaling laws behind large language models, the explosive growth of model sizes and compute demands, and why modern AI infrastructure must adopt SuperNode and SuperPOD architectures that combine high‑bandwidth Scale‑Up networks with flexible Scale‑Out networking to overcome bandwidth, latency, and power challenges.

AI scaling · SuperPoD · distributed training

0 likes · 42 min read

AI Frontier Lectures

May 12, 2025 · Artificial Intelligence

Can Scaling Reinforcement Learning Turn AI Models into Real Thinkers? Insights from Dan Roberts' AI Ascent Talk

In a recent AI Ascent presentation, OpenAI researcher Dan Roberts explained how scaling laws for both pre‑training and reinforcement learning reveal a new test‑time dimension of model performance, showcased the capabilities of the o1 and o3 models, and outlined a massive compute‑scaling strategy aimed at creating AI systems that can reason for years like Einstein.

AI · Future Predictions · model evaluation

0 likes · 9 min read

21CTO

Apr 20, 2025 · Artificial Intelligence

Microsoft CTO Kevin Scott on AI Scaling Laws and the Rise of Specialized Agents

In a recent interview, Microsoft CTO Kevin Scott discusses the company’s AI progress, emphasizes that scaling laws have not yet reached their limits, predicts a future dominated by many specialized AI agents managed by knowledgeable product managers, and highlights the surprising impact of China’s DeepSeek project.

AI · Microsoft · artificial-intelligence

0 likes · 3 min read

Cognitive Technology Team

Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AI · Embedding · Multimodal

0 likes · 22 min read

DataFunTalk

Feb 28, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3) and R1: Architecture, Training Strategies, Evaluation, and Distillation

An in‑depth overview of the DeepSeek LLM series (V1‑V3) and the R1 models, covering their architectures, scaling‑law experiments, data pipelines, training strategies (including MoE, MLA, FP8, multi‑step learning‑rate scheduling, and reinforcement learning), extensive evaluation results, and knowledge‑distillation techniques.

Mixture of Experts · scaling laws

0 likes · 36 min read

Tencent Cloud Developer

Feb 27, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3, R1) Technical Overview and Analysis

The DeepSeek technical overview details the evolution from the dense 67 B V1 model through the 236 B MoE‑based V2 and 671 B V3 with FP8 training, to the RL‑only R1 series that learns reasoning without supervision, highlighting innovations such as Grouped‑Query Attention, Multi‑Head Latent Attention, auxiliary‑loss‑free MoE load balancing, Multi‑Token Prediction, and knowledge distillation, and reporting state‑of‑the‑art benchmark results and open‑source reproduction projects.

AI research · DeepSeek · Mixture of Experts

0 likes · 37 min read

NewBeeNLP

Feb 21, 2025 · Artificial Intelligence

Do Scaling Laws Still Hold? Analyzing Grok‑3, DeepSeek and LLM Training Trends

The article examines whether pre‑training scaling laws remain valid, compares Grok‑3’s architecture and training strategy with DeepSeek models, and explores how different scaling approaches (pre‑training, RL‑based, and test‑time) affect the cost‑effectiveness and intelligence of large language models.

AI research · Grok 3 · scaling laws

0 likes · 11 min read

Tencent Cloud Developer

Feb 6, 2025 · Artificial Intelligence

DeepSeek V Series: Technical Overview of Scaling Laws, Grouped Query Attention, and Mixture‑of‑Experts

The article reviews DeepSeek’s V‑series papers, explaining how scaling‑law insights, Grouped Query Attention, a depth‑first design, auxiliary‑loss‑free load balancing, multi‑token prediction and Multi‑Head Latent Attention together enable economical mixture‑of‑experts LLMs that rival closed‑source models while cutting compute and hardware costs.

DeepSeek · Grouped Query Attention · Mixture of Experts

0 likes · 13 min read

AIWalker

Jan 18, 2025 · Artificial Intelligence

How InternLM 3.0 Achieves High Performance with Just 4 TB of Training Data

Shanghai AI Laboratory’s InternLM 3.0 upgrade demonstrates that a refined dataset of 4T tokens can boost a large language model’s performance beyond that of open‑source peers trained on 18T tokens, cutting training cost by over 75% while merging regular dialogue with deep reasoning capabilities.

AI evaluation · Data Efficiency · InternLM

0 likes · 9 min read

AIWalker

Jan 16, 2025 · Artificial Intelligence

How InternLM 3.0 Achieves High Performance with Just 4 TB of Training Data

InternLM 3.0 (InternLM‑3) upgrades the Shusheng‑PuYu model by refining data to boost "thinking density", using only 4T tokens to surpass peer open‑source models, cutting training cost by over 75% while merging ordinary dialogue with deep reasoning capabilities.

Data Efficiency · InternLM · Large Language Model

0 likes · 9 min read

Baobao Algorithm Notes

Dec 23, 2024 · Artificial Intelligence

From Zero to One: A Practical Guide to Pretraining Large Language Models

This comprehensive guide walks through every stage of building a large‑language‑model pretraining pipeline—from data sourcing, cleaning, and deduplication, to tokenizer design, model architecture choices, training framework selection, optimization tricks, and evaluation methods—providing actionable tips and pitfalls to avoid for both newcomers and seasoned practitioners.

LLM pretraining · data collection · scaling laws

0 likes · 33 min read

Architects' Tech Alliance

Dec 23, 2024 · Artificial Intelligence

Why High‑Quality, Massive, Diverse Data Fuels AI Breakthroughs

The article explains how breakthroughs in artificial intelligence depend on high‑quality, large‑scale, and diverse training data, outlines the data‑centric AI movement, details a six‑step workflow for building datasets, and surveys the data industry ecosystem supporting large language model development.

AI data · Annotation · Data Quality

0 likes · 7 min read


Fighter's World

Dec 21, 2024 · Artificial Intelligence

Is Pre‑training Coming to an End? Evaluating Data Sufficiency

The article examines Ilya Sutskever’s claim that pre‑training will end, argues that scaling laws still hold and data is not yet a bottleneck, highlights the scarcity of high‑quality frontier data, and explains why the industry is shifting toward inference‑time compute (o1) as a more sustainable path for large language models.

AI trends · Inference‑time Compute · Pre‑training

0 likes · 13 min read


Architects' Tech Alliance

Nov 10, 2024 · Industry Insights

AI Compute Infrastructure: Trends, Scaling Laws, and the Rise of Massive Clusters

The article analyzes the development of AI compute infrastructure, detailing the three‑level architecture from chip to cluster, the scaling law linking model parameters to compute demand, the rapid growth of massive “ten‑thousand‑card” clusters worldwide, and the emerging demand for inference workloads driving new deployment and scheduling strategies.

AI compute · Industry Trends · Inference Demand

0 likes · 15 min read


DaTaobao Tech

Oct 30, 2024 · Artificial Intelligence

Understanding OpenAI o1: Chain‑of‑Thought, Scaling Laws, and Training Strategies

The article explains how OpenAI’s o1 model leverages chain‑of‑thought prompting, dual‑system cognitive theory, and new scaling laws—pre‑training on code/math and post‑training reinforcement with step‑wise reward models—to achieve superior reasoning, safety, and performance over GPT‑4, heralding a shift toward models that learn to think.

Chain-of-Thought · LLM · Safety

0 likes · 42 min read


Baobao Algorithm Notes

Sep 28, 2024 · Artificial Intelligence

Inside Llama 3: A Complete Guide to Modern LLM Training, Architecture, and Optimization

This article provides a thorough, yet concise, overview of Llama 3’s training pipeline, data handling, model architecture, scaling laws, post‑training techniques like SFT and DPO, and inference optimizations such as KV‑Cache, GQA, PagedAttention, and FP8 quantization, highlighting practical insights and benchmark results.

DPO · KV cache · LLM training

0 likes · 32 min read


DataFunSummit

Sep 24, 2024 · Artificial Intelligence

Streaming Data Pipelines and Scaling Laws for Efficient Large‑Model Training

The article discusses the challenges of training ever‑larger AI models on internet‑scale data, critiques traditional batch ETL pipelines, and proposes a streaming data‑flow architecture with dynamic data selection and a shared‑memory/Alluxio middle layer to decouple data processing from model training, improving efficiency and scalability.

AI Infrastructure · Multimodal Data · data pipelines

0 likes · 20 min read


Architects' Tech Alliance

Sep 4, 2024 · Fundamentals

Why Bigger Transformers Win: Scaling Laws and Parallel Computing Essentials

The article explains OpenAI's 2020 Scaling Laws that show larger transformer models, more data, and greater compute consistently improve performance, introduces the concept of emergent abilities at critical size thresholds, and outlines the core principles of parallel computing such as multi‑processor usage, task decomposition, concurrent execution, and inter‑processor communication.

Task Decomposition · communication · concurrency

0 likes · 6 min read


DataFunSummit

Sep 1, 2024 · Artificial Intelligence

Data Management in Large Language Model Training: Overview, Pre‑training, SFT, and Future Challenges

This article surveys data management for large language model training, covering an overview, pre‑training data composition, scaling‑law‑driven quantity control, quality filtering, deduplication, harmful‑content removal, instruction fine‑tuning strategies, dynamic data selection, and emerging research challenges such as bias mitigation, multimodal data handling, and synthetic‑data filtering.

Data Quality · instruction fine-tuning · pretraining

0 likes · 18 min read


Xiaohongshu Tech REDtech

Jul 29, 2024 · Artificial Intelligence

Scaling Laws for Dense Retrieval: Empirical Study of Model Size, Training Data, and Annotation Quality

The award‑winning study shows that dense retrieval performance follows precise power‑law scaling with model size, training data quantity, and annotation quality, introduces contrastive entropy as an evaluation metric, validates joint scaling formulas on MS MARCO and T2Ranking, and uses cost models to guide budget‑optimal resource allocation.

Information Retrieval · Model Size · annotation quality

0 likes · 13 min read


NewBeeNLP

Jun 7, 2024 · Artificial Intelligence

Scaling Laws, Synthetic Data, and New Model Architectures: What’s Next?

In a recent round‑table, experts debated the validity of scaling laws, discussed the role of synthetic and semi‑synthetic data in overcoming data scarcity, explored alternatives to Transformers such as RNN‑based models and MoE, and examined techniques for handling long‑context inference efficiently.

Mixture of Experts · model architecture · scaling laws

0 likes · 12 min read


21CTO

Jun 2, 2024 · Artificial Intelligence

Geoff Hinton on Scaling Laws, Multimodal AI, and the Future of Intelligence

In a candid interview, Geoff Hinton reflects on his AI journey—from early disappointments in physiology and philosophy to breakthroughs in neural networks, scaling laws, multimodal learning, fast‑weight concepts, and the ethical challenges shaping the future of artificial intelligence.

AI ethics · Geoff Hinton · Multimodal AI

0 likes · 25 min read


NewBeeNLP

May 31, 2024 · Artificial Intelligence

Can Cleaned Web Data Rival Proprietary Corpora for LLM Training?

This article analyzes whether large‑scale web crawls, when meticulously filtered and deduplicated, can match or surpass the performance of high‑quality curated datasets in training large language models, covering dataset composition, processing pipelines, experimental results, scaling‑law implications, and future data‑efficiency strategies.

Dataset Cleaning · LLM · Web Data

0 likes · 23 min read


NewBeeNLP

Mar 15, 2024 · Industry Insights

How Meta’s Generative Recommendation (GR) Is Redefining Feature Engineering

Meta’s new Generative Recommendation (GR) paper replaces a decade‑old hierarchical feature paradigm with an ultra‑long sequence transformer that directly fuses user profiles, behaviors, and targets, offering stronger feature crossing, richer information utilization, and massive compute gains, while revealing scaling‑law effects in recommendation systems.

Meta · Recommendation Systems · generative models

0 likes · 9 min read


DataFunTalk

Mar 16, 2023 · Artificial Intelligence

Technical Optimizations and Breakthroughs of GPT‑4: Multimodal Capabilities, Alignment Strategies, and Predictable Scaling

The article summarizes the technical innovations behind GPT‑4, highlighting its multimodal abilities, improved alignment methods, scaling‑law‑based performance prediction, and remaining limitations, while referencing the official OpenAI technical report and community analyses.

AI research · GPT-4 · alignment

0 likes · 10 min read


Architect

Feb 9, 2023 · Artificial Intelligence

Emergent Abilities of Large Language Models: Complex Reasoning, Knowledge Reasoning, and Out‑of‑Distribution Robustness

This article reviews recent research on the emergent abilities of large language models—such as chain‑of‑thought reasoning, knowledge retrieval without external sources, and robustness to distribution shifts—examining scaling laws, model size thresholds, and the open questions surrounding a potential paradigm shift from fine‑tuning to in‑context learning.

AI research · chain-of-thought prompting · emergent abilities

0 likes · 23 min read


DataFunTalk

Jan 10, 2023 · Artificial Intelligence

Paradigm Shifts in Large Language Model Research and Future Directions

The article reviews the evolution of large language models from the pre‑GPT‑3 era to the present, analyzes the conceptual and technical gaps between Chinese and global research, and outlines key future research directions such as scaling laws, prompting techniques, multimodal training, and efficient model architectures.

AI research · ChatGPT · In-Context Learning

0 likes · 73 min read


Model Perspective

Nov 21, 2022 · Fundamentals

Why Ants Defy Gravity: The Science of Surface Area vs Volume

The article explains how ants can lift objects many times their weight by leveraging their huge surface‑to‑volume ratio, contrasting this with human scaling, and explores how surface area influences air resistance, falling speed, and even the challenges ants face with water due to surface tension.

ANTS · Physics · biology

0 likes · 5 min read


Model Perspective

Oct 30, 2022 · Fundamentals

Why Giants Can’t Exist: The Physics of Scaling and Bone Strength

The article explains that when a human’s height is doubled, body volume and weight increase eightfold while bone strength only quadruples, leaving the legs unable to support the extra load and illustrating the scaling laws that prevent real giants from existing.

allometric scaling · biomechanics · physics of bodies

0 likes · 4 min read
