Tagged articles
59 articles
Data Party THU
May 11, 2026 · Artificial Intelligence

How a 1930‑Era AI Model Without Any Computer Knowledge Learned to Write Python

The talkie‑1930‑13b language model, trained exclusively on English texts published before 1931, surprisingly understands historical events, solves Python coding problems, and exhibits scaling‑law behavior, prompting a detailed comparison with its modern twin talkie‑web‑13b and an analysis of training pipelines, memory categories, and common deployment pitfalls.

AI memory, LLM, Python code generation
0 likes · 10 min read
DataFunTalk
May 10, 2026 · Artificial Intelligence

How AI Is Powering One‑Person Billion‑Dollar Startups and Multi‑Agent Software Collaboration

In a Code with Claude interview, Anthropic co‑founders Dario and Daniela Amodei explain how exponential AI growth—evidenced by an 80× revenue surge—creates compute bottlenecks, drives a shift to multi‑agent collaboration, and forces product teams to rethink development through scaling laws and Amdahl's Law.

Amdahl's Law, Compute Bottleneck, Product Development
0 likes · 26 min read
Data Party THU
May 2, 2026 · Artificial Intelligence

Finally, Researchers Uncover Deep Learning’s “Newton’s Law”

A new collaborative paper from top universities proposes a unified “Learning Mechanics” framework for deep learning, outlining five research strands—from solvable idealized models and extreme limits to empirical scaling laws and hyper‑parameter theory—while drawing analogies to classical physics and highlighting ten open challenges.

Deep Learning, hyperparameter theory, learning mechanics
0 likes · 16 min read
Machine Heart
Apr 26, 2026 · Artificial Intelligence

Has Deep Learning Discovered Its Own “Newton’s Law”?

A new collaborative paper titled “There Will Be a Scientific Theory of Deep Learning” proposes a unified “Learning Mechanics” framework that connects solvable idealized models, tractable limits, empirical scaling laws, hyperparameter theory, and universal representation behavior, aiming to give deep learning a first‑principles scientific foundation.

Deep Learning, Neural Networks, hyperparameters
0 likes · 14 min read
Machine Learning Algorithms & Natural Language Processing
Apr 21, 2026 · Artificial Intelligence

How a 22‑Year‑Old Reverse‑Engineered Mythos into OpenMythos Using MoE and DeepSeek‑Inspired Attention

OpenMythos re‑creates the Claude Mythos architecture as a Recurrent‑Depth Transformer with MoE routing, achieving comparable performance to larger Transformers while using roughly half the parameters, and demonstrates systematic generalization and depth extrapolation through looped inference in latent space.

AI Architecture, Looped Language Models, Mixture of Experts
0 likes · 6 min read
Machine Heart
Apr 7, 2026 · Artificial Intelligence

How Qianxun Raised ¥3 B in 30 Days: AI‑Powered Robotics Secrets

Qianxun Intelligent secured ¥3 billion in funding within a month, leveraged a scaling‑law data engine and the Spirit v1.5 VLA model to achieve breakthrough robot performance, and demonstrated the commercial loop through deployments at JD.com retail and CATL battery lines.

Embodied AI, Qianxun Intelligent, Robotics
0 likes · 12 min read
Lao Guo's Learning Space
Apr 2, 2026 · Artificial Intelligence

Large Model Pretraining and Fine‑Tuning: A 2026 Technical Guide from Scaling Laws to Post‑Training Revolution

This article explains the full lifecycle of large language models in 2026, covering pretraining fundamentals, the limits of classic Scaling Laws, data‑centric advances, fine‑tuning strategies, RLHF, DPO, and the emerging post‑training methods GRPO, DAPO and RLVR, with concrete benchmarks and cost analyses.

DAPO, DPO, Fine-tuning
0 likes · 17 min read
DataFunSummit
Mar 29, 2026 · Artificial Intelligence

How Code Intelligence Is Evolving: From Foundation Models to Repository‑Level Agents

This article reviews the rapid evolution of code intelligence, covering the history of code foundation models, reinforcement‑learning optimizations, scaling‑law insights, the LoopCoder architecture, rigorous multi‑level evaluation suites, and the emergence of repository‑level code agents, while highlighting open‑source contributions such as Qwen‑Coder.

Code Intelligence, Software Engineering, code evaluation
0 likes · 15 min read
Alimama Tech
Mar 26, 2026 · Industry Insights

How Alibaba’s Large User Model (LUM) Boosted CTR by 4.5% and Scaled to Billions of Parameters

The article analyzes the evolution from traditional modular recommendation models to a generative Large User Model (LUM), detailing its three‑stage paradigm, tokenization, training objectives, scaling‑law findings, offline and online experiments, and the AI‑infra innovations that enabled a 4.5% CTR lift in production.

CTR prediction, Generative Modeling, Recommendation Systems
0 likes · 18 min read
Machine Learning Algorithms & Natural Language Processing
Mar 24, 2026 · Artificial Intelligence

Jensen Huang Claims AGI Is Already Achieved, Ilya Is Wrong, Programmers to Reach 1 B

In a candid Lex Fridman interview, Nvidia CEO Jensen Huang asserts that AGI has already been realized, disputes Ilya Sutskever’s data‑limit claim, predicts a billion programmers, outlines scaling‑law dynamics, token‑priced AI services, data‑center energy strategies, and his hands‑on management philosophy for the AI era.

AGI, AI Management, Data Centers
0 likes · 37 min read
SuanNi
Mar 4, 2026 · Artificial Intelligence

How to Fit Large Language Models into Cars and Robots: A Hardware‑Aware Scaling Law

This article presents a hardware‑aware co‑design framework for edge‑deployed large language models, revealing a scaling law that balances model accuracy and inference latency, and demonstrates how Pareto‑optimal architectures can be discovered quickly using roofline analysis and systematic search on devices like NVIDIA Jetson Orin.

AI inference, Edge Computing, Pareto optimization
0 likes · 15 min read
Machine Learning Algorithms & Natural Language Processing
Feb 23, 2026 · Artificial Intelligence

System Engineering Behind Billions of Parameters: Insider Training Details from Seven Top AI Labs

This article systematically dissects the engineering decisions behind frontier large‑language‑model training—covering architecture choices, attention variants, optimizer evolution, data‑curation strategies, scaling‑law insights, and post‑training SFT/RL pipelines—based on open‑source reports from seven leading AI laboratories.

Mixture of Experts, Model Training, large language models
0 likes · 26 min read
Top Architect
Feb 14, 2026 · Artificial Intelligence

Why Test‑Time Compute Is the Next Breakthrough for Large Language Models

The article explains how inference‑oriented large language models shift the focus from training‑time resources to test‑time computation, detailing scaling laws, verification techniques, reinforcement‑learning pipelines such as DeepSeek‑R1, and methods for distilling reasoning abilities into smaller, consumer‑grade models.

Prompt Engineering, inference compute, large language models
0 likes · 19 min read
AI Cyberspace
Jan 18, 2026 · Artificial Intelligence

Understanding Supervised, Unsupervised, Self‑Supervised, Semi‑Supervised, and Reinforcement Learning for Large Language Model Training

The article explains various learning paradigms (supervised, unsupervised, self‑supervised, semi‑supervised, and reinforcement), describes dataset types and quality considerations, outlines preprocessing steps like filtering, deduplication, and tokenization, and discusses scaling laws linking model size, data volume, and compute resources, with concrete examples and code.

Model Training, data preprocessing, machine learning
0 likes · 26 min read
JD Tech
Jan 13, 2026 · Artificial Intelligence

Mastering Large Language Models: Transformers, Scaling Laws, and MoE Explained

This extensive guide walks readers through the fundamentals of large language models, covering transformer architecture, pre‑training and fine‑tuning techniques, scaling laws, emergent abilities, mixture‑of‑experts designs, and practical comparisons, providing clear explanations, code snippets, and visual illustrations for deep learning practitioners.

Fine-tuning, Mixture of Experts, emergent abilities
0 likes · 47 min read
AI2ML AI to Machine Learning
Dec 29, 2025 · Artificial Intelligence

How Brin’s Return Powers Google’s First ‘Sword’: The TPU Hardware Revolution

The article examines Google’s AI resurgence after Sergey Brin’s comeback, detailing the evolution of TPU hardware from v1 to v7, the strategic focus on algorithmic efficiency, comparisons with Nvidia’s B200, the role of JAX/XLA, and how these advances create a powerful competitive moat for Google’s AI infrastructure.

AI hardware, Google TPU, JAX
0 likes · 8 min read
PaperAgent
Dec 22, 2025 · Artificial Intelligence

Can Budget‑Aware Tool Use Unlock Scalable AI Agents? A Deep Dive

This article analyzes recent Google research on test‑time scaling and agentization, introducing budget‑aware tool use and the BATS framework, presenting experimental results across 180 configurations, uncovering scaling laws, and offering a predictive model for optimal multi‑agent architectures.

AI Agents, BATS framework, LLM Tool Use
0 likes · 7 min read
Tencent Technical Engineering
Dec 1, 2025 · Artificial Intelligence

Do Machines Really Think? Inside Deep Reasoning, Scaling Laws & RLHF for LLMs

This article examines whether large language models truly think, explores the origins of deep reasoning through transformer architectures and scaling laws, reviews chain‑of‑thought and its variants, and analyzes how reinforcement learning from human feedback—including PPO, DPO, and GRPO—helps internalise step‑by‑step reasoning while pointing to future directions such as atomic thought, hierarchical models, and training‑free in‑context knowledge bases.

AI Alignment, LLM, RLHF
0 likes · 35 min read
AntTech
Nov 11, 2025 · Artificial Intelligence

Breaking the Efficiency Wall: Ant Group’s Bailing Model Paves the Way to AGI

At CNCC 2025, Ant Group’s Vice President Zhou Jun outlined the Bailing large‑model’s five‑layer architecture, hybrid linear attention, Ling Scaling Law, and novel training algorithms that dramatically cut costs and latency, achieving state‑of‑the‑art performance on math and code benchmarks while promoting open‑source collaboration toward AGI.

AGI, Mixture of Experts, large language models
0 likes · 8 min read
Wu Shixiong's Large Model Academy
Oct 22, 2025 · Artificial Intelligence

Mastering LLM Training: A Step‑by‑Step Blueprint from Data to Alignment

This guide walks through the complete end‑to‑end process of training a large language model from scratch, covering data collection, cleaning, tokenization, pre‑training objectives and engineering, post‑training alignment methods, scaling laws, over‑fitting mitigation, and gradient‑stability techniques.

Alignment, LLM, gradient stability
0 likes · 9 min read
Data Party THU
Oct 21, 2025 · Artificial Intelligence

Can Linear‑Time LSTMs Beat Transformers? Scaling Laws Reveal the Answer

The paper presents a systematic scaling‑law study of the linear‑time xLSTM architecture versus quadratic‑time Transformers, evaluating parameter‑data loss surfaces, optimal model size under equal FLOP budgets, and inference latency components, and shows that xLSTM consistently offers better cost‑effectiveness across diverse contexts and budgets.

Inference Optimization, Linear Time Complexity, Transformer
0 likes · 11 min read
21CTO
Sep 29, 2025 · Artificial Intelligence

Why Open‑Source Is the Key to China’s AI Future, According to Li Kaifu

Li Kaifu argues that open‑source large‑model ecosystems are essential for China to close the AI gap with the United States, highlighting DeepSeek’s impact, shifting scaling laws, and the emerging role of AI‑to‑AI teaching as the next development frontier.

China AI, artificial intelligence, large language models
0 likes · 4 min read
Fighter's World
Aug 15, 2025 · Artificial Intelligence

Why GPT‑5 Is Still Far From AGI Yet Near Scalable Profitability

The article analyzes GPT‑5’s release, its unified multi‑model architecture with a real‑time router, improved reasoning, coding and tool‑use capabilities, reduced hallucinations, and how these technical shifts reshape AI commercialization, investment logic, competition and enterprise adoption.

AI commercialization, Agentic AI, GPT-5
0 likes · 20 min read
Data Party THU
Aug 5, 2025 · Artificial Intelligence

Why State Space Models May Outperform Transformers: A Deep Dive

The article provides a comprehensive technical analysis of state space models (SSM) versus Transformers, covering their core mechanisms, three essential design factors, training efficiency, scaling behavior, tokenization debates, and experimental evidence that highlights the trade‑offs and potential advantages of SSMs in modern AI systems.

Mamba, State Space Model, Transformer
0 likes · 21 min read
Data Thinking Notes
Jul 30, 2025 · Artificial Intelligence

Tracing the Evolution of Large Language Models: Key Papers and Breakthroughs

This article reviews the most influential papers in large language model research since 2017, covering foundational works such as the Transformer, GPT‑3, BERT, scaling laws, and recent innovations like FlashAttention, Mamba, and QLoRA, highlighting their core contributions and impact on AI development.

AI research, Model Optimization, Transformer
0 likes · 28 min read
Kuaishou Large Model
Jun 20, 2025 · Artificial Intelligence

How OneRec Revolutionizes Short-Video Recommendations with End-to-End Generative AI

OneRec, an end-to-end generative recommendation system from Kuaishou, uses an encoder-decoder architecture, reward-based preference alignment, and reinforcement learning to dramatically improve video recommendation efficiency, boosting user engagement and reducing operational costs while achieving scaling-law performance comparable to large language models.

Kuaishou, efficiency, generative AI
0 likes · 18 min read
Kuaishou Tech
Jun 20, 2025 · Artificial Intelligence

How OneRec Redefines Recommendation with End‑to‑End Generative Modeling and RL Alignment

The OneRec system from Kuaishou replaces traditional cascade recommendation pipelines with an encoder‑decoder architecture, leverages reward‑based preference alignment via reinforcement learning, achieves ten‑fold FLOPs gains, cuts operational costs by 90%, and delivers significant user‑engagement improvements across short‑video and local‑service scenarios.

Generative Modeling, Kuaishou, OneRec
0 likes · 17 min read
AntTech
Jun 18, 2025 · Artificial Intelligence

How Ant Group’s Bailing Models Push Toward AGI with MoE and Multimodal Innovations

In a detailed AICon talk, Ant Group’s Bailing team leader Zhou Jun outlines their latest large‑model training techniques, MoE architecture optimizations, multimodal breakthroughs, open‑source releases, and the strategic roadmap needed to turn AI into a ubiquitous, “scan‑code‑level” everyday assistant.

AI Infrastructure, Mixture of Experts, large language models
0 likes · 25 min read
AI Cyberspace
May 20, 2025 · Artificial Intelligence

Why SuperNode and SuperPOD Are Critical for Scaling AI Models

This article explains the scaling laws behind large language models, the explosive growth of model sizes and compute demands, and why modern AI infrastructure must adopt SuperNode and SuperPOD architectures that combine high‑bandwidth Scale‑Up networks with flexible Scale‑Out networking to overcome bandwidth, latency, and power challenges.

AI scaling, Distributed Training, SuperPoD
0 likes · 42 min read
AI Frontier Lectures
May 12, 2025 · Artificial Intelligence

Can Scaling Reinforcement Learning Turn AI Models into Real Thinkers? Insights from Dan Roberts' AI Ascent Talk

In a recent AI Ascent presentation, OpenAI researcher Dan Roberts explained how scaling laws for both pre‑training and reinforcement learning reveal a new test‑time dimension of model performance, showcased the capabilities of the o1 and o3 models, and outlined a massive compute‑scaling strategy aimed at creating AI systems that can reason for years like Einstein.

AI, Future Predictions, Model Evaluation
0 likes · 9 min read
21CTO
Apr 20, 2025 · Artificial Intelligence

Microsoft CTO Kevin Scott on AI Scaling Laws and the Rise of Specialized Agents

In a recent interview, Microsoft CTO Kevin Scott discusses the company’s AI progress, emphasizes that scaling laws have not yet reached their limits, predicts a future dominated by many specialized AI agents managed by knowledgeable product managers, and highlights the surprising impact of China’s DeepSeek project.

AI, Microsoft, artificial intelligence
0 likes · 3 min read
Cognitive Technology Team
Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AI, Embedding, Transformer
0 likes · 22 min read
DataFunTalk
Feb 28, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3) and R1: Architecture, Training Strategies, Evaluation, and Distillation

An in‑depth overview of the DeepSeek LLM series (V1‑V3) and the R1 models, covering their architectures, scaling‑law experiments, data pipelines, training strategies (including MoE, MLA, FP8, multi‑step learning‑rate scheduling, and reinforcement learning), extensive evaluation results, and knowledge‑distillation techniques.

Mixture of Experts, scaling laws
0 likes · 36 min read
Tencent Cloud Developer
Feb 27, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3, R1) Technical Overview and Analysis

The DeepSeek technical overview details the evolution from the dense 67 B V1 model through the 236 B MoE‑based V2 and 671 B V3 with FP8 training, to the RL‑only R1 series that learns reasoning without supervision, highlighting innovations such as Grouped‑Query Attention, Multi‑Head Latent Attention, auxiliary‑loss‑free load balancing for MoE, Multi‑Token Prediction, and knowledge distillation, and reporting state‑of‑the‑art benchmark results and open‑source reproduction projects.

AI research, DeepSeek, Mixture of Experts
0 likes · 37 min read
NewBeeNLP
Feb 21, 2025 · Artificial Intelligence

Do Scaling Laws Still Hold? Analyzing Grok‑3, DeepSeek and LLM Training Trends

The article examines whether pre‑training scaling laws remain valid, compares Grok‑3’s architecture and training strategy with DeepSeek models, and explores how different scaling approaches (pre‑training, RL‑based, and test‑time) affect the cost‑effectiveness and intelligence of large language models.

AI research, Grok-3, scaling laws
0 likes · 11 min read
Tencent Cloud Developer
Feb 6, 2025 · Artificial Intelligence

DeepSeek V Series: Technical Overview of Scaling Laws, Grouped Query Attention, and Mixture‑of‑Experts

The article reviews DeepSeek’s V‑series papers, explaining how scaling‑law insights, Grouped Query Attention, a depth‑first design, loss‑free load balancing, multi‑token prediction and Multi‑Head Latent Attention together enable economical mixture‑of‑experts LLMs that rival closed‑source models while cutting compute and hardware costs.

DeepSeek, Grouped Query Attention, Mixture of Experts
0 likes · 13 min read
AIWalker
Jan 18, 2025 · Artificial Intelligence

How InternLM 3.0 Achieves High Performance with Just 4T Tokens of Training Data

Shanghai AI Laboratory’s InternLM 3.0 upgrade demonstrates that a refined 4T‑token dataset can boost a large‑language model’s performance beyond that of open‑source peers trained on 18T tokens, cutting training cost by over 75% while merging regular dialogue with deep reasoning capabilities.

AI Evaluation, InternLM, data efficiency
0 likes · 9 min read
AIWalker
Jan 16, 2025 · Artificial Intelligence

How InternLM 3.0 Achieves High Performance with Just 4T Tokens of Training Data

InternLM 3.0 (InternLM‑3) upgrades the Shusheng‑PuYu model by refining data to boost "thinking density", using only 4T tokens to surpass peer open‑source models, cutting training cost by over 75% while merging ordinary dialogue with deep reasoning capabilities.

InternLM, Model Evaluation, data efficiency
0 likes · 9 min read
Baobao Algorithm Notes
Dec 23, 2024 · Artificial Intelligence

From Zero to One: A Practical Guide to Pretraining Large Language Models

This comprehensive guide walks through every stage of building a large‑language‑model pretraining pipeline—from data sourcing, cleaning, and deduplication, to tokenizer design, model architecture choices, training framework selection, optimization tricks, and evaluation methods—providing actionable tips and pitfalls to avoid for both newcomers and seasoned practitioners.

LLM Pretraining, data collection, scaling laws
0 likes · 33 min read
Architects' Tech Alliance
Dec 23, 2024 · Artificial Intelligence

Why High‑Quality, Massive, Diverse Data Fuels AI Breakthroughs

The article explains how breakthroughs in artificial intelligence depend on high‑quality, large‑scale, and diverse training data, outlines the data‑centric AI movement, details a six‑step workflow for building datasets, and surveys the data industry ecosystem supporting large language model development.

AI data, Data Quality, Data‑Centric AI
0 likes · 7 min read
Fighter's World
Dec 21, 2024 · Artificial Intelligence

Is Pre‑training Coming to an End? Evaluating Data Sufficiency

The article examines Ilya Sutskever’s claim that pre‑training will end, argues that scaling laws still hold and data is not yet a bottleneck, highlights the scarcity of high‑quality frontier data, and explains why the industry is shifting toward inference‑time compute (o1) as a more sustainable path for large language models.

AI trends, Data Wall, Inference‑time Compute
0 likes · 13 min read
Architects' Tech Alliance
Nov 10, 2024 · Industry Insights

AI Compute Infrastructure: Trends, Scaling Laws, and the Rise of Massive Clusters

The article analyzes the development of AI compute infrastructure, detailing the three‑level architecture from chip to cluster, the scaling law linking model parameters to compute demand, the rapid growth of massive “ten‑thousand‑card” clusters worldwide, and the emerging demand for inference workloads driving new deployment and scheduling strategies.

AI compute, Inference Demand, Infrastructure
0 likes · 15 min read
DaTaobao Tech
Oct 30, 2024 · Artificial Intelligence

Understanding OpenAI o1: Chain‑of‑Thought, Scaling Laws, and Training Strategies

The article explains how OpenAI’s o1 model leverages chain‑of‑thought prompting, dual‑system cognitive theory, and new scaling laws—pre‑training on code/math and post‑training reinforcement with step‑wise reward models—to achieve superior reasoning, safety, and performance over GPT‑4, heralding a shift toward models that learn to think.

LLM, Safety, chain-of-thought
0 likes · 42 min read
Baobao Algorithm Notes
Sep 28, 2024 · Artificial Intelligence

Inside Llama 3: A Complete Guide to Modern LLM Training, Architecture, and Optimization

This article provides a thorough, yet concise, overview of Llama 3’s training pipeline, data handling, model architecture, scaling laws, post‑training techniques like SFT and DPO, and inference optimizations such as KV‑Cache, GQA, PagedAttention, and FP8 quantization, highlighting practical insights and benchmark results.

DPO, Inference, KV cache
0 likes · 32 min read
DataFunSummit
Sep 24, 2024 · Artificial Intelligence

Streaming Data Pipelines and Scaling Laws for Efficient Large‑Model Training

The article discusses the challenges of training ever‑larger AI models on internet‑scale data, critiques traditional batch ETL pipelines, and proposes a streaming data‑flow architecture with dynamic data selection and a shared‑memory/Alluxio middle layer to decouple data processing from model training, improving efficiency and scalability.

AI Infrastructure, Multimodal Data, data pipelines
0 likes · 20 min read
Architects' Tech Alliance
Sep 4, 2024 · Fundamentals

Why Bigger Transformers Win: Scaling Laws and Parallel Computing Essentials

The article explains OpenAI's 2020 Scaling Laws that show larger transformer models, more data, and greater compute consistently improve performance, introduces the concept of emergent abilities at critical size thresholds, and outlines the core principles of parallel computing such as multi‑processor usage, task decomposition, concurrent execution, and inter‑processor communication.

communication, concurrency, emergent abilities
0 likes · 6 min read
DataFunSummit
Sep 1, 2024 · Artificial Intelligence

Data Management in Large Language Model Training: Overview, Pre‑training, SFT, and Future Challenges

This article surveys data management for large language model training, covering an overview, pre‑training data composition, scaling‑law‑driven quantity control, quality filtering, deduplication, harmful‑content removal, instruction fine‑tuning strategies, dynamic data selection, and emerging research challenges such as bias mitigation, multimodal data handling, and synthetic‑data filtering.

Data Quality, instruction fine-tuning, pretraining
0 likes · 18 min read
Xiaohongshu Tech REDtech
Jul 29, 2024 · Artificial Intelligence

Scaling Laws for Dense Retrieval: Empirical Study of Model Size, Training Data, and Annotation Quality

The award‑winning study shows that dense retrieval performance follows precise power‑law scaling with model size, training data quantity, and annotation quality, introduces contrast entropy for evaluation, validates joint scaling formulas on MS MARCO and T2Ranking, and uses cost models to guide budget‑optimal resource allocation.

Model Size, annotation quality, contrast entropy
0 likes · 13 min read
NewBeeNLP
Jun 7, 2024 · Artificial Intelligence

Scaling Laws, Synthetic Data, and New Model Architectures: What’s Next?

In a recent round‑table, experts debated the validity of scaling laws, the role of synthetic and semi‑synthetic data in overcoming data scarcity, explored alternatives to Transformers such as RNN‑based models and MoE, and examined techniques for handling long‑context inference efficiently.

Mixture of Experts, Model architecture, scaling laws
0 likes · 12 min read
21CTO
Jun 2, 2024 · Artificial Intelligence

Geoff Hinton on Scaling Laws, Multimodal AI, and the Future of Intelligence

In a candid interview, Geoff Hinton reflects on his AI journey—from early disappointments in physiology and philosophy to breakthroughs in neural networks, scaling laws, multimodal learning, fast‑weight concepts, and the ethical challenges shaping the future of artificial intelligence.

AI ethics, Deep Learning, Geoff Hinton
0 likes · 25 min read
NewBeeNLP
May 31, 2024 · Artificial Intelligence

Can Cleaned Web Data Rival Proprietary Corpora for LLM Training?

This article analyzes whether large‑scale web crawls, when meticulously filtered and deduplicated, can match or surpass the performance of high‑quality curated datasets in training large language models, covering dataset composition, processing pipelines, experimental results, scaling‑law implications, and future data‑efficiency strategies.

Dataset Cleaning · LLM · Web Data
0 likes · 23 min read
NewBeeNLP
Mar 15, 2024 · Industry Insights

How Meta’s Generative Recommendation (GR) Is Redefining Feature Engineering

Meta’s new Generative Recommendation (GR) paper replaces a decade‑old hierarchical feature paradigm with an ultra‑long sequence transformer that directly fuses user profiles, behaviors, and targets, offering stronger feature crossing, richer information utilization, and massive compute gains, while revealing scaling‑law effects in recommendation systems.

Generative Models · Meta · Recommendation Systems
0 likes · 9 min read
DataFunTalk
Mar 16, 2023 · Artificial Intelligence

Technical Optimizations and Breakthroughs of GPT‑4: Multimodal Capabilities, Alignment Strategies, and Predictable Scaling

The article summarizes the technical innovations behind GPT‑4, highlighting its multimodal abilities, improved alignment methods, scaling‑law‑based performance prediction, and remaining limitations, while referencing the official OpenAI technical report and community analyses.

AI research · Alignment · GPT-4
0 likes · 10 min read
Architect
Feb 9, 2023 · Artificial Intelligence

Emergent Abilities of Large Language Models: Complex Reasoning, Knowledge Reasoning, and Out‑of‑Distribution Robustness

This article reviews recent research on the emergent abilities of large language models—such as chain‑of‑thought reasoning, knowledge retrieval without external sources, and robustness to distribution shifts—examining scaling laws, model size thresholds, and the open questions surrounding a potential paradigm shift from fine‑tuning to in‑context learning.

AI research · chain-of-thought prompting · emergent abilities
0 likes · 23 min read
DataFunTalk
Jan 10, 2023 · Artificial Intelligence

Paradigm Shifts in Large Language Model Research and Future Directions

The article reviews the evolution of large language models from the pre‑GPT‑3 era to the present, analyzes the conceptual and technical gaps between Chinese and global research, and outlines key future research directions such as scaling laws, prompting techniques, multimodal training, and efficient model architectures.

AI research · ChatGPT · In-Context Learning
0 likes · 73 min read
Model Perspective
Nov 21, 2022 · Fundamentals

Why Ants Defy Gravity: The Science of Surface Area vs Volume

The article explains how ants can lift objects many times their weight by leveraging their huge surface‑to‑volume ratio, contrasting this with human scaling, and explores how surface area influences air resistance, falling speed, and even the challenges ants face with water due to surface tension.

ANTS · Physics · biology
0 likes · 5 min read
Model Perspective
Oct 30, 2022 · Fundamentals

Why Giants Can’t Exist: The Physics of Scaling and Bone Strength

The article explains that when a human’s height is doubled, the body’s volume and weight increase eightfold while bone strength only quadruples, so the legs cannot support the extra load, illustrating the scaling laws that prevent real giants from existing.

allometric scaling · biomechanics · physics of bodies
0 likes · 4 min read