AI2ML AI to Machine Learning
Author: Shi Chunqi

Original articles on artificial intelligence, machine learning, and deep optimization. Less is more, life is simple!

48 articles
Recent Articles

Latest from AI2ML AI to Machine Learning (48 recent articles)
Dec 3, 2025 · Artificial Intelligence

2026 Forecast: How Large‑Model AI Will Evolve After 2025 Breakthroughs

The article reviews the major 2025 breakthroughs in multimodal, open‑source, and deployment technologies for large models and outlines four trends for 2026 that will shape the next wave of AI development: a split between consumer (ToC) and enterprise (ToB) services, dual‑hand data generation, advances in MoE routing, and AI4Science breakthroughs.

AI deployment · AI4Science · Mixture of Experts
6 min read
Nov 4, 2025 · Artificial Intelligence

Common Debugging Signals for Large Language Models

This article outlines the end‑to‑end workflow for large‑model training, highlights typical debugging challenges such as out‑of‑memory (OOM) errors, performance bottlenecks, and gradient issues, and provides concrete strategies, tools (DeepSpeed, Megatron, Torchtitan, veScale), and best‑practice checklists to help engineers diagnose and resolve problems efficiently.

DeepSpeed · LLM · Megatron
12 min read
Nov 3, 2025 · Artificial Intelligence

Smol Training Playbook: Secrets to Building World-Class LLMs

The article details the SmolLM3 3B‑parameter model: its architecture, dual‑mode inference, a three‑stage data‑curation strategy, rigorous ablation methods, preference optimization (APO/DPO), model merging, and practical training‑stability tricks, offering a comprehensive guide for building high‑performing large language models.

APO · LLM training · context scaling
13 min read
Oct 24, 2025 · Industry Insights

How Generative AI Is Fueling a New Wave of Insurance Fraud

Generative AI tools such as DALL·E, Midjourney, and deepfake platforms are enabling criminals to create highly realistic images, videos, and documents, driving a surge in sophisticated insurance fraud across the auto, property, health, and life lines of business and forcing insurers to overhaul their detection and regulatory practices.

AI detection · deepfake · generative AI
13 min read
Oct 24, 2025 · Artificial Intelligence

Beyond RAG: Three Emerging Knowledge‑Engineering Strategies (ICL, Online Learning, SLM)

The article outlines three post‑RAG knowledge‑engineering approaches—In‑Context Learning with dynamic few‑shot selection, Online Learning encompassing Meta‑Learning and Lifelong Learning to quickly adapt to new tasks, and the Small Language Model path that combines fine‑tuned task‑specific experts with LLM‑SLM collaboration for efficient, privacy‑preserving inference.

In-Context Learning · Knowledge Engineering · LLM
4 min read
Oct 23, 2025 · Artificial Intelligence

Why Visually‑Rich Document Understanding Looks Like High‑End Docs: A Static Multimodal Overview

The article surveys the evolution of Visually‑Rich Document Understanding (VRDU), highlighting pioneering Chinese OCR research, the LayoutLM family, recent multimodal model breakthroughs, open‑source toolkits, and practical recommendations for handling diverse document types and tasks.

LayoutLM · Multimodal OCR · Visually-Rich Document Understanding
11 min read
Oct 20, 2025 · Artificial Intelligence

nanochat Source Code Deep Dive: Data Prep, Model Design, Training & Evaluation

This article revisits nanochat's core components, detailing the preparation of diverse training datasets, the scaling calculations for tokens and parameters, the model's MQA and KV‑cache design, the full training pipeline with gradient accumulation and mixed‑precision training, a cost breakdown, inference optimizations, evaluation tasks, and identified limitations with suggested improvements.

LLM · MQA · PyTorch
9 min read
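The MQA and KV‑cache design mentioned in the summary comes down to a simple memory calculation. As an illustrative sketch (the function name and fp16 default are assumptions for exposition, not nanochat's actual code), cache size scales linearly with the number of KV heads, which is why multi‑query attention (a single shared KV head) shrinks it so dramatically:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2):
    """Size of the inference KV cache: one K and one V tensor per layer,
    stored in fp16 (2 bytes per element) by default."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Standard multi-head attention (12 KV heads) vs. MQA (1 shared KV head):
mha = kv_cache_bytes(n_layers=12, n_kv_heads=12, head_dim=64, seq_len=2048)
mqa = kv_cache_bytes(n_layers=12, n_kv_heads=1, head_dim=64, seq_len=2048)
# The cache shrinks by exactly the KV-head ratio (12x here).
```

The head counts and sequence length above are placeholder values; the point is the linear dependence on `n_kv_heads`.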
Oct 19, 2025 · Artificial Intelligence

Deep Dive into nanochat: Source Code, Model Size Calculations, and Optimization Techniques

This article provides a thorough analysis of nanochat’s source code, detailing transformer component differences, precise parameter‑size formulas, FlashNorm and ReLU² innovations, scaling‑law insights, memory‑usage estimations, and the distributed optimizer and training pipelines used to build the model.

Distributed Training · LLM · memory estimation
20 min read
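The parameter‑size formulas the summary refers to follow a standard pattern for dense transformers. A rough back‑of‑the‑envelope version (my own sketch, not the exact formula from the article) counts the embedding matrix plus the attention and MLP projections in each layer:

```python
def transformer_params(n_layers, d_model, vocab_size, d_ff=None):
    """Approximate dense-transformer parameter count, ignoring biases,
    norm weights, and position embeddings (all comparatively tiny)."""
    d_ff = d_ff or 4 * d_model              # conventional 4x MLP expansion
    embed = vocab_size * d_model            # token embeddings (tied output head)
    attn = 4 * d_model * d_model            # Q, K, V, and output projections
    mlp = 2 * d_model * d_ff                # up- and down-projections
    return embed + n_layers * (attn + mlp)

# Sanity check against GPT-2 small (12 layers, d_model=768, vocab 50257):
print(transformer_params(12, 768, 50257))  # 123532032 -> ~123.5M, close to the real ~124M
```

The small gap versus GPT‑2's true count comes from the position embeddings and norm weights the sketch deliberately ignores.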
Oct 15, 2025 · Artificial Intelligence

NanoChat Source Code Deep Dive: Karpathy’s Full‑Stack LLM Pipeline Explained

This article dissects NanoChat’s end‑to‑end LLM pipeline—from a lightweight 561M‑parameter transformer and custom Rust BPE tokenizer to Chinchilla‑scaled training, multi‑task fine‑tuning, optional RL on GSM8K, KV‑cache inference optimizations, and benchmark results that slightly surpass GPT‑2 Large.

CORE benchmark · Chinchilla scaling · FastAPI
10 min read
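The Chinchilla‑scaled training mentioned in the summary uses the widely quoted rule of thumb of roughly 20 training tokens per model parameter. Applied to the 561M‑parameter model described above (a back‑of‑the‑envelope sketch, not NanoChat's actual token budget):

```python
def chinchilla_tokens(n_params, tokens_per_param=20):
    """Chinchilla rule of thumb: compute-optimal training uses roughly
    20 tokens per model parameter."""
    return tokens_per_param * n_params

# For a 561M-parameter model like NanoChat's:
print(chinchilla_tokens(561_000_000))  # 11220000000 -> ~11.2B tokens
```

The 20:1 ratio is an approximation of the compute‑optimal frontier, so real training runs routinely deviate from it in either direction.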