AI2ML AI to Machine Learning
Author: Shi Chunqi

Original articles on artificial intelligence, machine learning, and deep optimization. Less is more, life is simple!

48 articles
Recent Articles

Latest from AI2ML AI to Machine Learning (48 recent articles)
Dec 3, 2025 · Artificial Intelligence

2026 Forecast: How Large‑Model AI Will Evolve After 2025 Breakthroughs

The article reviews the major 2025 breakthroughs in multimodal, open‑source, and deployment technologies for large models and outlines four trends for 2026 that will shape the next wave of AI development: a split between consumer (ToC) and enterprise (ToB) services, dual‑hand data generation, advances in MoE routing, and AI4Science breakthroughs.

AI deployment · AI4Science · Mixture of Experts
6 min read
Nov 4, 2025 · Artificial Intelligence

Common Debugging Signals for Large Language Models

This article outlines the end‑to‑end workflow for large‑model training, highlights typical debugging challenges such as out‑of‑memory (OOM) errors, performance bottlenecks, and gradient issues, and provides concrete strategies, tools (DeepSpeed, Megatron, Torchtitan, veScale), and best‑practice checklists to help engineers diagnose and resolve problems efficiently.

DeepSpeed · LLM · Megatron
12 min read
Nov 3, 2025 · Artificial Intelligence

Smol Training Playbook: Secrets to Building World-Class LLMs

The article details the SmolLM3 3B‑parameter model: its architecture, dual‑mode inference, a three‑stage data‑curation strategy, rigorous ablation methods, preference optimization (APO/DPO), model merging, and practical training‑stability tricks, offering a comprehensive guide for building high‑performing large language models.

APO · LLM training · context scaling
13 min read
Oct 24, 2025 · Industry Insights

How Generative AI Is Fueling a New Wave of Insurance Fraud

Generative AI tools such as DALL·E, Midjourney, and deepfake platforms are enabling criminals to create highly realistic images, videos, and documents, driving a surge in sophisticated insurance fraud across the auto, property, health, and life lines of business and forcing insurers to overhaul their detection and regulatory practices.

AI detection · deepfake · generative AI
13 min read
Oct 24, 2025 · Artificial Intelligence

Beyond RAG: Three Emerging Knowledge‑Engineering Strategies (ICL, Online Learning, SLM)

The article outlines three post‑RAG knowledge‑engineering approaches—In‑Context Learning with dynamic few‑shot selection, Online Learning encompassing Meta‑Learning and Lifelong Learning to quickly adapt to new tasks, and the Small Language Model path that combines fine‑tuned task‑specific experts with LLM‑SLM collaboration for efficient, privacy‑preserving inference.

In-Context Learning · Knowledge Engineering · LLM
4 min read
Oct 23, 2025 · Artificial Intelligence

Why Visually‑Rich Document Understanding Looks Like High‑End Docs: A Static Multimodal Overview

The article surveys the evolution of Visually‑Rich Document Understanding (VRDU), highlighting pioneering Chinese OCR research, the LayoutLM family, recent multimodal model breakthroughs, open‑source toolkits, and practical recommendations for handling diverse document types and tasks.

LayoutLM · Multimodal OCR · Visually-Rich Document Understanding
11 min read
Oct 20, 2025 · Artificial Intelligence

nanochat Source Code Deep Dive: Data Prep, Model Design, Training & Evaluation

This article revisits nanochat's core components, detailing the preparation of diverse training datasets, the scaling calculations for tokens and parameters, the model's MQA and KV‑cache design, the full training pipeline with gradient accumulation and mixed‑precision training, a cost breakdown, inference optimizations, evaluation tasks, and identified limitations with suggested improvements.

LLM · MQA · PyTorch
9 min read
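The MQA and KV‑cache design mentioned in the summary comes down to a simple memory calculation. As an illustrative sketch (the function name and fp16 default are assumptions for exposition, not nanochat's actual code), cache size scales linearly with the number of KV heads, which is why multi‑query attention (a single shared KV head) shrinks it so dramatically:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2):
    """Size of the inference KV cache: one K and one V tensor per layer,
    stored in fp16 (2 bytes per element) by default."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Standard multi-head attention (12 KV heads) vs. MQA (1 shared KV head):
mha = kv_cache_bytes(n_layers=12, n_kv_heads=12, head_dim=64, seq_len=2048)
mqa = kv_cache_bytes(n_layers=12, n_kv_heads=1, head_dim=64, seq_len=2048)
# The cache shrinks by exactly the KV-head ratio (12x here).
```

The head counts and sequence length above are placeholder values; the point is the linear dependence on `n_kv_heads`.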
Oct 19, 2025 · Artificial Intelligence

Deep Dive into nanochat: Source Code, Model Size Calculations, and Optimization Techniques

This article provides a thorough analysis of nanochat’s source code, detailing transformer component differences, precise parameter‑size formulas, FlashNorm and ReLU² innovations, scaling‑law insights, memory‑usage estimations, and the distributed optimizer and training pipelines used to build the model.

Distributed Training · LLM · memory estimation
20 min read
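The parameter‑size formulas the summary refers to follow a standard pattern for dense transformers. A rough back‑of‑the‑envelope version (my own sketch, not the exact formula from the article) counts the embedding matrix plus the attention and MLP projections in each layer:

```python
def transformer_params(n_layers, d_model, vocab_size, d_ff=None):
    """Approximate dense-transformer parameter count, ignoring biases,
    norm weights, and position embeddings (all comparatively tiny)."""
    d_ff = d_ff or 4 * d_model              # conventional 4x MLP expansion
    embed = vocab_size * d_model            # token embeddings (tied output head)
    attn = 4 * d_model * d_model            # Q, K, V, and output projections
    mlp = 2 * d_model * d_ff                # up- and down-projections
    return embed + n_layers * (attn + mlp)

# Sanity check against GPT-2 small (12 layers, d_model=768, vocab 50257):
print(transformer_params(12, 768, 50257))  # 123532032 -> ~123.5M, close to the real ~124M
```

The small gap versus GPT‑2's true count comes from the position embeddings and norm weights the sketch deliberately ignores.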
Oct 15, 2025 · Artificial Intelligence

NanoChat Source Code Deep Dive: Karpathy’s Full‑Stack LLM Pipeline Explained

This article dissects NanoChat’s end‑to‑end LLM pipeline—from a lightweight 561M‑parameter transformer and custom Rust BPE tokenizer to Chinchilla‑scaled training, multi‑task fine‑tuning, optional RL on GSM8K, KV‑cache inference optimizations, and benchmark results that slightly surpass GPT‑2 Large.

CORE benchmark · Chinchilla scaling · FastAPI
10 min read
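The Chinchilla‑scaled training mentioned in the summary uses the widely quoted rule of thumb of roughly 20 training tokens per model parameter. Applied to the 561M‑parameter model described above (a back‑of‑the‑envelope sketch, not NanoChat's actual token budget):

```python
def chinchilla_tokens(n_params, tokens_per_param=20):
    """Chinchilla rule of thumb: compute-optimal training uses roughly
    20 tokens per model parameter."""
    return tokens_per_param * n_params

# For a 561M-parameter model like NanoChat's:
print(chinchilla_tokens(561_000_000))  # 11220000000 -> ~11.2B tokens
```

The 20:1 ratio is an approximation of the compute‑optimal frontier, so real training runs routinely deviate from it in either direction.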