AI2ML AI to Machine Learning

Original articles on artificial intelligence, machine learning, and deep optimization. Less is more; life is simple! Shi Chunqi

48 articles · 0 likes · 0 views · 0 comments
Recent Articles

48 recent articles
Feb 7, 2026 · Artificial Intelligence

Why the ‘Skills’ Approach Is the Third Major Compromise Shaping Enterprise AI in 2026

The article argues that embracing the Skills paradigm (a lightweight, low‑cost alternative to large‑scale model training) represents the third major compromise in the large‑model era, balancing reduced emergence and planning hallucinations against increased stability and engineering efficiency for enterprise AI deployments.

Agentic AI · Enterprise AI · Mixture of Experts
0 likes · 8 min read
Feb 4, 2026 · Artificial Intelligence

Google’s Second Sword: Accelerating LLM Inference with Speculative Decoding and Cascades

The article analyzes Google’s shift from scaling‑law to efficiency‑law, detailing how speculative decoding, language‑model cascades, distillation, CALM, accurate quantized training, and the Mixture‑of‑Recursions architecture together form a multi‑layered strategy to cut inference cost, boost throughput, and sustain the company’s AI moat.
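
As a concrete illustration of the cascade idea mentioned in this summary, the sketch below routes each query to a cheap model first and escalates to a large model only when the cheap model's confidence falls below a threshold. The stand-in models, the confidence heuristic, and the 0.8 threshold are illustrative assumptions, not Google's implementation.

```python
# Minimal language-model cascade sketch (illustrative only, not Google's implementation).
# A cheap model answers first; the query escalates to an expensive model only when the
# cheap model's self-reported confidence falls below a threshold.

import random

def small_model(prompt):
    # Stand-in for a small, cheap model: returns (answer, confidence in [0, 1]).
    confidence = random.uniform(0.3, 1.0)
    return f"small-model answer to {prompt!r}", confidence

def large_model(prompt):
    # Stand-in for a large, expensive model.
    return f"large-model answer to {prompt!r}"

def cascade(prompt, threshold=0.8):
    answer, confidence = small_model(prompt)
    if confidence >= threshold:
        return answer, "served by small model"
    return large_model(prompt), "escalated to large model"

if __name__ == "__main__":
    random.seed(0)
    for q in ["2 + 2 = ?", "Summarize the scaling-law debate."]:
        print(cascade(q))
```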

Google TPU · Inference Acceleration · Language Model Cascades
0 likes · 8 min read
Jan 9, 2026 · Industry Insights

Why 2026 Will Be the Year Insurance Tech Explodes

The article analyzes how the AI explosion, breakthroughs like DeepSeek R1, and successful case studies such as Lemonade and AIG’s Underwriter Assistant are driving a shift in insurance from scale expansion to risk‑focused, AI‑native transformation in 2026, outlining strategic frameworks, agile tribe structures, modular delivery, and risk‑tolerant innovation processes.

AI · agile · digital transformation
0 likes · 20 min read
Dec 29, 2025 · Artificial Intelligence

How Brin’s Return Powers Google’s First ‘Sword’: The TPU Hardware Revolution

The article examines Google’s AI resurgence after Sergey Brin’s comeback, detailing the evolution of TPU hardware from v1 to v7, the strategic focus on algorithmic efficiency, comparisons with Nvidia’s B200, the role of JAX/XLA, and how these advances create a powerful competitive moat for Google’s AI infrastructure.

AI hardware · Google TPU · Inference efficiency
0 likes · 8 min read
Dec 27, 2025 · Artificial Intelligence

Why Jeff Dean Champions Speculative Decoding: The Underlying Ideas

Jeff Dean highlighted speculative decoding as a lossless inference acceleration technique that can boost large language model throughput by 2–3×, and the article breaks down its core concepts—including parallel token verification, draft‑target model collaboration, rejection sampling theory, and practical optimizations such as continuous batching and tree‑based verification.
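
To make the draft-target collaboration concrete, here is a toy NumPy sketch of a single speculative decoding step over a four-token vocabulary. The draft and target distributions are made-up stand-ins, but the accept/reject rule, accepting each proposal with probability min(1, p_target/p_draft) and resampling the first rejected position from the normalized residual max(0, p_target - p_draft), is the standard lossless scheme the article discusses.

```python
# Toy speculative decoding sketch over a 4-token vocabulary (illustrative only).
# A cheap draft model proposes k tokens; the target model scores all of them in one
# parallel pass; proposals are accepted or resampled so the output distribution
# matches the target model exactly (hence "lossless").

import numpy as np

rng = np.random.default_rng(0)
VOCAB = 4

def draft_probs(context):
    # Stand-in draft model: a fixed, slightly "off" distribution (ignores context).
    return np.array([0.4, 0.3, 0.2, 0.1])

def target_probs(context):
    # Stand-in target model: the distribution we actually want to sample from.
    return np.array([0.25, 0.25, 0.25, 0.25])

def speculative_step(context, k=4):
    # 1) The draft model proposes k tokens autoregressively.
    proposals, q = [], []
    for _ in range(k):
        p = draft_probs(context + proposals)
        proposals.append(int(rng.choice(VOCAB, p=p)))
        q.append(p)
    # 2) The target model scores every proposed position (one parallel pass in practice).
    p_target = [target_probs(context + proposals[:i]) for i in range(k)]
    # 3) Accept each proposal with probability min(1, p_target / p_draft); on the first
    #    rejection, resample from the normalized residual max(0, p_target - p_draft).
    accepted = []
    for i, tok in enumerate(proposals):
        if rng.random() < min(1.0, p_target[i][tok] / q[i][tok]):
            accepted.append(tok)
        else:
            residual = np.maximum(p_target[i] - q[i], 0.0)
            accepted.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            break
    return accepted

print(speculative_step(context=[]))
```

Because the target model verifies all k proposals in a single forward pass, several tokens can be committed per expensive pass, which is where the 2–3× throughput gain comes from.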

Continuous Batching · Draft-Target Model · Inference Acceleration
0 likes · 8 min read
Dec 22, 2025 · Artificial Intelligence

The Core Ideas Behind Paged Attention for KV‑Caching

This article explains how Paged Attention, introduced by the vLLM team, applies virtual‑memory techniques, non‑contiguous block mapping, copy‑on‑write reuse, distributed scheduling, and hardware‑level optimizations to improve KV‑cache efficiency and reduce memory fragmentation in large language model serving.
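
For a feel of the bookkeeping involved, here is a minimal Python sketch in the spirit of Paged Attention rather than the actual vLLM code: a shared pool of fixed-size KV blocks, a per-sequence block table mapping logical to physical blocks, reference-counted prefix sharing, and copy-on-write before an in-place write (the data copy itself is omitted). The block size and pool size are arbitrary assumptions.

```python
# Minimal paged KV-cache bookkeeping sketch (illustrative only, not the vLLM implementation).
# The KV cache is split into fixed-size blocks from a shared pool; each sequence keeps a
# block table mapping its logical blocks to physical ones, so memory need not be contiguous,
# and sequences sharing a prefix can share physical blocks via reference counts.

BLOCK_SIZE = 16          # tokens per KV block (assumed)

class BlockManager:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))   # physical block ids still available
        self.refcount = {}                    # physical block id -> sequences using it

    def allocate(self):
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def fork(self, block_table):
        # Share all blocks of a parent sequence with a child (copy-on-write style).
        for b in block_table:
            self.refcount[b] += 1
        return list(block_table)

    def append_token(self, block_table, seq_len):
        # Allocate a new physical block only when the last one is full.
        if seq_len % BLOCK_SIZE == 0:
            block_table.append(self.allocate())
        last = block_table[-1]
        if self.refcount[last] > 1:           # copy-on-write: un-share before writing
            self.refcount[last] -= 1          # (copying the block's data is omitted here)
            block_table[-1] = self.allocate()
        return block_table

mgr = BlockManager(num_blocks=8)
table = []
for t in range(20):                           # 20 tokens -> 2 physical blocks
    table = mgr.append_token(table, t)
child = mgr.fork(table)                       # child shares both blocks until it writes
print(table, child, mgr.refcount)
```

Because blocks are allocated on demand and need not be contiguous, memory is wasted only inside the last partially filled block of each sequence, which is how fragmentation is kept low.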

Copy-on-Write · Distributed Scheduling · GPU Memory Management
0 likes · 6 min read
Dec 21, 2025 · Artificial Intelligence

Why KV Caching Is Critical for Efficient LLM Inference

The article breaks down the principles of KV caching in large language models, explaining how Q/K/V behavior differs between training and inference, the role of prompts, cache size trade‑offs, and the complexities of concurrent inference, all backed by concrete examples and references.
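
A back-of-the-envelope calculation makes the cache-size trade-off tangible: every layer stores one K and one V vector per KV head for each cached token. The sketch below applies that formula to Llama-2-7B-like shapes; the model dimensions, batch size, and sequence length are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope KV-cache size calculator (illustrative; all shapes are assumptions).
# Per cached token, every layer stores one K and one V vector per KV head, so:
#   bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len * batch

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len * batch

# Example with Llama-2-7B-like shapes (32 layers, 32 KV heads, head_dim 128, fp16):
gib = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                     seq_len=4096, batch=8, bytes_per_elem=2) / 2**30
print(f"{gib:.1f} GiB")   # ~16 GiB, which is why concurrent requests quickly exhaust GPU memory
```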

Concurrent Inference · LLM Inference · Memory Optimization
0 likes · 7 min read
Dec 19, 2025 · Artificial Intelligence

The 9 Key Ideas Behind FlashAttention

FlashAttention accelerates transformer inference by combining nine techniques—including loss‑less attention, GPU memory‑pyramid optimization, SRAM‑reusing tiling, safe softmax scaling, online buffering, tile‑size constraints, parallel multiplication, reduced KV slicing, and integrated backward‑pass caching—to achieve efficient, high‑throughput computation on modern GPUs.
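
The safe softmax scaling and online accumulation ideas can be shown in a few lines of NumPy: K and V are processed in tiles while a running max, running denominator, and running weighted sum are rescaled as each new tile arrives, reproducing the exact softmax(QK^T)V result without ever materializing the full score row. This is an illustrative single-query sketch of the online-softmax recurrence, not the fused FlashAttention kernel with SRAM tiling.

```python
# Tiled "online softmax" attention sketch in NumPy (illustrative of the FlashAttention idea,
# not the fused CUDA kernel). K/V are consumed in tiles; running statistics are rescaled
# whenever a new tile raises the maximum score, so the result is exact.

import numpy as np

def online_attention(q, K, V, tile=4):
    d = q.shape[-1]
    m = -np.inf                                   # running max of scores seen so far
    l = 0.0                                       # running softmax denominator
    acc = np.zeros_like(V[0], dtype=np.float64)   # running weighted sum of V rows
    for start in range(0, K.shape[0], tile):
        s = K[start:start + tile] @ q / np.sqrt(d)    # scores for this tile
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)                     # rescale old statistics
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[start:start + tile]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=(16,)), rng.normal(size=(32, 16)), rng.normal(size=(32, 16))

# Reference: materialize the full score row, then softmax, then weight V.
scores = K @ q / np.sqrt(16)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
print(np.allclose(online_attention(q, K, V), weights @ V))   # expected: True
```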

Attention Mechanism · FlashAttention · GPU Optimization
0 likes · 8 min read
Dec 16, 2025 · Industry Insights

Why Computer Science Majors Must Embrace a Massive Paradigm Shift

The article argues that traditional storage‑centric computer science curricula are becoming obsolete as AI‑driven, compute‑centric paradigms dominate hardware, data‑center operations, and software ecosystems, urging universities and students to rapidly adopt new teaching focus and skills.

AI hardware · CUDA · associative memory
0 likes · 10 min read