AI2ML AI to Machine Learning

Original articles on artificial intelligence, machine learning, and deep optimization. Less is more; life is simple! Shi Chunqi

48 articles · 0 likes · 0 views · 0 comments
Recent Articles

48 recent articles
Feb 7, 2026 · Artificial Intelligence

Why the ‘Skills’ Approach Is the Third Major Compromise Shaping Enterprise AI in 2026

The article argues that embracing the Skills paradigm (a lightweight, low‑cost alternative to large‑scale model training) represents the third major compromise in the large‑model era, balancing reduced emergence and planning hallucinations against increased stability and engineering efficiency for enterprise AI deployments.

Agentic AI · Enterprise AI · Mixture of Experts
0 likes · 8 min read
Feb 4, 2026 · Artificial Intelligence

Google’s Second Sword: Accelerating LLM Inference with Speculative Decoding and Cascades

The article analyzes Google’s shift from scaling‑law to efficiency‑law, detailing how speculative decoding, language‑model cascades, distillation, CALM, accurate quantized training, and the Mixture‑of‑Recursions architecture together form a multi‑layered strategy to cut inference cost, boost throughput, and sustain the company’s AI moat.
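
As a concrete illustration of the cascade idea mentioned in this summary, the sketch below routes each query to a cheap model first and escalates to a large model only when the cheap model's confidence falls below a threshold. The stand-in models, the confidence heuristic, and the 0.8 threshold are illustrative assumptions, not Google's implementation.

```python
# Minimal language-model cascade sketch (illustrative only, not Google's implementation).
# A cheap model answers first; the query escalates to an expensive model only when the
# cheap model's self-reported confidence falls below a threshold.

import random

def small_model(prompt):
    # Stand-in for a small, cheap model: returns (answer, confidence in [0, 1]).
    confidence = random.uniform(0.3, 1.0)
    return f"small-model answer to {prompt!r}", confidence

def large_model(prompt):
    # Stand-in for a large, expensive model.
    return f"large-model answer to {prompt!r}"

def cascade(prompt, threshold=0.8):
    answer, confidence = small_model(prompt)
    if confidence >= threshold:
        return answer, "served by small model"
    return large_model(prompt), "escalated to large model"

if __name__ == "__main__":
    random.seed(0)
    for q in ["2 + 2 = ?", "Summarize the scaling-law debate."]:
        print(cascade(q))
```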

Google TPU · Inference Acceleration · Language Model Cascades
0 likes · 8 min read
Jan 9, 2026 · Industry Insights

Why 2026 Will Be the Year Insurance Tech Explodes

The article analyzes how the AI explosion, breakthroughs like DeepSeek R1, and successful case studies such as Lemonade and AIG’s Underwriter Assistant are driving a shift in insurance from scale expansion to risk‑focused, AI‑native transformation in 2026, outlining strategic frameworks, agile tribe structures, modular delivery, and risk‑tolerant innovation processes.

AI · agile · digital transformation
0 likes · 20 min read
Dec 29, 2025 · Artificial Intelligence

How Brin’s Return Powers Google’s First ‘Sword’: The TPU Hardware Revolution

The article examines Google’s AI resurgence after Sergey Brin’s comeback, detailing the evolution of TPU hardware from v1 to v7, the strategic focus on algorithmic efficiency, comparisons with Nvidia’s B200, the role of JAX/XLA, and how these advances create a powerful competitive moat for Google’s AI infrastructure.

AI hardware · Google TPU · Inference efficiency
0 likes · 8 min read
Dec 27, 2025 · Artificial Intelligence

Why Jeff Dean Champions Speculative Decoding: The Underlying Ideas

Jeff Dean highlighted speculative decoding as a lossless inference acceleration technique that can boost large language model throughput by 2–3×, and the article breaks down its core concepts—including parallel token verification, draft‑target model collaboration, rejection sampling theory, and practical optimizations such as continuous batching and tree‑based verification.
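
To make the draft-target collaboration concrete, here is a toy NumPy sketch of a single speculative decoding step over a four-token vocabulary. The draft and target distributions are made-up stand-ins, but the accept/reject rule, accepting each proposal with probability min(1, p_target/p_draft) and resampling the first rejected position from the normalized residual max(0, p_target - p_draft), is the standard lossless scheme the article discusses.

```python
# Toy speculative decoding sketch over a 4-token vocabulary (illustrative only).
# A cheap draft model proposes k tokens; the target model scores all of them in one
# parallel pass; proposals are accepted or resampled so the output distribution
# matches the target model exactly (hence "lossless").

import numpy as np

rng = np.random.default_rng(0)
VOCAB = 4

def draft_probs(context):
    # Stand-in draft model: a fixed, slightly "off" distribution (ignores context).
    return np.array([0.4, 0.3, 0.2, 0.1])

def target_probs(context):
    # Stand-in target model: the distribution we actually want to sample from.
    return np.array([0.25, 0.25, 0.25, 0.25])

def speculative_step(context, k=4):
    # 1) The draft model proposes k tokens autoregressively.
    proposals, q = [], []
    for _ in range(k):
        p = draft_probs(context + proposals)
        proposals.append(int(rng.choice(VOCAB, p=p)))
        q.append(p)
    # 2) The target model scores every proposed position (one parallel pass in practice).
    p_target = [target_probs(context + proposals[:i]) for i in range(k)]
    # 3) Accept each proposal with probability min(1, p_target / p_draft); on the first
    #    rejection, resample from the normalized residual max(0, p_target - p_draft).
    accepted = []
    for i, tok in enumerate(proposals):
        if rng.random() < min(1.0, p_target[i][tok] / q[i][tok]):
            accepted.append(tok)
        else:
            residual = np.maximum(p_target[i] - q[i], 0.0)
            accepted.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            break
    return accepted

print(speculative_step(context=[]))
```

Because the target model verifies all k proposals in a single forward pass, several tokens can be committed per expensive pass, which is where the 2–3× throughput gain comes from.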

Continuous Batching · Draft-Target Model · Inference Acceleration
0 likes · 8 min read
Dec 22, 2025 · Artificial Intelligence

The Core Ideas Behind Paged Attention for KV‑Caching

This article explains how Paged Attention, introduced by the vLLM team, applies virtual‑memory techniques, non‑contiguous block mapping, copy‑on‑write reuse, distributed scheduling, and hardware‑level optimizations to improve KV‑cache efficiency and reduce memory fragmentation in large language model serving.
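
For a feel of the bookkeeping involved, here is a minimal Python sketch in the spirit of Paged Attention rather than the actual vLLM code: a shared pool of fixed-size KV blocks, a per-sequence block table mapping logical to physical blocks, reference-counted prefix sharing, and copy-on-write before an in-place write (the data copy itself is omitted). The block size and pool size are arbitrary assumptions.

```python
# Minimal paged KV-cache bookkeeping sketch (illustrative only, not the vLLM implementation).
# The KV cache is split into fixed-size blocks from a shared pool; each sequence keeps a
# block table mapping its logical blocks to physical ones, so memory need not be contiguous,
# and sequences sharing a prefix can share physical blocks via reference counts.

BLOCK_SIZE = 16          # tokens per KV block (assumed)

class BlockManager:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))   # physical block ids still available
        self.refcount = {}                    # physical block id -> sequences using it

    def allocate(self):
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def fork(self, block_table):
        # Share all blocks of a parent sequence with a child (copy-on-write style).
        for b in block_table:
            self.refcount[b] += 1
        return list(block_table)

    def append_token(self, block_table, seq_len):
        # Allocate a new physical block only when the last one is full.
        if seq_len % BLOCK_SIZE == 0:
            block_table.append(self.allocate())
        last = block_table[-1]
        if self.refcount[last] > 1:           # copy-on-write: un-share before writing
            self.refcount[last] -= 1          # (copying the block's data is omitted here)
            block_table[-1] = self.allocate()
        return block_table

mgr = BlockManager(num_blocks=8)
table = []
for t in range(20):                           # 20 tokens -> 2 physical blocks
    table = mgr.append_token(table, t)
child = mgr.fork(table)                       # child shares both blocks until it writes
print(table, child, mgr.refcount)
```

Because blocks are allocated on demand and need not be contiguous, memory is wasted only inside the last partially filled block of each sequence, which is how fragmentation is kept low.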

Copy-on-Write · Distributed Scheduling · GPU Memory Management
0 likes · 6 min read
Dec 21, 2025 · Artificial Intelligence

Why KV Caching Is Critical for Efficient LLM Inference

The article breaks down the principles of KV caching in large language models, explaining how Q/K/V behavior differs between training and inference, the role of prompts, cache size trade‑offs, and the complexities of concurrent inference, all backed by concrete examples and references.
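
A back-of-the-envelope calculation makes the cache-size trade-off tangible: every layer stores one K and one V vector per KV head for each cached token. The sketch below applies that formula to Llama-2-7B-like shapes; the model dimensions, batch size, and sequence length are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope KV-cache size calculator (illustrative; all shapes are assumptions).
# Per cached token, every layer stores one K and one V vector per KV head, so:
#   bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len * batch

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len * batch

# Example with Llama-2-7B-like shapes (32 layers, 32 KV heads, head_dim 128, fp16):
gib = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                     seq_len=4096, batch=8, bytes_per_elem=2) / 2**30
print(f"{gib:.1f} GiB")   # ~16 GiB, which is why concurrent requests quickly exhaust GPU memory
```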

Concurrent Inference · LLM Inference · Memory Optimization
0 likes · 7 min read
Dec 19, 2025 · Artificial Intelligence

The 9 Key Ideas Behind FlashAttention

FlashAttention accelerates transformer inference by combining nine techniques—including loss‑less attention, GPU memory‑pyramid optimization, SRAM‑reusing tiling, safe softmax scaling, online buffering, tile‑size constraints, parallel multiplication, reduced KV slicing, and integrated backward‑pass caching—to achieve efficient, high‑throughput computation on modern GPUs.
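
The safe softmax scaling and online accumulation ideas can be shown in a few lines of NumPy: K and V are processed in tiles while a running max, running denominator, and running weighted sum are rescaled as each new tile arrives, reproducing the exact softmax(QK^T)V result without ever materializing the full score row. This is an illustrative single-query sketch of the online-softmax recurrence, not the fused FlashAttention kernel with SRAM tiling.

```python
# Tiled "online softmax" attention sketch in NumPy (illustrative of the FlashAttention idea,
# not the fused CUDA kernel). K/V are consumed in tiles; running statistics are rescaled
# whenever a new tile raises the maximum score, so the result is exact.

import numpy as np

def online_attention(q, K, V, tile=4):
    d = q.shape[-1]
    m = -np.inf                                   # running max of scores seen so far
    l = 0.0                                       # running softmax denominator
    acc = np.zeros_like(V[0], dtype=np.float64)   # running weighted sum of V rows
    for start in range(0, K.shape[0], tile):
        s = K[start:start + tile] @ q / np.sqrt(d)    # scores for this tile
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)                     # rescale old statistics
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[start:start + tile]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=(16,)), rng.normal(size=(32, 16)), rng.normal(size=(32, 16))

# Reference: materialize the full score row, then softmax, then weight V.
scores = K @ q / np.sqrt(16)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
print(np.allclose(online_attention(q, K, V), weights @ V))   # expected: True
```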

Attention Mechanism · FlashAttention · GPU Optimization
0 likes · 8 min read
Dec 16, 2025 · Industry Insights

Why Computer Science Majors Must Embrace a Massive Paradigm Shift

The article argues that traditional storage‑centric computer science curricula are becoming obsolete as AI‑driven, compute‑centric paradigms dominate hardware, data‑center operations, and software ecosystems, urging universities and students to rapidly adopt new teaching focus and skills.

AI hardware · CUDA · associative memory
0 likes · 10 min read