vLLM, llama.cpp, and MLX Embrace Google’s TurboQuant: 8× Memory Savings for Local LLMs
The article reviews how the leading LLM inference frameworks (MLX, mlx-vlm, llama.cpp, and vLLM) are integrating Google's TurboQuant compression, covering the reported up-to-79% KV-cache memory reduction, near-full-precision decoding speed, and detailed integration steps for each project.
