Old Zhang's AI Learning
Author

AI practitioner covering large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends. Publishes original technical articles daily.

227 Articles · 0 Likes · 687 Views · 0 Comments
Recent Articles

Latest from Old Zhang's AI Learning

100 recent articles max
Old Zhang's AI Learning
Jun 11, 2026 · Artificial Intelligence

Google’s 26B DiffusionGemma Model Delivers 1000+ Tokens/s – Runs on a 4090

DiffusionGemma, Google DeepMind’s 26B MoE model, generates text in 256‑token blocks via diffusion and exceeds 1000 tokens per second on H100/H200 GPUs. It ships FP8 and NVFP4 quantized versions with near‑lossless accuracy and can be deployed locally with vLLM Docker images, at the cost of higher first‑token latency and limited concurrency.

26B model · DiffusionGemma · FP8 quantization
0 likes · 10 min read
Old Zhang's AI Learning
Jun 11, 2026 · Artificial Intelligence

Distilling Claude Opus: Qwen 9B Coding Model Runs on Consumer GPUs – Real‑World Benchmarks

Qwopus3.5‑9B‑Coder is fine‑tuned for agentic coding, tool calling, and logical reasoning. It ships in three formats (Safetensors, GGUF, GGUF+MTP), runs on a 16 GB Mac mini via LM‑Studio, and gains up to 35% throughput with MTP. It scores 85 on HermesAgent‑20, 100 on ToolCall‑15, and 53.89% on SWE‑bench, and matches Claude Opus 4.6 in a 31‑tool adversarial test. The article also covers its training tricks and current limitations.

Agentic Coding · LLM Benchmark · Qwen
0 likes · 11 min read
Old Zhang's AI Learning
Jun 10, 2026 · Artificial Intelligence

Testing Anthropic’s Claude Fable 5: Two Queries Cost 90 CNY

The author evaluates Anthropic’s newly released Claude Fable 5 by running a fireworks‑generation prompt and a knowledge‑collection task, compares it with Qwen3.7‑Max, details token limits, safety switches, and total expenses of roughly $10 (≈90 CNY), concluding that price outweighs its raw capability.

Anthropic · Claude Fable 5 · LLM evaluation
0 likes · 4 min read
Old Zhang's AI Learning
Jun 10, 2026 · Artificial Intelligence

Anthropic’s Claude Fable 5 and Mythos 5: Twin Models with a Shockingly Low Price and New Safety Switches

Anthropic released Claude Fable 5 and Mythos 5 as twin large‑language‑model variants that share the same base and differ only in their safety‑classifier settings. Both offer a 1 M‑token context, 128 k‑token output, a halved price, and a three‑layer real‑time safety system that routes risky requests to Claude Opus 4.8.

AI safety · Anthropic · Claude Fable 5
0 likes · 12 min read
Old Zhang's AI Learning
Jun 9, 2026 · Product Management

Open-Source AI Skills Marketplace for PMs: 68 Skills, 9 Plugins Cover the Full Lifecycle

PM Skills is an open‑source AI‑driven marketplace that encodes classic product‑management methodologies into 68 reusable Skills and bundles them into 9 Plugins. It adds new commands like /red-team-prd and an AI Shipping Kit to automate the entire product‑manager workflow from discovery to release. The article includes concrete examples, installation guides, and a balanced look at strengths and limitations.

AI · Claude · Open-source
0 likes · 14 min read
Old Zhang's AI Learning
Jun 9, 2026 · Artificial Intelligence

Open-Source ASR That Runs Faster on CPU Than Whisper on GPU

FunASR is an industrial‑grade, open‑source speech‑recognition toolkit that combines VAD, transcription, punctuation, speaker diarization and emotion detection in one call, achieving up to 170× real‑time on GPU and 17× on CPU, outperforming Whisper while supporting 50+ languages and offering OpenAI‑compatible APIs.

ASR · CPU performance · FunASR
0 likes · 13 min read
Old Zhang's AI Learning
Jun 8, 2026 · Artificial Intelligence

Open‑Source AI Digital Employees for SMBs: One‑Line Install for Claude Code/Codex

The article introduces a set of open‑source AI Agent Skills that turn Claude Code, Qoder, Cursor and similar tools into digital employees for small‑to‑medium businesses, showing how to install them with a single command, configure them for tasks like email handling, PDF parsing, markdown‑to‑Excel conversion, and Excel editing, and combine them into fully automated workflows.

AI Agent · Claude Code · Open-source
0 likes · 17 min read
Old Zhang's AI Learning
Jun 7, 2026 · Artificial Intelligence

Hands‑On LLM Local Deployment: vLLM Inference Optimizations Explained

The article explains why LLM inference is memory‑bound, introduces vLLM’s three core optimizations—Continuous Batching, PagedAttention, and Prefix Caching—shows how to launch a vLLM server, run Python code to benchmark performance, and examines KV‑Cache memory usage with concrete numbers.

Continuous Batching · KV cache · LLM inference
0 likes · 11 min read
Old Zhang's AI Learning
Jun 6, 2026 · Artificial Intelligence

How to Build a Personal Knowledge Base with My Custom web‑pack Skill

This article explains how to construct a personal knowledge base using the author’s open‑source web‑pack Skill, which automates raw material collection, image localization, link expansion, and structured output, addressing the limitations of Obsidian’s Web Clipper and aligning with Karpathy’s LLM Wiki three‑layer architecture.

AI Agents · Knowledge Management · LLM
0 likes · 9 min read