Old Zhang's AI Learning
Author

AI practitioner focused on large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, publishing original technical articles daily.

141 Articles · 0 Likes · 3 Views · 0 Comments
Recent Articles

Latest from Old Zhang's AI Learning

Apr 14, 2026 · Artificial Intelligence

Qwen3.5-27B-DFlash Delivers Up to 5× Faster Inference Without Quality Loss

The DFlash approach replaces speculative decoding’s autoregressive drafter with a block diffusion model and injects target‑model hidden features into every KV‑cache layer, achieving up to a 5× speed‑up for Qwen3.5‑27B on a single GPU and 1.5–1.9× under high‑concurrency workloads while preserving output quality.

DFlash · Inference Acceleration · Qwen3.5
0 likes · 12 min read
Apr 13, 2026 · Artificial Intelligence

How Harness Engineering Makes or Breaks AI Agents – Lessons from Hsu’s 2026 Lecture

The article explains Harness Engineering, the set of tools that shape an AI agent's cognitive framework, capability boundaries, and behavior flow, and uses concrete examples, benchmarks, and research citations to show how a well‑designed harness can turn a modest model into a high‑performing agent while a poor one causes failures.

AI Agent · Context Engineering · Harness Engineering
0 likes · 12 min read
Apr 13, 2026 · Artificial Intelligence

Fine‑Tune Any Large Model on Apple Silicon with mlx‑tune

The article introduces mlx‑tune, a community project that wraps the MLX library with Unsloth's API to enable local fine‑tuning of large language, vision, TTS, STT, OCR, and embedding models on Apple Silicon Macs; it outlines the prototype‑to‑cloud workflow, provides installation steps and code examples, and discusses the project's capabilities and limitations.

Apple Silicon · Unsloth API · large language models
0 likes · 9 min read
Apr 12, 2026 · Artificial Intelligence

How to Deploy MiniMax-M2.7 Quantized Models Locally on macOS and Linux

This guide explains the 22 GGUF quantized versions of MiniMax-M2.7 released by Unsloth, compares their accuracy and size, recommends the UD‑Q4_K_XL build for the best quality‑to‑size trade‑off, and gives step‑by‑step instructions for local deployment via Unsloth Studio, llama.cpp, an API server, or the native MLX path, along with important pitfalls and performance‑tuning tips (a minimal loading sketch follows this entry).

Dynamic 2.0 · Local Deployment · MLX
0 likes · 14 min read
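
As a rough companion to the entry above (not code from the guide itself), here is a minimal sketch of loading a local GGUF quant with the llama-cpp-python bindings; the file name, context size, and GPU-layer setting are placeholders rather than the guide's recommended configuration.

```python
# Minimal sketch: loading a local GGUF quant with llama-cpp-python.
# The model path and settings are placeholders, not the guide's
# recommended configuration for MiniMax-M2.7 UD-Q4_K_XL.
from llama_cpp import Llama

llm = Llama(
    model_path="./MiniMax-M2.7-UD-Q4_K_XL.gguf",  # placeholder local file
    n_ctx=8192,        # context window; adjust to available memory
    n_gpu_layers=-1,   # offload all layers to GPU/Metal if possible
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize KV caching in one sentence."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

Swapping in a different quant only means pointing `model_path` at another file; memory use scales with the quant size and `n_ctx`.
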
Apr 12, 2026 · Artificial Intelligence

Deploy the Open‑Source MiniMax‑M2.7 Model Locally: Step‑by‑Step Guide

MiniMax‑M2.7, the newly open‑sourced 230‑billion‑parameter MoE model, offers self‑evolution along with professional software‑engineering and agent capabilities, and can be deployed locally with Ollama, vLLM, SGLang, or Docker on 4–8 H200 GPUs; the article details the hardware requirements, performance gains, and tool‑calling/Thinking features (a short client‑side query sketch follows this entry).

GPU · LLM · MiniMax M2.7
0 likes · 11 min read
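
Not taken from the article, but as a quick illustration of what querying such a local deployment tends to look like: servers like vLLM and SGLang expose an OpenAI-compatible endpoint, so a standard openai client call works. The base URL, API key, and model name below are placeholders.

```python
# Minimal sketch: querying a locally served model through an
# OpenAI-compatible endpoint (as exposed by vLLM or SGLang).
# base_url, api_key, and model name are placeholders, not values
# prescribed by the article.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="MiniMax-M2.7",  # must match the name the local server registered
    messages=[{"role": "user", "content": "Write a haiku about MoE routing."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```
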
Apr 11, 2026 · Artificial Intelligence

Mastering SGLang: KV Cache and RadixAttention for Faster LLM Inference

This article reviews the DeepLearning.ai short course on SGLang, explains why large‑language‑model inference is slow, details how the KV cache cuts generation‑time computation from O(n²) to O(n), introduces RadixAttention for cross‑request cache reuse, and presents code examples and benchmark results showing up to a 10× speedup in real‑world RAG scenarios (a conceptual KV‑cache sketch follows this entry).

KV cache · LLM inference · Performance optimization
0 likes · 13 min read
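
A conceptual sketch, not SGLang code and not taken from the course, of the KV-cache idea the article covers: each decoding step appends one new key/value row to a cache instead of recomputing keys and values for the whole prefix, which is where the reduction in recomputation from O(n²) to O(n) comes from. Dimensions and weights are illustrative.

```python
# Conceptual KV-cache sketch: append one K/V row per decoding step
# instead of recomputing keys/values for the entire prefix.
import numpy as np

d = 64                                     # head dimension (illustrative)
rng = np.random.default_rng(0)
W_k, W_v = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def attend(q, K, V):
    """Attention for a single query over cached keys/values."""
    scores = q @ K.T / np.sqrt(d)          # one dot product per cached token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

K_cache = np.empty((0, d))                 # grows by one row per generated token
V_cache = np.empty((0, d))

for step in range(8):                      # toy decode loop
    x = rng.standard_normal(d)             # hidden state of the newest token
    K_cache = np.vstack([K_cache, x @ W_k])  # append instead of recomputing
    V_cache = np.vstack([V_cache, x @ W_v])  # all previous keys/values
    out = attend(x, K_cache, V_cache)
```
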
Apr 10, 2026 · Artificial Intelligence

How a 9B‑Parameter Qwen3.5 Model Achieves Fully Automated Data Analysis on a Consumer GPU

The open‑source CoPaw‑Flash‑9B‑DataAnalyst‑LoRA model, fine‑tuned via LoRA, can autonomously load, explore, statistically analyze, visualize, and generate structured reports from CSV/Excel/JSON datasets, achieving a 90% success rate at an average of 26 iterations, and it runs on a single consumer‑grade GPU using vLLM and the Data Analyst framework.

Agent · Data Analyst · GPU
0 likes · 10 min read
Apr 9, 2026 · Artificial Intelligence

2026: The Real Turning Point for AI Coding Agents – Harness Explained

In 2026, the decisive factor for AI coding agents shifts from model size to the quality of their harness: experiments show that redesigning the edit tool can boost success rates tenfold, while a growing open‑source harness ecosystem and Anthropic's managed agents illustrate the emerging competitive landscape.

AI agents · Harness · benchmark
0 likes · 17 min read
Apr 7, 2026 · Industry Insights

Why OpenAI Has Lost Its Mission: A Deep Dive into Recent Decisions

The article analyzes OpenAI's recent strategic shifts, contrasting its declining product focus and safety commitments with Anthropic's focused growth, using revenue data, internal memos, and industry reports to argue that the company is now driven by deal‑making rather than its original AI mission.

AI · Anthropic · OpenAI
0 likes · 16 min read