Old Zhang's AI Learning
Author


AI practitioner focused on large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, publishing original technical articles daily.

141 Articles · 0 Likes · 3 Views · 0 Comments
Recent Articles

Latest from Old Zhang's AI Learning

Apr 21, 2026 · Artificial Intelligence

Prefill-as-a-Service Boosts LLM Inference Throughput by 54%

A joint Moonshot AI and Tsinghua study shows that the Prefill-as-a-Service (PrfaaS) architecture, enabled by hybrid‑attention models that shrink KVCache size, can offload long Prefill work to a remote cluster and, with dual‑timescale scheduling, achieve a 54% throughput gain over homogeneous PD deployment and 32% over naive heterogeneous setups.

Distributed inference · Hybrid attention · KVCache optimization
0 likes · 12 min read
Apr 21, 2026 · Artificial Intelligence

Is DeepSeek V4 Really Launching Next Week? Inside Its Core Architecture

Analyzing the credibility of Yifan Zhang’s brief “V4, next week” tweet, the article examines five supporting signals, details three newly revealed architecture components—Sparse MQA, Fused MoE Mega Kernel, and Manifold‑Constrained Hyper‑Connections—and summarizes V4’s rumored specifications, pricing, and strategic implications.

AI Architecture · DeepSeek · Fused MoE
0 likes · 7 min read
Apr 21, 2026 · Artificial Intelligence

GitHub Copilot Pro+ Changes Reveal Aggressive Pricing Tactics

The article analyzes GitHub's recent Copilot Pro+ policy shift—pausing new registrations, tightening usage caps, and replacing Opus 4.6 with a less capable 4.7 model—highlighting how the timing, reduced model quality, and steep consumption multipliers sparked user outrage.

AI Coding Assistant · Claude Opus · GitHub Copilot
0 likes · 5 min read
Apr 20, 2026 · Artificial Intelligence

Three New Ways Anthropic Leverages Claude Opus 4.7: Code Guide, Desktop Buddy, and Design Tool

Anthropic's Opus 4.7 upgrade brings a best‑practice Claude Code guide with a new default xhigh effort level and adaptive thinking, an open‑source desktop pet for BLE‑enabled hardware interaction, and Claude Design—a React‑powered AI design suite that streamlines UI prototyping, wireframing, and marketing asset creation.

AI design · Claude · Claude Code
0 likes · 13 min read
Apr 20, 2026 · Artificial Intelligence

Kimi K2.6: The Most Powerful Open-Source Agent Model – Architecture, Benchmarks, and Deployment Guide

Kimi K2.6, an open-source 1-trillion-parameter MoE model, expands Agent capabilities with 256K context, multimodal inputs, and the ability to coordinate 300 sub-Agents over 4,000 steps, achieving top scores on benchmarks like Terminal-Bench 2.0, SWE-Bench Pro, and BrowseComp, while offering flexible deployment via vLLM, SGLang, and KTransformers.

Agent Model · KTransformers · Kimi K2.6
0 likes · 11 min read
Apr 19, 2026 · Artificial Intelligence

Qwen3.6-35B: 4‑bit Quantization, DFlash Speedup, Claude Opus Distillation

The article reviews three optimization paths for the Qwen3.6‑35B model—4‑bit AWQ quantization variants, the DFlash speculative‑decoding accelerator, and a Claude Opus‑based distillation—detailing their implementation steps and benchmark results, with guidance on choosing the best version for different hardware and performance needs.

AI · DFlash · Distillation
0 likes · 11 min read
Apr 19, 2026 · Artificial Intelligence

From Zero to Deployment: A Complete Qwen3.5 Fine‑Tuning Guide

This guide shows how to fine‑tune Qwen3.5 models—from 0.8B to 122B—using Unsloth Studio or pure code, covering text SFT, vision fine‑tuning, MoE models, reinforcement learning (GRPO), extensive GGUF quantization benchmarks, hardware requirements, export formats, and deployment tips.

Fine-tuning · LLM · Qwen3.5
0 likes · 12 min read