Author

Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.

141

Articles

Latest from Old Zhang's AI Learning

100 recent articles max

Old Zhang's AI Learning

Apr 22, 2026 · Artificial Intelligence

Testing NVIDIA‑Accelerated Qwen3.6‑35B on Dual RTX 4090: Real‑World Performance

This article evaluates the Red Hat‑produced NVFP4‑quantized Qwen3.6‑35B model deployed with vLLM inside Docker on a dual‑RTX 4090 server, presenting accuracy gains, memory usage, initialization times, GPU compatibility notes, and practical deployment recommendations.

Docker · NVFP4 · Quantization

0 likes · 8 min read

Old Zhang's AI Learning

Apr 21, 2026 · Artificial Intelligence

Prefill-as-a-Service Boosts LLM Inference Throughput by 54%

A joint Moonshot AI and Tsinghua study shows that the Prefill-as-a-Service (PrfaaS) architecture, enabled by hybrid‑attention models that shrink KVCache size, can offload long Prefill work to a remote cluster and, with dual‑timescale scheduling, achieve a 54% throughput gain over homogeneous PD deployment and 32% over naive heterogeneous setups.

Distributed inference · Hybrid attention · KVCache optimization

0 likes · 12 min read

Old Zhang's AI Learning

Apr 21, 2026 · Artificial Intelligence

Is DeepSeek V4 Really Launching Next Week? Inside Its Core Architecture

Analyzing the credibility of Yifan Zhang’s brief “V4, next week” tweet, the article examines five supporting signals, details three newly revealed architecture components—Sparse MQA, Fused MoE Mega Kernel, and Manifold‑Constrained Hyper‑Connections—and summarizes V4’s rumored specifications, pricing, and strategic implications.

AI Architecture · DeepSeek · Fused MoE

0 likes · 7 min read

Old Zhang's AI Learning

Apr 21, 2026 · Artificial Intelligence

GitHub Copilot Pro+ Changes Reveal Aggressive Pricing Tactics

The article analyzes GitHub's recent Copilot Pro+ policy shift—pausing new registrations, tightening usage caps, and dropping Opus 4.6 for a less capable 4.7 model—highlighting how timing, reduced model quality, and steep consumption multipliers sparked user outrage.

AI Coding Assistant · Claude Opus · GitHub Copilot

0 likes · 5 min read

Old Zhang's AI Learning

Apr 20, 2026 · Artificial Intelligence

Three New Ways Anthropic Leverages Claude Opus 4.7: Code Guide, Desktop Buddy, and Design Tool

Anthropic's Opus 4.7 upgrade brings a best‑practice Claude Code guide with a new default xhigh effort level and adaptive thinking, an open‑source desktop pet for BLE‑enabled hardware interaction, and Claude Design—a React‑powered AI design suite that streamlines UI prototyping, wireframing, and marketing asset creation.

AI design · Claude · Claude Code

0 likes · 13 min read

Old Zhang's AI Learning

Apr 20, 2026 · Artificial Intelligence

Kimi K2.6: The Most Powerful Open-Source Agent Model – Architecture, Benchmarks, and Deployment Guide

Kimi K2.6, an open-source 1-trillion-parameter MoE model, expands Agent capabilities with 256K context, multimodal inputs, and the ability to coordinate 300 sub-Agents over 4,000 steps, achieving top scores on benchmarks like Terminal-Bench 2.0, SWE-Bench Pro, and BrowseComp, while offering flexible deployment via vLLM, SGLang, and KTransformers.

Agent Model · KTransformers · Kimi K2.6

0 likes · 11 min read

Old Zhang's AI Learning

Apr 20, 2026 · Artificial Intelligence

Generate Professional Architecture Diagrams with One Sentence Using Claude Skill

The open‑source Architecture Diagram Generator, a Claude Skill from Cocoon AI, lets you describe your system in natural language—English or Chinese—and instantly produces a dark‑theme, self‑contained HTML diagram, with no design skills, JavaScript, or Mermaid required.

AI · Architecture Diagram · Claude

0 likes · 6 min read

Old Zhang's AI Learning

Apr 20, 2026 · Artificial Intelligence

Qwen3.6-35B Quantized Model on vLLM: Local Deployment and Performance Benchmark

The article details how to deploy the 4‑bit quantized Qwen3.6-35B model with vLLM 0.17 (and the 0.19.1 patch) in a Docker container, compares its memory usage and token‑generation speed with those of Qwen3.5‑35B, and shares practical scripts along with an observed throughput of roughly 150 tokens per second.

Docker · LLM deployment · Performance Benchmark

0 likes · 5 min read

Old Zhang's AI Learning

Apr 19, 2026 · Artificial Intelligence

Qwen3.6-35B: 4‑bit Quantization, DFlash Speedup, Claude Opus Distillation

The article reviews three optimization paths for the Qwen3.6‑35B model: 4‑bit AWQ quantization variants, the DFlash speculative decoding accelerator, and a Claude Opus‑based distillation. It details their implementation steps and benchmark results, and offers guidance on selecting the best version for different hardware and performance needs.

AI · DFlash · Distillation

0 likes · 11 min read

Old Zhang's AI Learning

Apr 19, 2026 · Artificial Intelligence

From Zero to Deployment: A Complete Qwen3.5 Fine‑Tuning Guide

This guide shows how to fine‑tune Qwen3.5 models—from 0.8B to 122B—using Unsloth Studio or pure code, covering text SFT, vision fine‑tuning, MoE models, reinforcement‑learning (GRPO), extensive GGUF quantization benchmarks, hardware requirements, export formats, and deployment tips.

Fine-tuning · LLM · Qwen3.5

0 likes · 12 min read
