Old Zhang's AI Learning
Author

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.

229 Articles · 0 Likes · 711 Views · 0 Comments
Recent Articles

Latest from Old Zhang's AI Learning

Old Zhang's AI Learning
Jun 7, 2026 · Artificial Intelligence

Hands‑On LLM Local Deployment: vLLM Inference Optimizations Explained

The article explains why LLM inference is memory‑bound, introduces vLLM’s three core optimizations—Continuous Batching, PagedAttention, and Prefix Caching—shows how to launch a vLLM server, run Python code to benchmark performance, and examines KV‑Cache memory usage with concrete numbers.

Continuous Batching · KV cache · LLM inference
0 likes · 11 min read
Old Zhang's AI Learning
Jun 6, 2026 · Artificial Intelligence

How to Build a Personal Knowledge Base with My Custom web‑pack Skill

This article explains how to construct a personal knowledge base using the author’s open‑source web‑pack Skill, which automates raw material collection, image localization, link expansion, and structured output, addressing the limitations of Obsidian’s Web Clipper and aligning with Karpathy’s LLM Wiki three‑layer architecture.

AI agents · Automation · Knowledge Management
0 likes · 9 min read
Old Zhang's AI Learning
Jun 5, 2026 · Frontend Development

Open-Source Browser‑Based Word Editor: Introducing docx‑editor

The article reviews the newly released docx‑editor, a client‑side WYSIWYG .docx editor built on ProseMirror with React and Vue adapters, detailing its architecture, installation, usage examples, real‑time collaboration via Yjs, AI Agent SDK integration, and practical pros and cons based on hands‑on testing.

AI Agent · Frontend · React
0 likes · 15 min read
Old Zhang's AI Learning
Jun 4, 2026 · R&D Management

Why Claude Code Hires Only Dreamers and Deep System Experts

The article analyzes how Claude Code’s AI‑native engineering team re‑engineers its processes to stay agile in an era where code is cheap: shifting the bottleneck from coding to verification, adopting JIT planning, redefining code review roles, and hiring only creative dreamers and deep systems experts.

AI Native · Claude Code · JIT planning
0 likes · 12 min read
Old Zhang's AI Learning
Jun 2, 2026 · Artificial Intelligence

Turn Local LLMs into Actionable Agents – Unsloth Opens the MCP Path

Unsloth now lets locally‑run large language models act as real agents by exposing a Model Context Protocol (MCP) interface through a no‑code Studio UI or a llama.cpp + mcp‑cli command line, supporting tool calling, file access, web search, and multi‑model connections with detailed setup steps, hardware guidance, and security cautions.

AI agents · MCP · Model Context Protocol
0 likes · 17 min read
Old Zhang's AI Learning
Jun 2, 2026 · Fundamentals

Lightning‑Fast Open‑Source Local PDF Parser: LiteParse Processes 400‑Page PDFs in 1 Second

LiteParse, an open‑source Rust‑based local PDF parser from the LlamaIndex team, extracts text from a 400‑page PDF in about one second and offers multi‑language bindings, flexible OCR, bounding‑box output, and Agent Skill integration; its limitations include only basic table handling and limited support for complex layouts.

Agent Skill · LiteParse · Local processing
0 likes · 9 min read
Old Zhang's AI Learning
Jun 1, 2026 · Artificial Intelligence

NVIDIA Unveils Nemotron 3 Ultra: The Largest US Open‑Source LLM Boosting Agent Capabilities

NVIDIA released Nemotron 3 Ultra, a 550 B‑parameter open‑source LLM with 55 B active MoE parameters, a hybrid Mamba‑Transformer architecture, a 1 M‑token context window, and three core innovations that deliver superior MMLU, code, and math scores and up to 5× throughput versus rivals, though the weights are not yet public.

Large Language Model · Mamba · MoE
0 likes · 8 min read
Old Zhang's AI Learning
Jun 1, 2026 · Artificial Intelligence

Opus‑Distilled Qwen3.5‑Coder Scores 100/100 Tool Calls, 1.4‑2.2× Faster with MTP, 128K Context on Consumer GPU

The article introduces Qwopus3.5‑4B‑Coder‑MTP‑GGUF, a 4‑billion‑parameter agent model fine‑tuned for code debugging, tool calling, and structured reasoning, explains its novel Trace Inversion, high‑quality trajectory data, and Curriculum SFT training, details MTP acceleration, benchmark results, quantization options, and step‑by‑step local deployment instructions.

Agent · GGUF · MTP
0 likes · 10 min read
Old Zhang's AI Learning
May 31, 2026 · Artificial Intelligence

Scaling AI Agents with Claude Code’s Dynamic Workflows: From Subagents to 1,000 Agents

Claude Code’s Dynamic Workflows move the AI programming assistant from a single‑round subagent model to a JavaScript‑driven orchestration that can run up to 1,000 agents in the background, offering non‑blocking execution, adversarial quality checks, and reusable scripts while highlighting token costs and practical limits.

AI agents · Automation · Claude Code
0 likes · 13 min read
Old Zhang's AI Learning
May 31, 2026 · Artificial Intelligence

Qwen3.6-35B-A3B NVFP4: A Stable, Highly Compressed Quantized Model

NVIDIA's NVFP4 quantization reduces Qwen3.6-35B-A3B's memory footprint threefold with almost no accuracy loss, offers plug‑and‑play deployment via vLLM, and outperforms other 4‑bit formats on Hopper/Blackwell GPUs, making it a practical choice for production AI workloads.

MoE · NVFP4 · Quantization
0 likes · 13 min read