vLLM 0.19.0: Hugging Face Transformers v5 Support, Multimodal Boosts, and CPU KV Cache Offload
The vLLM 0.19.0 release adds first-day Gemma 4 support, merges zero-bubble asynchronous scheduling with speculative decoding, matures Model Runner V2, introduces full-CUDA-graph acceleration for ViT, generalizes DBO (Dual Batch Overlap), and brings CPU KV cache offload, alongside expanded hardware and Transformers compatibility. Together, these changes deliver substantial performance and flexibility gains for production LLM inference.

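As a concrete starting point, the sketch below shows how one might stand up an offline `LLM` engine with speculative decoding through the existing `speculative_config` argument. It is only an illustration under assumptions: the model names are ordinary stand-ins rather than the new Gemma 4 checkpoints, and the CPU KV cache offload option is left as a commented placeholder because its exact engine argument isn't stated in this summary.

```python
from vllm import LLM, SamplingParams

# Offline-inference sketch; model names are stand-ins, not checkpoints tied
# to this release.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    gpu_memory_utilization=0.90,        # fraction of GPU memory the engine may claim
    speculative_config={                # draft-model speculative decoding
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "num_speculative_tokens": 4,
    },
    # CPU KV cache offload: enable via the engine argument documented in the
    # 0.19.0 release notes (flag name omitted here to avoid guessing).
)

outputs = llm.generate(
    ["Summarize the vLLM 0.19.0 release in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

The same options map onto `vllm serve` command-line flags for online serving; the offline API is used here only to keep the example self-contained.
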