Old Zhang's AI Learning
Author

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.

141 articles · 0 likes · 3 views · 0 comments
Recent Articles

Apr 7, 2026 · Artificial Intelligence

vLLM 0.19.0: HuggingFace v5 Support, Multimodal Boosts, and CPU KV Cache Offload

The vLLM 0.19.0 release adds first‑day Gemma 4 support, merges zero‑bubble asynchronous scheduling with speculative decoding, matures Model Runner V2, introduces full‑CUDA‑graph acceleration for ViT, generalizes DBO, brings CPU KV cache offload, and expands hardware and Transformers compatibility, offering substantial performance and flexibility gains for production LLM inference.

CPU KV offload · GPU · Gemma 4
0 likes · 18 min read
Apr 5, 2026 · Artificial Intelligence

LLM‑Powered Knowledge Management: Insights from Karpathy, Lex Fridman, and kepano

The article analyzes three leading AI experts' approaches to personal knowledge management—Karpathy’s five‑module LLM pipeline, Lex Fridman’s interactive voice‑driven consumption, and kepano’s cautionary separation of AI‑generated content—while detailing the author’s own downstream content‑production workflow that turns raw material into articles, videos, and social posts.

AI agents · Content Production · LLM
0 likes · 13 min read
Apr 3, 2026 · Artificial Intelligence

Qwopus3.5‑v3: From Reason‑Then‑Act to Act‑Then‑Refine – Claude‑Opus Distillation Turns Qwen3.5 into a Tool‑Using Agent

The newly released Qwopus3.5‑v3 model combines higher‑quality reasoning chains, dedicated tool‑calling reinforcement learning, and an act‑then‑refine paradigm, delivering a 5‑point HumanEval boost, a 1.43‑point MMLU‑Pro gain, 31.7% faster inference and 24% lower token cost, while remaining runnable on a 3090 or a 16 GB MacBook, with easy deployment via GGUF, LM Studio, Ollama or llama.cpp.

Claude Opus · Distillation · HumanEval
0 likes · 12 min read
Apr 1, 2026 · Artificial Intelligence

Running Large Models Locally on Mac: The Most Powerful Current Solution

This article reviews the JANG quantization format, the vMLX inference engine with a five‑layer cache stack, and the MLX Studio GUI, showing how their combination fits 397B‑parameter models on 128 GB Apple Silicon Macs, cuts time to first token by up to 224× at 100K context, and provides a full‑featured local AI experience.

Apple Silicon · JANG · MLX Studio
0 likes · 8 min read
Apr 1, 2026 · Artificial Intelligence

LFClaw: Windows‑Only AI Agent with One‑Click Install and Local Model Privacy

The article reviews LFClaw, a Windows‑only AI agent client that offers a one‑click installation, automatic configuration, and local model deployment with hardware‑aware recommendations, while showcasing its file‑management, automation, scheduling, and AI‑driven productivity features through step‑by‑step screenshots.

AI Agent · Automation · LFClaw
0 likes · 6 min read
Mar 31, 2026 · Artificial Intelligence

Turning a Bluetooth Speaker into a Smart Assistant with Qwen 3.5‑Omni

The author demonstrates a proof‑of‑concept that combines Qwen 3.5‑Omni's real‑time internet search and audio output with a locally hosted voice‑wake‑up model to transform a Bluetooth speaker into an always‑on smart assistant, while noting latency challenges and the potential of a sub‑10B open‑source alternative.

AI integration · Bluetooth · Large Language Model
0 likes · 2 min read
Mar 30, 2026 · Operations

WeCom CLI Launches with 12 Built‑In AI Skills for Direct Enterprise Chat Automation

The newly released wecom‑cli, an open‑source Rust‑based command‑line tool from the official WeCom team, provides twelve AI‑agent skills that let tools like Claude Code or Cursor manage contacts, todos, meetings, messages, schedules, documents, and smart sheets directly from the terminal, streamlining office automation and improving credential security.

AI Agent · Automation · CLI
0 likes · 12 min read
Mar 29, 2026 · Artificial Intelligence

Fully Automated Code and Paper Generation: Claude, Codex, and Autoresearch Variants

The article examines Karpathy's Autoresearch project and its community forks—Codex Autoresearch, Claude Autoresearch, and AutoResearchClaw—detailing their design, experiment loops, core rules, installation steps, and a comparative analysis of capabilities, targets, and limitations for autonomous AI-driven research and development.

AI agents · Claude · Codex
0 likes · 18 min read