Tagged articles
7 articles
Page 1 of 1
Machine Heart
Machine Heart
May 13, 2026 · Artificial Intelligence

Super‑Charging MiniCPM‑V 4.6 on One RTX 4090: 1B‑Parameter Multimodal Model Sets New Efficiency Bar

MiniCPM‑V 4.6, a 1.3 B‑parameter multimodal LLM, outperforms larger rivals such as Qwen3.5‑0.8B and Gemma 4 on both accuracy and speed, thanks to early ViT token compression and 4×/16× visual token reduction, delivering sub‑100 ms latency and over 2.6 k token/s throughput on a single RTX 4090 while also running offline on mobile devices.

MiniCPM-VRTX 4090Token Compression
0 likes · 16 min read
Super‑Charging MiniCPM‑V 4.6 on One RTX 4090: 1B‑Parameter Multimodal Model Sets New Efficiency Bar
Old Zhang's AI Learning
Old Zhang's AI Learning
Mar 18, 2026 · Artificial Intelligence

Running Claude‑Opus‑4.6‑Distilled Qwen3.5 27B on a Single RTX 4090 with llama.cpp: 46 tokens/s Performance

The article details a hands‑on test of the Claude‑Opus‑4.6‑distilled Qwen3.5 27B model running on a single RTX 4090 via llama.cpp, showing a steady 46 tokens per second generation speed, a 64K context window, and a step‑by‑step Docker‑based setup while comparing it to GLM‑4.7‑Flash‑AWQ‑4bit and discussing llama.cpp’s limitations for multi‑GPU inference.

Claude OpusDockerLLM inference
0 likes · 5 min read
Running Claude‑Opus‑4.6‑Distilled Qwen3.5 27B on a Single RTX 4090 with llama.cpp: 46 tokens/s Performance
Old Zhang's AI Learning
Old Zhang's AI Learning
Mar 2, 2026 · Artificial Intelligence

Why the Qwen3.5 Series Makes Qwen3.5-27B the No‑Brainer Choice

The author reviews the Qwen3.5 model family, showing that the 27‑billion‑parameter dense Qwen3.5-27B offers the best balance of size, stability, low‑cost local deployment, and comprehensive capabilities, making it the default pick for most users.

AI benchmarkingRTX 4090large language model
0 likes · 6 min read
Why the Qwen3.5 Series Makes Qwen3.5-27B the No‑Brainer Choice
Design Hub
Design Hub
Dec 27, 2025 · Artificial Intelligence

Speed vs. Quality: Z-Image + Nunchaku Boosts Portrait Generation by 300%

Testing shows that adding the open‑source Nunchaku accelerator to the Z‑Image portrait model triples generation speed on an RTX 4090, but the faster output exhibits noticeable drops in facial detail and overall aesthetic, prompting a detailed walkthrough of installation, model download, and workflow integration.

AI image generationComfyUINunchaku
0 likes · 6 min read
Speed vs. Quality: Z-Image + Nunchaku Boosts Portrait Generation by 300%