Tagged articles
4 articles
Machine Heart
May 7, 2026 · Artificial Intelligence

Nvidia Endorses TokenSpeed: A Light‑Speed Agent Inference Engine Built in Two Months

TokenSpeed is an open‑source LLM inference engine designed for agent workloads that aims to combine TensorRT‑LLM‑level performance with vLLM‑level ease of use. It delivers up to 11% higher throughput than TensorRT‑LLM, halves latency under speculative decoding, and has earned Nvidia’s public recommendation.

Agent workloads · LLM inference · NVIDIA Blackwell
8 min read
DevOps Coach
Apr 23, 2026 · Artificial Intelligence

Can Gemma 4 on a MacBook Pro or NVIDIA Blackwell Replace Cloud LLMs? A Hands‑On Performance Study

The author benchmarks Gemma 4 locally on a 24 GB M4 Pro MacBook Pro (llama.cpp) and on a Dell GB10 with an NVIDIA Blackwell GPU (Ollama), comparing token speed, tool‑call reliability, and task completion against cloud GPT‑5.4. The Mac generates tokens faster, but the Blackwell system achieves higher first‑pass success with fewer retries. The jump from Gemma 3 to Gemma 4 also dramatically improves agentic coding viability.

Agentic Coding · Gemma 4 · MacBook Pro
15 min read
Open Source Linux
Jul 5, 2025 · Fundamentals

Why Nvidia’s Blackwell GPU Beats AMD RDNA4: Deep Dive into GB202 Architecture

This article examines Nvidia's massive GB202 Blackwell GPU, with its 750 mm² die, 92.2 billion transistors, 192 SMs, and extensive memory subsystem. It compares GB202's compute units, instruction caches, atomics, and bandwidth against AMD's RDNA4‑based RX 9070, highlighting architectural trade‑offs, performance metrics, and future GPU competition.

AMD RDNA4 · GB202 · GPU architecture
20 min read