Machine Heart
Apr 1, 2026 · Artificial Intelligence

TurboQuant’s Alleged Misconduct: Google’s Reply Sparks Bigger Controversy

The TurboQuant paper on LLM quantization has ignited a heated debate over alleged academic misconduct: the authors’ OpenReview rebuttal has drawn criticism for downplaying prior work and misrepresenting benchmarks, prompting broader concerns about research integrity in AI.

AI research integrity · LLM quantization · RaBitQ
9 min read
Old Meng AI Explorer
Dec 29, 2025 · Artificial Intelligence

Run 100B LLMs on a Laptop: How BitNet’s 1‑bit Quantization Makes It Possible

BitNet’s 1‑bit quantization shrinks model size and compute needs roughly tenfold, enabling ordinary CPUs and low‑power ARM devices to run 2B–100B‑parameter language models locally with acceptable speed, low power consumption, and near‑original quality, while offering simple installation and optional GPU acceleration.
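
To make the mechanism concrete, here is a minimal sketch of low‑bit weight quantization in the spirit the article describes: weights snap to the ternary values {−1, 0, +1} with a single absmean scale, as in the published BitNet b1.58 recipe. The function names and the NumPy implementation are illustrative assumptions, not Microsoft’s optimized kernels.

    import numpy as np

    def ternary_quantize(w: np.ndarray):
        """Quantize weights to {-1, 0, +1} with a per-tensor absmean scale
        (a sketch of the BitNet b1.58-style scheme, not the real kernel)."""
        scale = np.mean(np.abs(w)) + 1e-8        # absmean scale
        q = np.clip(np.round(w / scale), -1, 1)  # ternary weights
        return q.astype(np.int8), scale

    def ternary_matmul(x: np.ndarray, q: np.ndarray, scale: float):
        # Multiply-free in principle: entries of q are -1/0/+1, so the
        # product reduces to additions and subtractions plus one rescale.
        return (x @ q) * scale

    w = np.random.randn(256, 256).astype(np.float32)
    x = np.random.randn(1, 256).astype(np.float32)
    q, s = ternary_quantize(w)
    print("mean abs error:", np.abs(x @ w - ternary_matmul(x, q, s)).mean())

Packing each ternary value into about two bits is where the roughly tenfold storage saving over FP16 comes from.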

BitNet · CPU inference · LLM quantization
10 min read
Old Meng AI Explorer
Dec 25, 2025 · Artificial Intelligence

Run 100B LLM on a Laptop: BitNet’s 1‑Bit Quantization Enables CPU‑Only AI

BitNet, Microsoft’s open‑source 1‑bit quantization framework, shrinks model size up to tenfold and lets ordinary CPUs (including i7 laptops and ARM tablets) run 2B–100B‑parameter language models at usable speeds while cutting power consumption dramatically, offering a practical, GPU‑free path to local AI.
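
A quick back‑of‑the‑envelope check of the headline claim (illustrative arithmetic only, ignoring activations, KV cache, and packing overhead):

    def model_gb(params_billion: float, bits_per_weight: float) -> float:
        """Approximate weight storage in GB, ignoring runtime overhead."""
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for bits in (16, 8, 1.58):
        print(f"100B parameters @ {bits:>5} bits ≈ {model_gb(100, bits):6.1f} GB")

    # FP16 needs ~200 GB for the weights alone; at 1.58 bits the same
    # model fits in ~20 GB, within reach of a well-equipped laptop.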

BitNet · CPU inference · LLM quantization
9 min read
Tencent Technical Engineering
Oct 10, 2025 · Artificial Intelligence

How Tequila’s 1.58‑Bit Quantization Overcomes the Dead‑Zone Trap in LLMs

Tequila introduces a novel 1.58‑bit ternary quantization for large language models that tackles the dead‑zone trap by repurposing dead‑zone (zero‑valued) weights as dynamic biases with offsets computed offline, achieving near‑full‑precision performance, faster convergence, and up to threefold CPU inference speedups.
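
To illustrate the dead‑zone trap the summary refers to: a standard ternary quantizer snaps small weights to zero, where they stop contributing to the output and receive no useful training signal. The toy sketch below shows how large that dead zone can be, plus one crude way to fold the discarded weights into an offline‑computed bias; it is a simplification for intuition, not Tequila’s actual algorithm.

    import numpy as np

    np.random.seed(0)
    w = np.random.randn(4096) * 0.05       # toy weight vector
    t = 0.7 * np.mean(np.abs(w))           # common ternary threshold

    # Standard ternary quantizer: everything inside (-t, t) becomes 0.
    q = np.where(w > t, 1, np.where(w < -t, -1, 0)).astype(np.int8)
    dead = q == 0
    print(f"dead-zone fraction: {dead.mean():.1%}")  # often a large share

    # Crude reactivation (illustrative, NOT Tequila's method): fold the
    # dead-zone weights into an additive bias via an offline input statistic.
    x = np.random.randn(4096)
    bias = w[dead].sum() * x.mean()
    approx = x @ (q * np.mean(np.abs(w))) + bias
    print(f"reference {x @ w:+.3f} vs ternary+bias {approx:+.3f}")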

AI inference · LLM quantization · dynamic bias
9 min read
Architect
Mar 5, 2025 · Artificial Intelligence

How Does Quantization Shrink LLMs? A Deep Dive into GPTQ, GGUF, and Techniques

This article explains why large language models need quantization, covering the core concepts, classification schemes, symmetric and asymmetric methods, and the handling of outliers; it then compares post‑training quantization (PTQ) with quantization‑aware training (QAT) and details popular techniques such as GPTQ, GGUF, and BitNet.
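
As a taste of the symmetric/asymmetric distinction the article covers, here is a minimal int8 sketch of both schemes (illustrative helper names, not any particular library’s API):

    import numpy as np

    def quantize_symmetric(x: np.ndarray):
        # Symmetric: zero maps to 0; one scale covers [-max|x|, +max|x|].
        scale = np.max(np.abs(x)) / 127
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def quantize_asymmetric(x: np.ndarray):
        # Asymmetric: a zero-point shifts the grid to cover [min, max]
        # exactly, wasting no levels on values the tensor never takes.
        lo, hi = x.min(), x.max()
        scale = (hi - lo) / 255
        zero_point = np.round(-lo / scale)
        q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
        return q, scale, zero_point

    x = np.random.rand(1024).astype(np.float32)  # skewed, all-positive tensor
    q_s, s_s = quantize_symmetric(x)
    q_a, s_a, zp = quantize_asymmetric(x)
    print("symmetric error: ", np.abs(x - q_s * s_s).mean())
    print("asymmetric error:", np.abs(x - (q_a.astype(np.float32) - zp) * s_a).mean())

On an all‑positive tensor the asymmetric grid wastes no levels below zero, roughly halving the rounding error.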

AI hardware · GGUF · GPTQ
25 min read