Tagged articles

2-bit quantization

2 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Jun 2, 2026 · Artificial Intelligence

OSCAR Beats TurboQuant: 2‑Bit KV‑Cache for Fast, Stable Long‑Context Inference

OSCAR presents an attention‑aware rotation scheme that compresses KV caches to true 2‑bit, cutting memory usage by up to 8× and boosting decode throughput by up to 7×, while preserving inference quality within a few points of BF16 across multiple models and long‑context benchmarks, outperforming TurboQuant.

2-bit quantizationKV cacheOSCAR

0 likes · 13 min read

OSCAR Beats TurboQuant: 2‑Bit KV‑Cache for Fast, Stable Long‑Context Inference

Node.js Tech Stack

May 9, 2026 · Artificial Intelligence

Redis Founder Crafts DeepSeek V4 AI Inference Engine, Node.js Star Applauds

Redis creator Salvatore Sanfilippo (antirez) released DS4, a Metal‑only C inference engine tailored for DeepSeek V4 Flash on high‑end Macs, featuring narrow model focus, 2‑bit quantization, disk‑based KV cache, benchmark speeds around 26 tokens/s, and a dual OpenAI/Anthropic compatible server.

2-bit quantizationAI inference engineDeepSeek-V4

0 likes · 13 min read