Tagged articles

long-context inference

3 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Jun 2, 2026 · Artificial Intelligence

OSCAR Beats TurboQuant: 2‑Bit KV‑Cache for Fast, Stable Long‑Context Inference

OSCAR introduces an attention‑aware rotation scheme that compresses KV caches to true 2‑bit precision, cutting memory usage by up to 8× and boosting decode throughput by up to 7×. It keeps inference quality within a few points of BF16 across multiple models and long‑context benchmarks, and it outperforms TurboQuant.

2-bit quantization · KV cache · OSCAR

13 min read

Old Zhang's AI Learning

May 1, 2026 · Artificial Intelligence

DeepSeek‑V4 Local Deployment: How SGLang Overcomes the Architecture Challenges

The article analyzes DeepSeek‑V4's architectural innovations, including mixed sparse attention, mHC, and native FP4 weights, and explains how SGLang handles them with ShadowRadix, HiSparse, and in‑graph speculative decoding. It also presents benchmark gains, provides Docker deployment steps, and warns of key pitfalls for long‑context inference.

DeepSeek-V4 · HiSparse · SGLang

15 min read

Baobao Algorithm Notes

Jan 21, 2025 · Artificial Intelligence

Inside Kimi 1.5: Four Innovations That Supercharge Long‑Context Multimodal Reasoning

The article analyzes Kimi 1.5’s technical report, detailing its four core innovations, long‑to‑short inference tricks, and reinforcement‑learning infrastructure, along with benchmark results that show it outperforming competing models on long‑context and multimodal tasks.

Kimi 1.5 · Multimodal Reasoning · long-context inference

11 min read
