Tagged articles

inference speedup

2 articles · Page 1 of 1

Sep 30, 2025 · Artificial Intelligence

SpikingBrain-1.0 Achieves 100× Faster Inference with Brain‑Inspired Spiking Architecture

SpikingBrain-1.0, billed as the first brain-inspired spiking large model developed in China, links spiking neuron dynamics to linear attention. Reported results include a more than 100× speedup in time to first token on 4-million-token sequences, 23.4% FLOP utilization, and 69% sparsity. A one-click deployment tutorial is available on HyperAI.

Large Language Model · SpikingBrain-1.0 · brain-inspired AI

7 min read


AIWalker

Feb 19, 2025 · Artificial Intelligence

DeepSeek’s NSA Attention Cuts Inference Time 11× – CEO Liang Co‑author

DeepSeek introduces NSA (Native Sparse Attention), a sparse attention mechanism that uses a dynamic hierarchical sparsity strategy, combining coarse-grained token compression with fine-grained token selection. It reports up to 11.6× faster inference, lower pre-training cost, and superior benchmark performance across general, long-context, and chain-of-thought tasks.

DeepSeek · LLM Optimization · NSA

9 min read
