AI2ML AI to Machine Learning
Dec 19, 2025 · Artificial Intelligence

The 9 Key Ideas Behind FlashAttention

FlashAttention accelerates transformer attention by combining nine ideas—exact (lossless) attention, awareness of the GPU memory hierarchy, SRAM‑reusing tiling, safe softmax, online softmax, tile‑size constraints, fused matrix multiplication, reduced reads and writes of K and V, and backward‑pass recomputation—to achieve efficient, high‑throughput computation on modern GPUs.
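The safe and online softmax ideas mentioned above can be sketched in a few lines. This is a minimal illustration of the one‑pass rescaling trick, not FlashAttention's actual kernel code: a running maximum keeps the exponentials numerically safe, and the running normalizer is rescaled whenever the maximum changes.

```python
import math

def online_softmax(xs):
    # One-pass, numerically safe softmax: maintain a running maximum m
    # and a running normalizer d = sum(exp(x - m)); when m grows, the
    # old d is rescaled by exp(m_old - m_new) instead of being recomputed.
    m = float("-inf")  # running maximum seen so far
    d = 0.0            # running sum of exp(x - m)
    for x in xs:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in xs]
```

The same rescaling lets FlashAttention process attention scores tile by tile without ever materializing the full score matrix.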

Attention Mechanism · FlashAttention · GPU Optimization
0 likes · 8 min read
iQIYI Technical Product Team
May 24, 2019 · Industry Insights

iQIYI’s 8K VR Live Streaming: Cutting Bitrate 75% and Eliminating Motion Latency

The article examines iQIYI’s 8K VR live‑streaming pipeline, detailing how 5G connectivity, tiled encoding, ROI‑focused transmission, and hardware‑accelerated processing reduce bitrate by 75% and drive motion‑to‑photon latency toward zero, while addressing the resolution, bandwidth, and latency challenges of immersive VR broadcasts.
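The bitrate win from tiled, viewport‑dependent transmission can be illustrated with a back‑of‑the‑envelope calculation. All numbers below (tile count, viewport size, base‑layer quality, full‑frame bitrate) are hypothetical assumptions for illustration, not iQIYI's actual parameters: the frame is split into tiles, tiles inside the viewer's field of view are sent at full quality, and the rest at a low‑quality base layer.

```python
def tiled_bitrate(full_mbps, tiles, viewport_tiles, base_fraction):
    """Bitrate actually transmitted when only viewport tiles get full
    quality and the remaining tiles get a cheap base layer."""
    per_tile = full_mbps / tiles
    return (viewport_tiles * per_tile
            + (tiles - viewport_tiles) * per_tile * base_fraction)

# Hypothetical numbers: 120 Mbps full 8K frame, 32 tiles,
# 6 tiles visible in the viewport, base layer at 10% quality.
full = 120.0
sent = tiled_bitrate(full, tiles=32, viewport_tiles=6, base_fraction=0.1)
savings = 1 - sent / full  # fraction of bitrate saved
```

With these assumed numbers the savings come out above 70%, in the same ballpark as the 75% reduction the article reports for the full pipeline.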

5G · 8K Streaming · Industry Insights
0 likes · 9 min read