Xiaohongshu Tech REDtech
May 18, 2026 · Artificial Intelligence
CCD‑Aware Thread Orchestration Shatters Multi‑Core CPU Vector Search Performance Ceiling
The paper presents a CCD-level, load-aware thread orchestration framework for approximate nearest neighbor search (ANNS) over vectors on AMD EPYC multi-chiplet CPUs. It raises throughput by up to 3.7×, cuts P999 tail latency by 30% to 90%, lowers the L3 cache miss rate by 6% to 30%, and reduces CPU stall time by 20% to 80%.
ANNS · CCD · CPU cache
19 min read
