Xiaohongshu REDtech
May 15, 2023 · Artificial Intelligence
GPU-Accelerated Inference Optimization for Large-Scale Machine Learning at Xiaohongshu
Xiaohongshu rebuilt its recommendation, advertising, and search inference pipeline around GPU-centric hardware, deploying a custom TensorFlow-Core Lambda service and applying optimizations at the system, virtualization, and compute levels, including NUMA binding, kernel fusion, dynamic scaling, and FP16 quantization. The result: roughly 30× growth in compute capacity, user-metric gains of over 10%, and cluster-resource savings of more than 50%.
Deep Learning · GPU Optimization · Machine Learning Inference
20 min read