Baidu Intelligent Cloud Tech Hub
Baidu Intelligent Cloud Tech Hub
Jul 3, 2024 · Operations

How to Eliminate Network Hash Collisions in Large‑Model Training

This article examines the impact of GPU communication bottlenecks on large‑model training, analyzes hash‑collision issues in high‑performance networks, and presents three practical solutions—including increasing RDMA streams, affinity‑aware scheduling, and dynamic load balancing—to boost effective network bandwidth up to 95%.

Hash CollisionRDMAdynamic load balancing
0 likes · 11 min read
How to Eliminate Network Hash Collisions in Large‑Model Training
Meituan Technology Team
Meituan Technology Team
Sep 6, 2018 · Backend Development

Design and Implementation of Oceanus Custom HTTP Routing Framework

Oceanus is Meituan’s Nginx‑based HTTP routing framework that abstracts routing policies, stores them in MySQL, updates them asynchronously, evaluates Lua‑rendered conditions at runtime, and dynamically switches upstream groups without Nginx reload, enabling flexible traffic control for flash‑sale testing, regional isolation, gray releases, and other complex scenarios.

HTTP routingLuaNginx
0 likes · 16 min read
Design and Implementation of Oceanus Custom HTTP Routing Framework