Tagged articles

Large‑scale data processing

Dec 24, 2024 · Big Data

Magnet: A Push‑Based Shuffle Service that Scales to Petabyte‑Level Data Processing

LinkedIn's massive Spark workloads suffer from shuffle bottlenecks caused by tiny shuffle blocks, unreliable RPC connections, and data skew. To address them, the authors design Magnet, a push‑merge shuffle service that merges small blocks into large chunks, improves disk I/O efficiency, tolerates failures, and cuts end‑to‑end job time by nearly 30% regardless of the underlying hardware.

Disk I/O optimization · Large‑scale data processing · Push‑based service

56 min read

ITPUB

Jun 27, 2022 · Big Data

How Kuaishou Guarantees Real‑Time Data Warehouse Performance at Billion‑Scale Events

This article details Kuaishou's real‑time data warehouse architecture, the business challenges posed by massive traffic and diverse requirements, and its forward‑ and reverse‑assurance strategies, including lifecycle standards, monitoring, and fault‑injection testing, illustrated with a Spring Festival case study. Together these ensure high stability, low latency, and an accuracy error below 0.5% for billion‑scale streaming workloads.

Fault Injection · Flink streaming · Kuaishou

22 min read
