Optimizing Hadoop MapReduce Jobs for eBay CAL System to Reduce Execution Time and Resource Usage
This article describes how eBay's Central Application Logging (CAL) system generates massive daily logs and why that strains the performance and resource consumption of its Hadoop MapReduce jobs. It then walks through the step-by-step optimizations (reducing GC time, mitigating data skew, and improving algorithms) that cut execution time by over 60%, lowered cluster resource usage, and raised job success rates to nearly 100%.
Abstract: eBay's CAL system collects petabyte‑scale logs and uses Hadoop MapReduce jobs to generate reports providing API latency percentiles, service call relationships, and database operations. Optimizing these jobs is crucial due to growing data volume.
Why Optimize: Before optimization, the CAL MapReduce jobs consumed about 50% of the Hadoop cluster, with only 19% usable during a 9-hour window, and succeeded only 92.5% of the time.
Current State: The job faces large data sets, high resource usage, and a 92.5% success rate.
Execution Time Optimization: A job's execution time is determined by its slowest Mapper and Reducer tasks. The article's formulas relate execution time to the number of tasks and the number of records each task processes. The optimizations therefore targeted three things: reducing GC time, avoiding data skew, and improving algorithms.
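The relationship above can be sketched with a toy cost model. This is only an illustration under stated assumptions: the per-task formula (records times per-record cost, plus GC time) and all numbers are hypothetical, not the article's exact formulas, and every task in a phase is assumed to run concurrently.

```python
def task_seconds(records, seconds_per_record, gc_seconds):
    """Assumed cost model: per-record processing time plus time lost to GC."""
    return records * seconds_per_record + gc_seconds


def phase_seconds(records_per_task, seconds_per_record, gc_seconds):
    """A map or reduce phase finishes only when its slowest task finishes."""
    return max(
        task_seconds(n, seconds_per_record, gc_seconds) for n in records_per_task
    )


# Hypothetical numbers: the same 1,000 records, spread evenly or skewed.
balanced = phase_seconds([250, 250, 250, 250], 0.01, 1.0)
skewed = phase_seconds([700, 100, 100, 100], 0.01, 1.0)
print(balanced, skewed)
```

In this model the skewed phase takes longer even though the total record count is identical, which is why skew and GC time both show up directly in job duration.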
Resource Usage Optimization: A job's memory usage is proportional to how long its tasks run, because each task holds its container for its whole execution time. Adjusting container memory sizes and task counts, and evicting old CAL transactions by time window, reduced memory pressure.
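A back-of-the-envelope version of this proportionality, assuming memory cost is measured as container size multiplied by task run time and summed over tasks. The container sizes, task counts, and durations below are hypothetical and are not figures from the article.

```python
def memory_seconds(container_mb, task_seconds):
    """Total memory held by a job: container size times each task's run time."""
    return sum(container_mb * seconds for seconds in task_seconds)


# Hypothetical job: 100 tasks, before and after tuning.
before = memory_seconds(4096, [600] * 100)
after = memory_seconds(3072, [300] * 100)
saving = 1 - after / before
print(f"memory-seconds saved: {saving:.1%}")
```

Shortening tasks and shrinking containers multiply together, so modest gains on each axis can add up to a large reduction in cluster footprint.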
Solutions:
GC Reduction: Implemented time-window eviction so that old CAL transactions are dropped from memory instead of accumulating, and used a Combiner to pre-aggregate Mapper output, lowering the data transferred between Mapper and Reducer.
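Both ideas can be sketched in a few lines. This is a minimal illustration, not the CAL implementation: the class name `TransactionWindow`, the function `combine`, and the sample keys are hypothetical, and the eviction logic assumes timestamps arrive in non-decreasing order.

```python
from collections import Counter, OrderedDict


class TransactionWindow:
    """Keeps only transactions seen within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._open = OrderedDict()  # transaction id -> last-seen timestamp

    def add(self, txn_id, timestamp):
        """Record a transaction and return the ids evicted as too old."""
        # Re-insert so the dict stays ordered by last-seen time.
        self._open.pop(txn_id, None)
        self._open[txn_id] = timestamp
        return self._evict(timestamp)

    def _evict(self, now):
        cutoff = now - self.window_seconds
        evicted = []
        while self._open:
            txn_id, seen = next(iter(self._open.items()))
            if seen >= cutoff:
                break  # everything after this entry is newer
            del self._open[txn_id]
            evicted.append(txn_id)
        return evicted

    def __len__(self):
        return len(self._open)


def combine(mapper_output):
    """Pre-aggregate (key, count) pairs on the map side, as a Combiner would."""
    totals = Counter()
    for key, count in mapper_output:
        totals[key] += count
    return sorted(totals.items())


window = TransactionWindow(window_seconds=60)
for txn_id, ts in [("a", 0), ("b", 30), ("c", 70)]:
    window.add(txn_id, ts)
print(len(window))
print(combine([("GET /item", 1), ("GET /cart", 1), ("GET /item", 1)]))
```

Bounding the number of live transactions keeps the heap small, which means less work for the garbage collector, while combining shrinks what has to be shuffled to the Reducers.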
Data Skew Mitigation: Applied CombineFileInputFormat to merge small files into larger input splits, halving the number of Mapper tasks, and refined partitioning to use both the report name and the metric name.
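The two techniques can be approximated outside Hadoop as follows. This is a simplified sketch: `pack_splits` only mimics the size-based grouping that CombineFileInputFormat performs and ignores node and rack locality, and `partition` is a hypothetical stand-in for a custom Partitioner. File sizes, report names, and metric names are made up for illustration.

```python
import zlib


def pack_splits(file_sizes, max_split_bytes):
    """Greedily group small files into splits no larger than max_split_bytes."""
    splits, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits


def partition(key_parts, num_reducers):
    """Stable hash partitioner over a composite key.

    Uses crc32 rather than Python's built-in hash(), which is salted per run.
    """
    joined = "\x00".join(key_parts).encode("utf-8")
    return zlib.crc32(joined) % num_reducers


# Six small files become three splits, so three Mapper tasks instead of six.
splits = pack_splits([40, 40, 40, 40, 40, 40], max_split_bytes=100)

# One large report with many metrics: partitioning on the report name alone
# sends every record to a single Reducer; adding the metric name spreads them.
metrics = [f"metric_{i}" for i in range(100)]
by_report = {partition(("latency_report",), 8) for _ in metrics}
by_report_and_metric = {partition(("latency_report", m), 8) for m in metrics}
print(len(splits), len(by_report), len(by_report_and_metric))
```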
Algorithm Improvements: Reordered the key composition to metric+timestamp, cached SQL parsing results, and optimized input distribution, dramatically cutting execution time.
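Caching parse results pays off when the same statements recur many times in the logs. The sketch below assumes that is the case and uses a deliberately naive, hypothetical `parse_sql` in place of a real SQL parser; it expects non-empty statements and only recognizes the table after FROM, INTO, or UPDATE.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=10_000)
def parse_sql(statement):
    """Hypothetical stand-in for an expensive SQL parser.

    Returns (operation, table); repeated statements are served from the cache.
    """
    operation = statement.split()[0].upper()
    match = re.search(r"\b(?:from|into|update)\s+(\w+)", statement, re.IGNORECASE)
    return operation, (match.group(1) if match else None)


statements = [
    "SELECT * FROM orders WHERE id = ?",
    "UPDATE users SET name = ? WHERE id = ?",
    "SELECT * FROM orders WHERE id = ?",  # repeat: answered from the cache
]
parsed = [parse_sql(s) for s in statements]
info = parse_sql.cache_info()
print(parsed, info.hits, info.misses)
```

The cache size is bounded so that the cache itself does not become a new source of memory pressure.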
Results: Execution time decreased by over 60%, resource usage dropped from 50% to 19% of the cluster, and job success rate increased from 92.5% to ~99.9%.
Conclusion: Optimizing Hadoop MR jobs for CAL improved performance, resource efficiency, and reliability, demonstrating the importance of systematic profiling, GC management, data skew handling, and algorithmic refinements.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
