Optimizing Hadoop MapReduce Jobs for eBay CAL System to Reduce Execution Time and Resource Usage

This article describes how eBay's Central Application Logging (CAL) system generates massive daily logs, the performance and resource-consumption challenges of its Hadoop MapReduce jobs, and the step‑by‑step optimizations (reducing GC time, mitigating data skew, and improving algorithms) that cut execution time by over 60%, lowered cluster resource usage, and raised the job success rate to nearly 100%.

Big Data Technology & Architecture

Feb 13, 2020

Abstract: eBay's CAL system collects petabyte‑scale logs and uses Hadoop MapReduce jobs to generate reports providing API latency percentiles, service call relationships, and database operations. Optimizing these jobs is crucial due to growing data volume.

Why Optimize: The CAL MapReduce jobs originally consumed about 50% of the Hadoop cluster's resources, only 19% of the cluster was usable during a 9‑hour window, and the job success rate was 92.5%.

Current State: The job faces large data sets, high resource usage, and a 92.5% success rate.

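Execution Time Optimization: A job's execution time is determined by its slowest Mapper and Reducer tasks, and each task's time grows with the number of records it has to process, so both the task count and the distribution of records across tasks matter. The optimization therefore targeted three areas: reducing GC time, avoiding data skew, and improving algorithms.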
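As a rough illustration of why the slowest task dominates, the sketch below models a phase's wall-clock time as the maximum over its parallel tasks, with each task's time proportional to its record count. The function names, record counts, and per-record costs are made up for illustration and are not CAL measurements; the model also ignores scheduling waves, shuffle, and task startup overhead.

```python
def phase_time(records_per_task, ms_per_record):
    """Wall-clock time of one phase: tasks run in parallel, so the slowest one decides."""
    return max(n * ms_per_record for n in records_per_task)


def job_time(map_records, reduce_records, map_ms, reduce_ms):
    """Simplified model: the reduce phase ends only after the slowest mapper and reducer."""
    return phase_time(map_records, map_ms) + phase_time(reduce_records, reduce_ms)


# Hypothetical workload: the same 4,000 reduce records, skewed vs. evenly spread.
skewed = job_time([1000, 1000], [3400, 200, 200, 200], map_ms=1, reduce_ms=2)
balanced = job_time([1000, 1000], [1000, 1000, 1000, 1000], map_ms=1, reduce_ms=2)
print(skewed, balanced)
```

Under these assumptions the total work is identical in both cases, yet the skewed layout takes far longer because one reducer holds most of the records.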

Resource Usage Optimization: Memory usage is proportional to task execution time. Adjustments to container memory sizes, task counts, and time‑window eviction of old CAL transactions reduced memory pressure.
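The time-window eviction idea can be pictured with a minimal sketch like the one below, which keeps open transactions in memory and drops any that started more than a fixed window before the newest record. The class name `TransactionWindow`, the 300-second window, and the assumption that records arrive roughly in time order are all hypothetical; the source does not describe how the CAL jobs implement eviction.

```python
from collections import OrderedDict


class TransactionWindow:
    """Holds open transactions and drops those older than `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # tx_id -> (start_time, payload); assumed to be inserted oldest first.
        self._open = OrderedDict()

    def add(self, tx_id, start_time, payload):
        """Register a transaction, then evict everything outside the window."""
        self._open[tx_id] = (start_time, payload)
        return self.evict(now=start_time)

    def evict(self, now):
        """Remove and return ids of transactions that started before now - window."""
        cutoff = now - self.window_seconds
        evicted = []
        while self._open:
            tx_id, (start_time, _payload) = next(iter(self._open.items()))
            if start_time >= cutoff:
                break  # everything after this entry is newer
            del self._open[tx_id]
            evicted.append(tx_id)
        return evicted

    def __len__(self):
        return len(self._open)


window = TransactionWindow(window_seconds=300)
window.add("tx-1", 0, "a")
window.add("tx-2", 100, "b")
evicted = window.add("tx-3", 350, "c")  # cutoff is 50, so only tx-1 is too old
```

Bounding the number of live objects this way is what keeps heap growth, and therefore GC work, in check for long-running tasks.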

Solutions:

GC Reduction: Implemented time‑window eviction for CAL transactions and used a Combiner to reduce the data transferred between Mappers and Reducers.
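To show what a Combiner buys, here is a small sketch of the logic outside Hadoop: mapper output is pre-aggregated per key on the map side, so one partial result per key is shuffled instead of one record per event. The key names and the `(count, total_ms)` aggregate are hypothetical and chosen only because they merge trivially; the source does not say what the CAL Combiner aggregates.

```python
from collections import defaultdict


def combine(mapper_output):
    """Pre-aggregate (key, duration_ms) pairs into one (key, (count, total_ms)) per key."""
    partial = defaultdict(lambda: [0, 0])
    for key, duration_ms in mapper_output:
        partial[key][0] += 1
        partial[key][1] += duration_ms
    return [(key, (count, total)) for key, (count, total) in partial.items()]


def reduce_partials(partials):
    """Merge partial (count, total_ms) pairs coming from several mappers."""
    merged = defaultdict(lambda: [0, 0])
    for key, (count, total) in partials:
        merged[key][0] += count
        merged[key][1] += total
    return {key: (count, total) for key, (count, total) in merged.items()}


mapper_output = [
    ("api.search", 12),
    ("api.search", 8),
    ("api.checkout", 30),
    ("api.search", 10),
]
combined = combine(mapper_output)  # 4 records shrink to 2 partial aggregates
```

This only works because counts and sums can be merged in any order. A percentile report would need a mergeable summary, such as a histogram, in place of the plain sum.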

Data Skew Mitigation: Applied CombineFileInputFormat to merge small files into larger input splits, which halved the number of Mapper tasks, and refined partitioning so that records are distributed by both report name and metric name.
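The effect of partitioning on both names can be sketched as follows. In Hadoop this logic would live in a custom `Partitioner`'s `getPartition` method; the Python functions below only mirror the idea, and the function names, the CRC32 hash, and the sample report and metric names are assumptions made for illustration.

```python
import zlib


def partition_by_report(report, metric, num_reducers):
    """Baseline: every record of a report lands on the same reducer."""
    return zlib.crc32(report.encode("utf-8")) % num_reducers


def partition_by_report_and_metric(report, metric, num_reducers):
    """Refined: hash report and metric together so a large report spreads out."""
    key = f"{report}\x00{metric}".encode("utf-8")
    return zlib.crc32(key) % num_reducers


metrics = [f"m{i}" for i in range(10)]
baseline = {partition_by_report("report", m, 8) for m in metrics}
refined = {partition_by_report_and_metric("report", m, 8) for m in metrics}
print(len(baseline), len(refined))
```

With the baseline, one very large report pins all of its metrics to a single Reducer. With the composite key, the same report is spread across several Reducers, while all records for a given report and metric still meet on one of them.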

Algorithm Improvements: Reordered key composition to metric+timestamp, cached SQL parsing results, and optimized input distribution, dramatically cutting execution time.
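Caching parse results pays off when the same SQL text recurs many times in the logs. The sketch below memoizes a parser with `functools.lru_cache`; the `parse_sql` function is a toy stand-in for a real, expensive SQL parser, and the cache size is an arbitrary assumption, since the source does not describe the parser or cache used in the CAL jobs.

```python
from functools import lru_cache


@lru_cache(maxsize=10_000)
def parse_sql(statement):
    """Toy stand-in for an expensive SQL parser: returns (operation, table)."""
    tokens = statement.split()
    operation = tokens[0].upper()
    # In these simple statements the table name follows FROM, INTO, or UPDATE.
    markers = {"FROM", "INTO", "UPDATE"}
    table = next(
        (tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok.upper() in markers),
        None,
    )
    return operation, table


# The same statement text is parsed once and served from the cache afterwards.
for _ in range(3):
    result = parse_sql("SELECT id FROM orders WHERE id = ?")
info = parse_sql.cache_info()
```

A bounded cache matters here: an unbounded map of every distinct statement would trade parsing time for the same heap pressure the GC work set out to remove.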

Results: Execution time decreased by over 60%, resource usage dropped from 50% to 19% of the cluster, and job success rate increased from 92.5% to ~99.9%.

Conclusion: Optimizing Hadoop MR jobs for CAL improved performance, resource efficiency, and reliability, demonstrating the importance of systematic profiling, GC management, data skew handling, and algorithmic refinements.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
