
Handling java.lang.OutOfMemoryError in Hadoop MapReduce

This article explains the four locations where java.lang.OutOfMemoryError can occur in Hadoop's MapReduce framework—client, ApplicationMaster, Map, and Reduce phases—and provides configuration adjustments and best‑practice solutions to mitigate each type of OOM issue.


Hadoop has become the de‑facto standard for big‑data processing, with YARN for resource management, HDFS for storage, and MapReduce for computation. This article focuses on diagnosing and resolving java.lang.OutOfMemoryError (OOM) incidents that arise while using the MapReduce framework.

OOM errors occur when the Java Virtual Machine cannot allocate memory and the garbage collector fails to reclaim space. The causes are broadly classified as memory leaks—where allocated memory is not released—and memory overflow—where the program legitimately needs more heap space.
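The distinction matters because the fixes differ: a leak cannot be cured by a bigger heap, while an overflow can. A minimal illustrative Java sketch (class and method names are hypothetical) contrasts the two patterns:

```java
import java.util.ArrayList;
import java.util.List;

public class OomKinds {
    // Leak pattern: a static collection that only ever grows.
    // Every record is retained forever, so the heap eventually
    // fills no matter how large -Xmx is set.
    static final List<byte[]> CACHE = new ArrayList<>();

    static void leakyProcess(byte[] record) {
        CACHE.add(record); // never removed -> memory leak
    }

    // Overflow pattern: the program genuinely needs a large
    // allocation; raising -Xmx is the correct fix here.
    static byte[] loadWholeInput(int sizeBytes) {
        return new byte[sizeBytes]; // one legitimate allocation, freed when unreachable
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            leakyProcess(new byte[1024]); // retained: ~1 MB and growing
        }
        byte[] buffer = loadWholeInput(4 * 1024 * 1024); // 4 MB, reclaimable
        System.out.println(CACHE.size() + " " + buffer.length);
    }
}
```

The configuration changes in the sections below address the overflow case; the leak case ultimately requires a code fix.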

1. Hadoop Client

When submitting a job JAR through the Hadoop client, the client JVM itself can run out of memory. The remedy is to increase the client heap via the HADOOP_CLIENT_OPTS environment variable, either permanently in hadoop-env.sh or temporarily on the command line before submission:

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx1024m"

(Appending the new -Xmx after the existing options matters: when a JVM receives several -Xmx flags, the last one wins, so prepending it would let any heap setting already in HADOOP_CLIENT_OPTS override it.)
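To confirm the new limit actually reached the client JVM, a small check can print the effective maximum heap (HeapCheck is an illustrative class name; running it under `hadoop jar` with the variable set should show the raised limit):

```java
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx the JVM was started with
        // (approximately; the JVM reserves some space internally).
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Effective max heap: " + maxMb + " MB");
    }
}
```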

2. ApplicationMaster Phase

If the ApplicationMaster itself runs out of memory, increase its container allocation and JVM heap in mapred-site.xml. The heap set in yarn.app.mapreduce.am.command-opts must fit inside the container size in yarn.app.mapreduce.am.resource.mb, with headroom for non-heap memory, or YARN will kill the container once usage exceeds the allocation. For example:

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1536m</value>
</property>

3. Map Phase

OOM during the Map stage can be mitigated by increasing the map task's container memory and JVM heap in mapred-site.xml (note that -Xmx stays below memory.mb, leaving headroom for non-heap usage):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m</value>
  <final>true</final>
</property>
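Beyond raising limits, a common in-code mitigation for map-side OOM is to reuse output objects rather than allocating a new one per record; Hadoop's Writable types (Text, IntWritable) are mutable precisely to support this set-and-reuse style. The sketch below shows the idea in plain Java so it stands alone, with hypothetical method names:

```java
import java.util.ArrayList;
import java.util.List;

public class ReusePattern {
    // Anti-pattern: a fresh object per record puts avoidable
    // pressure on the heap and the garbage collector.
    static List<StringBuilder> naive(List<String> records) {
        List<StringBuilder> out = new ArrayList<>();
        for (String r : records) {
            out.add(new StringBuilder(r.toUpperCase())); // new object per record
        }
        return out;
    }

    // Reuse pattern: one mutable buffer, reset per record -- the
    // same idea as reusing a single Text in a Mapper and calling
    // set(...) before each context.write(...).
    static int reused(List<String> records) {
        StringBuilder buf = new StringBuilder();
        int totalChars = 0;
        for (String r : records) {
            buf.setLength(0); // reset instead of reallocating
            buf.append(r.toUpperCase());
            totalChars += buf.length();
        }
        return totalChars;
    }

    public static void main(String[] args) {
        System.out.println(reused(List.of("alpha", "beta")));
    }
}
```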

4. Reduce Phase

Similarly, OOM in the Reduce stage is addressed by configuring mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts in mapred-site.xml:

<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1024m</value>
  <final>true</final>
</property>
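Reduce-side OOM often originates in the shuffle, which buffers fetched map output in the reducer's heap before merging. If raising the heap alone is not enough, the shuffle buffers can be tightened; the fragment below uses standard MRv2 shuffle properties, with the values chosen purely for illustration:

```xml
<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <!-- fraction of reducer heap used to buffer map output during the shuffle -->
  <value>0.5</value>
</property>
<property>
  <name>mapreduce.reduce.shuffle.memory.limit.percent</name>
  <!-- cap on how much of the shuffle buffer a single map output may occupy -->
  <value>0.15</value>
</property>
```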

By applying these configuration changes, most OOM incidents can be temporarily resolved, allowing the job to continue while developers use Java profiling tools (e.g., MAT, JProfiler) to locate and fix the underlying code issues.
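To gather evidence for that offline analysis, the task JVMs can be asked to dump the heap at the moment of failure. These are standard HotSpot flags appended to the task's java.opts; the dump path here is illustrative and must be writable on the worker nodes:

```xml
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/mr_heapdumps</value>
</property>
```

The resulting .hprof file can then be opened in MAT or JProfiler to identify which objects dominate the heap.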

Author: Han Yu, Big Data Development Engineer, Cloud Platform Department.

Tags: Java, Big Data, Configuration, Performance Tuning, MapReduce, Hadoop, OutOfMemoryError