Resolving JDK Version Mismatch in Spark Streaming Jobs with Elasticsearch on YARN
This guide walks through a Spark Streaming job that failed on YARN because of a JDK version mismatch, shows how to identify the mismatch from the ApplicationMaster logs, and gives the spark2-submit configuration that sets JAVA_HOME for both the driver and the executors so the job runs successfully.
When submitting a Spark Streaming task in a test environment, the job repeatedly failed because of a JDK version problem.
The component versions in use were:
1. Spark 2.1
2. Elasticsearch 6.3.2
3. JDK 1.8.0_162

The ApplicationMaster logs showed the error:
java.lang.UnsupportedClassVersionError: org/elasticsearch/client/RestHighLevelClient : Unsupported major.minor version 52.0

This error points to a JDK version mismatch: class-file major version 52 corresponds to Java 8, so the class was compiled for Java 8 but loaded by an older JVM. Although the Elasticsearch 6.3.2 documentation recommends Java 1.8.0_131 or later, the installed JDK 1.8.0_162 already satisfies that requirement.
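The class-file major version is the JDK feature release plus 44 (51 is Java 7, 52 is Java 8, and so on). The sketch below illustrates the mapping by parsing a class-file header; the header bytes are synthesized here rather than read from a real jar:

```python
import struct

def jdk_for_major(major):
    """Map a class-file major version to its JDK release (major - 44)."""
    return f"Java {major - 44}"

# A .class file starts with: magic (4 bytes), minor version (2 bytes),
# major version (2 bytes), all big-endian. Simulate a header produced
# by a JDK 8 compiler (major version 52).
header = struct.pack(">IHH", 0xCAFEBABE, 0, 52)
magic, minor, major = struct.unpack(">IHH", header)

assert magic == 0xCAFEBABE  # every valid class file begins with this magic
print(jdk_for_major(major))  # -> Java 8
```

So an "Unsupported major.minor version 52.0" error means a Java-8-compiled class was handed to a pre-Java-8 JVM.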
The real issue was that Spark was not running on the intended Java version: the code was compiled with Java 8 but executed on an older runtime.
Below is the original submission script. It exported JAVA_HOME, but that only affects the local client process, not the YARN containers; note also that spark.driverEnv.JAVA_HOME is not a recognized Spark configuration property:
#!/bin/bash
export JAVA_HOME=/usr/java/jdk1.8.0_162
spark2-submit \
--master yarn \
--deploy-mode cluster \
--executor-cores 1 \
--executor-memory 1g \
--driver-memory 1g \
--conf spark.dynamicAllocation.maxExecutors=2 \
--conf spark.locality.wait.node=0 \
--conf spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0_162 \
--conf spark.driverEnv.JAVA_HOME=/usr/java/jdk1.8.0_162 \
--files app.properties \
--jars protobuf-java-3.0.2.jar \
--class com.bigdata.note.sink.es.streaming.Sink2TestES \
--name Sink2TestES \
data-sink-es.jar

Checking the ApplicationMaster logs revealed that the job was actually running with Java 1.7.0_67:
19/07/19 16:59:10 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_67
19/07/19 16:59:10 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
19/07/19 16:59:10 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.7.0_67-cloudera/jre

The correct way to set the JDK for both the driver and the executors is to pass the environment variables via the Spark configuration parameters spark.yarn.appMasterEnv.JAVA_HOME and spark.executorEnv.JAVA_HOME on the spark2-submit command line. In cluster mode the driver runs inside the YARN ApplicationMaster, which is why spark.yarn.appMasterEnv (rather than a client-side export) is required.
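When diagnosing this kind of mismatch, the java.version reported in the ApplicationMaster log can be extracted programmatically. A minimal sketch (the sample log line mirrors the ZooKeeper output above; the helper name is illustrative):

```python
import re

def extract_java_version(log_text):
    """Return the JVM version from a 'Client environment:java.version=...' log line."""
    m = re.search(r"java\.version=(\S+)", log_text)
    return m.group(1) if m else None

log = ("19/07/19 16:59:10 INFO zookeeper.ZooKeeper: "
       "Client environment:java.version=1.7.0_67")
print(extract_java_version(log))  # -> 1.7.0_67
```

If the extracted version differs from the JDK the application was built with, the containers are not picking up the intended JAVA_HOME.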
The updated script that resolves the issue is:
#!/bin/bash
spark2-submit \
--master yarn \
--deploy-mode cluster \
--executor-cores 1 \
--executor-memory 1g \
--driver-memory 1g \
--conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0_162 \
--conf spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0_162 \
--conf spark.dynamicAllocation.maxExecutors=2 \
--conf spark.locality.wait.node=0 \
--conf spark.executor.userClassPathFirst=true \
--conf spark.driver.userClassPathFirst=true \
--files app.properties \
--jars protobuf-java-3.0.2.jar \
--class com.bigdata.note.sink.es.streaming.Sink2TestES \
--name Sink2TestES \
data-sink-es.jar

By setting these two parameters, the job correctly uses Java 1.8.0_162 on both the driver and the executors, and the Spark Streaming job runs without the earlier class-version error.
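To avoid hand-editing submit scripts per environment, the required flags can be generated. The helper below is an illustrative sketch (not part of the original job) that pins JAVA_HOME on the ApplicationMaster and the executors when composing a spark2-submit command line:

```python
def spark_submit_args(java_home, app_jar, main_class, extra_confs=None):
    """Build a spark2-submit argument list that sets JAVA_HOME for the
    ApplicationMaster (the driver, in cluster mode) and every executor."""
    confs = {
        "spark.yarn.appMasterEnv.JAVA_HOME": java_home,
        "spark.executorEnv.JAVA_HOME": java_home,
    }
    confs.update(extra_confs or {})
    args = ["spark2-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    args += ["--class", main_class, app_jar]
    return args

cmd = " ".join(spark_submit_args(
    "/usr/java/jdk1.8.0_162",
    "data-sink-es.jar",
    "com.bigdata.note.sink.es.streaming.Sink2TestES"))
print(cmd)
```

Keeping both properties derived from a single java_home value makes it impossible for the driver and executor JVMs to drift apart again.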