Resolving JDK Version Mismatch in Spark Streaming Jobs with Elasticsearch on YARN
This guide walks through a Spark Streaming job that failed on YARN because of a JDK version mismatch, shows how to identify the mismatch from the ApplicationMaster logs, and gives the spark2-submit configuration that sets JAVA_HOME for both the driver and the executors so the job runs successfully.
When submitting a Spark Streaming task in a test environment, the job repeatedly failed because of a JDK version problem.
The component versions in use were:
1. Spark 2.1
2. Elasticsearch 6.3.2
3. JDK 1.8.0_162

The ApplicationMaster logs showed the error:
java.lang.UnsupportedClassVersionError: org/elasticsearch/client/RestHighLevelClient : Unsupported major.minor version 52.0

This error points to a JDK version mismatch: class-file major version 52 corresponds to Java 8, so the class was compiled for Java 8 but loaded by an older JVM. Although the Elasticsearch 6.3.2 documentation recommends Java 1.8.0_131 or later, the installed JDK 1.8.0_162 already satisfies that requirement.
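The class-file major version is the JDK feature release plus 44 (51 is Java 7, 52 is Java 8, and so on). The sketch below illustrates the mapping by parsing a class-file header; the header bytes are synthesized here rather than read from a real jar:

```python
import struct

def jdk_for_major(major):
    """Map a class-file major version to its JDK release (major - 44)."""
    return f"Java {major - 44}"

# A .class file starts with: magic (4 bytes), minor version (2 bytes),
# major version (2 bytes), all big-endian. Simulate a header produced
# by a JDK 8 compiler (major version 52).
header = struct.pack(">IHH", 0xCAFEBABE, 0, 52)
magic, minor, major = struct.unpack(">IHH", header)

assert magic == 0xCAFEBABE  # every valid class file begins with this magic
print(jdk_for_major(major))  # -> Java 8
```

So an "Unsupported major.minor version 52.0" error means a Java-8-compiled class was handed to a pre-Java-8 JVM.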
The real issue was that Spark was not running on the intended Java version: the code was compiled with Java 8 but executed on an older runtime.
Below is the original submission script. It exported JAVA_HOME, but that only affects the local client process, not the YARN containers; note also that spark.driverEnv.JAVA_HOME is not a recognized Spark configuration property:
#!/bin/bash
export JAVA_HOME=/usr/java/jdk1.8.0_162
spark2-submit \
--master yarn \
--deploy-mode cluster \
--executor-cores 1 \
--executor-memory 1g \
--driver-memory 1g \
--conf spark.dynamicAllocation.maxExecutors=2 \
--conf spark.locality.wait.node=0 \
--conf spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0_162 \
--conf spark.driverEnv.JAVA_HOME=/usr/java/jdk1.8.0_162 \
--files app.properties \
--jars protobuf-java-3.0.2.jar \
--class com.bigdata.note.sink.es.streaming.Sink2TestES \
--name Sink2TestES \
data-sink-es.jar

Checking the ApplicationMaster logs revealed that the job was actually running with Java 1.7.0_67:
19/07/19 16:59:10 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_67
19/07/19 16:59:10 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
19/07/19 16:59:10 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.7.0_67-cloudera/jre

The correct way to set the JDK for both the driver and the executors is to pass the environment variables via the Spark configuration parameters spark.yarn.appMasterEnv.JAVA_HOME and spark.executorEnv.JAVA_HOME on the spark2-submit command line. In cluster mode the driver runs inside the YARN ApplicationMaster, which is why spark.yarn.appMasterEnv (rather than a client-side export) is required.
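When diagnosing this kind of mismatch, the java.version reported in the ApplicationMaster log can be extracted programmatically. A minimal sketch (the sample log line mirrors the ZooKeeper output above; the helper name is illustrative):

```python
import re

def extract_java_version(log_text):
    """Return the JVM version from a 'Client environment:java.version=...' log line."""
    m = re.search(r"java\.version=(\S+)", log_text)
    return m.group(1) if m else None

log = ("19/07/19 16:59:10 INFO zookeeper.ZooKeeper: "
       "Client environment:java.version=1.7.0_67")
print(extract_java_version(log))  # -> 1.7.0_67
```

If the extracted version differs from the JDK the application was built with, the containers are not picking up the intended JAVA_HOME.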
The updated script that resolves the issue is:
#!/bin/bash
spark2-submit \
--master yarn \
--deploy-mode cluster \
--executor-cores 1 \
--executor-memory 1g \
--driver-memory 1g \
--conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0_162 \
--conf spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0_162 \
--conf spark.dynamicAllocation.maxExecutors=2 \
--conf spark.locality.wait.node=0 \
--conf spark.executor.userClassPathFirst=true \
--conf spark.driver.userClassPathFirst=true \
--files app.properties \
--jars protobuf-java-3.0.2.jar \
--class com.bigdata.note.sink.es.streaming.Sink2TestES \
--name Sink2TestES \
data-sink-es.jar

By setting these two parameters, the job correctly uses Java 1.8.0_162 on both the driver and the executors, and the Spark Streaming job runs without the earlier class-version error.
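To avoid hand-editing submit scripts per environment, the required flags can be generated. The helper below is an illustrative sketch (not part of the original job) that pins JAVA_HOME on the ApplicationMaster and the executors when composing a spark2-submit command line:

```python
def spark_submit_args(java_home, app_jar, main_class, extra_confs=None):
    """Build a spark2-submit argument list that sets JAVA_HOME for the
    ApplicationMaster (the driver, in cluster mode) and every executor."""
    confs = {
        "spark.yarn.appMasterEnv.JAVA_HOME": java_home,
        "spark.executorEnv.JAVA_HOME": java_home,
    }
    confs.update(extra_confs or {})
    args = ["spark2-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    args += ["--class", main_class, app_jar]
    return args

cmd = " ".join(spark_submit_args(
    "/usr/java/jdk1.8.0_162",
    "data-sink-es.jar",
    "com.bigdata.note.sink.es.streaming.Sink2TestES"))
print(cmd)
```

Keeping both properties derived from a single java_home value makes it impossible for the driver and executor JVMs to drift apart again.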