Integrating Kerberos with Spark on CDH: Configuration, Deployment, and Troubleshooting Guide

This guide explains how to prepare a CDH-based Spark environment for Kerberos authentication. It covers prerequisite knowledge, classpath adjustments, HBase configuration files, spark-env.sh settings, user permission grants, running the job with spark2-submit, and common troubleshooting steps.

Big Data Technology & Architecture

Aug 22, 2020

This guide assumes familiarity with writing Spark programs, spark-submit scripts, Kerberos integration on CDH, and HBase. No Spark-HBase code is provided.

Implementation Steps

Verify that Spark2 is installed on the CDH cluster; once installation succeeds, the Spark UI should be accessible.

Make the HBase client libraries visible to Spark: list the required HBase JAR paths, one per line, in /etc/extra-lib/hbase/classpath.txt, then copy that file to the same location on every cluster node.
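One way to produce such a file is to list the JARs found in the HBase library directory. The following is a minimal sketch, not the author's procedure: build_classpath_file is a hypothetical helper, and the parcel path in the usage comment is an assumption about a typical CDH layout.

```shell
#!/bin/sh
# Sketch: write the path of every JAR in a directory to a classpath file,
# one path per line. build_classpath_file is a hypothetical helper name.
build_classpath_file() {
    lib_dir=$1
    out_file=$2
    # The glob expands in sorted order, so the file is stable across runs.
    for jar in "$lib_dir"/*.jar; do
        # Guard against the unexpanded pattern when no JAR matches.
        if [ -e "$jar" ]; then
            printf '%s\n' "$jar"
        fi
    done > "$out_file"
}

# Example usage (both paths are assumptions; adjust to your installation):
# build_classpath_file /opt/cloudera/parcels/CDH/lib/hbase/lib /etc/extra-lib/hbase/classpath.txt
```

Listing a whole directory pulls in more JARs than Spark strictly needs; trimming the file to the required subset is a reasonable follow-up.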

Update Spark2 configuration in Cloudera Manager:

In spark2-conf/spark-env.sh add the security‑related block.

Set the following in both the service and client advanced configuration sections:

export SPARK_DIST_CLASSPATH="$(paste -sd: /etc/extra-lib/hbase/classpath.txt)"
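The paste expression used for SPARK_DIST_CLASSPATH joins the lines of classpath.txt into a single colon-separated string. The sketch below shows the effect on a temporary file; the two JAR paths are placeholders, not a complete or recommended list.

```shell
#!/bin/sh
# Demonstrates what the SPARK_DIST_CLASSPATH expression evaluates to.
# The two JAR paths below are placeholders, not a complete list.
tmp_file=$(mktemp)
printf '%s\n' /opt/hbase/lib/hbase-client.jar /opt/hbase/lib/hbase-common.jar > "$tmp_file"

# paste -s joins all lines of the file into one; -d: puts a colon between them.
joined=$(paste -sd: "$tmp_file")
echo "$joined"
# prints /opt/hbase/lib/hbase-client.jar:/opt/hbase/lib/hbase-common.jar

rm -f "$tmp_file"
```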

Restart Spark2 and deploy the client.

Add hbase-site.xml to $SPARK_HOME/conf (and also to the yarn-conf directory) and distribute it to the other nodes.
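Distributing the HBase site file (named hbase-site.xml in a standard HBase installation) can be scripted with a loop over the cluster hosts. This is a sketch under assumptions: distribute_conf, the DRY_RUN switch, the host names, and both paths are hypothetical and should be replaced with values from your cluster.

```shell
#!/bin/sh
# Sketch: copy an HBase site file into a conf directory on several hosts.
# distribute_conf, the DRY_RUN switch, and the host names are hypothetical.
distribute_conf() {
    src_file=$1
    dest_dir=$2
    shift 2
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it.
            echo "scp $src_file $host:$dest_dir/"
        else
            scp "$src_file" "$host:$dest_dir/"
        fi
    done
}

# Dry run with placeholder hosts and assumed paths:
DRY_RUN=1 distribute_conf /etc/hbase/conf/hbase-site.xml /etc/spark2/conf node1 node2
```

Running with DRY_RUN=1 first makes it easy to review the exact copy commands before touching any node.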

Testing the Spark Job

Grant the required HBase table permissions to the user (e.g., deng_yb) using HBase shell commands:

grant 'deng_yb','RW','U:DAY_ORG_CMP_OSI'
grant 'deng_yb','RW','U:DAY_ORG_PRO_CATE_SPARK'
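When several tables need the same permissions, the grant statements can be generated rather than typed by hand. The sketch below only prints the statements; gen_grants is a hypothetical helper, while the user and table names are the ones used above.

```shell
#!/bin/sh
# Sketch: print one HBase shell grant statement per table.
# gen_grants is a hypothetical helper; user and table names come from the article.
gen_grants() {
    user=$1
    perms=$2
    shift 2
    for table in "$@"; do
        printf "grant '%s','%s','%s'\n" "$user" "$perms" "$table"
    done
}

gen_grants deng_yb RW U:DAY_ORG_CMP_OSI U:DAY_ORG_PRO_CATE_SPARK

# On a cluster node, the output could be piped into the HBase shell:
# gen_grants deng_yb RW U:DAY_ORG_CMP_OSI U:DAY_ORG_PRO_CATE_SPARK | hbase shell
```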

Obtain a Kerberos ticket for the user with kinit and verify with klist.
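The ticket step can be wrapped so that an existing valid ticket is reused and a new one is requested only when needed. This is a sketch assuming MIT Kerberos client tools and a keytab for the user; ensure_ticket and the keytab path are hypothetical.

```shell
#!/bin/sh
# Sketch: reuse a valid Kerberos ticket if one exists, otherwise obtain one from a keytab.
# ensure_ticket and the keytab path are hypothetical.
ensure_ticket() {
    principal=$1
    keytab=$2
    # klist -s is silent; it exits 0 only if the cache holds unexpired credentials.
    if klist -s; then
        echo "valid ticket found"
    else
        # kinit -kt reads the key from the keytab instead of prompting for a password.
        kinit -kt "$keytab" "$principal" && echo "ticket obtained for $principal"
    fi
}

# Example (placeholder keytab path), followed by klist to inspect the ticket:
# ensure_ticket deng_yb /path/to/deng_yb.keytab && klist
```

Without a keytab, a plain kinit for the user followed by a password prompt achieves the same result interactively.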

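Run the Spark application via spark2-submit, as in the example command below. Note that --total-executor-cores applies only to standalone and Mesos deployments; on YARN, executors are sized with --num-executors and --executor-cores.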

spark2-submit --master yarn --deploy-mode cluster \
  --executor-memory 4G --total-executor-cores 4 \
  --driver-memory 4g --class com.W.Main \
  /usr/local/W/bi-bdap-0.1.0-SNAPSHOT.jar
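For YARN, a variant of the command can size executors with --num-executors and --executor-cores, and can pass --principal and --keytab so that a long-running job does not depend on the lifetime of the kinit ticket. The sketch below only assembles and prints such a command. The executor counts, the principal, and the keytab path are placeholder assumptions; the class and JAR path are taken from the example above.

```shell
#!/bin/sh
# Sketch: assemble a YARN-oriented variant of the submit command and print it.
# Executor counts, principal, and keytab path are placeholder assumptions;
# the class and JAR path are taken from the article's example.
build_submit_cmd() {
    principal=$1
    keytab=$2
    # Reuse the positional parameters as the argument list of the command.
    set -- spark2-submit --master yarn --deploy-mode cluster \
        --num-executors 2 --executor-cores 2 --executor-memory 4G \
        --driver-memory 4g \
        --principal "$principal" --keytab "$keytab" \
        --class com.W.Main /usr/local/W/bi-bdap-0.1.0-SNAPSHOT.jar
    echo "$@"
}

build_submit_cmd deng_yb@EXAMPLE.COM /path/to/deng_yb.keytab
```

Printing the command first allows a review of the flags before running it on the cluster.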

After submission, check the ApplicationMaster (AM) logs and then the executor logs to confirm successful execution.
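The AM and executor logs can be fetched from the command line once the YARN application ID is known. The sketch below extracts an ID from a line of text; extract_app_id is a hypothetical helper, and the sample string is illustrative text rather than real Spark output.

```shell
#!/bin/sh
# Sketch: pull the first YARN application ID out of text read from stdin.
# extract_app_id is a hypothetical helper name.
extract_app_id() {
    # YARN IDs have the form application_<clusterTimestamp>_<sequence>.
    grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Illustrative text only, not real spark2-submit output.
sample='tracking application_1598083200000_0012 (illustrative)'
app_id=$(printf '%s\n' "$sample" | extract_app_id)
echo "$app_id"
# prints application_1598083200000_0012

# With the ID in hand, the aggregated logs can be fetched on the cluster:
# yarn logs -applicationId "$app_id"
```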

Common Issues and Solutions

Authentication failures often stem from HBase JARs missing from the Spark classpath (the SPARK_DIST_CLASSPATH built from classpath.txt). Ensure that all required JARs (e.g., htrace-core-3.0.4.jar and hbase-client.jar) are listed in classpath.txt.
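A quick check on each node is to confirm that every path listed in classpath.txt actually exists there. This is a minimal sketch; check_classpath_file is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report classpath entries that do not exist on this node.
# check_classpath_file is a hypothetical helper name.
check_classpath_file() {
    cp_file=$1
    missing=0
    while IFS= read -r jar; do
        # Skip blank lines.
        [ -n "$jar" ] || continue
        if [ ! -f "$jar" ]; then
            echo "missing: $jar"
            missing=$((missing + 1))
        fi
    done < "$cp_file"
    # Succeed only when every listed JAR is present.
    [ "$missing" -eq 0 ]
}

# Example usage on a cluster node:
# check_classpath_file /etc/extra-lib/hbase/classpath.txt
```

The function returns a non-zero status when any entry is missing, so it can gate a deployment script.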

HBase connection timeouts usually indicate that hbase-site.xml is absent from $SPARK_HOME/conf, so the HBase client falls back to default connection settings instead of the cluster's. Copy the file to the Spark configuration directory on every node.
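The presence of the HBase site file (named hbase-site.xml in a standard HBase installation) can be verified per node with a small check. The sketch below also looks for the hbase.zookeeper.quorum property, which the HBase client uses to locate the cluster; check_hbase_site is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm the HBase site file is in a conf directory and names a ZooKeeper quorum.
# check_hbase_site is a hypothetical helper name.
check_hbase_site() {
    conf_dir=$1
    site_file="$conf_dir/hbase-site.xml"
    if [ ! -f "$site_file" ]; then
        echo "not found: $site_file"
        return 1
    fi
    # hbase.zookeeper.quorum tells the client where to locate the cluster.
    if ! grep -q 'hbase.zookeeper.quorum' "$site_file"; then
        echo "no hbase.zookeeper.quorum property in $site_file"
        return 1
    fi
    echo "ok: $site_file"
}

# Example usage on a cluster node:
# check_hbase_site "$SPARK_HOME/conf"
```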

Conclusion

Running Spark in a Kerberos-secured CDH cluster requires that the Spark classpath include all necessary HBase JARs and that hbase-site.xml be present in $SPARK_HOME/conf (or that its settings be supplied through the program's own configuration). Once these conditions are met, Spark jobs can access HBase tables securely.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
