Integrating Kerberos with Spark on CDH: Configuration, Deployment, and Troubleshooting Guide
This guide explains how to prepare a CDH‑based Spark environment for Kerberos authentication, covering prerequisite knowledge, classpath adjustments, HBase configuration files, Spark‑Env settings, user permission grants, Spark‑Submit execution, and common troubleshooting steps.
This guide assumes familiarity with Spark programs, spark-submit scripts, CDH Kerberos integration, and HBase; it does not include Spark-HBase application code.
Implementation Steps
Verify CDH Spark2 installation; after installation you should see the Spark UI.
Integrate Spark with Kerberos by listing the required HBase JARs in /etc/extra-lib/hbase/classpath.txt, one path per line, and copying that file to every node in the cluster.
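One way to generate the file, as a minimal sketch: list every JAR in an HBase library directory, one path per line. The function name build_classpath_file and the parcel path in the usage comment are assumptions, not values from the original setup; pass an absolute directory so the resulting paths are absolute.

```shell
# Hypothetical helper: write one JAR path per line.
# $1 = directory holding HBase JARs, $2 = output file.
build_classpath_file() {
  # Sort for a stable, diff-friendly file; non-JAR files are ignored.
  find "$1" -maxdepth 1 -name '*.jar' | sort > "$2"
}

# Example usage (the source path is an assumption; adjust to your installation):
# build_classpath_file /opt/cloudera/parcels/CDH/lib/hbase/lib /etc/extra-lib/hbase/classpath.txt
```

In practice the generated list is often trimmed to the JARs the job actually needs before it is copied to the other nodes.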
Update Spark2 configuration in Cloudera Manager:
In spark2-conf/spark-env.sh add the security‑related block.
Set the following in both the service and client advanced configuration sections:
export SPARK_DIST_CLASSPATH="$(paste -sd: /etc/extra-lib/hbase/classpath.txt)"
Restart Spark2 and deploy the client.
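To see what the paste -sd: expression produces, the sketch below runs it against a throwaway file. The function name classpath_from_file and the sample JAR paths are illustrative only.

```shell
# Hypothetical helper: join the lines of a classpath file with ':'.
classpath_from_file() {
  paste -sd: "$1"
}

# Demonstration with a temporary file and made-up paths.
demo_file="$(mktemp)"
printf '%s\n' /opt/jars/hbase-client.jar /opt/jars/htrace-core-3.0.4.jar > "$demo_file"
classpath_from_file "$demo_file"
# prints /opt/jars/hbase-client.jar:/opt/jars/htrace-core-3.0.4.jar
rm -f "$demo_file"
```

Because the join happens each time spark-env.sh is sourced, editing classpath.txt takes effect on the next launch without touching the export line.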
Add hbase-site.xml to $SPARK_HOME/conf (and also to yarn-conf) and distribute it to the other nodes.
Testing the Spark Job
Grant the required HBase table permissions to the user (e.g., deng_yb) using HBase shell commands:
grant 'deng_yb','RW','U:DAY_ORG_CMP_OSI'
grant 'deng_yb','RW','U:DAY_ORG_PRO_CATE_SPARK'
Obtain a Kerberos ticket for the user with kinit and verify it with klist.
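The ticket step might look like the sketch below. The keytab path and realm in the usage comment are placeholders, not values from the original setup. klist -s is the MIT Kerberos option that reports cache validity through its exit status alone, and the KLIST_CMD override is a hypothetical hook so the check can be exercised on a machine without a KDC.

```shell
# Return 0 if a valid ticket cache exists, non-zero otherwise.
has_valid_ticket() {
  "${KLIST_CMD:-klist}" -s
}

# Obtain a ticket from a keytab only when none is cached.
# $1 = principal, $2 = keytab path (both supplied by the caller).
ensure_ticket() {
  if has_valid_ticket; then
    echo "ticket already present"
  else
    kinit -kt "$2" "$1"
  fi
}

# Example usage (placeholder realm and keytab path):
# ensure_ticket deng_yb@EXAMPLE.COM /path/to/deng_yb.keytab && klist
```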
Run the Spark application via spark2-submit (example command shown below):
spark2-submit --master yarn --deploy-mode cluster \
--executor-memory 4G --total-executor-cores 4 \
--driver-memory 4g --class com.W.Main \
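/usr/local/W/bi-bdap-0.1.0-SNAPSHOT.jar
Note that --total-executor-cores applies only to standalone and Mesos masters; on YARN, executor sizing is controlled by --num-executors and --executor-cores.
After submission, check the ApplicationMaster (AM) logs and then the executor logs to confirm successful execution.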
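In cluster mode the driver runs inside the AM, so its output is visible only through YARN. A rough sketch for pulling those logs: capture the application ID from the spark2-submit client output, then pass it to yarn logs. The function name extract_app_id and the sample log line are illustrative.

```shell
# Hypothetical helper: print the first YARN application ID found on stdin.
extract_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Example with a made-up line in the style of the client output:
echo "INFO yarn.Client: Submitted application application_1545000000000_0042" | extract_app_id
# prints application_1545000000000_0042

# With a real ID, fetch the aggregated AM and executor logs:
# yarn logs -applicationId "$app_id"
```

On a Kerberos-secured cluster, yarn logs needs a valid ticket as well, so run it in the same session as kinit.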
Common Issues and Solutions
Authentication failures often stem from HBase JARs missing from the Spark classpath (SPARK_DIST_CLASSPATH). Ensure all required JARs (e.g., htrace-core-3.0.4.jar and hbase-client.jar) are listed in classpath.txt.
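A quick way to catch stale entries is to confirm that every path in classpath.txt exists on the node. The sketch below assumes one path per line; the function name missing_jars is hypothetical.

```shell
# Hypothetical helper: print each classpath entry that is not an existing file.
# $1 = path to classpath.txt. Returns non-zero if anything is missing.
missing_jars() {
  status=0
  while IFS= read -r jar || [ -n "$jar" ]; do
    if [ -z "$jar" ]; then
      continue                      # skip blank lines
    fi
    if [ ! -f "$jar" ]; then
      echo "$jar"
      status=1
    fi
  done < "$1"
  return "$status"
}

# Example usage:
# missing_jars /etc/extra-lib/hbase/classpath.txt || echo "fix the entries above"
```

Running it on each node helps separate a bad file from a node that simply lacks the JARs.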
HBase connection timeouts usually indicate that hbase-site.xml is absent from $SPARK_HOME/conf. Copy the file to the Spark configuration directory on every node.
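To check a node quickly, the sketch below reports which configuration directories lack HBase's client configuration file, hbase-site.xml. The function name dirs_missing_hbase_site is hypothetical, and the yarn-conf location in the usage comment is an assumption.

```shell
# Hypothetical helper: print every given directory that lacks hbase-site.xml.
dirs_missing_hbase_site() {
  for dir in "$@"; do
    if [ ! -f "$dir/hbase-site.xml" ]; then
      echo "$dir"
    fi
  done
}

# Example usage (directory layout is an assumption):
# dirs_missing_hbase_site "$SPARK_HOME/conf" "$SPARK_HOME/conf/yarn-conf"
```

Empty output means the file is present in every directory that was checked.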
Conclusion
Running Spark in a Kerberos-secured CDH cluster requires that the Spark classpath includes all necessary HBase JARs and that hbase-site.xml is present in $SPARK_HOME/conf (or supplied via program configuration). Once these conditions are met, Spark jobs can access HBase tables securely.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
