Resolving Oozie Shell Scheduling Issues for Flink Jobs on CDH 6.3 with Kerberos Authentication
The article describes how to troubleshoot and fix Oozie shell‑action failures when submitting Flink jobs on a CDH 6.3 cluster with Kerberos, detailing environment‑variable conflicts, error messages, and the final solution using a clean environment and custom FLINK_CONF_DIR settings.
The author encountered the same Oozie script scheduling problem for Flink tasks as described in an original blog post, and set out to reproduce and solve it.
The cluster runs CDH 6.3.0 with Flink 1.8.1; all components are secured with Kerberos and LDAP, so job submission is done via a unified Oozie shell action, requiring a per‑job keytab for fine‑grained permission control.
Initial Oozie shell script:
#!/bin/bash
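flink run -m yarn-cluster flinktest.jar
This failed because the shell could not find the flink command ("command not found"): the directory containing the flink binary was not on the PATH seen by the shell action.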
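A quick way to confirm this kind of PATH problem is to make the script resolve the command itself and report the PATH it actually sees. The following is a minimal sketch, not the author's script; the helper name resolve_cmd and the FLINK_BIN variable are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print where the shell would find a command, or fail
# loudly and show the PATH that this shell action actually sees.
resolve_cmd() {
  local name="$1" path
  if path=$(command -v "$name"); then
    echo "$path"
  else
    echo "$name not found; PATH seen by this action: ${PATH:-<empty>}" >&2
    return 1
  fi
}

# Usage sketch inside the shell action (FLINK_BIN is an illustrative name):
#   FLINK_BIN=$(resolve_cmd flink) || exit 1
#   "$FLINK_BIN" run -m yarn-cluster flinktest.jar
```

If the lookup fails, the message in the action's stderr log shows which directories were searched, which makes it clear whether an absolute path is needed.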
Using an absolute path produced a ClusterDeploymentException with a full stack trace:
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:387)
... (additional stack frames) ...
Caused by: org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException: Application Master is already regist...
The failure was traced to Oozie overwriting HADOOP_CONF_DIR. Manually exporting the correct directory inside the shell script allowed the job to be submitted, but submission still succeeded only intermittently.
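The manual export described here could look roughly like the sketch below. The helper name use_hadoop_conf is hypothetical, and the directory /etc/hadoop/conf in the usage comment is an assumption (it is the usual client configuration location on CDH hosts, but the article does not name the path it used).

```shell
#!/bin/bash
# Hypothetical helper: point the Hadoop client at a known-good config
# directory, replacing whatever value the Oozie launcher injected.
use_hadoop_conf() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "Hadoop conf dir not found: $dir" >&2
    return 1
  fi
  # Log the inherited value so the override is visible in the action log.
  echo "HADOOP_CONF_DIR inherited from launcher: ${HADOOP_CONF_DIR:-<unset>}" >&2
  export HADOOP_CONF_DIR="$dir"
}

# Usage sketch (the directory is an assumption, adjust to your cluster;
# /flink is the abbreviated absolute path used in the article):
#   use_hadoop_conf /etc/hadoop/conf && /flink run -m yarn-cluster flinktest.jar
```

Logging the inherited value before replacing it keeps a record of what Oozie had set, which helps when comparing a failing run with a successful one.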
Intermittent failures showed a ResourceManagerException indicating that the ResourceManager could not start, again caused by Oozie overriding many environment variables.
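To see which variables the launcher has actually set, the shell action can dump the relevant part of its environment to the log. This is a diagnostic sketch only: the function name and the list of name prefixes are assumptions, not a set defined by Oozie.

```shell
#!/bin/bash
# Print environment variables whose names suggest they influence Hadoop,
# YARN, Java, or Flink clients. The prefix list is an illustrative guess.
dump_suspect_env() {
  env | grep -E '^(HADOOP|YARN|JAVA|FLINK|OOZIE)[A-Z0-9_]*=' | sort
}

# Write the list to stderr so it shows up in the action's log.
dump_suspect_env >&2 || true
```

Comparing this output with the same command run from a normal login shell on the same host shows which values differ.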
Solution: prepend env -i to the Flink command so that it starts with an empty environment and inherits none of the variables Oozie set:
#!/bin/bash
env -i /flink run -m yarn-cluster flinktest.jar
With the environment cleared, the shell action submitted the Flink job successfully.
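The effect of env -i can be checked without Flink or Oozie. The sketch below simulates an injected variable (the value is made up) and compares what a normal child process sees with what a child started through env -i sees.

```shell
#!/bin/bash
# Simulate a variable injected by the Oozie launcher (made-up value).
export HADOOP_CONF_DIR=/tmp/oozie-injected-conf

# A normal child process inherits the variable.
inherited=$(/bin/sh -c 'echo "${HADOOP_CONF_DIR:-unset}"')

# A child started through env -i begins with an empty environment.
cleaned=$(env -i /bin/sh -c 'echo "${HADOOP_CONF_DIR:-unset}"')

echo "inherited: $inherited"
echo "cleaned:   $cleaned"
```

Because env -i also clears PATH, the command that follows it should be given by absolute path, and any variable the job still needs must be passed explicitly on the env command line.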
Kerberos remained an issue because Flink reads Kerberos settings from FLINK_CONF_DIR/flink-conf.yaml. To use a per‑job keytab, the author uploaded a custom conf directory with a modified flink-conf.yaml:
security.kerberos.login.keytab: .
security.kerberos.login.principal: xxx
Since Oozie copies uploaded files to the action's execution directory, the keytab path can be relative (e.g., ./).
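A per-job conf directory like this can be generated from the cluster's existing Flink configuration rather than edited by hand. The sketch below is one possible approach under stated assumptions: the function name make_job_conf, the keytab file name, and the principal are all hypothetical, and the entries are written in the key: value form that flink-conf.yaml uses.

```shell
#!/bin/bash
# make_job_conf BASE_CONF_DIR OUT_DIR KEYTAB PRINCIPAL
# Copy a base Flink conf directory and replace the Kerberos login entries.
make_job_conf() {
  local base="$1" out="$2" keytab="$3" principal="$4"
  mkdir -p "$out" || return 1
  # Copy everything (log4j settings and so on) from the base directory.
  cp -R "$base"/. "$out"/ || return 1
  # Drop any existing keytab/principal lines, keep all other settings.
  sed -E '/^security\.kerberos\.login\.(keytab|principal)[[:space:]]*:/d' \
    "$base/flink-conf.yaml" > "$out/flink-conf.yaml" || return 1
  # Append the per-job values.
  printf 'security.kerberos.login.keytab: %s\n' "$keytab" >> "$out/flink-conf.yaml"
  printf 'security.kerberos.login.principal: %s\n' "$principal" >> "$out/flink-conf.yaml"
}

# Usage sketch (all arguments are placeholders):
#   make_job_conf /path/to/flink/conf ./conf ./job.keytab job_user@EXAMPLE.COM
```

The generated directory and the keytab would then be uploaded with the workflow so that Oozie places them next to the script at run time.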
The final script points FLINK_CONF_DIR at the uploaded conf directory:
#!/bin/bash
env -i FLINK_CONF_DIR=./conf /flink run -m yarn-cluster ./flinktest.jar
With this script, the job runs using the specified keytab, which completes the fix for Oozie-based Flink submission on a Kerberos-secured CDH cluster.
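Since everything here depends on relative paths resolving inside the action's working directory, a small pre-flight check before the submit line can save a failed YARN submission. This is an optional sketch; the function name check_job_files and the keytab file name are placeholders, not part of the original script.

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm the files the job depends on were
# actually copied into the action's working directory.
check_job_files() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing in $(pwd): $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch (keytab name is a placeholder):
#   check_job_files ./conf/flink-conf.yaml ./flinktest.jar ./job.keytab || exit 1
#   env -i FLINK_CONF_DIR=./conf /flink run -m yarn-cluster ./flinktest.jar
```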
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.