
Resolving Oozie Shell Scheduling Issues for Flink Jobs on CDH 6.3 with Kerberos Authentication

The article describes how to troubleshoot and fix Oozie shell‑action failures when submitting Flink jobs on a CDH 6.3 cluster with Kerberos, detailing environment‑variable conflicts, error messages, and the final solution using a clean environment and custom FLINK_CONF_DIR settings.

Big Data Technology & Architecture

The author ran into the same problem described in an earlier blog post, where Flink jobs scheduled through an Oozie shell action fail, and set out to reproduce and solve it.

The cluster runs CDH 6.3.0 with Flink 1.8.1, and every component is secured with Kerberos and LDAP. Jobs are submitted through a unified Oozie shell action, and each job must authenticate with its own keytab so that permissions can be controlled per job.

Initial Oozie shell script:

#!/bin/bash
flink run -m yarn-cluster flinktest.jar

This failed with a "command not found" error for flink, because the Flink client's directory is not on the PATH of the shell that Oozie launches.
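One way to fail early with a readable message is to resolve the client binary before using it. This is a minimal sketch: `resolve_flink` and the `FLINK_BIN` override are hypothetical names, and the real client path depends on how Flink is installed on your gateway.

```bash
#!/bin/bash
# Hypothetical helper: resolve the Flink client binary, preferring an explicit
# FLINK_BIN override, then whatever `flink` is found on PATH.
resolve_flink() {
  local candidate="${FLINK_BIN:-}"
  if [[ -z "$candidate" ]]; then
    # command -v prints the full path when the command exists on PATH
    candidate="$(command -v flink 2>/dev/null)" || true
  fi
  if [[ -n "$candidate" && -x "$candidate" ]]; then
    printf '%s\n' "$candidate"
    return 0
  fi
  echo "flink client not found; set FLINK_BIN to its absolute path" >&2
  return 1
}
```

In the action script, `flink_bin="$(resolve_flink)" || exit 1` followed by `"$flink_bin" run -m yarn-cluster flinktest.jar` would replace the bare call.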

Invoking flink by its absolute path got further, but deployment failed with a ClusterDeploymentException:

org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:387)
    ... (additional stack frames) ...
Caused by: org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException: Application Master is already regist

The failure was traced to Oozie overriding HADOOP_CONF_DIR. Exporting the correct directory explicitly inside the shell script let the job be submitted, but only intermittently.
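The interim workaround can be sketched as follows. The default `/etc/hadoop/conf` is an assumption (it is the usual client-configuration location on CDH gateways), and `use_cluster_hadoop_conf`, `submit_with_hadoop_conf` and `FLINK_BIN` are hypothetical names standing in for the absolute flink path the article does not spell out.

```bash
#!/bin/bash
# Interim workaround: replace the HADOOP_CONF_DIR that Oozie injected with the
# cluster's client configuration directory.
# Assumption: the client config lives in /etc/hadoop/conf (typical on CDH gateways).
use_cluster_hadoop_conf() {
  local conf_dir="${1:-/etc/hadoop/conf}"
  # yarn-site.xml is what tells the Flink YARN client where the ResourceManager is.
  if [[ ! -f "$conf_dir/yarn-site.xml" ]]; then
    echo "no yarn-site.xml under $conf_dir" >&2
    return 1
  fi
  export HADOOP_CONF_DIR="$conf_dir"
}

# Hypothetical wrapper: FLINK_BIN must hold the absolute path of the flink client.
submit_with_hadoop_conf() {
  local jar="$1"
  if [[ -z "${FLINK_BIN:-}" ]]; then
    echo "set FLINK_BIN to the flink client path" >&2
    return 1
  fi
  use_cluster_hadoop_conf || return 1
  "$FLINK_BIN" run -m yarn-cluster "$jar"
}
```

As described above, fixing this one variable was not enough on its own: submission still succeeded only some of the time.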

The remaining failures surfaced as a ResourceManagerException indicating that the ResourceManager could not be started. The cause was again that Oozie overrides many environment variables, not just HADOOP_CONF_DIR.
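To see which variables the launcher actually changes, one option is to log them from inside the shell action and compare them with a login shell on the same gateway. This is a sketch; the name pattern below is an assumption covering the usual Hadoop, YARN, Java and Kerberos settings, not a list taken from the article.

```bash
#!/bin/bash
# Print environment variables that commonly influence the Hadoop/Flink clients,
# sorted, so the output from an Oozie run can be diffed against a login shell.
dump_client_env() {
  env | grep -E '^(HADOOP|YARN|HDFS|FLINK|JAVA|CLASSPATH|KRB5|OOZIE)[A-Z0-9_]*=' | sort
}
```

Calling `dump_client_env` once in the action (its stdout ends up in the launcher log) and once in a login shell, then diffing the two outputs, shows exactly which values Oozie replaced.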

Solution: prefix the Flink command with env -i, which runs it with an empty environment so that none of the variables Oozie injected reach the Flink client:

#!/bin/bash
env -i /flink run -m yarn-cluster flinktest.jar

This cleared the environment and allowed the shell action to submit the Flink job successfully.
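Because `env -i` starts the child with an empty environment, anything the client genuinely needs has to be passed as NAME=VALUE arguments on the same command line. A small wrapper can make that explicit. This is a sketch; `run_clean` is a hypothetical helper, not part of Flink or Oozie.

```bash
#!/bin/bash
# Run a command with a clean environment, re-adding only the variables given
# as NAME=VALUE arguments before "--". Everything Oozie injected is dropped.
run_clean() {
  local -a keep=()
  while [[ $# -gt 0 && "$1" != "--" ]]; do
    keep+=("$1")
    shift
  done
  [[ "${1:-}" == "--" ]] && shift
  env -i "${keep[@]}" "$@"
}
```

Usage would look like `run_clean FLINK_CONF_DIR=./conf -- /path/to/flink run -m yarn-cluster ./flinktest.jar`, where `/path/to/flink` is a placeholder for the client path. If the client wrapper on your installation needs JAVA_HOME or PATH, add them as further NAME=VALUE arguments; whether it does is an assumption to verify, not something the article states.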

Kerberos remained an issue because Flink reads Kerberos settings from FLINK_CONF_DIR/flink-conf.yaml. To use a per‑job keytab, the author uploaded a custom conf directory with a modified flink-conf.yaml:

security.kerberos.login.keytab: .
security.kerberos.login.principal: xxx

Because Oozie copies the files attached to the action into the action's working directory, the keytab can be referenced by a relative path (e.g., ./).
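Preparing the per-job conf directory before uploading it can be scripted. This is a minimal sketch assuming you start from an existing base Flink conf directory; `make_job_conf`, its arguments, and the keytab and principal values in any example call are hypothetical. Note that flink-conf.yaml expects `key: value` lines.

```bash
#!/bin/bash
# Copy a base Flink conf directory and point its Kerberos login settings at a
# per-job keytab and principal.
make_job_conf() {
  local base_dir="$1" out_dir="$2" keytab="$3" principal="$4"
  mkdir -p "$out_dir" || return 1
  cp -R "$base_dir"/. "$out_dir"/ || return 1
  local conf="$out_dir/flink-conf.yaml"
  touch "$conf"
  # Drop any existing keytab/principal lines, then append the per-job values.
  grep -v -E '^[[:space:]]*security\.kerberos\.login\.(keytab|principal)[[:space:]]*:' \
    "$conf" > "$conf.tmp" || true
  mv "$conf.tmp" "$conf"
  printf 'security.kerberos.login.keytab: %s\n' "$keytab" >> "$conf"
  printf 'security.kerberos.login.principal: %s\n' "$principal" >> "$conf"
}
```

Passing a relative keytab path keeps the generated file valid after Oozie localizes the conf directory and keytab into the action's working directory.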

#!/bin/bash
env -i FLINK_CONF_DIR=./conf /flink run -m yarn-cluster ./flinktest.jar

With this script, the job authenticates with the specified keytab, and Flink jobs can be submitted through Oozie on the Kerberos-secured CDH cluster.
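A slightly more defensive variant of the final script could check that Oozie actually localized the conf directory, keytab and jar before submitting. This is a hypothetical sketch rather than the author's script; the flink path and keytab name are passed in as arguments because the article abbreviates both.

```bash
#!/bin/bash
# Report every expected file that is missing from the action's working directory.
preflight() {
  local missing=0 f
  for f in "$@"; do
    if [[ ! -e "$f" ]]; then
      echo "missing in action directory: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Submit with a clean environment and the per-job conf directory.
submit_job() {
  local flink_bin="$1" jar="$2" keytab="$3"
  preflight ./conf/flink-conf.yaml "$keytab" "$jar" || return 1
  env -i FLINK_CONF_DIR=./conf "$flink_bin" run -m yarn-cluster "$jar"
}
```

A call such as `submit_job /path/to/flink ./flinktest.jar ./job.keytab` (placeholders for the real paths) then fails with a clear message in the launcher log instead of an opaque Kerberos error when a file was not attached to the action.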

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
