Running Flink on Kerberos-secured YARN: Authentication and Configuration Guide
This article explains why Kerberos is needed for Hadoop clusters, details the Kerberos authentication workflow, and provides step‑by‑step instructions for configuring Flink to run on a Kerberos‑protected YARN environment using delegation tokens or keytab files, along with proxy‑user settings.
Flink, as a next‑generation big‑data processing engine, often runs on YARN clusters that require Kerberos authentication. The article first outlines the security problems of early Hadoop versions, which simply trusted whatever user name a client reported. It then explains why Kerberos closes that gap by providing machine‑level authentication for both server‑to‑server and client‑to‑server interactions.
It then describes the core Kerberos concepts: Principal, KDC (Key Distribution Center), Ticket, AS (Authentication Service), and TGS (Ticket‑Granting Service). With illustrative diagrams, it walks through the three exchange phases (AS, TGS, and CS, the client/server exchange), showing how a client obtains a Ticket‑Granting Ticket (TGT), uses it to request service tickets, and finally authenticates to a server.
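To make the three phases concrete, here is a toy model in Java. It is a minimal sketch that assumes only the JDK's javax.crypto AES-GCM support. The names ToyKerberos, asExchange, tgsExchange and csExchange are hypothetical, and the model leaves out much of real Kerberos: ASN.1 message encoding, realms, ticket lifetimes, clock-skew checks, replay caches, and the server's mutual-authentication reply. What it does show is the core idea. Each ticket is sealed with the long-term key of the party that will read it, so the client can forward tickets it cannot open. The client then proves its identity with an authenticator sealed under the session key the KDC handed it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Toy model of the Kerberos AS, TGS and CS exchanges. NOT real Kerberos:
// no ASN.1, no realms, no lifetimes, no replay cache, no clock-skew checks.
final class ToyKerberos {
    private static final SecureRandom RNG = new SecureRandom();

    // A grant = session key sealed for the client + ticket sealed for the next hop.
    static final class Grant {
        final byte[] sessionKeyForClient;
        final byte[] ticket;
        Grant(byte[] sessionKeyForClient, byte[] ticket) {
            this.sessionKeyForClient = sessionKeyForClient;
            this.ticket = ticket;
        }
    }

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    // AES-GCM; output layout is a 12-byte random IV followed by ciphertext+tag.
    static byte[] seal(SecretKey key, String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[12];
        RNG.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = Arrays.copyOf(iv, iv.length + ct.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Fails with a GeneralSecurityException (AEADBadTagException) on a wrong key or tampering.
    static String open(SecretKey key, byte[] sealed) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, sealed, 0, 12));
        return new String(c.doFinal(sealed, 12, sealed.length - 12), StandardCharsets.UTF_8);
    }

    static String enc(SecretKey k) { return Base64.getEncoder().encodeToString(k.getEncoded()); }
    static SecretKey dec(String s) { return new SecretKeySpec(Base64.getDecoder().decode(s), "AES"); }

    // AS exchange: TGT = {client|K_c,tgs} sealed with the TGS key; K_c,tgs also sealed with the client key.
    static Grant asExchange(String client, SecretKey clientKey, SecretKey tgsKey) throws GeneralSecurityException {
        SecretKey session = newKey();
        return new Grant(seal(clientKey, enc(session)), seal(tgsKey, client + "|" + enc(session)));
    }

    // TGS exchange: open the TGT, check the authenticator, issue a service ticket.
    static Grant tgsExchange(byte[] tgt, byte[] authenticator, SecretKey tgsKey, SecretKey serviceKey)
            throws GeneralSecurityException {
        String[] t = open(tgsKey, tgt).split("\\|");
        SecretKey tgsSession = dec(t[1]);
        if (!open(tgsSession, authenticator).split("\\|")[0].equals(t[0])) {
            throw new SecurityException("authenticator does not match TGT");
        }
        SecretKey svcSession = newKey();
        return new Grant(seal(tgsSession, enc(svcSession)), seal(serviceKey, t[0] + "|" + enc(svcSession)));
    }

    // CS exchange: the service opens the ticket with its own key and checks the authenticator.
    static String csExchange(byte[] serviceTicket, byte[] authenticator, SecretKey serviceKey)
            throws GeneralSecurityException {
        String[] t = open(serviceKey, serviceTicket).split("\\|");
        if (!open(dec(t[1]), authenticator).split("\\|")[0].equals(t[0])) {
            throw new SecurityException("authenticator does not match ticket");
        }
        return t[0]; // the client name the service now trusts
    }

    // Client side of all three phases; returns the name the service authenticated.
    static String login(String client, SecretKey clientKey, SecretKey tgsKey, SecretKey serviceKey)
            throws GeneralSecurityException {
        Grant as = asExchange(client, clientKey, tgsKey);
        SecretKey tgsSession = dec(open(clientKey, as.sessionKeyForClient));
        String now = Long.toString(System.currentTimeMillis());
        Grant tgs = tgsExchange(as.ticket, seal(tgsSession, client + "|" + now), tgsKey, serviceKey);
        SecretKey svcSession = dec(open(tgsSession, tgs.sessionKeyForClient));
        return csExchange(tgs.ticket, seal(svcSession, client + "|" + now), serviceKey);
    }
}
```

A client whose key differs from the one the KDC holds cannot open the AS reply. It therefore never learns the session key and cannot build a valid authenticator, which is exactly the property the real protocol relies on.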
For running Flink on a Kerberos‑enabled YARN, two practical methods are presented:
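Using a delegation token: first obtain a TGT, either with a password
$ kinit wanghuan70
Password for wanghuan70@<REALM>:
or with a keytab
$ kinit -kt wanghuan70.keytab wanghuan70
and verify it with
$ klist
Ticket cache: FILE:/tmp/krb5cc_2124
Default principal: wanghuan70@<REALM>
...
Then set security.kerberos.login.use-ticket-cache: true in flink-conf.yaml so that Flink picks up the cached TGT. At submission the client uses it to obtain delegation tokens that are shipped with the job. Because these tokens expire and cannot be renewed indefinitely, this method suits short‑lived jobs.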
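The "Ticket cache: FILE:/tmp/krb5cc_2124" line reflects the usual MIT Kerberos convention. The cache named by the KRB5CCNAME environment variable is used if it is set; otherwise the default is FILE:/tmp/krb5cc_<uid>. Ticket-cache mode reads whatever cache the submitting process sees, so flink run should execute as the same OS user, with the same environment, that ran kinit. The sketch below captures that lookup. The class TicketCache is hypothetical and JDK-only, and it simplifies matters by ignoring the default_ccache_name setting in krb5.conf.

```java
import java.util.Map;

// Hypothetical helper: approximates how a Kerberos client picks its default
// credential cache. KRB5CCNAME wins; otherwise FILE:/tmp/krb5cc_<uid>.
// Simplification: ignores default_ccache_name in krb5.conf.
final class TicketCache {
    static String defaultCacheName(Map<String, String> env, long uid) {
        String fromEnv = env.get("KRB5CCNAME");
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;
        }
        return "FILE:/tmp/krb5cc_" + uid;
    }

    // Returns the filesystem path for FILE caches (a name without a type
    // prefix is treated as FILE); other types such as KEYRING: or KCM: yield null.
    static String filePath(String cacheName) {
        if (cacheName.startsWith("FILE:")) {
            return cacheName.substring("FILE:".length());
        }
        return cacheName.contains(":") ? null : cacheName;
    }
}
```

In practice one would pass System.getenv() and the numeric uid (for example from id -u), which the JDK does not expose directly.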
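Using a keytab: the keytab is uploaded to HDFS, the ApplicationMaster copies it into each container, and UserGroupInformation.loginUserFromKeytab() is called during AM startup. Configure Flink in flink-conf.yaml with:
security.kerberos.login.keytab: /home/hadoop_runner/hadoop-3.2.1/etc/hadoop/krb5.keytab
security.kerberos.login.principal: superuser
security.kerberos.login.contexts: Client
The Client entry makes Flink supply these credentials to the JAAS login context used by the ZooKeeper client. Because a keytab lets the process log in again at any time, this method suits long‑running jobs; a background thread must still periodically renew the TGT.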
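The periodic renewal can be sketched with a single-threaded ScheduledExecutorService. This is a minimal sketch, not Flink's actual implementation: TgtRenewer and its Relogin callback are hypothetical names. In a Hadoop-based process the callback would plausibly call UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab(), which logs in again from the keytab only when the TGT has expired or is close to expiry. The sketch sticks to the JDK so that it runs on its own.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: runs a relogin action on a daemon thread at a fixed rate.
// With Hadoop on the classpath the action could be, for example,
// () -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()
final class TgtRenewer implements AutoCloseable {
    interface Relogin {
        void run() throws Exception;
    }

    private final ScheduledExecutorService scheduler;
    final AtomicInteger attempts = new AtomicInteger();
    final AtomicInteger failures = new AtomicInteger();

    TgtRenewer(Relogin relogin, long period, TimeUnit unit) {
        ThreadFactory daemon = r -> {
            Thread t = new Thread(r, "tgt-renewer");
            t.setDaemon(true); // the renewer alone never keeps the JVM alive
            return t;
        };
        scheduler = Executors.newSingleThreadScheduledExecutor(daemon);
        // First run after one period: the initial login already happened at startup.
        scheduler.scheduleAtFixedRate(() -> {
            attempts.incrementAndGet();
            try {
                relogin.run();
            } catch (Exception e) {
                // Count and continue: an exception escaping a scheduled task
                // would silently cancel every future renewal.
                failures.incrementAndGet();
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Catching exceptions inside the task is the important design choice here. With scheduleAtFixedRate, a task that throws is never run again, so a single transient KDC outage would otherwise end renewal for the life of the process.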
The article also covers setting Hadoop proxy users for job submission. Environment variables such as export HADOOP_USER_NAME=hdfs (honored only when Hadoop runs with simple authentication) or HADOOP_PROXY_USER must also be propagated to the TaskManager processes, using Flink options such as:
env.java.opts: -DHADOOP_PROXY_USER=hdfs
env.ssh.opts: export HADOOP_PROXY_USER=hdfs
Finally, the article mentions a known limitation in Flink 1.10: configuring a keytab overrides the proxy‑user settings. It also references the ongoing improvement proposal FLINK-11271.
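Passing -DHADOOP_PROXY_USER through env.java.opts works because, in Hadoop 2.x and 3.x, UserGroupInformation checks for HADOOP_PROXY_USER first in the environment and then among JVM system properties when it creates the login user. If either is set, it impersonates that user. The hypothetical ProxyUserLookup class below is a JDK-only sketch of that lookup order, not Hadoop's code. The environment map and Properties are passed in explicitly so the logic can be tested.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the HADOOP_PROXY_USER lookup order used by Hadoop's
// UserGroupInformation: environment variable first, then JVM system property.
final class ProxyUserLookup {
    static final String KEY = "HADOOP_PROXY_USER";

    static String resolve(Map<String, String> env, Properties sysProps) {
        String proxy = env.get(KEY);
        if (proxy == null) {
            proxy = sysProps.getProperty(KEY); // e.g. set via -DHADOOP_PROXY_USER=hdfs
        }
        return proxy; // null means no impersonation
    }

    // Identity used for HDFS/YARN calls: the proxy user if one is configured,
    // otherwise the Kerberos login principal itself.
    static String effectiveUser(String loginPrincipal, Map<String, String> env, Properties sysProps) {
        String proxy = resolve(env, sysProps);
        return proxy != null ? proxy : loginPrincipal;
    }
}
```

Usage in a real process would be ProxyUserLookup.effectiveUser("superuser", System.getenv(), System.getProperties()). Impersonation must also be permitted on the cluster side, typically through hadoop.proxyuser.superuser.hosts and hadoop.proxyuser.superuser.groups in core-site.xml; otherwise the proxied calls are rejected.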
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
