Why Hadoop Clusters Need Strong Security and How Kerberos Protects Them

This article explains the security risks facing Hadoop clusters, outlines common attack methods, introduces Kerberos authentication, and describes the multi‑layer security architecture of Transwarp Data Hub (TDH): Guardian, KRB5LDAP, and authorization controls. The goal is to help administrators secure their big‑data environments.

StarRing Big Data Open Lab

Why Hadoop Clusters Need Security

Ransomware groups are targeting Hadoop clusters by exploiting insecure configurations rather than software vulnerabilities, which lets attackers manipulate data with little effort. In China alone, more than 8,300 Hadoop clusters expose port 50070 (the default HDFS NameNode web UI port in Hadoop 2.x) to the public internet, which points to a severe security problem.
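Many of these exposures come down to defaults. The following minimal sketch assumes a standard Hadoop `core-site.xml` and checks the two properties that decide whether a cluster authenticates and authorizes callers at all. The first is `hadoop.security.authentication`, whose default `simple` trusts whatever user name the client reports. The second is `hadoop.security.authorization`, whose default `false` disables service-level authorization. The function names are hypothetical. A real audit would also inspect `hdfs-site.xml`, `yarn-site.xml`, and network exposure.

```python
import xml.etree.ElementTree as ET

# Hadoop's defaults when these properties are absent from core-site.xml.
DEFAULTS = {
    "hadoop.security.authentication": "simple",
    "hadoop.security.authorization": "false",
}

def read_properties(xml_text: str) -> dict:
    """Collect <name>/<value> pairs from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props

def audit_core_site(xml_text: str) -> list:
    """Return findings for insecure authentication/authorization settings (hypothetical helper)."""
    props = {**DEFAULTS, **read_properties(xml_text)}
    findings = []
    if props["hadoop.security.authentication"].lower() != "kerberos":
        findings.append("authentication is 'simple': clients can claim any user name")
    if props["hadoop.security.authorization"].lower() != "true":
        findings.append("service-level authorization is disabled")
    return findings

insecure = "<configuration></configuration>"
secure = """<configuration>
  <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
  <property><name>hadoop.security.authorization</name><value>true</value></property>
</configuration>"""
print(audit_core_site(insecure))  # two findings: both settings fall back to insecure defaults
print(audit_core_site(secure))    # []
```

Merging `DEFAULTS` first means an empty file is judged by what Hadoop would actually do, not treated as "no problems found".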

Common Attack Vectors

If the HDFS NameNode lacks user authentication, attackers can impersonate users and access the file system.

If the HDFS DataNode lacks authentication, knowing a block ID lets attackers read or corrupt that block.

If the YARN ResourceManager is unauthenticated, attackers can submit arbitrary applications and misuse cluster resources.

If the YARN NodeManager is unauthenticated, attackers can kill or modify applications.

If Kafka lacks authentication, a rogue consumer can read any producer's data, causing data leakage.

Unencrypted network traffic enables eavesdropping on confidential data.
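The first vector can be made concrete. With `simple` authentication, the WebHDFS REST API takes the caller's identity from the `user.name` query parameter, so nothing stops a client from claiming to be `hdfs`. The sketch below builds such a request URL and optionally sends it. Port 50070 is the Hadoop 2.x NameNode HTTP default (Hadoop 3 uses 9870). The host name and helper names are hypothetical. On a Kerberized cluster, the same request without SPNEGO credentials is expected to be refused, typically with HTTP 401. Only probe clusters you are authorized to test.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen

def webhdfs_url(host: str, path: str, op: str, user: str, port: int = 50070) -> str:
    """Build a WebHDFS REST URL that asserts an identity via the user.name parameter."""
    if not path.startswith("/"):
        raise ValueError("HDFS paths must be absolute")
    query = urlencode({"op": op, "user.name": user})
    return urlunsplit(("http", f"{host}:{port}", "/webhdfs/v1" + path, query, ""))

def probe(url: str, timeout: float = 5.0) -> int:
    """Send the GET and return the HTTP status code, including error codes such as 401."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code

# Under 'simple' authentication, nothing verifies that the caller really is "hdfs".
url = webhdfs_url("namenode.example.com", "/", "LISTSTATUS", "hdfs")
print(url)  # http://namenode.example.com:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs
```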

Kerberos Basics

Kerberos, developed at MIT, is one of the most widely used network authentication protocols for distributed systems. In a secure TDH cluster, every service authenticates through Kerberos, and Inceptor can additionally use LDAP.

Ticket: the credential used to access services. Two types are involved:

Ticket‑Granting Ticket (TGT): obtained when a user authenticates as a principal with a password or keytab; cached on the client.

Service Ticket: obtained automatically by presenting the cached TGT to the Ticket‑Granting Server (TGS); grants access to one specific service.

Principal: the identity (similar to a username) to which Kerberos issues tickets. Both users and services have principals.

Password and Keytab: authentication can use a password or a keytab file that stores the principal's secret key. A keytab allows non‑interactive login, for example by services and scheduled jobs.
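Principals follow the MIT naming convention `primary/instance@REALM`. User principals usually omit the instance (`alice@EXAMPLE.COM`), while service principals carry a host name (`hdfs/nn1.example.com@EXAMPLE.COM`). The sketch below parses that form and builds the MIT `kinit -kt <keytab> <principal>` command for keytab-based login. The principal names, keytab path, and helper names are hypothetical. The parser also ignores the escaped `/` and `@` characters that the full principal syntax permits.

```python
import subprocess
from typing import NamedTuple, Optional

class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str

def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' (escape sequences are not handled)."""
    ident, sep, realm = name.rpartition("@")
    if not sep or not ident or not realm:
        raise ValueError(f"missing primary or realm in {name!r}")
    primary, _, instance = ident.partition("/")
    return Principal(primary, instance or None, realm)

def kinit_command(principal: str, keytab: str) -> list:
    """MIT kinit: -k reads the key from a keytab instead of prompting; -t names the keytab file."""
    parse_principal(principal)  # fail early on malformed names
    return ["kinit", "-kt", keytab, principal]

def login_with_keytab(principal: str, keytab: str) -> None:
    """Obtain a TGT into the default credential cache (requires the MIT krb5 client tools)."""
    subprocess.run(kinit_command(principal, keytab), check=True)

svc = parse_principal("hdfs/nn1.example.com@EXAMPLE.COM")
print(svc.primary, svc.instance, svc.realm)  # hdfs nn1.example.com EXAMPLE.COM
print(kinit_command("alice@EXAMPLE.COM", "/etc/security/keytabs/alice.keytab"))
```

After a successful `kinit`, `klist` shows the cached TGT. Service tickets are then fetched from it automatically as the client contacts each service.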

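Kerberos authentication flow:

(1) The client authenticates to the Authentication Server (AS) and receives a TGT plus a session key encrypted with the client's own secret key.

(2) The client presents the TGT to the Ticket‑Granting Server (TGS) and receives a service ticket for the target service.

(3) The client presents the service ticket, together with an authenticator, to the service. The service verifies them, and both sides can then use the shared session key for encrypted communication.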

[Figure: Kerberos authentication flow]
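The exchange can be simulated in a few dozen lines, including the intermediate Ticket‑Granting Server (TGS) step that turns a TGT into a service ticket. This is a toy model built on stated assumptions:

- it uses only the Python standard library;
- it substitutes a SHA‑256 keystream plus HMAC (encrypt‑then‑MAC) for Kerberos's real encryption types;
- it encodes tickets as JSON rather than ASN.1;
- it keeps the KDC's key table in one in‑memory dictionary.

The principal names and function names are hypothetical. The point is to show who can open what. The client never sees the TGS or service keys, yet it still proves its identity to the service.

```python
import hashlib
import hmac
import json
import os
import time

# --- Toy sealing: SHA-256 keystream + HMAC (encrypt-then-MAC). NOT real Kerberos crypto. ---
def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def seal(key: bytes, obj: dict) -> bytes:
    plain = json.dumps(obj).encode()
    nonce = os.urandom(16)
    ct = bytes(p ^ k for p, k in zip(plain, _keystream(key, nonce, len(plain))))
    return nonce + hmac.new(key, nonce + ct, hashlib.sha256).digest() + ct

def unseal(key: bytes, blob: bytes) -> dict:
    nonce, tag, ct = blob[:16], blob[16:48], blob[48:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered data")
    return json.loads(bytes(c ^ k for c, k in zip(ct, _keystream(key, nonce, len(ct)))))

# --- Long-term keys held by the KDC (hypothetical principals). ---
TGS = "krbtgt/EXAMPLE.COM@EXAMPLE.COM"
KEYS = {
    "alice@EXAMPLE.COM": os.urandom(32),                 # really derived from alice's password
    TGS: os.urandom(32),                                 # known only to the KDC
    "hdfs/nn1.example.com@EXAMPLE.COM": os.urandom(32),  # really stored in the service's keytab
}
MAX_SKEW = 300  # seconds of tolerated clock difference for authenticators

def _check_authenticator(session_key: bytes, blob: bytes, client: str) -> None:
    a = unseal(session_key, blob)
    if a["client"] != client or abs(time.time() - a["ts"]) > MAX_SKEW:
        raise ValueError("bad or stale authenticator")

def as_exchange(client: str):
    """AS: a TGT sealed with the TGS key, plus the TGS session key sealed with the client's key."""
    sk = os.urandom(32).hex()
    tgt = seal(KEYS[TGS], {"client": client, "sk": sk, "exp": time.time() + 3600})
    return tgt, seal(KEYS[client], {"sk": sk})

def tgs_exchange(tgt: bytes, authenticator: bytes, service: str):
    """TGS: open the TGT, verify the authenticator, issue a service ticket."""
    t = unseal(KEYS[TGS], tgt)
    if t["exp"] < time.time():
        raise ValueError("TGT expired")
    tgs_sk = bytes.fromhex(t["sk"])
    _check_authenticator(tgs_sk, authenticator, t["client"])
    svc_sk = os.urandom(32).hex()
    ticket = seal(KEYS[service], {"client": t["client"], "sk": svc_sk, "exp": t["exp"]})
    return ticket, seal(tgs_sk, {"sk": svc_sk})

def service_accept(service: str, ticket: bytes, authenticator: bytes) -> str:
    """Service: open the ticket with its own key and verify the client's authenticator."""
    t = unseal(KEYS[service], ticket)
    if t["exp"] < time.time():
        raise ValueError("ticket expired")
    _check_authenticator(bytes.fromhex(t["sk"]), authenticator, t["client"])
    return t["client"]  # authenticated identity; the service session key can now protect traffic

def login_and_access(client: str, service: str) -> str:
    tgt, enc = as_exchange(client)
    tgs_sk = bytes.fromhex(unseal(KEYS[client], enc)["sk"])  # only the holder of the client key can open this
    ticket, enc = tgs_exchange(tgt, seal(tgs_sk, {"client": client, "ts": time.time()}), service)
    svc_sk = bytes.fromhex(unseal(tgs_sk, enc)["sk"])
    return service_accept(service, ticket, seal(svc_sk, {"client": client, "ts": time.time()}))

print(login_and_access("alice@EXAMPLE.COM", "hdfs/nn1.example.com@EXAMPLE.COM"))  # alice@EXAMPLE.COM
```

A client that lacks `alice`'s key cannot open the AS reply, so it never learns the session key. Without that key it cannot produce a valid authenticator for either the TGS or the service.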

TDH Security Mechanisms

TDH provides three independent authentication layers:

Operating‑system level authentication on each server for user and service access.

KRB5LDAP system for service‑to‑service and user‑to‑service authentication.

Transwarp Manager authentication for manager access.

Because these layers are isolated, administrators must maintain separate credentials for each one, which is cumbersome.

Guardian Service

Guardian unifies the three authentication mechanisms, allowing a single username/password to access the OS, Transwarp Manager, and cluster services. It synchronizes credential information, simplifying management. While legacy command‑line tools for KRB5LDAP and Kerberos remain supported, using Guardian is recommended for unified control.

KRB5LDAP Overview

KRB5LDAP manages inter‑service and user‑to‑service access, providing Kerberos authentication for most services and LDAP authentication for Inceptor. Deployment is optional but essential in multi‑tenant scenarios for resource isolation.

Authorization in TDH

Four services enforce fine‑grained permissions:

HDFS: rwx permissions on files and directories.

Hyperbase: RWCXA (Read, Write, Create, Exec, Admin) at global, namespace, table, column family, and column qualifier levels.

Inceptor: SQL‑based SELECT/CREATE/DELETE/UPDATE/INSERT permissions at global, database, table, and row levels.

Kafka: RWCD (Read, Write, Create, Delete) permissions at global or topic level.
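HDFS applies a POSIX‑like rule to its rwx bits:

- if the caller owns the file, only the owner bits are consulted;
- otherwise, if the caller belongs to the file's group, only the group bits are consulted;
- otherwise the "other" bits decide.

The HDFS superuser (the identity that started the NameNode) bypasses the check. The sketch below models that rule for a single file. It assumes the superuser is named `hdfs`, which is common but deployment‑specific. It ignores HDFS ACLs, the sticky bit, and the execute checks on parent directories. The names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileStatus:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o750

BITS = {"r": 4, "w": 2, "x": 1}

def check_access(st: FileStatus, user: str, groups: set, action: str,
                 superuser: str = "hdfs") -> bool:
    """Return True if `user` may perform `action` ('r', 'w' or 'x') on the file."""
    if user == superuser:            # assumed superuser name; bypasses permission checks
        return True
    if user == st.owner:             # owner bits only, even if group/other are broader
        bits = (st.mode >> 6) & 0o7
    elif st.group in groups:         # group bits
        bits = (st.mode >> 3) & 0o7
    else:                            # other bits
        bits = st.mode & 0o7
    return bool(bits & BITS[action])

report = FileStatus(owner="alice", group="analysts", mode=0o640)
print(check_access(report, "alice", {"alice"}, "w"))     # True: owner has rw-
print(check_access(report, "bob", {"analysts"}, "r"))    # True: group has r--
print(check_access(report, "bob", {"analysts"}, "w"))    # False
print(check_access(report, "mallory", {"guests"}, "r"))  # False: other has ---
```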

Transwarp Security Manual

The manual, available on Transpedia, contains three parts: an administrator guide for installing and configuring security components, a user guide for managing personal credentials in a secure cluster, and a component guide detailing authorization for each TDH service.

Conclusion and Outlook

The article outlines Transwarp’s approach to securing Hadoop clusters through Kerberos, KRB5LDAP, and Guardian. While Hadoop and Spark ecosystems still face fragmented permission models, Transwarp continues to improve its platform to offer a secure, efficient, and stable big‑data foundation for enterprises.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev, and we will review it promptly.
