Why Hadoop Clusters Need Strong Security and How Kerberos Protects Them
This article explains the security risks facing Hadoop clusters, outlines common attack methods, introduces Kerberos authentication, and describes Transwarp Data Hub's multi‑layer security architecture—including Guardian, KRB5LDAP, and authorization controls—to help administrators secure their big‑data environments.
Why Hadoop Clusters Need Security
Ransomware groups target Hadoop clusters by exploiting insecure configurations rather than software vulnerabilities, which lets them read, alter, or delete data with little effort. In China alone, more than 8,300 Hadoop clusters expose port 50070 (the default NameNode web UI port in Hadoop 2.x) to the public internet, which shows how widespread the problem is.
Common Attack Vectors
If the HDFS NameNode lacks user authentication, attackers can impersonate users and access the file system.
If the HDFS DataNode lacks authentication, knowing a block ID lets attackers read or corrupt that block.
If the YARN ResourceManager is unauthenticated, attackers can misuse cluster resources.
If the YARN NodeManager is unauthenticated, attackers can kill or modify applications.
If Kafka lacks security authentication, a fake consumer can read any producer’s data, causing leakage.
Unencrypted network traffic enables eavesdropping on confidential data.
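Several of the gaps above come down to a handful of settings in core-site.xml. The following is a minimal Python sketch of a configuration audit, assuming the standard Hadoop property names hadoop.security.authentication, hadoop.security.authorization, and hadoop.rpc.protection. The EXPECTED baseline and the SAMPLE file are illustrative assumptions, not a TDH recommendation.

```python
import xml.etree.ElementTree as ET

# Illustrative hardening baseline (an assumption; adjust to your own policy).
EXPECTED = {
    "hadoop.security.authentication": "kerberos",  # Hadoop's default is "simple"
    "hadoop.security.authorization": "true",
    "hadoop.rpc.protection": "privacy",            # encrypts RPC traffic
}

# Hypothetical core-site.xml content of an unsecured cluster.
SAMPLE = """
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def audit(xml_text):
    """List every baseline property that is unset or has a different value."""
    props = load_properties(xml_text)
    findings = []
    for name, wanted in EXPECTED.items():
        actual = props.get(name, "<unset>")
        if actual != wanted:
            findings.append(f"{name} is {actual}, expected {wanted}")
    return findings


if __name__ == "__main__":
    for finding in audit(SAMPLE):
        print(finding)
```

A check like this only reports what the file says. It does not confirm that the running services have loaded those values.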
Kerberos Basics
Kerberos, developed by MIT, is the most widely used network authentication protocol for distributed systems. In a secure Hadoop environment, all services authenticate via Kerberos (except Inceptor, which can also use LDAP).
Ticket: Credential used to access services. Two types are needed:
Ticket‑Granting Ticket (TGT): Obtained by the user with a principal and a password or keytab; cached on the client.
Service Ticket: Issued automatically, on the basis of the cached TGT, whenever the client accesses a specific service.
Principal: The identity (similar to a username) to which tickets are issued. Services have principals as well as users.
Password and Keytab: A principal proves its identity with either a password or a keytab file that stores the principal's long-term secret key.
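Principals conventionally take the form primary/instance@REALM, where the instance part is optional for users and usually holds the host name for services. The sketch below splits a principal into those parts. The Principal type, the parse_principal helper, and the example names are hypothetical and are not part of any Kerberos library.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "hdfs"
    instance: Optional[str]  # usually the host name for service principals
    realm: str               # Kerberos realm, e.g. "EXAMPLE.COM"


def parse_principal(text):
    """Split 'primary[/instance]@REALM' into its components."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a full principal: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


if __name__ == "__main__":
    print(parse_principal("alice@EXAMPLE.COM"))
    print(parse_principal("hdfs/nn1.example.com@EXAMPLE.COM"))
```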
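Kerberos authentication flow: the client first authenticates to the Authentication Server (AS) and receives a TGT together with a session key encrypted under the client's own key. It then presents the TGT to the Ticket‑Granting Server (TGS) to obtain a service ticket, and finally presents that service ticket to the target service, which verifies it and uses the enclosed session key to protect further communication.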
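The ticket exchange can be illustrated with a toy simulation. This is a sketch under loud assumptions: seal and unseal are a stdlib-only stand-in built from SHA-256 and HMAC, not the encryption Kerberos actually uses, and the model leaves out authenticators, timestamps, and ticket lifetimes. All principals, keys, and function names are hypothetical.

```python
import hashlib
import hmac
import json
import os


def _keystream(key, nonce, length):
    """Toy keystream derived from SHA-256 (illustrative only, not real Kerberos crypto)."""
    blocks = [hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
              for i in range(length // 32 + 1)]
    return b"".join(blocks)[:length]


def seal(key, payload):
    """Encrypt and authenticate a dict so only a holder of `key` can open it."""
    plain = json.dumps(payload).encode()
    nonce = os.urandom(16)
    cipher = bytes(a ^ b for a, b in zip(plain, _keystream(key, nonce, len(plain))))
    tag = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    return nonce + tag + cipher


def unseal(key, blob):
    """Verify and decrypt a sealed blob; raise ValueError on a wrong key or tampering."""
    nonce, tag, cipher = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered ticket")
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return json.loads(plain)


# Long-term keys known to the Kerberos server (hypothetical values).
USER_KEYS = {"alice@EXAMPLE.COM": hashlib.sha256(b"alice-password").digest()}
TGS_KEY = os.urandom(32)
SERVICE_KEYS = {"hdfs/nn1.example.com@EXAMPLE.COM": os.urandom(32)}


def as_exchange(principal):
    """AS: return a session key sealed for the user, plus a TGT sealed for the TGS."""
    session = os.urandom(32).hex()
    tgt = seal(TGS_KEY, {"client": principal, "session": session})
    return seal(USER_KEYS[principal], {"session": session}), tgt


def tgs_exchange(tgt, service):
    """TGS: validate the TGT and issue a service ticket sealed with the service's key."""
    info = unseal(TGS_KEY, tgt)
    svc_session = os.urandom(32).hex()
    ticket = seal(SERVICE_KEYS[service], {"client": info["client"], "session": svc_session})
    return seal(bytes.fromhex(info["session"]), {"session": svc_session}), ticket


def service_accept(service, ticket):
    """Service: open the ticket with its own long-term key and learn who the client is."""
    return unseal(SERVICE_KEYS[service], ticket)["client"]


def demo():
    user, service = "alice@EXAMPLE.COM", "hdfs/nn1.example.com@EXAMPLE.COM"
    user_key = hashlib.sha256(b"alice-password").digest()  # derived from the typed password
    reply, tgt = as_exchange(user)
    tgt_session = bytes.fromhex(unseal(user_key, reply)["session"])
    reply, ticket = tgs_exchange(tgt, service)
    unseal(tgt_session, reply)  # client recovers the key it will share with the service
    return service_accept(service, ticket)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the client never sees the TGS key or the service key. It only forwards opaque tickets, and each party can open only what was sealed under a key it already holds.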
TDH Security Mechanisms
TDH provides three independent authentication layers:
Operating‑system level authentication on each server for user and service access.
KRB5LDAP system for service‑to‑service and user‑to‑service authentication.
Transwarp Manager authentication for manager access.
These layers are isolated, requiring administrators to maintain separate credentials for each, which is cumbersome.
Guardian Service
Guardian unifies the three authentication mechanisms, allowing a single username/password to access the OS, Transwarp Manager, and cluster services. It synchronizes credential information, simplifying management. While legacy command‑line tools for KRB5LDAP and Kerberos remain supported, using Guardian is recommended for unified control.
KRB5LDAP Overview
KRB5LDAP manages inter‑service and user‑to‑service access, providing Kerberos authentication for most services and LDAP authentication for Inceptor. Deploying it is optional in general, but multi‑tenant clusters need it to isolate each tenant's resources.
Authorization in TDH
Four services enforce fine‑grained permissions:
HDFS: rwx permissions on files and directories.
Hyperbase: RWCXA (Read, Write, Create, Exec, Admin) at global, namespace, table, column family, and column qualifier levels.
Inceptor: SQL‑based SELECT/CREATE/DELETE/UPDATE/INSERT permissions at global, database, table, and row levels.
Kafka: RWCD (Read, Write, Create, Delete) permissions at global or topic level.
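The HDFS entry in the list above follows the POSIX-style owner/group/other model. The sketch below shows how such an rwx check can be evaluated. The hdfs_allows function and the sample users are hypothetical, and the model ignores the HDFS superuser and ACL extensions.

```python
# Bit values of the three permission letters within one rwx triplet.
BITS = {"r": 4, "w": 2, "x": 1}


def hdfs_allows(mode, owner, group, user, user_groups, wanted):
    """Return True if `user` holds every permission in `wanted` (e.g. "rw").

    Exactly one class applies: owner if the user owns the path, otherwise
    group if the path's group is among the user's groups, otherwise other.
    """
    if user == owner:
        granted = (mode >> 6) & 7
    elif group in user_groups:
        granted = (mode >> 3) & 7
    else:
        granted = mode & 7
    needed = sum(BITS[letter] for letter in set(wanted))
    return granted & needed == needed


if __name__ == "__main__":
    # A directory with mode 750 owned by hdfs:hadoop (illustrative values).
    print(hdfs_allows(0o750, "hdfs", "hadoop", "alice", {"hadoop"}, "rx"))
    print(hdfs_allows(0o750, "hdfs", "hadoop", "alice", {"hadoop"}, "w"))
    print(hdfs_allows(0o750, "hdfs", "hadoop", "bob", {"guests"}, "r"))
```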
Transwarp Security Manual
The manual, available on Transpedia, contains three parts: an administrator guide for installing and configuring security components, a user guide for managing personal credentials in a secure cluster, and a component guide detailing authorization for each TDH service.
Conclusion and Outlook
The article outlines Transwarp’s approach to securing Hadoop clusters through Kerberos, KRB5LDAP, and Guardian. While Hadoop and Spark ecosystems still face fragmented permission models, Transwarp continues to improve its platform to offer a secure, efficient, and stable big‑data foundation for enterprises.
StarRing Big Data Open Lab
Focused on big data technology research, exploring the Big Data era
