
Exploring Big Data Cluster Security: Evaluation of Kerberos, Apache Sentry, and Apache Ranger

The article evaluates Kerberos, Apache Sentry, and Apache Ranger for securing Meitu’s large‑scale Hadoop ecosystem, highlighting Ranger’s comprehensive, fine‑grained, policy‑based authorization across HDFS, HBase, Hive, YARN, Storm, and Kafka, and describing its configuration, LDAP integration, and custom SDK implementation.

Meitu Technology

The article presents an initial exploration of security for a large‑scale big‑data cluster at Meitu, focusing on the need for fine‑grained access control across core Hadoop components such as HDFS, HBase, Hive, YARN, Storm, and Kafka.

Key requirements include multi‑component support, column‑level or path‑level permission granularity, and the use of open‑source, community‑driven solutions that require minimal architectural changes.

Background: Big Data Security Components

Three widely used security solutions are compared:

Kerberos – a symmetric‑key based authentication protocol that provides SSO for Hadoop services.

Apache Sentry – Cloudera’s role‑based, fine‑grained authorization framework (limited component coverage).

Apache Ranger – Hortonworks’ policy‑based authorization system with extensive component support and audit capabilities.

Kerberos

Kerberos authenticates clients with a Ticket Granting Ticket (TGT) and per-service tickets, which prevents a rogue process from impersonating a DataNode, RegionServer, or other cluster service. It has two drawbacks: tickets are temporary, so clients must re-authenticate when they expire, and Kerberos by itself offers no fine-grained resource control without additional LDAP integration.

Apache Sentry

Sentry offers fine‑grained HDFS metadata control and column‑level Hive permissions, simplifies management through role‑based policies, provides a unified UI, and integrates with Kerberos. However, it does not support many components (e.g., HBase, YARN, Kafka, Storm).

Apache Ranger

Ranger provides fine‑grained, policy‑driven access control, supports a wide range of components (HDFS, HBase, Hive, YARN, Kafka, Storm), integrates with Kerberos, and offers REST APIs for custom development. Its permission model defines user‑resource‑permission triples using AllowACL/DenyACL rules.

Permission Model

Users and groups are represented as principals. Resources differ per component (e.g., HDFS FilePath, HBase Table/Column-family/Column, Hive Database/Table/Column, YARN Queue). Permissions are expressed as Allow/Deny ACLs, with component-specific actions such as Read/Write/Execute for HDFS, Create/Admin for HBase, Select/Create/Update/Drop/Alter for Hive, and submit-app/admin-queue for YARN.
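To make the triple model concrete, here is a minimal Python sketch of how allow and deny rules could be evaluated. It is not Ranger's code: the `Rule` class, the `is_allowed` function, the glob-style path matching, and the sample rules are all hypothetical, and the sketch assumes that a matching deny rule overrides any allow rule and that access is refused when nothing matches.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """One user-resource-permission triple (hypothetical shape)."""
    principal: str      # user or group name
    resource: str       # glob pattern, e.g. an HDFS path
    action: str         # e.g. "read", "write", "select"
    allow: bool = True  # False marks a deny rule


def is_allowed(rules, principals, resource, action):
    """Return True only if an allow rule matches and no deny rule does."""
    matching = [
        r for r in rules
        if r.principal in principals
        and r.action == action
        and fnmatchcase(resource, r.resource)
    ]
    if any(not r.allow for r in matching):
        return False  # assumption: deny takes precedence over allow
    return any(r.allow for r in matching)  # no match means no access


# Illustrative rules only: the group "analysts" may read the warehouse,
# but the user "alice" is denied the finance subtree.
RULES = [
    Rule("analysts", "/warehouse/*", "read"),
    Rule("alice", "/warehouse/finance/*", "read", allow=False),
]
```

With these rules, a request is checked by passing the caller's user name together with its groups, for example `is_allowed(RULES, {"alice", "analysts"}, "/warehouse/finance/q1.csv", "read")`.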

Implementation Details

Configuration changes for each component are shown below.

HDFS – modify hdfs-site.xml to enable permissions and set the authorizer class:

<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.inode.attributes.provider.class</name>
  <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>

HBase – update hbase-site.xml to enable security and specify Ranger coprocessors:

<property>
  <name>hbase.security.authorization</name>
  <value>true</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
</property>

Hive – edit hiveserver2-site.xml to enable authorization and set Ranger as the authorizer:

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory</value>
</property>

YARN – adjust yarn-site.xml to enable ACLs and specify Ranger’s authorizer:

<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.authorization-provider</name>
  <value>org.apache.ranger.authorization.yarn.authorizer.RangerYarnAuthorizer</value>
</property>
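A typo in a property name fails silently, so it can help to check a site file against the expected settings before restarting a service. The following is a minimal sketch of such a check, using only the Python standard library. The helper names and the sample file content are hypothetical; the expected values are copied from the yarn-site.xml properties above.

```python
import xml.etree.ElementTree as ET

# Expected values taken from the yarn-site.xml properties shown above.
EXPECTED_YARN = {
    "yarn.acl.enable": "true",
    "yarn.authorization-provider":
        "org.apache.ranger.authorization.yarn.authorizer.RangerYarnAuthorizer",
}

# Hypothetical file content with the authorizer property missing.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.acl.enable</name>
    <value>true</value>
  </property>
</configuration>"""


def read_site_properties(xml_text):
    """Parse Hadoop *-site.xml text into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_or_wrong(props, expected):
    """Return the expected property names that are absent or have another value."""
    return sorted(name for name, value in expected.items()
                  if props.get(name) != value)


problems = missing_or_wrong(read_site_properties(SAMPLE_YARN_SITE), EXPECTED_YARN)
```

The same two helpers work for hdfs-site.xml, hbase-site.xml, and hiveserver2-site.xml by swapping in a dictionary of the properties listed for that component.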

With these settings in place, each service loads its Ranger plugin at startup. The plugin pulls policies from Ranger through REST APIs, caches them locally, and applies them inside the service, so a policy change takes effect within roughly 30 seconds.
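The pull-and-cache behavior described above can be sketched as follows. This is a simplified illustration and not Ranger's plugin code: the `PolicyCache` class and the `fetch` callable are hypothetical, the refresh happens lazily on lookup instead of on a background thread, and the 30-second interval is taken from the figure quoted in the text.

```python
import time


class PolicyCache:
    """Serve policies from a local copy, refreshing it at most once per interval."""

    def __init__(self, fetch, poll_interval=30.0, clock=time.monotonic):
        self._fetch = fetch                  # hypothetical callable that downloads policies
        self._poll_interval = poll_interval  # seconds between refresh attempts
        self._clock = clock                  # injectable for testing
        self._policies = None
        self._last_refresh = None

    def get(self):
        now = self._clock()
        due = (self._last_refresh is None
               or now - self._last_refresh >= self._poll_interval)
        if due:
            try:
                self._policies = self._fetch()
            except Exception:
                # Keep serving the last good copy if the policy server is
                # unreachable; fail only when there is no copy at all.
                if self._policies is None:
                    raise
            self._last_refresh = now
        return self._policies
```

Caching locally means authorization checks never wait on the network and keep working through a short outage of the policy server, at the cost of a delay before policy changes become visible.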

Meitu’s Practice

Based on the evaluation, Meitu selected Apache Ranger because it supports most of their stack, provides audit logs for troubleshooting, and offers an independent user model that simplifies integration with other systems. They implemented a lightweight migration strategy, added LDAP synchronization to apply group policies to individual users, built a custom SDK for Ranger API calls, and customized Ranger to align with internal standards.
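The article does not show how the group-to-user mapping was implemented, so the following is only a sketch of the idea: given group memberships (for example, as synchronized from LDAP) and the grants attached to each group, compute the grants each user effectively holds. The function name and both data shapes are hypothetical.

```python
def expand_group_policies(group_policies, memberships):
    """Compute per-user grants from group grants.

    group_policies: {group: iterable of (resource, action) pairs}
    memberships:    {user: iterable of group names}
    Returns {user: set of (resource, action)}; users with no grants are omitted.
    """
    user_policies = {}
    for user, groups in memberships.items():
        grants = set()
        for group in groups:
            grants |= set(group_policies.get(group, ()))
        if grants:
            user_policies[user] = grants
    return user_policies


# Illustrative data only.
GROUP_POLICIES = {
    "analysts": {("/warehouse/sales", "read")},
    "etl": {("/warehouse/sales", "write"), ("/staging", "write")},
}
MEMBERSHIPS = {
    "alice": ["analysts", "etl"],
    "bob": ["analysts"],
    "carol": [],
}

USER_POLICIES = expand_group_policies(GROUP_POLICIES, MEMBERSHIPS)
```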

The first‑phase work includes:

Extending Ranger‑Admin to map group policies to users.

Developing an SDK for easy invocation of Ranger services from the Meitu data platform.

Tailoring Ranger’s codebase to meet Meitu’s operational and maintenance requirements.
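Meitu's SDK is not published in this article, so the sketch below only illustrates what a thin wrapper around Ranger's REST interface might look like. The function names, host, and credentials are hypothetical. The endpoint path and the JSON field names follow Ranger's public v2 policy API as I understand it and should be checked against the documentation for the deployed Ranger version. The code builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_hdfs_policy(service, name, path, users, access_types):
    """Build a policy document granting `users` the given accesses on `path`."""
    return {
        "service": service,  # name of the HDFS service as registered in Ranger
        "name": name,
        "resources": {"path": {"values": [path], "isRecursive": True}},
        "policyItems": [{
            "users": list(users),
            "accesses": [{"type": a, "isAllowed": True} for a in access_types],
        }],
    }


def create_policy_request(base_url, username, password, policy):
    """Prepare (but do not send) a POST that would create the policy."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/service/public/v2/api/policy",  # assumed endpoint
        data=json.dumps(policy).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


# Hypothetical host, service name, and credentials for illustration.
policy = build_hdfs_policy("hadoopdev", "sales-read", "/warehouse/sales",
                           ["alice"], ["read", "execute"])
request = create_policy_request("http://ranger.example:6080/", "admin", "secret", policy)
```

Passing `request` to `urllib.request.urlopen` would submit it. Keeping payload construction separate from sending makes the wrapper easy to unit test without a running Ranger instance.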

Future articles will detail the second and third phases of this security rollout.
