Practical Guide to Apache Sentry and Kerberos Integration for Hadoop Access Control
This article explains the principles, architecture, features, and step‑by‑step deployment of Apache Sentry with Kerberos to provide role‑based access control across Hadoop components such as Hive, Impala, and HDFS, including command‑line examples and visual diagrams.
Continuing from the previous article on Kerberos fundamentals, this piece introduces Apache Sentry, an open‑source Apache project that provides fine‑grained authorization for Hadoop components and works together with Kerberos for authentication.
1. What Is Apache Sentry?
Apache Sentry offers permission control for big‑data components. After Kerberos verifies identities, Sentry manages which permissions a user or group has on various resources.
Sentry, originally released by Cloudera and graduated to a top‑level Apache project in March 2016, is a role‑based authorization module that integrates with Hive, Impala, Solr, HDFS, and HBase, providing a pluggable engine for defining and enforcing access rules.
2. Key Features
a) User and Group Mapping – Sentry relies on underlying authentication systems (Kerberos or LDAP) to identify users and uses Hadoop’s group‑mapping mechanism to see the same groups as other components.
Example: Users A and B belong to the finance group; a role Analyst is created with SELECT privilege on the Sales table, and the role is granted to the finance group.
b) Role‑Based Access Control (RBAC) – RBAC simplifies management of large numbers of users and data objects. Adding a new employee to the finance group automatically grants them access to the Sales table.
c) Unified Authorization – Once defined, access‑control rules apply across multiple data‑access tools.
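The finance-group example above can be sketched directly in Beeline SQL (the role, group, and table names come from the example; the GRANT syntax shown is the Sentry-enabled Hive dialect):

```sql
-- Create a role and grant it read access to the Sales table (run as a Sentry admin)
CREATE ROLE analyst;
GRANT SELECT ON TABLE sales TO ROLE analyst;
-- Grant the role to the finance group; users A and B inherit SELECT automatically,
-- as does any employee added to the group later
GRANT ROLE analyst TO GROUP finance;
```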
3. Sentry Architecture
The architecture consists of four main components:
Sentry Server: an RPC server that stores authorization metadata (in a MySQL-backed Sentry database) and exposes secure APIs for reading and updating it.
Policy Metadata: the stored permission policies themselves.
Data Engine: a component such as Hive or Impala that requests authorization decisions via its Sentry plugin.
Sentry Plugin: runs inside each data engine, fetching metadata from the Sentry Server to evaluate access requests.
4. Integration with the Hadoop Ecosystem
Sentry works with multiple Hadoop components. The Sentry Server stores authorization metadata and exposes APIs for secure retrieval and modification.
a) Hive and Sentry
Query Authorization : Sentry’s policy engine inserts a hook into HiveServer2; after query compilation, the hook extracts the accessed objects and converts them into SQL‑level permission requests.
Policy Operations : During compilation, Hive calls Sentry’s authorization task factory, which sends RPC requests to the Sentry Server to modify policies.
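The hook and task factory described above are enabled through HiveServer2 configuration; a minimal hive-site.xml sketch (the property names and classes are from the Apache Sentry Hive binding, while the sentry-site.xml path is an assumption):

```xml
<!-- Session hook that attaches Sentry's policy engine to HiveServer2 -->
<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<!-- Task factory that turns GRANT/REVOKE statements into Sentry Server RPCs -->
<property>
  <name>hive.security.authorization.task.factory</name>
  <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<!-- Client-side Sentry configuration location (assumed path) -->
<property>
  <name>hive.sentry.conf.url</name>
  <value>file:///etc/sentry/conf/sentry-site.xml</value>
</property>
```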
b) Impala and Sentry
Impala follows a similar flow but caches Sentry metadata in its Catalog service, allowing faster local authorization.
c) Sentry‑HDFS
Sentry extends its authorization checks to Hive‑stored data accessed via HDFS, Pig, MapReduce, or Spark, mapping Sentry privileges to HDFS ACLs (e.g., SELECT → read, INSERT → write, ALL → read + write).
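When the Sentry-HDFS sync is enabled, this mapping can be observed directly on the files backing a Hive table; a hedged sketch (the warehouse path and group name are illustrative):

```shell
# Inspect the ACLs that Sentry synchronizes onto a managed table's directory.
# The path and group below are illustrative; actual values depend on your cluster.
hdfs dfs -getfacl /user/hive/warehouse/sales
# A SELECT grant to the finance group appears as a read ACL entry such as
# group:finance:r-x, while INSERT maps to write and ALL to read + write.
```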
5. Practical Usage of Sentry with Kerberos
a) Integration Background
The Hadoop service super-users (hive, hdfs, hbase) must exist as Linux accounts, and their Kerberos principals (e.g., hive/_HOST) must exist in the KDC, for Sentry to recognize them.
b) Sentry with Hive
Navigate to the HiveServer2 process directory, obtain hive.keytab, and use kinit to acquire a TGT. Then create roles and grant privileges through Beeline.
# Switch to the HiveServer2 process directory and authenticate with the hive service keytab
cd /var/run/cloudera-scm-agent/process/ && cd `ls -t1 | grep -e "-HIVESERVER2"`
kinit -kt hive.keytab hive/cdh
# Create a read-only Linux user and a matching Kerberos principal (password set to userread)
useradd userread
kadmin.local > addprinc userread
# Export a keytab for the new principal (no realm given, so the default realm applies)
kadmin.local > xst -k dev.keytab -norandkey userread
# Log in to Hive via Beeline
beeline > !connect jdbc:hive2://localhost:10000/;principal=hive/cdh1
# Create a role, grant it read access, and assign it to the userread group
beeline > create role readrole;
beeline > grant select on server server1 to role readrole;
beeline > grant role readrole to group userread;
# General form for granting a privilege at server, database, or table scope
grant <select|insert|create> on <server|database|table> <name> to role <rolename>;
Because Impala belongs to the Hive group, privileges granted to userread in Hive also apply in Impala.
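Once the grants are in place, they can be verified from the read-only account; a sketch reusing the keytab and hostnames from the steps above (the sales table is an assumption for illustration):

```shell
# Authenticate as the restricted principal using its exported keytab
kinit -kt dev.keytab userread
# A read should succeed through Beeline...
beeline -u "jdbc:hive2://localhost:10000/;principal=hive/cdh1" -e "select * from sales limit 1;"
# ...and, because Impala reads the same Sentry policies, through impala-shell as well
impala-shell -k -q "select count(*) from sales;"
# A write should be rejected, since readrole carries only SELECT
beeline -u "jdbc:hive2://localhost:10000/;principal=hive/cdh1" -e "insert into sales values (1);"
```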
c) Sentry with HDFS
Using the HUE UI, log in as the HDFS super‑user, navigate to the security/authorization page, and assign read/write permissions on the /Data directory to the dev user. After saving, a shield icon appears next to the directory, indicating successful permission assignment.
6. Conclusion
This article has covered the theory, deployment, and practical usage of Kerberos and Apache Sentry in a production big-data cluster. Layering authentication and authorization in this way leaves an organization's data assets far better protected.
HaoDF Tech Team
HaoDF Online tech practice and sharing—join us to discuss and help create quality healthcare through technology.