Mastering Apache Ranger: Secure Hadoop Data Access with Real‑World Examples

This guide explains Apache Ranger’s role as a centralized security framework for Hadoop, detailing its core features, architecture, workflow, policy creation, auditing, field‑level masking, row‑level filtering, and how to automate policy management via its REST API and Java code.

MaGe Linux Operations

Apache Ranger, named after a park ranger, acts as the security administrator of the Hadoop ecosystem. It provides a centralized security management framework that enables fine‑grained data access control for components such as HDFS, Hive, HBase, and YARN.

Key functions of Apache Ranger

Centralized management of all security tasks through a web UI or REST API.

Fine‑grained control over Hadoop component operations.

Standardized authorization mechanisms.

Support for role‑based and attribute‑based access control.

Comprehensive auditing of user access and administrative actions.

At the time of writing, the latest version is 2.1.0, and 1.2.0 is still widely used.

Ranger Architecture

Ranger consists of three main components:

Ranger Admin: The core module. It provides the web UI and REST API used to define security policies.

Agent Plugin: Embedded in each Hadoop component. It pulls policies from Ranger Admin, enforces them, and records audit logs.

User Sync: Synchronizes users and groups from the operating system (or from LDAP/AD) into Ranger’s database.

Ranger Workflow

Administrators define policies in the Ranger Admin UI. After a policy is created and saved, the Agent Plugin fetches it (by default every 30 seconds) and caches it locally. When a user requests data from a Hadoop component, the Agent Plugin checks whether the cached policies authorize the request and returns the decision to the component, which then allows or denies the access. Changes made in the Admin UI reach the plugins on the next fetch.
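The fetch-and-cache cycle described above can be sketched in a few lines of Java. This is not Ranger's plugin code: the class name, the `Supplier` that stands in for the call to Ranger Admin, and the simplified "user to allowed paths" map are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Illustrative only: a plugin-side policy cache refreshed on a fixed interval. */
final class PolicyCacheSketch {
    // Snapshot of "user -> paths the user may access"; swapped atomically on refresh.
    private final AtomicReference<Map<String, Set<String>>> snapshot =
            new AtomicReference<>(Map.of());
    // Hypothetical stand-in for the HTTP call that downloads policies from Ranger Admin.
    private final Supplier<Map<String, Set<String>>> adminClient;

    PolicyCacheSketch(Supplier<Map<String, Set<String>>> adminClient) {
        this.adminClient = adminClient;
    }

    /** Pulls the latest policies; if the pull fails, the previous snapshot stays in effect. */
    void refresh() {
        try {
            snapshot.set(adminClient.get());
        } catch (RuntimeException e) {
            // Keep authorizing from the last good snapshot.
        }
    }

    /** Starts the periodic refresh, e.g. start(30) for the 30-second default mentioned above. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "policy-refresher");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    /** The authorization decision uses only the local snapshot, never a network call. */
    boolean isAllowed(String user, String path) {
        return snapshot.get().getOrDefault(user, Set.of()).contains(path);
    }

    public static void main(String[] args) {
        PolicyCacheSketch cache = new PolicyCacheSketch(
                () -> Map.of("admin", Set.of("/user/rangerpath/data")));
        cache.refresh();
        System.out.println(cache.isAllowed("admin", "/user/rangerpath/data"));
        System.out.println(cache.isAllowed("damp", "/user/rangerpath/data"));
    }
}
```

Because decisions are served from the snapshot, a policy saved in the Admin UI only takes effect after the next refresh, which matches the propagation delay described above.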

For Hive, two interfaces are provided for custom authorization:

org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizerFactory
org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizer

Ranger’s HiveAuthorizer implementation runs a PolicyRefresher thread that periodically pulls policies from Ranger Admin, writes them to a temporary JSON file, and authorizes requests against the cached policies.
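The role of the JSON cache file can be sketched as follows. This is a simplified illustration, not the PolicyRefresher source: the class name is hypothetical, the `Supplier` stands in for the download from Ranger Admin, and falling back to the file when the download fails is an assumption about how such a cache is typically used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Supplier;

/** Illustrative only: keeps the last downloaded policy JSON on disk as a fallback. */
final class PolicyFileCacheSketch {
    private final Path cacheFile;

    PolicyFileCacheSketch(Path cacheFile) {
        this.cacheFile = cacheFile;
    }

    /**
     * Returns the policy JSON to authorize with: the fresh copy when the download
     * succeeds (it is also written to the cache file), otherwise the copy on disk.
     */
    String refresh(Supplier<String> downloadFromAdmin) {
        String fresh;
        try {
            fresh = downloadFromAdmin.get();
        } catch (RuntimeException adminUnreachable) {
            return readCacheFile();
        }
        writeCacheFile(fresh);
        return fresh;
    }

    private void writeCacheFile(String json) {
        try {
            Files.write(cacheFile, json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private String readCacheFile() {
        try {
            return Files.exists(cacheFile)
                    ? new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8)
                    : "[]"; // nothing cached yet, so there are no policies
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "ranger-policy-sketch.json");
        PolicyFileCacheSketch cache = new PolicyFileCacheSketch(file);
        System.out.println(cache.refresh(() -> "[{\"name\":\"demo\"}]"));
        // Simulate Ranger Admin being unreachable: the cached file is used instead.
        System.out.println(cache.refresh(() -> { throw new IllegalStateException("admin down"); }));
    }
}
```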

Practical Ranger Operations

Ranger’s UI allows administrators to define users, roles, and permissions, providing a more user‑friendly alternative to traditional Unix/Linux permission models. Temporary policies can be created for short‑term authorizations and removed afterward.

Example: an HDFS policy grants the users admin and test0822-2 access to the directories /user, /user/rangerpath/, /user/rangerpath/data, and /user/rangerpath/data/allday. Because the policy does not enable recursion, it covers only the listed directories themselves, not any other subdirectories beneath them.
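The effect of the recursion flag can be illustrated with a small path matcher. This is a deliberately simplified sketch: Ranger's real resource matcher also handles wildcards and other options, and the class and method names here are hypothetical.

```java
import java.util.List;

/** Illustrative only: how a recursive flag changes which HDFS paths a policy covers. */
final class HdfsPathMatchSketch {
    /** Drops a trailing slash so "/user/rangerpath/" and "/user/rangerpath" compare equal. */
    static String normalize(String path) {
        return (path.length() > 1 && path.endsWith("/"))
                ? path.substring(0, path.length() - 1)
                : path;
    }

    /** True if the requested path is covered by one of the policy paths. */
    static boolean matches(String requestedPath, List<String> policyPaths, boolean recursive) {
        String requested = normalize(requestedPath);
        for (String policyPath : policyPaths) {
            String base = normalize(policyPath);
            if (requested.equals(base)) {
                return true; // exact match works with or without recursion
            }
            if (recursive) {
                // With recursion, everything below the policy path is covered too.
                String prefix = base.equals("/") ? "/" : base + "/";
                if (requested.startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/user", "/user/rangerpath/",
                "/user/rangerpath/data", "/user/rangerpath/data/allday");
        System.out.println(matches("/user/rangerpath/data/allday", paths, false));
        System.out.println(matches("/user/rangerpath/data/other", paths, false));
        System.out.println(matches("/user/rangerpath/data/other", paths, true));
    }
}
```

With recursion disabled, /user/rangerpath/data/other is not covered even though its parent directory is listed, which is why the example policy enumerates each directory level explicitly.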

The Audit tab displays login records, policy evaluation logs, and Agent Plugin status.

Ranger also supports fine‑grained Hive policies at the table and column level, along with column masking and row‑level filtering. For example, a masking policy on the lname column of the customer table shows user damp hashed values instead of the real ones.

A row‑level filter can prevent damp from seeing rows where fname='Sheri'.
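Both policies can also be expressed as JSON payloads for the policy REST API. The sketch below builds them by hand so that it needs no JSON library. The field names (policyType, dataMaskPolicyItems, rowFilterPolicyItems, dataMaskInfo, rowFilterInfo) follow Ranger's public v2 policy model as understood here and should be checked against the Ranger version in use. The service name hive_dev, the database default, and the generated policy names are hypothetical.

```java
/** Illustrative only: JSON payloads for a Hive masking policy and a row-filter policy. */
final class HivePolicyJsonSketch {
    /** Minimal JSON string escaping, sufficient for the values used in this sketch. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** One resource entry, e.g. "table":{"values":["customer"]}. */
    static String resource(String name, String value) {
        return quote(name) + ":{\"values\":[" + quote(value) + "]}";
    }

    /** A policy item granting select to one user, plus a masking or filter section. */
    static String selectItemFor(String user, String extraSection) {
        return "{\"users\":[" + quote(user) + "],"
                + "\"accesses\":[{\"type\":\"select\",\"isAllowed\":true}],"
                + extraSection + "}";
    }

    /** policyType 1 = data masking; maskType is a mask type name such as MASK_HASH. */
    static String maskingPolicy(String service, String db, String table,
                                String column, String user, String maskType) {
        return "{\"service\":" + quote(service)
                + ",\"name\":" + quote("mask_" + table + "_" + column)
                + ",\"policyType\":1"
                + ",\"resources\":{" + resource("database", db) + ","
                + resource("table", table) + "," + resource("column", column) + "}"
                + ",\"dataMaskPolicyItems\":["
                + selectItemFor(user, "\"dataMaskInfo\":{\"dataMaskType\":" + quote(maskType) + "}")
                + "]}";
    }

    /** policyType 2 = row filter; filterExpr is applied like a WHERE condition. */
    static String rowFilterPolicy(String service, String db, String table,
                                  String user, String filterExpr) {
        return "{\"service\":" + quote(service)
                + ",\"name\":" + quote("filter_" + table)
                + ",\"policyType\":2"
                + ",\"resources\":{" + resource("database", db) + ","
                + resource("table", table) + "}"
                + ",\"rowFilterPolicyItems\":["
                + selectItemFor(user, "\"rowFilterInfo\":{\"filterExpr\":" + quote(filterExpr) + "}")
                + "]}";
    }

    public static void main(String[] args) {
        // Hash the lname column of customer for user damp.
        System.out.println(maskingPolicy("hive_dev", "default", "customer", "lname", "damp", "MASK_HASH"));
        // Hide rows where fname = 'Sheri' from user damp.
        System.out.println(rowFilterPolicy("hive_dev", "default", "customer", "damp", "fname <> 'Sheri'"));
    }
}
```

Note that the row filter is written as the condition for rows the user may see, so hiding the Sheri rows means filtering on fname <> 'Sheri'.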

Note: Users must log into Ranger Admin with the OS account that corresponds to the Hive administrator; otherwise, policy configuration will fail.

Batch Operations on Ranger Policies

Manually adding policies is error‑prone and inefficient. A Java utility can batch‑create, update, delete, and query policies via the Ranger REST API, dramatically improving operational efficiency.

// Assumed imports (not shown in the original): javax.xml.bind.DatatypeConverter,
// java.nio.charset.StandardCharsets, com.google.gson.Gson, and an HTTP client library
// that provides HttpRequest/HttpResponse with get/post/put/delete factory methods.

// Sends a single request to the Ranger REST API using HTTP Basic authentication.
public ApiResult execRangerApi(String url, String method, String requestBody) {
    HadoopConfig.Ranger ranger = this.hadoop.getRanger();
    String fullUrl = ranger.getApiBaseUrl() + url;
    String auth = ranger.getUser() + ":" + ranger.getPassword();
    String authInfo = DatatypeConverter.printBase64Binary(auth.getBytes(StandardCharsets.UTF_8));

    HttpRequest request;
    if (method.equalsIgnoreCase("GET")) {
        request = HttpRequest.get(fullUrl);
    } else if (method.equalsIgnoreCase("POST")) {
        request = HttpRequest.post(fullUrl);
    } else if (method.equalsIgnoreCase("PUT")) {
        request = HttpRequest.put(fullUrl);
    } else if (method.equalsIgnoreCase("DELETE")) {
        request = HttpRequest.delete(fullUrl);
    } else {
        // Fail fast instead of hitting a NullPointerException on the request below.
        throw new IllegalArgumentException("Unsupported HTTP method: " + method);
    }

    request.header("Authorization", "Basic " + authInfo);
    request.header("Accept", "application/json");
    request.header("Content-Type", "application/json");
    request.header("X-XSRF-HEADER", "valid");
    if (requestBody != null && !requestBody.isEmpty()) {
        request.body(requestBody);
    }

    HttpResponse response = request.execute();
    ApiResult result = new ApiResult(this);
    result.setHttpCode(response.getStatus());
    result.setBodyRaw(response.body());
    return result;
}

// Creates the policy if it does not exist yet; otherwise updates the existing one.
public void savePolicy(String policyName, List<String> paths, boolean isPathAdd, String appUser, List<PolicyAccess> accesses, boolean isRecursive) {
    ApiResult result = null;
    Policy policy = getPolicyByName(policyName);
    ...
    Gson gson = new Gson();
    String policyJson = gson.toJson(policy);
    if (isNewPolicy) {
        logger.info("create policy, content:" + policyJson);
        result = execRangerApi("/public/v2/api/policy/", "POST", policyJson);
        if (result.getHttpCode() != 200)
            throw new DMCException(String.format("create policy failed! ranger return : %d, %s", result.getHttpCode(), result.getBodyRaw()));
        logger.info("create policy ok! " + policyName);
    } else {
        logger.info("edit policy, content:" + policyJson);
        result = execRangerApi("/public/v2/api/policy/" + policy.getId(), "PUT", policyJson);
        if (result.getHttpCode() != 200)
            throw new DMCException(String.format("edit policy failed! ranger return : %d, %s", result.getHttpCode(), result.getBodyRaw()));
        logger.info("edit policy ok! " + policyName);
    }
}
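For readers without the helper classes used above, the same create-policy request can be sketched with the HTTP client that ships with JDK 11 and later. This swaps in java.net.http in place of the HTTP library used by the utility. The class name, the base URL, and the credentials are hypothetical, while the headers and the /public/v2/api/policy/ path are taken from the code above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: builds (but does not send) a create-policy request for Ranger Admin. */
final class RangerRequestSketch {
    /** Value of the Authorization header for HTTP Basic authentication. */
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** POST request carrying the policy JSON, with the same headers as execRangerApi. */
    static HttpRequest createPolicyRequest(String apiBaseUrl, String user,
                                           String password, String policyJson) {
        return HttpRequest.newBuilder(URI.create(apiBaseUrl + "/public/v2/api/policy/"))
                .header("Authorization", basicAuth(user, password))
                .header("Accept", "application/json")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(policyJson))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical base URL and credentials; replace with real values before use.
        HttpRequest request = createPolicyRequest(
                "http://ranger-admin.example.com:6080/service", "admin", "change-me", "{}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and check the status code, as savePolicy does with the 200 check.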

Policies can also be applied in bulk with a simple curl command (the URL, token, and user name below are placeholders):

curl -H "Content-Type:application/json" -H "X-Token:token-name" -X POST "http://web-url&appUser=user-name" -d "[\"ranger-policy\"]"

In summary, Apache Ranger offers a rich set of security features for Hadoop components, enabling administrators to define, audit, and automate fine‑grained access policies, making it an essential tool for secure big‑data environments.

Reference: Apache Ranger official site; ZTE Ranger training materials.