Mastering Apache Ranger: Secure Hadoop Data Access with Real‑World Examples
This guide explains Apache Ranger’s role as a centralized security framework for Hadoop, detailing its core features, architecture, workflow, policy creation, auditing, field‑level masking, row‑level filtering, and how to automate policy management via its REST API and Java code.
Ranger, meaning “park ranger,” serves as the administrator for the Hadoop ecosystem, providing a centralized security management framework that enables fine‑grained data access control for components such as HDFS, Hive, HBase, and YARN.
Key functions of Apache Ranger
Centralized management of all security tasks through a web UI or REST API.
Fine‑grained control over Hadoop component operations.
Standardized authorization mechanisms.
Support for role‑based and attribute‑based access control.
Comprehensive auditing of user access and administrative actions.
At the time of writing, the latest version was 2.1.0, while 1.2.0 remained widely used.
Ranger Architecture
Ranger consists of three main components:
Ranger Admin: Core module with a web UI and REST API for defining security policies.
Agent Plugin: Embedded in Hadoop components, it pulls policies from Ranger Admin, enforces them, and records audit logs.
User Sync: Synchronizes OS users/groups to Ranger’s database.
Ranger Workflow
Administrators define policies in the Ranger Admin UI. After a policy is created and saved, the Agent Plugin fetches it (by default every 30 seconds) and caches it locally. When a user requests data from a Hadoop component, the Agent Plugin evaluates the request against the cached policies (an authorization check, not authentication) and returns the decision to the component, which then allows or denies the access. Changes made in the Admin UI reach the plugins on the next fetch.
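The fetch‑and‑cache loop described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: PolicyCacheSketch, its user‑to‑paths map, and the adminFetch supplier are hypothetical stand‑ins, not Ranger's real plugin classes or policy model.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-in for an agent plugin's policy cache; not Ranger's real classes. */
class PolicyCacheSketch {
    // Cached snapshot: user -> set of paths that user may access (a deliberately tiny model).
    private final AtomicReference<Map<String, Set<String>>> cache =
            new AtomicReference<>(Map.of());
    // Assumption: stands in for the call that pulls policies from Ranger Admin.
    private final Supplier<Map<String, Set<String>>> adminFetch;

    PolicyCacheSketch(Supplier<Map<String, Set<String>>> adminFetch) {
        this.adminFetch = adminFetch;
    }

    /** Pulls the latest policies; on failure the previous snapshot stays in use. */
    void refresh() {
        try {
            cache.set(adminFetch.get());
        } catch (RuntimeException e) {
            // Admin unreachable: keep enforcing the last known policies.
        }
    }

    /** Authorization check answered entirely from the local cache. */
    boolean isAllowed(String user, String path) {
        return cache.get().getOrDefault(user, Set.of()).contains(path);
    }

    /** Starts the periodic pull; pass 30 to mirror the default interval mentioned above. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleAtFixedRate(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
        return pool;
    }

    public static void main(String[] args) {
        PolicyCacheSketch plugin = new PolicyCacheSketch(
                () -> Map.of("admin", Set.of("/user/rangerpath/data")));
        plugin.refresh(); // one manual pull instead of waiting for the scheduler
        System.out.println(plugin.isAllowed("admin", "/user/rangerpath/data")); // true
        System.out.println(plugin.isAllowed("damp", "/user/rangerpath/data"));  // false
    }
}
```

Because every access check reads only the local snapshot, a slow or unavailable Admin server delays policy changes but does not block requests.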
For Hive, two interfaces are provided for custom authorization:
org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizerFactory
org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizer
HiveAuthorizer runs a PolicyRefresher thread that periodically pulls policies from Ranger Admin, writes them to a temporary JSON file, and uses the cached policies for authorization.
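The idea of persisting pulled policies to a JSON file can be illustrated as follows. This is a sketch, not Ranger's PolicyRefresher: the class name, the adminFetch supplier, and the fallback to the file when the fetch fails are assumptions made for the example, and the policy JSON is treated as an opaque string.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Supplier;

/** Hypothetical refresher that keeps the last fetched policy JSON on disk. */
class FileBackedRefresherSketch {
    private final Path cacheFile;
    // Assumption: returns the policy JSON from Ranger Admin, or throws if Admin is unreachable.
    private final Supplier<String> adminFetch;

    FileBackedRefresherSketch(Path cacheFile, Supplier<String> adminFetch) {
        this.cacheFile = cacheFile;
        this.adminFetch = adminFetch;
    }

    /** Returns the freshest policy JSON available, falling back to the file cache. */
    String loadPolicies() {
        String json;
        try {
            json = adminFetch.get();
        } catch (RuntimeException e) {
            return readCache(); // fetch failed: reuse the last copy written to disk
        }
        writeCache(json);
        return json;
    }

    private void writeCache(String json) {
        try {
            Files.write(cacheFile, json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private String readCache() {
        try {
            // Throws if no cache file has been written yet.
            return new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "ranger-sketch-cache.json");
        FileBackedRefresherSketch ok = new FileBackedRefresherSketch(file, () -> "{\"policies\":[]}");
        System.out.println(ok.loadPolicies()); // fetched from the supplier and written to the file
        FileBackedRefresherSketch down = new FileBackedRefresherSketch(file, () -> {
            throw new IllegalStateException("admin unreachable");
        });
        System.out.println(down.loadPolicies()); // served from the file written above
    }
}
```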
Practical Ranger Operations
Ranger’s UI allows administrators to define users, roles, and permissions, providing a more user‑friendly alternative to traditional Unix/Linux permission models. Temporary policies can be created for short‑term authorizations and removed afterward.
Example: an HDFS policy grants users admin and test0822-2 access to specific directories: /user, /user/rangerpath/, /user/rangerpath/data, and /user/rangerpath/data/allday. Because the policy does not enable recursion, only the listed directories are accessible, not anything beneath them.
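The effect of the recursion flag can be shown with a small path‑matching sketch. It is a simplification made for illustration: HdfsPathMatchSketch is a hypothetical helper, it ignores wildcards, and treating a trailing slash as insignificant is an assumption rather than documented Ranger behavior.

```java
import java.util.List;

/** Hypothetical, simplified path matcher; real Ranger matching also handles wildcards. */
class HdfsPathMatchSketch {

    /** True if requestPath is covered by policyPath under the given recursion flag. */
    static boolean matches(String policyPath, String requestPath, boolean recursive) {
        String policy = stripTrailingSlash(policyPath);
        String request = stripTrailingSlash(requestPath);
        if (request.equals(policy)) {
            return true; // exact directory match is always covered
        }
        // Descendants are covered only when recursion is enabled.
        String prefix = policy.equals("/") ? "/" : policy + "/";
        return recursive && request.startsWith(prefix);
    }

    /** Assumption: "/a/b/" and "/a/b" name the same directory. */
    static String stripTrailingSlash(String path) {
        return path.length() > 1 && path.endsWith("/")
                ? path.substring(0, path.length() - 1)
                : path;
    }

    static boolean anyMatch(List<String> policyPaths, String requestPath, boolean recursive) {
        return policyPaths.stream().anyMatch(p -> matches(p, requestPath, recursive));
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/user", "/user/rangerpath/",
                "/user/rangerpath/data", "/user/rangerpath/data/allday");
        // Listed directory: covered without recursion.
        System.out.println(anyMatch(paths, "/user/rangerpath/data", false));            // true
        // Child of a listed directory: not covered unless recursion is on.
        System.out.println(anyMatch(paths, "/user/rangerpath/data/allday/f1", false));  // false
        System.out.println(anyMatch(paths, "/user/rangerpath/data/allday/f1", true));   // true
    }
}
```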
The Audit tab displays login records, policy evaluation logs, and Agent Plugin status.
Ranger also supports fine‑grained Hive policies, including table‑level and column‑level permissions, column masking, and row‑level filtering. For example, a masking policy hides the lname column of the customer table from user damp, who sees hashed values instead of the real ones.
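What a hash mask does to a column value can be sketched as below. The choice of SHA‑256 and the maskHash helper are assumptions for illustration; the hash function actually applied by a given Ranger and Hive version may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration of hash masking; not the function Ranger or Hive ships. */
class HashMaskSketch {

    /** Replaces a column value with the hex form of its SHA-256 digest; null stays null. */
    static String maskHash(String value) {
        if (value == null) {
            return null;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every conforming Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same input always yields the same 64-character masked value,
        // so masked columns still group and join consistently.
        System.out.println(maskHash("Smith"));
        System.out.println(maskHash("Smith").equals(maskHash("Smith"))); // true
    }
}
```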
A row‑level filter can prevent damp from seeing rows where fname='Sheri'.
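Conceptually, a row‑level filter is a per‑user predicate applied before results are returned. The sketch below models rows as maps and uses a hypothetical FILTERS table; it mirrors a filter expression such as fname <> 'Sheri' and is not how Ranger or Hive implements filtering internally.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical model of per-user row filters; rows are column-name -> value maps. */
class RowFilterSketch {
    // Assumption: one filter per user. Like SQL's fname <> 'Sheri', a NULL fname is filtered out too.
    static final Map<String, Predicate<Map<String, String>>> FILTERS = Map.of(
            "damp", row -> row.get("fname") != null && !"Sheri".equals(row.get("fname")));

    /** Returns only the rows the given user is allowed to see. */
    static List<Map<String, String>> visibleRows(String user, List<Map<String, String>> rows) {
        Predicate<Map<String, String>> filter = FILTERS.getOrDefault(user, row -> true);
        return rows.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> customer = List.of(
                Map.of("fname", "Sheri", "lname", "Nowmer"),
                Map.of("fname", "Derrick", "lname", "Whelply"));
        System.out.println(visibleRows("damp", customer).size());  // 1: the Sheri row is hidden
        System.out.println(visibleRows("admin", customer).size()); // 2: no filter for admin
    }
}
```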
Note: Users must log into Ranger Admin with the OS account that corresponds to the Hive administrator; otherwise, policy configuration will fail.
Batch Operations on Ranger Policies
Manually adding policies is error‑prone and inefficient. A Java utility can batch‑create, update, delete, and query policies via the Ranger REST API, dramatically improving operational efficiency.
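Before sending anything, such a utility has to assemble the policy as JSON. The sketch below builds a minimal HDFS policy body by hand so that it needs no JSON library. The field names (service, name, resources.path, policyItems, accesses) reflect the policy model of Ranger's public v2 API as commonly documented and should be confirmed against the Ranger version in use; the service name hadoopdev and the policy name are made up for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles a minimal HDFS policy body as a JSON string. */
class PolicyJsonSketch {

    /** Quotes one JSON string, escaping backslashes and double quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders a list of strings as a JSON array. */
    static String quoteAll(List<String> values) {
        return values.stream().map(PolicyJsonSketch::quote)
                .collect(Collectors.joining(",", "[", "]"));
    }

    static String hdfsPolicy(String service, String name, List<String> paths,
                             boolean recursive, List<String> users, List<String> accessTypes) {
        // One access entry per permission type, e.g. read, write, execute.
        String accesses = accessTypes.stream()
                .map(t -> "{\"type\":" + quote(t) + ",\"isAllowed\":true}")
                .collect(Collectors.joining(",", "[", "]"));
        return "{"
                + "\"service\":" + quote(service) + ","
                + "\"name\":" + quote(name) + ","
                + "\"resources\":{\"path\":{\"values\":" + quoteAll(paths)
                + ",\"isRecursive\":" + recursive + "}},"
                + "\"policyItems\":[{\"users\":" + quoteAll(users)
                + ",\"accesses\":" + accesses + "}]"
                + "}";
    }

    public static void main(String[] args) {
        String body = hdfsPolicy("hadoopdev", "rangerpath-policy",
                List.of("/user/rangerpath/data"), false,
                List.of("admin", "test0822-2"), List.of("read", "execute"));
        System.out.println(body);
    }
}
```

In real code a JSON library is the safer choice; the hand‑built string only keeps the example self‑contained.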
// DatatypeConverter is javax.xml.bind.DatatypeConverter; StandardCharsets is java.nio.charset.StandardCharsets.
public ApiResult execRangerApi(String url, String method, String requestBody) {
    HadoopConfig.Ranger ranger = this.hadoop.getRanger();
    String baseUrl = ranger.getApiBaseUrl();
    String user = ranger.getUser();
    String password = ranger.getPassword();
    String fullUrl = baseUrl + url;
    // HTTP Basic credentials: base64("user:password").
    String auth = user + ":" + password;
    String authInfo = DatatypeConverter.printBase64Binary(auth.getBytes(StandardCharsets.UTF_8));
    HttpRequest request;
    if (method.equalsIgnoreCase("GET")) {
        request = HttpRequest.get(fullUrl);
    } else if (method.equalsIgnoreCase("POST")) {
        request = HttpRequest.post(fullUrl);
    } else if (method.equalsIgnoreCase("PUT")) {
        request = HttpRequest.put(fullUrl);
    } else if (method.equalsIgnoreCase("DELETE")) {
        request = HttpRequest.delete(fullUrl);
    } else {
        // Fail fast instead of hitting a NullPointerException below.
        throw new IllegalArgumentException("unsupported HTTP method: " + method);
    }
    request.header("Authorization", "Basic " + authInfo);
    request.header("Accept", "application/json");
    request.header("Content-Type", "application/json");
    request.header("X-XSRF-HEADER", "valid");
    if (requestBody != null && !requestBody.isEmpty()) {
        request.body(requestBody);
    }
    HttpResponse response = request.execute();
    ApiResult result = new ApiResult(this);
    result.setHttpCode(response.getStatus());
    result.setBodyRaw(response.body());
    return result;
}

public void savePolicy(String policyName, List<String> paths, boolean isPathAdd, String appUser, List<PolicyAccess> accesses, boolean isRecursive) {
    ApiResult result = null;
    Policy policy = getPolicyByName(policyName);
    ...
    Gson gson = new Gson();
    if (isNewPolicy) {
        // No existing policy with this name: create it.
        logger.info("create policy, content:" + gson.toJson(policy));
        result = execRangerApi("/public/v2/api/policy/", "POST", gson.toJson(policy));
        if (result.getHttpCode() != 200)
            throw new DMCException(String.format("create policy failed! ranger return : %d, %s", result.getHttpCode(), result.getBodyRaw()));
        logger.info("create policy ok! " + policyName);
    } else {
        // Policy already exists: update it by id.
        logger.info("edit policy, content:" + gson.toJson(policy));
        result = execRangerApi("/public/v2/api/policy/" + policy.getId(), "PUT", gson.toJson(policy));
        if (result.getHttpCode() != 200)
            throw new DMCException(String.format("edit policy failed! ranger return : %d, %s", result.getHttpCode(), result.getBodyRaw()));
        logger.info("edit policy ok! " + policyName);
    }
}

Policies can also be applied in bulk using a simple curl command:
curl -H "Content-Type:application/json" -H "X-Token:token-name" -X POST "http://web-url&appUser=user-name" -d "[\"ranger-policy\"]"

In summary, Apache Ranger offers a rich set of security features for Hadoop components, enabling administrators to define, audit, and automate fine‑grained access policies, making it an essential tool for secure big‑data environments.
Reference: Apache Ranger official site; ZTE Ranger training materials.
MaGe Linux Operations
