Building a Big Data Security Center with Apache Ranger: Practices and Technical Insights from NetEase

This article presents NetEase's practical experience of constructing a big‑data security center using Apache Ranger, covering Ranger's core features, a comprehensive security solution, detailed technical analyses, and the outcomes of commercializing the platform across multiple enterprise environments.

DataFunTalk

The presentation introduces NetEase's practice of building a big‑data security center based on Apache Ranger, outlining the agenda: Ranger overview, overall solution, key technology analysis, and results.

1. Apache Ranger Overview

Apache Ranger is a security framework for the Hadoop ecosystem. It centralizes policy management in Ranger Admin and enforces policies through plugins for components such as Hive, Spark, and Presto. Its advantages include authorization integration across many components, high-performance policy caching, flexible ABAC/RBAC models, and its status as a de facto standard since CDP 7.0.
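The policy caching mentioned above can be illustrated with a minimal sketch: a plugin keeps a local copy of the policies, refreshes it only when the admin side reports a newer version, and answers access checks from memory. Everything here is hypothetical and for illustration only (`Policy`, `PolicySnapshot`, `PluginPolicyCache`, `fake_admin_fetch`, and the exact-match lookup); it is not Ranger's actual API or policy engine, which also handles wildcards, deny rules, and conditions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy: one user, one resource, a set of access types."""
    resource: str
    user: str
    accesses: FrozenSet[str]


@dataclass(frozen=True)
class PolicySnapshot:
    """Hypothetical full policy set tagged with a monotonically increasing version."""
    version: int
    policies: Tuple[Policy, ...]


class PluginPolicyCache:
    """Plugin-side cache: access checks never leave the process."""

    def __init__(self, fetch: Callable[[int], Optional[PolicySnapshot]]) -> None:
        # fetch(last_known_version) returns a newer snapshot, or None if unchanged.
        self._fetch = fetch
        self._version = -1
        self._index: Dict[Tuple[str, str], Set[str]] = {}

    def refresh(self) -> bool:
        """Pull policies if the admin side has a newer version. Returns True on update."""
        snapshot = self._fetch(self._version)
        if snapshot is None:
            return False
        index: Dict[Tuple[str, str], Set[str]] = {}
        for policy in snapshot.policies:
            index.setdefault((policy.resource, policy.user), set()).update(policy.accesses)
        # Swap in the new index only after it is fully built.
        self._index = index
        self._version = snapshot.version
        return True

    def is_allowed(self, user: str, resource: str, access: str) -> bool:
        """Exact-match lookup against the in-memory index (assumption: no wildcards)."""
        return access in self._index.get((resource, user), set())


def fake_admin_fetch(last_known_version: int) -> Optional[PolicySnapshot]:
    """Stand-in for the admin service: serves version 3 to any older client."""
    if last_known_version >= 3:
        return None
    return PolicySnapshot(
        version=3,
        policies=(Policy("sales.orders", "alice", frozenset({"select"})),),
    )
```

In this sketch, a second call to `refresh()` returns `False` because the cached version already matches, so steady-state checks cost only a dictionary lookup.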

2. Overall Evaluation

Five essential criteria for a security platform are discussed: security, precision (least privilege), manageability, efficiency, and performance. Ranger excels in security and performance but is weaker in precision, manageability, and efficiency, because its design is policy-centric and its search dimensions are limited.

3. Solution Architecture

NetEase's security center consists of three layers: a Hadoop cluster layer, a service layer (including Ranger, its plugins, and the security center), and a business layer (authorization approval, whitelist, freeze directory, etc.). The core design principles are low coupling, low intrusion, traffic sharding, and consistency guarantees.

4. Key Technical Analyses

Unified permission search model built on Ranger change‑log synchronization, providing multi‑dimensional query capabilities.

Authorization-check optimization and multi-cluster support, achieved by converting in-memory checks into network calls and by splitting Ranger vertically.

Authorization request payload optimization to reduce large in-memory messages.

Freeze directory mechanism to protect critical HDFS paths from accidental deletion.

Dynamic data masking with generalized rules and whitelist handling to simplify large‑scale column‑level masking.

Audit and governance enhancements by streaming policy hit information to Kafka and correlating with data‑warehouse dimensions.

Ownership handling by augmenting metadata‑center owner information and introducing recursive owner semantics.

Spark permission and masking integration through execution‑plan modifications, ensuring consistent row‑level security and masking.

Community‑level Ranger improvements addressing finalize‑based memory release and concurrency bugs in policy versioning.

Commercialization considerations for multi‑environment deployments, compatibility with various Hadoop distributions, and agile version management.
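As one illustration of the items above, the freeze directory mechanism can be sketched as a path check performed before a delete is allowed. This is a minimal sketch under stated assumptions, not NetEase's implementation: the function name `is_delete_blocked` is hypothetical, and so is the rule that a delete is blocked when the target is a frozen directory, lies inside one, or is an ancestor of one (since a recursive delete of the ancestor would remove the frozen directory too).

```python
from pathlib import PurePosixPath
from typing import Iterable


def _is_same_or_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if `path` equals `root` or is located somewhere beneath it."""
    return path == root or root in path.parents


def is_delete_blocked(target: str, frozen_dirs: Iterable[str]) -> bool:
    """Hypothetical pre-delete check for HDFS-style absolute paths.

    Assumed semantics: block the delete if the target is frozen, is inside
    a frozen directory, or is an ancestor of a frozen directory.
    """
    target_path = PurePosixPath(target)
    for frozen in frozen_dirs:
        frozen_path = PurePosixPath(frozen)
        if _is_same_or_under(target_path, frozen_path):
            return True  # target is the frozen directory or inside it
        if _is_same_or_under(frozen_path, target_path):
            return True  # deleting target would remove the frozen directory
    return False
```

Comparing path components rather than raw string prefixes avoids treating a sibling such as `/warehouse/core_tmp` as part of a frozen `/warehouse/core`.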

5. Outcomes

The solution has been deployed internally at NetEase Cloud Music and NetEase Yanxuan and commercialized for external customers, demonstrating stable performance, high usability, and effective big-data security management.
