How Hudi MetaServer Transforms Metadata Management and Performance in Data Lakes
This article examines the problems caused by storing Hudi metadata on HDFS and introduces Hudi MetaServer, an independently developed service that provides centralized metadata storage, visual management, unified permission control, TTL, expression payloads, and multi‑active scaling. It also outlines planned enhancements such as Long‑Live Service (LLS), multi‑table fusion, and JDBC support.
Background
In large‑scale data lake environments, Hudi stores its metadata (the table timeline and related files) as many small files on HDFS. Every metadata lookup therefore turns into file‑system calls, which adds latency and puts heavy load on the NameNode. As data volume and variety grew, engineers needed a more efficient way to manage metadata while keeping write and query operations fast.
Introducing Hudi MetaServer
AsiaInfo Technology developed an independent metadata service called Hudi MetaServer. It centralizes metadata storage, provides a visual management console, and offers unified permission control. Serving metadata from one service instead of scattered HDFS files reduces NameNode pressure, shortens client initialization time, and makes metadata easier to inspect.
Key Features
Unified Permission Management: Centralizing metadata makes fine‑grained access control possible; MetaServer integrates with Apache Ranger and enforces consistent policies across Flink, Spark, Hive, and Trino.
Visual Service Platform: An intuitive UI lets users monitor and manage metadata, which speeds up troubleshooting.
Expression Payload: Aviator‑based expression payloads let users define custom data‑processing logic at table creation and apply it during ingestion, improving flexibility and performance.
Multi‑Active Horizontal Scaling: The architecture supports active‑active deployment and a dynamic optimal‑allocation strategy that keeps the service stable under heavy load.
Cross‑Platform Integration: Seamless catalog integration with Flink, Spark, Hive, Trino, and other engines.
TTL (Time‑To‑Live): Partition‑level and record‑level TTL automatically purge stale data, reducing storage cost and improving query efficiency.
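To make the expression‑payload idea concrete, here is a minimal Python sketch. It is not MetaServer's implementation: the class name ExpressionPayload and its merge() method are hypothetical, and each Aviator expression is replaced by a plain Python callable that receives the stored record (or None on a first write) and the incoming record.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]
# Stand-in for an Aviator expression: (stored record or None, incoming record) -> value.
FieldExpr = Callable[[Optional[Record], Record], Any]


@dataclass
class ExpressionPayload:
    """Hypothetical payload that computes selected fields with per-field expressions."""
    field_exprs: Dict[str, FieldExpr] = field(default_factory=dict)

    def merge(self, stored: Optional[Record], incoming: Record) -> Record:
        # Plain upsert semantics first: the incoming record wins ...
        merged = dict(incoming)
        # ... then every field with an expression is recomputed from both versions.
        for name, expr in self.field_exprs.items():
            merged[name] = expr(stored, incoming)
        return merged


# Example: accumulate a running total and keep the largest timestamp seen.
payload = ExpressionPayload({
    "total": lambda old, new: (old["total"] if old else 0) + new["total"],
    "max_ts": lambda old, new: max(old["max_ts"], new["max_ts"]) if old else new["max_ts"],
})
print(payload.merge({"id": 1, "total": 10, "max_ts": 5},
                    {"id": 1, "total": 3, "max_ts": 4}))
# {'id': 1, 'total': 13, 'max_ts': 5}
```

Fields without an expression keep the incoming value, so in this sketch the payload falls back to ordinary upsert behavior when no expressions are configured.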
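The two TTL granularities can be sketched as follows, assuming date‑valued partition paths in yyyy-MM-dd form and an event‑time field named ts. Those assumptions and the function names are hypothetical; the sketch only shows which data each policy would select, not how MetaServer schedules or executes the purge.

```python
from datetime import date, datetime, timedelta
from typing import Any, Dict, Iterable, List


def expired_partitions(partitions: Iterable[str], today: date, ttl_days: int) -> List[str]:
    """Partition-level TTL: return partitions whose date falls before the TTL window.
    Assumes partition paths look like '2024-06-01' (hypothetical layout)."""
    cutoff = today - timedelta(days=ttl_days)
    return [p for p in partitions
            if datetime.strptime(p, "%Y-%m-%d").date() < cutoff]


def live_records(records: Iterable[Dict[str, Any]], now: datetime,
                 ttl: timedelta, ts_field: str = "ts") -> List[Dict[str, Any]]:
    """Record-level TTL: keep only records whose event time is inside the TTL window."""
    cutoff = now - ttl
    return [r for r in records if r[ts_field] >= cutoff]


parts = ["2024-06-01", "2024-06-03", "2024-06-09"]
print(expired_partitions(parts, date(2024, 6, 10), ttl_days=7))  # ['2024-06-01']

records = [{"id": 1, "ts": datetime(2024, 6, 9, 11)},
           {"id": 2, "ts": datetime(2024, 6, 10, 1)}]
print([r["id"] for r in live_records(records, datetime(2024, 6, 10, 12),
                                     timedelta(hours=24))])  # [2]
```

Partition‑level expiry acts on whole partitions at once, while record‑level expiry has to inspect individual rows, so the latter is finer‑grained but more expensive.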
Implementation Details
MetaServer adds a Table API, a Meta API, and a Permission API to the Hudi codebase, each with server and client implementations. For permission checks, MetaServerAuthorizer calls checkTablePermission() and delegates to either MetaServerDefaultAuthorizer or MetaServerRangerAuthorizer. On the engine side, clients such as HoodieTablePermissionMSClient make RPC calls to the TableService, which performs the actual checks.
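The delegation chain described above can be sketched in Python. Everything here is hypothetical: the *Sketch classes only mirror the roles of MetaServerAuthorizer, MetaServerDefaultAuthorizer, MetaServerRangerAuthorizer, the TableService, and HoodieTablePermissionMSClient; check_table_permission stands in for checkTablePermission(); the RPC hop is modeled as a direct method call; and the Ranger backend is reduced to a policy callable rather than a real Apache Ranger client.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical grant key: (user, database, table).
GrantKey = Tuple[str, str, str]
Policy = Callable[[str, str, str], bool]  # (user, "db.table", action) -> allowed?


class TableAuthorizer(ABC):
    """Common interface; check_table_permission mirrors checkTablePermission()."""

    @abstractmethod
    def check_table_permission(self, user: str, db: str, table: str, action: str) -> bool:
        ...


class DefaultAuthorizerSketch(TableAuthorizer):
    """Stand-in for MetaServerDefaultAuthorizer: grants kept inside MetaServer."""

    def __init__(self, grants: Dict[GrantKey, Set[str]]):
        self._grants = grants

    def check_table_permission(self, user, db, table, action):
        return action in self._grants.get((user, db, table), set())


class RangerAuthorizerSketch(TableAuthorizer):
    """Stand-in for MetaServerRangerAuthorizer; the policy engine is a plain callable."""

    def __init__(self, policy: Policy):
        self._policy = policy

    def check_table_permission(self, user, db, table, action):
        return bool(self._policy(user, f"{db}.{table}", action))


class MetaServerAuthorizerSketch:
    """Front door: picks one backend from config and delegates every check to it."""

    def __init__(self, backend: str,
                 grants: Optional[Dict[GrantKey, Set[str]]] = None,
                 policy: Optional[Policy] = None):
        self._delegate: TableAuthorizer
        if backend == "default":
            self._delegate = DefaultAuthorizerSketch(grants or {})
        elif backend == "ranger":
            if policy is None:
                raise ValueError("ranger backend needs a policy callable")
            self._delegate = RangerAuthorizerSketch(policy)
        else:
            raise ValueError(f"unknown authorizer backend: {backend}")

    def check_table_permission(self, user, db, table, action):
        return self._delegate.check_table_permission(user, db, table, action)


class TableServiceSketch:
    """Server side: runs the actual check on behalf of engine clients."""

    def __init__(self, authorizer: MetaServerAuthorizerSketch):
        self._authorizer = authorizer

    def check(self, user, db, table, action):
        return self._authorizer.check_table_permission(user, db, table, action)


class TablePermissionClientSketch:
    """Engine-side client; a real one would hold an RPC stub instead of the service."""

    def __init__(self, service: TableServiceSketch):
        self._service = service

    def require(self, user, db, table, action):
        if not self._service.check(user, db, table, action):
            raise PermissionError(f"{user} may not {action} {db}.{table}")
```

Because the engine‑side client in this sketch only talks to the TableService, switching from built‑in grants to Ranger policies changes the server configuration but not the engines, which is one way the same policies can apply across Flink, Spark, Hive, and Trino.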
Future Roadmap
Planned extensions include Long‑Live Service (LLS) for always‑on table services, multi‑table and multi‑index fusion, and JDBC protocol support, all aimed at improving the usability and performance of lake‑house workloads.
AsiaInfo Technology: New Tech Exploration
AsiaInfo's cutting‑edge ICT viewpoints and industry insights, featuring its latest technology and product case studies.
