Tencent Big Data Processing Suite and Gravitino: Unified Metadata and Permission Management
This article introduces Tencent's Big Data Processing Suite (TBDS) and the open‑source Gravitino project, explaining how they provide a unified metadata service and a comprehensive, extensible permission model to address data and permission islands across heterogeneous Hadoop and MPP ecosystems.
TBDS (Tencent Big Data Processing Suite) is a data platform built on the Hadoop and MPP ecosystems. It supports unified batch and stream processing, cloud-native data lakes, lakehouse (lake-warehouse integrated) architectures, and domestic data middle-platform scenarios.
The platform serves customers in finance, industry, media, and government, each with different data scales and requirements. Their data ends up spread across separate systems that do not share metadata, which creates data islands.
To break data islands, a unified metadata service based on Hive Metastore is employed, allowing both Hadoop and MPP engines to share metadata and supporting modern table formats such as Iceberg.
However, Hive Metastore lacks governance capabilities and a flexible metadata model. Existing permission solutions such as Ranger are service-oriented rather than data-oriented: policies are defined per service instead of following the data itself, which leaves a gap for unified access control.
Gravitino, an Apache-licensed open-source project, offers a unified metadata service and an open, extensible permission framework that works across public cloud, private cloud, and on-premises environments, and provides SDKs for various compute engines.
The Gravitino permission model follows RBAC, introducing concepts of Metalake (organization), Role (permission set), and User, with many‑to‑many relationships for flexible binding.
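The many-to-many binding between users and roles inside a Metalake can be illustrated with a minimal Python sketch. This is not Gravitino's API: the class names `Privilege`, `Role`, and `Metalake`, the method names, and the resource strings are all hypothetical and chosen only to mirror the concepts named above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    """Hypothetical privilege: an action on a named resource."""
    action: str    # e.g. "SELECT"
    resource: str  # e.g. "hive.sales.orders"


@dataclass
class Role:
    """A role is a named set of privileges."""
    name: str
    privileges: set = field(default_factory=set)


@dataclass
class Metalake:
    """Hypothetical organization-level container for roles and bindings."""
    name: str
    roles: dict = field(default_factory=dict)       # role name -> Role
    user_roles: dict = field(default_factory=dict)  # user name -> set of role names

    def create_role(self, name, privileges):
        self.roles[name] = Role(name, set(privileges))

    def grant_role(self, user, role_name):
        # A user may hold many roles, and a role may be held by many users.
        if role_name not in self.roles:
            raise KeyError(f"unknown role: {role_name}")
        self.user_roles.setdefault(user, set()).add(role_name)

    def is_allowed(self, user, action, resource):
        # Allowed if any role bound to the user contains the privilege.
        wanted = Privilege(action, resource)
        return any(
            wanted in self.roles[role_name].privileges
            for role_name in self.user_roles.get(user, ())
        )


# Illustrative usage with made-up names.
lake = Metalake("demo_org")
lake.create_role("analyst", [Privilege("SELECT", "hive.sales.orders")])
lake.create_role("engineer", [Privilege("INSERT", "hive.sales.orders")])
lake.grant_role("alice", "analyst")
lake.grant_role("alice", "engineer")
lake.grant_role("bob", "analyst")
```

In this sketch a permission check never looks at the user directly; it only unions the privileges of the roles the user holds, which is the core idea of RBAC.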
Gravitino’s architecture includes built‑in authentication, a RESTful API, and four types of permission plugins: native catalog plugins for Hadoop, Ranger catalog plugins, JDBC catalog plugins for MPP/databases, and cloud catalog plugins for IAM services.
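One way to picture the four plugin types is as interchangeable backends behind a single grant call. The sketch below is an assumption-laden illustration, not Gravitino code: the `PermissionPlugin` interface, the plugin class names, the `push_grant` helper, and the returned description strings are all hypothetical, and a real plugin would call the backend instead of returning text.

```python
from abc import ABC, abstractmethod


class PermissionPlugin(ABC):
    """Hypothetical interface: turn one grant into a backend-specific action."""

    @abstractmethod
    def grant(self, role: str, privilege: str, resource: str) -> str:
        ...


class NativeCatalogPlugin(PermissionPlugin):
    def grant(self, role, privilege, resource):
        return f"native: allow role {role} to {privilege} on {resource}"


class RangerCatalogPlugin(PermissionPlugin):
    def grant(self, role, privilege, resource):
        return f"ranger: add policy granting {privilege} on {resource} to role {role}"


class JdbcCatalogPlugin(PermissionPlugin):
    def grant(self, role, privilege, resource):
        # Illustrative statement only; real GRANT syntax varies by database.
        return f"jdbc: GRANT {privilege} ON {resource} TO {role}"


class CloudCatalogPlugin(PermissionPlugin):
    def grant(self, role, privilege, resource):
        return f"cloud: attach IAM policy allowing {privilege} on {resource} to role {role}"


# Hypothetical registry keyed by catalog kind.
PLUGINS = {
    "native": NativeCatalogPlugin(),
    "ranger": RangerCatalogPlugin(),
    "jdbc": JdbcCatalogPlugin(),
    "cloud": CloudCatalogPlugin(),
}


def push_grant(catalog_kind, role, privilege, resource):
    """Route a single logical grant to the plugin for the given catalog kind."""
    try:
        plugin = PLUGINS[catalog_kind]
    except KeyError:
        raise ValueError(f"no permission plugin for catalog kind {catalog_kind!r}") from None
    return plugin.grant(role, privilege, resource)
```

The design point is that callers express a grant once, in data-oriented terms, and the plugin layer decides how each underlying system enforces it.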
Authentication supports OAuth, Kerberos, and IAM. Administration is split across three roles: the Service Admin creates Metalakes, the Metalake Admin assigns roles, and regular Users create resources (catalogs, schemas, and tables).
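The division of duties among the three roles can be sketched as a simple lookup. The enum, the operation names, and the `can_perform` helper are hypothetical. The sketch also assumes that tiers do not inherit each other's operations, which the source does not specify.

```python
from enum import Enum


class Tier(Enum):
    SERVICE_ADMIN = "service_admin"
    METALAKE_ADMIN = "metalake_admin"
    USER = "user"


# Hypothetical operation names; assumes no inheritance between tiers.
ALLOWED_OPERATIONS = {
    Tier.SERVICE_ADMIN: {"create_metalake"},
    Tier.METALAKE_ADMIN: {"assign_role"},
    Tier.USER: {"create_catalog", "create_schema", "create_table"},
}


def can_perform(tier, operation):
    """Return True if the given tier is responsible for the operation."""
    return operation in ALLOWED_OPERATIONS[tier]
```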
The community, launched in December 2023, has grown to over 60 contributors, indicating active development and adoption.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.