Tencent Big Data Processing Suite and Gravitino: Unified Metadata and Permission Management

This article introduces Tencent's Big Data Processing Suite (TBDS) and the open‑source Gravitino project, explaining how they provide a unified metadata service and a comprehensive, extensible permission model to address data and permission islands across heterogeneous Hadoop and MPP ecosystems.

DataFunSummit

Jul 31, 2024

TBDS (Tencent Big Data Processing Suite) is a data platform built on the Hadoop and MPP ecosystems. It supports unified batch and stream processing, cloud-native data lakes, lake-warehouse integration, and domestic data middle-platform scenarios.

The platform serves customers across finance, industry, media, and government. Because their data scales and requirements differ widely, data ends up spread across heterogeneous systems, which creates data islands.

To break down these data islands, TBDS uses a unified metadata service based on Hive Metastore. Hadoop and MPP engines share the same metadata, and modern table formats such as Iceberg are supported.

However, Hive Metastore lacks governance capabilities and a flexible metadata model. Existing permission solutions such as Ranger are service-oriented rather than data-oriented: policies are organized around each service instead of around the data itself, which leaves a gap for unified access control.

Gravitino, an Apache‑licensed open‑source project, offers a unified metadata service and an open, extensible permission framework that works across public cloud, private cloud, and on‑premise environments, providing SDKs for various compute engines.

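The Gravitino permission model follows role-based access control (RBAC). It introduces three concepts: Metalake (an organization), Role (a set of permissions), and User. Users and roles are bound many-to-many, so a user can hold several roles and a role can be granted to several users.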
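
The RBAC relationships described above can be illustrated with a small in-memory sketch. This is not Gravitino's API: the class names, method names, and the `"table:sales.orders"` resource string are all hypothetical, chosen only to show how a many-to-many binding between users and roles inside one Metalake might be checked.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Role:
    """A named set of (resource, action) privileges. Hypothetical shape."""
    name: str
    privileges: set[tuple[str, str]] = field(default_factory=set)


@dataclass
class Metalake:
    """Hypothetical in-memory stand-in for one organization's scope."""
    name: str
    roles: dict[str, Role] = field(default_factory=dict)
    # Many-to-many binding: user name -> set of role names.
    bindings: dict[str, set[str]] = field(default_factory=dict)

    def create_role(self, name, privileges):
        self.roles[name] = Role(name, set(privileges))

    def grant_role(self, user, role_name):
        if role_name not in self.roles:
            raise KeyError(f"unknown role: {role_name}")
        self.bindings.setdefault(user, set()).add(role_name)

    def revoke_role(self, user, role_name):
        self.bindings.get(user, set()).discard(role_name)

    def is_allowed(self, user, resource, action):
        """A user is allowed if any role bound to them has the privilege."""
        return any(
            (resource, action) in self.roles[role_name].privileges
            for role_name in self.bindings.get(user, set())
        )


# Illustrative data only: one role shared by two users, one user with two roles.
lake = Metalake("demo_org")
lake.create_role("analyst", {("table:sales.orders", "SELECT")})
lake.create_role("etl", {("table:sales.orders", "INSERT")})
lake.grant_role("alice", "analyst")
lake.grant_role("bob", "analyst")
lake.grant_role("bob", "etl")

print(lake.is_allowed("alice", "table:sales.orders", "SELECT"))  # True
print(lake.is_allowed("alice", "table:sales.orders", "INSERT"))  # False
```

Because privileges attach to roles rather than to users, revoking the `etl` role from `bob` removes his insert right without touching the role definition or any other user.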

Gravitino’s architecture includes built‑in authentication, a RESTful API, and four types of permission plugins: native catalog plugins for Hadoop, Ranger catalog plugins, JDBC catalog plugins for MPP/databases, and cloud catalog plugins for IAM services.
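
One way to picture the plugin layer is as a dispatch table: a grant made once in the unified model is handed to whichever plugin matches the catalog's type, and that plugin translates it for the underlying system. The sketch below is an assumption about the general pattern, not Gravitino's implementation. The plugin keys, function names, and the strings each plugin returns are hypothetical placeholders; only the JDBC case uses a real syntax (a standard SQL `GRANT` statement).

```python
# Hypothetical plugins, one per category named above. Each one only
# describes the action it would take in the underlying system.

def native_plugin(principal, resource, privilege):
    return f"native: allow {principal} {privilege} on {resource}"


def ranger_plugin(principal, resource, privilege):
    return f"ranger: policy on {resource} granting {privilege} to {principal}"


def jdbc_plugin(principal, resource, privilege):
    # MPP engines and databases typically accept a SQL GRANT statement.
    return f"GRANT {privilege} ON {resource} TO {principal}"


def cloud_plugin(principal, resource, privilege):
    return f"iam: attach {privilege} on {resource} to {principal}"


# Registry keyed by an assumed catalog kind.
PLUGINS = {
    "native": native_plugin,
    "ranger": ranger_plugin,
    "jdbc": jdbc_plugin,
    "cloud": cloud_plugin,
}


def push_grant(catalog_kind, principal, resource, privilege):
    """Route one grant to the plugin registered for the catalog kind."""
    try:
        plugin = PLUGINS[catalog_kind]
    except KeyError:
        raise ValueError(
            f"no permission plugin for catalog kind {catalog_kind!r}"
        ) from None
    return plugin(principal, resource, privilege)


print(push_grant("jdbc", "alice", "sales.orders", "SELECT"))
# GRANT SELECT ON sales.orders TO alice
```

Keeping the translation behind a common call signature is what makes such a framework extensible: supporting a new backend means registering one more plugin, while the grant itself stays unchanged.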

Authentication supports OAuth, Kerberos, and IAM. Three roles divide the administrative work: the Service Admin creates Metalakes, the Metalake Admin assigns roles, and regular Users create resources (catalogs, schemas, and tables).
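
The division of duties among the three roles can be written down as a simple lookup. This is a minimal sketch under stated assumptions: the enum members and operation names are hypothetical, and the mapping is deliberately strict, following only the one-to-one assignment given above. Whether an admin can also perform the operations of the tiers below it is not stated in the source and is not modeled here.

```python
from enum import Enum


class Principal(Enum):
    SERVICE_ADMIN = "service_admin"
    METALAKE_ADMIN = "metalake_admin"
    USER = "user"


# Hypothetical operation names; the mapping mirrors the text above.
ALLOWED_OPERATIONS = {
    Principal.SERVICE_ADMIN: {"create_metalake"},
    Principal.METALAKE_ADMIN: {"assign_role"},
    Principal.USER: {"create_catalog", "create_schema", "create_table"},
}


def can_perform(principal, operation):
    """Return True if the operation belongs to this principal's tier."""
    return operation in ALLOWED_OPERATIONS[principal]


print(can_perform(Principal.SERVICE_ADMIN, "create_metalake"))  # True
print(can_perform(Principal.USER, "create_metalake"))           # False
```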

The community, launched in December 2023, has grown to over 60 contributors, indicating active development and adoption.
