Gravitino Powers TBDS Product Architecture Upgrade with a Unified Metadata Lake

This article explains how Tencent Cloud's TBDS platform evolves its architecture by adopting Apache Gravitino as a unified metadata lake, detailing the challenges of legacy versus new lakehouse designs, storage and compute separation, unified data access, permission management, and the resulting benefits for big‑data and AI workloads.

DataFunSummit

Dec 2, 2024

TBDS (Tencent Big Data Suite) is an enterprise‑grade, one‑stop big‑data platform that originally supported two product forms: a traditional Hadoop‑based stack and a newer lakehouse architecture. The platform unifies resources, data, user permissions, and operations to enable smooth migration between these forms.

The next‑generation lakehouse architecture introduces a storage‑compute separation model with support for HDFS, S3‑compatible object storage, and compute engines such as Spark, Flink, Trino, and StarRocks. This design improves resource elasticity, fault isolation, and cost efficiency.
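Here is a minimal Python sketch of storage-compute separation. It assumes that compute engines reach data only through a location URI, so each table can pick its storage backend from the URI scheme. The `STORAGE_BACKENDS` table and the `resolve_backend` function are hypothetical illustrations, not TBDS or Gravitino code.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table. Real engines resolve schemes through
# their own filesystem configuration (for example Hadoop FileSystem settings).
STORAGE_BACKENDS = {
    "hdfs": "HDFS",
    "s3": "S3-compatible object storage",
    "s3a": "S3-compatible object storage",
}


def resolve_backend(location: str) -> str:
    """Return the storage backend family for a table or file location URI."""
    scheme = urlparse(location).scheme.lower()
    if scheme not in STORAGE_BACKENDS:
        raise ValueError(f"unsupported storage scheme {scheme!r} in {location!r}")
    return STORAGE_BACKENDS[scheme]


# The same logical table can live on either backend. Compute engines only see
# the URI, so the storage behind it can change without touching the engine.
for loc in ("hdfs://ns1/warehouse/sales/orders", "s3a://lake-bucket/warehouse/sales/orders"):
    print(loc, "->", resolve_backend(loc))
```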

During the transition from the monolithic to the separated architecture, TBDS faces challenges at the architecture, data, and resource layers, including fixed resource allocation, data silos, and low utilization of pooled resources.

To address these challenges, TBDS adopts Apache Gravitino as a unified metadata lake. Gravitino organizes metadata into four groups: Hive-style catalogs, relational databases, streaming sources, and AI model metadata. Together, these give both data and AI workloads a single source of truth.
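One way to picture a single source of truth is as one registry that holds catalogs from all four groups. The following is a hypothetical Python model. `MetadataGroup`, `CatalogEntry`, and `MetadataLake` are illustrative names, and the provider strings are example labels, not a verified list of Gravitino providers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class MetadataGroup(Enum):
    """The four metadata groups described above (names are illustrative)."""
    HIVE_STYLE = "Hive-style catalog"
    RELATIONAL = "relational database"
    STREAMING = "streaming source"
    AI_MODEL = "AI model metadata"


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    group: MetadataGroup
    provider: str  # example label such as "hive", "jdbc-mysql", or "kafka"


class MetadataLake:
    """Toy registry: every catalog, whatever its group, is looked up in one place."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        if entry.name in self._catalogs:
            raise ValueError(f"catalog {entry.name!r} already registered")
        self._catalogs[entry.name] = entry

    def get(self, name: str) -> CatalogEntry:
        return self._catalogs[name]

    def by_group(self, group: MetadataGroup) -> List[str]:
        return sorted(e.name for e in self._catalogs.values() if e.group is group)


lake = MetadataLake()
lake.register(CatalogEntry("hive_prod", MetadataGroup.HIVE_STYLE, "hive"))
lake.register(CatalogEntry("mysql_crm", MetadataGroup.RELATIONAL, "jdbc-mysql"))
lake.register(CatalogEntry("kafka_events", MetadataGroup.STREAMING, "kafka"))
lake.register(CatalogEntry("model_registry", MetadataGroup.AI_MODEL, "model"))
print(lake.by_group(MetadataGroup.STREAMING))
```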

Gravitino's core architecture defines a standard metadata model and APIs that address tables, filesets, and streams through catalog.db.table-style identifiers. It exposes a REST API and an Iceberg REST catalog interface. Query engines such as Spark, Flink, and Trino, and AI frameworks such as PyTorch and TensorFlow, can therefore access data in a uniform way.
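The identifier scheme can be sketched as follows. In Gravitino, catalogs sit under a top-level metalake, and the middle level is called a schema (the `db` in the article's notation). The path layout in `rest_path` is an assumption; check it against the Gravitino REST API reference. `TableIdentifier` is a hypothetical helper.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class TableIdentifier:
    catalog: str
    schema: str  # the "db" level in catalog.db.table
    table: str

    @classmethod
    def parse(cls, text: str) -> "TableIdentifier":
        """Parse a dotted catalog.db.table name, rejecting empty or extra parts."""
        parts = text.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.db.table, got {text!r}")
        return cls(*parts)

    def rest_path(self, metalake: str) -> str:
        # Assumed REST path layout; verify against the Gravitino REST API docs.
        segs = [quote(s, safe="") for s in (metalake, self.catalog, self.schema, self.table)]
        return "/api/metalakes/{}/catalogs/{}/schemas/{}/tables/{}".format(*segs)


ident = TableIdentifier.parse("hive_prod.sales.orders")
print(ident.rest_path("tbds"))
```

Engines see the same three-part name no matter which backend a catalog wraps. That shared naming is what lets one client path serve Hive, JDBC, or Iceberg catalogs alike.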

The system also supports two metadata service designs: a direct‑connect mode for real‑time access and a managed mode for governance and migration, allowing TBDS to run both legacy and new workloads side‑by‑side.
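The difference between the two service designs can be shown with a small Python sketch, based on one reading of the article. Both classes are hypothetical. `DirectConnectService` reads the underlying source on every call, so it is always current. `ManagedService` keeps its own copy, which changes only when `sync` is called. That controlled copy is the property a managed mode can build governance and staged migration on.

```python
from typing import Callable, Dict

Schema = Dict[str, str]  # column name -> type
Source = Callable[[str], Schema]


class DirectConnectService:
    """Direct-connect mode: every lookup goes to the underlying source."""

    def __init__(self, source: Source) -> None:
        self._source = source

    def get_table_schema(self, name: str) -> Schema:
        return dict(self._source(name))


class ManagedService:
    """Managed mode: serves its own stored copy, refreshed only by sync()."""

    def __init__(self, source: Source) -> None:
        self._source = source
        self._store: Dict[str, Schema] = {}

    def sync(self, name: str) -> None:
        # Copy the source schema into the managed store.
        self._store[name] = dict(self._source(name))

    def get_table_schema(self, name: str) -> Schema:
        if name not in self._store:
            raise KeyError(f"{name!r} has not been synced into the managed store")
        return dict(self._store[name])


# Usage: a dict stands in for a live source such as a Hive Metastore.
live = {"sales.orders": {"id": "bigint"}}
direct = DirectConnectService(lambda n: live[n])
managed = ManagedService(lambda n: live[n])
managed.sync("sales.orders")
live["sales.orders"]["amount"] = "decimal(10,2)"  # schema changes at the source
print(direct.get_table_schema("sales.orders"))   # includes "amount"
print(managed.get_table_schema("sales.orders"))  # unchanged until the next sync
```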

Beyond unified access, Gravitino provides a common permission model across heterogeneous data sources, simplifying authorization and reducing security complexity.
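A common permission model can be sketched as roles that hold privileges on securable objects, where a grant on a catalog or schema also covers everything beneath it. This is a simplified, hypothetical model. The privilege names are loosely styled after Gravitino's and are not a verified list.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


def scopes_of(obj: str) -> List[str]:
    """'hive_prod.sales.orders' -> ['hive_prod.sales.orders', 'hive_prod.sales', 'hive_prod']"""
    parts = obj.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


class AccessControl:
    """Hypothetical unified model: one set of roles and grants for every catalog type."""

    def __init__(self) -> None:
        self._grants: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)
        self._user_roles: DefaultDict[str, Set[str]] = defaultdict(set)

    def grant(self, role: str, obj: str, privilege: str) -> None:
        self._grants[role].add((obj, privilege))

    def assign(self, user: str, role: str) -> None:
        self._user_roles[user].add(role)

    def is_allowed(self, user: str, obj: str, privilege: str) -> bool:
        # A grant on the object itself or on any ancestor (schema, catalog) applies.
        return any(
            (scope, privilege) in self._grants.get(role, set())
            for role in self._user_roles.get(user, set())
            for scope in scopes_of(obj)
        )


acl = AccessControl()
acl.grant("analyst", "hive_prod.sales", "SELECT_TABLE")
acl.grant("analyst", "kafka_events", "CONSUME_TOPIC")
acl.assign("alice", "analyst")
print(acl.is_allowed("alice", "hive_prod.sales.orders", "SELECT_TABLE"))
```

A single check function covers Hive tables and Kafka topics alike. That is the simplification the article credits to a unified model, compared with configuring each source's own authorization separately.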

By integrating Gravitino, TBDS can build a comprehensive metadata lake that manages tables, files, streams, and model assets, supports various connectors (JDBC, Iceberg, Hudi, Paimon), and enables seamless data discovery, lineage, and pipeline orchestration.
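Lineage is essentially a directed graph over catalog.db.table identifiers. The hypothetical `LineageGraph` below is a generic breadth-first traversal, not Gravitino's lineage API. The identifiers are made up.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Iterable, Mapping, Set


class LineageGraph:
    """Directed graph: an edge source -> target means target is derived from source."""

    def __init__(self) -> None:
        self._down: DefaultDict[str, Set[str]] = defaultdict(set)
        self._up: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self._down[source].add(target)
        self._up[target].add(source)

    @staticmethod
    def _walk(start: str, edges: Mapping[str, Iterable[str]]) -> Set[str]:
        # Breadth-first search; the seen set also guards against cycles.
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def downstream(self, node: str) -> Set[str]:
        return self._walk(node, self._down)

    def upstream(self, node: str) -> Set[str]:
        return self._walk(node, self._up)


g = LineageGraph()
g.add_edge("kafka_events.raw.clicks", "hive_prod.ods.clicks")      # ingestion
g.add_edge("hive_prod.ods.clicks", "hive_prod.dws.daily_clicks")   # transformation
g.add_edge("hive_prod.dws.daily_clicks", "model_registry.ctr.v1")  # model training input
print(sorted(g.downstream("kafka_events.raw.clicks")))
```

If the start node sits on a cycle, it appears in its own result set. A production system might choose to exclude it.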

The unified metadata layer drives data intelligence: it connects data ingestion, storage, transformation, and downstream analytics or AI applications, turning metadata into the “brain” of the data ecosystem.

Gravitino has attracted a broad community of users and contributors, including Tencent, Xiaomi, Bilibili, and international companies like Pinterest and Yahoo, encouraging further open‑source development.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.