
Construction and Application of Tencent Oula Data Lineage System

This article details the design, architecture, modular implementation, and practical use cases of Tencent Oula's data lineage platform, covering background goals, system components, graph‑based algorithms, SQL parsing, cost‑allocation insights, and a Q&A session for data engineers and analysts.

Big Data Technology & Architecture

The article introduces Tencent Oula's data lineage platform, outlining its background, objectives, and the three sub‑products—Asset Factory, Governance Engine, and Data Discovery—that the platform supports.

It explains why data lineage is needed, describing current limitations such as insufficient coverage, coarse granularity, and lack of advanced graph models, and how the new lineage system addresses these issues.

Construction focuses on expanding both the breadth of lineage data (covering production, processing, and application across 20+ products) and its depth (task, table, field, and value lineage), with a special emphasis on AST‑level value lineage.

The project architecture is presented, including the choice of internal components (EasyGraph, ES, Meepo) over open‑source options such as Apache Atlas. Data flows from UniMeta metadata ingestion through ETL processing and SQL parsing (Calcite + ANTLR) to graph computation with Spark GraphX, and the results are stored in Redis for fast queries.
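To make the SQL parsing step concrete, here is a minimal sketch of extracting table-level lineage edges from a single statement. It is only an illustration: the article says the platform parses SQL with Calcite and ANTLR, whereas this toy uses regular expressions and handles only simple `INSERT ... SELECT` statements. The function name `table_lineage` and the sample table names are assumptions, not part of the original system.

```python
import re

# Assumption: regexes stand in for a real AST-based parser and only cover
# plain "INSERT INTO/OVERWRITE [TABLE] target SELECT ... FROM/JOIN source".
TARGET_RE = re.compile(
    r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql):
    """Return (source_table, target_table) edges for one simple statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        return []  # not a write statement, so no lineage edge is produced
    # Subqueries such as "FROM (SELECT ...)" are skipped by this toy pattern.
    sources = sorted(set(SOURCE_RE.findall(sql)))
    return [(src, target.group(1)) for src in sources]


# Hypothetical statement used only for illustration.
sample_sql = (
    "INSERT OVERWRITE TABLE dw.orders_daily "
    "SELECT o.id, u.name FROM ods.orders o JOIN ods.users u ON o.uid = u.id"
)
sample_edges = table_lineage(sample_sql)
```

Field-level and value-level lineage need the full syntax tree, which is why a real parser is required beyond this table-level sketch.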

Modular construction details include a unified UID scheme for entities, atomized edge creation for both data‑flow and correspondence relationships, and the use of GraphX for pre‑computing lineage paths.
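The ideas of a unified UID, atomized edges, and pre-computed lineage paths can be sketched as follows. This is a small in-memory illustration, not the platform's implementation: the article states that GraphX is used for pre-computation, while this sketch uses a plain breadth-first search. The UID format `type:part.part` and the names `make_uid` and `downstream_closure` are assumptions.

```python
from collections import defaultdict, deque


def make_uid(entity_type, *parts):
    """Build a UID such as 'field:dw.orders.amount' (hypothetical format)."""
    return f"{entity_type}:" + ".".join(parts)


def downstream_closure(edges):
    """Pre-compute, for every node, the set of all nodes reachable downstream.

    Each edge is one atomic (source_uid, target_uid) pair.
    """
    graph = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        graph[src].add(dst)
        nodes.update((src, dst))

    closure = {}
    for start in nodes:
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                # Skip visited nodes and the start node, so cycles terminate.
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    queue.append(nxt)
        closure[start] = seen
    return closure


# Hypothetical field-level chain: ods -> dw -> ads.
f_ods = make_uid("field", "ods", "orders", "amount")
f_dw = make_uid("field", "dw", "orders_daily", "amount")
f_ads = make_uid("field", "ads", "revenue", "total")
lineage = downstream_closure([(f_ods, f_dw), (f_dw, f_ads)])
```

Pre-computing the closure trades storage for query latency: a downstream lookup becomes a single key read, which fits the key-value storage in Redis that the article describes.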

Application scenarios are categorized into data governance, lineage queries, data‑warehouse development, baseline monitoring, and full‑link cost insight, illustrating how lineage supports governance, optimization, and cost allocation.
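As an illustration of how lineage could support full‑link cost allocation, the sketch below pushes each node's cost to its direct consumers in equal shares until the cost lands on the leaf applications. The equal-split rule, the function name `allocate_costs`, and the sample figures are all assumptions made for this example; the summary does not describe the platform's actual allocation policy. The lineage graph is assumed to be acyclic.

```python
from collections import defaultdict, deque


def allocate_costs(edges, own_cost):
    """Propagate costs downstream and return the cost borne by each leaf.

    edges: (upstream, downstream) lineage pairs forming a DAG.
    own_cost: cost incurred directly by each node (storage, compute, ...).
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(own_cost)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    total = {n: float(own_cost.get(n, 0.0)) for n in nodes}
    # Process nodes in topological order, starting from the sources.
    queue = deque(n for n in nodes if indegree[n] == 0)
    leaves = {}
    while queue:
        node = queue.popleft()
        consumers = children.get(node, [])
        if not consumers:
            leaves[node] = total[node]  # leaf: keeps everything it received
            continue
        share = total[node] / len(consumers)  # assumed equal split
        for consumer in consumers:
            total[consumer] += share
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                queue.append(consumer)
    return leaves


# Hypothetical chain: one source table feeds a warehouse table used by two apps.
leaf_costs = allocate_costs(
    [("ods.orders", "dw.orders_daily"),
     ("dw.orders_daily", "app_a"),
     ("dw.orders_daily", "app_b")],
    {"ods.orders": 100, "dw.orders_daily": 20},
)
```

Because every unit of cost is passed along until it reaches a leaf, the leaf totals sum to the total cost of the graph, so nothing is dropped or counted twice.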

A Q&A section addresses practical concerns such as building lineage from tasks to fields, accuracy evaluation, JDBC integration, user roles, Spark DataFrame support, underlying data structures and algorithms, and plans for external exposure of the platform.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

SQL parsing, Data Governance, graph algorithms, cost allocation, Tencent Oula
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
