Tag

Apache Atlas

DataFunTalk
Feb 26, 2023 · Big Data

Design, Optimization, and Use Cases of Data Lineage in ByteDance's DataLeap Platform

This article presents an in‑depth overview of DataLeap's data lineage capabilities, covering the challenges, multi‑layer model design, implementation with Apache Atlas and JanusGraph, performance optimizations, diverse use cases across asset, development, governance and security domains, and future trends for lineage technology.

Apache Atlas · Big Data · data governance
19 min read
ByteDance Data Platform
Jun 8, 2022 · Backend Development

How ByteDance Optimized Data Catalog Performance with Apache Atlas and JanusGraph

This article details ByteDance's 2021 overhaul of its Data Catalog system, the performance regressions encountered after switching to Apache Atlas, and the step‑by‑step backend optimizations—including JanusGraph tuning, Gremlin query refactoring, parallel processing, and write‑path improvements—that reduced latency from minutes to seconds.

Apache Atlas · Backend · JanusGraph
12 min read
Fangduoduo Tech
Feb 8, 2021 · Big Data

Why Build Your Own Data Lineage Engine? Lessons from Apache Atlas to Duo-Lineage

This article explains what data lineage is, why it is essential for data governance in large‑scale big‑data platforms, compares Apache Atlas with a custom solution, and details the technical choices, architecture, and performance optimizations behind the self‑built duo‑lineage system.

Apache Atlas · Big Data · SQL parsing
14 min read