Tag

Metadata Management


DataFunSummit
Jun 6, 2025 · Big Data

How Unicom Digital’s Integrated Data Platform Revolutionizes Metadata Management

This article details Unicom Digital's metadata management practice on its integrated data platform. It covers the strategic context for data, key challenges, award-winning capabilities, and a three-pronged solution (automation, linking+, and AI), along with practical implementations, full-chain lineage, data responsibility, lifecycle management, and planned AI-driven enhancements.

AI · Automation · Big Data
18 min read

DataFunSummit
Apr 13, 2025 · Big Data

Data Governance at Didi: Interview with Liu Chao on Big Data Asset Management

In this interview, Didi's data governance lead Liu Chao discusses his career path, the technical architecture of Didi's big data governance system, cost-based pricing models, metadata management, lineage extraction, and automation practices, and offers practical advice for enterprises pursuing effective data governance.

Automation · Big Data · Cost-based Pricing
12 min read

DataFunSummit
Feb 28, 2025 · Big Data

Apache Gravitino: Open‑Source Data Asset Management for AI and Multi‑Cloud Environments

This article introduces Apache Gravitino, an open-source metadata and data asset management platform designed for AI-driven data demands and multi-cloud environments. It covers the platform's architecture, core components, typical use cases, and real-world adoption stories, and closes with a Q&A session on its capabilities.

AI · Apache Gravitino · Big Data
18 min read

Architects' Tech Alliance
Jan 5, 2025 · Fundamentals

HadaFS: A New Burst Buffer File System for Scalable High‑Performance Computing

The article presents HadaFS, a burst-buffer-based distributed file system that combines the scalability of local burst buffers with the data-sharing advantages of shared burst buffers. It describes the LTA architecture, metadata handling, the Hadash management tool, and extensive performance evaluations on the SNS supercomputer.

Burst Buffer · File System · HPC Storage
18 min read

Bilibili Tech
Dec 17, 2024 · Big Data

Apache Gravitino: Metadata Management Practices and Production Experience at Bilibili

Bilibili adopted Apache Gravitino as a unified metadata platform that decouples metadata consumers from the underlying sources and consolidates schemas and Fileset-based unstructured data across heterogeneous systems. The article describes how this cut metadata and storage costs, resolved metadata inconsistencies, improved Hive Metastore performance, and enabled features such as Iceberg branching, with AI-oriented governance planned next.

Apache Gravitino · Big Data · Data Governance
20 min read

ByteDance Data Platform
Nov 27, 2024 · Big Data

Inside Douyin’s Data Asset Platform: Transforming Data Lineage and Governance

Douyin Group's data asset management platform takes a systematic "manage, find, use" approach that unifies metadata collection, full-coverage data lineage, and a suite of applications for development, governance, asset utilization, and security. The article outlines the platform's architecture, data modeling, quality metrics, and future roadmap.

Big Data · Data Governance · Data Lineage
14 min read

DataFunTalk
Nov 10, 2024 · Big Data

Douyin Group Data Asset Management Platform and Data Lineage Architecture Overview

This article provides a comprehensive overview of Douyin Group's data asset management platform, detailing the evolution, architecture, and applications of its large‑scale data lineage system, and discusses future directions for enhancing data quality, cost efficiency, and security across the organization.

Big Data · Data Governance · Data Lineage
15 min read

Bilibili Tech
Nov 1, 2024 · Big Data

Magnus: Intelligent Data Optimization Service for Iceberg Tables in Bilibili's Lakehouse Platform

Magnus is Bilibili's in-house service for continuously optimizing Iceberg tables. It schedules snapshot expiration, orphan-file cleanup, manifest rewriting, and multi-dimensional data optimizations such as small-file merging, sorting, data distribution, and index creation, and it automatically recommends table configurations based on real-time query logs. The article reports a task success rate above 99.9% and up to a 30% reduction in scanned data.

Data Lake · Iceberg · Intelligent Recommendation
15 min read

DataFunSummit
Aug 13, 2024 · Big Data

Data Cost Reduction and Efficiency: Qichacha's Data Architecture and Multi‑Cloud Unified Design

This article presents Qichacha's data cost reduction strategy: a Hadoop-based three-pillar architecture, a layered data warehouse, Hive upgrades, unified metadata across multi-cloud clusters, middleware such as Alluxio and JuiceFS, version-compatible hybrid cloud deployment, and Kubernetes-based resource orchestration, all aimed at scalable, low-cost data processing.

Big Data · Data Architecture · Data Warehouse
16 min read

DataFunSummit
Jul 31, 2024 · Big Data

Tencent Big Data Processing Suite and Gravitino: Unified Metadata and Permission Management

This article introduces Tencent's Big Data Processing Suite (TBDS) and the open-source Gravitino project, explaining how they provide a unified metadata service and an extensible permission model that break down data and permission silos across heterogeneous Hadoop and MPP ecosystems.

Big Data · Data Lake · Gravitino
12 min read

Bilibili Tech
Jul 19, 2024 · Big Data

Bilibili's One-Stop Big Data Cluster Management Platform (BMR) - Architecture and Implementation

Bilibili's one-stop Big Data Cluster Management Platform (BMR) consolidates HDFS, Spark, Flink, ClickHouse, Kafka, and other services into a unified system. It evolved through four stages (standardization, metadata-driven construction, containerization, and observability) that addressed node consistency, scaling, fault self-healing, and resource optimization. The platform now provides elastic scaling and automated start/stop, with further cost-saving and stability improvements planned.

Cluster Management · Containerization · Metadata Management
12 min read

Beijing SF i-TECH City Technology Team
May 30, 2024 · Big Data

Data Lineage System Design and Implementation for Big Data Platforms

This article presents a data lineage system (Data-Lineage) for big data platforms that addresses heterogeneous data sources, multiple execution engines, and complex dependencies through a hook-based architecture and a modular design.

Data Lineage · Metadata Management · SQL parsing
12 min read

vivo Internet Technology
May 29, 2024 · Operations

vivo CICD Artifact Management: Evolution and Implementation Practices

vivo's CICD artifact management has evolved from manual builds to Platform Management 2.0, which provides unified storage, support for multiple artifact types, version control, artifact promotion, security scanning, lifecycle policies, and fine-grained access control. According to the article, this has substantially reduced errors and operational costs.

Artifact Management · Artifact Promotion · CICD
15 min read

DataFunSummit
May 21, 2024 · Operations

Bilibili Data Governance Operational Framework Practice

This article presents Bilibili's data governance operating framework in practice. It introduces the DAMA-DMBOK methodology, walks through two real-world cases (a storage-level risk and a data-loss post-mortem), and outlines the organizational, metadata, and embedded governance mechanisms that drive cost and quality improvements.

DAMA-Bok · Data Governance · Metadata Management
19 min read

DataFunTalk
May 19, 2024 · Big Data

Tencent's Multi-Engine Unified Metadata and Permission Management for Big Data

This article introduces Tencent's Big Data Processing Suite (TBDS), discusses the challenges posed by data silos, and presents the open-source Gravitino unified metadata service and permission model, detailing how it integrates Hadoop, MPP, and various catalog plugins to provide consistent access control across heterogeneous data platforms.

Access Control · Big Data · Data Lake
12 min read

DataFunSummit
May 3, 2024 · Big Data

Comprehensive Guide to Enterprise Data Governance: Vision, Framework, Organization, Standards, Quality, and Security

This article presents a detailed overview of enterprise data governance, covering its vision and goals, three‑layer framework, organizational structure, institutional policies, data standards, quality management, metadata handling, security controls, lifecycle protection, and practical implementation cases.

Big Data · Data Governance · Metadata Management
14 min read

DataFunTalk
Mar 9, 2024 · Big Data

Construction and Application of Tencent Oula Data Lineage Platform

This article presents a comprehensive overview of Tencent Oula's data lineage system, detailing its background, goals, architecture, modular construction, key technologies such as graph databases and SQL parsing, and various internal application scenarios including data governance, cost insight, and baseline monitoring.

Big Data · Data Governance · Data Lineage
20 min read

Bitu Technology
Jan 17, 2024 · Artificial Intelligence

Rosetta Stone: Scalable ID Mapping System for Tubi's Content Library Using LLMs and Embeddings

This article describes how Tubi built Rosetta Stone, a flexible ID mapping workflow that uses large language models, embedding-based similarity ranking, and k-nearest-neighbor search to unify and enrich metadata across a 200,000-title library, improving content recommendations and streamlining operations.

Big Data · Embeddings · LLM
10 min read

DataFunSummit
Dec 1, 2023 · Big Data

Bilibili's Event Tracking Standardization: Practices, Challenges, and Future Directions

This article details Bilibili's approach to standardizing event tracking, covering its definition, the data pipeline, common business issues, metadata-driven management strategies, efficiency gains, and future prospects for unified real-time and batch processing.

Big Data · Bilibili · Data Standardization
21 min read

DataFunTalk
Sep 12, 2023 · Big Data

Building an Intelligent Data Governance Platform at NetEase Cloud Music: Architecture, Practices, and Future Plans

This article presents a case study of NetEase Cloud Music's metadata-driven intelligent governance platform, detailing its scale, the motivation for building it, its modular architecture, rule-based automation, practical deployment, and a future roadmap for sustainable data ecosystem management.

Automation · Big Data · Data Governance
22 min read