Tagged articles

metadata management

106 articles · Page 1 of 2

AI Large-Model Wave and Transformation Guide

May 29, 2026 · Big Data

How to Solve Data Governance + AI Agent Pitfalls: Agent Roles, NL2SQL Datasets, and Rule Templates Explained

The article analyzes why data‑governance projects still fail when combined with AI, presents a four‑layer NL2SQL architecture, details agent responsibilities, metadata‑governance methods, and anomaly‑diagnosis and permission‑control flows, outlines dataset‑building stages and evaluation metrics, and provides a step‑by‑step rollout roadmap.

AI Agent · Anomaly Detection · Data Governance

0 likes · 21 min read

DataFunSummit

May 22, 2026 · Big Data

How OPPO Accelerates Multimodal Data & AI Fusion with Gravitino and Curvine

OPPO tackles explosive multimodal data growth by unifying metadata with Gravitino and boosting I/O performance using the open‑source Curvine cache, delivering a four‑layer data‑lake architecture that resolves data islands, metadata chaos, and bandwidth bottlenecks while achieving near‑commercial query speeds.

Curvine · Gravitino · LanceDB

0 likes · 11 min read

DataFunSummit

Apr 20, 2026 · Industry Insights

How Apache Gravitino Solves Data Fragmentation in the Multi‑Cloud AI Era

In a Data for AI meetup, Datastrato's VP of Engineering Shi Shaofeng explains how Apache Gravitino's metadata federation, metalake architecture, and unified access control address multi‑cloud data fragmentation, compliance, and AI‑driven governance while outlining version 1.1.0 enhancements and the roadmap for 1.2.0.

AI data governance · Apache Gravitino · Multi-Cloud

0 likes · 12 min read

DataFunSummit

Apr 19, 2026 · Big Data

How OPPO Built a Multi‑Modal Data Lake with Gravitino and Curvine

OPPO’s data‑lake team, led by David, detailed their transition from Hive‑Spark to a unified multi‑modal lake, leveraging Gravitino for cross‑engine metadata management and the open‑source Curvine cache to eliminate data silos, boost I/O performance, and support massive image, recommendation, and AI‑Agent workloads.

Big Data · Data Lake · Multimodal

0 likes · 11 min read

dbaplus Community

Mar 31, 2026 · Industry Insights

Why Most Data Governance Projects Fail and How to Build a Practical, Engineer‑Friendly Solution

Data governance fails at most companies not because of technology but because it starts in the wrong direction, focusing on rules, platforms, and processes that add friction instead of improving data usability. The article provides a step‑by‑step, low‑overhead approach with concrete SQL and Python templates to fix it.

Data Governance · Engineering Productivity · Python

0 likes · 25 min read

DataFunSummit

Mar 25, 2026 · Big Data

How Apache Gravitino and OpenLineage Transform Data Governance for AI‑Driven Enterprises

In the era of AI and multi‑cloud, this article analyzes the core challenges of data governance—data silos, quality gaps, and compliance risks—and explains how Apache Gravitino’s unified metadata architecture together with OpenLineage’s standardized lineage model provide a scalable, automated solution for intelligent, real‑time data management.

Apache Gravitino · Big Data · Data Governance

0 likes · 15 min read

Big Data Tech Team

Jan 19, 2026 · Big Data

What Is Data Fabric and How It Can Eliminate Data Silos Today

This article explains the concept of Data Fabric, debunks common misconceptions, outlines the three key drivers behind its rise, and provides a practical four‑step roadmap—including metadata, semantic layers, policy engines, and AI—to help teams of any size adopt the technology.

AI · Cloud · Data Fabric

0 likes · 7 min read

DataFunSummit

Dec 1, 2025 · Big Data

7 Cutting-Edge Data Engineering Practices Shaping AI-Driven Data Lakes

This article collection showcases seven advanced data engineering solutions—from Tencent Cloud's Iceberg batch‑stream integration and Apache Gravitino metadata lineage to Xiaohongshu's Lakehouse evolution and multimodal AI data lake implementations—highlighting architectural innovations, performance optimizations, and real‑world deployment insights for modern big‑data platforms.

Apache Gravitino · Apache Iceberg · Batch-Stream Integration

0 likes · 7 min read

DataFunSummit

Nov 24, 2025 · Big Data

How Tencent Cloud Uses Iceberg, Gravitino and Multimodal Lakes for Unified Data Processing

This article series explores Tencent Cloud's Iceberg‑based batch‑stream integration, Apache Gravitino's unified metadata and lineage solution, Xiaohongshu's data‑architecture evolution for the Big AI Data era, and a practical Data+AI multimodal data‑lake implementation, highlighting challenges, architectural designs, and performance gains.

Big Data · Data Lake · Iceberg

0 likes · 7 min read

DataFunSummit

Nov 1, 2025 · Big Data

Douyin’s Data Asset Platform: Building Real‑Time, Full‑Coverage Big Data Lineage

This article introduces Douyin Group’s Data Asset Management Platform, explaining how its focus on data assets rather than raw metadata enables a comprehensive, real‑time big‑data lineage system that supports search, AI‑driven discovery, and diverse application scenarios across the organization.

Data Asset Platform · Douyin · data lineage

0 likes · 6 min read

DataFunSummit

Oct 29, 2025 · Big Data

How Douyin’s Data Asset Platform Revolutionizes Big Data Lineage

This article introduces Douyin Group’s Data Asset Management Platform, explaining its shift from traditional metadata to a comprehensive data‑asset approach, detailing the platform’s capabilities, and focusing on the evolution and application of full‑link data lineage across four key topics to improve visibility, quality, security, and cost efficiency.

Big Data · Data Assets · Douyin

0 likes · 5 min read

Instant Consumer Technology Team

Oct 28, 2025 · Artificial Intelligence

Can Data Virtualization Deliver Millisecond Real‑Time Features Across Stores?

This article shares a three‑year journey of building a data‑virtualization‑based, multi‑environment feature management framework for real‑time risk decision platforms, detailing challenges like heterogeneous storage, cold‑start, and operational stability, and presenting a unified architecture that decouples physical storage from business logic.

Big Data · Data Virtualization · feature engineering

0 likes · 16 min read

DataFunSummit

Oct 12, 2025 · Big Data

How Douyin’s Data Asset Platform Revolutionizes Big Data Lineage

This article introduces Douyin Group’s Data Asset Management Platform, explaining its shift from traditional metadata to comprehensive data assets, detailing the evolution, architecture, and applications of its full‑link big data lineage, and offering strategic guidance for building effective lineage systems.

Data Governance · Douyin · data asset

0 likes · 5 min read

Baidu Intelligent Cloud Tech Hub

Sep 22, 2025 · Cloud Computing

How Mantle Breaks the Hierarchical Namespace Bottleneck in Cloud Object Storage

The Mantle system, presented in a SOSP'25 paper by Baidu's storage team and collaborators, delivers a distributed hierarchical namespace for cloud object storage that overcomes traditional scalability and performance limits, enabling massive data lake workloads with dramatically reduced latency and vastly increased throughput.

SOSP · cloud storage · distributed systems

0 likes · 8 min read

Data Thinking Notes

Sep 14, 2025 · Artificial Intelligence

How to Build a Robust Tool Integration Module for AI Agents

This article explains the architecture, core components, and step‑by‑step implementation of a tool usage module that enables AI agents to standardize, select, execute, and transform external tools, illustrated with a sales data analysis case and detailed code snippets.

AI Agent · LLM · Tool Integration

0 likes · 9 min read

DataFunTalk

Sep 6, 2025 · Big Data

How Xiaomi Cuts Costs and Boosts Efficiency with a Cloud‑Native Lakehouse Architecture

Xiaomi’s data‑lake team explains how they tackled small‑file issues, unified metadata with Gravitino, migrated Hive to Iceberg and Fileset, leveraged JuiceFS for multi‑cloud storage, and combined Iceberg and Paimon to achieve cost‑effective, high‑performance batch and real‑time analytics.

Big Data · Cloud Native · Data Lake

0 likes · 13 min read

DataFunSummit

Sep 2, 2025 · Big Data

How Xiaomi Cuts Costs and Boosts Performance with Cloud‑Native Data Lake Architecture

Xiaomi’s engineers explain how they tackled data‑lake challenges—small files, metadata latency, and multi‑cloud costs—by combining compact storage, Gravitino‑based metadata governance, Iceberg and Paimon formats, and JuiceFS abstraction, achieving lower storage expenses, faster queries, and a roadmap toward intelligent, real‑time, multimodal lakehouses.

Big Data · Data Lake · Multi-Cloud

0 likes · 14 min read

DataFunTalk

Aug 28, 2025 · Big Data

How JD Retail Tackles Data Governance Challenges to Boost Efficiency

JD Retail faces growing data volume, redundant models, and resource‑intensive storage, prompting a comprehensive data‑governance strategy that defines standards, streamlines architecture, isolates development, and optimizes compute and storage costs, ultimately enabling more efficient, secure, and agile data operations across the enterprise.

Big Data · Data Architecture · Data Governance

0 likes · 8 min read

Big Data Tech Team

Jun 9, 2025 · Industry Insights

How AI Large Models Transform Data Governance: 2025 Insights & Best Practices

This article examines the essence of data governance, outlines its four core domains, proposes a strategic and technical implementation roadmap, evaluates effectiveness with the DCAM model, and explores how AI large models can enhance metadata, data quality, and compliance while highlighting practical limitations and future trends.

AI large models · Data Quality · compliance

0 likes · 9 min read

DataFunSummit

Jun 6, 2025 · Big Data

How Unicom Digital’s Integrated Data Platform Revolutionizes Metadata Management

This article details Unicom Digital’s metadata management practice on its integrated data platform, covering the strategic background of data, key challenges, award-winning capabilities, three-pronged solutions—automation, linking+, and AI—along with practical implementations, full‑chain lineage, data responsibility, lifecycle management, and future AI‑driven enhancements.

AI · Automation · Big Data

0 likes · 18 min read

Big Data Technology & Architecture

May 16, 2025 · Big Data

Apache Gravitino: An Open‑Source Metadata Lake for Unified Data and AI Asset Management

Apache Gravitino is an open‑source metadata service platform that provides a unified, high‑performance, geographically distributed metadata lake, enabling end‑to‑end data governance, multi‑engine access, and direct management of both structured and unstructured data assets across diverse systems.

Apache Gravitino · Data Governance · Data Lake

0 likes · 9 min read

Ma Wei Says

Mar 30, 2025 · Fundamentals

How Kafka 4.0’s KRaft Replaces ZooKeeper with Raft Consensus

Kafka 4.0 runs exclusively on KRaft, a ZooKeeper‑free metadata layer built on the Raft consensus algorithm. The article details role transitions, leader election, log replication, controller and broker responsibilities, and fault‑tolerance mechanisms that enable a more scalable, self‑managed architecture for large‑scale distributed streaming.

Consensus Algorithm · KRaft · Raft

0 likes · 13 min read

Big Data Tech Team

Feb 17, 2025 · Industry Insights

How DeepSeek Transforms Data Warehouse Development: 5 Game-Changing Benefits

DeepSeek, the popular Chinese large‑language model, boosts data‑warehouse engineers' productivity by offering free, open‑source AI assistance across code writing, model design, metadata management, data quality monitoring, and governance, ultimately maximizing enterprise data asset value.

Data Quality · Data Warehouse · DeepSeek

0 likes · 5 min read

Architects' Tech Alliance

Jan 5, 2025 · Fundamentals

HadaFS: A New Burst Buffer File System for Scalable High‑Performance Computing

The article presents HadaFS, a novel burst‑buffer‑based distributed file system that combines the scalability of local burst buffers with the data‑sharing advantages of shared buffers, details its LTA architecture, metadata handling, the Hadash management tool, and extensive performance evaluations on the SNS supercomputer.

Burst Buffer · File System · HPC Storage

0 likes · 18 min read

Bilibili Tech

Dec 17, 2024 · Big Data

Apache Gravitino: Metadata Management Practices and Production Experience at Bilibili

Bilibili adopted Apache Gravitino as a unified metadata platform that decouples consumers, consolidates schemas and Fileset‑based unstructured data across heterogeneous sources, cuts metadata and storage costs, resolves inconsistencies, boosts Hive Metastore performance, and enables features such as Iceberg branching and future AI‑centric governance.

Apache Gravitino · Big Data · Fileset

0 likes · 20 min read

Huolala Tech

Dec 5, 2024 · Big Data

Huolala’s Metadata Platform: Scaling Data Lineage, AI Search & Cost Governance

Huolala’s data team details the evolution of its metadata management platform—covering architecture, stages from early Hive‑ETL to real‑time field‑level lineage, AI‑driven smart search, cost‑governance mechanisms, and security classifications—showcasing practical solutions for data discoverability, efficiency, and protection at scale.

AI Search · Data Security · cost governance

0 likes · 27 min read

ByteDance Data Platform

Nov 27, 2024 · Big Data

Inside Douyin’s Data Asset Platform: Transforming Data Lineage and Governance

Douyin Group’s data asset management platform introduces a systematic "manage, find, use" approach that unifies metadata collection, full‑coverage data lineage, and a suite of applications across development, governance, asset utilization, and security, while outlining its architecture, modeling, quality metrics, and future roadmap.

Data Governance · data lineage · metadata management

0 likes · 14 min read

DataFunTalk

Nov 10, 2024 · Big Data

Douyin Group Data Asset Management Platform and Data Lineage Architecture Overview

This article provides a comprehensive overview of Douyin Group's data asset management platform, detailing the evolution, architecture, and applications of its large‑scale data lineage system, and discusses future directions for enhancing data quality, cost efficiency, and security across the organization.

Data Governance · data lineage · metadata management

0 likes · 15 min read

Bilibili Tech

Nov 1, 2024 · Big Data

Magnus: Intelligent Data Optimization Service for Iceberg Tables in Bilibili's Lakehouse Platform

Magnus is Bilibili’s self‑developed intelligent service that continuously optimizes Iceberg tables by scheduling snapshot expiration, orphan‑file cleanup, manifest rewriting, and multi‑dimensional data optimizations—including small‑file merging, sorting, distribution, and index creation—while automatically recommending configurations from real‑time query logs, delivering over 99.9% task success and up to 30% scan‑data reduction.

Data Lake · Iceberg · Intelligent Recommendation

0 likes · 15 min read

DataFunSummit

Aug 13, 2024 · Big Data

Data Cost Reduction and Efficiency: Qichacha's Data Architecture and Multi‑Cloud Unified Design

This article presents Qichacha's comprehensive data‑cost‑reduction strategy, detailing its Hadoop‑based three‑pillar architecture, layered data warehouse, Hive upgrades, unified metadata across multi‑cloud clusters, middleware choices such as Alluxio and JuiceFS, version‑compatible hybrid clouds, and Kubernetes‑driven resource orchestration to achieve scalable, low‑cost data processing.

Big Data · Data Warehouse · Hadoop

0 likes · 16 min read

AsiaInfo Technology: New Tech Exploration

Aug 12, 2024 · Big Data

How Hudi MetaServer Transforms Metadata Management and Performance in Data Lakes

This article examines the challenges of Hudi metadata stored on HDFS, introduces the independently developed Hudi MetaServer for centralized metadata, visual management, unified permission control, TTL, expression payloads, and multi‑active scaling, and outlines future enhancements such as LLS, multi‑table fusion, and JDBC support.

Big Data · Data Lake · Hudi

0 likes · 11 min read

DataFunSummit

Jul 31, 2024 · Big Data

Tencent Big Data Processing Suite and Gravitino: Unified Metadata and Permission Management

This article introduces Tencent's Big Data Processing Suite (TBDS) and the open‑source Gravitino project, explaining how they provide a unified metadata service and a comprehensive, extensible permission model to address data and permission islands across heterogeneous Hadoop and MPP ecosystems.

Big Data · Data Lake · Gravitino

0 likes · 12 min read

Bilibili Tech

Jul 19, 2024 · Big Data

Bilibili's One-Stop Big Data Cluster Management Platform (BMR) - Architecture and Implementation

Bilibili’s one‑stop Big Data Cluster Management Platform (BMR) consolidates HDFS, Spark, Flink, ClickHouse, Kafka and other services into a unified system that evolved through four stages—standardization, metadata‑driven construction, containerization, and observability—addressing node consistency, scaling, fault self‑healing, and resource optimization while delivering elastic scaling, automated start/stop, and future cost‑saving and stability enhancements.

Observability · big data platform · cluster management

0 likes · 12 min read

Data Thinking Notes

Jun 6, 2024 · Big Data

How to Build a Robust Data Indicator System: From Design to Future AI Integration

This article explains how to construct a comprehensive data indicator system by outlining its background, design, standardization, metadata management, and future applications, while addressing business, technical, and product challenges and showcasing practical examples and visual workflows.

Big Data · Data Governance · Indicator System

0 likes · 9 min read

Beijing SF i-TECH City Technology Team

May 30, 2024 · Big Data

Data Lineage System Design and Implementation for Big Data Platforms

This article presents a comprehensive data lineage system (Data-Lineage) for big data platforms, addressing challenges in heterogeneous data sources, multiple execution engines, and complex dependencies through hook-based architecture and modular design.

Big Data Architecture · Data Quality · SQL parsing

0 likes · 12 min read

vivo Internet Technology

May 29, 2024 · Operations

vivo CICD Artifact Management: Evolution and Implementation Practices

vivo’s CICD artifact management has evolved from manual builds to a comprehensive Platform Management 2.0 that provides unified storage, multi‑type support, version control, promotion, security scanning, lifecycle policies, and fine‑grained access, dramatically reducing errors and operational costs.

Artifact Management · Artifact Promotion · CICD

0 likes · 15 min read

DataFunSummit

May 21, 2024 · Operations

Bilibili Data Governance Operational Framework Practice

This article presents Bilibili's practical data governance operational framework, introducing the DAMA‑DMBOK methodology, detailing two real‑world cases on storage‑level risk and data‑loss post‑mortem, and outlining the organizational, metadata, and embedded governance mechanisms that drive cost and quality improvements.

DAMA-Bok · Data Quality · cost governance

0 likes · 19 min read

DataFunTalk

May 19, 2024 · Big Data

Tencent's Multi-Engine Unified Metadata and Permission Management for Big Data

This article introduces Tencent's Big Data Processing Suite (TBDS), discusses challenges of data silos, and presents Gravitino's open‑source unified metadata service and permission model, detailing how it integrates Hadoop, MPP, and various catalog plugins to provide consistent access control across heterogeneous data platforms.

Access Control · Big Data · Gravitino

0 likes · 12 min read

Bitu Technology

Jan 17, 2024 · Artificial Intelligence

Rosetta Stone: Scalable ID Mapping System for Tubi's Content Library Using LLMs and Embeddings

This article describes how Tubi built the Rosetta Stone system—a flexible ID mapping workflow that leverages large language models, embedding similarity ranking, and K‑nearest‑neighbors to unify and enrich metadata across a 200,000‑title library, improve content recommendation, and streamline operations.

Big Data · LLM · content ID mapping

0 likes · 10 min read

DataFunSummit

Dec 1, 2023 · Big Data

Bilibili's Event Tracking Standardization: Practices, Challenges, and Future Directions

This article details Bilibili's comprehensive approach to standardizing event tracking (埋点), covering its definition, data pipeline, common business issues, metadata‑driven management strategies, efficiency gains, and future prospects for unified real‑time and batch processing.

Analytics · Bilibili · Data Standardization

0 likes · 21 min read

Programmer DD

Sep 15, 2023 · Big Data

How Alluxio Manages Massive Metadata: Inode, Block, MountTable, and Worker Insights

This article examines Alluxio's open-source distributed file system, detailing the core types of metadata—inode, block, mount table, and worker—along with the mechanisms for their storage, management, and optimization in both HEAP and ROCKS modes, and provides practical configuration guidance for scaling large-scale data environments.

Alluxio · Big Data · Distributed File System

0 likes · 15 min read

DataFunTalk

Sep 12, 2023 · Big Data

Building an Intelligent Data Governance Platform at NetEase Cloud Music: Architecture, Practices, and Future Plans

This article presents a comprehensive case study of NetEase Cloud Music’s metadata‑driven intelligent governance platform, detailing its scale, construction background, modular architecture, rule‑based automation, practical deployment, and future roadmap for sustainable data ecosystem management.

Automation · Big Data · Data Governance

0 likes · 22 min read

Weimob Technology Center

Aug 1, 2023 · Big Data

How Weimob Transformed Data Asset Governance: A Practical Blueprint for Enterprises

Facing fragmented metadata, unclear ownership, and costly data duplication, Weimob implemented a comprehensive data asset governance framework—covering metadata standards, lineage visualization, metric normalization, and cost management—to boost data quality, security, and business value across its new‑retail platform.

Data Governance · data lineage · data operations

0 likes · 15 min read

Didi Tech

Jul 31, 2023 · Big Data

Data Serviceization at Didi: Architecture, Phases, and Standard Metric Service

Didi’s data serviceization converts raw business data into consumable services through a four‑stage pipeline—integration, development, production, and back‑flow—while the Data Dream Factory and Shu‑Chain platform automate synchronization, provide a unified access gateway for thousands of APIs, and introduce a standard metric service that abstracts storage complexities and ensures high‑performance, secure data delivery.

Data Integration · Data Platform · data serviceization

0 likes · 16 min read

Big Data Technology & Architecture

Jul 12, 2023 · Big Data

Design and Evolution of Volcano Engine DataLeap Data Catalog System

This article details the architecture, design decisions, and iterative improvements of the Data Catalog product within Volcano Engine's DataLeap suite, covering metadata management, ingestion pipelines, search optimization, lineage capabilities, storage layer enhancements, and future development directions.

Apache Atlas · Big Data · Connector

0 likes · 16 min read

AntTech

Jul 11, 2023 · Operations

Achieving Full-Stack Observability for Cloud and On-Premise Applications with Ant Group's BOS Platform

This article examines the challenges of maintaining stability across cloud and on‑premise environments, explains how Ant Group's Business‑Intelligent Observability Service (BOS) addresses these issues through unified metadata, seamless application integration, data standardization, and extensive case studies, and demonstrates the resulting improvements in reliability and operational efficiency.

Cloud Computing · Full-stack Tracing · Observability

0 likes · 16 min read

DataFunTalk

May 22, 2023 · Big Data

Alibaba Cloud Data Lake: Unified Metadata and Storage Management Practices

This article explains Alibaba Cloud's data lake architecture, unified metadata services, storage management optimizations, and format handling techniques, illustrating how lakehouse concepts, multi‑engine support, and lifecycle policies enable efficient, secure, and cost‑effective big data processing in the cloud.

Big Data · Cloud Services · Data Lake

0 likes · 22 min read

Data Thinking Notes

Apr 12, 2023 · Big Data

Building an End‑to‑End Data Governance System: Challenges, Solutions & Impact

This article details DataCake's data‑governance journey, covering the problems of data silos, unclear costs, and tool fragmentation, then explains the strategic thinking, the multi‑layered solution architecture, and the measurable outcomes such as higher resource utilization and reclaimed storage.

Big Data · Data Governance · cost analysis

0 likes · 17 min read

Data Thinking Notes

Apr 5, 2023 · Big Data

Mastering Data Governance: From Challenges to End‑to‑End Solutions

This article explores the key problems data governance aims to solve, outlines a comprehensive governance framework, and details practical implementation steps—including tool integration, metadata management, lake‑in and lake‑out processes, and governance policies—to achieve a closed‑loop, value‑driven data ecosystem.

Big Data · Data Governance · Data Lake

0 likes · 13 min read

DataFunSummit

Mar 31, 2023 · Big Data

Data Governance Practices and Implementation at DataCake

The article outlines DataCake's data governance journey, describing the challenges of data silos and cost inefficiencies, the strategic thinking behind a unified metadata platform, the implementation of governance tools, cost analysis modules, and asset inventory, and concludes with results, future plans, and a Q&A session.

Big Data · cost analysis · metadata management

0 likes · 14 min read

DataFunSummit

Mar 1, 2023 · Big Data

Data Governance: Challenges, Framework, and Implementation Practices

This article explains the problems that data governance addresses, outlines a comprehensive governance framework—including system architecture, processes, and policies—and describes practical implementation steps such as integrated tooling, standardized modeling, metadata management, lake‑in and lake‑out governance, and organizational structures for sustainable data management.

Big Data · governance framework · metadata management

0 likes · 12 min read

DataFunTalk

Feb 26, 2023 · Big Data

Design, Optimization, and Use Cases of Data Lineage in ByteDance's DataLeap Platform

This article presents an in‑depth overview of DataLeap's data lineage capabilities, covering the challenges, multi‑layer model design, implementation with Apache Atlas and JanusGraph, performance optimizations, diverse use cases across asset, development, governance and security domains, and future trends for lineage technology.

Apache Atlas · Big Data · Data Governance

0 likes · 19 min read

Youzan Coder

Feb 7, 2023 · Big Data

Automated Offline Data Cost Optimization in Youzan's Data Platform

Youzan built an automated offline data cost‑optimization platform that gathers accurate metadata, mines unused or failing tables and tasks, and safely decommissions them through a backend‑frontend workflow with owner validation, notifications, rollback safeguards, and plans to extend lineage coverage and real‑time asset handling.

Big Data · Data Governance · Pipeline Automation

0 likes · 11 min read

DataFunSummit

Feb 2, 2023 · Big Data

Data Governance Strategies: Concepts, Practices, and Case Studies

The article explains why data is a critical corporate asset, distinguishes narrow and broad data‑governance approaches, outlines strategic principles such as treating governance as a systematic, prioritized effort, and presents eight real‑world case studies from companies like Tencent, SF Tech, Huolala, and NetEase.

Case Studies · Data Quality · metadata management

0 likes · 7 min read

DataFunTalk

Jan 31, 2023 · Big Data

Tencent's Data Governance Practices and Technical Implementation

This article presents Tencent's comprehensive data governance framework, covering its definition, objectives, challenges, methodology, organizational structure, metadata management, data asset lifecycle, security measures, and technical implementation details such as microservice architecture, data collection, lineage analysis, and storage solutions.

Big Data · Data Governance · Tencent

0 likes · 19 min read

DataFunTalk

Jan 1, 2023 · Big Data

Zhihu's Real-Time Computing Platform: From Skytree 1.0 to Mipha 2.0

Zhihu’s real‑time computing platform, initially built as Skytree 1.0 on Kubernetes and later re‑engineered as Mipha 2.0 with Flink SQL, unified metadata management, dynamic jar loading, UDF support, Protobuf format, CDC integration, and extensive operational optimizations, now processes petabyte‑scale data with high reliability.

Flink · Real-Time Computing · SQL Gateway

0 likes · 21 min read

Data Thinking Notes

Nov 24, 2022 · Fundamentals

How to Build an Enterprise Data Governance System from Scratch

This article explains what data governance is, why enterprises need it, the key components such as data quality, metadata, master data, asset and security management, and provides a step‑by‑step framework, organizational structure, platform features, evaluation methods and common pitfalls.

Data Assets · Data Governance · Data Quality

0 likes · 17 min read

Data Thinking Notes

Nov 10, 2022 · Big Data

Building Kuaishou’s Scalable Metadata Management Platform for Big Data

This article details Kuaishou’s evolution of its metadata management platform—from early Hive‑centric beginnings to a unified 2.0 architecture and a forward‑looking 3.0 vision—highlighting challenges, key technologies, and how metadata drives data production, consumption, governance, and cost optimization across the big‑data middle platform.

Data Governance · Data Platform · metadata lineage

0 likes · 17 min read

DataFunSummit

Nov 4, 2022 · Big Data

Real-Time Data Lake Practice at ByteDance: Architecture, Challenges, and Solutions

ByteDance’s data platform team explains their real‑time data lake implementation, covering its evolving definition, six core capabilities, challenges such as data management, concurrent updates, performance and log ingestion, and detailed case studies of multi‑stage deployment, indexing, metadata services, and future roadmap.

Hudi · Indexing · Real-time Data Lake

0 likes · 32 min read

Python Crawling & Data Mining

Oct 30, 2022 · Big Data

Why Ozone Is the Next‑Generation Distributed Object Store for Big Data

This article explains how Ozone, the Hadoop community’s new distributed object‑storage system, overcomes HDFS’s small‑file limitations with a hierarchical Volume‑Bucket‑Object model, detailing its architecture, components, data flow for creating and reading objects, and the benefits of its scalable, fault‑tolerant design.

Big Data · Distributed storage · Hadoop

0 likes · 12 min read

Tencent Cloud Developer

Sep 27, 2022 · Big Data

GooseFS: Accelerating Cloud Storage for Big Data and Data Lake Platforms

GooseFS, Tencent Cloud’s Hadoop‑compatible storage accelerator, adds a local NVMe‑SSD cache layer to cloud‑native data lakes, letting users boost query speeds by up to 46 % and cut backend bandwidth by 200 Gbps without code changes, as demonstrated by a music‑industry customer’s 200‑node deployment caching ten million files.

Data Lake · GooseFS · High Availability

0 likes · 16 min read

Past Memory Big Data

Aug 23, 2022 · Big Data

JD Tech’s Event‑Tracking Data Governance and One‑Stop Platform: Practices and Innovations

The article explains why event‑tracking data needs governance, outlines a full‑link governance methodology, describes the organizational setup, and details the features of JD Tech’s one‑stop tracking management platform, including metadata unification, one‑click validation, real‑time dashboards, visualization tools, and H5‑native data integration.

Data Governance · H5-native integration · event tracking

0 likes · 16 min read

DataFunSummit

Aug 12, 2022 · Big Data

JD's Big Data Cross‑Domain and Hierarchical Storage Practices

JD’s article details its big‑data platform’s cross‑domain and hierarchical storage solutions, describing the challenges of multi‑datacenter data synchronization, the architecture of its storage layer, the implemented asynchronous and synchronous data flows, topology management, metadata tagging, and performance‑enhancing techniques for efficient, disaster‑resilient data handling.

Data Platform · Disaster Recovery · Hierarchical Storage

0 likes · 11 min read

DataFunTalk

Jul 14, 2022 · Big Data

Real‑Time Data Lake Practices at ByteDance and Alibaba: Architecture, Challenges, and Solutions

This article presents detailed case studies of ByteDance and Alibaba implementing real‑time data lake solutions with Hudi and Flink, describing the business drivers, architectural challenges, and the specific technical strategies such as unified metadata layers, optimistic locking, scalable hash indexing, and CDC‑based incremental ETL to achieve low‑latency, high‑throughput data processing.

Flink · Hudi · Real-time Data Lake

0 likes · 9 min read

DataFunTalk

Jul 13, 2022 · Databases

Technical Analysis and Case Studies of Knowledge Graphs by Neo4j

This presentation explains where knowledge resides in data architectures, demonstrates knowledge‑graph‑driven skill discovery, metadata management, and semantic search, and concludes with a comparison of GraphQL and Cypher for graph queries, illustrated with real‑world Neo4j case studies.

Cypher · GraphQL · Knowledge Graph

0 likes · 11 min read

ByteDance Data Platform

Jun 8, 2022 · Backend Development

How ByteDance Optimized Data Catalog Performance with Apache Atlas and JanusGraph

This article details ByteDance's 2021 overhaul of its Data Catalog system, the performance regressions encountered after switching to Apache Atlas, and the step‑by‑step backend optimizations—including JanusGraph tuning, Gremlin query refactoring, parallel processing, and write‑path improvements—that reduced latency from minutes to seconds.

Apache Atlas · Data Catalog · JanusGraph

0 likes · 12 min read

Big Data Technology Architecture

Jun 5, 2022 · Big Data

Introduction to Data Lake Concepts, Capabilities, and Applications

This article explains the origin and definition of data lakes, describes their ability to store structured, semi‑structured and unstructured data at any scale on‑premises or in the cloud, outlines essential lake capabilities such as unified storage, raw‑data preservation, scalable compute, metadata and security management, and compares data lakes with data warehouses and lakehouse architectures through real‑world cloud‑native examples.

cloud storage · metadata management

0 likes · 16 min read

vivo Internet Technology

May 25, 2022 · Big Data

Understanding Druid Metadata Management and Architecture

Apache Druid manages metadata through a layered, distributed system where the Overlord coordinates ingestion tasks, MiddleManagers launch Peons to create segments, Coordinators and Historical nodes store and serve segment data, Brokers route queries, while MySQL, Zookeeper, memory, and local files synchronize metadata for fault‑tolerant, high‑performance OLAP analytics.

Big Data · Druid · Query Processing

0 likes · 19 min read

DataFunTalk

May 23, 2022 · Big Data

Real-Time Data Lake Practices at ByteDance: Architecture, Challenges, and Solutions

ByteDance shares its real‑time data lake implementation, covering the evolving definition of data lakes, six core capabilities, challenges such as data management, weak concurrent updates, performance, and log ingestion, and detailed solutions including Hudi Metastore Server, bucket indexing, multi‑stage use cases, and future roadmap.

Batch Processing · Hudi · Real-time Data Lake

0 likes · 32 min read

Airbnb Technology Team

May 12, 2022 · Information Security

Airbnb Data Privacy and Security Engineering – Data Protection Platform (DPP) Overview and Madoka Metadata System

Airbnb’s Data Protection Platform (DPP) combines automated discovery, classification, encryption and privacy‑orchestration services—Inspekt, Angmar, Cipher, Obliviate, Minister, and the Madoka metadata system—to continuously inventory petabyte‑scale MySQL, Hive and S3 assets, track ownership and security attributes, and enforce GDPR, PIPL and CCPA compliance.

Airbnb · Automation · Data Protection

0 likes · 15 min read

ByteDance Data Platform

Apr 27, 2022 · Big Data

How ByteDance Built a Scalable Data Catalog: Key Technologies and Future Plans

ByteDance’s Data Catalog article details the system’s unified metadata model, standardized ingestion connectors, search optimization techniques, lineage capabilities, and storage layer enhancements, highlighting key technical designs, performance improvements, and future work to advance data governance and asset utilization.

Data Catalog · Storage Optimization · data lineage

0 likes · 12 min read

ByteDance Data Platform

Dec 31, 2021 · Big Data

How ByteDance Leverages Hudi for a Real‑Time Data Lake Platform

This article introduces ByteDance’s real‑time data lake platform built on Apache Hudi, covering Hudi fundamentals, table types, indexing, practical use cases, platform optimizations, and future roadmap, illustrating how the system enables low‑latency, scalable analytics across batch and streaming workloads.

Hudi · Lakehouse · metadata management

0 likes · 11 min read

Ctrip Technology

Dec 16, 2021 · Big Data

Data Standard Management Practices in Ctrip Vacation Data Governance

This article outlines Ctrip Vacation's data standard management approach, covering why standards are needed, the three‑element framework of scope, tools, and policies, and detailed practices for data integration, production change handling, metadata governance, portal dashboard standardization, and self‑service query templating.

Big Data · Data Governance · Data Integration

0 likes · 12 min read

Big Data Technology Architecture

Nov 2, 2021 · Big Data

ByteLake: ByteDance’s Real‑Time Data Lake Platform Built on Apache Hudi

This article presents ByteDance’s ByteLake, a real‑time data lake platform built on Apache Hudi, covering Hudi fundamentals, ByteLake’s use cases, the platform’s architectural optimizations, new features such as a commit‑based metastore and bucket indexing, and future roadmap plans.

Apache Hudi · Bucket Index · ByteLake

0 likes · 10 min read

DataFunTalk

Aug 11, 2021 · Big Data

OPPO CBFS: Architecture and Key Technologies of a Scalable Data Lake Storage System

This article introduces OPPO's self‑developed data lake storage system CBFS, covering the fundamentals of data lake storage, the multi‑layer CBFS architecture, its core technologies such as metadata management and erasure coding, and future directions for large‑scale, low‑cost data analytics.

CBFS · Cloud Native · Data Lake

0 likes · 14 min read

DataFunTalk

Jul 27, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Doris at Shuhai Supply Chain

This article describes how Shuhai Supply Chain upgraded its data warehouse from a complex, high‑cost 1.0 architecture to a streamlined, real‑time solution built around Apache Doris, detailing the motivations, design choices, zero‑code ingestion, metadata management, Flink connector, and the resulting performance gains.

Apache Doris · Big Data · Flink

0 likes · 13 min read

Big Data Technology & Architecture

Jun 29, 2021 · Big Data

Huawei Data Governance Practices and Metadata Management

This article outlines Huawei's data governance practices, detailing its digital transformation vision, two-stage data management evolution, structured and unstructured data classification frameworks, external data compliance, and comprehensive metadata management architecture, highlighting challenges and solutions for enterprise-wide data assets.

Huawei · data classification · digital transformation

0 likes · 20 min read

Qunar Tech Salon

Jun 21, 2021 · Big Data

Using Apache Iceberg 0.11 with Flink for Real‑time Data Lake: Architecture, Pain Points, and Solutions

This article examines the challenges of using Kafka, Flink, and Hive for real‑time data warehousing, introduces Apache Iceberg 0.11 as a solution, details its architecture, query planning, Flink integration, code examples, optimization techniques, and summarizes the benefits for large‑scale data processing.

Big Data · Data Lake · Flink

0 likes · 12 min read

Big Data Technology Architecture

Jun 10, 2021 · Big Data

Understanding Apache Iceberg: Design, Architecture, and Its Application at NetEase Cloud Music

This article explains Apache Iceberg’s table‑format design, compares it with Hive’s limitations, details its snapshot‑based architecture and metadata handling, and describes how NetEase Cloud Music leveraged Iceberg to dramatically improve large‑scale log processing performance and stability.

Apache Iceberg · Spark · metadata management

0 likes · 12 min read

macrozheng

May 8, 2021 · Big Data

Why Kafka 2.8 Drops Zookeeper: Architecture, Challenges, and KIP‑500

This article explains how Kafka 2.8 removes its dependency on Zookeeper, describes Kafka's core concepts and its interaction with Zookeeper, outlines the role of the Controller, discusses operational complexities and upgrade paths with KIP‑500, and highlights the benefits of the new KRaft‑based architecture.

KIP-500 · KRaft · Zookeeper

0 likes · 10 min read

DataFunTalk

Feb 8, 2021 · Big Data

Ozone: The Next‑Generation Distributed Storage System Aiming to Replace HDFS

This article explains how Apache Ozone, built on the HDDS layer, addresses the scalability, memory, and performance limitations of HDFS by splitting metadata services, using RocksDB, implementing fine‑grained locking, RAFT‑based HA, and offering rich APIs, while outlining current challenges and future roadmap.

Big Data · HDDS · HDFS

0 likes · 29 min read

DataFunSummit

Nov 17, 2020 · Big Data

Sohu Intelligent Media Data Warehouse Architecture and Technical Practices

This article presents Sohu Intelligent Media's data warehouse construction practice, covering fundamental concepts, batch and real‑time processing, OLAP theory, multidimensional modeling, workflow management, data quality, metadata lineage, and security, with a focus on Apache Doris and a Lambda‑style architecture.

Apache Doris · Batch Processing · Data Quality

0 likes · 18 min read

Beike Product & Technology

Nov 13, 2020 · Big Data

Beike One‑Stop Big Data Development Platform: Architecture, Evolution, and Future Outlook

The article summarizes Beike's one‑stop big data development platform, describing its data business background, the evolution from a simple Hadoop‑Kafka‑Hive stack to a metadata‑driven, asset‑oriented platform, and outlines current capabilities in data management, integration, scheduling, quality, openness, and future plans.

Big Data · Data Engineering · Data Governance

0 likes · 11 min read

Alibaba Cloud Developer

Oct 25, 2020 · Big Data

How Alibaba’s Cloud‑Native Data Lake Solves Big Data Challenges

Alibaba Cloud’s Data Lake Analytics (DLA) tackles the growing complexity of data scenarios by offering cloud‑native, serverless solutions for data lake management, massive metadata construction, and high‑performance Spark and Presto engines, while addressing challenges such as high entry barriers, stability, and multi‑tenant isolation.

Cloud Native · Data Lake · Serverless Spark

0 likes · 22 min read

ITPUB

Oct 16, 2020 · Big Data

How NetEase Cloud Music Built a Real‑Time Data Warehouse with Flink & Calcite

This article details NetEase Cloud Music's evolution of a real‑time data warehouse built on Flink 1.9 and Calcite, covering platform scale, architectural design, metadata management, SDK simplifications, monitoring improvements, and concrete use cases such as AB‑testing, live reporting, and feature serving.

Big Data · Calcite · Flink

0 likes · 8 min read

Architecture Digest

Sep 12, 2020 · Backend Development

Zookeeper Usage Scenarios and Interview Analysis

This article explains common Zookeeper usage scenarios—including distributed coordination, distributed locking, metadata/configuration management, and high‑availability—provides interview‑style analysis, and illustrates each case with diagrams, helping Java developers understand how Zookeeper supports core distributed system functions.

High Availability · Zookeeper · coordination

0 likes · 5 min read

dbaplus Community

Aug 18, 2020 · Big Data

Designing a Scalable Financial Data Warehouse: Modeling, Layers, and Quality Control

This article outlines a comprehensive approach to building a financial data warehouse, covering background needs, modeling methodologies, a layered architecture (I, C, S, R), data quality monitoring, metadata management, and detailed naming and coding standards to ensure maintainable, high‑quality data pipelines.

Big Data · Data Quality · Data Warehouse

0 likes · 14 min read

DataFunTalk

Jul 23, 2020 · Big Data

Design and Implementation of a Financial Data Warehouse: Architecture, Modeling, Quality Control, and Metadata Management

This article outlines the end‑to‑end design of a financial data warehouse, covering background needs, modeling methodology choices, a layered architecture, data quality monitoring, metadata management, naming and coding standards, and future improvement directions.

Big Data · Data Quality · SQL Standards

0 likes · 11 min read

58 Tech

Jul 13, 2020 · Big Data

Design and Implementation of a Financial Data Warehouse: Architecture, Modeling, Quality Monitoring, and Metadata Management

This article presents a comprehensive design and implementation guide for a financial data warehouse, covering background needs, modeling methodology choices, a layered architecture, data quality monitoring, metadata management, naming and coding standards, and future development directions.

Big Data · Data Quality · Data Warehouse

0 likes · 11 min read

Big Data Technology Architecture

Apr 20, 2020 · Big Data

Introduction to HDFS: Architecture, Features, Replication, Rack Awareness, and Metadata Management

This article provides a comprehensive overview of Hadoop Distributed File System (HDFS), covering its streaming data access model, key characteristics, master‑slave architecture, block storage and replication mechanisms, rack‑aware placement strategy, and how the NameNode manages metadata and checkpoints.

Distributed File System · HDFS · Hadoop

0 likes · 7 min read

Dada Group Technology

Apr 15, 2020 · Big Data

Practice Experience of Dada Group's Real-Time Computation SQLization Using Dada Flink SQL

This article details Dada Group's development of the Dada Flink SQL engine, describing its background, architecture, parser design, dimension‑table join strategies, numerous enhancements such as HA support, Kafka keyword handling, metadata integration, Redis and ClickHouse sinks, BINLOG simplification, and future migration plans toward Flink 1.10.

ClickHouse · Flink · Real-Time Computing

0 likes · 12 min read

Big Data Technology Architecture

Feb 19, 2020 · Big Data

Comparative Analysis of Hudi, Iceberg, and Delta Lake for Data Lake Storage

This article compares three open‑source data‑lake storage layers—Hudi, Iceberg, and Delta Lake—examining their shared reliance on meta‑files for schema and transaction handling, and detailing their differing designs for upserts, streaming support, query performance, and ecosystem integration.

Delta Lake · Hudi · Iceberg

0 likes · 13 min read

dbaplus Community

Jan 14, 2020 · Big Data

How OPPO Built a Real‑Time Data Warehouse with Flink SQL

This article details OPPO's evolution from an offline data warehouse to a real‑time platform, describing the business scale, data‑mid platform architecture, migration strategy using Flink SQL, extensions like AthenaX, and practical use cases such as real‑time ETL, CTR calculation, and tag import.

Data Engineering · ETL · Flink

0 likes · 18 min read

Big Data Technology & Architecture

Oct 12, 2019 · Big Data

Origin Data Governance Platform: Architecture, Modules, and Implementation at Meituan

The article describes Meituan's Origin Data Governance Platform, detailing its background, challenges, architectural redesign, core modules such as data storage, metadata, business, security, and application management, as well as its internal workflow, achievements, and future roadmap for unified, secure, and high‑performance data services.

Meituan · metadata management · platform architecture

0 likes · 22 min read

Mafengwo Technology

Sep 26, 2019 · Big Data

Mafengwo’s Data Warehouse & Middle Platform: Architecture, Modeling, Toolchain

This article details Mafengwo’s journey in constructing a data warehouse and data middle platform, covering the core three‑layer architecture, hybrid modeling approaches, the supporting toolchain for data synchronization, scheduling, and metadata management, and the design of an indicator platform for business analytics.

Big Data Architecture · Data Middle Platform · Data Warehouse

0 likes · 18 min read

DataFunTalk

Aug 1, 2019 · Big Data

Streaming Data Platform Practices and Challenges at Beike Real Estate

This article presents an in‑depth overview of Beike's four‑layer streaming data platform, covering the foundational infrastructure, capability aggregation, data content, and output layers, as well as the challenges of metadata management, real‑time processing, and productization through the Ark and Tianyan systems.

Ark platform · Beike · Tianyan

0 likes · 14 min read

Meituan Technology Team

Dec 27, 2018 · Big Data

Meituan Origin Data Governance Platform: Architecture and Practices

Meituan’s Origin Data Governance Platform inserts a unified governance layer between its data‑warehouse and application stacks, consolidating metric and dimension definitions, automating metadata management, enforcing security and workflow controls, and delivering cross‑engine query, monitoring and lineage capabilities that resolve inconsistencies and boost trust across dozens of internal data platforms.

metadata management · platform architecture

0 likes · 21 min read

Youzan Coder

Aug 3, 2018 · Big Data

Youzan Data Warehouse Metadata System: From Manual Tables to Metadata‑Driven Architecture

Youzan’s data‑warehouse metadata system evolved from manually maintained tables to an automated data dictionary and finally to a metadata‑driven architecture that automatically captures technical, business, and process metadata, visualizes lineage, tracks resource usage, manages synchronization rules and permissions, and now aims to improve novice usability with visual models and impact‑analysis tools.

Big Data · Data Warehouse · Hive

0 likes · 11 min read

ITPUB

May 30, 2018 · Backend Development

How JD.com Engineered Its Own Distributed Storage System for Billions of Files

This article chronicles JD.com's journey from recognizing massive storage demands to designing, building, and evolving a self‑developed distributed storage platform—JFS—that handles small and large files, powers a custom image system, object storage, and future container‑native workloads.

Backend Engineering · Distributed storage · JFS

0 likes · 16 min read

ITFLY8 Architecture Home

Jul 1, 2017 · Fundamentals

Designing Distributed File Systems: Solving Local FS Limits

Distributed file systems extend traditional local storage by partitioning data across multiple servers, using a master node for metadata and coordination, handling namespace, replication, load balancing, caching, and client interfaces, thereby overcoming file size, quantity, and concurrency constraints of ext3, reiserfs, and similar local filesystems.

Caching · Distributed File System · metadata management

0 likes · 15 min read
