Tag: metadata

DataFunSummit
Jun 10, 2025 · Big Data

How OpenLake Redefines Data Lake Infrastructure for the AI Era

This article explores OpenLake's evolution as a data lake platform for AI, covering the transition from Hive to modern lake formats like Iceberg and Paimon, performance benchmarks, metadata management advances, intelligent storage optimization, and the integration of multimodal support with the Lance file format.

AI · Big Data · Data Lake
22 min read
Alimama Tech
Apr 10, 2025 · Big Data

Performance Optimization of Apache Paimon in Dolphin OLAP Engine

The article details how Apache Paimon, integrated as an external table format in Alibaba’s Dolphin OLAP engine, achieves millisecond‑level query latency and up to 10k QPS through ORC push‑down, manifest conversion, caching, concurrency, and encoding optimizations, outperforming StarRocks and Hologres.

Java · OLAP · Paimon
17 min read
DataFunSummit
Jan 1, 2025 · Big Data

Douyin Group Data Asset Management Platform: Full‑Stack Data Lineage Evolution and Applications

This article introduces Douyin Group’s end‑to‑end data asset management platform, explains the evolution and architecture of its large‑scale data lineage system, presents quality metrics and ecosystem components, and outlines practical applications and future directions for data governance, development, and security.

Big Data · Data Governance · Data Lineage
16 min read
360 Zhihui Cloud Developer
Nov 29, 2024 · Big Data

How Ozone Scales Metadata for Massive Big Data Storage

This article explains Ozone's object storage architecture, how its metadata management evolved to use distributed KV stores such as Apache Cassandra, and the performance optimizations (read/write separation, unlimited scaling, and partitioning) that enable high‑throughput, low‑latency handling of massive datasets.

Apache Cassandra · Big Data · Distributed KV
9 min read
DeWu Technology
Nov 13, 2024 · Backend Development

Evolution of Rainbow Bridge Architecture: Building a Self‑Managed Metadata Center and SDK Enhancements

The new Rainbow Bridge architecture replaces the SLB‑based load‑balancing model with a self‑managed, multi‑AZ metadata center and enhanced SDK that aggregates node health, provides zone‑aware weighted routing, supports rapid failover and manual overrides, and delivers faster recovery and scalable traffic handling.

Distributed Systems · Service Discovery · high availability
11 min read
Baidu Geek Talk
Nov 6, 2024 · Cloud Computing

Baidu Canghai Storage Unified Technology Base: Architecture and Evolution of Metadata, Namespace, and Data Layers

Baidu’s Canghai Storage unifies metadata, hierarchical namespace, and data layers into a Meta‑Aware, three‑generation architecture that scales to trillions of metadata items and zettabyte‑scale data, using a distributed transactional KV store, single‑machine‑distributed namespace, and online erasure‑coding micro‑services to deliver high performance, low cost, and seamless scalability.

Big Data · Distributed Systems · Erasure Coding
18 min read
DataFunSummit
Oct 17, 2024 · Big Data

Waggle Dance Based Metadata Solution at Tongcheng Travel: Architecture, Migration Strategies, and Future Outlook

This article presents Tongcheng Travel's metadata solution built on the open‑source Waggle Dance project, detailing the three‑layer architecture, challenges of a monolithic Hive Metastore, evaluated migration plans, federation implementation, migration workflow, and future directions for unified metadata governance.

Big Data · Federation · Hive Metastore
11 min read
DataFunSummit
Sep 12, 2024 · Cloud Native

Design and Implementation of a Next‑Generation Multi‑Protocol Unstructured Storage System for Machine Learning

This article presents the challenges of storing massive machine‑learning datasets, evaluates existing storage solutions, and details the design of OrangeFS—a cloud‑native, multi‑protocol, multi‑tenant unstructured storage system that integrates object and file interfaces, optimizes metadata services, supports hot upgrades, and provides robust scalability and reliability for AI workloads.

Distributed Systems · cloud-native · high performance
24 min read
DataFunSummit
Aug 28, 2024 · Big Data

Building Data Lineage Foundations and Applications for E‑commerce Scenarios

This article explains how to construct a full‑link data lineage platform for e‑commerce, detailing its architecture, quality metrics, and practical uses such as table migration, field‑level tracing, and automated metric decomposition to improve data governance and efficiency.

Big Data · Data Governance · Data Lineage
14 min read
DataFunSummit
Jun 19, 2024 · Big Data

Apache Hudi from Zero to One: Introduction to Hudi’s Storage Format (Part 1)

This article introduces Apache Hudi’s storage format, explaining the table layout, metadata and data file organization, the naming conventions of timeline actions, and the trade‑offs between Copy‑on‑Write and Merge‑on‑Read table types for transactional data lakes.

Apache Hudi · Big Data · Data Lake
8 min read
OPPO Kernel Craftsman
Jun 14, 2024 · Fundamentals

In-depth Analysis of F2FS Filesystem Superblock, SIT, NAT, and SSA Structures

The article provides a detailed code-level walkthrough of F2FS’s superblock, Segment Information Table, Node Address Table, Summary Area, and Main Area structures, explaining each field, entry layout, and their roles in managing segments, inodes, and data within the filesystem.

F2FS · Linux kernel · NAT
9 min read
DataFunSummit
May 13, 2024 · Big Data

Metadata‑Driven Data Governance: Concepts, Architectures, and Practices

This article explains how metadata‑driven data governance addresses the challenges of the digital economy, covering the historical context, the limitations of traditional methods, the roles of Data Fabric, Data Mesh, and DataOps, real‑world case studies, and future directions.

AI · Big Data · Data Governance
14 min read
DataFunSummit
Apr 26, 2024 · Big Data

Didi's Big Data Cost Governance Practices

This article details Didi's comprehensive big data cost governance framework, covering its data architecture, asset management scoring, Hadoop and Elasticsearch cost optimization methods, and practical insights on organizational processes and incentives for effective cost control.

Big Data · Hadoop · cost governance
17 min read
Bilibili Tech
Apr 26, 2024 · Big Data

Fine-Grained Lock Optimization for HDFS NameNode to Improve Metadata Read/Write Performance

To overcome the NameNode write bottleneck caused by a single global read/write lock in Bilibili’s massive HDFS deployment, the team introduced hierarchical fine‑grained locking—splitting the lock into Namespace, BlockPool, and per‑INode levels—which yielded up to three‑fold write throughput gains, a 90 % drop in RPC queue time, and shifted performance limits from lock contention to log synchronization.

Big Data · HDFS · NameNode
15 min read
DataFunSummit
Mar 18, 2024 · Big Data

Scenario‑Based Data Governance Practices in the Securities Industry

This article presents a comprehensive, scenario-driven data governance practice at Guoxin Securities, covering the industry's pain points, a three‑layer governance framework, detailed implementations for data standards, metadata, data quality, data modeling, and data security, and outlines future directions for intelligent and measurable governance.

Big Data · Data Governance · data quality
30 min read
DataFunTalk
Mar 17, 2024 · Databases

MatrixOne Storage Format Design Overview

This article provides a comprehensive overview of MatrixOne's hyper‑converged cloud‑native database architecture, detailing its three‑layer design, data execution flow, columnar storage format, metadata hierarchy, performance optimizations, compatibility mechanisms, and practical usage scenarios.

Compatibility · Distributed Database · MatrixOne
12 min read
DataFunSummit
Feb 13, 2024 · Big Data

Guoxin Securities Data Governance Serviceization: Frameworks, Practices, and Insights

This article presents Guoxin Securities' comprehensive data governance journey, detailing the regulatory background, strategic vision, service‑oriented implementation across data standards, quality, modeling, metadata, and security, and highlighting the resulting business value and future directions.

Data Governance · data quality · data security
29 min read
Zhengcaiyun Technology (政采云技术)
Jan 23, 2024 · Big Data

Design and Implementation of a Big Data Permission Management System

This article outlines the background, importance, scenarios, challenges, objectives, and architectural design—including RBAC and ABAC models, metadata integration, data classification, and verification mechanisms—of a comprehensive big data permission management system for secure and fine‑grained data access.

ABAC · Access Control · RBAC
14 min read
DataFunTalk
Jan 8, 2024 · Big Data

Didi's Big Data Cost Governance Practices and Framework

This article presents Didi's comprehensive big data cost governance approach, detailing the overall framework, data system architecture, asset management platform, Hadoop and Elasticsearch cost‑control practices, metadata‑driven optimization, and organizational insights for effective resource and budget management.

Big Data · Data Platform · Hadoop
19 min read