Tagged articles
184 articles
AI Engineer Programming
May 20, 2026 · Artificial Intelligence

Why Chunk‑Based RAG Fails and How IdeaBlocks Improve Retrieval

The article argues that treating text chunks as the proper knowledge unit in RAG pipelines is a flawed assumption that leads to versioning, metadata, and redundancy problems, and demonstrates that replacing chunks with structured IdeaBlocks reduces corpus size and token usage while improving vector relevance.

IdeaBlock · LLM · RAG
0 likes · 10 min read
DataFunSummit
May 14, 2026 · Big Data

How Gravitino, Daft, and Lance Enable Secure, AI‑Driven Multimodal Lakehouse

The article examines the challenges of multimodal data in modern lakehouses and presents a three‑tool stack—Gravitino, Daft, and Lance—that provides unified metadata, distributed multimodal compute, and high‑performance storage, while detailing security governance, integration paths, and future directions.

Daft · Gravitino · Lakehouse
0 likes · 11 min read
Woodpecker Software Testing
Apr 30, 2026 · Databases

Datafaker: A Powerful Tool for Bulk Test Data Generation

Datafaker is a Python‑compatible utility that creates large volumes of synthetic test data for databases, streams, files, and messaging systems, offering flexible metadata rules, multi‑backend support, and command‑line options for quick data provisioning.

Elasticsearch · Kafka · Python
0 likes · 14 min read
Ray's Galactic Tech
Apr 27, 2026 · Artificial Intelligence

Using AI to Auto‑Generate Forms: Production‑Ready Low‑Code Form Generation with Spring AI Alibaba ReactAgent

The article presents a production‑grade solution that lets users describe a form in natural language, then uses a Spring AI Alibaba ReactAgent powered by a ReAct reasoning loop to retrieve templates, validate fields, generate layout, enforce governance, and finally emit a versioned JSON schema ready for deployment.

Observability · React · ReactAgent
0 likes · 29 min read
AI Architect Hub
Apr 25, 2026 · Artificial Intelligence

How to Feed Massive Documents to an RAG System: Mastering the Art of Text Chunking

This article explains why proper text chunking is critical for Retrieval‑Augmented Generation, illustrates common pitfalls with real‑world examples, compares four chunking strategies (fixed length, recursive, structure‑aware, and code‑aware), and provides practical guidelines for chunk size, overlap, metadata handling, and a production‑ready pipeline.

AI Retrieval · LangChain · RAG
0 likes · 21 min read
Su San Talks Tech
Apr 19, 2026 · Artificial Intelligence

Boost Enterprise RAG: Data Pipeline Tricks, Hybrid Search & Rerank

To make Retrieval‑Augmented Generation reliable in production, the article outlines five key engineering tactics—semantic chunking with metadata, hybrid vector‑keyword search, two‑stage retrieval with reranking, query rewriting and expansion, and dynamic result evaluation—each illustrated with concrete examples and code snippets.

AI Engineering · Hybrid Search · Query Rewriting
0 likes · 10 min read
James' Growth Diary
Apr 17, 2026 · Artificial Intelligence

How to Load and Split Documents for RAG: First Step to Building a Knowledge Base

This tutorial explains why document loading and splitting are critical for RAG pipelines, introduces LangChain's Document format, demonstrates loaders for various file types, details the RecursiveCharacterTextSplitter and alternative splitters, and provides practical tips on parameter tuning, metadata preservation, Chinese text handling, and common pitfalls.

AI · Document Loader · LangChain
0 likes · 27 min read
Wu Shixiong's Large Model Academy
Mar 7, 2026 · Artificial Intelligence

Mastering Offline Document Parsing for RAG: From PDFs to Multimodal Knowledge Bases

This article provides a comprehensive guide to offline document parsing for Retrieval‑Augmented Generation, covering multi‑format extraction, layout analysis, OCR pitfalls, chunking strategies, hierarchical metadata tagging, and how these steps directly affect retrieval accuracy and overall RAG performance.

Document Parsing · RAG · metadata
0 likes · 14 min read
DataFunTalk
Mar 3, 2026 · Big Data

Exploring Tencent Cloud’s Iceberg Batch‑Stream Integration and AI‑Driven Data Governance

This article presents a series of seven technical case studies—including Tencent Cloud’s Iceberg‑based batch‑stream integration, AI‑driven data governance with Apache Gravitino, Xiaohongshu’s lakehouse evolution, and a multimodal data‑lake solution—detailing challenges, architectural designs, implementation steps, performance results, and future directions.

AI · Big Data · Data Lake
0 likes · 8 min read
Baidu Geek Talk
Feb 9, 2026 · Databases

How Mantle Redefined Cloud Object Storage Metadata for Billion‑File Scale

This article recounts how Baidu's storage team tackled the performance and scalability limits of traditional object storage by redesigning metadata handling with the Mantle and MantleX architectures, introducing a centralized IndexNode, strong consistency, delta‑record writes, and a seamless single‑node to distributed transition for massive file systems.

Filesystem · Performance Optimization · Scalability
0 likes · 37 min read
DataFunTalk
Dec 26, 2025 · Cloud Native

How Haier Built a Cloud‑Native Multi‑Modal Data Lake for AI‑Ready Manufacturing

Haier’s digital transformation leverages a cloud‑native, open‑source‑based multi‑modal data lake that unifies structured and unstructured industrial data, uses metadata models and knowledge graphs for governance, and provides AI‑ready services that balance performance, cost, and real‑time requirements.

AI · Data Lake · Multimodal Data
0 likes · 12 min read
DataFunSummit
Dec 19, 2025 · Cloud Native

How HiSilicon Uses Cloud‑Native Architecture to Build a Multi‑Modal Data Lake

Amid the AI wave, HiSilicon’s digital transformation tackles fragmented industrial data by adopting a cloud‑native, open‑source stack centered on Paimon, creating a unified metadata model, knowledge graph, and elastic scheduling that balances performance and cost while powering AI‑ready services across nine business domains.

AI · Knowledge Graph · big-data
0 likes · 12 min read
dbaplus Community
Dec 6, 2025 · Big Data

Why Precise Data Warehouse Naming Boosts Efficiency and Cuts Costs

In the era of digital transformation, chaotic data warehouse naming wastes resources, while a well‑defined naming convention improves maintainability, collaboration, and business value, as demonstrated by real‑world cases showing three‑fold query speed gains and up to 60% reduction in cross‑team effort.

Big Data · Data Warehouse · best practices
0 likes · 6 min read
DataFunTalk
Nov 22, 2025 · Big Data

How Modern Data Lakes and AI Governance Transform Enterprise Analytics

This article collection examines Tencent Cloud’s Iceberg batch‑stream integration, AI‑driven game data governance, Apache Gravitino unified metadata and lineage, Xiaohongshu’s multimodal data‑lake evolution, and Volcano Engine’s Data+AI multimodal lake, highlighting architectures, techniques, performance gains, and practical implementations.

AI Governance · Data Lake · Gravitino
0 likes · 7 min read
DataFunSummit
Oct 22, 2025 · Big Data

How Douyin’s Data Asset Platform Revolutionizes Big Data Lineage

This article introduces Douyin Group’s comprehensive data asset management platform, explains why it emphasizes data assets over raw metadata, outlines its full‑linkage lineage capabilities, and presents practical insights on building, applying, and future‑proofing big data lineage within complex enterprise environments.

Big Data · Data Asset Management · Data Lineage
0 likes · 5 min read
DataFunSummit
Oct 19, 2025 · Big Data

How Apache Gravitino and OpenLineage Transform Data Governance in the AI Era

This article explains how the rapid rise of AI and large‑model technologies is driving a paradigm shift in data governance toward intelligent, automated, and real‑time collaboration, outlines the challenges of multi‑cloud environments, and demonstrates how Apache Gravitino and OpenLineage provide a unified metadata and lineage solution that improves data quality, compliance, and business agility.

Apache Gravitino · Big Data · Data Lineage
0 likes · 12 min read
DataFunSummit
Oct 14, 2025 · Big Data

How Douyin’s Data Asset Platform Redefines Big Data Lineage

This article introduces Douyin Group’s one‑stop Data Asset Management Platform, explains why the company focuses on data assets rather than raw metadata, and details the evolution, architecture, applications, and future outlook of its comprehensive big‑data lineage system.

Big Data · Data Asset Management · Data Governance
0 likes · 5 min read
DataFunSummit
Oct 11, 2025 · Big Data

What Small Banks Can Learn from Cutting-Edge Data Governance Practices

This article shares a data‑governance roadmap for small and medium banks, covering industry pain points, high‑quality data sets, a three‑step governance path, data standards, metadata management, master‑data strategy, business data modeling, a hybrid Greenplum‑Hadoop platform, quality monitoring, and a maturity assessment framework.

Banking · Big Data · Data Architecture
0 likes · 21 min read
Sohu Tech Products
Oct 9, 2025 · Mobile Development

How Android Dynamic Photos Work: XMP Metadata, Formats, and Kotlin Extraction

This article explores the technical architecture of Android dynamic photos, detailing the three‑layer file structure, XMP metadata specifications, and differences among Xiaomi Micro Video, Google Motion Photo, and OPPO O Live Photo, and provides a unified Kotlin solution for detection, parsing, and playback.

Android · Dynamic Photo · Kotlin
0 likes · 25 min read
JD Tech
Oct 9, 2025 · Artificial Intelligence

What Is Retrieval‑Augmented Generation (RAG) and How Does It Boost AI Accuracy?

This article explains Retrieval‑Augmented Generation (RAG), an AI framework that combines external knowledge retrieval with large language models, covering its motivations, data preparation, chunking strategies, vectorization, storage, query processing, retrieval, reranking, prompt engineering, and LLM generation, plus practical optimization tips.

LLM · RAG · chunking
0 likes · 14 min read
Continuous Delivery 2.0
Sep 11, 2025 · Artificial Intelligence

Building Scalable Enterprise RAG: Lessons, Pitfalls, and Proven Solutions

This article shares practical lessons from building a large‑scale enterprise RAG system, covering imperfect data, document quality scoring, hierarchical chunking, metadata design, semantic‑search failures, open‑source model choices, and table handling to achieve reliable AI‑driven search.

Enterprise AI · Open-source models · RAG
0 likes · 13 min read
DataFunTalk
Sep 1, 2025 · Big Data

How JD Retail Tackles Data Governance Challenges to Boost Efficiency

JD Retail outlines the growing data management challenges it faces—including asset discovery, architecture agility, development quality, and rising IT costs—and presents a comprehensive data governance framework that leverages standards, agile architecture, development isolation, and resource optimization to improve efficiency and reduce operational expenses.

Big Data · Data Governance · Data Management
0 likes · 7 min read
Big Data Technology Tribe
Aug 22, 2025 · Backend Development

How StarRocks Keeps Metadata Consistent Across FE Nodes

This article explains the roles of StarRocks FE and BE nodes, details the metadata stored in FE, describes the leader‑follower‑observer architecture, and shows how BDB JE replication, journal logs, and checkpoint mechanisms ensure metadata synchronization and durability even after node failures.

BDB JE · Distributed Systems · Replication
0 likes · 17 min read
DataFunSummit
Jun 10, 2025 · Big Data

How OpenLake Redefines Data Lake Infrastructure for the AI Era

This article explores OpenLake's evolution as a data lake platform for AI, covering the transition from Hive to modern lake formats like Iceberg and Paimon, performance benchmarks, metadata management advances, intelligent storage optimization, and the integration of multimodal support with the Lance file format.

AI · Big Data · Data Lake
0 likes · 22 min read
Architecture and Beyond
May 1, 2025 · Industry Insights

How Tag Systems Become the Brain of Digital Content – An Architect’s Guide

This article examines tag systems as the neural network of digital content, comparing them with traditional hierarchies, tracing their evolution, outlining business‑driven design steps, and detailing architectural components, non‑functional requirements, integration patterns, and future AI‑enhanced trends.

AI tagging · Scalability · architecture
0 likes · 24 min read
Big Data Tech Team
Apr 28, 2025 · Big Data

Mastering Metadata, Master Data, and Data Governance: A Complete Guide

This article explains the core concepts of metadata, master data, data resources, data governance, and data management, outlines their roles, compares governance with management, and provides practical steps and best‑practice recommendations for building a robust enterprise data framework.

Big Data · Data Governance · Master Data
0 likes · 15 min read
Big Data Tech Team
Apr 16, 2025 · Operations

Mastering Data Warehouse Naming: A Complete Guide to Standards and Processes

This article provides a comprehensive, step‑by‑step guide to data‑warehouse development, covering the full R&D workflow, data modeling layers, data dictionary creation, naming conventions for tables, columns, indexes and ETL jobs, metric standardization, and governance processes to ensure consistent, maintainable data assets across the organization.

ETL · data dictionary · metadata
0 likes · 28 min read
Alimama Tech
Apr 10, 2025 · Big Data

Performance Optimization of Apache Paimon in Dolphin OLAP Engine

The article details how Apache Paimon, integrated as an external table format in Alibaba’s Dolphin OLAP engine, achieves millisecond‑level query latency and up to 10k QPS through ORC push‑down, manifest conversion, caching, concurrency, and encoding optimizations, outperforming StarRocks and Hologres.

Dolphin · Java · OLAP
0 likes · 17 min read
Xiao Lou's Tech Notes
Feb 17, 2025 · Backend Development

Swiss Tables in Go 1.24: Open Addressing, SIMD, and Metadata Secrets

The article explains how Go 1.24’s new Swiss Tables hash‑map implementation replaces the traditional bucket‑based design with open addressing, SIMD‑accelerated probing, and metadata separation, detailing the underlying principles, performance advantages, handling of clustering and deletions, and a comparison with previous Go maps and Java’s HashMap.

Go · SIMD · hash map
0 likes · 16 min read
Big Data Technology & Architecture
Feb 1, 2025 · Big Data

Douyin Group Data Asset Management Platform: Comprehensive Data Lineage Overview and Practices

This article presents a detailed overview of Douyin Group's Data Asset Management Platform, focusing on the evolution, architecture, modeling, metrics, and application scenarios of its large‑scale data lineage system, and outlines future directions for full‑coverage, fine‑grained lineage capabilities.

Big Data · Data Asset Management · Data Lineage
0 likes · 17 min read
DataFunSummit
Jan 1, 2025 · Big Data

Douyin Group Data Asset Management Platform: Full‑Stack Data Lineage Evolution and Applications

This article introduces Douyin Group’s end‑to‑end data asset management platform, explains the evolution and architecture of its large‑scale data lineage system, presents quality metrics and ecosystem components, and outlines practical applications and future directions for data governance, development, and security.

Data Asset Platform · Data Governance · Data Lineage
0 likes · 16 min read
360 Zhihui Cloud Developer
Nov 29, 2024 · Big Data

How Ozone Scales Metadata for Massive Big Data Storage

This article explains Ozone's object storage architecture, its evolution of metadata management using distributed KV stores like Apache Cassandra, and the performance optimizations—read/write separation, unlimited scaling, and partitioning—that enable high‑throughput, low‑latency handling of massive datasets.

Apache Cassandra · Big Data · Distributed KV
0 likes · 9 min read
DeWu Technology
Nov 13, 2024 · Backend Development

Evolution of Rainbow Bridge Architecture: Building a Self‑Managed Metadata Center and SDK Enhancements

The new Rainbow Bridge architecture replaces the SLB‑based load‑balancing model with a self‑managed, multi‑AZ metadata center and enhanced SDK that aggregates node health, provides zone‑aware weighted routing, supports rapid failover and manual overrides, and delivers faster recovery and scalable traffic handling.

Distributed Systems · load balancing · metadata
0 likes · 11 min read
Baidu Geek Talk
Nov 6, 2024 · Cloud Computing

Baidu Canghai Storage Unified Technology Base: Architecture and Evolution of Metadata, Namespace, and Data Layers

Baidu’s Canghai Storage unifies metadata, hierarchical namespace, and data layers into a Meta‑Aware, three‑generation architecture that scales to trillions of metadata items and zettabyte‑scale data, using a distributed transactional KV store, single‑machine‑distributed namespace, and online erasure‑coding micro‑services to deliver high performance, low cost, and seamless scalability.

Big Data · Distributed Systems · NewSQL
0 likes · 18 min read
Baidu Intelligent Cloud Tech Hub
Nov 4, 2024 · Cloud Computing

How Baidu’s Unified Storage Platform Tackles AI‑Era Data Challenges

This article details Baidu’s unified storage architecture—covering its metadata, hierarchical namespace, and data layers—explaining how meta‑aware design, custom partitioning, flexible engines, and micro‑service based erasure coding together meet the scalability, performance, and cost demands of modern AI‑driven cloud storage workloads.

Microservices · cloud storage · erasure coding
0 likes · 17 min read
Data Thinking Notes
Oct 29, 2024 · Big Data

Unlocking Data Value: A Complete Guide to Data Asset Management and Governance

This article explores how enterprises can systematically identify, inventory, and govern massive data assets by defining key concepts, adopting frameworks like DAMA and DCMM, building layered management structures, and implementing integrated platforms for data integration, metadata, master data, standards, quality, and security to unlock data-driven value.

DAMA · DCMM · Data Asset Management
0 likes · 15 min read
DataFunSummit
Oct 17, 2024 · Big Data

Waggle Dance Based Metadata Solution at Tongcheng Travel: Architecture, Migration Strategies, and Future Outlook

This article presents Tongcheng Travel's metadata solution built on the open‑source Waggle Dance project, detailing the three‑layer architecture, challenges of a monolithic Hive Metastore, evaluated migration plans, federation implementation, migration workflow, and future directions for unified metadata governance.

Data Migration · Federation · Hive Metastore
0 likes · 11 min read
DataFunSummit
Sep 12, 2024 · Cloud Native

Design and Implementation of a Next‑Generation Multi‑Protocol Unstructured Storage System for Machine Learning

This article presents the challenges of storing massive machine‑learning datasets, evaluates existing storage solutions, and details the design of OrangeFS—a cloud‑native, multi‑protocol, multi‑tenant unstructured storage system that integrates object and file interfaces, optimizes metadata services, supports hot upgrades, and provides robust scalability and reliability for AI workloads.

Cloud Native · Multi-Protocol · high performance
0 likes · 24 min read
G7 EasyFlow Tech Circle
Sep 9, 2024 · Product Management

What Can Logistics Software Learn from Oracle EBS and OFSA Design?

This article examines how the flexible, metadata‑driven architecture and configurable features of Oracle E‑Business Suite and OFSA can inspire more adaptable, modular, and user‑centric logistics software, covering design principles, extensibility, integration, and data‑model strategies.

Logistics · OFSA · Oracle EBS
0 likes · 25 min read
AI Large Model Application Practice
Aug 29, 2024 · Artificial Intelligence

8 Essential Indexing Strategies to Boost Enterprise RAG Performance

This article presents eight practical optimization recommendations for the indexing stage of enterprise‑level Retrieval‑Augmented Generation (RAG) applications, covering chunk creation, abbreviation handling, multimodal document processing, semantic enrichment, metadata usage, alternative index types, and embedding model selection.

RAG · chunking · indexing
0 likes · 15 min read
DataFunSummit
Aug 28, 2024 · Big Data

Building Data Lineage Foundations and Applications for E‑commerce Scenarios

This article explains how to construct a full‑link data lineage platform for e‑commerce, detailing its architecture, quality metrics, and practical uses such as table migration, field‑level tracing, and automated metric decomposition to improve data governance and efficiency.

Data Governance · Data Lineage · e‑commerce
0 likes · 14 min read
Bilibili Tech
Apr 26, 2024 · Big Data

Fine-Grained Lock Optimization for HDFS NameNode to Improve Metadata Read/Write Performance

To overcome the NameNode write bottleneck caused by a single global read/write lock in Bilibili’s massive HDFS deployment, the team introduced hierarchical fine‑grained locking, splitting the lock into Namespace, BlockPool, and per‑INode levels. The change yielded up to three‑fold write throughput gains and a 90% drop in RPC queue time, and shifted the performance limit from lock contention to log synchronization.

Big Data · HDFS · NameNode
0 likes · 15 min read
DataFunSummit
Mar 18, 2024 · Big Data

Scenario‑Based Data Governance Practices in the Securities Industry

This article presents a comprehensive, scenario-driven data governance practice at Guoxin Securities, covering the industry's pain points, a three‑layer governance framework, detailed implementations for data standards, metadata, data quality, data modeling, and data security, and outlines future directions for intelligent and measurable governance.

Big Data · Data Quality · data security
0 likes · 30 min read
DataFunTalk
Mar 17, 2024 · Databases

MatrixOne Storage Format Design Overview

This article provides a comprehensive overview of MatrixOne's hyper‑converged cloud‑native database architecture, detailing its three‑layer design, data execution flow, columnar storage format, metadata hierarchy, performance optimizations, compatibility mechanisms, and practical usage scenarios.

Compatibility · MatrixOne · Storage Engine
0 likes · 12 min read
政采云技术
Jan 23, 2024 · Big Data

Design and Implementation of a Big Data Permission Management System

This article outlines the background, importance, scenarios, challenges, objectives, and architectural design—including RBAC and ABAC models, metadata integration, data classification, and verification mechanisms—of a comprehensive big data permission management system for secure and fine‑grained data access.

ABAC · Big Data · RBAC
0 likes · 14 min read
DataFunTalk
Jan 8, 2024 · Big Data

Didi's Big Data Cost Governance Practices and Framework

This article presents Didi's comprehensive big data cost governance approach, detailing the overall framework, data system architecture, asset management platform, Hadoop and Elasticsearch cost‑control practices, metadata‑driven optimization, and organizational insights for effective resource and budget management.

Resource Optimization · cost governance · metadata
0 likes · 19 min read
Architect
Dec 31, 2023 · Industry Insights

How Mooncake Automated API Documentation and Built a Metadata Hub

The article details how the Mooncake platform tackled outdated, manually‑maintained API docs by introducing naming conventions, a one‑click IntelliJ plugin, GitLab MR auto‑parsing, and a metadata center that supports debugging, mocking, and downstream consumption, saving developers hundreds of hours per release.

API documentation · Automation · Debugging
0 likes · 18 min read
Ctrip Technology
Nov 23, 2023 · Big Data

Optimizing Data Warehouse Timeliness Using Metadata Lineage

This article presents a metadata‑driven approach to improve data warehouse timeliness by extracting upstream lineage, identifying over‑layered, duplicate, and critical‑path tasks, and applying targeted scheduling and code‑level optimizations, demonstrated with a hotel order wide‑table case study.

DAG · Data Warehouse · Lineage
0 likes · 7 min read
php Courses
Nov 21, 2023 · Backend Development

Using PHP 8 Attributes to Manage Code Metadata

This article explains PHP 8's new Attributes feature, describing what attributes are, how to attach them to classes and methods with examples like @Table and @Route, and demonstrates retrieving attribute values via reflection to enable flexible metadata management.

Attributes · PHP8 · Reflection
0 likes · 4 min read
DataFunSummit
Nov 10, 2023 · Operations

Data Model Governance Practices at Taobao (Tao Tian Group)

This article presents a comprehensive overview of Taobao's data model governance, covering background challenges, a four‑pillar solution framework, detailed practices such as invalid table decommissioning, source‑table consolidation, data handover, public‑layer operations, incremental control, productization, and future planning to improve efficiency, cost, and quality of large‑scale data models.

Data Governance · metadata · model governance
0 likes · 26 min read
Huya Tech Engineering
Nov 10, 2023 · Operations

How a Unified Metadata Platform Boosts SRE Efficiency and Cuts Costs

This article describes how Huya built a unified metadata platform to break data silos across its SRE systems, enabling standardized data ingestion, correlation, and analysis that improve resource governance, root‑cause diagnosis, and overall cost‑efficiency for large‑scale live streaming services.

DevOps · Observability · SRE
0 likes · 13 min read
Data Thinking Notes
Oct 29, 2023 · Big Data

How Banks Can Master Data Governance: 9 Core Domains Explained

This article outlines why banks need robust data governance, describes nine essential domains—including data models, metadata, standards, quality, lifecycle, distribution, exchange, security and services—and explains how big‑data techniques can drive innovation, risk control, and refined decision‑making in banking.

metadata
0 likes · 17 min read
php Courses
Oct 23, 2023 · Backend Development

Using PHP 8 Attributes to Manage Code Metadata

This article explains PHP 8’s new Attributes feature, describing what attributes are, how to attach custom attributes such as @Table and @Route to classes and methods, and demonstrates retrieving attribute values via reflection, providing clear code examples for backend developers.

Attributes · Backend · PHP
0 likes · 5 min read
JD Cloud Developers
Oct 19, 2023 · Frontend Development

How Dynamic Forms Transform Custom Business Workflows

This article explains what dynamic forms are, why they are needed for tenant‑specific business scenarios, and outlines a three‑step metadata‑driven implementation—including data partitioning, metadata design, and front‑end rendering—while also discussing their limitations.

Dynamic Forms · backend data partitioning · configurable UI
0 likes · 5 min read
dbaplus Community
Sep 6, 2023 · Backend Development

How to Scale a Schema‑Free Classification Platform to 100 Billion Records

This article explains how to design a classification‑information system that handles 100 billion rows, ten thousand dynamic attributes, and hundreds of thousands of QPS by using vertical partitioning, unified metadata services, and an external search layer for scalable storage and retrieval.

Backend · databases · metadata
0 likes · 12 min read
DeWu Technology
Sep 4, 2023 · Backend Development

How Mooncake Automates API Docs, Builds a Metadata Hub, and Boosts Development Efficiency

This article examines the challenges of manual API documentation, introduces Mooncake’s standardized organization, the MooncakeUpload IntelliJ plugin for one‑click doc generation, GitLab MR auto‑parsing for continuous updates, and the API metadata center that enhances debugging, mocking, and cross‑team collaboration.

API documentation · Automation · Backend Tools
0 likes · 19 min read
Ximalaya Technology Team
Aug 17, 2023 · R&D Management

FeiKu: Ximalaya's Low‑Code Platform for Rapid Business Application Development

FeiKu, Ximalaya’s low‑code platform, lets business users design and publish full‑stack applications through drag‑and‑drop configuration, with built‑in permission, workflow, scripting and API integration. It has already produced nearly 300 internal apps and sharply reduced repetitive development, while its performance and openness are still being improved.

FeiKu · Workflow Engine · Ximalaya
0 likes · 9 min read
DataFunTalk
Jul 26, 2023 · Big Data

Data Model Governance Practices at Taobao (Alibaba)

This article presents a comprehensive case study of Taobao's data model governance, detailing the background challenges, the four‑pillar solution framework, specific governance practices such as invalid table decommissioning, data handover, public layer operations, incremental control, productization, future plans, and a Q&A session.

Alibaba · DataWorks · metadata
0 likes · 26 min read
Architects Research Society
Jul 21, 2023 · Big Data

Understanding Data Fabric Architecture: Key Pillars for Modern Data Management and Integration

The article explains what Data Fabric (also called data weaving) is, outlines its four essential pillars—metadata collection, active metadata, knowledge‑graph management, and a robust integration backbone—and shows how D&A leaders can adopt this design to achieve agile, AI‑enabled data integration across hybrid and multi‑cloud environments.

AI/ML · Data Management · metadata
0 likes · 10 min read
Architects' Tech Alliance
Jun 13, 2023 · Fundamentals

HadaFS: A Scalable Burst Buffer File System for Exascale Supercomputers

The article introduces HadaFS, a novel burst‑buffer file system that combines the scalability and performance of local burst buffers with the data‑sharing and cost advantages of shared buffers, details its LTA architecture, metadata handling, and evaluates its superior performance on the SNS supercomputer against BeeGFS and traditional GFS solutions.

Burst Buffer · HPC · SNS supercomputer
0 likes · 16 min read
DataFunTalk
Jun 9, 2023 · Big Data

Cloud Music Data Governance Practice

This article presents a comprehensive case study of NetEase Cloud Music's data governance practice, covering data background, governance philosophy, detailed solutions across metadata, storage, compute, and model design, practical implementations, measurable cost savings, and future planning for sustainable data management.

Cost Optimization · Hadoop · Spark
0 likes · 15 min read
DataFunSummit
Jun 4, 2023 · Fundamentals

The Role of Metadata in Data Governance and Its Applications

Metadata serves as a foundational element of data governance, enabling analysis, monitoring, discovery, and understanding of data assets, while applications such as data lineage, impact analysis, and data mapping help organizations assess quality, trace origins, and optimize processing workflows.

Big Data · Information Management · metadata
0 likes · 5 min read
IT Services Circle
May 30, 2023 · Fundamentals

Understanding Java Annotations: Concepts, Uses, and Implementation

This article explains Java annotations as metadata introduced in Java 5, covering their definition, built‑in and custom forms, purposes such as providing metadata, compile‑time checks, code generation and runtime processing, and demonstrates how to define, apply, and process them with code examples.

Java · Reflection · annotations
0 likes · 6 min read
ITPUB
May 10, 2023 · Cloud Native

How Meituan’s MStore Achieves Scalable Storage‑Compute Separation in Cloud‑Native Environments

This article explains how Meituan’s storage team designed the MStore distributed storage platform to separate storage and compute, addressing scaling, cost, and reliability challenges of monolithic architectures, and details its cloud‑native components, data model, performance optimizations, observability, and the derived EBS block‑storage service.

Distributed Systems · MStore · cloud-native
0 likes · 16 min read
DataFunTalk
Apr 21, 2023 · Fundamentals

Data Architecture and Data Modeling Overview, Solutions, and Enterprise Case Studies

This article explains data architecture and data modeling fundamentals, presents DAMA DMBOK concepts, outlines four practical solutions for model design, standard management, automated change control, and business mapping, and shares an enterprise manufacturing case study with Q&A on governance and efficiency.

Data Architecture · Enterprise Data · data modeling
0 likes · 21 min read
ITPUB
Apr 9, 2023 · Artificial Intelligence

How ChatGPT Redefines Knowledge Acquisition: Six Practical Insights

The author shares a personal journey of using ChatGPT as a knowledge engine, illustrating six key benefits—answering complex questions, applying Occam's razor, simplifying concepts for beginners, enabling generative learning, fostering T‑shaped expertise, and mastering effective prompting—through concrete examples ranging from metadata explanations to Docker deployment steps.

AI · ChatGPT · Docker
0 likes · 23 min read
DataFunSummit
Apr 3, 2023 · Big Data

Evolution and Architecture of Data Lineage in Volcano Engine DataLeap

This article outlines the background, development stages, architectural evolution, key features such as incremental updates and quality metrics, and future directions of the data lineage capability within Volcano Engine's DataLeap big‑data governance platform.

Big Data · DataLeap · metadata
0 likes · 18 min read
DataFunTalk
Mar 15, 2023 · Big Data

Evolution of Next‑Generation Cloud Data Platform Architecture

This technical presentation reviews the historical development of big data platforms, outlines the four generations of cloud data platform architectures, details the modern cloud‑native stack—including unified metadata, scheduling, and integration systems—and showcases a real‑world industrial manufacturing case with a Q&A session.

Cloud Data Platform · Data Architecture · Scheduling
0 likes · 23 min read
Big Data Technology & Architecture
Mar 14, 2023 · Big Data

Comprehensive Guide to Data Lineage: Model Design, Optimization, and Use Cases at ByteDance

This article presents an in‑depth overview of data lineage at ByteDance, detailing the design of storage, display, abstraction, implementation, and storage layers, optimization techniques for real‑time updates and queries, open export methods, practical use cases across asset, development, governance, and security domains, and future directions.

Apache Atlas · Data Lineage · JanusGraph
0 likes · 20 min read
DataFunSummit
Feb 11, 2023 · Big Data

Intelligent Metadata Governance for Power Data: Background, Solution, Value and Case Studies

This article presents a comprehensive overview of the intelligent metadata‑driven data governance framework implemented by Southern Power Grid Yunnan, detailing its background, challenges, architectural design, key AI‑enabled technologies, practical case studies, and the resulting business value for the power industry.

AI · Data Quality · electric power
0 likes · 14 min read
DataFunTalk
Jan 19, 2023 · Big Data

Data Governance Strategies: Concepts, Practices, and Case Studies

The article explains the importance of data governance for organizations handling big data, outlines narrow and broad governance approaches, presents strategic design principles, and shares practical case studies from leading companies, while also offering a downloadable ebook of governance strategies.

Case Studies · Data Management · data security
0 likes · 7 min read
DataFunTalk
Jan 13, 2023 · Big Data

Data Governance Strategies and Practices: Insights from Leading Companies

The article explains the importance of data governance for organizations handling big data, distinguishes narrow and broad governance approaches, outlines strategic principles, and presents case studies from companies like Tencent, SF Tech, Huolala, and NetEase to illustrate effective governance practices.

Case Study · Data Quality · Enterprise Data
0 likes · 8 min read
DataFunTalk
Dec 5, 2022 · Big Data

Data Governance Practices at ZTO Express: Challenges, Solutions, and Future Plans

The article details ZTO Express's data governance journey, covering company background, drivers and goals, challenges such as data asset inventory, standardization, quality, and modeling, and outlines their multi‑layered governance framework, practical implementations in data quality, model and metadata, and future plans.

Data Platform · Logistics · metadata
0 likes · 17 min read
DeWu Technology
Nov 30, 2022 · Big Data

Fundamentals and Implementation of Data Lineage in Big Data Environments

Data lineage in big‑data environments tracks how data moves and transforms, from source tables through SQL processing to final storage. It supports management tasks such as domain segmentation, performance tuning, anomaly detection, and dependency verification; implementations range from simple regex extraction to robust AST parsing and optimization, as used by tools like Alibaba DataWorks and Apache Atlas.

AST · Big Data · Data Lineage
0 likes · 7 min read
DataFunSummit
Nov 24, 2022 · Big Data

Metadata Management and Governance Practices at Wing Payment: Architecture, Techniques, and Future Outlook

This article explains how Wing Payment uses metadata as the foundation of its data‑governance practice, describing the challenges of data quality, efficiency, cost and security, the four‑step governance framework, the design of its metadata platform, and future directions such as multi‑source management and intelligent recommendation.

Master Data · data security · metadata
0 likes · 18 min read
Big Data Technology & Architecture
Nov 22, 2022 · Big Data

Comprehensive Guide to Metadata Management, Data Quality, and Optimization in Big Data Systems

This article provides an in-depth overview of metadata concepts, their technical and business classifications, value in data management, applications such as data profiling and lineage, optimization techniques for compute and storage, lifecycle management, and comprehensive data quality assurance practices within large‑scale big data environments.

big-data · data-quality · data-warehouse
0 likes · 38 min read
Data Thinking Notes
Nov 16, 2022 · Big Data

Why Metadata Management Is Essential for Data Warehouses

This article explains the concept of metadata, its role in data warehouses, why managing metadata is critical for building, maintaining, and scaling data warehouse systems, and outlines practical steps, use cases, and tools for effective metadata management.

Data Governance · Data Warehouse · ETL
0 likes · 15 min read
DataFunSummit
Nov 11, 2022 · Big Data

Tencent Oula Data Governance Platform: Architecture, Practices, and Solutions

The article presents an in‑depth overview of Tencent's Oula data governance platform, describing its construction goals, core capabilities, DataOps‑driven development workflow, unified metric store, data map services, and practical Q&A on asset health scoring and data lineage, illustrating a comprehensive end‑to‑end big‑data governance solution.

DataOps · Tencent · big data platform
0 likes · 17 min read
DataFunTalk
Nov 2, 2022 · Big Data

Tencent Oula Data Governance Platform: Architecture, Practices, and Solutions

Tencent's Oula platform, launched in 2019, provides a DataOps‑driven, end‑to‑end data governance solution covering data discovery, asset factory, metric platform, and governance engine, and the talk details its construction goals, data development governance, unified metric system, data map, and Q&A on asset health and lineage.

Data Platform · DataOps · metadata
0 likes · 17 min read
ITPUB
Oct 20, 2022 · Big Data

Will HDFS Be Replaced? Analyzing Its Drawbacks and Future Alternatives

The article examines why the Hadoop Distributed File System (HDFS) may become obsolete by detailing its three main shortcomings (deployment complexity, metadata memory limits, and high replication overhead) and explores how newer architectures and erasure coding could address these issues.

Big Data · Distributed File System · HDFS
0 likes · 8 min read
AsiaInfo Technology: New Tech Exploration
Sep 23, 2022 · Industry Insights

How Domain Model Metadata Boosts Business System Reuse and Efficiency

This article explores how structured domain model metadata, derived from domain‑driven design principles, can standardize business component descriptions, enable visual UML modeling, support low‑code and no‑code generation, and ultimately reduce development costs while accelerating delivery of enterprise support systems.

Domain-Driven Design · UML · low-code
0 likes · 20 min read
Baidu Intelligent Cloud Tech Hub
Sep 23, 2022 · Databases

How Baidu’s TafDB Achieves Trillion‑Scale Metadata Storage with Near‑Zero Latency

This article explores the design and engineering of Baidu’s TafDB, a distributed metadata database that powers cloud object and file storage, detailing its architecture, namespace evolution, transaction optimizations, garbage collection strategies, and clock mechanisms that enable trillion‑scale metadata and millions of QPS.

Scalability · cloud storage · metadata
0 likes · 19 min read
ShiZhen AI
Sep 7, 2022 · Big Data

Getting Started with DataHub: A One‑Stop Guide to Metadata Governance

This article walks you through the fundamentals of data governance, explains metadata management concepts, compares traditional tools with DataHub, and provides a step‑by‑step tutorial for installing Docker, Python, and DataHub 0.8.20 on CentOS 7, ingesting MySQL metadata, and exploring the UI.

Big Data · Data Governance · DataHub
0 likes · 19 min read
GuanYuan Data Tech Team
Aug 4, 2022 · Cloud Native

What Is a Cloud‑Native Data Platform? Architecture, Components, and Best Practices

This article explores the evolution and architecture of cloud‑native data platforms, covering their historical roots, modern components such as storage layers, ingestion, processing, metadata, and consumption, and offers practical guidance on selecting tools, designing pipelines, and implementing best‑practice strategies for scalable, flexible data infrastructure.

Data Architecture · big-data · cloud-native
0 likes · 41 min read
Big Data Technology & Architecture
Jul 18, 2022 · Big Data

Systematic Data Governance Practices in Meituan Accommodation Business

This article details Meituan's accommodation data governance team's evolution toward an automated, systematic, and standardized governance framework, covering background challenges, the conceptualization of a comprehensive governance system, its practical implementation across standardization, digitization, and systematization, and the resulting operational benefits and future directions.

Automation · Operations · metadata
0 likes · 30 min read
Big Data Technology Architecture
Jun 7, 2022 · Big Data

Multi-Modal Index in Apache Hudi 0.11.0: Design, Implementation, and Performance Benefits

This article explains the motivation, design principles, implementation details, and performance improvements of the new multi‑modal indexing subsystem introduced in Apache Hudi 0.11.0 for Lakehouse architectures, covering scalable metadata, ACID updates, fast lookups, file listing, data skipping, upsert performance, and future work.

Apache Hudi · indexing · metadata
0 likes · 19 min read
MaGe Linux Operations
Jun 6, 2022 · Databases

What Is a Schema? From Databases to Kubernetes Explained

This article explains the concept of a schema—from its Greek origins and psychological meaning to its role as metadata in databases and Kubernetes—detailing different database schema models, Kubernetes resource definitions, and how to extend and register custom schemas in Go.

Golang · YAML · database
0 likes · 8 min read
Laravel Tech Community
May 30, 2022 · Backend Development

Highlights of Apache Pulsar 2.10.0 Release: New Features and Bug Fixes

The Apache Pulsar 2.10.0 release introduces automatic cluster failover, lazy‑loading producers, new TableView support, enhanced broker interceptors, enriched client authentication, etcd‑based metadata storage, and numerous bug fixes, offering developers and operators a more flexible and performant messaging platform.

Apache Pulsar · Broker · Messaging
0 likes · 7 min read
Architect
May 25, 2022 · Big Data

Metadata Infrastructure and Governance in Bilibili's Data Platform

The article details how Bilibili built a unified metadata infrastructure (a URN‑based model, collection pipelines, quality assurance, storage in TiDB, Elasticsearch, and HugeGraph, and query services) to support data discovery, lineage, impact analysis, and governance across its growing data platform.

Big Data · Data Catalog · Data Governance
0 likes · 21 min read