Tagged articles

Metadata

195 articles

Jun 30, 2026 · Industry Insights

When Max Planck Is Marked Retracted: Algorithmic Mistake Exposes Publishing Flaws

A new paper reveals that two of Max Planck's 1940s articles were mistakenly labeled as retracted on Springer’s platform due to algorithmic metadata errors, highlighting how modern publishing infrastructure can misinterpret historic scientific works and affect AI‑driven knowledge systems.

AI · Digital Archives · Max Planck

0 likes · 8 min read

Black & White Path

Jun 27, 2026 · Information Security

What the 905 GB BreachForums CDN Leak Reveals About Hacker Infrastructure

A 905 GB BitTorrent seed of BreachForums’ CDN cache, containing raw databases, exploit tools, proof‑of‑concept media, and detailed forum metadata, was publicly released, offering an unprecedented view into the full inventory of a major underground hacker market and highlighting the risks of CDN misconfiguration.

BreachForums · CDN leak · Metadata

0 likes · 6 min read

DataFunSummit

Jun 7, 2026 · Artificial Intelligence

How Qichacha Uses Large Language Models for Field‑Level Data Lineage

This article details Qichacha's technical journey of applying large language models to resolve field‑level data lineage challenges in a complex, multi‑source data environment, describing the motivation, architecture, practical implementation, engineering trade‑offs, and measurable outcomes.

AI · Big Data · Data Governance

0 likes · 11 min read

DataFunTalk

May 27, 2026 · Industry Insights

Data Agent Tipping Point in 6‑12 Months? Xiaomi, Alibaba Cloud & Datastrato Discuss

The round‑table examines how Data Agent is moving from proof‑of‑concept to production, outlines its three‑stage evolution from NL2SQL to a general AI‑driven agent, highlights verification and semantic‑gap challenges, and presents expert views that the scaling tipping point could arrive within the next six to twelve months.

AI · Apache Gravitino · Data Agent

0 likes · 10 min read

AI Waka

May 26, 2026 · Artificial Intelligence

Master the 14+ SKILL.md Metadata Fields for Claude Agents

This guide explains all 14+ metadata fields available in a SKILL.md file, distinguishes the six standard Agent Skills fields from Claude Code extensions, and provides concrete examples and best‑practice recommendations for naming, descriptions, licensing, compatibility, allowed tools, and advanced options such as model selection and sub‑agent configuration.

Agent Skills · Claude · Metadata

0 likes · 13 min read

DataFunSummit

May 24, 2026 · Industry Insights

Why AI Agents Are Redefining Data Infrastructure Governance

The rise of AI agents as data consumers forces a fundamental shift in data infrastructure design, requiring unified metadata control, a robust semantic layer, and a governed agent access framework to replace traditional human‑centric RBAC models and ensure secure, auditable operations.

AI Agents · Agentic Data Protocol · Apache Gravitino

0 likes · 18 min read

AI Engineer Programming

May 20, 2026 · Artificial Intelligence

Why Chunk‑Based RAG Fails and How IdeaBlocks Improve Retrieval

The article argues that the common assumption that text chunks are the proper knowledge unit in RAG pipelines is flawed, leading to versioning, metadata, and redundancy problems, and demonstrates that replacing chunks with structured IdeaBlocks dramatically reduces corpus size and token usage while improving vector relevance.

IdeaBlock · LLM · Metadata

0 likes · 10 min read

Linyb Geek Road

May 20, 2026 · Big Data

Why 90% of Companies Get Data Governance Wrong and How to Reduce Friction

Most data‑governance initiatives fail not because of lacking technology but because they add friction; the article explains how companies mistakenly focus on rules, platforms, and processes, and offers a step‑by‑step approach—identifying high‑value tables, minimal metadata, targeted quality rules, and fast issue diagnosis—to make governance truly useful.

Big Data · Data Governance · Data Quality

0 likes · 29 min read

Big Data Tech Team

May 19, 2026 · Big Data

Enterprise Data Warehouse Development Playbook: Standard Engineering Edition

This playbook provides enterprise‑level data warehouse engineers, ETL developers, data modelers, and data‑team managers with a complete, logical, and actionable set of standards, processes, and best‑practice guidelines covering architecture, development principles, role responsibilities, end‑to‑end workflow, metadata, security, performance metrics, and team collaboration.

Data Quality · ETL · Metadata

0 likes · 18 min read

DataFunSummit

May 14, 2026 · Big Data

How Gravitino, Daft, and Lance Enable Secure, AI‑Driven Multimodal Lakehouse

The article examines the challenges of multimodal data in modern lakehouses and presents a three‑tool stack—Gravitino, Daft, and Lance—that provides unified metadata, distributed multimodal compute, and high‑performance storage, while detailing security governance, integration paths, and future directions.

Daft · Gravitino · Lakehouse

0 likes · 11 min read

Woodpecker Software Testing

Apr 30, 2026 · Databases

Datafaker: A Powerful Tool for Bulk Test Data Generation

Datafaker is a Python‑compatible utility that creates large volumes of synthetic test data for databases, streams, files, and messaging systems, offering flexible metadata rules, multi‑backend support, and command‑line options for quick data provisioning.

Elasticsearch · Metadata · Python

0 likes · 14 min read

Ray's Galactic Tech

Apr 27, 2026 · Artificial Intelligence

Using AI to Auto‑Generate Forms: Production‑Ready Low‑Code Form Generation with Spring AI Alibaba ReactAgent

The article presents a production‑grade solution that lets users describe a form in natural language, then uses a Spring AI Alibaba ReactAgent powered by a ReAct reasoning loop to retrieve templates, validate fields, generate layout, enforce governance, and finally emit a versioned JSON schema ready for deployment.

Metadata · Observability · ReAct

0 likes · 29 min read

AI Architect Hub

Apr 25, 2026 · Artificial Intelligence

How to Feed Massive Documents to an RAG System: Mastering the Art of Text Chunking

This article explains why proper text chunking is critical for Retrieval‑Augmented Generation, illustrates common pitfalls with real‑world examples, compares four chunking strategies (fixed length, recursive, structure‑aware, and code‑aware), and provides practical guidelines for chunk size, overlap, metadata handling, and a production‑ready pipeline.

AI Retrieval · LangChain · Metadata

0 likes · 21 min read

Su San Talks Tech

Apr 19, 2026 · Artificial Intelligence

Boost Enterprise RAG: Data Pipeline Tricks, Hybrid Search & Rerank

To make Retrieval‑Augmented Generation reliable in production, the article outlines five key engineering tactics—semantic chunking with metadata, hybrid vector‑keyword search, two‑stage retrieval with reranking, query rewriting and expansion, and dynamic result evaluation—each illustrated with concrete examples and code snippets.

AI Engineering · Hybrid Search · Metadata

0 likes · 10 min read

James' Growth Diary

Apr 17, 2026 · Artificial Intelligence

How to Load and Split Documents for RAG: First Step to Building a Knowledge Base

This tutorial explains why document loading and splitting are critical for RAG pipelines, introduces LangChain's Document format, demonstrates loaders for various file types, details the RecursiveCharacterTextSplitter and alternative splitters, and provides practical tips on parameter tuning, metadata preservation, Chinese text handling, and common pitfalls.

AI · Chunking · Document Loader

0 likes · 27 min read

Wu Shixiong's Large Model Academy

Mar 7, 2026 · Artificial Intelligence

Mastering Offline Document Parsing for RAG: From PDFs to Multimodal Knowledge Bases

This article provides a comprehensive guide to offline document parsing for Retrieval‑Augmented Generation, covering multi‑format extraction, layout analysis, OCR pitfalls, chunking strategies, hierarchical metadata tagging, and how these steps directly affect retrieval accuracy and overall RAG performance.

Document Parsing · Metadata · RAG

0 likes · 14 min read

DataFunTalk

Mar 3, 2026 · Big Data

Exploring Tencent Cloud’s Iceberg Batch‑Stream Integration and AI‑Driven Data Governance

This article presents a series of seven technical case studies—including Tencent Cloud’s Iceberg‑based batch‑stream integration, AI‑driven data governance with Apache Gravitino, Xiaohongshu’s lakehouse evolution, and a multimodal data‑lake solution—detailing challenges, architectural designs, implementation steps, performance results, and future directions.

AI · Big Data · Data Lake

0 likes · 8 min read

Mingyi World Elasticsearch

Feb 25, 2026 · Databases

How to Accurately Track Document Write Time in Elasticsearch – 3 Practical Methods

Elasticsearch does not store a built‑in write timestamp, so to trace when a document was indexed you must add the field during ingest, using either an Ingest Pipeline, Logstash/Beats configuration, or application‑side code, with guidance on advantages, caveats, and handling historical data.

Beats · Elasticsearch · Ingest Pipeline

0 likes · 5 min read

Baidu Geek Talk

Feb 9, 2026 · Databases

How Mantle Redefined Cloud Object Storage Metadata for Billion‑File Scale

This article recounts how Baidu's storage team tackled the performance and scalability limits of traditional object storage by redesigning metadata handling with the Mantle and MantleX architectures, introducing a centralized IndexNode, strong consistency, delta‑record writes, and a seamless single‑node to distributed transition for massive file systems.

Distributed storage · Filesystem · Metadata

0 likes · 37 min read

DataFunTalk

Dec 26, 2025 · Cloud Native

How Haier Built a Cloud‑Native Multi‑Modal Data Lake for AI‑Ready Manufacturing

Haier’s digital transformation leverages a cloud‑native, open‑source‑based multi‑modal data lake that unifies structured and unstructured industrial data, uses metadata models and knowledge graphs for governance, and provides AI‑ready services that balance performance, cost, and real‑time requirements.

AI · Data Lake · Metadata

0 likes · 12 min read

DataFunSummit

Dec 19, 2025 · Cloud Native

How HiSilicon Uses Cloud‑Native Architecture to Build a Multi‑Modal Data Lake

Amid the AI wave, HiSilicon’s digital transformation tackles fragmented industrial data by adopting a cloud‑native, open‑source stack centered on Paimon, creating a unified metadata model, knowledge graph, and elastic scheduling that balances performance and cost while powering AI‑ready services across nine business domains.

AI · Knowledge Graph · Metadata

0 likes · 12 min read

dbaplus Community

Dec 6, 2025 · Big Data

Why Precise Data Warehouse Naming Boosts Efficiency and Cuts Costs

In the era of digital transformation, chaotic data warehouse naming wastes resources, while a well‑defined naming convention improves maintainability, collaboration, and business value, as demonstrated by real‑world cases showing three‑fold query speed gains and up to 60% reduction in cross‑team effort.

Big Data · Data Warehouse · Metadata

0 likes · 6 min read

DataFunTalk

Nov 22, 2025 · Big Data

How Modern Data Lakes and AI Governance Transform Enterprise Analytics

This article collection examines Tencent Cloud’s Iceberg batch‑stream integration, AI‑driven game data governance, Apache Gravitino unified metadata and lineage, Xiaohongshu’s multimodal data‑lake evolution, and Volcano Engine’s Data+AI multimodal lake, highlighting architectures, techniques, performance gains, and practical implementations.

AI Governance · Data Lake · Gravitino

0 likes · 7 min read

DataFunSummit

Oct 22, 2025 · Big Data

How Douyin’s Data Asset Platform Revolutionizes Big Data Lineage

This article introduces Douyin Group’s comprehensive data asset management platform, explains why it emphasizes data assets over raw metadata, outlines its full‑linkage lineage capabilities, and presents practical insights on building, applying, and future‑proofing big data lineage within complex enterprise environments.

Big Data · Data Asset Management · Douyin

0 likes · 5 min read

DataFunSummit

Oct 19, 2025 · Big Data

How Apache Gravitino and OpenLineage Transform Data Governance in the AI Era

This article explains how the rapid rise of AI and large‑model technologies is driving a paradigm shift in data governance toward intelligent, automated, and real‑time collaboration, outlines the challenges of multi‑cloud environments, and demonstrates how Apache Gravitino and OpenLineage provide a unified metadata and lineage solution that improves data quality, compliance, and business agility.

Apache Gravitino · Big Data · Metadata

0 likes · 12 min read

DataFunSummit

Oct 14, 2025 · Big Data

How Douyin’s Data Asset Platform Redefines Big Data Lineage

This article introduces Douyin Group’s one‑stop Data Asset Management Platform, explains why the company focuses on data assets rather than raw metadata, and details the evolution, architecture, applications, and future outlook of its comprehensive big‑data lineage system.

Big Data · Data Asset Management · Data Governance

0 likes · 5 min read

DataFunSummit

Oct 11, 2025 · Big Data

What Small Banks Can Learn from Cutting-Edge Data Governance Practices

This article shares a data‑governance roadmap for small and medium banks, covering industry pain points, high‑quality data sets, a three‑step governance path, data standards, metadata management, master‑data strategy, business data modeling, a hybrid Greenplum‑Hadoop platform, quality monitoring, and a maturity assessment framework.

Big Data · Data Architecture · Data Governance

0 likes · 21 min read

Sohu Tech Products

Oct 9, 2025 · Mobile Development

How Android Dynamic Photos Work: XMP Metadata, Formats, and Kotlin Extraction

This article explores the technical architecture of Android dynamic photos, detailing the three‑layer file structure, XMP metadata specifications, and differences among Xiaomi Micro Video, Google Motion Photo, and OPPO O Live Photo, and provides a unified Kotlin solution for detection, parsing, and playback.

Android · Dynamic Photo · Kotlin

0 likes · 25 min read

JD Tech

Oct 9, 2025 · Artificial Intelligence

What Is Retrieval‑Augmented Generation (RAG) and How Does It Boost AI Accuracy?

This article explains Retrieval‑Augmented Generation (RAG), an AI framework that combines external knowledge retrieval with large language models, covering its motivations, data preparation, chunking strategies, vectorization, storage, query processing, retrieval, reranking, prompt engineering, and LLM generation, plus practical optimization tips.

Chunking · LLM · Metadata

0 likes · 14 min read

Continuous Delivery 2.0

Sep 11, 2025 · Artificial Intelligence

Building Scalable Enterprise RAG: Lessons, Pitfalls, and Proven Solutions

This article shares practical lessons from building a large‑scale enterprise RAG system, covering imperfect data, document quality scoring, hierarchical chunking, metadata design, semantic‑search failures, open‑source model choices, and table handling to achieve reliable AI‑driven search.

Enterprise AI · Metadata · RAG

0 likes · 13 min read

DataFunTalk

Sep 1, 2025 · Big Data

How JD Retail Tackles Data Governance Challenges to Boost Efficiency

JD Retail outlines the growing data management challenges it faces—including asset discovery, architecture agility, development quality, and rising IT costs—and presents a comprehensive data governance framework that leverages standards, agile architecture, development isolation, and resource optimization to improve efficiency and reduce operational expenses.

Big Data · Data Governance · Data Management

0 likes · 7 min read

Big Data Technology Tribe

Aug 22, 2025 · Backend Development

How StarRocks Keeps Metadata Consistent Across FE Nodes

This article explains the roles of StarRocks FE and BE nodes, details the metadata stored in FE, describes the leader‑follower‑observer architecture, and shows how BDB JE replication, journal logs, and checkpoint mechanisms ensure metadata synchronization and durability even after node failures.

BDB JE · Metadata · StarRocks

0 likes · 17 min read

Past Memory Big Data

Jul 30, 2025 · Big Data

Why Iceberg Is Dropping Positional Deletes in Merge‑on‑Read Tables

The article explains how Apache Iceberg v3 replaces the scalability‑limited positional‑delete mechanism in Merge‑on‑Read tables with compact Deletion Vectors, detailing the performance, I/O, and metadata drawbacks of positional deletes and showing how the new bitmap‑based approach resolves them.

Apache Iceberg · Data Lake · Deletion Vector

0 likes · 20 min read

DataFunSummit

Jun 10, 2025 · Big Data

How OpenLake Redefines Data Lake Infrastructure for the AI Era

This article explores OpenLake's evolution as a data lake platform for AI, covering the transition from Hive to modern lake formats like Iceberg and Paimon, performance benchmarks, metadata management advances, intelligent storage optimization, and the integration of multimodal support with the Lance file format.

AI · Big Data · Data Lake

0 likes · 22 min read

Architecture and Beyond

May 1, 2025 · Industry Insights

How Tag Systems Become the Brain of Digital Content – An Architect’s Guide

This article examines tag systems as the neural network of digital content, comparing them with traditional hierarchies, tracing their evolution, outlining business‑driven design steps, and detailing architectural components, non‑functional requirements, integration patterns, and future AI‑enhanced trends.

AI tagging · Metadata · architecture

0 likes · 24 min read

Big Data Tech Team

Apr 28, 2025 · Big Data

Mastering Metadata, Master Data, and Data Governance: A Complete Guide

This article explains the core concepts of metadata, master data, data resources, data governance, and data management, outlines their roles, compares governance with management, and provides practical steps and best‑practice recommendations for building a robust enterprise data framework.

Big Data · Data Governance · Master Data

0 likes · 15 min read

Big Data Tech Team

Apr 16, 2025 · Operations

Mastering Data Warehouse Naming: A Complete Guide to Standards and Processes

This article provides a comprehensive, step‑by‑step guide to data‑warehouse development, covering the full R&D workflow, data modeling layers, data dictionary creation, naming conventions for tables, columns, indexes and ETL jobs, metric standardization, and governance processes to ensure consistent, maintainable data assets across the organization.

ETL · Metadata · data dictionary

0 likes · 28 min read

Alimama Tech

Apr 10, 2025 · Big Data

Performance Optimization of Apache Paimon in Dolphin OLAP Engine

The article details how Apache Paimon, integrated as an external table format in Alibaba’s Dolphin OLAP engine, achieves millisecond‑level query latency and up to 10k QPS through ORC push‑down, manifest conversion, caching, concurrency, and encoding optimizations, outperforming StarRocks and Hologres.

Dolphin · Java · Metadata

0 likes · 17 min read

Xiao Lou's Tech Notes

Feb 17, 2025 · Backend Development

Swiss Tables in Go 1.24: Open Addressing, SIMD, and Metadata Secrets

The article explains how Go 1.24’s new Swiss Tables hash‑map implementation replaces the traditional bucket‑based design with open addressing, SIMD‑accelerated probing, and metadata separation, detailing the underlying principles, performance advantages, handling of clustering and deletions, and a comparison with previous Go maps and Java’s HashMap.

Go · Metadata · SIMD

0 likes · 16 min read

Big Data Technology & Architecture

Feb 1, 2025 · Big Data

Douyin Group Data Asset Management Platform: Comprehensive Data Lineage Overview and Practices

This article presents a detailed overview of Douyin Group's Data Asset Management Platform, focusing on the evolution, architecture, modeling, metrics, and application scenarios of its large‑scale data lineage system, and outlines future directions for full‑coverage, fine‑grained lineage capabilities.

Big Data · Data Asset Management · Metadata

0 likes · 17 min read

DataFunSummit

Jan 1, 2025 · Big Data

Douyin Group Data Asset Management Platform: Full‑Stack Data Lineage Evolution and Applications

This article introduces Douyin Group’s end‑to‑end data asset management platform, explains the evolution and architecture of its large‑scale data lineage system, presents quality metrics and ecosystem components, and outlines practical applications and future directions for data governance, development, and security.

Data Asset Platform · Data Governance · Data Quality

0 likes · 16 min read

360 Zhihui Cloud Developer

Nov 29, 2024 · Big Data

How Ozone Scales Metadata for Massive Big Data Storage

This article explains Ozone's object storage architecture, its evolution of metadata management using distributed KV stores like Apache Cassandra, and the performance optimizations—read/write separation, unlimited scaling, and partitioning—that enable high‑throughput, low‑latency handling of massive datasets.

Apache Cassandra · Big Data · Distributed KV

0 likes · 9 min read

DeWu Technology

Nov 13, 2024 · Backend Development

Evolution of Rainbow Bridge Architecture: Building a Self‑Managed Metadata Center and SDK Enhancements

The new Rainbow Bridge architecture replaces the SLB‑based load‑balancing model with a self‑managed, multi‑AZ metadata center and enhanced SDK that aggregates node health, provides zone‑aware weighted routing, supports rapid failover and manual overrides, and delivers faster recovery and scalable traffic handling.

Metadata · distributed systems · load balancing

0 likes · 11 min read

Baidu Geek Talk

Nov 6, 2024 · Cloud Computing

Baidu Canghai Storage Unified Technology Base: Architecture and Evolution of Metadata, Namespace, and Data Layers

Baidu’s Canghai Storage unifies metadata, hierarchical namespace, and data layers into a Meta‑Aware, three‑generation architecture that scales to trillions of metadata items and zettabyte‑scale data, using a distributed transactional KV store, single‑machine‑distributed namespace, and online erasure‑coding micro‑services to deliver high performance, low cost, and seamless scalability.

Big Data · Metadata · NewSQL

0 likes · 18 min read

Baidu Intelligent Cloud Tech Hub

Nov 4, 2024 · Cloud Computing

How Baidu’s Unified Storage Platform Tackles AI‑Era Data Challenges

This article details Baidu’s unified storage architecture—covering its metadata, hierarchical namespace, and data layers—explaining how meta‑aware design, custom partitioning, flexible engines, and micro‑service based erasure coding together meet the scalability, performance, and cost demands of modern AI‑driven cloud storage workloads.

Metadata · Microservices · cloud storage

0 likes · 17 min read

Data Thinking Notes

Oct 29, 2024 · Big Data

Unlocking Data Value: A Complete Guide to Data Asset Management and Governance

This article explores how enterprises can systematically identify, inventory, and govern massive data assets by defining key concepts, adopting frameworks like DAMA and DCMM, building layered management structures, and implementing integrated platforms for data integration, metadata, master data, standards, quality, and security to unlock data-driven value.

DAMA · DCMM · Data Asset Management

0 likes · 15 min read

DataFunSummit

Oct 17, 2024 · Big Data

Waggle Dance Based Metadata Solution at Tongcheng Travel: Architecture, Migration Strategies, and Future Outlook

This article presents Tongcheng Travel's metadata solution built on the open‑source Waggle Dance project, detailing the three‑layer architecture, challenges of a monolithic Hive Metastore, evaluated migration plans, federation implementation, migration workflow, and future directions for unified metadata governance.

Data Migration · Federation · Hive Metastore

0 likes · 11 min read

DataFunSummit

Sep 12, 2024 · Cloud Native

Design and Implementation of a Next‑Generation Multi‑Protocol Unstructured Storage System for Machine Learning

This article presents the challenges of storing massive machine‑learning datasets, evaluates existing storage solutions, and details the design of OrangeFS—a cloud‑native, multi‑protocol, multi‑tenant unstructured storage system that integrates object and file interfaces, optimizes metadata services, supports hot upgrades, and provides robust scalability and reliability for AI workloads.

Cloud Native · Metadata · Multi-Protocol

0 likes · 24 min read

G7 EasyFlow Tech Circle

Sep 9, 2024 · Product Management

What Can Logistics Software Learn from Oracle EBS and OFSA Design?

This article examines how the flexible, metadata‑driven architecture and configurable features of Oracle E‑Business Suite and OFSA can inspire more adaptable, modular, and user‑centric logistics software, covering design principles, extensibility, integration, and data‑model strategies.

Metadata · OFSA · Oracle EBS

0 likes · 25 min read

AI Large Model Application Practice

Aug 29, 2024 · Artificial Intelligence

8 Essential Indexing Strategies to Boost Enterprise RAG Performance

This article presents eight practical optimization recommendations for the indexing stage of enterprise‑level Retrieval‑Augmented Generation (RAG) applications, covering chunk creation, abbreviation handling, multimodal document processing, semantic enrichment, metadata usage, alternative index types, and embedding model selection.

Chunking · Indexing · Metadata

0 likes · 15 min read

DataFunSummit

Aug 28, 2024 · Big Data

Building Data Lineage Foundations and Applications for E‑commerce Scenarios

This article explains how to construct a full‑link data lineage platform for e‑commerce, detailing its architecture, quality metrics, and practical uses such as table migration, field‑level tracing, and automated metric decomposition to improve data governance and efficiency.

Data Governance · Metadata · data lineage

0 likes · 14 min read

DataFunSummit

Jun 19, 2024 · Big Data

Apache Hudi from Zero to One: Introduction to Hudi’s Storage Format (Part 1)

This article introduces Apache Hudi’s storage format, explaining the table layout, metadata and data file organization, the naming conventions of timeline actions, and the trade‑offs between Copy‑on‑Write and Merge‑on‑Read table types for transactional data lakes.

Apache Hudi · Big Data · Data Lake

0 likes · 8 min read

OPPO Kernel Craftsman

Jun 14, 2024 · Fundamentals

In-depth Analysis of F2FS Filesystem Superblock, SIT, NAT, and SSA Structures

The article provides a detailed code-level walkthrough of F2FS’s superblock, Segment Information Table, Node Address Table, Summary Area, and Main Area structures, explaining each field, entry layout, and their roles in managing segments, inodes, and data within the filesystem.

F2FS · Filesystem · Metadata

0 likes · 9 min read

Bilibili Tech

Apr 26, 2024 · Big Data

Fine-Grained Lock Optimization for HDFS NameNode to Improve Metadata Read/Write Performance

To overcome the NameNode write bottleneck caused by a single global read/write lock in Bilibili’s massive HDFS deployment, the team introduced hierarchical fine‑grained locking, splitting the lock into Namespace, BlockPool, and per‑INode levels. The change yielded up to three‑fold write throughput gains and a 90% drop in RPC queue time, and shifted the performance limit from lock contention to log synchronization.

Big Data · HDFS · Metadata

0 likes · 15 min read

DataFunSummit

Mar 18, 2024 · Big Data

Scenario‑Based Data Governance Practices in the Securities Industry

This article presents a comprehensive, scenario-driven data governance practice at Guoxin Securities, covering the industry's pain points, a three‑layer governance framework, detailed implementations for data standards, metadata, data quality, data modeling, and data security, and outlines future directions for intelligent and measurable governance.

Big Data · Data Quality · Data Security

0 likes · 30 min read

DataFunTalk

Mar 17, 2024 · Databases

MatrixOne Storage Format Design Overview

This article provides a comprehensive overview of MatrixOne's hyper‑converged cloud‑native database architecture, detailing its three‑layer design, data execution flow, columnar storage format, metadata hierarchy, performance optimizations, compatibility mechanisms, and practical usage scenarios.

MatrixOne · Metadata · Storage Engine

0 likes · 12 min read

政采云技术

Jan 23, 2024 · Big Data

Design and Implementation of a Big Data Permission Management System

This article outlines the background, importance, scenarios, challenges, objectives, and architectural design—including RBAC and ABAC models, metadata integration, data classification, and verification mechanisms—of a comprehensive big data permission management system for secure and fine‑grained data access.

ABAC · Access Control · Big Data

0 likes · 14 min read

DataFunTalk

Jan 8, 2024 · Big Data

Didi's Big Data Cost Governance Practices and Framework

This article presents Didi's comprehensive big data cost governance approach, detailing the overall framework, data system architecture, asset management platform, Hadoop and Elasticsearch cost‑control practices, metadata‑driven optimization, and organizational insights for effective resource and budget management.

Metadata · cost governance · resource optimization

0 likes · 19 min read

Architect

Dec 31, 2023 · Industry Insights

How Mooncake Automated API Documentation and Built a Metadata Hub

The article details how the Mooncake platform tackled outdated, manually‑maintained API docs by introducing naming conventions, a one‑click IntelliJ plugin, GitLab MR auto‑parsing, and a metadata center that supports debugging, mocking, and downstream consumption, saving developers hundreds of hours per release.

API documentation · Automation · GitLab CI

0 likes · 18 min read

Ctrip Technology

Nov 23, 2023 · Big Data

Optimizing Data Warehouse Timeliness Using Metadata Lineage

This article presents a metadata‑driven approach to improve data warehouse timeliness by extracting upstream lineage, identifying over‑layered, duplicate, and critical‑path tasks, and applying targeted scheduling and code‑level optimizations, demonstrated with a hotel order wide‑table case study.

DAG · Data Warehouse · Lineage

0 likes · 7 min read

php Courses

Nov 21, 2023 · Backend Development

Using PHP 8 Attributes to Manage Code Metadata

This article explains PHP 8's new Attributes feature: what attributes are and how to attach them to classes and methods, with examples like #[Table] and #[Route]. It also demonstrates retrieving attribute values via reflection to enable flexible metadata management.

Attributes · Metadata · PHP8

0 likes · 4 min read

DataFunSummit

Nov 10, 2023 · Operations

Data Model Governance Practices at Taobao (Tao Tian Group)

This article presents a comprehensive overview of Taobao's data model governance, covering background challenges, a four‑pillar solution framework, detailed practices such as invalid table decommissioning, source‑table consolidation, data handover, public‑layer operations, incremental control, productization, and future planning to improve efficiency, cost, and quality of large‑scale data models.

Data Governance · Metadata · model governance

0 likes · 26 min read

Huya Tech Engineering

Nov 10, 2023 · Operations

How a Unified Metadata Platform Boosts SRE Efficiency and Cuts Costs

This article describes how Huya built a unified metadata platform to break data silos across its SRE systems, enabling standardized data ingestion, correlation, and analysis that improve resource governance, root‑cause diagnosis, and overall cost‑efficiency for large‑scale live streaming services.

Metadata · Observability · SRE

0 likes · 13 min read

Data Thinking Notes

Oct 29, 2023 · Big Data

How Banks Can Master Data Governance: 9 Core Domains Explained

This article outlines why banks need robust data governance, describes nine essential domains—including data models, metadata, standards, quality, lifecycle, distribution, exchange, security and services—and explains how big‑data techniques can drive innovation, risk control, and refined decision‑making in banking.

Metadata

0 likes · 17 min read

php Courses

Oct 23, 2023 · Backend Development

Using PHP 8 Attributes to Manage Code Metadata

This article explains PHP 8’s new Attributes feature: what attributes are and how to attach custom attributes such as #[Table] and #[Route] to classes and methods. It also demonstrates retrieving attribute values via reflection, with clear code examples for backend developers.

Attributes · Metadata · PHP

0 likes · 5 min read

JD Cloud Developers

Oct 19, 2023 · Frontend Development

How Dynamic Forms Transform Custom Business Workflows

This article explains what dynamic forms are, why they are needed for tenant‑specific business scenarios, and outlines a three‑step metadata‑driven implementation—including data partitioning, metadata design, and front‑end rendering—while also discussing their limitations.

Dynamic Forms · Metadata · backend data partitioning

0 likes · 5 min read

dbaplus Community

Sep 6, 2023 · Backend Development

How to Scale a Schema‑Free Classification Platform to 100 Billion Records

This article explains how to design a classification‑information system that handles 100 billion rows, ten thousand dynamic attributes, and hundreds of thousands of QPS by using vertical partitioning, unified metadata services, and an external search layer for scalable storage and retrieval.

Databases · Metadata · backend

0 likes · 12 min read

DeWu Technology

Sep 4, 2023 · Backend Development

How Mooncake Automates API Docs, Builds a Metadata Hub, and Boosts Development Efficiency

This article examines the challenges of manual API documentation, introduces Mooncake’s standardized organization, the MooncakeUpload IntelliJ plugin for one‑click doc generation, GitLab MR auto‑parsing for continuous updates, and the API metadata center that enhances debugging, mocking, and cross‑team collaboration.

API documentation · Automation · Backend Tools

0 likes · 19 min read

Ximalaya Technology Team

Aug 17, 2023 · R&D Management

FeiKu: Ximalaya's Low‑Code Platform for Rapid Business Application Development

FeiKu, Ximalaya’s low‑code platform, lets business users design and publish full‑stack applications through drag‑and‑drop configuration, with built‑in permission, workflow, scripting, and API integration. It has already produced nearly 300 internal apps and dramatically cut repetitive development, while its performance and openness are still being improved.

FeiKu · Metadata · Platform

0 likes · 9 min read

DataFunTalk

Jul 26, 2023 · Big Data

Data Model Governance Practices at Taobao (Alibaba)

This article presents a comprehensive case study of Taobao's data model governance, detailing the background challenges, the four‑pillar solution framework, specific governance practices such as invalid table decommissioning, data handover, public layer operations, incremental control, productization, future plans, and a Q&A session.

Alibaba · DataWorks · Metadata

0 likes · 26 min read

Architects Research Society

Jul 21, 2023 · Big Data

Understanding Data Fabric Architecture: Key Pillars for Modern Data Management and Integration

The article explains what Data Fabric (also called data weaving) is, outlines its four essential pillars—metadata collection, active metadata, knowledge‑graph management, and a robust integration backbone—and shows how D&A leaders can adopt this design to achieve agile, AI‑enabled data integration across hybrid and multi‑cloud environments.

AI/ML · Data Management · Metadata

0 likes · 10 min read

Architects' Tech Alliance

Jun 13, 2023 · Fundamentals

HadaFS: A Scalable Burst Buffer File System for Exascale Supercomputers

The article introduces HadaFS, a novel burst‑buffer file system that combines the scalability and performance of local burst buffers with the data‑sharing and cost advantages of shared buffers, details its LTA architecture, metadata handling, and evaluates its superior performance on the SNS supercomputer against BeeGFS and traditional GFS solutions.

Burst Buffer · File System · HPC

0 likes · 16 min read

DataFunTalk

Jun 9, 2023 · Big Data

Cloud Music Data Governance Practice

This article presents a comprehensive case study of NetEase Cloud Music's data governance practice, covering data background, governance philosophy, detailed solutions across metadata, storage, compute, and model design, practical implementations, measurable cost savings, and future planning for sustainable data management.

Hadoop · Metadata · Spark

0 likes · 15 min read

DataFunSummit

Jun 4, 2023 · Fundamentals

The Role of Metadata in Data Governance and Its Applications

Metadata serves as a foundational element of data governance, enabling analysis, monitoring, discovery, and understanding of data assets, while applications such as data lineage, impact analysis, and data mapping help organizations assess quality, trace origins, and optimize processing workflows.

Big Data · Information Management · Metadata

0 likes · 5 min read

IT Services Circle

May 30, 2023 · Fundamentals

Understanding Java Annotations: Concepts, Uses, and Implementation

This article explains Java annotations as metadata introduced in Java 5, covering their definition, built‑in and custom forms, purposes such as providing metadata, compile‑time checks, code generation and runtime processing, and demonstrates how to define, apply, and process them with code examples.

Java · Metadata · Reflection

0 likes · 6 min read

ITPUB

May 10, 2023 · Cloud Native

How Meituan’s MStore Achieves Scalable Storage‑Compute Separation in Cloud‑Native Environments

This article explains how Meituan’s storage team designed the MStore distributed storage platform to separate storage and compute, addressing scaling, cost, and reliability challenges of monolithic architectures, and details its cloud‑native components, data model, performance optimizations, observability, and the derived EBS block‑storage service.

MStore · Metadata · cloud-native

0 likes · 16 min read

DataFunTalk

Apr 21, 2023 · Fundamentals

Data Architecture and Data Modeling Overview, Solutions, and Enterprise Case Studies

This article explains data architecture and data modeling fundamentals, presents DAMA DMBOK concepts, outlines four practical solutions for model design, standard management, automated change control, and business mapping, and shares an enterprise manufacturing case study with Q&A on governance and efficiency.

Data Architecture · Enterprise Data · Metadata

0 likes · 21 min read

ITPUB

Apr 9, 2023 · Artificial Intelligence

How ChatGPT Redefines Knowledge Acquisition: Six Practical Insights

The author shares a personal journey of using ChatGPT as a knowledge engine, illustrating six key benefits—answering complex questions, applying Occam's razor, simplifying concepts for beginners, enabling generative learning, fostering T‑shaped expertise, and mastering effective prompting—through concrete examples ranging from metadata explanations to Docker deployment steps.

AI · ChatGPT · Docker

0 likes · 23 min read

DataFunSummit

Apr 3, 2023 · Big Data

Evolution and Architecture of Data Lineage in Volcano Engine DataLeap

This article outlines the background, development stages, architectural evolution, key features such as incremental updates and quality metrics, and future directions of the data lineage capability within Volcano Engine's DataLeap big‑data governance platform.

Big Data · DataLeap · Metadata

0 likes · 18 min read

DataFunTalk

Mar 15, 2023 · Big Data

Evolution of Next‑Generation Cloud Data Platform Architecture

This technical presentation reviews the historical development of big data platforms, outlines the four generations of cloud data platform architectures, details the modern cloud‑native stack—including unified metadata, scheduling, and integration systems—and showcases a real‑world industrial manufacturing case with a Q&A session.

Cloud Data Platform · Data Architecture · Metadata

0 likes · 23 min read

Big Data Technology & Architecture

Mar 14, 2023 · Big Data

Comprehensive Guide to Data Lineage: Model Design, Optimization, and Use Cases at ByteDance

This article presents an in‑depth overview of data lineage at ByteDance, detailing the design of storage, display, abstraction, implementation, and storage layers, optimization techniques for real‑time updates and queries, open export methods, practical use cases across asset, development, governance, and security domains, and future directions.

Apache Atlas · JanusGraph · Metadata

0 likes · 20 min read

DataFunTalk

Mar 4, 2023 · Big Data

Understanding Data Governance: Challenges, Framework, and Practical Implementation

This article explains NetEase DataFang's comprehensive view on data governance, detailing the problems it solves, the structure of a governance system, and concrete steps for implementing integrated data development and governance across enterprises.

Data Management · Data Quality · Metadata

0 likes · 11 min read

Alibaba Cloud Developer

Feb 14, 2023 · Big Data

Data Fabric vs Data Mesh: Choosing the Right Architecture for Modern Big Data Platforms

This article examines the inherent complexity of building big‑data platforms, compares the emerging concepts of Data Fabric and Data Mesh, outlines their architectural features, technology stacks, and practical implementation challenges, and offers guidance on when each approach is appropriate.

Big Data Architecture · Data Fabric · Data Governance

0 likes · 31 min read

DataFunSummit

Feb 11, 2023 · Big Data

Intelligent Metadata Governance for Power Data: Background, Solution, Value and Case Studies

This article presents a comprehensive overview of the intelligent metadata‑driven data governance framework implemented by Southern Power Grid Yunnan, detailing its background, challenges, architectural design, key AI‑enabled technologies, practical case studies, and the resulting business value for the power industry.

AI · Data Quality · Metadata

0 likes · 14 min read

Baidu Intelligent Cloud Tech Hub

Feb 8, 2023 · Cloud Computing

How Baidu’s Next‑Gen Metadata Engine Powers Trillion‑Object Object Storage

This article details Baidu's Cloud Storage (BOS) architecture, the challenges of its legacy metadata system, and the design of a new generation metadata engine that enables trillion‑object buckets, million‑QPS performance, hierarchical namespaces, and intelligent lifecycle management.

Baidu · Metadata · cloud storage

0 likes · 14 min read

DataFunTalk

Jan 19, 2023 · Big Data

Data Governance Strategies: Concepts, Practices, and Case Studies

The article explains the importance of data governance for organizations handling big data, outlines narrow and broad governance approaches, presents strategic design principles, and shares practical case studies from leading companies, while also offering a downloadable ebook of governance strategies.

Case Studies · Data Management · Data Security

0 likes · 7 min read

DataFunTalk

Jan 13, 2023 · Big Data

Data Governance Strategies and Practices: Insights from Leading Companies

The article explains the importance of data governance for organizations handling big data, distinguishes narrow and broad governance approaches, outlines strategic principles, and presents case studies from companies like Tencent, SF Tech, Huolala, and NetEase to illustrate effective governance practices.

Case Study · Data Quality · Enterprise Data

0 likes · 8 min read

DataFunTalk

Dec 5, 2022 · Big Data

Data Governance Practices at ZTO Express: Challenges, Solutions, and Future Plans

The article details ZTO Express's data governance journey, covering company background, drivers and goals, challenges such as data asset inventory, standardization, quality, and modeling, and outlines their multi‑layered governance framework, practical implementations in data quality, model and metadata, and future plans.

Data Platform · Metadata · logistics

0 likes · 17 min read

DeWu Technology

Nov 30, 2022 · Big Data

Fundamentals and Implementation of Data Lineage in Big Data Environments

Data lineage in big‑data environments tracks how data moves and transforms—from source tables through SQL processing to final storage—enabling management tasks such as domain segmentation, performance tuning, anomaly detection, and dependency verification, with implementations ranging from simple regex extraction to robust AST parsing and optimization, as used by tools like Alibaba DataWorks and Apache Atlas.

AST · Big Data · Hive

0 likes · 7 min read

DataFunSummit

Nov 24, 2022 · Big Data

Metadata Management and Governance Practices at Wing Payment: Architecture, Techniques, and Future Outlook

This article explains how Wing Payment uses metadata as the foundation of its data‑governance practice, describing the challenges of data quality, efficiency, cost and security, the four‑step governance framework, the design of its metadata platform, and future directions such as multi‑source management and intelligent recommendation.

Data Security · Master Data · Metadata

0 likes · 18 min read

Big Data Technology & Architecture

Nov 22, 2022 · Big Data

Comprehensive Guide to Metadata Management, Data Quality, and Optimization in Big Data Systems

This article provides an in-depth overview of metadata concepts, their technical and business classifications, value in data management, applications such as data profiling and lineage, optimization techniques for compute and storage, lifecycle management, and comprehensive data quality assurance practices within large‑scale big data environments.

Metadata · Optimization · big-data

0 likes · 38 min read

Data Thinking Notes

Nov 16, 2022 · Big Data

Why Metadata Management Is Essential for Data Warehouses

This article explains the concept of metadata, its role in data warehouses, why managing metadata is critical for building, maintaining, and scaling data warehouse systems, and outlines practical steps, use cases, and tools for effective metadata management.

Data Governance · Data Warehouse · ETL

0 likes · 15 min read

Past Memory Big Data

Nov 15, 2022 · Big Data

How Uber Accelerated Presto Queries with Alluxio Local Cache

Uber processes over 500,000 daily Presto queries across 20 clusters handling more than 50 PB of data, and by deploying Alluxio Local Cache on NVMe disks they raised cache‑hit rates from roughly 65% to over 90% while addressing real‑time partition updates, node churn, and cache‑size constraints.

Alluxio · Big Data · Consistent Hashing

0 likes · 15 min read

DataFunSummit

Nov 11, 2022 · Big Data

Tencent Oula Data Governance Platform: Architecture, Practices, and Solutions

The article presents an in‑depth overview of Tencent's Oula data governance platform, describing its construction goals, core capabilities, DataOps‑driven development workflow, unified metric store, data map services, and practical Q&A on asset health scoring and data lineage, illustrating a comprehensive end‑to‑end big‑data governance solution.

DataOps · Metadata · Tencent

0 likes · 17 min read

DataFunTalk

Nov 2, 2022 · Big Data

Tencent Oula Data Governance Platform: Architecture, Practices, and Solutions

Tencent's Oula platform, launched in 2019, provides a DataOps‑driven, end‑to‑end data governance solution covering data discovery, asset factory, metric platform, and governance engine, and the talk details its construction goals, data development governance, unified metric system, data map, and Q&A on asset health and lineage.

Data Platform · DataOps · Metadata

0 likes · 17 min read

Huolala Tech

Oct 27, 2022 · Big Data

Turning Big Data into Valuable Assets: The Business Case for Data Governance

Amid the explosive growth of big data, this article explains why systematic data governance—covering metadata, quality, lifecycle, and security—is essential for turning raw data into measurable business assets, reducing costs, and enhancing operational efficiency.

Big Data · Data Governance · Data Lifecycle

0 likes · 11 min read

Kuaishou Big Data

Oct 20, 2022 · Big Data

How Kuaishou Scaled Metadata Management for Big Data: Architecture & Lessons

This article outlines Kuaishou's evolution of metadata management from its early Hive‑centric stage to a unified 2.0 platform, detailing system architecture, key technologies, challenges, and future 3.0 vision for low‑code, automated, and intelligent data governance.

Big Data · Data Governance · Metadata

0 likes · 15 min read

ITPUB

Oct 20, 2022 · Big Data

Will HDFS Be Replaced? Analyzing Its Drawbacks and Future Alternatives

The article examines why the Hadoop Distributed File System (HDFS) may become obsolete by detailing its three main shortcomings (deployment complexity, metadata memory limits, and high replication overhead) and explores how newer architectures and erasure coding could address these issues.

Big Data · Distributed File System · HDFS

0 likes · 8 min read

Past Memory Big Data

Oct 9, 2022 · Operations

How Cloud Music Scaled Data Governance: Practices, Metrics, and Lessons Learned

The article details Cloud Music’s data‑governance journey, covering early modeling standards, self‑service data tools, quality and metadata management, asset‑reuse improvements, and cost‑saving Spark optimizations, while sharing concrete metrics, processes, and the team’s systematic methodology.

Data Governance · Data Warehouse · Metadata

0 likes · 18 min read

IT Services Circle

Oct 5, 2022 · Databases

Debugging a NullPointerException During ShardingSphere Startup Caused by TiDB View Metadata

The article details a NullPointerException that occurs when ShardingSphere starts, explains how null values in TiDB‑generated view metadata trigger the error, describes the step‑by‑step investigation and reproduction using TiUP, and discusses several possible fixes and a call for community contributions.

Java · Metadata · NullPointerException

0 likes · 4 min read
