Tagged articles
82 articles
DataFunSummit
May 20, 2026 · Big Data

How Kuaishou’s Real‑Time Data Lake Boosts AI and BI Architecture

The article explains how Kuaishou partnered with Apache Hudi to overhaul its ODS‑based data lake, addressing latency, storage cost, and complexity for AI and BI workloads, detailing the evolution from mysql‑to‑hive to mysql‑to‑hudi 1.0 and 2.0, the resulting performance gains, cost savings, and future roadmap.

AI · BI · Big Data
0 likes · 20 min read
DataFunSummit
Feb 8, 2026 · Big Data

Kuaishou’s Data Lake Upgrade with Hudi: Solving AI & BI Challenges

The article explains how Kuaishou modernized its data lake by partnering with Apache Hudi to address latency, storage cost, and consistency issues in both AI and BI pipelines, detailing architectural changes, new ingestion tools, partitioning strategies, compaction mechanisms, performance gains and future plans.

AI · BI · Big Data
0 likes · 20 min read
dbaplus Community
Apr 20, 2025 · Databases

Why Wide Tables Fail and How to Design Them Efficiently

This article explains what wide tables are, why they are controversial, outlines three common design pitfalls with practical avoidance tips, and introduces three key technologies—ClickHouse, Cassandra, and Hudi/Iceberg—to help engineers build performant, maintainable wide‑table solutions in data warehouses.

Big Data · ClickHouse · Database design
0 likes · 7 min read
Big Data Technology & Architecture
Dec 12, 2024 · Big Data

Understanding Time Travel and Snapshot Retention in Lake Frameworks (Hudi & Paimon)

This article explains how lake frameworks like Hudi and Paimon implement Time Travel by recording older data versions, the snapshot retention policies that limit historical data access, and practical recommendations for managing snapshots and consumption patterns to reduce storage costs in large‑scale data warehouses.

Big Data · Hudi · Paimon
0 likes · 7 min read
DataFunSummit
Nov 12, 2024 · Big Data

Data Lake and Data Warehouse Architectures: Expert Insights from Industry Leaders

The article summarizes a roundtable discussion where experts compare four lake‑warehouse architectural patterns, explain their suitability for different business scenarios, contrast them with traditional data warehouses, and highlight practical considerations for choosing and evolving data platforms.

Hudi · Iceberg · Lakehouse Architecture
0 likes · 6 min read
DataFunSummit
Nov 8, 2024 · Big Data

Roundtable Discussion on Data Lake Technology Maturity and Governance Practices

Experts from Kuaishou, former Tencent, Ping An Insurance and others discuss data lake maturity, column‑level governance, resource management of unstructured data, and automated optimization techniques such as Iceberg small‑file merging, highlighting how these advances improve data quality and business decision‑making.

Big Data · Column-level Governance · Data Lake
0 likes · 6 min read
DataFunTalk
Oct 3, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture, Design Principles, Core Functions, and Open‑Source Solutions

Amid growing data demands, this article explains the data lake technology maturity curve, detailing lake‑warehouse architectural patterns, design principles, core functionalities, and the four leading open‑source solutions (Hudi, Iceberg, Delta Lake, Paimon) to guide enterprises in building flexible, scalable, and governed data platforms.

Big Data · Data Architecture · Data Lake
0 likes · 10 min read
DataFunTalk
Sep 24, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications

This article explains the rapid growth of data-driven businesses, the challenges of traditional data warehouses, and how modern data lake technologies such as Delta Lake, Hudi, Iceberg, and Paimon form a maturity curve that guides enterprises in architecture choices, design principles, core capabilities, and practical applications.

Big Data · Data Lake · Delta Lake
0 likes · 12 min read

How Hudi MetaServer Transforms Metadata Management and Performance in Data Lakes

This article examines the challenges of Hudi metadata stored on HDFS, introduces the independently developed Hudi MetaServer for centralized metadata, visual management, unified permission control, TTL, expression payloads, and multi‑active scaling, and outlines future enhancements such as LLS, multi‑table fusion, and JDBC support.

Big Data · Data Lake · Hudi
0 likes · 11 min read
DataFunSummit
Jul 1, 2024 · Big Data

Optimizing JD Retail Data Architecture: From Lambda to Real‑time Unified Processing with Flink, Hudi, and StarRocks

This article details JD Retail's transition from a complex Lambda architecture to a unified real‑time data pipeline using Flink, Hudi, and StarRocks, addressing data completeness versus latency, reducing maintenance costs, improving storage efficiency, and delivering faster, more consistent analytics for business users.

Data Warehouse · Flink · Hudi
0 likes · 13 min read
Big Data Technology & Architecture
Jul 1, 2024 · Big Data

Applying Data Lake (Hudi) at Kuaishou: Architecture Evolution, Use Cases, and Practice

This article details Kuaishou's journey of adopting the Hudi data lake, covering business challenges, migration from Hive to Hudi, architectural redesign, promotion strategy, real‑world use cases such as CDC sync and batch‑stream integration, and key lessons learned for large‑scale data engineering.

Big Data Architecture · Data Warehouse · Hudi
0 likes · 11 min read
DataFunSummit
May 17, 2024 · Big Data

Comprehensive Hudi Real-Time Data Lake Ingestion Solutions

This article presents a complete guide to Hudi-based real-time data lake ingestion, covering overall data integration architecture, batch and streaming ingestion strategies, advanced table design, and practical recommendations for handling challenges such as deduplication, latency, partitioning, and performance optimization.

Batch Processing · Big Data · Data Lake
0 likes · 12 min read
DataFunSummit
Mar 25, 2024 · Big Data

Exploring Real-Time Data Lake Practices at Kangaroo Cloud

This article shares Kangaroo Cloud's exploration and practice of a real-time data lake, covering background, data lake concepts, challenges, solution architecture using the Shuzhan platform with Iceberg/Hudi, CDC ingestion, small file handling, cross-cluster ingestion, materialized view acceleration, and future development plans.

CDC · Cross-Cluster Ingestion · Hudi
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Feb 20, 2024 · Big Data

Feishu ShenNuo's Real-Time Data Warehouse with Flink, Hudi, and Hologres

Feishu ShenNuo redesigned its data architecture by integrating Flink, Hudi, and Hologres to create a cloud‑native real‑time data warehouse that supports both millisecond‑level ad monitoring and minute‑level game operations, offering scalable storage, low‑latency queries, and comprehensive monitoring and capacity planning.

Flink · Hologres · Hudi
0 likes · 16 min read
DataFunTalk
Oct 28, 2023 · Big Data

Data Lake Architecture, Ingestion Options, Real-time Optimization, and Query Practices

This article presents a comprehensive overview of a unified data lake architecture, evaluates three ingestion solutions, details real‑time ingestion optimizations for Flink‑Hudi pipelines, and describes how Kyuubi enables unified query access across multiple engines, offering practical guidance for large‑scale data processing.

Big Data · Data Lake · Flink
0 likes · 14 min read
Data Thinking Notes
Oct 11, 2023 · Big Data

How ByteDance Optimized Its E‑Commerce Data Lake to Cut Costs and Boost Real‑Time Accuracy

ByteDance revamped its traditional Lambda architecture for e‑commerce traffic data by introducing a new lake ingestion solution that reduces development and operational costs, ensures timely and stable data, and outlines future plans covering business background, ODS lake design, archiving tags, delayed data handling, and real‑time stability.

Big Data · Data Lake · Flink
0 likes · 7 min read
Big Data Technology & Architecture
Sep 18, 2023 · Big Data

Unified Real‑Time and Batch Data Warehouse Architecture with Hudi Lakehouse

The article explains the mainstream Lambda data‑warehouse architecture, its benefits and challenges, then introduces Hudi as a lake‑house solution that unifies real‑time and offline storage, describes the multi‑layer service design, and showcases three practical scenarios—stream processing, real‑time multidimensional analysis, and stream‑batch data reuse—demonstrating how the integrated architecture improves latency, cost, and operational complexity.

Batch Processing · Data Warehouse · Hudi
0 likes · 13 min read
DataFunTalk
Sep 4, 2023 · Big Data

Unified Batch‑Stream Storage with Hudi and LAS: Architecture, Design, and Deployment

This article presents a comprehensive overview of a batch‑stream unified storage solution built on Hudi and the Lakehouse Analysis Service (LAS), covering background challenges, architectural design, data organization, read/write mechanisms, BTS architecture, real‑world deployment scenarios, and future development plans.

Batch-Stream · Data Warehouse · Hudi
0 likes · 22 min read
DataFunTalk
Aug 28, 2023 · Big Data

Practical Experience of an E‑commerce Platform’s Offline and Real‑time Data Warehouse

This article shares the practical architecture, technology selection, implementation details, and evolution of an e‑commerce platform’s offline and real‑time data warehouses, covering data modeling, processing pipelines, system components such as Hive, Spark, Flink, ClickHouse, Doris, and Hudi, and the lessons learned from multiple production deployments.

Big Data · ClickHouse · Data Warehouse
0 likes · 18 min read
Data Thinking Notes
Aug 27, 2023 · Big Data

How ByteDance’s LAS Team Unified Real‑Time and Offline Warehousing with a Lakehouse Solution

This article analyzes the shortcomings of mainstream Lambda‑style data warehouse architectures, introduces Hudi‑based lakehouse design principles, details the three‑layer unified storage architecture, data distribution, model and read/write mechanisms, and showcases real‑time streaming, multidimensional analysis, and stream‑batch reuse scenarios along with future roadmap plans.

Hudi · Lakehouse · Streaming
0 likes · 14 min read
Big Data Technology & Architecture
Aug 21, 2023 · Big Data

Key Features and Benefits of Lakehouse Frameworks Hudi, Iceberg, and Paimon

This note outlines how Hudi, Iceberg, and Paimon provide unified batch‑stream storage, UPSERT support, time‑travel capabilities, and lower development costs, enabling a streaming‑warehouse architecture that offers near‑real‑time latency, consistent semantics, persisted intermediate results, and easier historical data repair.

Batch Processing · Hudi · Iceberg
0 likes · 5 min read

How Lakehouse Architecture is Transforming Hadoop: A Deep Dive into Hudi, Iceberg, and Delta Lake

This article analyzes the rise of lake‑house architecture in the Hadoop ecosystem, compares the technical capabilities of Hudi, Iceberg and Delta Lake, details implementation enhancements such as MOR and multi‑writer support, showcases Flink integration, presents a real‑time marketing use case, and outlines future development directions.

Big Data · Data Governance · Delta Lake
0 likes · 14 min read
DataFunTalk
Jul 10, 2023 · Big Data

Practical Experience of In‑Lake Warehouse Implementation Based on Lakehouse Architecture

This article presents a comprehensive overview of Lakehouse‑based in‑lake warehousing, covering common data‑lake misconceptions, the evolution from databases to data warehouses and lakes, the advantages of Lakehouse over traditional architectures, a reference multi‑layer architecture, typical use cases, challenges, future plans, and a brief Q&A.

Big Data Architecture · Data Lake · Data Warehouse
0 likes · 20 min read
DataFunTalk
Apr 10, 2023 · Big Data

Interview on Data Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase data‑lake technology manager Ma Jin explains the distinction between data lakes and lakehouses, reviews the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, evaluates feature maturity and performance trade‑offs, and discusses systematic versus non‑systematic adoption in enterprises.

Big Data · Data Lakehouse · Delta Lake
0 likes · 13 min read
Big Data Technology & Architecture
Feb 6, 2023 · Big Data

Real-Time Data Warehouse Solutions with Hudi: Scenarios, Challenges, and Optimizations

This article presents an in‑depth overview of real‑time data‑warehouse scenarios, discusses challenges such as timeliness, update efficiency, and resource consumption, and details practical solutions using Apache Hudi, Flink, Presto, and related optimizations for ingestion, indexing, compaction, and query performance.

Big Data · Data Lake · Flink
0 likes · 17 min read
Liulishuo Tech Team
Nov 17, 2022 · Big Data

Real‑time Data Warehouse Architecture and Technical Solution at Liulishuo

This article describes Liulishuo's migration to a Flink‑based real‑time data warehouse, covering background, benefits, technology selection (storage, Flink platform, dimension table connectors), overall architecture, concrete Hudi and Elasticsearch ingestion examples, processing SQL, and future outlook for unified batch‑streaming storage.

Elasticsearch · Flink · Hudi
0 likes · 15 min read
DataFunSummit
Nov 4, 2022 · Big Data

Real-Time Data Lake Practice at ByteDance: Architecture, Challenges, and Solutions

ByteDance’s data platform team explains their real‑time data lake implementation, covering its evolving definition, six core capabilities, challenges such as data management, concurrent updates, performance and log ingestion, and detailed case studies of multi‑stage deployment, indexing, metadata services, and future roadmap.

Hudi · Real-time Data Lake · Streaming
0 likes · 32 min read
StarRocks
Nov 4, 2022 · Big Data

Building a High‑Performance, Cost‑Effective Cloud Lakehouse with StarRocks and EMR

This article explains how to design and implement a cloud‑native Lakehouse using StarRocks and Tencent Cloud EMR, covering core technical requirements, a five‑layer architecture, data ingestion with Iceberg/Hudi, performance tricks like Z‑order clustering, cost‑control through elastic scaling, and the key product features of EMR StarRocks.

Big Data · EMR · Hudi
0 likes · 24 min read

How a Leading E‑commerce Platform Built a Scalable Data Warehouse with Lambda & Hudi

This article explains how an e‑commerce company designed and implemented a modern data warehouse—combining batch Spark jobs, real‑time Flink streams, and Hudi data‑lake storage—to handle terabytes of daily logs, ensure data quality, and provide fast, reliable analytics for business decision‑making.

Data Lake · Data Warehouse · ETL
0 likes · 16 min read
DataFunTalk
Oct 4, 2022 · Big Data

Near‑Real‑Time Data Lake Practices in TikTok E‑commerce Data Warehouse

The presentation by TikTok e‑commerce data‑warehouse engineer Ma Wenyuan explains data‑lake characteristics, near‑real‑time architecture, and practical e‑commerce use cases, highlighting Apache Hudi features, hybrid batch‑stream processing, and future challenges for scaling and integration.

Data Lake · Hudi · Streaming
0 likes · 13 min read
ITPUB
Sep 24, 2022 · Big Data

How ByteDance Scales Real‑Time Data Warehouses with Hudi and Flink

This article details ByteDance's practical experience building real‑time data warehouses on a data lake using Hudi, Flink, and related optimizations, covering scenario analysis, architecture, performance challenges, and future roadmap for scalable, low‑latency analytics.

Flink · Hudi
0 likes · 19 min read
DataFunTalk
Sep 17, 2022 · Big Data

Real-Time Data Warehouse Practices with Hudi at ByteDance

This presentation details ByteDance's real‑time data‑warehouse implementations using Apache Hudi, covering scenario classifications, challenges of traditional offline warehouses, practical solutions for ingestion, upsert, validation, indexing, query optimization, and future plans for extensible indexing and unified batch‑stream processing.

Data Lake · Hudi · Streaming
0 likes · 16 min read
DataFunSummit
Sep 7, 2022 · Big Data

Integrating Apache Doris with Hudi: Architecture, Design, and Implementation

This article explains the background, architecture, design choices, and step‑by‑step implementation for enabling Apache Doris to query Hudi data lake tables, covering Doris features, Hudi formats, Lambda/Kappa architectures, solution alternatives, and future roadmap for real‑time analytics.

Apache Doris · Big Data · Data Lake
0 likes · 10 min read
DataFunTalk
Aug 10, 2022 · Big Data

Delta Lake 2.0, Iceberg, Hudi: A Comparative Study and the Arctic Lakehouse Service

The article reviews recent developments in data‑lake table formats—Delta Lake 2.0, Iceberg, and Hudi—examining their features, benchmark results, and ecosystem impact, and then introduces Arctic, an open‑source streaming lakehouse service built on Iceberg that aims to bridge batch‑stream gaps for enterprises.

Benchmark · Data Lake · Delta Lake
0 likes · 24 min read
ITPUB
Jul 24, 2022 · Databases

How Apache Doris Enables Real‑Time Queries on Hudi Data Lakes

This article explains Apache Doris’s architecture, introduces the Hudi data‑lake format, compares Lambda and Kappa approaches, and details the design and implementation of Doris’s Hudi external table support, including practical steps, code examples, and future roadmap.

Apache Doris · Big Data · Data Lake
0 likes · 10 min read
DataFunTalk
Jul 18, 2022 · Big Data

Integrating Apache Doris with Hudi: Design, Implementation, and Future Plans

This article introduces Apache Doris, an MPP analytical database, and explains how it integrates with the Hudi data lake format, covering architectural features, design choices, implementation steps including external table creation and query processing, and outlines future enhancements for supporting MOR snapshots and incremental queries.

Apache Doris · Data Lake · Hudi
0 likes · 12 min read
DataFunTalk
Jul 14, 2022 · Big Data

Real‑Time Data Lake Practices at ByteDance and Alibaba: Architecture, Challenges, and Solutions

This article presents detailed case studies of ByteDance and Alibaba implementing real‑time data lake solutions with Hudi and Flink, describing the business drivers, architectural challenges, and the specific technical strategies such as unified metadata layers, optimistic locking, scalable hash indexing, and CDC‑based incremental ETL to achieve low‑latency, high‑throughput data processing.

Flink · Hudi · Real-time Data Lake
0 likes · 9 min read
DataFunTalk
May 24, 2022 · Big Data

Integrating Apache Flink with Apache Hudi: From Data Warehouse to Data Lake

This article explains how Apache Flink integrates with Apache Hudi to enable real‑time data lake ingestion, covering the evolution from traditional data warehouses to data lakes, Hudi’s core concepts such as timeline and file grouping, copy‑on‑write vs merge‑on‑read modes, and Flink’s CDC‑based ETL pipeline.

Big Data · CDC · Data Lake
0 likes · 18 min read
DataFunTalk
May 23, 2022 · Big Data

Real-Time Data Lake Practices at ByteDance: Architecture, Challenges, and Solutions

ByteDance shares its real‑time data lake implementation, covering the evolving definition of data lakes, six core capabilities, challenges such as data management, weak concurrent updates, performance, and log ingestion, and detailed solutions including Hudi Metastore Server, bucket indexing, multi‑stage use cases, and future roadmap.

Batch Processing · Hudi · Real-time Data Lake
0 likes · 32 min read
Bilibili Tech
Apr 25, 2022 · Big Data

Optimizing Full Partition Tables with Zipper Tables, Hudi+Flink CDC, and Data Warehouse Strategies

Facing server‑hardware constraints, Bilibili’s data platform replaced wasteful full‑partition tables with a zipper‑table approach—preserving change history while cutting storage from petabytes to terabytes—and complemented it with Hudi + Flink CDC for near‑real‑time updates, dramatically lowering I/O, compute usage and latency.

Big Data · Flink CDC · Hudi
0 likes · 11 min read
Yanxuan Tech Team
Mar 29, 2022 · Big Data

How NetEase Yanxuan Built a Real‑Time Data Lake to Boost Efficiency

This article explains how NetEase Yanxuan evolved from a traditional data‑warehouse pipeline to a cloud‑native data‑lake architecture, detailing the business challenges, design choices, technology stack (Delta, Iceberg, Hudi), implementation steps, and the resulting gains in real‑time data access, cost reduction, and feature‑engineering support.

Data Lake · Delta Lake · Hudi
0 likes · 18 min read
StarRocks
Mar 23, 2022 · Databases

Accelerating Zepp Health’s Analytics with StarRocks: An OLAP Case Study

Facing inflexible point‑lookup limits and slow query times on HBase, Zepp Health redesigned its massive event‑tracking data pipeline—migrating ingestion through Kafka, Flink, and Hudi to a StarRocks‑based OLAP layer—achieving sub‑100 ms average query latency, 20 % storage savings, and dramatically faster multi‑dimensional analytics.

Big Data · Flink · Hudi
0 likes · 9 min read
ByteDance Data Platform
Feb 25, 2022 · Big Data

Optimizing SparkSQL: ByteDance EMR’s Data Lake Integration and Multi‑Tenant Server

ByteDance’s EMR team details how they integrated data‑lake engines such as Hudi and Iceberg into SparkSQL, streamlined jar management, built a custom Spark SQL Server with Hive compatibility, multi‑tenant support, engine pre‑warming, and transaction capabilities, dramatically improving performance and resource efficiency for enterprise workloads.

EMR · Hudi · Iceberg
0 likes · 11 min read
Volcano Engine Developer Services
Feb 16, 2022 · Big Data

ByteDance’s Journey to a Unified Data Lake with Flink and Hudi

This article recounts ByteDance’s evolution from batch‑only Flink pipelines to a unified data‑lake integration platform, detailing the three integration modes, challenges with Spark‑based CDC, the decision to adopt Hudi over Iceberg, and how Hudi’s indexing and Merge‑On‑Read formats enable near‑real‑time analytics at massive scale.

CDC · Flink · Hudi
0 likes · 10 min read
DataFunTalk
Feb 3, 2022 · Big Data

Improving Data Processing Efficiency at Kuaishou with Apache Hudi

This article explains how Kuaishou tackled latency and efficiency problems in large‑scale data pipelines by adopting Apache Hudi, detailing the pain points, reasons for choosing Hudi, its architecture, model design, handling of bursty updates, back‑fill scenarios, and operational safeguards.

Big Data · Data Lake · Flink
0 likes · 13 min read
ByteDance Data Platform
Dec 31, 2021 · Big Data

How ByteDance Leverages Hudi for a Real‑Time Data Lake Platform

This article introduces ByteDance’s real‑time data lake platform built on Apache Hudi, covering Hudi fundamentals, table types, indexing, practical use cases, platform optimizations, and future roadmap, illustrating how the system enables low‑latency, scalable analytics across batch and streaming workloads.

Hudi · Lakehouse · metadata management
0 likes · 11 min read
Big Data Technology Architecture
Nov 23, 2021 · Big Data

Step-by-Step Guide to Setting Up Flink CDC with MySQL, Hudi, and Hive Integration on a Hadoop Cluster

This comprehensive tutorial walks through configuring a Hadoop‑based environment (Flink 1.13.1, Scala 2.11, CDH 6.2.0, Hive 2.1.1, Hudi 0.10), compiling Hudi, setting up Flink and MySQL binlog, creating CDC source and Hudi sink tables, running Flink jobs, and synchronizing the results to Hive partitions for query via Hive and Presto.

CDC · Flink · Hive
0 likes · 15 min read
Tongcheng Travel Technology Center
Oct 18, 2021 · Big Data

Applying and Practicing Apache Hudi on Tongcheng Elong: Architecture, Challenges, and Solutions

This article describes the background, design choices, and practical challenges of using Apache Hudi for data updates on the Tongcheng Elong platform, analyzes three architectural alternatives, details Hudi's core configurations and write strategies, and presents concrete solutions to version compatibility, upsert semantics, insert behavior, partition management, streaming backlog monitoring, and business‑specific requirements, culminating in a productized Hudi service and future roadmap.

Hive · Hudi · Upsert
0 likes · 18 min read
Big Data Technology & Architecture
Aug 24, 2021 · Big Data

Comprehensive Overview of Data Lake Technologies: Iceberg, Hudi, and Delta Lake

This article provides an in-depth overview of data lake concepts, definitions, and essential features, followed by detailed case studies of enterprise data lake implementations and comparative analysis of leading data lake table formats—Iceberg, Hudi, and Delta Lake—highlighting their architectures, capabilities, and trade‑offs.

Data Lake · Delta Lake · Flink
0 likes · 19 min read
DataFunTalk
May 11, 2021 · Big Data

Design and Practice of Baixin Bank's Flink‑Based Real‑Time Computing Platform and Hudi‑Powered Real‑Time Data Lake

This article details Baixin Bank's construction of a Flink‑driven real‑time computing platform integrated with Hudi as a real‑time data lake, covering background, architecture, data collection, transformation, storage layers, technical challenges, future roadmap, and practical lessons for similar big‑data initiatives.

Big Data · Flink · Hudi
0 likes · 12 min read
Beike Product & Technology
Feb 21, 2019 · Big Data

DATABUS Data Integration Platform: Architecture, Capabilities, and TiDB Ecosystem

The article presents an in‑depth overview of the DATABUS data integration platform, detailing its background, current challenges, core capabilities such as data syncing, metadata automation, real‑time subscriptions, and its reliance on TiDB, TiSpark, Hudi, and related big‑data technologies to enable near‑real‑time data warehousing.

Big Data · Data Integration · Hive
0 likes · 13 min read