Tagged articles
91 articles
JD Tech Talk
Dec 12, 2025 · Big Data

Understanding Hudi Core Concepts: Timeline, Indexes, and Table Types Explained

This article explains Apache Hudi’s core concepts, including its timeline architecture, file layout, indexing mechanisms, and the two primary table types—Copy on Write and Merge on Read—along with their trade‑offs and the various query modes such as snapshot, time‑travel, and incremental queries.

Apache Hudi · Big Data · Data Lake
0 likes · 9 min read
JD Cloud Developers
Dec 12, 2025 · Big Data

Apache Hudi Core Concepts: Timeline, Indexes, Table Types & Queries

This article explains Apache Hudi’s core architecture, detailing the timeline mechanism, file layout, indexing strategies, the two main table types (Copy‑On‑Write and Merge‑On‑Read), and various query modes such as snapshot, time‑travel, read‑optimized and incremental queries.

Apache Hudi · Big Data · Data Lake
0 likes · 9 min read
DataFunTalk
Apr 9, 2025 · Big Data

Highlights of the Apache Hudi Asia Technical Salon Hosted by Kuaishou – Practices and Innovations from Leading Companies

The Kuaishou‑hosted Apache Hudi Asia technical salon gathered over 230 attendees and featured seven experts from Kuaishou, Meituan, TikTok, Huawei, JD and others, who shared best practices, architecture designs, and performance optimizations for large‑scale data lake applications across AI, BI, and real‑time workloads.

AI · Apache Hudi · Batch Processing
0 likes · 14 min read
DataFunSummit
Apr 3, 2025 · Big Data

Apache Hudi Asia Technical Salon Highlights: Practices and Innovations from Kuaishou, Meituan, Douyin, Huawei, and JD

The Apache Hudi Asia technical salon held in Beijing on March 29 gathered over 230 on‑site participants and 16,000 online viewers, featuring expert talks from leading Chinese tech companies that showcased real‑world Hudi implementations, performance optimizations, and future roadmap for data‑lake technologies.

Apache Hudi · Big Data · Data Lake
0 likes · 13 min read
Kuaishou Tech
Apr 2, 2025 · Big Data

Apache Hudi Asia Summit Successfully Held

The first Apache Hudi Asia Summit in Beijing drew more than 230 attendees and featured technical discussions on data lake optimization and case studies from companies like Kuaishou and Meituan.

Apache Hudi · Big Data · Data Lake
0 likes · 12 min read
DataFunSummit
Feb 23, 2025 · Big Data

Douyin Group’s ByteLake Data Lake Table Optimization and Management Practices

This article presents Douyin Group’s ByteLake, a heavily customized Apache Hudi‑based data lake table framework, detailing its core concepts, metadata services, write and read optimizations, operational challenges, a fully managed table management service, and its integration with the Amoro open‑source platform.

Amoro · Apache Hudi · Big Data
0 likes · 11 min read
DataFunSummit
Sep 30, 2024 · Big Data

Apache Hudi from Zero to One: The Swiss Army Knife for Data Ingestion – Hudi Streamer (Part 9)

This article introduces Apache Hudi Streamer, a versatile Spark‑based data ingestion tool likened to a Swiss Army knife, detailing its core options—including table configuration, continuous mode, source classes, transformers, table services, catalog synchronization, and advanced features—while guiding users on practical pipeline setup.

Apache Hudi · Big Data · Spark
0 likes · 10 min read
DataFunSummit
Sep 26, 2024 · Big Data

Apache Hudi Incremental Processing and Change Data Capture (CDC): Overview, Incremental Query, and CDC

This article explains Apache Hudi's incremental processing capabilities, covering an overview of the medallion architecture, detailed configuration for incremental queries, the introduction of Change Data Capture (CDC) with required table properties, and a review of how these features enable richer data insights in modern data lake environments.

Apache Hudi · Big Data · Change Data Capture
0 likes · 9 min read
DataFunSummit
Sep 14, 2024 · Big Data

Apache Hudi Concurrency Control: Overview, MVCC, and OCC

This article provides a comprehensive overview of concurrency control in Apache Hudi, explaining ACID properties, the role of MVCC and OCC, and how Hudi coordinates multiple writers and table services to achieve serializable scheduling while maintaining high performance.

Apache Hudi · Big Data · Concurrency Control
0 likes · 8 min read
DataFunSummit
Aug 31, 2024 · Big Data

Apache Hudi Clustering: Workflow and Layout Optimization Strategies (Part 6)

This article explains Apache Hudi's clustering service, detailing its workflow, three execution modes, and layout optimization strategies—including linear, Z‑order, and Hilbert space‑filling curves—to improve storage locality and query performance in large‑scale data lake environments.

Apache Hudi · Big Data · Space-filling Curves
0 likes · 8 min read
DataFunSummit
Aug 4, 2024 · Big Data

Apache Hudi from Zero to One: Comprehensive Guide to Write Indexing (Part 4)

This article explains Apache Hudi’s write‑side indexing, detailing the indexing API, various index types—including simple, Bloom, bucket, HBase, and record‑level indexes—and their mechanisms, helping readers understand how Hudi validates record existence and optimizes updates and deletions.

Apache Hudi · Big Data · Data Lake
0 likes · 9 min read
DataFunTalk
Apr 25, 2024 · Big Data

Apache Hudi 1.0: Design Reconsiderations and Key New Features

This article provides a comprehensive overview of Apache Hudi 1.0, detailing its architectural redesign, five major development directions, and the most important new capabilities such as LSM‑tree timeline, function indexes, file‑group readers/writers, partial updates, and non‑blocking concurrency control, along with performance evaluations and resource links.

Apache Hudi · Big Data · Function Index
0 likes · 14 min read
DataFunSummit
Mar 4, 2024 · Big Data

Near Real-Time Metric System Architecture for Dongchedi Used Car Business

This article introduces Dongchedi's near real‑time metric system architecture, covering business background, technical challenges, the unified storage‑compute and query service design using the LAS lakehouse built on Apache Hudi, solutions to consistency issues, achieved results, and future plans for further real‑time improvements.

Apache Hudi · Flink · Real-time analytics
0 likes · 13 min read
vivo Internet Technology
Dec 13, 2023 · Big Data

Hudi Data Lake Implementation and Optimization Practice at vivo

Vivo’s big‑data team deployed Apache Hudi to create a lakehouse that unifies streaming and batch workloads, leverages COW and MOR storage modes, automates small‑file clustering and compaction, and applies extensive version, streaming, batch, and lifecycle optimizations, delivering minute‑level latency, hundred‑million‑records‑per‑minute ingestion, and query speeds up to 20% faster than Hive.

Apache Hudi · Batch Processing · Big Data
0 likes · 11 min read
DataFunSummit
Oct 18, 2023 · Big Data

Kuaishou Data Lake Construction with Apache Hudi: Architecture, Challenges, and Solutions

This article explains why Kuaishou built a data lake, outlines the shortcomings of its previous Lambda architecture, describes the adoption of Apache Hudi for unified batch‑stream processing, and details the five major technical challenges and the corresponding solutions implemented to improve performance, consistency, and operational reliability.

Apache Hudi · Big Data · Data Architecture
0 likes · 17 min read
DataFunTalk
Sep 13, 2023 · Big Data

Design and Implementation of a Lakehouse Data Platform Based on Apache Hudi at Taikang Life Insurance

This article details Taikang Life Insurance's end‑to‑end technical selection, architecture design, implementation, and custom enhancements of an Apache Hudi‑driven lakehouse platform for large‑scale health‑insurance data, covering background, component evaluation, performance benchmarking, multi‑layer architecture, and real‑world results.

Apache Hudi · Big Data · Data Governance
0 likes · 44 min read
DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data
0 likes · 18 min read
Big Data Technology & Architecture
May 29, 2023 · Big Data

Kuaishou Data Lake Construction with Apache Hudi: Architecture, Challenges, and Solutions

This article explains why Kuaishou built a data lake, describes its lake architecture based on Apache Hudi and Flink, outlines five major production challenges—including ingestion bottlenecks, snapshot queries, update bottlenecks, merge limitations, and operational reliability—and details the practical solutions and future roadmap.

Apache Hudi · Flink · data engineering
0 likes · 18 min read
DataFunSummit
May 28, 2023 · Big Data

Apache Hudi: Capabilities, Architecture, Use Cases, and Future Outlook

This article introduces Apache Hudi as a next‑generation streaming data‑lake platform, explains its core concepts, architecture, and table types, and showcases real‑world use cases at Tencent such as CDC ingestion, minute‑level real‑time warehousing, streaming analytics, multi‑stream joins, ad attribution, and stream‑to‑batch processing, while also outlining future directions.

Apache Hudi · CDC · Data Lake
0 likes · 16 min read
ITPUB
Mar 28, 2023 · Big Data

How We Turned a Hive Data Warehouse into a Real‑Time Lakehouse with Apache Hudi

This article details the migration from a traditional Hive‑based data warehouse to a lakehouse architecture using Apache Hudi, covering the original Lambda setup, its pain points, lake‑vs‑warehouse differences, Hudi features, integration challenges, practical solutions, and future roadmap.

Apache Hudi · Big Data · Data Warehouse
0 likes · 11 min read
DataFunTalk
Feb 25, 2023 · Big Data

T3 Travel’s Modern Data Stack and Feature Platform: Architecture and Practices

This article details T3 Travel’s exploration of the Modern Data Stack, describing its four‑point overview, business scenarios, the initial MDS implementation using Apache Hudi and Kyuubi, and the design of a feature platform that integrates Metricflow, Feast, and other components to support data processing, analytics, and machine‑learning workflows.

Apache Hudi · Big Data · Data Lake
0 likes · 22 min read
DataFunTalk
Dec 27, 2022 · Big Data

Multi‑Stream Join and Concurrency Control in Apache Hudi: Design, Implementation, and Usage

This article presents a comprehensive solution for multi‑stream joins in Apache Hudi, detailing the challenges of dimension and multi‑stream joins, the novel storage‑layer join approach, timeline‑based concurrency control, marker mechanisms, early conflict detection, payload customization, and practical usage with Flink and Spark, along with performance benefits and future directions.

Apache Hudi · Data Lake · Flink
0 likes · 31 min read
DataFunTalk
Dec 23, 2022 · Big Data

Building a Lakehouse on Alibaba Cloud AnalyticDB (ADB) with Apache Hudi: Architecture, Challenges, and Practices

This article presents a comprehensive technical overview of Alibaba Cloud AnalyticDB's Lakehouse edition, detailing its unified architecture, key advantages, the challenges of ingesting billions of records with Apache Hudi, and the engineering solutions—including Flink integration, hotspot mitigation, memory optimization, OSS throttling handling, concurrent write support, lifecycle management, and TableService—that enable a cost‑effective, high‑performance lake‑to‑warehouse platform.

Apache Hudi · Flink · Lakehouse
0 likes · 19 min read
DataFunSummit
Nov 23, 2022 · Big Data

Lakehouse Analysis Service (LAS): Architecture, Challenges, and Service Design

The article introduces the Lakehouse Analysis Service (LAS), explains its layered architecture that unifies data lake and warehouse capabilities, discusses challenges with Apache Hudi metadata and consistency, and details the design of the unified MetaServer, Table Management Service, concurrency control, async compaction, event bus, and future roadmap.

Apache Hudi · Data Lake
0 likes · 18 min read
ByteDance Data Platform
Nov 16, 2022 · Big Data

How ByteDance’s Data Lake Powers Near‑Real‑Time E‑Commerce Analytics

This article explains ByteDance’s data lake technology, its Apache Hudi‑based features, near‑real‑time architecture, and practical e‑commerce use cases such as marketing promotion, traffic diagnosis, logistics monitoring, risk governance, and operational monitoring, while outlining future challenges and plans.

Apache Hudi · Big Data Architecture · Data Lake
0 likes · 15 min read
ITPUB
Oct 15, 2022 · Big Data

Flink & Apache Hudi: Design, Practices, and Roadmap for Streaming Data Lakes

This talk introduces the evolution of data lakes, outlines Apache Hudi’s core features, details the Flink‑Hudi integration architecture—including write pipelines, small‑file handling, and read strategies—covers real‑world use cases such as near‑real‑time DB ingestion, OLAP, and ETL, and previews upcoming Hudi roadmap items.

Apache Hudi · Big Data · Data Lake
0 likes · 21 min read
Big Data Technology Architecture
Oct 10, 2022 · Big Data

Integrating Apache Hudi with MinIO: A Comprehensive Tutorial

This tutorial explains how to set up Apache Hudi on cloud‑native object storage with MinIO, covering Hudi’s architecture, file format, timeline, write and read paths, core features, schema evolution, and step‑by‑step Spark commands for ingesting, updating, deleting, and querying data in a streaming data‑lake environment.

Apache Hudi · Minio · Spark
0 likes · 26 min read
dbaplus Community
Sep 14, 2022 · Databases

How Apache Doris Enables Real‑Time Analysis of Hudi Data Lakes

This article explains the architecture of Apache Doris, introduces Apache Hudi as a data‑lake format, compares Lambda and Kappa approaches, and details the design, implementation steps, and future roadmap for querying Hudi tables directly from Doris.

Apache Doris · Apache Hudi · Big Data
0 likes · 10 min read
Shopee Tech Team
Sep 2, 2022 · Big Data

Shopee Data System Challenges and Apache Hudi Practices

Shopee tackled its data‑system bottlenecks by customizing Apache Hudi to provide unified stream‑batch integration, efficient state‑detail snapshots, and low‑latency wide‑table generation, using CDC‑based bootstrapping, COW/MOR tables, savepoints and partial updates, which cut latency to ten minutes, lowered resource use, and yielded several community‑backed enhancements.

Apache Hudi · Big Data · Data Integration
0 likes · 18 min read
Big Data Technology Architecture
Aug 23, 2022 · Big Data

Apache Hudi 0.12.0 Release Highlights: Presto Connector, Archive Beyond Savepoint, File‑System Locks, Deltastreamer Termination, Spark & Flink Support, Performance Improvements, and Configuration Updates

The Apache Hudi 0.12.0 release introduces a native Presto connector, archive‑beyond‑savepoint capability, file‑system based locking, new deltastreamer termination strategies, expanded Spark and Flink support, numerous performance enhancements, and a series of configuration and API updates for better data‑lake management.

Apache Hudi · Flink · Presto
0 likes · 12 min read
Big Data Technology Architecture
Aug 23, 2022 · Big Data

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

This article examines the technical differences and feature sets of Apache Hudi, Delta Lake, and Apache Iceberg, highlighting incremental pipelines, concurrency control, merge‑on‑read storage, partition evolution, multi‑mode indexing, and real‑world use cases to help practitioners choose the most suitable lakehouse solution for their workloads.

Apache Hudi · Apache Iceberg · Concurrency Control
0 likes · 18 min read
DataFunTalk
Jul 16, 2022 · Big Data

Deep Dive into Apache Hudi 0.11.0: Multi‑Level Index, Spark SQL Enhancements, Flink Integration, and Other Improvements

The article provides an in‑depth overview of Apache Hudi 0.11.0, covering its new multi‑level index design, Spark SQL enhancements, Flink integration improvements, and additional performance and usability features aimed at boosting read/write efficiency in large‑scale data lake environments.

Apache Hudi · Big Data · Data Lake
0 likes · 15 min read
Big Data Technology & Architecture
Jul 1, 2022 · Big Data

Curated List of Big Data Resources: ClickHouse, Apache Doris, and Apache Hudi

This article compiles a comprehensive set of Chinese-language resources covering major big-data technologies such as ClickHouse, Apache Doris, and Apache Hudi, including series on distributed tables, MergeTree, replication, optimization techniques, and practical tutorials, with direct links to each detailed guide.

Apache Doris · Apache Hudi · Big Data
0 likes · 6 min read
Bilibili Tech
Jun 10, 2022 · Big Data

Incremental Data Lake Design and Hudi Core Optimizations with Flink

The article describes how combining Apache Flink with Hudi enables an incremental data lake that delivers near‑real‑time analytics by switching to merge‑on‑read, fixing log‑handling bugs, improving compaction planning, and refactoring table‑service scheduling. It showcases use cases such as CDC ingestion, data quality control, and real‑time materialized views, and outlines future enhancements like optimistic concurrency and unified schema evolution.

Apache Hudi · CDC · Compaction Optimization
0 likes · 21 min read
Big Data Technology Architecture
Jun 7, 2022 · Big Data

Multi-Modal Index in Apache Hudi 0.11.0: Design, Implementation, and Performance Benefits

This article explains the motivation, design principles, implementation details, and performance improvements of the new multi‑modal indexing subsystem introduced in Apache Hudi 0.11.0 for Lakehouse architectures, covering scalable metadata, ACID updates, fast lookups, file listing, data skipping, upsert performance, and future work.

Apache Hudi · indexing · metadata
0 likes · 19 min read
Big Data Technology & Architecture
May 17, 2022 · Big Data

Apache Hudi: Core Concepts, Architecture, Storage Types, Write Operations, Querying, and Management

This article provides a comprehensive guide to Apache Hudi, covering its basic concepts, timeline architecture, storage types (Copy‑On‑Write and Merge‑On‑Read), write operations, DeltaStreamer usage, Hive/Spark/Presto query integration, data management, indexing, compaction, and best‑practice recommendations for big‑data lake workloads.

Apache Hudi · Big Data · Copy-on-Write
0 likes · 43 min read
Big Data Technology & Architecture
May 4, 2022 · Big Data

Apache Hudi 0.11.0 Release Highlights: Multi‑Mode Index, Data Skipping, Async Index, Spark & Flink Integration, and New Utilities

The Apache Hudi 0.11.0 release introduces multi‑mode metadata indexing, enhanced data‑skipping, asynchronous indexing, extensive Spark and Flink integration improvements, new bundle utilities, and expanded metadata synchronization with BigQuery, AWS Glue, and DataHub, while also adding bucket indexing and encryption support.

Apache Hudi · Async Index · Big Data
0 likes · 13 min read
Big Data Technology Architecture
Apr 29, 2022 · Big Data

Halodoc’s Data Platform Evolution: From Redshift to a LakeHouse Architecture with Apache Hudi

This article describes how Halodoc’s data engineering team identified limitations of their Redshift‑based platform, evaluated a LakeHouse design, selected Apache Hudi for mutable data handling, and outlined the challenges and benefits of building a scalable, decoupled storage‑compute architecture for their growing healthcare services.

Apache Hudi · Data Platform · data engineering
0 likes · 9 min read
Shopee Tech Team
Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Batch Processing
0 likes · 20 min read
Alibaba Cloud Developer
Sep 9, 2021 · Big Data

How Apache Hudi & Pulsar Enable Real‑Time CDC Data Lake Ingestion

This article explains CDC fundamentals, compares query‑based and log‑based capture, describes typical CDC‑to‑lake architectures using Pulsar and Hudi, dives into Hudi's core design, optimization techniques, and future roadmap, and provides practical insights for building scalable data lakes.

Apache Hudi · CDC · Pulsar
0 likes · 17 min read
DataFunTalk
Sep 3, 2021 · Big Data

Building an Exabyte‑Scale Data Lake with Apache Hudi at ByteDance: Architecture, Design Choices, and Performance Optimizations

This article details ByteDance's implementation of an exabyte‑scale data lake using Apache Hudi, covering scenario requirements, engine selection, functional support, schema management, extensive performance tuning, and future directions, while also noting recruitment opportunities within the team.

Apache Hudi · Big Data · ByteDance
0 likes · 9 min read
DataFunTalk
Apr 27, 2021 · Big Data

Implementing CDC‑to‑Hudi for Real‑Time Mutable Data in a Big Data System

This article describes how Linkflow migrated mutable customer data from MySQL to an Apache Hudi data lake using Debezium‑in‑Flink CDC, addressing challenges such as snapshot resumability, partial updates, row‑key merging, schema evolution, indexing, and concurrent writes to achieve minute‑level data freshness and improved offline processing performance.

Apache Hudi · Big Data · CDC
0 likes · 21 min read
Big Data Technology Architecture
Apr 19, 2021 · Big Data

Reframing Apache Hudi as a Data Lake Platform: Vision, Capabilities, and Future Directions

Apache Hudi is being re‑positioned from a simple table format to a full‑featured data lake platform, offering transactional storage, MVCC concurrency, metadata services, Deltastreamer ingestion, and plans for cache and timeline metadata services, aligning its vision with modern lakehouse architectures.

Apache Hudi · Transactional Storage · metadata
0 likes · 5 min read
DataFunTalk
Apr 18, 2021 · Big Data

Comparing Apache Hudi, Apache Iceberg, and Delta Lake for Data Lake Storage

This article compares Apache Hudi, Apache Iceberg, and Delta Lake, examining their storage formats, platform compatibility, update performance, concurrency guarantees, and integration with lakeFS to help readers choose the most suitable solution for their data lake use case.

Apache Hudi · Apache Iceberg · Delta Lake
0 likes · 16 min read
Big Data Technology & Architecture
Nov 14, 2020 · Big Data

Comparative Analysis of Apache Hudi, Apache CarbonData, and Delta Lake for Data Lake Solutions

This article examines the core requirements of data lakes and provides an in‑depth comparison of three major open‑source solutions—Apache Hudi, Apache CarbonData, and Delta Lake—highlighting their architectures, ACID support, query capabilities, and suitability for various real‑time and batch use cases.

ACID · Apache CarbonData · Apache Hudi
0 likes · 9 min read
Big Data Technology & Architecture
Sep 2, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Features, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark to ingest, manage, and incrementally query large analytical datasets on HDFS‑compatible storage, offering features such as timeline management, copy‑on‑write and merge‑on‑read tables, and support for snapshot, incremental, and read‑optimized queries across engines like Hive, Spark SQL and Presto.

Apache Hudi · Big Data · Data Lake
0 likes · 12 min read
Big Data Technology & Architecture
Aug 23, 2020 · Big Data

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

This article introduces Apache Hudi, explaining its storage types, query views, timeline feature, typical use cases such as near‑real‑time ingestion and incremental pipelines, and provides a step‑by‑step Scala/Spark quick‑start guide with code examples for compiling, inserting, updating, querying, and syncing data to Hive.

Apache Hudi · Big Data · Data Lake
0 likes · 18 min read
Big Data Technology Architecture
Jun 28, 2020 · Big Data

Key Requirements for Building PB‑Scale Data Lakes and How Apache Hudi Meets Them

The article outlines the essential requirements for constructing petabyte‑scale data lakes—such as incremental CDC ingestion, log deduplication, storage management, ACID transactions, fast ETL, and compliance—and explains how Apache Hudi’s COW and Merge‑on‑Read architectures, async compaction, and advanced features address each need.

ACID Transactions · Apache Hudi · Async Compaction
0 likes · 13 min read
Big Data Technology Architecture
May 31, 2020 · Big Data

Applying Apache Hudi in Medical Big Data: Architecture, Synchronization, Storage Choices, and Future Directions

This article examines the use of Apache Hudi for building a hospital‑wide medical big‑data platform, covering construction background, reasons for selecting Hudi, data synchronization methods, storage mode choices, query optimizations, and future development considerations.

Apache Hudi · Copy-on-Write · Medical Big Data
0 likes · 7 min read
Architect
May 12, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Concepts, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark and Hadoop‑compatible storage to provide efficient ingestion, incremental processing, and multiple query modes such as snapshot, incremental, and read‑optimized for large analytical datasets.

Apache Hudi · Big Data · Data Lake
0 likes · 11 min read
Big Data Technology Architecture
Mar 24, 2020 · Big Data

Comparative Analysis of Delta Lake, Apache Iceberg, and Apache Hudi for Data Lake Solutions

This article examines the three leading open‑source data‑lake projects—Delta Lake, Apache Iceberg, and Apache Hudi—by outlining their origins, core problems they address, key features, and a detailed seven‑dimension comparison to help practitioners choose the most suitable solution for their scenarios.

Apache Hudi · Apache Iceberg · Comparison
0 likes · 17 min read
dbaplus Community
Mar 17, 2020 · Big Data

Choosing the Right Open‑Source Data Lake: Delta vs Iceberg vs Hudi

An in‑depth comparison of the three leading open‑source data lake platforms—Delta Lake, Apache Iceberg, and Apache Hudi—examines their origins, core challenges they address, key features, and performance across seven evaluation dimensions to guide practitioners in selecting the optimal solution for their workloads.

Apache Hudi · Apache Iceberg · Data Lake
0 likes · 15 min read
Big Data Technology Architecture
Feb 1, 2020 · Big Data

Apache Hudi 0.5.1 Release Highlights and Upgrade Guide

The Apache Hudi 0.5.1 release introduces upgraded Spark, Avro, Parquet and Kafka dependencies, new Scala support, timeline layout changes, CLI enhancements, DeltaStreamer parameter updates, Kafka offset enum revisions, key‑generator package relocation, Hive sync options, dynamic Bloom filter, bulk‑insert support, and AWS cloud storage compatibility.

Apache Hudi · DeltaStreamer · Kafka
0 likes · 6 min read
Big Data Technology & Architecture
Jan 23, 2020 · Big Data

Understanding Apache Hudi: Incremental Processing and Low‑Latency Data Management on Hadoop

This article explains how Apache Hudi enables efficient, low‑latency incremental data ingestion and processing on Hadoop by providing a unified service layer, describing its motivation, architecture, storage components, write and read paths, compaction, fault recovery, and incremental query capabilities.

Apache Hudi · Hadoop · Incremental Processing
0 likes · 17 min read