Tagged articles

Apache Hudi

92 articles · Page 1 of 1

Dec 12, 2025 · Big Data

Understanding Hudi Core Concepts: Timeline, Indexes, and Table Types Explained

This article explains Apache Hudi’s core concepts, including its timeline architecture, file layout, indexing mechanisms, and the two primary table types—Copy on Write and Merge on Read—along with their trade‑offs and the various query modes such as snapshot, time‑travel, and incremental queries.

Apache Hudi · Big Data · Data Lake

0 likes · 9 min read

JD Cloud Developers

Dec 12, 2025 · Big Data

Apache Hudi Core Concepts: Timeline, Indexes, Table Types & Queries

This article explains Apache Hudi’s core architecture, detailing the timeline mechanism, file layout, indexing strategies, the two main table types (Copy‑On‑Write and Merge‑On‑Read), and various query modes such as snapshot, time‑travel, read‑optimized and incremental queries.

Apache Hudi · Big Data · Data Lake

0 likes · 9 min read

Past Memory Big Data

Dec 12, 2025 · Big Data

How Uber Reduced Data Freshness from Hours to Minutes Using Flink Streaming

Uber rebuilt its data‑lake ingestion pipeline with Apache Flink, replacing batch jobs with a streaming architecture that improves data freshness from hours to minutes, lowers compute usage by 25%, and solves challenges like small‑file proliferation, partition skew, and checkpoint‑commit synchronization at petabyte scale.

Apache Flink · Apache Hudi · Data Freshness

0 likes · 10 min read

DataFunSummit

Dec 10, 2025 · Big Data

How Apache Hudi Powers the Next‑Gen AI‑Native Lakehouse: Insights from the Asia Meetup

The article recaps the Apache Hudi Asia Meetup hosted by JD, covering community updates, JD's data‑lake challenges, the upcoming Hudi 1.1 release, JD's architectural redesign, Kuaishou's real‑time lake adoption, and Huawei Cloud's deep optimizations, all aimed at building an AI‑native, real‑time lakehouse.

AI-native · Apache Hudi · Data Lake

0 likes · 13 min read

JD Retail Technology

Dec 1, 2025 · Big Data

How Apache Hudi 1.1 Powers AI‑Native Lakehouse and Real‑Time Data Lakes

The JD‑hosted Apache Hudi Meetup showcased the 1.1 release’s pluggable table format, Flink performance gains, LSM‑Tree MoR redesign, and AI‑native features such as vector indexing, illustrating how the open‑source lakehouse is evolving to meet BI and multimodal AI workloads.

AI · Apache Hudi · Big Data

0 likes · 12 min read

JD Tech Talk

Oct 16, 2025 · Big Data

Understanding Apache Hudi Core Concepts: Timeline, File Layout, and Table Types

This article explains Apache Hudi's architecture, including its timeline mechanism, file layout, indexing strategies, table types (COW and MOR), query options, storage format versioning, backward compatibility, and key configuration settings for managing data lake tables.

Apache Hudi · Big Data · Copy-on-Write

0 likes · 8 min read

DataFunTalk

Apr 9, 2025 · Big Data

Highlights of the Apache Hudi Asia Technical Salon Hosted by Kuaishou – Practices and Innovations from Leading Companies

The Kuaishou‑hosted Apache Hudi Asia technical salon gathered over 230 attendees and featured seven experts from Kuaishou, Meituan, TikTok, Huawei, JD and others, who shared best practices, architecture designs, and performance optimizations for large‑scale data lake applications across AI, BI, and real‑time workloads.

AI · Apache Hudi · Batch Processing

0 likes · 14 min read

DataFunSummit

Apr 3, 2025 · Big Data

Apache Hudi Asia Technical Salon Highlights: Practices and Innovations from Kuaishou, Meituan, Douyin, Huawei, and JD

The Apache Hudi Asia technical salon held in Beijing on March 29 gathered over 230 on‑site participants and 16,000 online viewers, featuring expert talks from leading Chinese tech companies that showcased real‑world Hudi implementations, performance optimizations, and future roadmap for data‑lake technologies.

Apache Hudi · Big Data · Data Lake

0 likes · 13 min read

Kuaishou Tech

Apr 2, 2025 · Big Data

Apache Hudi Asia Summit Successfully Held

The first Apache Hudi Asia Summit in Beijing attracted over 230 attendees, featuring technical discussions on data lake optimization and case studies from companies like Kuaishou and Meituan.

Apache Hudi · Big Data · Data Engineering

0 likes · 12 min read

DataFunSummit

Feb 23, 2025 · Big Data

Douyin Group’s ByteLake Data Lake Table Optimization and Management Practices

This article presents Douyin Group’s ByteLake, a heavily customized Apache Hudi‑based data lake table framework, detailing its core concepts, metadata services, write and read optimizations, operational challenges, a fully managed table management service, and its integration with the Amoro open‑source platform.

Amoro · Apache Hudi · Big Data

0 likes · 11 min read

DataFunSummit

Oct 11, 2024 · Big Data

Kuaishou’s Data Lake Technical Maturity Curve: Challenges and Solutions with Apache Hudi

Kuaishou’s data‑lake initiative tackled exploding offline warehouse costs, redundant model proliferation, and data‑consistency complexities by adopting Apache Hudi’s schema‑evolution capabilities and real‑time lake ingestion, improving cross‑team collaboration and narrowing the real‑time‑offline data gap.

Apache Hudi · Data Engineering

0 likes · 6 min read

DataFunSummit

Oct 1, 2024 · Big Data

Apache Hudi from Zero to One: Highlighting Key Features of Version 1.0 (Part 10)

The article explains Apache Hudi’s three‑layer architecture and details four major 1.0 enhancements—LSM‑tree timeline, non‑blocking concurrency control, file‑group reader/writer APIs, and function indexes—while providing a brief review and links to the Hudi 1.x RFC.

Apache Hudi · Big Data · Concurrency Control

0 likes · 9 min read

DataFunSummit

Sep 30, 2024 · Big Data

Apache Hudi from Zero to One: The Swiss Army Knife for Data Ingestion – Hudi Streamer (Part 9)

This article introduces Apache Hudi Streamer, a versatile Spark‑based data ingestion tool likened to a Swiss Army knife, detailing its core options—including table configuration, continuous mode, source classes, transformers, table services, catalog synchronization, and advanced features—while guiding users on practical pipeline setup.

Apache Hudi · Big Data · Spark

0 likes · 10 min read

DataFunSummit

Sep 26, 2024 · Big Data

Apache Hudi Incremental Processing and Change Data Capture (CDC): Overview, Incremental Query, and CDC

This article explains Apache Hudi's incremental processing capabilities, covering an overview of the medallion architecture, detailed configuration for incremental queries, the introduction of Change Data Capture (CDC) with required table properties, and a review of how these features enable richer data insights in modern data lake environments.

Apache Hudi · Big Data · Change Data Capture

0 likes · 9 min read

DataFunSummit

Sep 14, 2024 · Big Data

Apache Hudi Concurrency Control: Overview, MVCC, and OCC

This article provides a comprehensive overview of concurrency control in Apache Hudi, explaining ACID properties, the role of MVCC and OCC, and how Hudi coordinates multiple writers and table services to achieve serializable scheduling while maintaining high performance.

Apache Hudi · Big Data · Concurrency Control

0 likes · 8 min read

DataFunSummit

Aug 31, 2024 · Big Data

Apache Hudi Clustering: Workflow and Layout Optimization Strategies (Part 6)

This article explains Apache Hudi's clustering service, detailing its workflow, three execution modes, and layout optimization strategies—including linear, Z‑order, and Hilbert space‑filling curves—to improve storage locality and query performance in large‑scale data lake environments.

Apache Hudi · Big Data · Clustering

0 likes · 8 min read

DataFunSummit

Aug 30, 2024 · Big Data

Kuaishou's Data Lake Journey with Apache Hudi: Architecture Evolution, Use Cases, and Lessons Learned

The article details Kuaishou's adoption of a data lake powered by Apache Hudi, covering the challenges of growing data warehouses, the migration from Hive to Hudi, concrete business case studies, promotion strategies, and key takeaways for large‑scale data engineering.

Apache Hudi · Big Data · Data Lake

0 likes · 12 min read

DataFunSummit

Aug 19, 2024 · Big Data

Apache Hudi from Zero to One: Introduction to Table Services – Compaction, Cleaning, and Indexing (Part 5)

This article introduces Apache Hudi's table services, explaining the concepts, execution modes, and detailed workflows of compaction, cleaning, and indexing, and how they optimize storage layout and read/write performance in large‑scale data lake environments.

Apache Hudi · Big Data · Cleaning

0 likes · 8 min read

DataFunSummit

Aug 4, 2024 · Big Data

Apache Hudi from Zero to One: Comprehensive Guide to Write Indexing (Part 4)

This article explains Apache Hudi’s write‑side indexing, detailing the indexing API, various index types—including simple, Bloom, bucket, HBase, and record‑level indexes—and their mechanisms, helping readers understand how Hudi validates record existence and optimizes updates and deletions.

Apache Hudi · Big Data · Data Lake

0 likes · 9 min read

DataFunSummit

Aug 3, 2024 · Big Data

Apache Hudi Write Process: From Zero to One – Part 3 (Understanding Write Flow and Operations)

This article explains the complete Apache Hudi write pipeline, detailing each step from client creation to commit, and describes the various write operations such as Upsert, Insert, Bulk Insert, Delete, Delete Partition, and Insert‑Overwrite, providing a comprehensive overview for data‑lake practitioners.

Apache Hudi · Big Data · Data Lake

0 likes · 12 min read

DataFunSummit

Jun 28, 2024 · Big Data

Apache Hudi from Zero to One – Part 2: Reading Process and Query Types (Spark Example)

This article explains how Apache Hudi integrates with Spark to read data, detailing the Spark‑SQL planning stages, the Spark‑Hudi read workflow, and the four main Hudi query types—snapshot, read‑optimized, time‑travel, and incremental—along with example SQL commands and code snippets.

Apache Hudi · Big Data · Data Lake

0 likes · 11 min read

DataFunSummit

Jun 19, 2024 · Big Data

Apache Hudi from Zero to One: Introduction to Hudi’s Storage Format (Part 1)

This article introduces Apache Hudi’s storage format, explaining the table layout, metadata and data file organization, the naming conventions of timeline actions, and the trade‑offs between Copy‑on‑Write and Merge‑on‑Read table types for transactional data lakes.

Apache Hudi · Big Data · Data Lake

0 likes · 8 min read

DataFunTalk

Apr 25, 2024 · Big Data

Apache Hudi 1.0: Design Reconsiderations and Key New Features

This article provides a comprehensive overview of Apache Hudi 1.0, detailing its architectural redesign, five major development directions, and the most important new capabilities such as LSM‑tree timeline, function indexes, file‑group readers/writers, partial updates, and non‑blocking concurrency control, along with performance evaluations and resource links.

Apache Hudi · Big Data · Function Index

0 likes · 14 min read

DataFunSummit

Mar 4, 2024 · Big Data

Near Real-Time Metric System Architecture for Dongchedi Used Car Business

This article introduces Dongchedi's near real‑time metric system architecture, covering business background, technical challenges, the unified storage‑compute and query service design using the Las lakehouse built on Apache Hudi, solutions to consistency issues, achieved results, and future plans for further real‑time improvements.

Apache Hudi · Flink · real-time analytics

0 likes · 13 min read

DataFunTalk

Jan 9, 2024 · Big Data

Analyzing Lakehouse Storage Systems: Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Hudi, and Iceberg

This article examines the design of lakehouse storage systems by comparing Delta Lake, Apache Hudi, and Apache Iceberg, focusing on metadata management, Merge‑On‑Read mechanisms, and a series of query and write performance optimizations with real‑world EMR case studies.

Apache Hudi · Apache Iceberg · Big Data

0 likes · 16 min read

vivo Internet Technology

Dec 13, 2023 · Big Data

Hudi Data Lake Implementation and Optimization Practice at vivo

Vivo’s big‑data team deployed Apache Hudi to create a lakehouse that unifies streaming and batch workloads, leverages COW and MOR storage modes, automates small‑file clustering and compaction, and applies extensive version, streaming, batch, and lifecycle optimizations, delivering minute‑level latency, hundred‑million‑records‑per‑minute ingestion, and query speeds up to 20% faster than Hive.

Apache Hudi · Batch Processing · Big Data

0 likes · 11 min read

DataFunSummit

Oct 18, 2023 · Big Data

Kuaishou Data Lake Construction with Apache Hudi: Architecture, Challenges, and Solutions

This article explains why Kuaishou built a data lake, outlines the shortcomings of its previous Lambda architecture, describes the adoption of Apache Hudi for unified batch‑stream processing, and details the five major technical challenges and the corresponding solutions implemented to improve performance, consistency, and operational reliability.

Apache Hudi · Big Data · Data Architecture

0 likes · 17 min read

Data Thinking Notes

Oct 11, 2023 · Big Data

How Taikang Life Built a Scalable Lakehouse with Apache Hudi for Big Health Data

This article details Taikang Life's end‑to‑end design and implementation of a lakehouse‑style distributed data platform built on Apache Hudi, covering background, technical selection, architecture, custom Hudi extensions for the health insurance domain, performance benchmarks, real‑world results, and future work.

Apache Hudi · Flink · Healthcare

0 likes · 45 min read

DataFunTalk

Sep 13, 2023 · Big Data

Design and Implementation of a Lakehouse Data Platform Based on Apache Hudi at Taikang Life Insurance

This article details Taikang Life Insurance's end‑to‑end technical selection, architecture design, implementation, and custom enhancements of an Apache Hudi‑driven lakehouse platform for large‑scale health‑insurance data, covering background, component evaluation, performance benchmarking, multi‑layer architecture, and real‑world results.

Apache Hudi · Big Data · Data Governance

0 likes · 44 min read

DataFunTalk

Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data

0 likes · 18 min read

DataFunSummit

Jun 6, 2023 · Big Data

Optimizing Real-Time Data Lake Queries on Huawei Cloud with Apache Hudi: Architecture, Indexing, and Performance Enhancements

This article introduces Huawei Cloud's real-time data lake query optimizations using Apache Hudi, covering Hudi's query capabilities, clustering and MDT optimizations, various index types (Min‑max, Lucene, bitmap), caching strategies, and future plans for performance improvements.

Apache Hudi · Data Lake · Huawei Cloud

0 likes · 18 min read

DataFunSummit

Jun 3, 2023 · Big Data

Kuaishou’s Data Lake Architecture with Apache Hudi: Design, Challenges, Solutions, and Future Plans

This article presents Kuaishou’s journey in building a data lake using Apache Hudi, detailing the lake architecture, key challenges such as ingestion bottlenecks and update inefficiencies, the solutions implemented, practical case studies, and the roadmap for future enhancements.

Apache Hudi · Data Lake · Flink

0 likes · 20 min read

Big Data Technology & Architecture

May 29, 2023 · Big Data

Kuaishou Data Lake Construction with Apache Hudi: Architecture, Challenges, and Solutions

This article explains why Kuaishou built a data lake, describes its lake architecture based on Apache Hudi and Flink, outlines five major production challenges—including ingestion bottlenecks, snapshot queries, update bottlenecks, merge limitations, and operational reliability—and details the practical solutions and future roadmap.

Apache Hudi · Data Engineering · Flink

0 likes · 18 min read

DataFunSummit

May 28, 2023 · Big Data

Apache Hudi: Capabilities, Architecture, Use Cases, and Future Outlook

This article introduces Apache Hudi as a next‑generation streaming data‑lake platform, explains its core concepts, architecture, and table types, and showcases real‑world use cases at Tencent such as CDC ingestion, minute‑level real‑time warehousing, streaming analytics, multi‑stream joins, ad attribution, and stream‑to‑batch processing, while also outlining future directions.

Apache Hudi · CDC · Data Lake

0 likes · 16 min read

DataFunTalk

May 15, 2023 · Big Data

Kuaishou Data Lake Construction with Apache Hudi: Architecture, Challenges, and Solutions

This article explains why Kuaishou built a data lake, describes its Hudi‑based architecture, outlines five major challenges encountered during implementation, and presents the solutions and future development plans, illustrating performance improvements and practical use cases across various business scenarios.

Apache Hudi · Big Data · Data Lake

0 likes · 19 min read

ITPUB

Mar 28, 2023 · Big Data

How We Turned a Hive Data Warehouse into a Real‑Time Lakehouse with Apache Hudi

This article details the migration from a traditional Hive‑based data warehouse to a lakehouse architecture using Apache Hudi, covering the original Lambda setup, its pain points, lake‑vs‑warehouse differences, Hudi features, integration challenges, practical solutions, and future roadmap.

Apache Hudi · Big Data · Data Warehouse

0 likes · 11 min read

StarRing Big Data Open Lab

Mar 22, 2023 · Big Data

Why Lakehouse Architecture Is Revolutionizing Data Analytics: Hudi vs Iceberg

This article explains how the lakehouse integrated architecture combines data lake and data warehouse capabilities, outlines its key features, compares three implementation paths, and provides an in‑depth technical overview of Apache Hudi and Apache Iceberg for modern big‑data analytics.

Apache Hudi · Apache Iceberg · Data Lake

0 likes · 15 min read

Big Data Technology & Architecture

Mar 20, 2023 · Big Data

Using SparkSQL to Connect and Operate with Apache Hudi: Configuration, Table Creation, Data Manipulation, and Deletion

This guide demonstrates how to configure Hive metastore, connect SparkSQL to Apache Hudi, create COW and MOR tables, perform insert, update, merge, delete, and insert‑overwrite operations, and illustrates each step with executable code snippets and sample results.

Apache Hudi · Big Data · Data Lake

0 likes · 14 min read

DataFunTalk

Feb 25, 2023 · Big Data

T3 Travel’s Modern Data Stack and Feature Platform: Architecture and Practices

This article details T3 Travel’s exploration of the Modern Data Stack, describing its four‑point overview, business scenarios, the initial MDS implementation using Apache Hudi and Kyuubi, and the design of a feature platform that integrates Metricflow, Feast, and other components to support data processing, analytics, and machine‑learning workflows.

Apache Hudi · Big Data · Data Lake

0 likes · 22 min read

Big Data Technology Architecture

Feb 24, 2023 · Big Data

Implementing Change Data Capture (CDC) on Data Lake Formats with Apache Hudi

This article reviews lake‑format concepts, Apache Hudi architecture, CDC fundamentals, design considerations for CDC on lake formats, implementation details of Hudi CDC, and streaming optimizations including automated lake‑table management and a simplified StreamingSQL for Spark.

Apache Hudi · CDC · Delta Lake

0 likes · 19 min read

DataFunTalk

Dec 27, 2022 · Big Data

Multi‑Stream Join and Concurrency Control in Apache Hudi: Design, Implementation, and Usage

This article presents a comprehensive solution for multi‑stream joins in Apache Hudi, detailing the challenges of dimension and multi‑stream joins, the novel storage‑layer join approach, timeline‑based concurrency control, marker mechanisms, early conflict detection, payload customization, and practical usage with Flink and Spark, along with performance benefits and future directions.

Apache Hudi · Data Lake · Flink

0 likes · 31 min read

DataFunTalk

Dec 23, 2022 · Big Data

Building a Lakehouse on Alibaba Cloud AnalyticDB (ADB) with Apache Hudi: Architecture, Challenges, and Practices

This article presents a comprehensive technical overview of Alibaba Cloud AnalyticDB's Lakehouse edition, detailing its unified architecture, key advantages, the challenges of ingesting billions of records with Apache Hudi, and the engineering solutions—including Flink integration, hotspot mitigation, memory optimization, OSS throttling handling, concurrent write support, lifecycle management, and TableService—that enable a cost‑effective, high‑performance lake‑to‑warehouse platform.

Apache Hudi · Flink · Lakehouse

0 likes · 19 min read

Big Data Technology & Architecture

Dec 19, 2022 · Big Data

Near Real-Time Data Lake Practices in TikTok E-commerce: Architecture, Techniques, and Case Studies

This article presents a comprehensive overview of TikTok e-commerce's near‑real‑time data lake implementation, detailing data lake characteristics, architecture choices, practical use cases across analysis and operations, and future challenges and plans.

Apache Hudi · Big Data · Data Lake

0 likes · 16 min read

DataFunSummit

Nov 23, 2022 · Big Data

Lakehouse Analysis Service (LAS): Architecture, Challenges, and Service Design

The article introduces the Lakehouse Analysis Service (LAS), explains its layered architecture that unifies data lake and warehouse capabilities, discusses challenges with Apache Hudi metadata and consistency, and details the design of the unified MetaServer, Table Management Service, concurrency control, async compaction, event bus, and future roadmap.

Apache Hudi · Data Lake

0 likes · 18 min read

ByteDance Data Platform

Nov 16, 2022 · Big Data

How ByteDance’s Data Lake Powers Near‑Real‑Time E‑Commerce Analytics

This article explains ByteDance’s data lake technology, its Apache Hudi‑based features, near‑real‑time architecture, and practical e‑commerce use cases such as marketing promotion, traffic diagnosis, logistics monitoring, risk governance, and operational monitoring, while outlining future challenges and plans.

Apache Hudi · Big Data Architecture · Data Lake

0 likes · 15 min read

ITPUB

Oct 15, 2022 · Big Data

Flink & Apache Hudi: Design, Practices, and Roadmap for Streaming Data Lakes

This talk introduces the evolution of data lakes, outlines Apache Hudi’s core features, details the Flink‑Hudi integration architecture—including write pipelines, small‑file handling, and read strategies—covers real‑world use cases such as near‑real‑time DB ingestion, OLAP, and ETL, and previews upcoming Hudi roadmap items.

Apache Hudi · Big Data · Data Lake

0 likes · 21 min read

DataFunTalk

Oct 14, 2022 · Big Data

Exploring Flink and Apache Hudi for Streaming Data Lakes: Design, Practices, and Roadmap

This article presents a comprehensive overview of using Flink with Apache Hudi to build streaming data lake solutions, covering Hudi's background, core features, Flink‑Hudi integration design, practical use cases, recent roadmap updates, and a Q&A session.

Apache Hudi · Data Lake · Flink

0 likes · 19 min read

Big Data Technology & Architecture

Oct 13, 2022 · Big Data

Hudi Clustering After Batch Processing: Merging Small Files Before Streaming

This guide details how to execute Apache Hudi file clustering after a batch job and before streaming, using Spark commands to merge numerous small HDFS files into larger ones, configure clustering and cleaning policies, and verify the results with HDFS counts.

Apache Hudi · Big Data · Data Lake

0 likes · 15 min read

Big Data Technology Architecture

Oct 10, 2022 · Big Data

Integrating Apache Hudi with MinIO: A Comprehensive Tutorial

This tutorial explains how to set up Apache Hudi on cloud‑native object storage with MinIO, covering Hudi’s architecture, file format, timeline, write and read paths, core features, schema evolution, and step‑by‑step Spark commands for ingesting, updating, deleting, and querying data in a streaming data‑lake environment.

Apache Hudi · Spark · minio

0 likes · 26 min read

DataFunSummit

Oct 3, 2022 · Big Data

Optimizing Point‑Query Performance in Presto with Apache Hudi Data Skipping and Layout Techniques

This article explains how Huawei Cloud leverages Apache Hudi and HetuEngine (Presto) to improve point‑query performance on Lakehouse architectures through data layout optimization, file‑skipping techniques, metadata tables, and extensive benchmark results demonstrating multi‑fold speedups.

Apache Hudi · Big Data · Data Skipping

0 likes · 11 min read

dbaplus Community

Sep 14, 2022 · Databases

How Apache Doris Enables Real‑Time Analysis of Hudi Data Lakes

This article explains the architecture of Apache Doris, introduces Apache Hudi as a data‑lake format, compares Lambda and Kappa approaches, and details the design, implementation steps, and future roadmap for querying Hudi tables directly from Doris.

Apache Doris · Apache Hudi · Big Data

0 likes · 10 min read

Shopee Tech Team

Sep 2, 2022 · Big Data

Shopee Data System Challenges and Apache Hudi Practices

Shopee tackled its data‑system bottlenecks by customizing Apache Hudi to provide unified stream‑batch integration, efficient state‑detail snapshots, and low‑latency wide‑table generation, using CDC‑based bootstrapping, COW/MOR tables, savepoints and partial updates, which cut latency to ten minutes, lowered resource use, and yielded several community‑backed enhancements.

Apache Hudi · Big Data · Data Integration

0 likes · 18 min read

Big Data Technology Architecture

Aug 23, 2022 · Big Data

Apache Hudi 0.12.0 Release Highlights: Presto Connector, Archive Beyond Savepoint, File‑System Locks, Deltastreamer Termination, Spark & Flink Support, Performance Improvements, and Configuration Updates

The Apache Hudi 0.12.0 release introduces a native Presto connector, archive‑beyond‑savepoint capability, file‑system based locking, new deltastreamer termination strategies, expanded Spark and Flink support, numerous performance enhancements, and a series of configuration and API updates for better data‑lake management.

Apache Hudi · Flink · Spark

0 likes · 12 min read

Big Data Technology Architecture

Aug 23, 2022 · Big Data

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

This article examines the technical differences and feature sets of Apache Hudi, Delta Lake, and Apache Iceberg, highlighting incremental pipelines, concurrency control, merge‑on‑read storage, partition evolution, multi‑mode indexing, and real‑world use cases to help practitioners choose the most suitable lakehouse solution for their workloads.

Apache Hudi · Apache Iceberg · Concurrency Control

0 likes · 18 min read

DataFunTalk

Jul 16, 2022 · Big Data

Deep Dive into Apache Hudi 0.11.0: Multi‑Level Index, Spark SQL Enhancements, Flink Integration, and Other Improvements

The article provides an in‑depth overview of Apache Hudi 0.11.0, covering its new multi‑level index design, Spark SQL enhancements, Flink integration improvements, and additional performance and usability features aimed at boosting read/write efficiency in large‑scale data lake environments.

Apache Hudi · Big Data · Data Lake

0 likes · 15 min read


Big Data Technology & Architecture

Jul 1, 2022 · Big Data

Curated List of Big Data Resources: ClickHouse, Apache Doris, and Apache Hudi

This article compiles a comprehensive set of Chinese-language resources covering major big-data technologies such as ClickHouse, Apache Doris, and Apache Hudi, including series on distributed tables, MergeTree, replication, optimization techniques, and practical tutorials, with direct links to each detailed guide.

Apache Doris · Apache Hudi · Big Data

0 likes · 6 min read


Bilibili Tech

Jun 10, 2022 · Big Data

Incremental Data Lake Design and Hudi Core Optimizations with Flink

The article describes how combining Apache Flink with Hudi enables an incremental data lake that delivers near‑real‑time analytics by switching to merge‑on‑read, fixing log handling bugs, improving compaction planning, and refactoring table‑service scheduling. It also showcases use cases such as CDC ingestion, data quality control, and real‑time materialized views, and outlines future enhancements like optimistic concurrency and unified schema evolution.

Apache Hudi · CDC · Compaction Optimization

0 likes · 21 min read


Big Data Technology Architecture

Jun 7, 2022 · Big Data

Multi-Modal Index in Apache Hudi 0.11.0: Design, Implementation, and Performance Benefits

This article explains the motivation, design principles, implementation details, and performance improvements of the new multi‑modal indexing subsystem introduced in Apache Hudi 0.11.0 for Lakehouse architectures, covering scalable metadata, ACID updates, fast lookups, file listing, data skipping, upsert performance, and future work.

Apache Hudi · Indexing · Metadata

0 likes · 19 min read


Big Data Technology & Architecture

May 17, 2022 · Big Data

Apache Hudi: Core Concepts, Architecture, Storage Types, Write Operations, Querying, and Management

This article provides a comprehensive guide to Apache Hudi, covering its basic concepts, timeline architecture, storage types (Copy‑On‑Write and Merge‑On‑Read), write operations, DeltaStreamer usage, Hive/Spark/Presto query integration, data management, indexing, compaction, and best‑practice recommendations for big‑data lake workloads.

Apache Hudi · Big Data · Copy-on-Write

0 likes · 43 min read


Big Data Technology & Architecture

May 4, 2022 · Big Data

Apache Hudi 0.11.0 Release Highlights: Multi‑Mode Index, Data Skipping, Async Index, Spark & Flink Integration, and New Utilities

The Apache Hudi 0.11.0 release introduces multi‑mode metadata indexing, enhanced data‑skipping, asynchronous indexing, extensive Spark and Flink integration improvements, new bundle utilities, and expanded metadata synchronization with BigQuery, AWS Glue, and DataHub, while also adding bucket indexing and encryption support.

Apache Hudi · Async Index · Big Data

0 likes · 13 min read


Big Data Technology Architecture

Apr 29, 2022 · Big Data

Halodoc’s Data Platform Evolution: From Redshift to a LakeHouse Architecture with Apache Hudi

This article describes how Halodoc’s data engineering team identified limitations of their Redshift‑based platform, evaluated a LakeHouse design, selected Apache Hudi for mutable data handling, and outlined the challenges and benefits of building a scalable, decoupled storage‑compute architecture for their growing healthcare services.

Apache Hudi · Data Engineering · Data Platform

0 likes · 9 min read


Shopee Tech Team

Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Batch Processing

0 likes · 20 min read


Big Data Technology & Architecture

Mar 4, 2022 · Big Data

Managing Small Files in Apache Hudi and Spark Optimization Guide

The article explains how Apache Hudi automatically manages file sizes to avoid small‑file issues, details key configuration parameters, provides a step‑by‑step example of file merging, and offers practical Spark tuning recommendations for optimal performance in data‑lake workloads.

Apache Hudi · Big Data · Data Lake

0 likes · 11 min read


Big Data Technology & Architecture

Feb 28, 2022 · Big Data

Integrating Apache Hudi with Hive, Presto, and Spark SQL: Installation, Operations, and Query Examples

This article provides a step‑by‑step guide on integrating Apache Hudi with Hive and Presto, demonstrates core Hudi operations such as insert, upsert, delete, query, and Hive synchronization using Scala code, and shows how to manage Hudi tables through Spark SQL DDL/DML commands.

Apache Hudi · Big Data · Data Lake

0 likes · 16 min read


Big Data Technology & Architecture

Feb 8, 2022 · Big Data

Apache Hudi Overview: Design Principles, Table Architecture, and Read/Write Processes

This article provides a comprehensive overview of Apache Hudi, covering its reliance on HDFS for storage, core design principles, table architecture, timeline management, file and index structures, as well as detailed read and write workflows for both Copy‑On‑Write and Merge‑On‑Read table types.

Apache Hudi · Big Data · Copy-on-Write

0 likes · 16 min read


Alibaba Cloud Native

Jan 26, 2022 · Big Data

How to Build a Lakehouse with RocketMQ and Apache Hudi: A Step‑by‑Step Guide

This article explains the Lakehouse architecture, its required features, the evolution of big‑data stacks, and provides a detailed, hands‑on guide for constructing a Lakehouse using RocketMQ (Connector & Stream) and Apache Hudi, including configuration, deployment, and sample code.

Apache Hudi · Big Data · Cloud Native

0 likes · 18 min read


Big Data Technology & Architecture

Dec 10, 2021 · Big Data

Integrating Apache Hudi with Flink CDC for Real‑Time Data Lake Solutions

This article explains how to integrate Apache Hudi with Flink CDC to build a near‑real‑time data lake, covering Hudi’s storage model, streaming primitives, version compatibility, Maven setup, SQL table definitions, data flow from MySQL through Kafka, and practical troubleshooting tips.

Apache Hudi · Big Data · Data Integration

0 likes · 18 min read


Big Data Technology Architecture

Nov 2, 2021 · Big Data

ByteLake: ByteDance’s Real‑Time Data Lake Platform Built on Apache Hudi

This article presents ByteDance’s ByteLake, a real‑time data lake platform built on Apache Hudi, covering Hudi fundamentals, ByteLake’s use cases, the platform’s architectural optimizations, new features such as a commit‑based metastore and bucket indexing, and future roadmap plans.

Apache Hudi · Bucket Index · ByteLake

0 likes · 10 min read


Big Data Technology Architecture

Oct 26, 2021 · Big Data

Understanding Apache Hudi Table Types: Copy On Write (COW) vs Merge On Read (MOR)

This article explains Apache Hudi's two table formats—Copy On Write and Merge On Read—by introducing key terminology, describing their file structures and versioning, comparing write and read latency, I/O cost, and write amplification, and concluding with guidance on choosing the appropriate format.

Apache Hudi · COW · MOR

0 likes · 9 min read


Alibaba Cloud Developer

Sep 9, 2021 · Big Data

How Apache Hudi & Pulsar Enable Real‑Time CDC Data Lake Ingestion

This article explains CDC fundamentals, compares query‑based and log‑based capture, describes typical CDC‑to‑lake architectures using Pulsar and Hudi, dives into Hudi's core design, optimization techniques, and future roadmap, and provides practical insights for building scalable data lakes.

Apache Hudi · CDC · Pulsar

0 likes · 17 min read


DataFunTalk

Sep 3, 2021 · Big Data

Building an Exabyte‑Scale Data Lake with Apache Hudi at ByteDance: Architecture, Design Choices, and Performance Optimizations

This article details ByteDance's implementation of an exabyte‑scale data lake using Apache Hudi, covering scenario requirements, engine selection, functional support, schema management, extensive performance tuning, and future directions, while also noting recruitment opportunities within the team.

Apache Hudi · Big Data · ByteDance

0 likes · 9 min read


Big Data Technology Architecture

Jul 20, 2021 · Big Data

Apache Hudi Practice at Kuaishou: Solving Data Efficiency Challenges

This article details Kuaishou's adoption of Apache Hudi to address data scheduling, synchronization, and massive update inefficiencies, describing the pain points, evaluation of alternatives, architectural integration with Spark/Flink, implementation challenges, and the performance improvements achieved.

Apache Hudi · Kuaishou

0 likes · 4 min read


Big Data Technology Architecture

May 6, 2021 · Big Data

Using Spark SQL to Operate on Apache Hudi Tables – Step‑by‑Step Guide

This tutorial demonstrates how to use Spark SQL to create, insert, update, delete, merge, and drop Apache Hudi tables, covering environment setup, Spark‑SQL launch, configuration, and a series of SQL commands with example outputs.

Apache Hudi · SQL · Spark SQL

0 likes · 7 min read


DataFunTalk

Apr 27, 2021 · Big Data

Implementing CDC‑to‑Hudi for Real‑Time Mutable Data in a Big Data System

This article describes how Linkflow migrated mutable customer data from MySQL to an Apache Hudi data lake using Debezium‑in‑Flink CDC, addressing challenges such as snapshot resumability, partial updates, row‑key merging, schema evolution, indexing, and concurrent writes to achieve minute‑level data freshness and improved offline processing performance.

Apache Hudi · Big Data · CDC

0 likes · 21 min read


Big Data Technology Architecture

Apr 19, 2021 · Big Data

Reframing Apache Hudi as a Data Lake Platform: Vision, Capabilities, and Future Directions

Apache Hudi is being re‑positioned from a simple table format to a full‑featured data lake platform, offering transactional storage, MVCC concurrency, metadata services, Deltastreamer ingestion, and plans for cache and timeline metadata services, aligning its vision with modern lakehouse architectures.

Apache Hudi · Metadata · Transactional Storage

0 likes · 5 min read


DataFunTalk

Apr 18, 2021 · Big Data

Comparing Apache Hudi, Apache Iceberg, and Delta Lake for Data Lake Storage

This article compares Apache Hudi, Apache Iceberg, and Delta Lake, examining their storage formats, platform compatibility, update performance, concurrency guarantees, and integration with lakeFS to help readers choose the most suitable solution for their data lake use case.

Apache Hudi · Apache Iceberg · Delta Lake

0 likes · 16 min read


Big Data Technology & Architecture

Nov 14, 2020 · Big Data

Comparative Analysis of Apache Hudi, Apache CarbonData, and Delta Lake for Data Lake Solutions

This article examines the core requirements of data lakes and provides an in‑depth comparison of three major open‑source solutions—Apache Hudi, Apache CarbonData, and Delta Lake—highlighting their architectures, ACID support, query capabilities, and suitability for various real‑time and batch use cases.

ACID · Apache CarbonData · Apache Hudi

0 likes · 9 min read


Big Data Technology & Architecture

Oct 21, 2020 · Big Data

An Introduction to Apache Hudi: Concepts, Design Principles, and Architecture

This article introduces Apache Hudi, explaining its core concepts, design principles, table architecture, write and compaction mechanisms, and the three query modes that enable efficient batch and incremental processing on modern data lakes.

Apache Hudi · Big Data · Data Lake

0 likes · 21 min read


Big Data Technology Architecture

Sep 30, 2020 · Big Data

Querying Apache Hudi Tables on Amazon S3 Using Redshift Spectrum

This article explains how to use Amazon Redshift Spectrum to directly query Apache Hudi (and Delta Lake) tables stored in Amazon S3, covering supported formats, required DDL statements, partition handling, and common troubleshooting tips.

AWS · Apache Hudi · Redshift Spectrum

0 likes · 5 min read


Big Data Technology & Architecture

Sep 2, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Features, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark to ingest, manage, and incrementally query large analytical datasets on HDFS‑compatible storage, offering features such as timeline management, copy‑on‑write and merge‑on‑read tables, and support for snapshot, incremental, and read‑optimized queries across engines like Hive, Spark SQL and Presto.

Apache Hudi · Big Data · Data Lake

0 likes · 12 min read


Big Data Technology & Architecture

Aug 23, 2020 · Big Data

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

This article introduces Apache Hudi, explaining its storage types, query views, timeline feature, typical use cases such as near‑real‑time ingestion and incremental pipelines, and provides a step‑by‑step Scala/Spark quick‑start guide with code examples for compiling, inserting, updating, querying, and syncing data to Hive.

Apache Hudi · Big Data · Data Lake

0 likes · 18 min read


Big Data Technology Architecture

Jun 28, 2020 · Big Data

Key Requirements for Building PB‑Scale Data Lakes and How Apache Hudi Meets Them

The article outlines the essential requirements for constructing petabyte‑scale data lakes—such as incremental CDC ingestion, log deduplication, storage management, ACID transactions, fast ETL, and compliance—and explains how Apache Hudi’s COW and Merge‑on‑Read architectures, async compaction, and advanced features address each need.

ACID Transactions · Apache Hudi · Async Compaction

0 likes · 13 min read


Big Data Technology Architecture

Jun 15, 2020 · Big Data

Apache Hudi Copy‑On‑Write Tutorial: Core Concepts and Hands‑On Spark Implementation

This article introduces Apache Hudi’s core concepts and demonstrates how to operate in Copy‑On‑Write mode on a Spark‑based data lake, covering prerequisites, table types, configuration properties, upsert, incremental queries, and record deletion with Scala code examples.

Apache Hudi · Copy-On-Write · Scala

0 likes · 14 min read


Big Data Technology Architecture

Jun 10, 2020 · Big Data

Apache Hudi: Architecture, Uber’s Use Cases, Improvements, and Future Roadmap

This article explains the design of Apache Hudi, its core concepts such as upserts and incremental pulls, how Uber leverages it for large‑scale data‑lake operations, the enhancements made over time, and the project’s future plans within the Apache ecosystem.

Apache Hudi · Incremental Pull · Uber

0 likes · 17 min read


Big Data Technology Architecture

May 31, 2020 · Big Data

Applying Apache Hudi in Medical Big Data: Architecture, Synchronization, Storage Choices, and Future Directions

This article examines the use of Apache Hudi for building a hospital‑wide medical big‑data platform, covering construction background, reasons for selecting Hudi, data synchronization methods, storage mode choices, query optimizations, and future development considerations.

Apache Hudi · Copy-on-Write · Data synchronization

0 likes · 7 min read


Big Data Technology Architecture

May 21, 2020 · Big Data

Near Real-Time Ingestion, Analysis, Incremental Pipelines, and Data Distribution with Apache Hudi

The article explains how Apache Hudi enables near‑real‑time data ingestion from various sources, supports low‑latency analytics, provides incremental processing pipelines, and simplifies data distribution on Hadoop, improving efficiency and reducing operational complexity.

Apache Hudi · Big Data · Hadoop

0 likes · 6 min read


Architect

May 12, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Concepts, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark and Hadoop‑compatible storage to provide efficient ingestion, incremental processing, and multiple query modes such as snapshot, incremental, and read‑optimized for large analytical datasets.

Apache Hudi · Big Data · Data Lake

0 likes · 11 min read


Big Data Technology Architecture

Mar 24, 2020 · Big Data

Comparative Analysis of Delta Lake, Apache Iceberg, and Apache Hudi for Data Lake Solutions

This article examines the three leading open‑source data‑lake projects—Delta Lake, Apache Iceberg, and Apache Hudi—by outlining their origins, core problems they address, key features, and a detailed seven‑dimension comparison to help practitioners choose the most suitable solution for their scenarios.

Apache Hudi · Apache Iceberg · Comparison

0 likes · 17 min read


dbaplus Community

Mar 17, 2020 · Big Data

Choosing the Right Open‑Source Data Lake: Delta vs Iceberg vs Hudi

An in‑depth comparison of the three leading open‑source data lake platforms—Delta Lake, Apache Iceberg, and Apache Hudi—examines their origins, core challenges they address, key features, and performance across seven evaluation dimensions to guide practitioners in selecting the optimal solution for their workloads.

Apache Hudi · Apache Iceberg · Data Lake

0 likes · 15 min read


Big Data Technology Architecture

Mar 16, 2020 · Big Data

Understanding Apache Hudi: Concepts, Architecture, Usage, and Best Practices

This article introduces Apache Hudi, explains its architecture and storage models, describes how it enables upserts and incremental queries on Hadoop, provides step‑by‑step guidance for integrating Hudi with Apache Spark, and outlines best practices and comparisons with Apache Kudu.

Apache Hudi · Hadoop · Spark

0 likes · 10 min read


Big Data Technology Architecture

Feb 1, 2020 · Big Data

Apache Hudi 0.5.1 Release Highlights and Upgrade Guide

The Apache Hudi 0.5.1 release introduces upgraded Spark, Avro, Parquet and Kafka dependencies, new Scala support, timeline layout changes, CLI enhancements, DeltaStreamer parameter updates, Kafka offset enum revisions, key‑generator package relocation, Hive sync options, dynamic Bloom filter, bulk‑insert support, and AWS cloud storage compatibility.

Apache Hudi · DeltaStreamer · Release

0 likes · 6 min read


Big Data Technology & Architecture

Jan 23, 2020 · Big Data

Understanding Apache Hudi: Incremental Processing and Low‑Latency Data Management on Hadoop

This article explains how Apache Hudi enables efficient, low‑latency incremental data ingestion and processing on Hadoop by providing a unified service layer, describing its motivation, architecture, storage components, write and read paths, compaction, fault recovery, and incremental query capabilities.

Apache Hudi · Hadoop · Incremental Processing

0 likes · 17 min read
