Tag: Hudi

DataFunSummit
Nov 12, 2024 · Big Data

Data Lake and Data Warehouse Architectures: Expert Insights from Industry Leaders

The article summarizes a roundtable discussion where experts compare four lake‑warehouse architectural patterns, explain their suitability for different business scenarios, contrast them with traditional data warehouses, and highlight practical considerations for choosing and evolving data platforms.

Big Data · Data Lake · Data Warehouse
6 min read

DataFunSummit
Nov 8, 2024 · Big Data

Roundtable Discussion on Data Lake Technology Maturity and Governance Practices

Experts from Kuaishou and Ping An Insurance, a former Tencent expert, and others discuss data lake maturity, column‑level governance, resource management for unstructured data, and automated optimization techniques such as small‑file merging in Iceberg, highlighting how these advances improve data quality and business decision‑making.

Big Data · Column-level Governance · Data Governance
6 min read

JD Retail Technology
Oct 11, 2024 · Big Data

JD Retail Data Lake Architecture: Challenges, Optimizations, and Future Plans

This article presents JD Retail's data lake architecture overhaul, detailing the shortcomings of the Lambda model, the migration to Flink‑Hudi‑Spark pipelines, performance gains, storage savings, unified APIs, and upcoming improvements for resilience and automation.

Big Data · Data Lake · Hudi
11 min read

DataFunTalk
Oct 3, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture, Design Principles, Core Functions, and Open‑Source Solutions

Amid growing data demands, this article explains the data lake technology maturity curve, detailing lake‑warehouse architectural patterns, design principles, core functionalities, and the four leading open‑source solutions (Hudi, Iceberg, Delta Lake, Paimon) to guide enterprises in building flexible, scalable, and governed data platforms.

Big Data · Data Architecture · Data Lake
10 min read

DataFunTalk
Sep 24, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications

This article discusses the rapid growth of data-driven businesses and the challenges facing traditional data warehouses, then explains how the maturity curve of modern data lake technologies such as Delta Lake, Hudi, Iceberg, and Paimon can guide enterprises in architecture choices, design principles, core capabilities, and practical applications.

Big Data · Data Lake · Delta Lake
12 min read

DataFunTalk
Jul 25, 2024 · Big Data

Real‑time Data Warehouse Evolution with Data Lake: Challenges, Solutions, and Future Outlook

This article presents a comprehensive overview of JD Tech's real‑time data warehouse evolution, detailing the legacy Lambda architecture, its shortcomings, the integration of a data‑lake‑based solution, iterative redesigns, technical trade‑offs, and future directions for real‑time analytics.

Big Data · ClickHouse · Data Lake
25 min read

DataFunSummit
Jul 1, 2024 · Big Data

Optimizing JD Retail Data Architecture: From Lambda to Real‑time Unified Processing with Flink, Hudi, and StarRocks

This article details JD Retail's transition from a complex Lambda architecture to a unified real‑time data pipeline built on Flink, Hudi, and StarRocks, addressing the trade-off between data completeness and latency, reducing maintenance costs, improving storage efficiency, and delivering faster, more consistent analytics for business users.

Data Warehouse · Hudi · JD Retail
13 min read

DataFunTalk
Jun 2, 2024 · Big Data

Applying Data Lake (Hudi) at Kuaishou: Architecture Evolution, Use Cases, and Lessons Learned

This article shares Kuaishou's practical experience with data lake technology (Hudi), detailing the challenges of growing data warehouses, the migration from Hive to Hudi, the promotion strategy, real-world use cases such as CDC sync and batch‑stream integration, and key takeaways for future deployments.

Big Data · Data Lake · Data Warehouse
12 min read

DataFunSummit
May 17, 2024 · Big Data

Comprehensive Hudi Real-Time Data Lake Ingestion Solutions

This article presents a complete guide to Hudi-based real-time data lake ingestion, covering the overall data integration architecture, batch and streaming ingestion strategies, advanced table design, and practical recommendations for challenges such as deduplication, latency, partitioning, and performance optimization.

Big Data · Data Lake · Hudi
12 min read

DataFunSummit
Apr 18, 2024 · Big Data

Real‑time Data Warehouse Evolution with Data Lake: Architecture, Challenges, and Solutions

This article presents a comprehensive overview of JD Tech's real‑time data warehouse evolution, detailing the legacy Lambda‑based design and its shortcomings, the transition to a data‑lake‑integrated architecture, iterative improvements, the technical and non‑technical issues encountered, and the future outlook.

ClickHouse · Data Lake · Hudi
24 min read

DataFunSummit
Mar 25, 2024 · Big Data

Exploring Real-Time Data Lake Practices at Kangaroo Cloud

This article shares Kangaroo Cloud's exploration and practice in building a real-time data lake, covering the background, data lake concepts, challenges, a solution architecture built on the Shuzhan platform with Iceberg/Hudi, CDC ingestion, small-file handling, cross-cluster ingestion, materialized view acceleration, and future development plans.

CDC · Cross-Cluster Ingestion · Hudi
12 min read

DataFunTalk
Oct 28, 2023 · Big Data

Data Lake Architecture, Ingestion Options, Real-time Optimization, and Query Practices

This article presents a comprehensive overview of a unified data lake architecture, evaluates three ingestion solutions, details real‑time ingestion optimizations for Flink‑Hudi pipelines, and describes how Kyuubi enables unified query access across multiple engines, offering practical guidance for large‑scale data processing.

Big Data · Data Lake · Hudi
14 min read

DataFunTalk
Sep 4, 2023 · Big Data

Unified Batch‑Stream Storage with Hudi and LAS: Architecture, Design, and Deployment

This article presents a comprehensive overview of a batch‑stream unified storage solution built on Hudi and the Lakehouse Analysis Service (LAS), covering background challenges, architectural design, data organization, read/write mechanisms, BTS architecture, real‑world deployment scenarios, and future development plans.

Batch-Stream · Data Warehouse · Hudi
22 min read

DataFunTalk
Aug 28, 2023 · Big Data

Practical Experience of an E‑commerce Platform’s Offline and Real‑time Data Warehouse

This article shares the practical architecture, technology selection, implementation details, and evolution of an e‑commerce platform’s offline and real‑time data warehouses, covering data modeling, processing pipelines, system components such as Hive, Spark, Flink, ClickHouse, Doris, and Hudi, and the lessons learned from multiple production deployments.

Big Data · ClickHouse · Data Warehouse
18 min read

DataFunSummit
Aug 26, 2023 · Big Data

Bilibili's Practice of Building a Streaming Data Lake with Hudi and Flink

This article details Bilibili's implementation of a streaming data lake using Hudi and Flink, covering background challenges, four case studies, batch‑stream integration optimizations, infrastructure and kernel enhancements, and future work directions.

Batch-Stream Integration · Big Data · Data Lake
14 min read

ByteDance Data Platform
Aug 9, 2023 · Big Data

Why Traditional Data Warehouses Fail and How a Real‑Time Lakehouse Solves the Pain

This article analyzes the shortcomings of mainstream data‑warehouse and data‑lake architectures, explains the design of ByteDance's unified real‑time/offline lakehouse solution, and describes its practical applications in streaming, multi‑dimensional analysis, and batch‑stream reuse scenarios, along with its future roadmap.

Big Data · Data Warehouse · Hudi
14 min read

DataFunTalk
Jul 10, 2023 · Big Data

Practical Experience of In‑Lake Warehouse Implementation Based on Lakehouse Architecture

This article presents a comprehensive overview of Lakehouse‑based in‑lake warehousing, covering common data‑lake misconceptions, the evolution from databases to data warehouses and data lakes, the advantages of the Lakehouse over traditional architectures, a reference multi‑layer architecture, typical use cases, challenges, future plans, and a brief Q&A.

Data Lake · Data Warehouse · Hudi
20 min read

DataFunSummit
Apr 25, 2023 · Big Data

Building a Real-Time Data Lake with Hudi: Architecture, Challenges, and Practices

This article presents Huawei's end‑to‑end solution for constructing a real‑time data lake on Hudi, covering requirement analysis, technology selection, architectural design, ingestion and processing challenges, practical optimizations, and future improvement directions.

Big Data · Data Lake · ETL/ELT
14 min read

DataFunTalk
Apr 10, 2023 · Big Data

Interview on Data Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase data‑lake technology manager Ma Jin explains the distinction between data lakes and lakehouses, reviews the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, evaluates feature maturity and performance trade‑offs, and discusses systematic versus non‑systematic adoption in enterprises.

Big Data · Data Lakehouse · Delta Lake
13 min read