Tag

real-time data lake

0 views collected around this technical thread.

DataFunTalk
DataFunTalk
Apr 23, 2024 · Big Data

Apache Paimon Graduates to Top‑Level Project – Milestones, Core Capabilities, and Community Highlights

Apache Paimon, originally launched as Flink Table Store, has graduated to an Apache Top‑Level Project after a year of incubation, showcasing real‑time lakehouse capabilities, extensive ecosystem integration, and strong adoption by major enterprises, marking a significant milestone for streaming and batch data processing.

Apache PaimonBig DataLakehouse
0 likes · 9 min read
Apache Paimon Graduates to Top‑Level Project – Milestones, Core Capabilities, and Community Highlights
DataFunSummit
DataFunSummit
Mar 25, 2024 · Big Data

Exploring Real-Time Data Lake Practices at Kangaroo Cloud

This article shares Kangaroo Cloud's exploration and practice of a real-time data lake, covering background, data lake concepts, challenges, solution architecture using the Shuzhan platform with Iceberg/Hudi, CDC ingestion, small file handling, cross-cluster ingestion, materialized view acceleration, and future development plans.

CDCCross-Cluster IngestionHudi
0 likes · 12 min read
Exploring Real-Time Data Lake Practices at Kangaroo Cloud
DataFunSummit
DataFunSummit
Nov 4, 2022 · Big Data

Real-Time Data Lake Practice at ByteDance: Architecture, Challenges, and Solutions

ByteDance’s data platform team explains their real‑time data lake implementation, covering its evolving definition, six core capabilities, challenges such as data management, concurrent updates, performance and log ingestion, and detailed case studies of multi‑stage deployment, indexing, metadata services, and future roadmap.

Big DataHudiIndexing
0 likes · 32 min read
Real-Time Data Lake Practice at ByteDance: Architecture, Challenges, and Solutions
DataFunSummit
DataFunSummit
Oct 21, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform architecture and three real‑time lake initiatives—log ingestion, CDC ingestion, and lake analysis—showcasing how Apache Iceberg, Flink, and custom shuffling algorithms solve small‑file and cross‑cloud challenges while enabling schema evolution and future multi‑cloud optimizations.

Apache IcebergBig DataCDC
0 likes · 16 min read
Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg
DataFunTalk
DataFunTalk
Aug 6, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform engineering, describing how Apache Iceberg is leveraged for real‑time data lake ingestion, CDC pipelines, multi‑cloud storage, small‑file mitigation, schema evolution, and future plans across storage, compute, and management within a big‑data ecosystem.

Apache IcebergBig DataCDC
0 likes · 16 min read
Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg
DataFunTalk
DataFunTalk
Jul 14, 2022 · Big Data

Real‑Time Data Lake Practices at ByteDance and Alibaba: Architecture, Challenges, and Solutions

This article presents detailed case studies of ByteDance and Alibaba implementing real‑time data lake solutions with Hudi and Flink, describing the business drivers, architectural challenges, and the specific technical strategies such as unified metadata layers, optimistic locking, scalable hash indexing, and CDC‑based incremental ETL to achieve low‑latency, high‑throughput data processing.

Big DataFlinkHudi
0 likes · 9 min read
Real‑Time Data Lake Practices at ByteDance and Alibaba: Architecture, Challenges, and Solutions
DataFunTalk
DataFunTalk
May 23, 2022 · Big Data

Real-Time Data Lake Practices at ByteDance: Architecture, Challenges, and Solutions

ByteDance shares its real‑time data lake implementation, covering the evolving definition of data lakes, six core capabilities, challenges such as data management, weak concurrent updates, performance, and log ingestion, and detailed solutions including Hudi Metastore Server, bucket indexing, multi‑stage use cases, and future roadmap.

Big DataHudiStreaming
0 likes · 32 min read
Real-Time Data Lake Practices at ByteDance: Architecture, Challenges, and Solutions
DataFunTalk
DataFunTalk
May 11, 2021 · Big Data

Design and Practice of Baixin Bank's Flink‑Based Real‑Time Computing Platform and Hudi‑Powered Real‑Time Data Lake

This article details Baixin Bank's construction of a Flink‑driven real‑time computing platform integrated with Hudi as a real‑time data lake, covering background, architecture, data collection, transformation, storage layers, technical challenges, future roadmap, and practical lessons for similar big‑data initiatives.

Big DataData EngineeringFlink
0 likes · 12 min read
Design and Practice of Baixin Bank's Flink‑Based Real‑Time Computing Platform and Hudi‑Powered Real‑Time Data Lake
DataFunTalk
DataFunTalk
Jan 28, 2021 · Big Data

Real-Time Financial Data Lake: Architecture, Practices, and Applications at Zhongyuan Bank

This talk by Ba Xueyu, a senior big data platform engineer at Zhongyuan Bank, outlines the background, architecture, and engineering practices of a real‑time financial data lake, highlighting its open, timely, and integrated design, streaming platform implementation, and use cases such as anti‑fraud and real‑time BI.

Big DataFlinkanti-fraud
0 likes · 15 min read
Real-Time Financial Data Lake: Architecture, Practices, and Applications at Zhongyuan Bank