Tag

Merge Into


Tencent Cloud Developer
Aug 23, 2023 · Big Data

WeChat Experiment Platform: Architecture Design and Iceberg Lakehouse Optimization

The WeChat Experiment Platform migrated its data pipeline (60,000 metrics, 200,000 cores, 30+ PB) to an Iceberg-based lakehouse, leveraging three-layer metadata, fine-grained partitioning, MERGE INTO writes, time-travel snapshots, and skew-handling UDFs, which cut core time by 69%, saved ~100 PB of storage, and reduced latency by up to 70%.

Data Warehouse · Merge Into · Metric Computation
18 min read
DataFunTalk
Nov 13, 2022 · Big Data

Iceberg Data Lake: Technology Overview, Xiaomi Practices, and Stream‑Batch Integration

This article presents an overview of the Iceberg table format, its core architecture and advantages, details Xiaomi’s large‑scale deployment and use cases, explores stream‑batch integration with Spark and Flink, outlines data correction methods, future plans, and answers common technical questions.

Data Lake · Flink · Merge Into
20 min read
Shopee Tech Team
Sep 2, 2022 · Big Data

Shopee Data System Challenges and Apache Hudi Practices

Shopee tackled its data‑system bottlenecks by customizing Apache Hudi to provide unified stream‑batch integration, efficient state‑detail snapshots, and low‑latency wide‑table generation, using CDC‑based bootstrapping, COW/MOR tables, savepoints and partial updates, which cut latency to ten minutes, lowered resource use, and yielded several community‑backed enhancements.

Apache Hudi · Big Data · Incremental Processing
18 min read