Tagged articles

Data Lakehouse

18 articles · Page 1 of 1

Apr 13, 2026 · Big Data

Why Iceberg v3 Marks the “iPhone Moment” for Data Lakehouses

Apache Iceberg v3 introduces deletion vectors, row‑level lineage, a native VARIANT type, default column values, and nanosecond timestamps. The article argues that together these deliver up to ten‑fold faster updates, native CDC, seamless handling of semi‑structured data, and industry‑wide adoption that effectively ends the format war between lake and warehouse solutions.

Apache Iceberg · Data Lakehouse · Default Column Values

0 likes · 14 min read

StarRocks

Jul 16, 2025 · Cloud Native

Build a Decoupled Storage‑Compute Data Platform with StarRocks and MinIO

This step‑by‑step tutorial shows how to deploy StarRocks and MinIO in a decoupled storage‑compute architecture using Docker Compose and Kubernetes, configure local caching, create storage volumes, load public datasets, and run SQL queries to explore the combined data.

Data Lakehouse · Decoupled Storage · Docker Compose

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Jul 4, 2025 · Big Data

From Real-Time Data Analytics to Real-Time AI: Flink Forward Asia 2025 Highlights

The Flink Forward Asia 2025 conference in Singapore showcased Apache Flink's evolution: new AI‑driven projects such as Flink Agents, AI Functions integrated into Flink 2.1, the disaggregated state management architecture of Flink 2.0, and complementary lakehouse technologies such as Paimon and Fluss, all underscoring Flink's role as the real‑time backbone for modern AI applications.

Apache Flink · Data Lakehouse · Disaggregated State Management

0 likes · 9 min read

Baidu Tech Salon

Nov 11, 2024 · Cloud Native

Baidu Cloud Native Data Platform: Empowering Enterprise AI in the LLM Era

To empower enterprise AI in the LLM era, Baidu Cloud unveils a cloud‑native data platform featuring upgraded databases—PegaDB, GaiaDB 5.0, Vector DB 2.0, Palo 2.0—and integrated services like DBSC 2.0, EDAP 2.0, and DBStack, delivering high‑performance, cost‑effective handling of structured, unstructured, and vector data for fine‑tuning and Enterprise RAG.

DBStack · Data Lakehouse · EDAP

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

Aug 13, 2024 · Big Data

How Alibaba Cloud Is Shaping the Future of Big Data and AI Integration

This article summarizes Alibaba Cloud researcher Xu Sheng's presentation on the company's big data and AI product portfolio, covering current offerings, market trends, lakehouse evolution, open‑source contributions, serverless solutions, search capabilities, and the future roadmap for integrated big data‑AI services.

AI · Alibaba Cloud · Big Data

0 likes · 22 min read

StarRocks

Jul 24, 2024 · Big Data

Why Lakehouse Architecture Is Redefining Big Data Infrastructure in the AI Era

The article examines the rapid rise and market momentum of lakehouse architecture and its core components (storage, metadata, table formats, and compute layers). It compares Iceberg, Hudi, and Delta Lake, discusses the shift from HDFS to object storage, and outlines why lakehouses matter strategically for AI‑driven data management and future data infrastructure.

AI · Apache Iceberg · Big Data

0 likes · 28 min read

Wukong Talks Architecture

Jul 23, 2024 · Databases

An Overview of StarRocks: Architecture, Features, and Performance Benchmarks

StarRocks, an open‑source, high‑performance MPP analytical database under the Linux Foundation, offers a vectorized execution engine, a cost‑based optimizer (CBO), materialized views, and storage‑compute separation. It integrates with BI tools and data lakes, and in the article's benchmark tests it delivers faster queries than ClickHouse, Druid, and Trino.

Data Lakehouse · MPP · Performance Benchmark

0 likes · 10 min read

DataFunSummit

Mar 17, 2024 · Big Data

OPPO Smart Data Lakehouse: Architecture, Real‑time Lakehouse, and Technical Practices

This article presents OPPO's smart data lakehouse solution, describing its EB‑scale architecture, the integration of batch and streaming engines, the Glacier service for table management, schema‑adaptive ingestion, performance optimizations, and the technical roadmap for unified data processing.

Big Data · Data Lakehouse · Flink

0 likes · 15 min read

DataFunTalk

Feb 3, 2024 · Big Data

Alluxio: Introduction, Architecture, and Practical Experience for Big Data Construction

This article introduces Alluxio as an open‑source data orchestration layer, explains its architecture and core features such as unified namespace, caching strategies, and cloud‑native deployment, and shares practical experiences on using Alluxio to simplify data lakehouse construction, migration, and hot‑cold data separation in complex big‑data environments.

Alluxio · Big Data · Caching

0 likes · 13 min read

vivo Internet Technology

Dec 13, 2023 · Big Data

Hudi Data Lake Implementation and Optimization Practice at vivo

vivo's big‑data team deployed Apache Hudi to build a lakehouse that unifies streaming and batch workloads. It uses both COW and MOR storage modes, automates small‑file clustering and compaction, and applies extensive version, streaming, batch, and lifecycle optimizations, delivering minute‑level latency, ingestion on the order of a hundred million records per minute, and queries up to 20% faster than Hive.

Apache Hudi · Batch Processing · Big Data

0 likes · 11 min read

DataFunTalk

Oct 13, 2023 · Big Data

Design Principles, Architecture, and Applications of the Open‑Source LakeSoul Lakehouse Framework

This article provides a comprehensive technical overview of LakeSoul, an open‑source, cloud‑native lakehouse framework, covering its design philosophy, core features, architecture, performance benchmarks, real‑time ingestion, incremental computation, multi‑stream joining, security, community progress, and future roadmap.

Big Data · Data Lakehouse · Flink

0 likes · 16 min read

DataFunTalk

Sep 13, 2023 · Big Data

Design and Implementation of a Lakehouse Data Platform Based on Apache Hudi at Taikang Life Insurance

This article details Taikang Life Insurance's end‑to‑end technical selection, architecture design, implementation, and custom enhancements of an Apache Hudi‑driven lakehouse platform for large‑scale health‑insurance data, covering background, component evaluation, performance benchmarking, multi‑layer architecture, and real‑world results.

Apache Hudi · Big Data · Data Governance

0 likes · 44 min read

DataFunSummit

Aug 4, 2023 · Big Data

LakeSoul: An Open‑Source Real‑Time Data Lakehouse Framework – Design, Architecture, Benchmarks and Future Roadmap

This article introduces LakeSoul, an open‑source end‑to‑end real‑time lakehouse framework, detailing its design philosophy, key technologies such as ELT, metadata management, upsert and merge‑on‑read capabilities, performance benchmarks, real‑world use cases, and the roadmap for future enhancements.

Big Data · Data Lakehouse · ELT

0 likes · 18 min read

DataFunTalk

Apr 10, 2023 · Big Data

Interview on Data Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase data‑lake technology manager Ma Jin explains the distinction between data lakes and lakehouses, reviews the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, evaluates feature maturity and performance trade‑offs, and discusses systematic versus non‑systematic adoption in enterprises.

Big Data · Data Lakehouse · Delta Lake

0 likes · 13 min read

Volcano Engine Developer Services

Sep 21, 2022 · Big Data

Unlocking Enterprise Data Lakehouse: Trends, Challenges, and Volcano Engine EMR Solutions

This article explores the open‑source lakehouse trend, outlines the architectural features of Volcano Engine EMR, examines key challenges of building enterprise‑grade data lakehouses, and presents best‑practice case studies demonstrating how EMR enables scalable, real‑time analytics, storage‑compute separation, and seamless integration with modern big‑data engines.

Data Lakehouse · EMR · Storage Compute Separation

0 likes · 22 min read

Alibaba Cloud Developer

May 13, 2022 · Big Data

Unlocking Delta Lake: Key Features, Architecture, and EMR Integration

Delta Lake, an open‑source storage layer from Databricks, provides ACID transactions, data versioning, schema evolution, and unified batch‑stream processing. The article details its file structure and metadata mechanism, then describes how Alibaba Cloud EMR extends it with advanced DML, performance optimizations, deep DLF integration, and solutions for G‑SCD and CDC.

CDC · DLF · Data Lakehouse

0 likes · 11 min read

Shopee Tech Team

Apr 28, 2022 · Big Data

Building a Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Batch Processing

0 likes · 20 min read

Zhengcaiyun Technology (政采云技术)

Feb 8, 2022 · Industry Insights

Unlocking Enterprise Value with a Data Middle Platform: Architecture & Indicators

This article traces the evolution from traditional data warehouses to modern data lakes and data middle platforms, explains why siloed data development hampers efficiency, and details the architecture and indicator‑library design used by Zhengcaiyun to achieve unified, reusable data services.

Big Data · Data Governance · Data Lakehouse

0 likes · 14 min read

Unlocking Enterprise Value with a Data Middle Platform: Architecture & Indicators