Tagged articles
201 articles
DataFunTalk
Mar 16, 2024 · Big Data

Performance Optimization Practices for KwaiBI Big Data Analysis Platform

This article introduces KwaiBI, the internal data analysis product of Kuaishou, outlines its five major functional areas, details the performance challenges of large‑scale analytics, and presents a comprehensive set of optimization techniques—including cache warming, query rewriting, materialized acceleration, and the Bleem lake‑house engine—along with future directions and a brief Q&A.

Big Data · Data Analytics · KwaiBI
0 likes · 15 min read
DataFunSummit
Mar 14, 2024 · Big Data

Tencent Game Data Analysis: Lakehouse Integration Practice

This article presents Tencent Game's comprehensive lakehouse integration practice, detailing the project background, storage‑compute separation, data layering, unified DDL/DML operations, performance optimizations, and future plans, illustrating how StarRocks, Iceberg, and Spark are combined to achieve scalable, cost‑effective analytics for massive game data.

Compute-Storage Separation · Data Warehouse · Iceberg
0 likes · 16 min read
DataFunTalk
Mar 4, 2024 · Big Data

Design and Implementation of a Lakehouse‑Integrated Data Platform for Financial Innovation by Shuxin Network

This article presents Shuxin Network's practical experience in building a cloud‑native, lakehouse‑integrated data platform for the financial sector, covering architecture evolution, challenges of domestic‑innovation (信创), the DataCyber solution, core components, deployment roadmap, and real‑world case studies.

Big Data · Cloud Native · Data Platform
0 likes · 21 min read
DataFunSummit
Feb 26, 2024 · Big Data

Building a New Lakehouse Analytics Paradigm with StarRocks and Paimon

This article introduces a new lakehouse analytics paradigm that combines StarRocks and Paimon, covering the evolution of data lake technologies, key integration scenarios, core technical mechanisms such as JNI connectors and materialized views, and the future roadmap for enhanced lakehouse capabilities.

Analytics · Big Data · Data Lake
0 likes · 16 min read
DataFunTalk
Feb 9, 2024 · Big Data

Alluxio’s Role in Lakehouse Architecture: Benefits, Challenges, and Real‑World Use Cases

This article explains how Alluxio enables lake‑warehouse integration by providing a data orchestration layer that caches data near compute, reduces storage‑compute separation costs, improves performance, and addresses challenges such as security, scalability, and multi‑cloud deployment, illustrated with several industry case studies.

AI · Alluxio · Big Data
0 likes · 16 min read
DataFunTalk
Jan 27, 2024 · Big Data

JuiceFS: A Cloud‑Native Distributed File System for Data Lake and Lakehouse

This article presents JuiceFS, a cloud‑native distributed file system that bridges the gaps between HDFS and object storage, explaining Data Lake and Lakehouse concepts, comparing storage options, detailing JuiceFS's architecture and performance benefits, and showcasing real‑world user case studies.

Big Data · Distributed File System · JuiceFS
0 likes · 23 min read
DataFunSummit
Jan 9, 2024 · Big Data

Introducing Yunqi Lakehouse: An Integrated Cloud‑Native Data Platform with Incremental Computing and Auto Materialized Views

This article introduces Yunqi's self‑developed Lakehouse product, explaining its cloud‑native, one‑stop data platform architecture, incremental computing that balances freshness, performance and cost, and the autoMV feature that automatically creates materialized views to boost query speed up to nine times.

Auto Materialized View · Big Data · Data Platform
0 likes · 14 min read
Tongcheng Travel Technology Center
Dec 27, 2023 · Big Data

Recap of Tongcheng Travel’s 7th Big Data Technology Salon – Talks on StarRocks, Paimon, Iceberg, Data+AI, Vector Retrieval, Real‑Time Computing, and Hotel Ranking

The 7th Tongcheng Travel Big Data Technology Salon in Beijing featured a series of expert talks covering StarRocks architecture evolution, lake‑house solutions with Paimon, Iceberg real‑time upsert, Data+AI for travel recommendation, vector retrieval in AI, JD Logistics real‑time computing governance, and multi‑task hotel ranking modeling, providing deep technical insights and future roadmaps.

AI · Big Data · Lakehouse
0 likes · 10 min read
DataFunTalk
Dec 27, 2023 · Big Data

Amoro Mixed Hive: A Unified Lakehouse Solution for Real‑Time and Batch Data Processing

This article describes how NetEase Youdao replaced its Doris‑based real‑time data warehouse with Amoro Mixed Hive, detailing the architectural challenges, the Mixed Hive design, implementation steps, performance optimizations, community contributions, and future roadmap to achieve a unified lakehouse with minute‑level freshness and reduced development and operational costs.

Amoro · Big Data · Flink
0 likes · 12 min read
DataFunTalk
Dec 27, 2023 · Big Data

Apache Flink 2023: Core Technical Achievements and Future Directions

The article reviews Apache Flink's rapid development over the past decade, highlighting its 2023 community growth, SIGMOD award, major releases, streaming SQL enhancements, incremental checkpointing, batch maturity, cloud‑native scaling, and integration with the emerging Lakehouse architecture.

Apache Flink · Big Data · Checkpoint
0 likes · 11 min read
DataFunSummit
Dec 20, 2023 · Cloud Native

Building a Cloud‑Native Lakehouse with Apache Iceberg and Amoro

This article introduces the background, challenges, and cloud‑native solutions of lakehouse architecture, explains Apache Iceberg’s open table format and its cloud‑native features, details Amoro’s management and self‑optimizing capabilities, showcases three real‑world cloud migration cases, and outlines future development plans.

Amoro · Apache Iceberg · Data Management
0 likes · 12 min read
StarRocks
Dec 19, 2023 · Big Data

How WeChat Achieved Sub‑Second Real‑Time Analytics with StarRocks Lakehouse

WeChat transformed its data platform from Hadoop and ClickHouse to a StarRocks‑based lakehouse, tackling massive data volume, ultra‑low latency, and storage fragmentation by deploying lake‑on‑warehouse and warehouse‑lake fusion architectures, real‑time incremental materialized views, and unified SQL access, resulting in dramatic cost cuts and performance gains.

Big Data · Lakehouse · StarRocks
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Dec 8, 2023 · Cloud Computing

How Alibaba Cloud EMR Powers Serverless StarRocks for Seamless Lakehouse Analytics

This article summarizes Li Yu's presentation on Alibaba Cloud EMR's deep collaboration with the StarRocks community, detailing major contributions across versions, the serverless StarRocks product’s core capabilities, and future plans to enhance OLAP‑lakehouse integration, performance, and cloud‑native elasticity.

Alibaba Cloud · EMR · Lakehouse
0 likes · 7 min read
StarRocks
Nov 23, 2023 · Databases

How StarRocks Redefines Lakehouse Architecture with Compute‑Storage Separation

StarRocks, an open‑source MPP analytical database, consolidates BI, interactive, and real‑time analytics into a single engine by evolving from version 1.0 to 3.x, introducing compute‑storage separation, unified catalog, generated columns, operator spill, and advanced materialized views, while outlining its cloud‑native lakehouse roadmap.

Compute-Storage Separation · Lakehouse · MPP database
0 likes · 22 min read
Alibaba Cloud Big Data AI Platform
Nov 23, 2023 · Big Data

Why Apache Paimon Is Revolutionizing Streaming Lakehouse Architecture with Flink

The article traces the shift from traditional Hive‑based warehouses to modern lakehouse architectures, explains the advantages of lake formats, introduces Apache Paimon as a streaming‑first data lake integrated with Flink, presents performance benchmarks showing its superiority over Hudi, and demonstrates a real‑time streaming lakehouse workflow.

Apache Paimon · Big Data · Flink
0 likes · 15 min read
Big Data Technology Architecture
Nov 14, 2023 · Big Data

Open Source Big Data Platform 3.0: Streaming Lakehouse, Serverless Architecture, and AI Integration

The talk outlines the evolution of Alibaba Cloud's open‑source big data platform from Hadoop‑based EMR to a 3.0 architecture featuring a streaming lakehouse, full serverless compute and storage, AI‑driven operations, and upcoming vector search services, highlighting technical motivations, challenges, and product releases.

Big Data · Lakehouse · Serverless
0 likes · 14 min read
dbaplus Community
Nov 8, 2023 · Big Data

Choosing Between Data Warehouse, Data Lake, and Lakehouse: When to Use Each

This article compares traditional data warehouses, modern data lakes, and emerging lakehouse architectures, explaining their design patterns, advantages, disadvantages, and suitable use cases, while detailing implementation considerations such as schema design, ETL/ELT processes, file formats like Delta, Iceberg, and Hudi, and factors influencing platform selection.

Apache Spark · Data Lake · Data Warehouse
0 likes · 20 min read
StarRocks
Oct 31, 2023 · Databases

How Ctrip Accelerated Report Queries 10× with StarRocks: A Real‑World Lakehouse Migration

Ctrip migrated its Artnova reporting platform from Hive‑based queries to StarRocks, first loading data into OLAP tables and then using StarRocks as a lakehouse with Hive catalog, Data Cache, and materialized views. Average query latency fell from 20 seconds to 1.5 seconds, with over 7× speed‑up versus Trino and up to 40× acceleration for complex workloads.

Big Data · Data Cache · Lakehouse
0 likes · 15 min read
DataFunSummit
Oct 16, 2023 · Big Data

Bilibili's Iceberg‑Based Lakehouse Platform: Technical Practices for Sub‑Second Query Response

This article details Bilibili's implementation of an Iceberg‑based lakehouse platform that unifies storage and analytics, addressing Hive’s performance and latency issues through multidimensional sorting, various file‑level indexes, cube pre‑aggregation, star‑tree structures, and an automated Magnus service for intelligent optimization, achieving near‑second query responses.

Big Data · Iceberg · Lakehouse
0 likes · 14 min read
Sohu Tech Products
Oct 11, 2023 · Industry Insights

How StarRocks Materialized Views Power Real‑Time Lakehouse Analytics

The article provides a deep technical overview of StarRocks 3.0’s data‑lake analysis capabilities, its unified Lakehouse architecture, Catalog integration, Trino compatibility, extensive I/O optimizations, materialized view features, resource isolation techniques, real‑world use cases, and future development directions.

Analytics · Data Lake · Lakehouse
0 likes · 22 min read
DataFunTalk
Oct 5, 2023 · Big Data

Building a Unified Streaming‑Batch Lakehouse with Amoro Mixed Iceberg

This article describes how Shanghai Steel Union leveraged Amoro Mixed Iceberg on top of Apache Iceberg to create a unified streaming‑batch lakehouse, addressing small‑file and upsert challenges, simplifying architecture, improving data freshness, and providing a scalable solution for real‑time and batch analytics.

Amoro · Apache Iceberg · Big Data
0 likes · 13 min read
DataFunSummit
Sep 25, 2023 · Big Data

Trino in Bilibili Lakehouse: Compute Engine, Stability, and Containerization Practices

This article presents Bilibili's practical implementation of Trino within a lakehouse architecture, focusing on the compute engine placement, stability enhancements, and containerized deployment, while detailing indexing strategies, pre‑computation techniques, Iceberg metadata optimizations, and performance gains for large‑scale analytical queries.

Iceberg · Lakehouse · Precomputation
0 likes · 14 min read
Big Data Technology & Architecture
Sep 18, 2023 · Big Data

Unified Real‑Time and Batch Data Warehouse Architecture with Hudi Lakehouse

The article explains the mainstream Lambda data‑warehouse architecture, its benefits and challenges, then introduces Hudi as a lake‑house solution that unifies real‑time and offline storage, describes the multi‑layer service design, and showcases three practical scenarios—stream processing, real‑time multidimensional analysis, and stream‑batch data reuse—demonstrating how the integrated architecture improves latency, cost, and operational complexity.

Batch Processing · Data Warehouse · Hudi
0 likes · 13 min read
DataFunTalk
Sep 16, 2023 · Big Data

StarRocks Data Lake Analysis, Materialized Views, and Lakehouse Architecture

This article explains how StarRocks 3.0 extends real‑time data‑warehouse capabilities to support data‑lake analysis, external catalog integration, Trino compatibility, extensive I/O optimizations, and powerful materialized‑view features that together enable a unified, cloud‑native Lakehouse solution with high performance and flexible resource isolation.

Big Data · Data Lake · Lakehouse
0 likes · 20 min read
DataFunSummit
Sep 8, 2023 · Big Data

Tianqiong OLAP Real‑time Lakehouse Fusion Platform Architecture Practice

This article explains why lake‑warehouse fusion is needed, describes the challenges of integrating real‑time data warehouses with data lakes, introduces a new StarRocks‑based architecture that supports real‑time ingestion, cooling, offline loading, and adaptive hot‑cold query rewriting, and outlines future plans and Q&A.

Big Data · Data Integration · Data Warehouse
0 likes · 21 min read
StarRocks
Sep 6, 2023 · Big Data

How Paimon + StarRocks Revolutionize Lakehouse Analytics

This article reviews traditional Lambda and Kappa data‑warehouse architectures, then details four Paimon‑StarRocks lakehouse solutions—including a data‑lake center, accelerated query with materialized views, hot‑cold data separation, and the JNI connector—while also outlining StarRocks’ future roadmap for lakehouse analytics.

Big Data · Lakehouse · Paimon
0 likes · 11 min read
DataFunTalk
Sep 4, 2023 · Big Data

Unified Batch‑Stream Storage with Hudi and LAS: Architecture, Design, and Deployment

This article presents a comprehensive overview of a batch‑stream unified storage solution built on Hudi and the Lakehouse Analysis Service (LAS), covering background challenges, architectural design, data organization, read/write mechanisms, BTS architecture, real‑world deployment scenarios, and future development plans.

Batch-Stream · Data Warehouse · Hudi
0 likes · 22 min read
Data Thinking Notes
Aug 27, 2023 · Big Data

How ByteDance’s LAS Team Unified Real‑Time and Offline Warehousing with a Lakehouse Solution

This article analyzes the shortcomings of mainstream Lambda‑style data warehouse architectures, introduces Hudi‑based lakehouse design principles, details the three‑layer unified storage architecture, data distribution, model and read/write mechanisms, and showcases real‑time streaming, multidimensional analysis, and stream‑batch reuse scenarios along with future roadmap plans.

Hudi · Lakehouse · Streaming
0 likes · 14 min read
Tencent Cloud Developer
Aug 23, 2023 · Big Data

WeChat Experiment Platform: Architecture Design and Iceberg Lakehouse Optimization

The WeChat Experiment Platform migrated its 60,000‑metric, 200,000‑core, 30 PB‑plus data pipeline to an Iceberg‑based lakehouse, leveraging three‑layer metadata, fine‑grained partitioning, MERGE INTO writes, time‑travel snapshots, and skew‑handling UDFs, which cut core time by 69%, saved ~100 PB of storage, and reduced latency by up to 70%.

Big Data · Data Warehouse · Iceberg
0 likes · 18 min read
ITPUB
Aug 23, 2023 · Cloud Native

Build a Cloud‑Native Lakehouse on AWS with Apache Iceberg and Amoro

This guide explains the cloud‑native lakehouse concept, outlines its advantages and challenges, compares lake‑table projects such as Iceberg, and provides a step‑by‑step AWS deployment of Apache Iceberg and Amoro—including environment setup, AMS installation, catalog configuration, optimizer launch, data ingestion with Flink, and query verification with Spark.

AWS · Amoro · Apache Iceberg
0 likes · 33 min read
Big Data Technology & Architecture
Aug 21, 2023 · Big Data

Key Features and Benefits of Lakehouse Frameworks Hudi, Iceberg, and Paimon

This note outlines how Hudi, Iceberg, and Paimon provide unified batch‑stream storage, UPSERT support, time‑travel capabilities, and lower development costs, enabling a streaming‑warehouse architecture that offers near‑real‑time latency, consistent semantics, persisted intermediate results, and easier historical data repair.

Batch Processing · Hudi · Iceberg
0 likes · 5 min read

How Lakehouse Architecture is Transforming Hadoop: A Deep Dive into Hudi, Iceberg, and Delta Lake

This article analyzes the rise of lake‑house architecture in the Hadoop ecosystem, compares the technical capabilities of Hudi, Iceberg and Delta Lake, details implementation enhancements such as MOR and multi‑writer support, showcases Flink integration, presents a real‑time marketing use case, and outlines future development directions.

Big Data · Data Governance · Delta Lake
0 likes · 14 min read
StarRocks
Aug 9, 2023 · Databases

StarRocks 3.1 Highlights: Faster Lakehouse Analytics and Advanced Materialized Views

StarRocks 3.1 introduces a cloud‑native, lakehouse‑oriented architecture with enhanced storage‑compute separation, up to 3‑6× faster data‑lake queries than Trino/Presto, expanded Iceberg and Paimon support, richer materialized view capabilities, new random bucketing, expression partitioning, generated columns, and spill‑to‑disk stability, all backed by extensive performance optimizations and open‑source contributions.

Data Lake · Lakehouse · Materialized Views
0 likes · 17 min read
DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data
0 likes · 18 min read
DataFunTalk
Jul 10, 2023 · Big Data

Practical Experience of In‑Lake Warehouse Implementation Based on Lakehouse Architecture

This article presents a comprehensive overview of Lakehouse‑based in‑lake warehousing, covering common data‑lake misconceptions, the evolution from databases to data warehouses and lakes, the advantages of Lakehouse over traditional architectures, a reference multi‑layer architecture, typical use cases, challenges, future plans, and a brief Q&A.

Big Data Architecture · Data Lake · Data Warehouse
0 likes · 20 min read
Alibaba Cloud Big Data AI Platform
Jun 27, 2023 · Big Data

How MaxCompute’s Lakehouse Architecture Enables Near‑Real‑Time Incremental Processing

This article details Alibaba Cloud MaxCompute’s lakehouse evolution, describing its unified storage‑metadata‑compute design, the Transactional Table 2.0 format, near‑real‑time incremental ingestion, clustering and compaction services, transaction handling, TimeTravel and incremental queries, and future roadmap for big‑data workloads.

Big Data · Data Warehouse · Incremental Processing
0 likes · 23 min read
DataFunTalk
Jun 24, 2023 · Big Data

Design and Architecture of MaxCompute Lakehouse Near‑Real‑Time Incremental Processing

This article explains the evolution of Alibaba Cloud's MaxCompute platform into a lakehouse architecture that supports near‑real‑time incremental processing, detailing its development history, core design of transactional tables, five‑module technical stack, data ingestion methods, optimization services, transaction management, query capabilities, ecosystem integration, practical applications, future roadmap, and common user questions.

Big Data · Data Lake · Incremental Processing
0 likes · 24 min read
Bilibili Tech
Jun 20, 2023 · Big Data

Design and Evolution of Bilibili's Billions 3.0 Log Platform: A Lakehouse Architecture with ClickHouse, Iceberg, and Trino

Bilibili evolved its log platform from the ClickHouse‑based Billions 2.0 to the Billions 3.0 lakehouse built on Iceberg, HDFS, and Trino, retaining ClickHouse for acceleration. The new design reduces storage cost by over 20%, improves observability, solves the compute‑storage mismatch, adds flexible indexing, and supports complex ETL while staying open‑source.

ClickHouse · Iceberg · Lakehouse
0 likes · 36 min read
DataFunSummit
Jun 13, 2023 · Big Data

Building a Sub‑Second Response Lakehouse Platform with Apache Iceberg at Bilibili

This article details Bilibili's implementation of a sub‑second response lakehouse platform using Apache Iceberg, covering background challenges, query acceleration techniques such as multi‑dimensional sorting, indexing, cube pre‑aggregation, and intelligent automated optimizations via the Magnus service, and reports current production metrics.

Cube · Iceberg · Lakehouse
0 likes · 14 min read
DataFunTalk
May 23, 2023 · Big Data

Building a Millisecond‑Response Lakehouse Platform with Apache Iceberg: Architecture, Query Acceleration, and Intelligent Optimization

This article details Bilibili's technical practice of constructing a millisecond‑response lake‑warehouse platform using Apache Iceberg, covering the background challenges, unified architecture, multi‑dimensional sorting and indexing for query acceleration, the Magnus service for intelligent optimization, and the current production deployment and performance metrics.

Big Data · Cube · Iceberg
0 likes · 14 min read
DataFunTalk
May 22, 2023 · Big Data

Alibaba Cloud Data Lake: Unified Metadata and Storage Management Practices

This article explains Alibaba Cloud's data lake architecture, unified metadata services, storage management optimizations, and format handling techniques, illustrating how lakehouse concepts, multi‑engine support, and lifecycle policies enable efficient, secure, and cost‑effective big data processing in the cloud.

Big Data · Cloud Services · Data Lake
0 likes · 22 min read
DataFunSummit
May 16, 2023 · Big Data

LakeSoul Joins LF AI & Data Foundation as an Open‑Source Cloud‑Native Lakehouse Framework

LakeSoul, China's only open‑source lakehouse project, has been donated to the LF AI & Data Foundation, becoming its first lake‑warehouse framework and offering ACID‑guaranteed high‑concurrency upserts, a high‑performance Rust‑based I/O layer, real‑time data‑warehouse capabilities, and seamless AI/BI integration for modern big‑data applications.

AI · Data Warehouse · LakeSoul
0 likes · 7 min read
DataFunTalk
Apr 13, 2023 · Big Data

Four Paradigms of StarRocks Lakehouse Integration and an Overview of StarRocks 3.0

This article explains why lake‑warehouse integration is needed, outlines its challenges, describes StarRocks' four integration paradigms—including query acceleration, layered modeling, real‑time warehouse‑lake fusion, and the cloud‑native 3.0 solution—and previews the upcoming StarRocks 3.0 release.

Big Data · Cloud Native · Data Lake
0 likes · 18 min read
StarRocks
Apr 7, 2023 · Databases

StarRocks 3.0 Highlights: Storage‑Compute Separation, New RBAC, and Lakehouse Features

StarRocks 3.0 introduces a storage‑compute separation architecture, a full‑featured RBAC permission framework, enhanced materialized views, Trino‑compatible query dialect, richer Primary‑Key update/delete syntax, automatic partition creation, and numerous performance optimizations, marking a major step from OLAP to lakehouse analytics.

Lakehouse · RBAC · StarRocks
0 likes · 10 min read
DataFunTalk
Mar 21, 2023 · Databases

Design and Technical Details of Apache Doris for Lakehouse Architecture

This article explains how Apache Doris extends its real‑time OLAP capabilities to support Lakehouse architectures, covering unified metadata, query acceleration, elastic compute, performance benchmarks, and future roadmap for richer data‑source integration and resource isolation.

Apache Doris · Big Data · Data Warehouse
0 likes · 20 min read
ITPUB
Feb 22, 2023 · Databases

How Huawei’s GaussDB(DWS) 3.0 Redefines Cloud‑Native Data Warehousing

This article summarizes Wang Chuanting’s DTCC2022 talk on Huawei Cloud GaussDB(DWS) 3.0, detailing its cloud‑native architecture, layered elasticity, lake‑warehouse integration, performance acceleration techniques, and how it seamlessly couples data‑processing pipelines with AI workloads for modern, real‑time analytics.

AI integration · Cloud Native · Data Warehouse
0 likes · 16 min read
DataFunTalk
Jan 28, 2023 · Big Data

Data Lake vs Data Warehouse: Differences, Evolution, and Integrated Lakehouse Design

This article explores the ongoing debate between data lakes and data warehouses, clarifies their distinct purposes and technologies, discusses how they can coexist or complement each other, and introduces the concept of an integrated lakehouse architecture while promoting a comprehensive data intelligence knowledge map.

Big Data · Data Lake · Data Warehouse
0 likes · 5 min read
ITPUB
Jan 26, 2023 · Big Data

How NetEase’s Arctic Unifies Streaming and Batch with Iceberg for Real‑Time Lakehouse

This article explains the challenges of a Lambda‑architecture data pipeline, introduces NetEase’s Arctic lakehouse built on Apache Iceberg, details its table‑store design, optimization cycles, consistency mechanisms, real‑time features, practical use cases, and future roadmap, highlighting its advantages over similar solutions.

Arctic · Data Integration · Flink
0 likes · 14 min read
Tencent Cloud Developer
Jan 3, 2023 · Big Data

How Tencent’s Cloud‑Native Lakehouse Tackles PB‑Scale Performance Challenges

This article analyzes Tencent Cloud’s DLC lakehouse solution, explaining the unified data lake‑warehouse architecture, the performance hurdles of object‑storage‑based analytics, and the multi‑dimensional caching, virtual‑cluster elasticity, and advanced filter techniques that enable second‑level analysis on petabyte‑scale data while reducing costs.

Big Data · DLC · Lakehouse
0 likes · 13 min read
DataFunSummit
Dec 29, 2022 · Big Data

Understanding Lakehouse Systems: Architecture, Practices, and Innovations by Databricks

This article explains the Lakehouse concept, why it is needed, the limitations of traditional data warehouses and data lakes, and how Databricks’ unified architecture—through open storage formats, fine‑grained governance, and optimized query engines—delivers high‑quality, low‑latency data for BI, analytics, and machine learning workloads.

Databricks · Delta Lake · Lakehouse
0 likes · 21 min read
DataFunTalk
Dec 23, 2022 · Big Data

Building a Lakehouse on Alibaba Cloud AnalyticDB (ADB) with Apache Hudi: Architecture, Challenges, and Practices

This article presents a comprehensive technical overview of Alibaba Cloud AnalyticDB's Lakehouse edition, detailing its unified architecture, key advantages, the challenges of ingesting billions of records with Apache Hudi, and the engineering solutions—including Flink integration, hotspot mitigation, memory optimization, OSS throttling handling, concurrent write support, lifecycle management, and TableService—that enable a cost‑effective, high‑performance lake‑to‑warehouse platform.

Apache Hudi · Flink · Lakehouse
0 likes · 19 min read
DataFunTalk
Dec 8, 2022 · Big Data

Arctic: NetEase’s Real-Time Lakehouse System Built on Apache Iceberg

This article introduces NetEase’s Arctic, a real‑time lakehouse system built on Apache Iceberg that unifies streaming and batch processing. It explains the challenges of the Lambda architecture, details Arctic’s features (change and base stores, hidden queue, and transaction handling), and shares internal practice cases and the future roadmap.

Apache Iceberg · Arctic · Data Lake
0 likes · 12 min read
StarRocks
Nov 4, 2022 · Big Data

Building a High‑Performance, Cost‑Effective Cloud Lakehouse with StarRocks and EMR

This article explains how to design and implement a cloud‑native Lakehouse using StarRocks and Tencent Cloud EMR, covering core technical requirements, a five‑layer architecture, data ingestion with Iceberg/Hudi, performance tricks like Z‑order clustering, cost‑control through elastic scaling, and the key product features of EMR StarRocks.

Big Data · EMR · Hudi
0 likes · 24 min read
Bilibili Tech
Sep 30, 2022 · Big Data

Bilibili's Efficient Lakehouse Platform Built on Trino and Iceberg

Bilibili’s new lake‑house platform, built on Trino and Iceberg, replaces Hive‑based pipelines by ingesting logs and DB data into Iceberg tables, applying advanced sorting, Z‑order/Hilbert clustering, bitmap and bloom indexes, virtual join columns and pre‑aggregation, enabling 70 000 daily queries on 2 PB with average scans of 2 GB and sub‑2‑second response times.

Big Data · Data Skipping · Iceberg
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Sep 13, 2022 · Big Data

From Hadoop to Cloud‑Native: The Evolution of Data Lakes and Modern Architecture

This article traces the history of data lakes from their 2010 inception with Hadoop through cloud‑native object storage, lakehouse formats like Delta Lake, and Alibaba Cloud's multi‑layer solution, outlining key architectural stages and practical construction challenges for enterprise‑grade implementations.

Alibaba Cloud · Big Data · Cloud Native
0 likes · 9 min read
Tencent Cloud Developer
Sep 9, 2022 · Big Data

Data Lake, Data Warehouse, and Lakehouse: Concepts, Architectures, and Industry Practices

The article explains how data lakes excel at ingesting massive, varied data, data warehouses optimize storage and query performance, and lake‑house architectures combine both strengths—offering scalable, low‑cost storage with high‑speed analytics—highlighting industry solutions from Snowflake, Databricks, and major cloud providers.

Analytics · Big Data · Data Lake
0 likes · 8 min read
Shopee Tech Team
Sep 2, 2022 · Big Data

Shopee Data System Challenges and Apache Hudi Practices

Shopee tackled its data‑system bottlenecks by customizing Apache Hudi to provide unified stream‑batch integration, efficient state‑detail snapshots, and low‑latency wide‑table generation, using CDC‑based bootstrapping, COW/MOR tables, savepoints and partial updates, which cut latency to ten minutes, lowered resource use, and yielded several community‑backed enhancements.

Apache Hudi · Big Data · Data Integration
0 likes · 18 min read
DataFunTalk
Aug 10, 2022 · Big Data

Delta Lake 2.0, Iceberg, Hudi: A Comparative Study and the Arctic Lakehouse Service

The article reviews recent developments in data‑lake table formats—Delta Lake 2.0, Iceberg, and Hudi—examining their features, benchmark results, and ecosystem impact, and then introduces Arctic, an open‑source streaming lakehouse service built on Iceberg that aims to bridge batch‑stream gaps for enterprises.

Benchmark · Data Lake · Delta Lake
0 likes · 24 min read
DataFunTalk
Aug 5, 2022 · Big Data

Delta Lake Principles, eBay Migration, and Practical Enhancements

This talk by eBay software engineer Zhu Feng explains the fundamentals of Delta Lake and Lakehouse architecture, outlines eBay’s migration from Teradata to a Spark‑based platform, and details the custom enhancements, performance optimizations, and operational improvements implemented to support large‑scale update and delete workloads.

Data Lake · Delta Lake · Lakehouse
0 likes · 16 min read
DataFunTalk
Aug 1, 2022 · Big Data

Bilibili Lakehouse Integration: Iceberg and Alluxio Optimization Practices

This article details Bilibili's lakehouse implementation using Apache Iceberg and Alluxio, covering background challenges, architectural components, data organization techniques like Z‑order and bitmap indexes, performance benchmarks, and future optimization plans for large‑scale analytics.

Alluxio · Bitmap Index · Iceberg
0 likes · 21 min read
High Availability Architecture
Jul 7, 2022 · Big Data

Interview with Tencent Cloud’s Zhang Zhigang on Lakehouse Architecture and Cloud‑Native Integration

In this interview, Tencent Cloud expert Zhang Zhigang explains the fundamentals and key technologies of lakehouse architecture, discusses how cloud‑native practices enhance its performance and operability, and offers practical advice for big‑data professionals ahead of the 2022 GIAC Global Internet Architecture Conference in Shenzhen.

Cloud Native · Data Architecture · Lakehouse
0 likes · 10 min read
Baidu Geek Talk
Jul 1, 2022 · Big Data

Evolution of Data Platform Technology: From Data Warehouse to Lakehouse Architecture

The article traces the evolution of data platforms from early data warehouses—using schema‑on‑write, columnar storage, and MPP engines—to data lakes that retain raw data with schema‑on‑read, and finally to lakehouse architectures that merge storage and compute, offering unified metadata, versioning, and support for BI, big‑data, AI, and HPC workloads.

Data Architecture · Lakehouse · OLAP
0 likes · 25 min read
dbaplus Community
May 21, 2022 · Big Data

5 Trends for 2022: Analytics Engineers, Lakehouse Wars, Real‑Time Pipelines, Cloud Market

The article outlines five major 2022 data trends: the rise of analytics engineers, the intensifying lake‑house competition, the growth of real‑time streaming pipelines and operational analytics, the expanding cloud marketplaces for data tools, and the push toward unified data‑quality terminology. It explains their origins, market impact, and future outlook.

Data Quality · Lakehouse · Real-time Streaming
0 likes · 21 min read
DataFunTalk
May 17, 2022 · Big Data

Exploring JuiceFS in Data Lake Storage Architecture

This presentation provides a comprehensive overview of JuiceFS, an open‑source cloud‑native distributed file system, detailing its role in modern data lake and lakehouse architectures, comparing it with HDFS and object storage, and highlighting its performance, integration, and community ecosystem.

Big Data · Data Lake · Distributed File System
0 likes · 19 min read
HomeTech
Apr 27, 2022 · Big Data

AutoStream Real‑Time Computing Platform: Architecture, Resource Management, Scaling, Lakehouse Integration, and PyFlink Practices

This article details the evolution of Autohome's AutoStream platform from Storm to Flink‑based versions, covering real‑time application scenarios, strict budget‑controlled resource management, automatic scaling, lake‑house architecture with Iceberg, PyFlink integration, and future plans for resource optimisation and batch‑stream unification.

AutoStream · Flink · Lakehouse
0 likes · 15 min read
DataFunTalk
Apr 7, 2022 · Big Data

Apache Kyuubi: Architecture, Use Cases, Community, and Mobile Cloud Deployment

This article introduces Apache Kyuubi—a multi‑tenant Thrift JDBC/ODBC service built on Spark—detailing its architecture, advantages over Spark Thrift Server, real‑world use cases, open‑source community progress, and practical deployment strategies on mobile cloud, Kubernetes, and with Trino.

Apache Spark · Big Data · Kubernetes
0 likes · 16 min read
Big Data Technology & Architecture
Mar 31, 2022 · Big Data

Bilibili’s Lakehouse Architecture: Integrating Data Lake and Warehouse with Apache Iceberg

To address the high cost and low efficiency of traditional Hadoop‑based data pipelines, Bilibili designed a lakehouse solution using Apache Iceberg, integrating Spark, Flink, Trino, and Alluxio to unify flexible data lake storage with warehouse‑level query performance, reducing data duplication and improving interactive analytics.

Big Data · Data Warehouse · Iceberg
0 likes · 17 min read
Alibaba Cloud Developer
Mar 17, 2022 · Big Data

How AutoStream Scales Real‑Time Data Processing with Flink, Iceberg, and PyFlink

This article details AutoStream's evolution from a Java‑only Storm platform to a Flink‑based, Kubernetes‑native streaming system that integrates budgeting controls, automatic scaling, lakehouse architecture with Iceberg, and PyFlink support, highlighting the technical challenges, solutions, and future roadmap for real‑time analytics.

Flink · Iceberg · Lakehouse
0 likes · 23 min read
21CTO
Feb 24, 2022 · Big Data

5 Data Trends for 2022: Analytics Engineers, Lakehouse Wars, Real‑Time

In 2022 the modern data stack will be driven by the rise of analytics engineers, intensified competition between lakehouse and warehouse solutions, growing demand for real‑time analytics, the explosive growth of cloud marketplaces, and the emergence of unified data‑quality terminology, all reshaping data infrastructure and operational practices.

Data Quality · Lakehouse · Real-time analytics
0 likes · 17 min read
Bilibili Tech
Feb 18, 2022 · Big Data

Evolution of Bilibili's Data Retrieval Services and Lakehouse Architecture

Bilibili’s data retrieval journey progressed from a fragmented, siloed ("chimney‑style") pipeline to a unified Flink‑based service layer with the Ark construction system and Akuya SQL engine, and finally to an Iceberg‑driven lakehouse that eliminates data duplication, streamlines cross‑engine optimization, and offers platformized, low‑latency analytics.

Big Data · Bilibili · Data Retrieval
0 likes · 14 min read
DataFunTalk
Jan 8, 2022 · Big Data

Lakehouse: Concepts, Architecture, Implementation, and Cloud Practices

This article provides a comprehensive overview of the Lakehouse paradigm, tracing its origins from traditional data warehouses and data lakes, comparing architectures, detailing core components such as Delta Lake and Iceberg, and illustrating practical cloud implementations and future directions.

Apache Iceberg · Big Data · Cloud Data Platform
0 likes · 14 min read
ByteDance Data Platform
Dec 31, 2021 · Big Data

How ByteDance Leverages Hudi for a Real‑Time Data Lake Platform

This article introduces ByteDance’s real‑time data lake platform built on Apache Hudi, covering Hudi fundamentals, table types, indexing, practical use cases, platform optimizations, and future roadmap, illustrating how the system enables low‑latency, scalable analytics across batch and streaming workloads.

Hudi · Lakehouse · metadata management
0 likes · 11 min read
DataFunTalk
May 16, 2021 · Big Data

Efficient Data Update/Delete and Real‑time Processing in the Arctic Lakehouse System

This article explains the evolution from traditional data warehouses to modern lakehouse architectures, introduces the Arctic system’s dynamic hash tree for fast update/delete, describes file splitting with sequence/offset ordering, and compares copy‑on‑write versus merge‑on‑read techniques for achieving low‑latency analytics.

Arctic · Big Data · Copy-on-Write
0 likes · 12 min read
Big Data Technology & Architecture
Dec 31, 2020 · Big Data

Data Lake vs Data Warehouse: Evolution, Comparison, and Alibaba Cloud Lakehouse Integration

This article examines the 20‑year evolution of big data architectures, contrasts data lakes and data warehouses, explores their respective strengths and challenges, and details Alibaba Cloud’s lake‑warehouse (lakehouse) solution that unifies storage, metadata, and compute for enterprise‑grade analytics and AI workloads.

Data Architecture · Data Lake · Data Warehouse
0 likes · 30 min read
Big Data Technology Architecture
Jun 7, 2020 · Big Data

Comprehensive Overview of Data Lake Concepts, Architectures, Vendor Solutions, and Use Cases

This article provides an in‑depth, English‑language overview of data lakes, covering their definition, core characteristics, reference architectures, major cloud‑vendor implementations (AWS, Huawei, Alibaba Cloud, Azure), typical industry applications such as advertising and gaming, as well as practical guidance on building and evolving a data lake in a cloud‑native, big‑data environment.

Analytics · Data Architecture · Lakehouse
0 likes · 50 min read