Tagged articles

Lakehouse

215 articles · Page 2 of 3

Jun 9, 2024 · Big Data

Optimizing ClickHouse Performance in WeChat: Observation Tools, Lakehouse Reading, Bitmap Acceleration, and AI Integration

This article details how the WeChat team leverages ClickHouse at massive scale, introduces a suite of performance observation tools, describes lakehouse reading and bitmap optimizations, and explains the integration of AI workloads, demonstrating overall query speedups of up to tenfold across diverse scenarios.

Big Data · ClickHouse · Lakehouse

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

Jun 6, 2024 · Databases

How StarRocks Redefines Lakehouse Architecture with Ultra-Fast Unified Analytics

StarRocks combines extreme query speed and a unified architecture to deliver a lakehouse solution that separates storage and compute, supports multi‑warehouse resource isolation, offers Trino compatibility, materialized‑view acceleration, and cost‑effective scaling, making it suitable for real‑time analytics, data‑lake queries, and traditional OLAP workloads.

Big Data · Lakehouse · StarRocks

0 likes · 23 min read

DataFunSummit

Jun 5, 2024 · Big Data

Databricks Acquires Tabular to Unite Delta Lake and Apache Iceberg for an Open Lakehouse

Databricks announced the acquisition of Tabular, the company founded by the original creators of Apache Iceberg, aiming to integrate Delta Lake and Iceberg into a unified, open lakehouse architecture that enhances format compatibility, reduces data silos, and supports AI workloads.

Apache Iceberg · Big Data · Databricks

0 likes · 5 min read

Past Memory Big Data

Jun 5, 2024 · Industry Insights

Databricks Acquires Tabular, the Company Behind Apache Iceberg, to Boost Lakehouse Interoperability

Databricks announced its acquisition of Tabular, the creators of Apache Iceberg, aiming to unify lakehouse formats through Delta Lake UniForm, while highlighting the rise of lakehouse architecture, format fragmentation, and the push toward open data interoperability.

Apache Iceberg · Data Interoperability · Databricks

0 likes · 9 min read

StarRocks

May 14, 2024 · Artificial Intelligence

How Tencent Games Boosted AI‑Generated SQL Accuracy to 89% with a Lakehouse Architecture

Tencent Games tackled the low accuracy of AI‑generated SQL in production by combining large language models with a StarRocks lakehouse, introducing a semantic layer, asynchronous materialized views, and a multi‑agent framework, ultimately raising one‑shot SQL correctness to 89% and cutting delivery time from 2 hours to 0.33 hours.

AI · Data Engineering · LLM

0 likes · 13 min read

DataFunSummit

May 12, 2024 · Big Data

Practice of Lakehouse‑Integrated Data Platform Architecture in the Financial Innovation Sector

This article presents the evolution of data platform architectures, the specific challenges of financial‑sector information‑technology innovation, and the design, core components, deployment path, and real‑world case studies of the cloud‑native lakehouse solution DataCyber developed by Shuxin Network.

Big Data · Data Platform · Financial Innovation

0 likes · 21 min read

DataFunSummit

May 5, 2024 · Big Data

Alluxio in Lakehouse Architecture: Benefits, Challenges, and Real‑World Use Cases

This article explains how Alluxio enables a unified lake‑warehouse architecture by decoupling compute and storage, outlines its core capabilities, evaluates the cost‑saving and performance benefits, discusses the technical challenges, and presents several practical deployment scenarios in finance and AI workloads.

Alluxio · Big Data · Data Orchestration

0 likes · 15 min read

DataFunSummit

Apr 25, 2024 · Big Data

Paimon Project Overview: Recent Developments, Core Capabilities, and Future Roadmap

This article presents a comprehensive overview of the Apache‑incubated Paimon project, covering its evolution from Flink Table Store, the current features of primary‑key and log tables, management tools such as snapshots, tags and branches, performance optimizations for Flink and Spark, and a detailed roadmap of upcoming functionalities.

Big Data · Data Management · Flink

0 likes · 23 min read

DataFunTalk

Apr 23, 2024 · Big Data

Apache Paimon Graduates to Top‑Level Project – Milestones, Core Capabilities, and Community Highlights

Apache Paimon, originally launched as Flink Table Store, has graduated to an Apache Top‑Level Project after a year of incubation, showcasing real‑time lakehouse capabilities, extensive ecosystem integration, and strong adoption by major enterprises, marking a significant milestone for streaming and batch data processing.

Apache Paimon · Big Data · Lakehouse

0 likes · 9 min read

DataFunTalk

Apr 20, 2024 · Big Data

Tencent Video Metrics Middle Platform and Lakehouse Integration: Architecture, Governance, and Practices

This article details Tencent Video’s data business, describing the design and implementation of its metrics middle platform and lake‑warehouse integration, covering architecture, governance, consistency, timeliness, usability, cost optimization, and future plans, with insights into technology choices such as Iceberg, StarRocks, and MQL.

Big Data · Data Engineering · Data Governance

0 likes · 18 min read

StarRocks

Apr 12, 2024 · Databases

How StarRocks Materialized Views Supercharge Metrics Platforms: Real‑World Cases & Modeling Paradigms

This article explains the concept of a metrics layer, why StarRocks is suited for building such platforms, and presents detailed case studies from Airbnb, a major bank, and a leading restaurant chain, while comparing three modeling paradigms and outlining the future vision for materialized views.

Case Study · Lakehouse · Materialized Views

0 likes · 18 min read

DataFunTalk

Mar 16, 2024 · Big Data

Performance Optimization Practices for KwaiBI Big Data Analysis Platform

This article introduces KwaiBI, the internal data analysis product of Kuaishou, outlines its five major functional areas, details the performance challenges of large‑scale analytics, and presents a comprehensive set of optimization techniques—including cache warming, query rewriting, materialized acceleration, and the Bleem lake‑house engine—along with future directions and a brief Q&A.

Big Data · KwaiBI · Lakehouse

0 likes · 15 min read

DataFunSummit

Mar 14, 2024 · Big Data

Tencent Game Data Analysis: Lakehouse Integration Practice

This article presents Tencent Game's comprehensive lakehouse integration practice, detailing the project background, storage‑compute separation, data layering, unified DDL/DML operations, performance optimizations, and future plans, illustrating how StarRocks, Iceberg, and Spark are combined to achieve scalable, cost‑effective analytics for massive game data.

Compute-Storage Separation · Data Warehouse · Iceberg

0 likes · 16 min read

DataFunTalk

Mar 4, 2024 · Big Data

Design and Implementation of a Lakehouse‑Integrated Data Platform for Financial Innovation by Shuxin Network

This article presents Shuxin Network's practical experience in building a cloud‑native, lakehouse‑integrated data platform for the financial sector, covering architecture evolution, the challenges of domestic IT innovation (Xinchuang, 信创), the DataCyber solution, core components, deployment roadmap, and real‑world case studies.

Big Data · Cloud Native · Data Platform

0 likes · 21 min read

DataFunSummit

Feb 26, 2024 · Big Data

Building a New Lakehouse Analytics Paradigm with StarRocks and Paimon

This article introduces a new lakehouse analytics paradigm that combines StarRocks and Paimon, covering the evolution of data lake technologies, key integration scenarios, core technical mechanisms such as JNI connectors and materialized views, and the future roadmap for enhanced lakehouse capabilities.

Analytics · Big Data · Data Lake

0 likes · 16 min read

DataFunTalk

Feb 9, 2024 · Big Data

Alluxio’s Role in Lakehouse Architecture: Benefits, Challenges, and Real‑World Use Cases

This article explains how Alluxio enables lake‑warehouse integration by providing a data orchestration layer that caches data near compute, reduces storage‑compute separation costs, improves performance, and addresses challenges such as security, scalability, and multi‑cloud deployment, illustrated with several industry case studies.

AI · Alluxio · Big Data

0 likes · 16 min read

DataFunTalk

Jan 27, 2024 · Big Data

JuiceFS: A Cloud‑Native Distributed File System for Data Lake and Lakehouse

This article presents JuiceFS, a cloud‑native distributed file system that bridges the gaps between HDFS and object storage, explaining Data Lake and Lakehouse concepts, comparing storage options, detailing JuiceFS's architecture and performance benefits, and showcasing real‑world user case studies.

Big Data · Distributed File System · JuiceFS

0 likes · 23 min read

Past Memory Big Data

Jan 17, 2024 · Big Data

How WeChat Implements a StarRocks‑Powered Lakehouse Across Multiple Business Scenarios

WeChat evolved its data platform from Hadoop to ClickHouse and finally to a StarRocks‑based lakehouse, solving data fragmentation and storage redundancy while achieving sub‑second to minute‑level query latency, cutting storage costs by over 65%, halving operational tasks, and reducing offline job time by two hours across several business lines.

Big Data · Lakehouse · Materialized Views

0 likes · 16 min read

DataFunTalk

Jan 9, 2024 · Big Data

Analyzing Lakehouse Storage Systems: Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Hudi, and Iceberg

This article examines the design of lakehouse storage systems by comparing Delta Lake, Apache Hudi, and Apache Iceberg, focusing on metadata management, Merge‑On‑Read mechanisms, and a series of query and write performance optimizations with real‑world EMR case studies.

Apache Hudi · Apache Iceberg · Big Data

0 likes · 16 min read

DataFunSummit

Jan 9, 2024 · Big Data

Introducing Yunqi Lakehouse: An Integrated Cloud‑Native Data Platform with Incremental Computing and Auto Materialized Views

This article introduces Yunqi's self‑developed Lakehouse product, explaining its cloud‑native, one‑stop data platform architecture, incremental computing that balances freshness, performance and cost, and the autoMV feature that automatically creates materialized views to boost query speed up to nine times.

Auto Materialized View · Big Data · Data Platform

0 likes · 14 min read

StarRocks

Jan 3, 2024 · Big Data

How Xiaohongshu Scaled Real‑Time Analytics with StarRocks: 6‑7× Faster Queries and 35% Cost Savings

Xiaohongshu’s OLAP team migrated from Presto to StarRocks, doubling cluster count to 30, boosting query speed by 6‑7 times, cutting latency to 200 ms, and achieving up to 35% cost reduction through gray‑scale migration and AWS Spot‑based elastic scaling.

Data Platform · Lakehouse · Performance Optimization

0 likes · 18 min read

Tongcheng Travel Technology Center

Dec 27, 2023 · Big Data

Recap of Tongcheng Travel’s 7th Big Data Technology Salon – Talks on StarRocks, Paimon, Iceberg, Data+AI, Vector Retrieval, Real‑Time Computing, and Hotel Ranking

The 7th Tongcheng Travel Big Data Technology Salon in Beijing featured a series of expert talks covering StarRocks architecture evolution, lake‑house solutions with Paimon, Iceberg real‑time upsert, Data+AI for travel recommendation, vector retrieval in AI, JD Logistics real‑time computing governance, and multi‑task hotel ranking modeling, providing deep technical insights and future roadmaps.

AI · Big Data · Lakehouse

0 likes · 10 min read

DataFunTalk

Dec 27, 2023 · Big Data

Amoro Mixed Hive: A Unified Lakehouse Solution for Real‑Time and Batch Data Processing

This article describes how NetEase Youdao replaced its Doris‑based real‑time data warehouse with Amoro Mixed Hive, detailing the architectural challenges, the Mixed Hive design, implementation steps, performance optimizations, community contributions, and future roadmap to achieve a unified lakehouse with minute‑level freshness and reduced development and operational costs.

Amoro · Big Data · Flink

0 likes · 12 min read

DataFunTalk

Dec 27, 2023 · Big Data

Apache Flink 2023: Core Technical Achievements and Future Directions

The article reviews Apache Flink's rapid development over the past decade, highlighting its 2023 community growth, SIGMOD award, major releases, streaming SQL enhancements, incremental checkpointing, batch maturity, cloud‑native scaling, and integration with the emerging Lakehouse architecture.

Apache Flink · Big Data · Checkpoint

0 likes · 11 min read

DataFunSummit

Dec 20, 2023 · Cloud Native

Building a Cloud‑Native Lakehouse with Apache Iceberg and Amoro

This article introduces the background, challenges, and cloud‑native solutions of lakehouse architecture, explains Apache Iceberg’s open table format and its cloud‑native features, details Amoro’s management and self‑optimizing capabilities, showcases three real‑world cloud migration cases, and outlines future development plans.

Amoro · Apache Iceberg · Data Management

0 likes · 12 min read

StarRocks

Dec 19, 2023 · Big Data

How WeChat Achieved Sub‑Second Real‑Time Analytics with StarRocks Lakehouse

WeChat transformed its data platform from Hadoop and ClickHouse to a StarRocks‑based lakehouse, tackling massive data volume, ultra‑low latency, and storage fragmentation by deploying lake‑on‑warehouse and warehouse‑lake fusion architectures, real‑time incremental materialized views, and unified SQL access, resulting in dramatic cost cuts and performance gains.

Big Data · Lakehouse · StarRocks

0 likes · 15 min read

DataFunTalk

Dec 12, 2023 · Big Data

Flink Forward Asia 2023 Recap: Keynote Highlights, Technical Advances, and Community Updates

The Flink Forward Asia 2023 conference recap highlights opening remarks, a keynote on Flink’s dominance in streaming compute, detailed 2023 technical advancements, case studies, the launch of Flink CDC 3.0, and a preview of Flink 2.0, along with links to photos and video recordings.

Apache Flink · Big Data · Flink 2.0

0 likes · 5 min read

Alibaba Cloud Big Data AI Platform

Dec 8, 2023 · Cloud Computing

How Alibaba Cloud EMR Powers Serverless StarRocks for Seamless Lakehouse Analytics

This article summarizes Li Yu's presentation on Alibaba Cloud EMR's deep collaboration with the StarRocks community, detailing major contributions across versions, the serverless StarRocks product’s core capabilities, and future plans to enhance OLAP‑lakehouse integration, performance, and cloud‑native elasticity.

Alibaba Cloud · Cloud Computing · EMR

0 likes · 7 min read

DataFunTalk

Nov 24, 2023 · Big Data

Amoro Lakehouse Management System: Deployment Practices and AWS Integration for Apache Iceberg

This article introduces Amoro, a lakehouse management platform built on Apache Iceberg, explains why Webex adopted it to overcome Hive limitations, details its AWS GlueCatalog and S3 integration with DynamoDB lock management, and provides step‑by‑step Helm‑based deployment instructions on Kubernetes.

AWS · Amoro · Apache Iceberg

0 likes · 19 min read

StarRocks

Nov 23, 2023 · Databases

How StarRocks Redefines Lakehouse Architecture with Compute‑Storage Separation

StarRocks, an open‑source MPP analytical database, consolidates BI, interactive, and real‑time analytics into a single engine by evolving from version 1.0 to 3.x, introducing compute‑storage separation, unified catalog, generated columns, operator spill, and advanced materialized views, while outlining its cloud‑native lakehouse roadmap.

Compute-Storage Separation · Lakehouse · MPP database

0 likes · 22 min read

Alibaba Cloud Big Data AI Platform

Nov 23, 2023 · Big Data

Why Apache Paimon Is Revolutionizing Streaming Lakehouse Architecture with Flink

The article traces the shift from traditional Hive‑based warehouses to modern lakehouse architectures, explains the advantages of lake formats, introduces Apache Paimon as a streaming‑first data lake integrated with Flink, presents performance benchmarks showing its superiority over Hudi, and demonstrates a real‑time streaming lakehouse workflow.

Apache Paimon · Big Data · Flink

0 likes · 15 min read

Big Data Technology Architecture

Nov 14, 2023 · Big Data

Open Source Big Data Platform 3.0: Streaming Lakehouse, Serverless Architecture, and AI Integration

The talk outlines the evolution of Alibaba Cloud's open‑source big data platform from Hadoop‑based EMR to a 3.0 architecture featuring a streaming lakehouse, full serverless compute and storage, AI‑driven operations, and upcoming vector search services, highlighting technical motivations, challenges, and product releases.

Big Data · Lakehouse · Serverless

0 likes · 14 min read

dbaplus Community

Nov 8, 2023 · Big Data

Choosing Between Data Warehouse, Data Lake, and Lakehouse: When to Use Each

This article compares traditional data warehouses, modern data lakes, and emerging lakehouse architectures, explaining their design patterns, advantages, disadvantages, and suitable use cases, while detailing implementation considerations such as schema design, ETL/ELT processes, file formats like Delta, Iceberg, and Hudi, and factors influencing platform selection.

Apache Spark · Data Lake · Data Warehouse

0 likes · 20 min read

StarRocks

Oct 31, 2023 · Databases

How Ctrip Accelerated Report Queries 10× with StarRocks: A Real‑World Lakehouse Migration

Ctrip migrated its Artnova reporting platform from Hive‑based queries to StarRocks, first loading data into OLAP tables and then using StarRocks as a lakehouse with Hive catalog, Data Cache and materialized views, achieving average query latency reductions from 20 seconds to 1.5 seconds, over 7× speed‑up versus Trino and up to 40× acceleration for complex workloads.

Big Data · Data Cache · Lakehouse

0 likes · 15 min read

DataFunSummit

Oct 16, 2023 · Big Data

Bilibili's Iceberg‑Based Lakehouse Platform: Technical Practices for Sub‑Second Query Response

This article details Bilibili's implementation of an Iceberg‑based lakehouse platform that unifies storage and analytics, addressing Hive’s performance and latency issues through multidimensional sorting, various file‑level indexes, cube pre‑aggregation, star‑tree structures, and an automated Magnus service for intelligent optimization, achieving near‑second query responses.

Big Data · Iceberg · Lakehouse

0 likes · 14 min read

DataFunTalk

Oct 12, 2023 · Big Data

FastData Real‑Time Intelligent Lakehouse Platform: Data Fabric Technology Practice

This article introduces the concept of Data Fabric, explains how Dipu Technology built the FastData real‑time intelligent lakehouse platform on top of it, describes its architecture, core advantages, practical use cases in energy and retail, and outlines the platform’s future roadmap.

Analytics · Big Data · Data Fabric

0 likes · 19 min read

Data Thinking Notes

Oct 11, 2023 · Big Data

How Taikang Life Built a Scalable Lakehouse with Apache Hudi for Big Health Data

This article details Taikang Life's end‑to‑end design and implementation of a lakehouse‑style distributed data platform built on Apache Hudi, covering background, technical selection, architecture, custom Hudi extensions for the health insurance domain, performance benchmarks, real‑world results, and future work.

Apache Hudi · Flink · Healthcare

0 likes · 45 min read

Sohu Tech Products

Oct 11, 2023 · Industry Insights

How StarRocks Materialized Views Power Real‑Time Lakehouse Analytics

The article provides a deep technical overview of StarRocks 3.0’s data‑lake analysis capabilities, its unified Lakehouse architecture, Catalog integration, Trino compatibility, extensive I/O optimizations, materialized view features, resource isolation techniques, real‑world use cases, and future development directions.

Analytics · Data Lake · Lakehouse

0 likes · 22 min read

DataFunTalk

Oct 5, 2023 · Big Data

Building a Unified Streaming‑Batch Lakehouse with Amoro Mixed Iceberg

This article describes how Shanghai Steel Union leveraged Amoro Mixed Iceberg on top of Apache Iceberg to create a unified streaming‑batch lakehouse, addressing small‑file and upsert challenges, simplifying architecture, improving data freshness, and providing a scalable solution for real‑time and batch analytics.

Amoro · Apache Iceberg · Big Data

0 likes · 13 min read

DataFunSummit

Sep 25, 2023 · Big Data

Trino in Bilibili Lakehouse: Compute Engine, Stability, and Containerization Practices

This article presents Bilibili's practical implementation of Trino within a lakehouse architecture, focusing on the compute engine placement, stability enhancements, and containerized deployment, while detailing indexing strategies, pre‑computation techniques, Iceberg metadata optimizations, and performance gains for large‑scale analytical queries.

Iceberg · Indexing · Lakehouse

0 likes · 14 min read

Big Data Technology & Architecture

Sep 18, 2023 · Big Data

Unified Real‑Time and Batch Data Warehouse Architecture with Hudi Lakehouse

The article explains the mainstream Lambda data‑warehouse architecture, its benefits and challenges, then introduces Hudi as a lake‑house solution that unifies real‑time and offline storage, describes the multi‑layer service design, and showcases three practical scenarios—stream processing, real‑time multidimensional analysis, and stream‑batch data reuse—demonstrating how the integrated architecture improves latency, cost, and operational complexity.

Batch Processing · Data Warehouse · Hudi

0 likes · 13 min read

DataFunTalk

Sep 16, 2023 · Big Data

StarRocks Data Lake Analysis, Materialized Views, and Lakehouse Architecture

This article explains how StarRocks 3.0 extends real‑time data‑warehouse capabilities to support data‑lake analysis, external catalog integration, Trino compatibility, extensive I/O optimizations, and powerful materialized‑view features that together enable a unified, cloud‑native Lakehouse solution with high performance and flexible resource isolation.

Big Data · Data Lake · Lakehouse

0 likes · 20 min read

DataFunSummit

Sep 8, 2023 · Big Data

Tianqiong OLAP Real‑time Lakehouse Fusion Platform Architecture Practice

This article explains why lake‑warehouse fusion is needed, describes the challenges of integrating real‑time data warehouses with data lakes, introduces a new StarRocks‑based architecture that supports real‑time ingestion, cooling, offline loading, and adaptive hot‑cold query rewriting, and outlines future plans and Q&A.

Big Data · Data Integration · Data Warehouse

0 likes · 21 min read

StarRocks

Sep 6, 2023 · Big Data

How Paimon + StarRocks Revolutionize Lakehouse Analytics

This article reviews traditional Lambda and Kappa data‑warehouse architectures, then details four Paimon‑StarRocks lakehouse solutions—including a data‑lake center, accelerated query with materialized views, hot‑cold data separation, and the JNI connector—while also outlining StarRocks’ future roadmap for lakehouse analytics.

Big Data · Lakehouse · Paimon

0 likes · 11 min read

DataFunTalk

Sep 4, 2023 · Big Data

Unified Batch‑Stream Storage with Hudi and LAS: Architecture, Design, and Deployment

This article presents a comprehensive overview of a batch‑stream unified storage solution built on Hudi and the Lakehouse Analysis Service (LAS), covering background challenges, architectural design, data organization, read/write mechanisms, BTS architecture, real‑world deployment scenarios, and future development plans.

Batch-Stream · Data Warehouse · Hudi

0 likes · 22 min read

Data Thinking Notes

Aug 27, 2023 · Big Data

How ByteDance’s LAS Team Unified Real‑Time and Offline Warehousing with a Lakehouse Solution

This article analyzes the shortcomings of mainstream Lambda‑style data warehouse architectures, introduces Hudi‑based lakehouse design principles, details the three‑layer unified storage architecture, data distribution, model and read/write mechanisms, and showcases real‑time streaming, multidimensional analysis, and stream‑batch reuse scenarios along with future roadmap plans.

Hudi · Lakehouse · Streaming

0 likes · 14 min read

Tencent Cloud Developer

Aug 23, 2023 · Big Data

WeChat Experiment Platform: Architecture Design and Iceberg Lakehouse Optimization

The WeChat Experiment Platform migrated its 60,000‑metric, 200,000‑core, 30 PB‑plus data pipeline to an Iceberg‑based lakehouse, leveraging three‑layer metadata, fine‑grained partitioning, MERGE INTO writes, time‑travel snapshots, and skew‑handling UDFs, which cut core time by 69%, saved ~100 PB of storage, and reduced latency by up to 70%.

Big Data · Data Warehouse · Iceberg

0 likes · 18 min read

ITPUB

Aug 23, 2023 · Cloud Native

Build a Cloud‑Native Lakehouse on AWS with Apache Iceberg and Amoro

This guide explains the cloud‑native lakehouse concept, outlines its advantages and challenges, compares lake‑table projects such as Iceberg, and provides a step‑by‑step AWS deployment of Apache Iceberg and Amoro—including environment setup, AMS installation, catalog configuration, optimizer launch, data ingestion with Flink, and query verification with Spark.

AWS · Amoro · Apache Iceberg

0 likes · 33 min read

Big Data Technology & Architecture

Aug 21, 2023 · Big Data

Key Features and Benefits of Lakehouse Frameworks Hudi, Iceberg, and Paimon

This note outlines how Hudi, Iceberg, and Paimon provide unified batch‑stream storage, UPSERT support, time‑travel capabilities, and lower development costs, enabling a streaming‑warehouse architecture that offers near‑real‑time latency, consistent semantics, persisted intermediate results, and easier historical data repair.

Batch Processing · Hudi · Iceberg

0 likes · 5 min read

AsiaInfo Technology: New Tech Exploration

Aug 18, 2023 · Big Data

How Lakehouse Architecture is Transforming Hadoop: A Deep Dive into Hudi, Iceberg, and Delta Lake

This article analyzes the rise of lake‑house architecture in the Hadoop ecosystem, compares the technical capabilities of Hudi, Iceberg and Delta Lake, details implementation enhancements such as MOR and multi‑writer support, showcases Flink integration, presents a real‑time marketing use case, and outlines future development directions.

Big Data · Data Governance · Delta Lake

0 likes · 14 min read

Big Data Technology & Architecture

Aug 14, 2023 · Databases

Key New Features of Doris 2.0 and Their Impact on Data Development

The article reviews Doris 2.0's major enhancements—including point‑query concurrency, log‑analysis capabilities, cold‑hot data separation, lakehouse integration, and various performance and usability upgrades—explaining how these changes benefit OLAP workloads and simplify data‑engineering pipelines.

Big Data · Doris · Lakehouse

0 likes · 7 min read

StarRocks

Aug 9, 2023 · Databases

StarRocks 3.1 Highlights: Faster Lakehouse Analytics and Advanced Materialized Views

StarRocks 3.1 introduces a cloud‑native, lakehouse‑oriented architecture with enhanced storage‑compute separation, up to 3‑6× faster data‑lake queries than Trino/Presto, expanded Iceberg and Paimon support, richer materialized view capabilities, new random bucketing, expression partitioning, generated columns, and spill‑to‑disk stability, all backed by extensive performance optimizations and open‑source contributions.

Data Lake · Lakehouse · Materialized Views

0 likes · 17 min read

ByteDance Data Platform

Aug 9, 2023 · Big Data

Why Traditional Data Warehouses Fail and How a Real‑Time Lakehouse Solves the Pain

This article analyzes the shortcomings of mainstream data‑warehouse and data‑lake architectures, explains the design of ByteDance's real‑time/offline unified lakehouse solution, and demonstrates its practical applications and future roadmap across streaming, multi‑dimensional analysis, and batch‑stream reuse scenarios.

Hudi · LAS · Lakehouse

0 likes · 14 min read

DataFunTalk

Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data

0 likes · 18 min read

DataFunTalk

Jul 10, 2023 · Big Data

Practical Experience of In‑Lake Warehouse Implementation Based on Lakehouse Architecture

This article presents a comprehensive overview of Lakehouse‑based in‑lake warehousing, covering common data‑lake misconceptions, the evolution from databases to data warehouses and lakes, the advantages of Lakehouse over traditional architectures, a reference multi‑layer architecture, typical use cases, challenges, future plans, and a brief Q&A.

Big Data Architecture · Data Lake · Data Warehouse

0 likes · 20 min read

Alibaba Cloud Big Data AI Platform

Jun 27, 2023 · Big Data

How MaxCompute’s Lakehouse Architecture Enables Near‑Real‑Time Incremental Processing

This article details Alibaba Cloud MaxCompute’s lakehouse evolution, describing its unified storage‑metadata‑compute design, the Transactional Table 2.0 format, near‑real‑time incremental ingestion, clustering and compaction services, transaction handling, TimeTravel and incremental queries, and future roadmap for big‑data workloads.

Big Data · Data Warehouse · Incremental Processing

0 likes · 23 min read

DataFunTalk

Jun 24, 2023 · Big Data

Design and Architecture of MaxCompute Lakehouse Near‑Real‑Time Incremental Processing

This article explains the evolution of Alibaba Cloud's MaxCompute platform into a lakehouse architecture that supports near‑real‑time incremental processing, detailing its development history, core design of transactional tables, five‑module technical stack, data ingestion methods, optimization services, transaction management, query capabilities, ecosystem integration, practical applications, future roadmap, and common user questions.

Big Data · Data Lake · Incremental Processing

0 likes · 24 min read

Bilibili Tech

Jun 20, 2023 · Big Data

Design and Evolution of Bilibili's Billions 3.0 Log Platform: A Lakehouse Architecture with ClickHouse, Iceberg, and Trino

Bilibili evolved its log platform from the ClickHouse‑based Billions 2.0 to the Billions 3.0 lakehouse built on Iceberg, HDFS, and Trino, retaining ClickHouse for acceleration; this reduces storage cost by over 20%, improves observability, solves the compute‑storage mismatch, adds flexible indexing, and supports complex ETL while staying open‑source.

ClickHouse · Iceberg · Indexing

0 likes · 36 min read

DataFunTalk

Jun 18, 2023 · Big Data

Evolution and Comparison of High‑Performance Cloud‑Native Lakehouse Storage Architecture: From HDFS to JuiceFS

This article examines the evolution of big‑data storage from on‑premise HDFS to cloud‑native object storage, compares their architectures and performance, outlines future lakehouse storage requirements, and demonstrates a practical implementation using the JuiceFS distributed file system.

Big Data · Cloud Native · HDFS

0 likes · 15 min read

DataFunSummit

Jun 13, 2023 · Big Data

Building a Sub‑Second Response Lakehouse Platform with Apache Iceberg at Bilibili

This article details Bilibili's implementation of a sub‑second response lakehouse platform using Apache Iceberg, covering background challenges, query acceleration techniques such as multi‑dimensional sorting, indexing, cube pre‑aggregation, and intelligent automated optimizations via the Magnus service, and reports current production metrics.

Cube · Iceberg · Lakehouse

0 likes · 14 min read

DataFunTalk

May 23, 2023 · Big Data

Building a Millisecond‑Response Lakehouse Platform with Apache Iceberg: Architecture, Query Acceleration, and Intelligent Optimization

This article details Bilibili's technical practice of constructing a millisecond‑response lake‑warehouse platform using Apache Iceberg, covering the background challenges, unified architecture, multi‑dimensional sorting and indexing for query acceleration, the Magnus service for intelligent optimization, and the current production deployment and performance metrics.

Big Data · Cube · Iceberg

0 likes · 14 min read

DataFunTalk

May 22, 2023 · Big Data

Alibaba Cloud Data Lake: Unified Metadata and Storage Management Practices

This article explains Alibaba Cloud's data lake architecture, unified metadata services, storage management optimizations, and format handling techniques, illustrating how lakehouse concepts, multi‑engine support, and lifecycle policies enable efficient, secure, and cost‑effective big data processing in the cloud.

Big Data · Cloud Services · Data Lake

0 likes · 22 min read

DataFunSummit

May 16, 2023 · Big Data

LakeSoul Joins LF AI & Data Foundation as an Open‑Source Cloud‑Native Lakehouse Framework

LakeSoul, China's only open‑source lakehouse project, has been donated to the LF AI & Data Foundation, becoming its first lake‑warehouse framework and offering ACID‑guaranteed high‑concurrency upserts, a high‑performance Rust‑based I/O layer, real‑time data‑warehouse capabilities, and seamless AI/BI integration for modern big‑data applications.

AI · Data Warehouse · LakeSoul

0 likes · 7 min read

DataFunSummit

Apr 30, 2023 · Big Data

Arctic: Efficient Management of Apache Iceberg Lakehouse Tables – Concepts, Practices, and Roadmap

This article introduces the Arctic lakehouse management system built on Apache Iceberg, explains Iceberg’s core principles, format versions, and real‑world implementations at NetEase, and details Arctic’s automated table optimization, governance workflows, and future development plans.

Apache Iceberg · Arctic · Data Governance

0 likes · 22 min read

Tongcheng Travel Technology Center

Apr 20, 2023 · Big Data

Apache Paimon in Practice: Replacing Hudi for Improved Write and Query Performance

Apache Paimon was adopted at Tongcheng Travel to replace Hudi, achieving three‑fold write speed gains and ten‑fold query acceleration, with detailed discussion of lakehouse challenges, performance issues, migration steps, configuration examples, and future plans for the platform.

Apache Paimon · Big Data · Flink

0 likes · 15 min read

DataFunTalk

Apr 13, 2023 · Big Data

Four Paradigms of StarRocks Lakehouse Integration and an Overview of StarRocks 3.0

This article explains why lake‑warehouse integration is needed, outlines its challenges, describes StarRocks' four integration paradigms—including query acceleration, layered modeling, real‑time warehouse‑lake fusion, and the cloud‑native 3.0 solution—and previews the upcoming StarRocks 3.0 release.

Big Data · Cloud Native · Data Lake

0 likes · 18 min read

StarRocks

Apr 7, 2023 · Databases

StarRocks 3.0 Highlights: Storage‑Compute Separation, New RBAC, and Lakehouse Features

StarRocks 3.0 introduces a storage‑compute separation architecture, a full‑featured RBAC permission framework, enhanced materialized views, Trino‑compatible query dialect, richer Primary‑Key update/delete syntax, automatic partition creation, and numerous performance optimizations, marking a major step from OLAP to lakehouse analytics.

Lakehouse · RBAC · StarRocks

0 likes · 10 min read

StarRing Big Data Open Lab

Mar 22, 2023 · Big Data

Why Lakehouse Architecture Is Revolutionizing Data Analytics: Hudi vs Iceberg

This article explains how the lakehouse integrated architecture combines data lake and data warehouse capabilities, outlines its key features, compares three implementation paths, and provides an in‑depth technical overview of Apache Hudi and Apache Iceberg for modern big‑data analytics.

Apache Hudi · Apache Iceberg · Data Lake

0 likes · 15 min read

DataFunTalk

Mar 21, 2023 · Databases

Design and Technical Details of Apache Doris for Lakehouse Architecture

This article explains how Apache Doris extends its real‑time OLAP capabilities to support Lakehouse architectures, covering unified metadata, query acceleration, elastic compute, performance benchmarks, and future roadmap for richer data‑source integration and resource isolation.

Apache Doris · Big Data · Data Warehouse

0 likes · 20 min read

Open Source Linux

Mar 14, 2023 · Big Data

Can Data Lakes and Data Warehouses Coexist? Exploring the Lake‑Warehouse Fusion

This article traces 20 years of big‑data evolution, compares data lakes and data warehouses, defines both concepts, examines their technical trade‑offs, and presents Alibaba Cloud’s lake‑warehouse (lakehouse) solution that unifies flexible storage with enterprise‑grade performance and governance.

Big Data · Cloud Computing · Data Lake

0 likes · 32 min read

DataFunSummit

Mar 10, 2023 · Big Data

Interview on Data Lake and Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase’s data‑lake technology manager explores the distinction between data lakes and lakehouses, the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, their maturity across key capabilities, and the practical adoption challenges faced by enterprises.

Data Lake · Delta Lake · Hudi

0 likes · 14 min read

ITPUB

Feb 22, 2023 · Databases

How Huawei’s GaussDB(DWS) 3.0 Redefines Cloud‑Native Data Warehousing

This article summarizes Wang Chuanting’s DTCC2022 talk on Huawei Cloud GaussDB(DWS) 3.0, detailing its cloud‑native architecture, layered elasticity, lake‑warehouse integration, performance acceleration techniques, and how it seamlessly couples data‑processing pipelines with AI workloads for modern, real‑time analytics.

AI integration · Cloud Native · Data Warehouse

0 likes · 16 min read

Huawei Cloud Developer Alliance

Feb 20, 2023 · Databases

How Modern Data Warehouses Evolve: Insights from Huawei’s GaussDB(DWS) Chief Architect

In this interview, Huawei Cloud’s chief architect Zeng Kai explains how data warehouses originated, outlines the five key fusion trends shaping their evolution, and reveals the innovative features of GaussDB(DWS) 3.0 that drive cloud‑native, real‑time, and AI‑integrated analytics.

AI integration · Data Warehouse · GaussDB

0 likes · 8 min read

DataFunTalk

Jan 28, 2023 · Big Data

Data Lake vs Data Warehouse: Differences, Evolution, and Integrated Lakehouse Design

This article explores the ongoing debate between data lakes and data warehouses, clarifies their distinct purposes and technologies, discusses how they can coexist or complement each other, and introduces the concept of an integrated lakehouse architecture while promoting a comprehensive data intelligence knowledge map.

Big Data · Data Lake · Data Warehouse

0 likes · 5 min read

ITPUB

Jan 26, 2023 · Big Data

How NetEase’s Arctic Unifies Streaming and Batch with Iceberg for Real‑Time Lakehouse

This article explains the challenges of a Lambda‑architecture data pipeline, introduces NetEase’s Arctic lakehouse built on Apache Iceberg, details its table‑store design, optimization cycles, consistency mechanisms, real‑time features, practical use cases, and future roadmap, highlighting its advantages over similar solutions.

Arctic · Data Integration · Flink

0 likes · 14 min read

Tencent Cloud Developer

Jan 3, 2023 · Big Data

How Tencent’s Cloud‑Native Lakehouse Tackles PB‑Scale Performance Challenges

This article analyzes Tencent Cloud’s DLC lakehouse solution, explaining the unified data lake‑warehouse architecture, the performance hurdles of object‑storage‑based analytics, and the multi‑dimensional caching, virtual‑cluster elasticity, and advanced filter techniques that enable second‑level analysis on petabyte‑scale data while reducing costs.

Big Data · Caching · DLC

0 likes · 13 min read

DataFunSummit

Dec 29, 2022 · Big Data

Understanding Lakehouse Systems: Architecture, Practices, and Innovations by Databricks

This article explains the Lakehouse concept, why it is needed, the limitations of traditional data warehouses and data lakes, and how Databricks’ unified architecture—through open storage formats, fine‑grained governance, and optimized query engines—delivers high‑quality, low‑latency data for BI, analytics, and machine learning workloads.

Cloud · Data Engineering · Databricks

0 likes · 21 min read

DataFunTalk

Dec 23, 2022 · Big Data

Building a Lakehouse on Alibaba Cloud AnalyticDB (ADB) with Apache Hudi: Architecture, Challenges, and Practices

This article presents a comprehensive technical overview of Alibaba Cloud AnalyticDB's Lakehouse edition, detailing its unified architecture, key advantages, the challenges of ingesting billions of records with Apache Hudi, and the engineering solutions—including Flink integration, hotspot mitigation, memory optimization, OSS throttling handling, concurrent write support, lifecycle management, and TableService—that enable a cost‑effective, high‑performance lake‑to‑warehouse platform.

Apache Hudi · Flink · Lakehouse

0 likes · 19 min read

DataFunTalk

Dec 8, 2022 · Big Data

Arctic: NetEase’s Real-Time Lakehouse System Built on Apache Iceberg

This article introduces NetEase’s Arctic, a real‑time lakehouse system built on Apache Iceberg that unifies streaming and batch processing, explains the challenges of Lambda architecture, details Arctic’s features such as change/base stores, hidden queue, transaction handling, and shares internal practice cases and future roadmap.

Apache Iceberg · Arctic · Data Lake

0 likes · 12 min read

Architects' Tech Alliance

Dec 7, 2022 · Big Data

Why Data Lakes and Data Warehouses Must Converge: The Rise of Lakehouse Architecture

This article traces 20 years of big‑data evolution, defines data lakes and data warehouses, compares their trade‑offs, and explains how lakehouse solutions—exemplified by Alibaba Cloud MaxCompute—merge flexibility with enterprise‑grade performance to lower total ownership cost.

Big Data Architecture · Cloud Data Platform · Data Lake

0 likes · 32 min read

StarRocks

Nov 4, 2022 · Big Data

Building a High‑Performance, Cost‑Effective Cloud Lakehouse with StarRocks and EMR

This article explains how to design and implement a cloud‑native Lakehouse using StarRocks and Tencent Cloud EMR, covering core technical requirements, a five‑layer architecture, data ingestion with Iceberg/Hudi, performance tricks like Z‑order clustering, cost‑control through elastic scaling, and the key product features of EMR StarRocks.

Big Data · Cloud Computing · EMR

0 likes · 24 min read

Bilibili Tech

Sep 30, 2022 · Big Data

Bilibili's Efficient Lakehouse Platform Built on Trino and Iceberg

Bilibili’s new lake‑house platform, built on Trino and Iceberg, replaces Hive‑based pipelines by ingesting logs and DB data into Iceberg tables, applying advanced sorting, Z‑order/Hilbert clustering, bitmap and bloom indexes, virtual join columns and pre‑aggregation, enabling 70 000 daily queries on 2 PB with average scans of 2 GB and sub‑2‑second response times.

Big Data · Data Skipping · Iceberg

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Sep 13, 2022 · Big Data

From Hadoop to Cloud‑Native: The Evolution of Data Lakes and Modern Architecture

This article traces the history of data lakes from their 2010 inception with Hadoop through cloud‑native object storage, lakehouse formats like Delta Lake, and Alibaba Cloud's multi‑layer solution, outlining key architectural stages and practical construction challenges for enterprise‑grade implementations.

Alibaba Cloud · Big Data · Cloud Native

0 likes · 9 min read

Tencent Cloud Developer

Sep 9, 2022 · Big Data

Data Lake, Data Warehouse, and Lakehouse: Concepts, Architectures, and Industry Practices

The article explains how data lakes excel at ingesting massive, varied data, data warehouses optimize storage and query performance, and lake‑house architectures combine both strengths—offering scalable, low‑cost storage with high‑speed analytics—highlighting industry solutions from Snowflake, Databricks, and major cloud providers.

Analytics · Big Data · Data Lake

0 likes · 8 min read

Shopee Tech Team

Sep 2, 2022 · Big Data

Shopee Data System Challenges and Apache Hudi Practices

Shopee tackled its data‑system bottlenecks by customizing Apache Hudi to provide unified stream‑batch integration, efficient state‑detail snapshots, and low‑latency wide‑table generation, using CDC‑based bootstrapping, COW/MOR tables, savepoints and partial updates, which cut latency to ten minutes, lowered resource use, and yielded several community‑backed enhancements.

Apache Hudi · Big Data · Data Integration

0 likes · 18 min read

Past Memory Big Data

Aug 11, 2022 · Big Data

What Kind of Data Lake Do Enterprises Really Need? Lessons from Delta 2.0

The article examines the open‑source release of Delta 2.0, compares its features and benchmark results with Iceberg and Hudi, discusses the core capabilities required by enterprises for a lakehouse architecture, and introduces the Arctic project as a multi‑engine streaming lake service.

Arctic · Data Lake · Delta Lake

0 likes · 25 min read

DataFunTalk

Aug 10, 2022 · Big Data

Delta Lake 2.0, Iceberg, Hudi: A Comparative Study and the Arctic Lakehouse Service

The article reviews recent developments in data‑lake table formats—Delta Lake 2.0, Iceberg, and Hudi—examining their features, benchmark results, and ecosystem impact, and then introduces Arctic, an open‑source streaming lakehouse service built on Iceberg that aims to bridge batch‑stream gaps for enterprises.

Data Lake · Delta Lake · Hudi

0 likes · 24 min read

DataFunTalk

Aug 5, 2022 · Big Data

Delta Lake Principles, eBay Migration, and Practical Enhancements

This talk by eBay software engineer Zhu Feng explains the fundamentals of Delta Lake and Lakehouse architecture, outlines eBay’s migration from Teradata to a Spark‑based platform, and details the custom enhancements, performance optimizations, and operational improvements implemented to support large‑scale update and delete workloads.

Data Lake · Delta Lake · Lakehouse

0 likes · 16 min read

DataFunTalk

Aug 1, 2022 · Big Data

Bilibili Lakehouse Integration: Iceberg and Alluxio Optimization Practices

This article details Bilibili's lakehouse implementation using Apache Iceberg and Alluxio, covering background challenges, architectural components, data organization techniques like Z‑order and bitmap indexes, performance benchmarks, and future optimization plans for large‑scale analytics.

Alluxio · Bitmap Index · Iceberg

0 likes · 21 min read

Past Memory Big Data

Jul 22, 2022 · Big Data

Choosing Modern Data Architecture: Data Fabric vs. Data Mesh

The article compares Data Fabric and Data Mesh as modern data‑architecture approaches, explains their technical and organizational differences, discusses the ongoing debate between data lakes, warehouses, and lakehouses, and highlights how each option fits varying data‑type and usage scenarios.

Data Architecture · Data Fabric · Data Lake

0 likes · 4 min read

DataFunTalk

Jul 15, 2022 · Big Data

Lakehouse Architecture at Bilibili: Query Acceleration and Index Enhancement Practices

This article explains Bilibili's lake‑warehouse integrated architecture, describing how Iceberg, Magnus, Trino, and Alluxio are used to achieve flexible data storage, high‑performance query acceleration, and automated indexing through Z‑Order, Hilbert curve, Bloom filter, and advanced BitMap techniques.

Big Data · Data Warehouse · Iceberg

0 likes · 18 min read

High Availability Architecture

Jul 7, 2022 · Big Data

Interview with Tencent Cloud’s Zhang Zhigang on Lakehouse Architecture and Cloud‑Native Integration

In this interview, Tencent Cloud expert Zhang Zhigang explains the fundamentals and key technologies of lakehouse architecture, discusses how cloud‑native practices enhance its performance and operability, and offers practical advice for big‑data professionals ahead of the 2022 GIAC Global Internet Architecture Conference in Shenzhen.

Cloud Native · Data Architecture · Lakehouse

0 likes · 10 min read

Baidu Geek Talk

Jul 1, 2022 · Big Data

Evolution of Data Platform Technology: From Data Warehouse to Lakehouse Architecture

The article traces the evolution of data platforms from early data warehouses—using schema‑on‑write, columnar storage, and MPP engines—to data lakes that retain raw data with schema‑on‑read, and finally to lakehouse architectures that merge storage and compute, offering unified metadata, versioning, and support for BI, big‑data, AI, and HPC workloads.

Data Architecture · Lakehouse · OLAP

0 likes · 25 min read

Alibaba Cloud Developer

Jun 17, 2022 · Big Data

Unlocking Delta Lake: Key Features, Architecture, and EMR Integration

This article introduces Delta Lake as an open‑source lakehouse storage framework, explains its core features, file and metadata structures, details Alibaba Cloud EMR's enhancements and deep integration with DLF, and presents G‑SCD and CDC solutions for real‑time incremental data warehousing.

CDC · DLF · Delta Lake

0 likes · 11 min read

DataFunSummit

May 30, 2022 · Big Data

Lakehouse Architecture at Bilibili: Query Acceleration and Index Enhancement Practices

This article explains Bilibili's lake‑warehouse integrated architecture, describing how Iceberg, Z‑Order sorting, and advanced indexing techniques such as BloomFilter and BitMap are used to accelerate queries and improve data organization in large‑scale analytics workloads.

Big Data · Data Lake · Data Warehouse

0 likes · 18 min read

dbaplus Community

May 21, 2022 · Big Data

5 Trends for 2022: Analytics Engineers, Lakehouse Wars, Real‑Time Pipelines, Cloud Market

The article outlines five major 2022 data trends— the rise of analytics engineers, the intensifying lake‑house competition, the growth of real‑time streaming pipelines and operational analytics, the expanding cloud marketplaces for data tools, and the push toward unified data‑quality terminology—explaining their origins, market impact, and future outlook.

Data Engineering · Data Quality · Lakehouse

0 likes · 21 min read

DataFunTalk

May 17, 2022 · Big Data

Exploring JuiceFS in Data Lake Storage Architecture

This presentation provides a comprehensive overview of JuiceFS, an open‑source cloud‑native distributed file system, detailing its role in modern data lake and lakehouse architectures, comparing it with HDFS and object storage, and highlighting its performance, integration, and community ecosystem.

Big Data · Data Lake · Distributed File System

0 likes · 19 min read

HomeTech

Apr 27, 2022 · Big Data

AutoStream Real‑Time Computing Platform: Architecture, Resource Management, Scaling, Lakehouse Integration, and PyFlink Practices

This article details Autohome's AutoStream platform evolution from Storm to Flink‑based versions, covering real‑time application scenarios, strict budget‑controlled resource management, automatic scaling, lake‑house architecture with Iceberg, PyFlink integration, and future plans for resource optimisation and batch‑stream unification.

AutoStream · Flink · Lakehouse

0 likes · 15 min read

Alibaba Cloud Developer

Apr 18, 2022 · Big Data

What Is Delta Lake? A Deep Dive into the Lakehouse Evolution and Features

This article explains the evolution from traditional data warehouses to data lakes and the modern Lakehouse architecture, introduces Delta Lake's core concepts, multi‑hop medallion tables, ACID transactions, generated columns, standalone support, and future open‑source directions.

Data Management · Delta Lake · Generated Columns

0 likes · 13 min read

DataFunTalk

Apr 7, 2022 · Big Data

Apache Kyuubi: Architecture, Use Cases, Community, and Mobile Cloud Deployment

This article introduces Apache Kyuubi—a multi‑tenant Thrift JDBC/ODBC service built on Spark—detailing its architecture, advantages over Spark Thrift Server, real‑world use cases, open‑source community progress, and practical deployment strategies on mobile cloud, Kubernetes, and with Trino.

Apache Spark · Big Data · Kyuubi

0 likes · 16 min read
