Tagged articles
17 articles
StarRocks
Jul 16, 2025 · Cloud Native

Build a Decoupled Storage‑Compute Data Platform with StarRocks and MinIO

This step‑by‑step tutorial shows how to deploy StarRocks and MinIO in a decoupled storage‑compute architecture using Docker Compose and Kubernetes, configure local caching, create storage volumes, load public datasets, and run SQL queries to explore the combined data.

Data Lakehouse · Decoupled Storage · Docker Compose
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Jul 4, 2025 · Big Data

From Real-Time Data Analytics to Real-Time AI: Flink Forward Asia 2025 Highlights

The Flink Forward Asia 2025 conference in Singapore showcased Apache Flink's evolution with new AI‑driven projects such as Flink Agents, the integration of AI Functions in Flink 2.1, the disaggregated state management architecture of Flink 2.0, and complementary lakehouse technologies like Paimon and Fluss, underscoring the platform's role as the real‑time backbone for modern AI applications.

Apache Flink · Data Lakehouse · Disaggregated State Management
0 likes · 9 min read
Baidu Tech Salon
Nov 11, 2024 · Cloud Native

Baidu Cloud Native Data Platform: Empowering Enterprise AI in the LLM Era

To empower enterprise AI in the LLM era, Baidu Cloud unveils a cloud‑native data platform featuring upgraded databases—PegaDB, GaiaDB 5.0, Vector DB 2.0, Palo 2.0—and integrated services like DBSC 2.0, EDAP 2.0, and DBStack, delivering high‑performance, cost‑effective handling of structured, unstructured, and vector data for fine‑tuning and Enterprise RAG.

DBStack · Data Lakehouse · EDAP
0 likes · 10 min read
StarRocks
Jul 24, 2024 · Big Data

Why Lakehouse Architecture Is Redefining Big Data Infrastructure in the AI Era

The article examines the rapid rise of lakehouse architecture, its market momentum, and its core components (storage, metadata, table formats, and compute layers); compares Iceberg, Hudi, and Delta Lake; discusses the shift from HDFS to object storage; and outlines the strategic importance of lakehouses for AI-driven data management and future data infrastructure trends.

AI · Apache Iceberg · Big Data
0 likes · 28 min read
Wukong Talks Architecture
Jul 23, 2024 · Databases

An Overview of StarRocks: Architecture, Features, and Performance Benchmarks

StarRocks, an open‑source, high‑performance MPP analytical database under the Linux Foundation, offers vectorized engines, a cost‑based optimizer (CBO), materialized views, and storage‑compute separation, integrates with BI tools and data lakes, and demonstrates superior query speed in benchmark tests against ClickHouse, Druid, and Trino.

Analytical Database · Data Lakehouse · MPP
0 likes · 10 min read
DataFunTalk
Feb 3, 2024 · Big Data

Alluxio: Introduction, Architecture, and Practical Experience for Big Data Construction

This article introduces Alluxio as an open‑source data orchestration layer, explains its architecture and core features such as unified namespace, caching strategies, and cloud‑native deployment, and shares practical experiences on using Alluxio to simplify data lakehouse construction, migration, and hot‑cold data separation in complex big‑data environments.

Alluxio · Big Data · Data Lakehouse
0 likes · 13 min read
vivo Internet Technology
Dec 13, 2023 · Big Data

Hudi Data Lake Implementation and Optimization Practice at vivo

vivo's big‑data team deployed Apache Hudi to create a lakehouse that unifies streaming and batch workloads, leverages COW and MOR storage modes, automates small‑file clustering and compaction, and applies extensive version, streaming, batch, and lifecycle optimizations, delivering minute‑level latency, ingestion of hundreds of millions of records per minute, and query speeds up to 20% faster than Hive.

Apache Hudi · Batch Processing · Big Data
0 likes · 11 min read
DataFunTalk
Oct 13, 2023 · Big Data

Design Principles, Architecture, and Applications of the Open‑Source LakeSoul Lakehouse Framework

This article provides a comprehensive technical overview of LakeSoul, an open‑source, cloud‑native lakehouse framework, covering its design philosophy, core features, architecture, performance benchmarks, real‑time ingestion, incremental computation, multi‑stream joining, security, community progress, and future roadmap.

Big Data · Data Lakehouse · Flink
0 likes · 16 min read
DataFunTalk
Sep 13, 2023 · Big Data

Design and Implementation of a Lakehouse Data Platform Based on Apache Hudi at Taikang Life Insurance

This article details Taikang Life Insurance's end‑to‑end technical selection, architecture design, implementation, and custom enhancements of an Apache Hudi‑driven lakehouse platform for large‑scale health‑insurance data, covering background, component evaluation, performance benchmarking, multi‑layer architecture, and real‑world results.

Apache Hudi · Big Data · Data Governance
0 likes · 44 min read
DataFunSummit
Aug 4, 2023 · Big Data

LakeSoul: An Open‑Source Real‑Time Data Lakehouse Framework – Design, Architecture, Benchmarks and Future Roadmap

This article introduces LakeSoul, an open‑source end‑to‑end real‑time lakehouse framework, detailing its design philosophy, key technologies such as ELT, metadata management, upsert and merge‑on‑read capabilities, performance benchmarks, real‑world use cases, and the roadmap for future enhancements.

Big Data · Data Lakehouse · ELT
0 likes · 18 min read
DataFunTalk
Apr 10, 2023 · Big Data

Interview on Data Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase data‑lake technology manager Ma Jin explains the distinction between data lakes and lakehouses, reviews the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, evaluates feature maturity and performance trade‑offs, and discusses systematic versus non‑systematic adoption in enterprises.

Big Data · Data Lakehouse · Delta Lake
0 likes · 13 min read
Volcano Engine Developer Services
Sep 21, 2022 · Big Data

Unlocking Enterprise Data Lakehouse: Trends, Challenges, and Volcano Engine EMR Solutions

This article explores the open‑source lakehouse trend, outlines the architectural features of Volcano Engine EMR, examines key challenges of building enterprise‑grade data lakehouses, and presents best‑practice case studies demonstrating how EMR enables scalable, real‑time analytics, storage‑compute separation, and seamless integration with modern big‑data engines.

Data Lakehouse · EMR · Storage Compute Separation
0 likes · 22 min read
Alibaba Cloud Developer
May 13, 2022 · Big Data

Unlocking Delta Lake: Key Features, Architecture, and EMR Integration

Delta Lake, an open‑source storage layer from Databricks, provides ACID transactions, data versioning, schema evolution, and unified batch‑stream processing, with a detailed file structure and metadata mechanism, while Alibaba Cloud EMR enhances it with advanced DML, performance optimizations, deep DLF integration, and solutions for G‑SCD and CDC.

CDC · DLF · Data Lakehouse
0 likes · 11 min read
Shopee Tech Team
Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Batch Processing
0 likes · 20 min read
Zhengcaiyun Technology (政采云技术)
Feb 8, 2022 · Industry Insights

Unlocking Enterprise Value with a Data Middle Platform: Architecture & Indicators

This article traces the evolution from traditional data warehouses to modern data lakes and data middle platforms, explains why siloed data development hampers efficiency, and details the architecture and indicator‑library design used by Zhengcaiyun to achieve unified, reusable data services.

Big Data · Data Governance · Data Lakehouse
0 likes · 14 min read