Tag

Data Lakehouse


Baidu Tech Salon
Nov 11, 2024 · Cloud Native

Baidu Cloud Native Data Platform: Empowering Enterprise AI in the LLM Era

To empower enterprise AI in the LLM era, Baidu Cloud unveils a cloud‑native data platform featuring upgraded databases—PegaDB, GaiaDB 5.0, Vector DB 2.0, Palo 2.0—and integrated services like DBSC 2.0, EDAP 2.0, and DBStack, delivering high‑performance, cost‑effective handling of structured, unstructured, and vector data for fine‑tuning and Enterprise RAG.

DBStack · Data Lakehouse · EDAP
10 min read
Wukong Talks Architecture
Jul 23, 2024 · Databases

An Overview of StarRocks: Architecture, Features, and Performance Benchmarks

StarRocks, an open‑source, high‑performance MPP analytical database under the Linux Foundation, offers a vectorized execution engine, a cost‑based optimizer (CBO), materialized views, and storage‑compute separation. It integrates with BI tools and data lakes and, in the benchmark tests the article reports, returns queries faster than ClickHouse, Druid, and Trino.

Analytical Database · Data Lakehouse · MPP
10 min read
DataFunSummit
Mar 17, 2024 · Big Data

OPPO Smart Data Lakehouse: Architecture, Real‑time Lakehouse, and Technical Practices

This article presents OPPO's smart data lakehouse solution, describing its exabyte‑scale (EB) architecture, the integration of batch and streaming engines, the Glacier service for table management, schema‑adaptive ingestion, performance optimizations, and the technical roadmap for unified data processing.

Big Data · Data Lakehouse · Flink
15 min read
DataFunTalk
Feb 3, 2024 · Big Data

Alluxio: Introduction, Architecture, and Practical Experience for Big Data Construction

This article introduces Alluxio as an open‑source data orchestration layer, explains its architecture and core features such as unified namespace, caching strategies, and cloud‑native deployment, and shares practical experiences on using Alluxio to simplify data lakehouse construction, migration, and hot‑cold data separation in complex big‑data environments.

Alluxio · Big Data · Data Lakehouse
13 min read
vivo Internet Technology
Dec 13, 2023 · Big Data

Hudi Data Lake Implementation and Optimization Practice at vivo

vivo's big‑data team deployed Apache Hudi to build a lakehouse that unifies streaming and batch workloads. The platform uses both COW (copy‑on‑write) and MOR (merge‑on‑read) tables, automates small‑file clustering and compaction, and applies version, streaming, batch, and lifecycle optimizations, delivering minute‑level latency, ingestion on the order of a hundred million records per minute, and queries up to 20% faster than Hive.

Apache Hudi · Big Data · Data Lakehouse
11 min read
DataFunTalk
Oct 13, 2023 · Big Data

Design Principles, Architecture, and Applications of the Open‑Source LakeSoul Lakehouse Framework

This article provides a comprehensive technical overview of LakeSoul, an open‑source, cloud‑native lakehouse framework, covering its design philosophy, core features, architecture, performance benchmarks, real‑time ingestion, incremental computation, multi‑stream joining, security, community progress, and future roadmap.

Big Data · Data Lakehouse · Flink
16 min read
DataFunTalk
Sep 13, 2023 · Big Data

Design and Implementation of a Lakehouse Data Platform Based on Apache Hudi at Taikang Life Insurance

This article details Taikang Life Insurance's end‑to‑end technical selection, architecture design, implementation, and custom enhancements of an Apache Hudi‑driven lakehouse platform for large‑scale health‑insurance data, covering background, component evaluation, performance benchmarking, multi‑layer architecture, and real‑world results.

Apache Hudi · Big Data · Data Lakehouse
44 min read
DataFunSummit
Aug 4, 2023 · Big Data

LakeSoul: An Open‑Source Real‑Time Data Lakehouse Framework – Design, Architecture, Benchmarks and Future Roadmap

This article introduces LakeSoul, an open‑source end‑to‑end real‑time lakehouse framework, detailing its design philosophy, key technologies such as ELT, metadata management, upsert and merge‑on‑read capabilities, performance benchmarks, real‑world use cases, and the roadmap for future enhancements.

Big Data · Data Lakehouse · ELT
18 min read
DataFunTalk
Apr 10, 2023 · Big Data

Interview on Data Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase data‑lake technology manager Ma Jin explains the distinction between data lakes and lakehouses, reviews the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, evaluates feature maturity and performance trade‑offs, and discusses systematic versus non‑systematic adoption in enterprises.

Big Data · Data Lakehouse · Delta Lake
13 min read
Shopee Tech Team
Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Data Lakehouse
20 min read