Tag

MPP

Baidu Geek Talk
Jun 9, 2025 · Databases

How BaikalDB Tackles OLAP Challenges with Vectorized and MPP Engines

BaikalDB, Baidu's distributed storage system, evolves from an OLTP‑focused engine to a hybrid HTAP architecture by introducing a vectorized query engine and a massively parallel processing (MPP) layer, addressing compute and resource bottlenecks for large‑scale analytical workloads while preserving transactional guarantees.

BaikalDB · Database Architecture · HTAP
0 likes · 18 min read
Shopee Tech Team
Oct 25, 2024 · Big Data

StarRocks at Shopee: Practical Use Cases and Performance Analysis

Shopee’s deployment of StarRocks across DataService, DataGo, and DataStudio demonstrates that its vectorized engine, cost‑based optimizer, and materialized‑view caching can query Hive, Iceberg, Delta Lake and Hudi up to 20,000× faster than Presto, cutting CPU usage and delivering consistently lower latency for complex analytics.

Hive · MPP · Performance Benchmark
0 likes · 11 min read
Wukong Talks Architecture
Jul 23, 2024 · Databases

An Overview of StarRocks: Architecture, Features, and Performance Benchmarks

StarRocks, an open‑source, high‑performance MPP analytical database under the Linux Foundation, offers a vectorized engine, a cost‑based optimizer (CBO), materialized views, and storage‑compute separation, integrates with BI tools and data lakes, and demonstrates superior query speed in benchmark tests against ClickHouse, Druid, and Trino.

Analytical Database · Data Lakehouse · MPP
0 likes · 10 min read
Tencent Cloud Developer
Jul 11, 2024 · Databases

LibraDB Execution Engine Architecture Evolution and Optimization

LibraDB, the column‑store replica of TDSQL MySQL, has evolved its execution engine from a simple scatter‑gather model to a vectorized SMP pipeline that integrates MPP parallelism, asynchronous I/O, SIMD‑accelerated aggregation and join operators, work‑stealing, and runtime filters, thereby fully exploiting CPU, memory, network and disk resources for both OLTP and analytical queries.

Database · Execution Engine · Hash Join
0 likes · 22 min read
Baidu Geek Talk
Apr 10, 2024 · Big Data

TDA: A One‑Stop Self‑Service BI Platform – Architecture, Challenges, and Solutions

The article presents Turing Data Analysis (TDA), a self‑service BI platform that replaces fragile traditional pipelines with a unified DWD‑based data model, drag‑and‑drop analytics, multi‑engine query optimization and caching, delivering sub‑10‑second queries on billions of rows, fine‑grained permissions, and rapid dashboard creation, while reporting significant usage growth and outlining AI‑driven future enhancements.

BI · Big Data · MPP
0 likes · 15 min read
DataFunSummit
May 27, 2023 · Big Data

Building and Practicing the Performance Assurance System of YouShu BI

This article presents an in‑depth overview of the YouShu BI product, outlines the high‑concurrency performance challenges faced by enterprise BI, and details the multi‑layer performance architecture—including front‑end, back‑end, data engine, and data source layers—along with smart caching, MPP acceleration, materialized views, and the Data Doctor operations that together ensure low‑latency, reliable analytics for large‑scale users.

BI · MPP · Materialized Views
0 likes · 16 min read
DataFunTalk
May 6, 2023 · Databases

Apache Doris: Overview, Data Lake Analysis Architecture, Community Development and Future Roadmap

This article provides a comprehensive overview of Apache Doris, detailing its origins, MPP‑based analytical capabilities, data‑lake integration techniques, recent architectural enhancements, performance optimizations, community growth, and upcoming development plans, while also addressing common user questions.

Analytical Database · Apache Doris · Big Data
0 likes · 20 min read
DataFunSummit
Dec 10, 2022 · Databases

StarRocks in the Modern Data Stack: Architecture Evolution, Typical Applications, and Performance Insights

This article presents a comprehensive overview of StarRocks within the modern data stack, covering the evolution of MPP architectures, typical industry use cases, core features, performance benchmark comparisons, real‑time data‑warehouse construction methods, CDP and lakehouse analytics, as well as short‑term roadmap plans and a brief Q&A.

CDP · Data Warehouse · Lakehouse
0 likes · 11 min read
Big Data Technology Architecture
Aug 13, 2022 · Big Data

Apache Doris at Xiaomi: Architecture Evolution, Performance Optimizations, and Production Practices

This article details Xiaomi's three‑year journey of adopting Apache Doris across dozens of internal services, describing the transition from a Spark‑SQL‑based Lambda architecture to a unified MPP database, performance benchmarks, data ingestion pipelines, compaction tuning, two‑phase commit, single‑replica writes, monitoring, and community contributions.

Apache Doris · Compaction · Data Warehouse
0 likes · 19 min read
DataFunTalk
Aug 2, 2022 · Databases

Apache Doris 1.0: Features, Architecture, Performance Improvements and Future Roadmap

This article introduces Apache Doris 1.0, detailing its simplified architecture, high‑concurrency support, MPP execution engine, vectorized engine, memory‑controlled stability, multi‑source integration, upcoming lake‑house unification, storage‑compute separation, real‑time ingestion, and community growth.

Analytical Database · Apache Doris · Big Data
0 likes · 18 min read
Baidu Geek Talk
Jul 1, 2022 · Big Data

Evolution of Data Platform Technology: From Data Warehouse to Lakehouse Architecture

The article traces the evolution of data platforms from early data warehouses (schema‑on‑write, columnar storage, and MPP engines) to data lakes that retain raw data with schema‑on‑read, and finally to lakehouse architectures that merge the two, offering unified metadata, versioning, and support for BI, big‑data, AI, and HPC workloads.

Data Warehouse · Lakehouse · MPP
0 likes · 25 min read
DataFunSummit
May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data · ClickHouse · Data Warehouse
0 likes · 15 min read
DataFunSummit
Feb 20, 2022 · Databases

Understanding TiDB Architecture and Real‑Time Application Scenarios

This article explains TiDB's HTAP architecture, covering industry challenges, the row‑store TiKV and column‑store TiFlash design, MPP integration in TiDB 5.0, and a range of real‑time use cases such as dashboards, reporting, and data‑warehouse pipelines.

Database Architecture · HTAP · MPP
0 likes · 16 min read
Tencent Architect
Dec 10, 2021 · Databases

How a Cloud‑Native MPP Query Layer Turns ClickHouse into a Snowflake‑Like Data Warehouse

This article explains the design and implementation of a cloud‑native MPP query layer for ClickHouse, detailing its architecture, core features, execution flow, performance advantages, SQL compatibility, and future development plans to create a high‑performance, multi‑source OLAP data platform.

ClickHouse · Data Warehouse · MPP
0 likes · 13 min read
Big Data Technology Architecture
Jun 4, 2021 · Big Data

Types of OLAP Data Warehouses and Performance Optimization Techniques

This article explains the various classifications of OLAP data warehouses—including MOLAP, ROLAP, HOLAP, and HTAP—based on data volume and modeling, reviews common open‑source ROLAP products, and details performance‑boosting techniques such as MPP architecture, cost‑based optimization, vectorized execution, and storage optimizations.

Big Data · Cost-Based Optimization · Data Warehouse
0 likes · 27 min read
DataFunTalk
Mar 24, 2021 · Big Data

Practical Experience of Using DorisDB for Real-Time and Offline Analytics in KuJiaLe's Big Data Platform

This article details how KuJiaLe's big data team replaced their legacy ADB and Presto clusters with a DorisDB MPP database, achieving sub‑second query latency, unified real‑time and offline analytics, simplified ETL pipelines, and significant cost savings while supporting billion‑row tables and high‑QPS workloads.

Big Data · Data Warehouse · DorisDB
0 likes · 9 min read
DataFunTalk
Nov 23, 2020 · Big Data

Choosing OLAP Solutions for Large-Scale Data at Youku

The article examines the challenges big data brings to traditional technologies and surveys major OLAP solutions—MPP, batch processing, and pre‑computation—including Greenplum, Druid, Kylin, and Hadoop‑based engines, then outlines Youku’s specific use‑case selections for real‑time APIs, BI reporting, and ad‑hoc analysis.

Big Data · Data Engineering · MPP
0 likes · 13 min read
DataFunSummit
Nov 12, 2020 · Big Data

OLAP Engine Selection and Challenges in Large-Scale Data at Youku

This article explores the challenges big data brings to traditional data technologies and reviews various OLAP solutions—including MPP, batch processing, pre‑computation, and Hadoop‑based engines—while detailing Youku’s specific business scenarios and how different OLAP engines are selected to meet performance, scalability, and real‑time analysis requirements.

Big Data · Data Warehouse · MPP
0 likes · 14 min read