Tagged articles

MPP

57 articles · Page 1 of 1

Jun 28, 2026 · Databases

Returning to Nanda Tongyong After 15 Years: A Dual Reunion of Time and Technology

Fifteen years after an internship at Nanda Tongyong, the author returns as an external expert to reflect on personal growth and on the evolution of the GBase product line, from the early MPP database to AI‑native HTAP solutions, highlighting technical innovations, benchmark results, and the broader impact on China's domestic database ecosystem.

AI · Cloud Data Warehouse · Database

0 likes · 17 min read

Tech Freedom Circle

Sep 1, 2025 · Databases

How ClickHouse Executes GROUP BY and Handles Real‑Time Analytics on Billions of Rows

This article explains ClickHouse’s core architecture—including its storage‑compute integration, MPP parallelism, columnar storage, vectorized execution, data pre‑sorting, table engines, sparse and auxiliary indexes, and the two‑stage aggregation pipeline—then walks through the exact GROUP BY execution flow for both local and distributed tables, illustrating each step with diagrams, SQL demos, and code snippets.
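The two‑stage aggregation idea works like this: each shard runs a partial aggregation, and the initiating node then merges the results. The toy Python sketch below illustrates it. The shard layout, the function names, and the use of (count, sum) states for an average are assumptions made for illustration. They are not ClickHouse internals.

```python
# Toy model of two-stage (partial -> final) GROUP BY aggregation.
# Each shard is a hypothetical list of (group_key, value) rows held by one node.

def partial_aggregate(rows):
    """Stage 1: each node reduces its own rows to per-key (count, sum) states."""
    states = {}
    for key, value in rows:
        cnt, total = states.get(key, (0, 0))
        states[key] = (cnt + 1, total + value)
    return states

def merge_partials(partials):
    """Stage 2: the initiating node merges small intermediate states, not raw rows."""
    merged = {}
    for states in partials:
        for key, (cnt, total) in states.items():
            m_cnt, m_total = merged.get(key, (0, 0))
            merged[key] = (m_cnt + cnt, m_total + total)
    return merged

shards = [
    [("cn", 3), ("us", 5), ("cn", 2)],
    [("us", 1), ("de", 4)],
]
result = merge_partials([partial_aggregate(s) for s in shards])
# Finalize: turn merged states into averages only after all shards are combined.
averages = {k: total / cnt for k, (cnt, total) in result.items()}
```

The shards ship (count, sum) states rather than finished averages. Averages computed on separate shards cannot be combined correctly, but their intermediate states can.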

ClickHouse · Columnar Storage · Distributed Query

0 likes · 29 min read

Baidu Geek Talk

Jun 9, 2025 · Databases

How BaikalDB Tackles OLAP Challenges with Vectorized and MPP Engines

BaikalDB, Baidu's distributed storage system, evolves from an OLTP‑focused engine to a hybrid HTAP architecture by introducing a vectorized query engine and a massively parallel processing (MPP) layer, addressing compute and resource bottlenecks for large‑scale analytical workloads while preserving transactional guarantees.

BaikalDB · Database Architecture · HTAP

0 likes · 18 min read

Tencent Cloud Developer

Nov 1, 2024 · Databases

How TDSQL Dominated Global OLAP & OLTP Benchmarks: Inside the Technical Secrets

Tencent Cloud's TDSQL shattered world records in both TPC‑DS (OLAP) and TPC‑C (OLTP) benchmarks, achieving a 7260 M QphDS score at a cost of 37.52 CNY/kQphDS, and the article explains the three self‑developed technologies—MPP execution, parallel execution framework, and columnar‑vectorized engine—that made this performance possible.

Columnar Storage · Database Performance · MPP

0 likes · 7 min read

Shopee Tech Team

Oct 25, 2024 · Big Data

StarRocks at Shopee: Practical Use Cases and Performance Analysis

Shopee’s deployment of StarRocks across DataService, DataGo, and DataStudio demonstrates that its vectorized engine, cost‑based optimizer, and materialized‑view caching can query Hive, Iceberg, Delta Lake and Hudi up to 20,000× faster than Presto, cutting CPU usage and delivering consistently lower latency for complex analytics.

Data Lake · Hive · MPP

0 likes · 11 min read

Senior Tony

Sep 19, 2024 · Databases

Why ClickHouse Outperforms MySQL: Deep Dive into Architecture and Benchmarks

This article compares ClickHouse and MySQL by examining benchmark results, MPP architecture, columnar storage, compression techniques, vectorized execution, and index designs, showing why ClickHouse delivers dramatically higher query performance on massive data sets.

ClickHouse · Columnar Storage · Databases

0 likes · 8 min read

ITPUB

Sep 11, 2024 · Big Data

Is Storage‑Compute Separation the Future? Unpacking the Lakehouse Debate

The article examines the concepts of storage‑compute separation and the lake‑warehouse (lakehouse) model, tracing their evolution from physical Hadoop clusters to containerized compute and object storage, and argues that true separation requires MPP systems to adopt open standards, effectively merging lake and warehouse architectures.

Big Data Architecture · Hadoop · Lakehouse

0 likes · 7 min read

StarRocks

Aug 9, 2024 · Big Data

How Pinterest Cut Query Latency by 50% with StarRocks Migration

Pinterest migrated its Partner Insights analytics from Druid to StarRocks, achieving a 50% reduction in p90 latency, a six‑fold cost‑performance improvement, and simplified data ingestion, illustrating the benefits of a modern MPP database for real‑time ad analytics.

Analytics · MPP · Pinterest

0 likes · 6 min read

Wukong Talks Architecture

Jul 23, 2024 · Databases

An Overview of StarRocks: Architecture, Features, and Performance Benchmarks

StarRocks, an open‑source, high‑performance MPP analytical database under the Linux Foundation, offers a vectorized engine, a cost‑based optimizer (CBO), materialized views, and storage‑compute separation. It integrates with BI tools and data lakes and shows faster query speeds than ClickHouse, Druid, and Trino in the benchmark tests presented.

Data Lakehouse · MPP · Performance Benchmark

0 likes · 10 min read

Tencent Cloud Developer

Jul 11, 2024 · Databases

LibraDB Execution Engine Architecture Evolution and Optimization

LibraDB, the column‑store replica of TDSQL MySQL, has evolved its execution engine from a simple scatter‑gather model to a vectorized SMP pipeline that integrates MPP parallelism, asynchronous I/O, SIMD‑accelerated aggregation and join operators, work‑stealing, and runtime filters, thereby fully exploiting CPU, memory, network and disk resources for both OLTP and analytical queries.
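A runtime filter is a summary of the build side's join keys. The engine pushes it down to the probe‑side scan so that non‑matching rows are discarded before they reach the join. The minimal Python sketch below shows the idea. It assumes a (min, max, exact key set) filter and tuple rows whose first field is the join key. LibraDB's actual filter types and pushdown rules are not described in the summary, so this only illustrates the general technique.

```python
# Toy hash join with a runtime filter pushed from the build side to the probe scan.

def build_runtime_filter(build_keys):
    """Summarize build-side join keys as (min, max, key set). Production engines
    typically use compact structures such as Bloom filters instead of an exact set."""
    keys = set(build_keys)
    if not keys:
        return None
    return min(keys), max(keys), keys

def scan_with_filter(probe_rows, rf):
    """Drop probe rows that cannot match before they reach the join operator."""
    if rf is None:  # empty build side: nothing can match
        return
    lo, hi, keys = rf
    for row in probe_rows:
        key = row[0]
        if lo <= key <= hi and key in keys:  # cheap range test first, then membership
            yield row

def hash_join(build_rows, probe_rows):
    """Return (joined_rows, number_of_probe_rows_that_passed_the_filter)."""
    table = {}
    for key, name in build_rows:
        table.setdefault(key, []).append(name)
    rf = build_runtime_filter(table.keys())
    joined, passed = [], 0
    for key, amount in scan_with_filter(probe_rows, rf):
        passed += 1
        for name in table[key]:
            joined.append((key, name, amount))
    return joined, passed
```

For example, `hash_join([(1, "a"), (3, "c")], [(1, 10), (2, 20), (3, 30), (9, 90), (1, 11)])` lets only 3 of the 5 probe rows through the filter. A real engine gets the most benefit when this check runs inside the storage scan, before columns are decoded.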

Database · Execution Engine · Hash Join

0 likes · 22 min read

dbaplus Community

Jul 10, 2024 · Databases

Why ClickHouse Dominates OLAP Performance: An In‑Depth Architecture Guide

This article explains ClickHouse’s columnar, MPP‑based design, block compression, LSM‑style pre‑sorting, sparse primary and data‑skipping indexes, and vectorized execution. It also discusses ClickHouse's high‑frequency write challenges, concurrency limits, and production issues such as ZooKeeper load and resource management.
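A sparse primary index stores one entry per block of sorted rows, not one per row. A range query binary‑searches these entries to find the blocks worth reading. The Python sketch below is a simplified model under stated assumptions. It uses a single sorted key column, a granule size of 4 (ClickHouse's MergeTree default `index_granularity` is 8192), and hypothetical function names. It does not reproduce ClickHouse's on‑disk format.

```python
import bisect

# Toy model of a sparse primary index over a sorted column.
GRANULE = 4  # rows per granule; kept tiny so the example is readable

def build_sparse_index(sorted_keys):
    """Keep one mark per granule: the first key of every GRANULE-row block."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), GRANULE)]

def candidate_granules(marks, lo, hi):
    """Indexes of granules that may contain keys in [lo, hi] (false positives allowed)."""
    # Start one granule before the first mark >= lo: equal keys can spill across a boundary.
    first = max(bisect.bisect_left(marks, lo) - 1, 0)
    # Any granule whose first key is <= hi may overlap the range.
    last = bisect.bisect_right(marks, hi) - 1
    return list(range(first, last + 1))

def range_query(sorted_keys, marks, lo, hi):
    """Read only candidate granules, then filter their rows exactly."""
    hits = []
    for g in candidate_granules(marks, lo, hi):
        granule = sorted_keys[g * GRANULE:(g + 1) * GRANULE]
        hits.extend(k for k in granule if lo <= k <= hi)
    return hits
```

The index is small because it holds one mark per granule. The cost is that a candidate granule can turn out to contain no matching rows, which is why the exact filter in `range_query` is still needed.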

ClickHouse · Columnar Database · Indexing

0 likes · 11 min read

StarRocks

Jun 6, 2024 · Big Data

Why StarRocks Beats Trino: A Deep Technical Comparison

This article provides a detailed technical comparison between StarRocks and Trino, covering their shared MPP architecture, cost‑based optimizer, pipeline execution, ANSI SQL support, differences in vectorized execution, materialized view capabilities, caching systems, data source connectors, benchmark results, high‑availability designs, join algorithms, and real‑world user case studies.

Big Data · Cache · MPP

0 likes · 20 min read

Baidu Geek Talk

Apr 10, 2024 · Big Data

TDA: A One‑Stop Self‑Service BI Platform – Architecture, Challenges, and Solutions

The article presents Turing Data Analysis (TDA), a self‑service BI platform that replaces fragile traditional pipelines with a unified DWD‑based data model, drag‑and‑drop analytics, multi‑engine query optimization and caching, delivering sub‑10‑second queries on billions of rows, fine‑grained permissions, and rapid dashboard creation, while reporting significant usage growth and outlining AI‑driven future enhancements.

BI · Big Data · Data Platform

0 likes · 15 min read

DataFunTalk

Jun 25, 2023 · Databases

An Overview of Apache Doris: Minimal Architecture, Simplicity, Rich Features, and Open‑Source Design

Apache Doris is an open‑source MPP OLAP database that combines a minimalist architecture, ease of use, rich features such as partition‑bucket pruning, materialized views, and bitmap indexes, and provides high‑performance, scalable, and reliable data warehousing for big‑data analytics.
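Partition‑bucket pruning combines two filters. A range predicate on the partition column skips whole partitions, and an equality predicate on the bucketing column narrows each remaining partition to a single hash bucket. The Python sketch below assumes monthly range partitions, 8 buckets, and `zlib.crc32` as a stand‑in hash. Doris's actual bucketing hash and metadata layout are not modeled, and the partition names are hypothetical.

```python
import zlib
from datetime import date

# Hypothetical layout: monthly range partitions [start, end), each split into hash buckets.
PARTITIONS = {
    "p202401": (date(2024, 1, 1), date(2024, 2, 1)),
    "p202402": (date(2024, 2, 1), date(2024, 3, 1)),
    "p202403": (date(2024, 3, 1), date(2024, 4, 1)),
}
NUM_BUCKETS = 8

def bucket_of(user_id):
    """Bucket for a bucketing-column value (crc32 is a stand-in hash)."""
    return zlib.crc32(str(user_id).encode()) % NUM_BUCKETS

def prune(day_lo, day_hi, user_id=None):
    """(partition, bucket) pairs to read for day_lo <= dt <= day_hi [AND user_id = ?]."""
    # A partition [start, end) overlaps the inclusive range iff start <= hi and lo < end.
    parts = [name for name, (start, end) in PARTITIONS.items()
             if start <= day_hi and day_lo < end]
    # Equality on the bucketing column selects one bucket; otherwise read all of them.
    buckets = [bucket_of(user_id)] if user_id is not None else list(range(NUM_BUCKETS))
    return [(p, b) for p in parts for b in buckets]
```

For a query over February 10 to March 5, the sketch reads 2 of 3 partitions (16 buckets). Adding `user_id = 42` with a date range inside February cuts that to a single bucket.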

Apache Doris · Big Data · Data Warehouse

0 likes · 19 min read

DataFunSummit

May 27, 2023 · Big Data

Building and Practicing the Performance Assurance System of YouShu BI

This article presents an in‑depth overview of the YouShu BI product, outlines the high‑concurrency performance challenges faced by enterprise BI, and details the multi‑layer performance architecture—including front‑end, back‑end, data engine, and data source layers—along with smart caching, MPP acceleration, materialized views, and the Data Doctor operations that together ensure low‑latency, reliable analytics for large‑scale users.

BI · Data Platform · MPP

0 likes · 16 min read

DataFunTalk

May 6, 2023 · Databases

Apache Doris: Overview, Data Lake Analysis Architecture, Community Development and Future Roadmap

This article provides a comprehensive overview of Apache Doris, detailing its origins, MPP‑based analytical capabilities, data‑lake integration techniques, recent architectural enhancements, performance optimizations, community growth, and upcoming development plans, while also addressing common user questions.

Apache Doris · Big Data · Data Lake

0 likes · 20 min read

ITPUB

Mar 14, 2023 · Big Data

How to Build Real-Time Active‑Active Disaster Recovery for OLAP MPP Clusters

This article explains why disaster‑recovery and active‑active architectures are essential for OLAP MPP data‑warehouse clusters, outlines the specific RPO/RTO requirements for batch and real‑time workloads, and compares several data‑synchronization techniques and active‑active deployment models with their advantages and drawbacks.

Active-Active · Disaster Recovery · High Availability

0 likes · 12 min read

StarRing Big Data Open Lab

Feb 24, 2023 · Big Data

What Makes MPP Databases the Powerhouse Behind Modern Data Analytics?

MPP (Massively Parallel Processing) databases are designed for large‑scale analytical workloads. They use distributed, shared‑nothing architectures with multiple control and compute nodes, and offer high scalability, a range of data‑sharding strategies, and broad SQL compatibility. The article illustrates these traits with vendors such as Teradata, Vertica, and Greenplum, as well as emerging open‑source solutions.
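Shared‑nothing MPP systems commonly place table data in one of three ways: hash distribution on a key, round‑robin (random) distribution, or full replication to every node. The Python sketch below contrasts the three on a hypothetical 4‑node cluster. `zlib.crc32` stands in for whatever hash a given product uses, and the function names are made up for illustration.

```python
import zlib

NODES = 4  # hypothetical cluster size

def hash_distribute(rows, key):
    """Equal key values land on the same node, so joins/aggregates on `key` stay local."""
    placement = {n: [] for n in range(NODES)}
    for row in rows:
        placement[zlib.crc32(str(row[key]).encode()) % NODES].append(row)
    return placement

def round_robin_distribute(rows):
    """Even spread regardless of values; joins on any column then need data movement."""
    placement = {n: [] for n in range(NODES)}
    for i, row in enumerate(rows):
        placement[i % NODES].append(row)
    return placement

def replicate(rows):
    """Every node holds a full copy; suits small dimension tables joined to large facts."""
    return {n: list(rows) for n in range(NODES)}
```

The choice is a trade‑off. Hash distribution co‑locates matching keys but can skew when the key has few distinct values. Round‑robin is always balanced but co‑locates nothing. Replication avoids join shuffles at the cost of storing N copies.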

Big Data · Distributed Computing · Greenplum

0 likes · 15 min read

ITPUB

Feb 13, 2023 · Databases

How Apache Doris Enables Cloud‑Native Real‑Time Data Warehousing for Log Analytics

Based on a DTCC2022 presentation, this article explains Apache Doris's high‑performance MPP architecture, its cloud‑native extensions in SelectDB, and how they solve large‑scale log storage and analysis with superior write throughput, storage efficiency, and interactive query speed.

Apache Doris · MPP · SelectDB

0 likes · 11 min read

ITPUB

Jan 3, 2023 · Databases

How DragonF MPP DB Redefines Cloud‑Native Data Warehousing at Massive Scale

The article details the design, core features, and real‑world performance of the DragonF MPP DB, a cloud‑native, compute‑storage‑separated database that overcomes traditional MPP limitations, supports millions of daily jobs, and outlines its future roadmap for ultra‑large‑scale data platforms.

Big Data · Cloud Native · Data Warehouse

0 likes · 11 min read

DataFunSummit

Dec 10, 2022 · Databases

StarRocks in the Modern Data Stack: Architecture Evolution, Typical Applications, and Performance Insights

This article presents a comprehensive overview of StarRocks within the modern data stack, covering the evolution of MPP architectures, typical industry use cases, core features, performance benchmark comparisons, real‑time data‑warehouse construction methods, CDP and lakehouse analytics, as well as short‑term roadmap plans and a brief Q&A.

CDP · MPP · Performance Benchmark

0 likes · 11 min read

StarRocks

Oct 13, 2022 · Databases

Inside StarRocks: How the Pipeline Execution Engine Boosts Query Performance

This article explains the core concepts, architecture, and code logic of StarRocks' Pipeline execution framework, covering ExecPlan, PlanFragment, Fragment Instance, ExecNode, SourceOperator, SinkOperator, PipelineDriver scheduling, asynchronous handling of blocking operations, and the roles of FE and BE in MPP scheduling.

Execution Engine · MPP · Scheduling

0 likes · 13 min read

DataFunSummit

Sep 30, 2022 · Big Data

MercsDB: Architecture, Storage, Computation, and Optimization of Tencent's MPP Data Warehouse Engine

The article presents a comprehensive technical overview of MercsDB—formerly HermesDB—including its background, storage and indexing designs, native and Presto computation engines, vectorization optimizations, benchmark results, real‑world applications, and future development plans.

Big Data · Columnar Storage · MPP

0 likes · 20 min read

Big Data Technology Architecture

Aug 13, 2022 · Big Data

Apache Doris at Xiaomi: Architecture Evolution, Performance Optimizations, and Production Practices

This article details Xiaomi's three‑year journey of adopting Apache Doris across dozens of internal services, describing the transition from a Spark‑SQL‑based Lambda architecture to a unified MPP database, performance benchmarks, data ingestion pipelines, compaction tuning, two‑phase commit, single‑replica writes, monitoring, and community contributions.

Apache Doris · Compaction · Data Warehouse

0 likes · 19 min read

dbaplus Community

Aug 11, 2022 · Databases

Why ClickHouse Outperforms Other Databases: Core Features and Architecture Explained

This article explains ClickHouse’s MPP columnar design, complete DBMS capabilities, columnar storage, vectorized execution, multi‑master architecture, real‑time queries, sharding, and performance‑focused hardware and algorithm choices that together deliver its superior speed.

ClickHouse · Columnar Database · Database Architecture

0 likes · 17 min read

DataFunTalk

Aug 2, 2022 · Databases

Apache Doris 1.0: Features, Architecture, Performance Improvements and Future Roadmap

This article introduces Apache Doris 1.0, detailing its simplified architecture, high‑concurrency support, MPP execution engine, vectorized engine, memory‑controlled stability, multi‑source integration, upcoming lake‑house unification, storage‑compute separation, real‑time ingestion, and community growth.

Apache Doris · MPP · Open-source

0 likes · 18 min read

Shepherd Advanced Notes

Jul 6, 2022 · Databases

Understanding Apache Doris: Real‑Time Analytical Database Architecture and Data Modeling

This article introduces Apache Doris, a high‑performance, real‑time analytical database built on an MPP architecture, covering its simple FE/BE design, three data‑model types (Aggregate, Unique, Duplicate), partitioning and bucketing strategies, rollup tables, and the limitations and best practices of row‑level updates.
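In Doris's Aggregate model, rows that share the same key columns are collapsed, and each value column is combined according to its declared aggregation type. The Unique model behaves roughly like REPLACE on every value column, and the Duplicate model keeps every row. The Python sketch below models only the merge semantics. It assumes rows arrive in load order, so REPLACE keeps the latest value. The column names are hypothetical, and compaction, versions, and storage are ignored.

```python
def merge_aggregate_model(rows, key_cols, agg_types):
    """Collapse rows with equal key columns, combining value columns per agg type.
    Rows are assumed to arrive in load order, so REPLACE keeps the latest value."""
    funcs = {
        "SUM": lambda old, new: old + new,
        "MAX": max,
        "MIN": min,
        "REPLACE": lambda old, new: new,
    }
    merged = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in merged:
            merged[key] = dict(row)  # copy so input rows are not mutated
            continue
        current = merged[key]
        for col, agg in agg_types.items():
            current[col] = funcs[agg](current[col], row[col])
    return list(merged.values())

rows = [
    {"user": 1, "date": "2024-01-01", "cost": 10, "max_dwell": 5, "last_city": "BJ"},
    {"user": 1, "date": "2024-01-01", "cost": 15, "max_dwell": 3, "last_city": "SH"},
    {"user": 2, "date": "2024-01-01", "cost": 7, "max_dwell": 9, "last_city": "GZ"},
]
result = merge_aggregate_model(
    rows, ["user", "date"], {"cost": "SUM", "max_dwell": "MAX", "last_city": "REPLACE"}
)
```

The two rows for user 1 collapse into one row with `cost` 25, `max_dwell` 5, and `last_city` "SH". This pre‑aggregation is also why the Aggregate model cannot return the individual rows that were loaded.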

Apache Doris · MPP · data modeling

0 likes · 28 min read

MaGe Linux Operations

Jul 3, 2022 · Fundamentals

Understanding SMP, NUMA, and MPP: Which Server Architecture Fits Your Needs?

This article explains the three main commercial server architectures—SMP, NUMA, and MPP—detailing their structures, performance characteristics, scalability limits, and suitability for OLTP versus data‑warehouse workloads, while also covering practical considerations such as virtualization and real‑world examples.

MPP · SMP · Server Architecture

0 likes · 16 min read

StarRocks

Jun 29, 2022 · Big Data

How StarRocks Boosted Query Performance 2‑3× for a 1TB‑Daily Data Platform

The Qunhe Technology data team replaced their legacy Hadoop and Presto clusters with a StarRocks MPP database, achieving up to three times faster queries, supporting billion‑row tables and sub‑second latency for both real‑time and analytical workloads on a daily 1TB data influx.

Big Data · MPP · OLAP

0 likes · 10 min read

Big Data Technology & Architecture

May 30, 2022 · Big Data

Doris Architecture, Principles, and Key Features Overview

This article provides a comprehensive overview of Doris's architecture—including its FE and BE components, metadata management, data organization, execution planning—and details its major features such as adaptive join aggregation, vectorized execution, materialized views, and Elasticsearch integration, supplemented with example DDL and query code.

Big Data · Database Architecture · Doris

0 likes · 7 min read

DataFunSummit

May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data · ClickHouse · Cloud Native

0 likes · 15 min read

StarRocks

Apr 24, 2022 · Databases

How StarRocks Transforms a SQL Query into Distributed Execution: A Deep Dive

This article explains how StarRocks converts a SQL statement into an optimal distributed physical execution plan, schedules the plan across compute nodes, and runs it using MPP, pipeline parallelism, and vectorized execution to achieve near‑linear performance scaling.

CBO optimizer · MPP · SQL query processing

0 likes · 15 min read

StarRocks

Apr 13, 2022 · Big Data

How StarRocks Achieves Lightning‑Fast Data Lake Analytics

This article explains StarRocks' streamlined architecture, cost‑based optimizer, massively parallel processing and vectorized engine, and how they enable high‑performance queries over data stored in Hive, Iceberg, Hudi and other lake formats, backed by benchmark results and future roadmap details.

Big Data · CBO · Data Lake

0 likes · 19 min read

StarRocks

Mar 29, 2022 · Big Data

How StarRocks Handles PB‑Scale Real‑Time Analytics with High Availability

This article explains how StarRocks manages petabyte‑level user behavior logs, ads and orders through a shared‑nothing architecture, tablet‑based data distribution, MPP compute, high‑availability metadata, real‑time mini‑batch ingestion, and online schema changes, enabling 24/7 analytical services for diverse internet companies.

MPP · StarRocks · online schema change

0 likes · 11 min read

DataFunSummit

Feb 20, 2022 · Databases

Understanding TiDB Architecture and Real‑Time Application Scenarios

This article explains TiDB's HTAP architecture, covering industry challenges, the row‑store TiKV and column‑store TiFlash design, MPP integration in TiDB 5.0, and a range of real‑time use cases such as dashboards, reporting, and data‑warehouse pipelines.

Database Architecture · HTAP · MPP

0 likes · 16 min read

dbaplus Community

Dec 23, 2021 · Databases

Building a 16,000‑Node Cloud‑Native MPP Data Warehouse: Lessons from CCB

The article details how China Construction Bank's fintech arm designed, deployed, and operated a cloud‑native, three‑layer MPP data warehouse spanning 16,000 servers, covering architectural choices, performance gains, operational automation, and high‑availability strategies for ultra‑large scale workloads.

Cloud Native · Data Warehouse · Database Architecture

0 likes · 10 min read

Tencent Architect

Dec 10, 2021 · Databases

How a Cloud‑Native MPP Query Layer Turns ClickHouse into a Snowflake‑Like Data Warehouse

This article explains the design and implementation of a cloud‑native MPP query layer for ClickHouse, detailing its architecture, core features, execution flow, performance advantages, SQL compatibility, and future development plans to create a high‑performance, multi‑source OLAP data platform.

ClickHouse · Cloud Native · MPP

0 likes · 13 min read

ITPUB

Sep 13, 2021 · Big Data

MapReduce vs MPP: Choosing the Right Engine for Global Data Warehousing

A team of engineers at MBI debates the merits of MapReduce, MPP, and Hive for their KeepS global data‑warehouse, discussing technical differences, scalability, concurrency, and the feasibility of mixed batch engines while navigating budget and operational constraints.

Cluster Computing · Grid Computing · Hive

0 likes · 20 min read

Big Data Technology Architecture

Jun 4, 2021 · Big Data

Types of OLAP Data Warehouses and Performance Optimization Techniques

This article explains the various classifications of OLAP data warehouses—including MOLAP, ROLAP, HOLAP, and HTAP—based on data volume and modeling, reviews common open‑source ROLAP products, and details performance‑boosting techniques such as MPP architecture, cost‑based optimization, vectorized execution, and storage optimizations.

Cost-Based Optimization · Data Warehouse · MPP

0 likes · 27 min read

DataFunTalk

Mar 24, 2021 · Big Data

Practical Experience of Using DorisDB for Real-Time and Offline Analytics in KuJiaLe's Big Data Platform

This article details how KuJiaLe's big data team replaced their legacy ADB and Presto clusters with a DorisDB MPP database, achieving sub‑second query latency, unified real‑time and offline analytics, simplified ETL pipelines, and significant cost savings while supporting billion‑row tables and high‑QPS workloads.

Big Data · DorisDB · ETL

0 likes · 9 min read

DataFunTalk

Nov 23, 2020 · Big Data

Choosing OLAP Solutions for Large-Scale Data at Youku

The article examines the challenges big data brings to traditional technologies and surveys major OLAP solutions—MPP, batch processing, and pre‑computation—including Greenplum, Druid, Kylin, and Hadoop‑based engines, then outlines Youku’s specific use‑case selections for real‑time APIs, BI reporting, and ad‑hoc analysis.
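Pre‑computation engines of the Kylin type answer queries from aggregates materialized ahead of time for combinations of dimensions (cuboids), rather than scanning raw rows. The Python sketch below builds a full cube for one additive measure. It assumes that dimension values are never `None` (it uses `None` to mean "all"), and the names and sample data are hypothetical.

```python
from itertools import combinations

def build_cube(rows, dims, measure):
    """Pre-aggregate `measure` for every subset of `dims` (2**len(dims) cuboids).
    A dimension left out of a cuboid is stored as None, meaning 'all values'."""
    cube = {}
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            for row in rows:
                key = tuple(row[d] if d in subset else None for d in dims)
                cube[key] = cube.get(key, 0) + row[measure]
    return cube

rows = [
    {"os": "ios", "region": "cn", "plays": 3},
    {"os": "android", "region": "cn", "plays": 5},
    {"os": "ios", "region": "us", "plays": 2},
]
cube = build_cube(rows, ["os", "region"], "plays")
# A query such as "plays for iOS across all regions" becomes a dictionary lookup.
ios_total = cube[("ios", None)]
```

Lookups are fast, but the number of cuboids doubles with each added dimension. In practice, cube engines limit which cuboids they build, and MPP engines instead aggregate on demand.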

MPP · OLAP · Precomputation

0 likes · 13 min read

DataFunSummit

Nov 12, 2020 · Big Data

OLAP Engine Selection and Challenges in Large-Scale Data at Youku

This article explores the challenges big data brings to traditional data technologies and reviews various OLAP solutions—including MPP, batch processing, pre‑computation, and Hadoop‑based engines—while detailing Youku’s specific business scenarios and how different OLAP engines are selected to meet performance, scalability, and real‑time analysis requirements.

Analytics · Big Data · Data Warehouse

0 likes · 14 min read

Tencent Cloud Developer

Sep 9, 2020 · Big Data

Tencent Game Marketing Deduplication Service: Technical Evolution from TDW to ClickHouse

Tencent’s game marketing analysis system “EAS” evolved from inefficient TDW HiveSQL jobs and file‑heavy real‑time pipelines to a scalable ClickHouse‑based deduplication service that processes hundreds of thousands of daily activity counts in sub‑second time, offering fast, reliable, and maintainable participant deduplication for massive marketing campaigns.

ClickHouse · Deduplication · LevelDB

0 likes · 10 min read

dbaplus Community

Aug 4, 2020 · Databases

How Doris Powers Meituan’s Real‑Time Data Warehouse: ROLAP vs MOLAP Lessons

This article examines Meituan’s data warehouse evolution, detailing the limitations of MOLAP with Kylin, the adoption of Doris‑driven ROLAP using MPP technology, and the practical optimizations—such as join predicate pushdown, concurrent execution, colocate join, and bitmap aggregation—that improve real‑time analytics and reduce costs.
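Bitmap aggregation gives exact COUNT(DISTINCT) results that can still be rolled up. Each partition stores a bitmap of the IDs it has seen, and a higher‑level count is the population count of the OR of those bitmaps. The Python sketch below uses a plain Python `int` as the bitmap. Production systems use compressed bitmaps such as Roaring, and this code makes no claim about Doris's actual encoding.

```python
def to_bitmap(user_ids):
    """Encode non-negative integer IDs as bits of a Python int
    (a stand-in for a compressed bitmap such as Roaring)."""
    bm = 0
    for uid in user_ids:
        bm |= 1 << uid
    return bm

def bitmap_union_count(bitmaps):
    """Exact COUNT(DISTINCT) across partitions: OR the bitmaps, then popcount."""
    merged = 0
    for bm in bitmaps:
        merged |= bm
    return bin(merged).count("1")

# Hypothetical per-day bitmaps of active user IDs, stored at load time.
daily = [to_bitmap([1, 2, 3]), to_bitmap([2, 3, 4]), to_bitmap([10])]
weekly_distinct = bitmap_union_count(daily)
```

The example yields 5 distinct users. Adding the per‑day distinct counts would wrongly give 7, and that double counting is what bitmap union avoids without rescanning raw IDs.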

Data Warehouse · Doris · MOLAP

0 likes · 19 min read

Big Data Technology Architecture

Jun 28, 2020 · Databases

Understanding OLAP Data Warehouse Types, Architectures, and Performance Optimizations

This article provides a comprehensive overview of OLAP data warehouses, covering classification by data volume and modeling, detailed explanations of MOLAP, ROLAP, HOLAP and HTAP, common open‑source implementations, and a deep dive into performance‑boosting techniques such as MPP architectures, cost‑based optimization, vectorized execution, dynamic code generation, storage compression, runtime filters and resource management.

Cost-Based Optimization · Data Warehouse · Dynamic Code Generation

0 likes · 25 min read

Hulu Beijing

Oct 28, 2019 · Big Data

How Hulu Uses Big Data to Power Precise Advertising and Real‑Time Streaming

At a Tsinghua University forum, Hulu presented a comprehensive overview of its big‑data solutions for advertising and streaming, covering challenges of massive, complex data, the limits of MySQL, and advanced techniques using HBase, Protobuf, Redis batch pipelines, and its own MPP engine Nesto for high‑performance, scalable analytics.

Advertising · HBase · MPP

0 likes · 6 min read

Tencent Cloud Developer

Jul 18, 2019 · Big Data

Tencent iData Analysis Center: Why We Chose Spark as Our Computing Platform

Tencent’s iData analysis center selected Spark as its new computing platform because, unlike ElasticSearch, TiDB, and other MPP solutions, Spark offers iterative processing, shuffle support, robust SQL and DAG scheduling, and flexible SMP‑style data exchange, enabling efficient OLAP on billions of game‑user records.

Big Data · Data Platform · Distributed Computing

0 likes · 13 min read

DataFunTalk

Jul 4, 2019 · Databases

An Overview of Apache Doris: Architecture, Key Technologies, and Real‑World Use Cases

This article introduces Apache Doris, a massively parallel processing (MPP) distributed database, covering its background, core architecture, key technologies such as reliability, maintenance, MySQL compatibility, materialized views, and real‑world applications at Baidu and beyond.

Data Warehouse · Doris · Kafka Integration

0 likes · 12 min read

dbaplus Community

Nov 4, 2018 · Databases

How Spark Turns Traditional Databases into Powerful OLAP Engines

This article examines why traditional relational databases like MySQL struggle with analytical workloads, compares ROLAP and MOLAP approaches, explains Spark’s architecture and its advantages for OLAP, and details how Alibaba Cloud’s DRDS HTAP leverages a Spark‑based engine to deliver real‑time distributed query processing.

Data Warehouse · Databases · HTAP

0 likes · 11 min read

Architecture Digest

Jun 22, 2018 · Databases

Distributed Databases for OLAP: MPP, Hadoop Ecosystem, and Like‑Mesa (ClickHouse/Palo) Overview

This article examines the evolution and classification of distributed databases for OLAP workloads, comparing traditional RDBMS, MPP solutions such as Teradata and Greenplum, Hadoop‑based ecosystems, and newer architectures like ClickHouse and Palo, while highlighting their architectural traits, strengths, and limitations.

ClickHouse · Hadoop · MPP

0 likes · 17 min read

Architects' Tech Alliance

Apr 12, 2018 · Fundamentals

Understanding MPI, OpenMPI, OpenMP and the Differences Between SMP, NUMA, and MPP Architectures

This article explains the concepts of MPI, OpenMPI, and OpenMP, compares three major server architectures—SMP, NUMA, and MPP—and discusses their performance characteristics, scalability limits, and typical application scenarios in high‑performance computing.

HPC · MPI · MPP

0 likes · 13 min read

Baidu Waimai Technology Team

Apr 20, 2017 · Databases

Greenplum (GPDB) Architecture, Features, and Operational Tools Overview

This article explains Greenplum's MPP architecture, master‑segment design, high‑availability, interconnect network, rich management tools, parallel query planning, data loading techniques, and additional capabilities such as LDAP authentication and resource queues, demonstrating why it is a strong next‑generation big‑data query engine.

Big Data · Database · Greenplum

0 likes · 15 min read

360 Zhihui Cloud Developer

Mar 9, 2017 · Databases

Master Greenplum Table Design & Performance Optimization: Practical Tips

This article explains what Greenplum is, its MPP shared‑nothing architecture, and provides concrete table‑design principles, distribution‑column strategies, indexing guidance, vacuum and table‑rebuilding techniques, as well as SQL, join, insert, update/delete, and resource‑queue optimizations for better performance.
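A common test for a candidate distribution column is to estimate how evenly its values would spread across segments. A query can only finish when the busiest segment does, so uneven spread directly slows queries. The Python sketch below computes per‑segment row counts and a simple max/average skew ratio. It uses `zlib.crc32` as a stand‑in for the database's own distribution hash, and the skew ratio is an illustrative metric, not a Greenplum built‑in.

```python
import zlib
from collections import Counter

def segment_counts(values, num_segments):
    """Rows per segment if the table were hash-distributed on these column values.
    zlib.crc32 stands in for the database's own distribution hash."""
    counts = Counter(zlib.crc32(str(v).encode()) % num_segments for v in values)
    return [counts.get(s, 0) for s in range(num_segments)]

def skew_ratio(counts):
    """max/avg rows per segment: 1.0 is perfectly even; larger means the busiest
    segment does proportionally more work and bounds overall query time."""
    if not counts or sum(counts) == 0:
        return 0.0
    avg = sum(counts) / len(counts)
    return max(counts) / avg
```

For example, distributing 1,000 rows on a two‑valued `status` column across 8 segments leaves at least six segments idle. That gives a skew ratio of 4.0 or more, whereas a high‑cardinality ID column spreads far more evenly.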

Database Design · Distributed Tables · Greenplum

0 likes · 9 min read

Architects' Tech Alliance

Nov 19, 2016 · Databases

An Overview of Greenplum Database Architecture and Core Components

Greenplum is an open‑source, massively parallel processing (MPP) database built on PostgreSQL, offering ANSI‑SQL compliance, distributed ACID transactions, linear scalability, polymorphic storage, advanced optimizers, and extensive ecosystem integrations, making it suitable for large‑scale data warehousing, analytics, and big‑data workloads.

Data Warehousing · Database · Greenplum

0 likes · 15 min read

ITPUB

Jun 29, 2016 · Big Data

Why OLTP Falls Short for Big Data: OLAP, Hadoop & MPP Explained

The article explains how traditional OLTP systems cannot satisfy modern big‑data analytics needs and compares OLAP, Hadoop, and MPP architectures, highlighting their data processing models, scalability, cloud‑based managed services, and practical recommendations for building effective data warehouses.

Big Data · Data Warehouse · Hadoop

0 likes · 21 min read

dbaplus Community

Feb 22, 2016 · Databases

Mastering Greenplum: Planning, Data Modeling, and Daily Ops Best Practices

This article delivers a comprehensive guide to Greenplum deployment, covering early architecture planning, data‑model design, daily maintenance best practices, system‑table management, diagnostic tools like gpcheckcat, and detailed troubleshooting techniques for persistent tables and other common issues.

Greenplum · MPP · System Tables

0 likes · 13 min read

Art of Distributed System Architecture Design

Aug 4, 2015 · Databases

Development Trends and Challenges of Large‑Scale Parallel Databases

Since the 1970s, databases have become essential middleware. Modern large‑scale parallel databases, designed for extreme parallelism on clustered hardware, face trade‑offs among performance, scalability, and fault tolerance. These trade‑offs are prompting a shift toward cloud‑native, microservice architectures and toward new hardware such as SSDs and memory‑centric designs.

In-Memory · MPP · Microservices

0 likes · 23 min read
