Tagged articles
55 articles
Tech Freedom Circle
Sep 1, 2025 · Databases

How ClickHouse Executes GROUP BY and Handles Real‑Time Analytics on Billions of Rows

This article explains ClickHouse’s core architecture—including its storage‑compute integration, MPP parallelism, columnar storage, vectorized execution, data pre‑sorting, table engines, sparse and auxiliary indexes, and the two‑stage aggregation pipeline—then walks through the exact GROUP BY execution flow for both local and distributed tables, illustrating each step with diagrams, SQL demos, and code snippets.

Columnar Storage, Distributed Query, GROUP BY
0 likes · 29 min read
Baidu Geek Talk
Jun 9, 2025 · Databases

How BaikalDB Tackles OLAP Challenges with Vectorized and MPP Engines

BaikalDB, Baidu's distributed storage system, evolves from an OLTP‑focused engine to a hybrid HTAP architecture by introducing a vectorized query engine and a massively parallel processing (MPP) layer, addressing compute and resource bottlenecks for large‑scale analytical workloads while preserving transactional guarantees.

BaikalDB, Database Architecture, HTAP
0 likes · 18 min read
Tencent Cloud Developer
Nov 1, 2024 · Databases

How TDSQL Dominated Global OLAP & OLTP Benchmarks: Inside the Technical Secrets

Tencent Cloud's TDSQL shattered world records in both TPC‑DS (OLAP) and TPC‑C (OLTP) benchmarks, achieving a 7260 M QphDS score at a cost of 37.52 CNY/kQphDS, and the article explains the three self‑developed technologies—MPP execution, parallel execution framework, and columnar‑vectorized engine—that made this performance possible.

Columnar Storage, Database Performance, MPP
0 likes · 7 min read
Shopee Tech Team
Oct 25, 2024 · Big Data

StarRocks at Shopee: Practical Use Cases and Performance Analysis

Shopee’s deployment of StarRocks across DataService, DataGo, and DataStudio demonstrates that its vectorized engine, cost‑based optimizer, and materialized‑view caching can query Hive, Iceberg, Delta Lake and Hudi up to 20,000× faster than Presto, cutting CPU usage and delivering consistently lower latency for complex analytics.

Data Lake, MPP, Presto
0 likes · 11 min read
Senior Tony
Sep 19, 2024 · Databases

Why ClickHouse Outperforms MySQL: Deep Dive into Architecture and Benchmarks

This article compares ClickHouse and MySQL by examining benchmark results, MPP architecture, columnar storage, compression techniques, vectorized execution, and index designs, showing why ClickHouse delivers dramatically higher query performance on massive data sets.

Columnar Storage, MPP, Vectorized Execution
0 likes · 8 min read
ITPUB
Sep 11, 2024 · Big Data

Is Storage‑Compute Separation the Future? Unpacking the Lakehouse Debate

The article examines the concepts of storage‑compute separation and the lake‑warehouse (lakehouse) model, tracing their evolution from physical Hadoop clusters to containerized compute and object storage, and argues that true separation requires MPP systems to adopt open standards, effectively merging lake and warehouse architectures.

Big Data Architecture, Hadoop, Lakehouse
0 likes · 7 min read
StarRocks
Aug 9, 2024 · Big Data

How Pinterest Cut Query Latency by 50% with StarRocks Migration

Pinterest migrated its Partner Insights analytics from Druid to StarRocks, achieving a 50% reduction in p90 latency, a six‑fold cost‑performance improvement, and simplified data ingestion, illustrating the benefits of a modern MPP database for real‑time ad analytics.

Analytics, MPP, Pinterest
0 likes · 6 min read
Wukong Talks Architecture
Jul 23, 2024 · Databases

An Overview of StarRocks: Architecture, Features, and Performance Benchmarks

StarRocks, an open‑source, high‑performance MPP analytical database under the Linux Foundation, offers a vectorized engine, a cost‑based optimizer (CBO), materialized views, and storage‑compute separation, integrates with BI tools and data lakes, and demonstrates superior query speed in benchmark tests against ClickHouse, Druid, and Trino.

Analytical Database, Data Lakehouse, MPP
0 likes · 10 min read
Tencent Cloud Developer
Jul 11, 2024 · Databases

LibraDB Execution Engine Architecture Evolution and Optimization

LibraDB, the column‑store replica of TDSQL MySQL, has evolved its execution engine from a simple scatter‑gather model to a vectorized SMP pipeline that integrates MPP parallelism, asynchronous I/O, SIMD‑accelerated aggregation and join operators, work‑stealing, and runtime filters, thereby fully exploiting CPU, memory, network and disk resources for both OLTP and analytical queries.

Execution Engine, Hash Join, MPP
0 likes · 22 min read
dbaplus Community
Jul 10, 2024 · Databases

Why ClickHouse Dominates OLAP Performance: An In‑Depth Architecture Guide

This article explains ClickHouse’s columnar, MPP‑based design, block compression, LSM pre‑sorting, sparse and data‑skipping indexes, and vectorized execution, while also discussing its high‑frequency write challenges, concurrency limits, and production issues such as ZooKeeper load and resource management.

Columnar Database, LSM, MPP
0 likes · 11 min read
StarRocks
Jun 6, 2024 · Big Data

Why StarRocks Beats Trino: A Deep Technical Comparison

This article provides a detailed technical comparison between StarRocks and Trino, covering their shared MPP architecture, cost‑based optimizer, pipeline execution, ANSI SQL support, differences in vectorized execution, materialized view capabilities, caching systems, data source connectors, benchmark results, high‑availability designs, join algorithms, and real‑world user case studies.

Big Data, Cache, MPP
0 likes · 20 min read
Baidu Geek Talk
Apr 10, 2024 · Big Data

TDA: A One‑Stop Self‑Service BI Platform – Architecture, Challenges, and Solutions

The article presents Turing Data Analysis (TDA), a self‑service BI platform that replaces fragile traditional pipelines with a unified DWD‑based data model, drag‑and‑drop analytics, multi‑engine query optimization and caching, delivering sub‑10‑second queries on billions of rows, fine‑grained permissions, and rapid dashboard creation, while reporting significant usage growth and outlining AI‑driven future enhancements.

BI, Big Data, Data Platform
0 likes · 15 min read
DataFunSummit
May 27, 2023 · Big Data

Building and Practicing the Performance Assurance System of YouShu BI

This article presents an in‑depth overview of the YouShu BI product, outlines the high‑concurrency performance challenges faced by enterprise BI, and details the multi‑layer performance architecture—including front‑end, back‑end, data engine, and data source layers—along with smart caching, MPP acceleration, materialized views, and the Data Doctor operations that together ensure low‑latency, reliable analytics for large‑scale users.

BI, Data Platform, MPP
0 likes · 16 min read
DataFunTalk
May 6, 2023 · Databases

Apache Doris: Overview, Data Lake Analysis Architecture, Community Development and Future Roadmap

This article provides a comprehensive overview of Apache Doris, detailing its origins, MPP‑based analytical capabilities, data‑lake integration techniques, recent architectural enhancements, performance optimizations, community growth, and upcoming development plans, while also addressing common user questions.

Analytical Database, Apache Doris, Big Data
0 likes · 20 min read
ITPUB
Mar 14, 2023 · Big Data

How to Build Real-Time Active‑Active Disaster Recovery for OLAP MPP Clusters

This article explains why disaster‑recovery and active‑active architectures are essential for OLAP MPP data‑warehouse clusters, outlines the specific RPO/RTO requirements for batch and real‑time workloads, and compares several data‑synchronization techniques and active‑active deployment models with their advantages and drawbacks.

Active-Active, MPP, OLAP
0 likes · 12 min read
StarRing Big Data Open Lab
Feb 24, 2023 · Big Data

What Makes MPP Databases the Powerhouse Behind Modern Data Analytics?

MPP (Massively Parallel Processing) databases, designed for large‑scale analytical workloads, use distributed, shared‑nothing architectures with multiple control and compute nodes, offering high scalability, diverse data‑sharding strategies, and broad SQL compatibility, as illustrated by products such as Teradata, Vertica, Greenplum, and emerging open‑source solutions.

Big Data, Greenplum, MPP
0 likes · 15 min read
ITPUB
Feb 13, 2023 · Databases

How Apache Doris Enables Cloud‑Native Real‑Time Data Warehousing for Log Analytics

Based on a DTCC2022 presentation, this article explains Apache Doris's high‑performance MPP architecture, its cloud‑native extensions in SelectDB, and how they solve large‑scale log storage and analysis with superior write throughput, storage efficiency, and interactive query speed.

Apache Doris, MPP, Real-time analytics
0 likes · 11 min read
ITPUB
Jan 3, 2023 · Databases

How DragonF MPP DB Redefines Cloud‑Native Data Warehousing at Massive Scale

The article details the design, core features, and real‑world performance of the DragonF MPP DB, a cloud‑native, compute‑storage‑separated database that overcomes traditional MPP limitations, supports millions of daily jobs, and outlines its future roadmap for ultra‑large‑scale data platforms.

Big Data, Cloud Native, MPP
0 likes · 11 min read
DataFunSummit
Dec 10, 2022 · Databases

StarRocks in the Modern Data Stack: Architecture Evolution, Typical Applications, and Performance Insights

This article presents a comprehensive overview of StarRocks within the modern data stack, covering the evolution of MPP architectures, typical industry use cases, core features, performance benchmark comparisons, real‑time data‑warehouse construction methods, CDP and lakehouse analytics, as well as short‑term roadmap plans and a brief Q&A.

CDP, MPP, StarRocks
0 likes · 11 min read
StarRocks
Oct 13, 2022 · Databases

Inside StarRocks: How the Pipeline Execution Engine Boosts Query Performance

This article explains the core concepts, architecture, and code logic of StarRocks' Pipeline execution framework, covering ExecPlan, PlanFragment, Fragment Instance, ExecNode, SourceOperator, SinkOperator, PipelineDriver scheduling, asynchronous handling of blocking operations, and the roles of FE and BE in MPP scheduling.

Execution Engine, MPP, Pipeline
0 likes · 13 min read
Big Data Technology Architecture
Aug 13, 2022 · Big Data

Apache Doris at Xiaomi: Architecture Evolution, Performance Optimizations, and Production Practices

This article details Xiaomi's three‑year journey of adopting Apache Doris across dozens of internal services, describing the transition from a Spark‑SQL‑based Lambda architecture to a unified MPP database, performance benchmarks, data ingestion pipelines, compaction tuning, two‑phase commit, single‑replica writes, monitoring, and community contributions.

Apache Doris, MPP, Real‑Time Analytics
0 likes · 19 min read
DataFunTalk
Aug 2, 2022 · Databases

Apache Doris 1.0: Features, Architecture, Performance Improvements and Future Roadmap

This article introduces Apache Doris 1.0, detailing its simplified architecture, high‑concurrency support, MPP execution engine, vectorized engine, memory‑controlled stability, multi‑source integration, upcoming lake‑house unification, storage‑compute separation, real‑time ingestion, and community growth.

Analytical Database, Apache Doris, MPP
0 likes · 18 min read
Big Data Technology & Architecture
May 30, 2022 · Big Data

Doris Architecture, Principles, and Key Features Overview

This article provides a comprehensive overview of Doris's architecture—including its FE and BE components, metadata management, data organization, execution planning—and details its major features such as adaptive join aggregation, vectorized execution, materialized views, and Elasticsearch integration, supplemented with example DDL and query code.

Big Data, Database Architecture, Elasticsearch
0 likes · 7 min read
DataFunSummit
May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data, Cloud Native, Distributed Query
0 likes · 15 min read
StarRocks
Apr 24, 2022 · Databases

How StarRocks Transforms a SQL Query into Distributed Execution: A Deep Dive

This article explains how StarRocks converts a SQL statement into an optimal distributed physical execution plan, schedules the plan across compute nodes, and runs it using MPP, pipeline parallelism, and vectorized execution to achieve near‑linear performance scaling.

CBO optimizer, MPP, SQL query processing
0 likes · 15 min read
StarRocks
Apr 13, 2022 · Big Data

How StarRocks Achieves Lightning‑Fast Data Lake Analytics

This article explains StarRocks' streamlined architecture, cost‑based optimizer, massively parallel processing and vectorized engine, and how they enable high‑performance queries over data stored in Hive, Iceberg, Hudi and other lake formats, backed by benchmark results and future roadmap details.

Big Data, CBO, Data Lake
0 likes · 19 min read
StarRocks
Mar 29, 2022 · Big Data

How StarRocks Handles PB‑Scale Real‑Time Analytics with High Availability

This article explains how StarRocks manages petabyte‑level user behavior logs, ads and orders through a shared‑nothing architecture, tablet‑based data distribution, MPP compute, high‑availability metadata, real‑time mini‑batch ingestion, and online schema changes, enabling 24/7 analytical services for diverse internet companies.

MPP, Online Schema Change, Real-time analytics
0 likes · 11 min read
DataFunSummit
Feb 20, 2022 · Databases

Understanding TiDB Architecture and Real‑Time Application Scenarios

This article explains TiDB's HTAP architecture, covering industry challenges, the row‑store TiKV and column‑store TiFlash design, MPP integration in TiDB 5.0, and a range of real‑time use cases such as dashboards, reporting, and data‑warehouse pipelines.

Database Architecture, HTAP, MPP
0 likes · 16 min read
dbaplus Community
Dec 23, 2021 · Databases

Building a 16,000‑Node Cloud‑Native MPP Data Warehouse: Lessons from CCB

The article details how China Construction Bank's fintech arm designed, deployed, and operated a cloud‑native, three‑layer MPP data warehouse spanning 16,000 servers, covering architectural choices, performance gains, operational automation, and high‑availability strategies for ultra‑large scale workloads.

Cloud Native, Database Architecture, MPP
0 likes · 10 min read
ITPUB
Sep 13, 2021 · Big Data

MapReduce vs MPP: Choosing the Right Engine for Global Data Warehousing

A team of engineers at MBI debates the merits of MapReduce, MPP, and Hive for their KeepS global data‑warehouse, discussing technical differences, scalability, concurrency, and the feasibility of mixed batch engines while navigating budget and operational constraints.

Cluster Computing, Grid Computing, MPP
0 likes · 20 min read
Big Data Technology Architecture
Jun 4, 2021 · Big Data

Types of OLAP Data Warehouses and Performance Optimization Techniques

This article explains the various classifications of OLAP data warehouses—including MOLAP, ROLAP, HOLAP, and HTAP—based on data volume and modeling, reviews common open‑source ROLAP products, and details performance‑boosting techniques such as MPP architecture, cost‑based optimization, vectorized execution, and storage optimizations.

MPP, OLAP, cost‑based optimization
0 likes · 27 min read
DataFunTalk
Mar 24, 2021 · Big Data

Practical Experience of Using DorisDB for Real-Time and Offline Analytics in KuJiaLe's Big Data Platform

This article details how KuJiaLe's big data team replaced their legacy ADB and Presto clusters with a DorisDB MPP database, achieving sub‑second query latency, unified real‑time and offline analytics, simplified ETL pipelines, and significant cost savings while supporting billion‑row tables and high‑QPS workloads.

Big Data, DorisDB, ETL
0 likes · 9 min read
DataFunTalk
Nov 23, 2020 · Big Data

Choosing OLAP Solutions for Large-Scale Data at Youku

The article examines the challenges big data brings to traditional technologies and surveys major OLAP solutions—MPP, batch processing, and pre‑computation—including Greenplum, Druid, Kylin, and Hadoop‑based engines, then outlines Youku’s specific use‑case selections for real‑time APIs, BI reporting, and ad‑hoc analysis.

MPP, OLAP, Precomputation
0 likes · 13 min read
DataFunSummit
Nov 12, 2020 · Big Data

OLAP Engine Selection and Challenges in Large-Scale Data at Youku

This article explores the challenges big data brings to traditional data technologies and reviews various OLAP solutions—including MPP, batch processing, pre‑computation, and Hadoop‑based engines—while detailing Youku’s specific business scenarios and how different OLAP engines are selected to meet performance, scalability, and real‑time analysis requirements.

Analytics, Big Data, MPP
0 likes · 14 min read
Tencent Cloud Developer
Sep 9, 2020 · Big Data

Tencent Game Marketing Deduplication Service: Technical Evolution from TDW to ClickHouse

Tencent’s game marketing analysis system “EAS” evolved from inefficient TDW HiveSQL jobs and file‑heavy real‑time pipelines to a scalable ClickHouse‑based deduplication service that processes hundreds of thousands of daily activity counts in sub‑second time, offering fast, reliable, and maintainable participant deduplication for massive marketing campaigns.

LevelDB, MPP, OLAP
0 likes · 10 min read
dbaplus Community
Aug 4, 2020 · Databases

How Doris Powers Meituan’s Real‑Time Data Warehouse: ROLAP vs MOLAP Lessons

This article examines Meituan’s data warehouse evolution, detailing the limitations of MOLAP with Kylin, the adoption of Doris‑driven ROLAP using MPP technology, and the practical optimizations—such as join predicate pushdown, concurrent execution, colocate join, and bitmap aggregation—that improve real‑time analytics and reduce costs.

MOLAP, MPP, ROLAP
0 likes · 19 min read
Big Data Technology Architecture
Jun 28, 2020 · Databases

Understanding OLAP Data Warehouse Types, Architectures, and Performance Optimizations

This article provides a comprehensive overview of OLAP data warehouses, covering classification by data volume and modeling, detailed explanations of MOLAP, ROLAP, HOLAP and HTAP, common open‑source implementations, and a deep dive into performance‑boosting techniques such as MPP architectures, cost‑based optimization, vectorized execution, dynamic code generation, storage compression, runtime filters and resource management.

Dynamic Code Generation, MPP, OLAP
0 likes · 25 min read
Hulu Beijing
Oct 28, 2019 · Big Data

How Hulu Uses Big Data to Power Precise Advertising and Real‑Time Streaming

At a Tsinghua University forum, Hulu presented a comprehensive overview of its big‑data solutions for advertising and streaming, covering challenges of massive, complex data, the limits of MySQL, and advanced techniques using HBase, Protobuf, Redis batch pipelines, and its own MPP engine Nesto for high‑performance, scalable analytics.

Advertising, HBase, MPP
0 likes · 6 min read
Tencent Cloud Developer
Jul 18, 2019 · Big Data

Tencent iData Analysis Center: Why We Chose Spark as Our Computing Platform

Tencent’s iData analysis center selected Spark as its new computing platform because, unlike ElasticSearch, TiDB, and other MPP solutions, Spark offers iterative processing, shuffle support, robust SQL and DAG scheduling, and flexible SMP‑style data exchange, enabling efficient OLAP on billions of game‑user records.

Big Data, Data Platform, MPP
0 likes · 13 min read
dbaplus Community
Nov 4, 2018 · Databases

How Spark Turns Traditional Databases into Powerful OLAP Engines

This article examines why traditional relational databases like MySQL struggle with analytical workloads, compares ROLAP and MOLAP approaches, explains Spark’s architecture and its advantages for OLAP, and details how Alibaba Cloud’s DRDS HTAP leverages a Spark‑based engine to deliver real‑time distributed query processing.

Distributed Systems, HTAP, MPP
0 likes · 11 min read
Architecture Digest
Jun 22, 2018 · Databases

Distributed Databases for OLAP: MPP, Hadoop Ecosystem, and Like‑Mesa (ClickHouse/Palo) Overview

This article examines the evolution and classification of distributed databases for OLAP workloads, comparing traditional RDBMS, MPP solutions such as Teradata and Greenplum, Hadoop‑based ecosystems, and newer architectures like ClickHouse and Palo, while highlighting their architectural traits, strengths, and limitations.

Hadoop, MPP, NewSQL
0 likes · 17 min read
Baidu Waimai Technology Team
Apr 20, 2017 · Databases

Greenplum (GPDB) Architecture, Features, and Operational Tools Overview

This article explains Greenplum's MPP architecture, master‑segment design, high‑availability, interconnect network, rich management tools, parallel query planning, data loading techniques, and additional capabilities such as LDAP authentication and resource queues, demonstrating why it is a strong next‑generation big‑data query engine.

Big Data, Greenplum, MPP
0 likes · 15 min read
360 Zhihui Cloud Developer
Mar 9, 2017 · Databases

Master Greenplum Table Design & Performance Optimization: Practical Tips

This article explains what Greenplum is, its MPP shared‑nothing architecture, and provides concrete table‑design principles, distribution‑column strategies, indexing guidance, vacuum and table‑rebuilding techniques, as well as SQL, join, insert, update/delete, and resource‑queue optimizations for better performance.

Database design, Distributed Tables, Greenplum
0 likes · 9 min read
Architects' Tech Alliance
Nov 19, 2016 · Databases

An Overview of Greenplum Database Architecture and Core Components

Greenplum is an open‑source, massively parallel processing (MPP) database built on PostgreSQL, offering ANSI‑SQL compliance, distributed ACID transactions, linear scalability, polymorphic storage, advanced optimizers, and extensive ecosystem integrations, making it suitable for large‑scale data warehousing, analytics, and big‑data workloads.

Data Warehousing, Greenplum, MPP
0 likes · 15 min read
ITPUB
Jun 29, 2016 · Big Data

Why OLTP Falls Short for Big Data: OLAP, Hadoop & MPP Explained

The article explains how traditional OLTP systems cannot satisfy modern big‑data analytics needs and compares OLAP, Hadoop, and MPP architectures, highlighting their data processing models, scalability, cloud‑based managed services, and practical recommendations for building effective data warehouses.

Big Data, Cloud Services, Hadoop
0 likes · 21 min read
dbaplus Community
Feb 22, 2016 · Databases

Mastering Greenplum: Planning, Data Modeling, and Daily Ops Best Practices

This article delivers a comprehensive guide to Greenplum deployment, covering early architecture planning, data‑model design, daily maintenance best practices, system‑table management, diagnostic tools like gpcheckcat, and detailed troubleshooting techniques for persistent tables and other common issues.

Greenplum, MPP, System Tables
0 likes · 13 min read

Development Trends and Challenges of Large‑Scale Parallel Databases

Since the 1970s, databases have become essential infrastructure software, and modern large‑scale parallel databases, designed for extreme parallelism on clustered hardware, face trade‑offs in performance, scalability, and fault tolerance, prompting a shift toward cloud‑native, micro‑service architectures and new hardware such as SSDs and memory‑centric designs.

In-Memory, MPP, Microservices
0 likes · 23 min read