Databases 10 min read

Scaling PostgreSQL for Multi‑Terabyte Databases: Indexes, Partitioning, Tablespaces, Parallelism, and Replication

This article explains how to extract maximum performance and scalability from PostgreSQL for multi‑terabyte workloads by leveraging specialized indexes, declarative partitioning, tablespaces, parallel query execution, read‑only replica load‑balancing, and foreign‑table sharding techniques.

Architects Research Society

Sep 2, 2020

Scaling PostgreSQL for Multi‑Terabyte Databases: Indexes, Partitioning, Tablespaces, Parallelism, and Replication

After a recent conversation with a DBA who began migrating a large Oracle database to PostgreSQL, the author notes a lack of comprehensive articles covering PostgreSQL's features for squeezing the last bits of performance out of hardware while safely handling several terabytes of data.

Standard PostgreSQL Features

PostgreSQL offers out‑of‑the‑box capabilities that can handle multi‑terabyte workloads when used correctly.

Lightweight / Specialized Indexes

Using partial, BRIN, GIN, and experimental BLOOM indexes can dramatically reduce index size for large OLTP systems, avoiding the common mistake of relying solely on default B‑tree indexes.

Partial indexes store only a subset of rows, e.g., indexing only orders with status “FINISHED”.

GIN excels at columns with many repeated values, storing each unique value once.

BRIN provides lossy compression by indexing only the minimum and maximum values of block ranges, ideal for ordered time‑series data.

BLOOM can offer up to 20× higher efficiency for suitable bitmap‑style searches.

Table Partitioning

Declarative partitioning introduced in PostgreSQL 10 (improved in 11) allows clean separation of hot and cold data, reducing vacuum and buffer usage without application changes.

Tablespaces

Tablespaces let you place tables and indexes on different storage media, enabling cost‑effective placement of cold data on slower disks while keeping hot data on fast storage.

Maximizing Parallelism

Since PostgreSQL 9.6, many operations can run in parallel; the default max_parallel_workers_per_gather is 2, and newer releases add parallel hash joins, index scans, and UNIONs.

Query Load Balancing with Replicas

Using streaming replicas (e.g., with Patroni) you can scale read‑only workloads across multiple nodes, while writes remain on a single primary.

Hybrid / Foreign Tables

Foreign Data Wrappers (FDW) let you query external data sources—files, logs, Twitter, S3, or even other RDBMS—as if they were regular tables, often with compression that reduces data size 10‑20×.

Extensions like c_store provide columnar storage for faster analytical queries.

Foreign Table Inheritance (Sharding)

Combining partitioning with foreign tables enables sharding across remote PostgreSQL instances or other databases, allowing data locality while keeping a unified query interface.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

replication PostgreSQL Indexes parallelism database scaling partitioning Tablespaces

Written by

Architects Research Society

A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.