Boost PostgreSQL Performance: Essential DBA Tricks Every Developer Should Know
This article presents a collection of practical PostgreSQL DBA techniques—including selective updates, bulk‑load optimizations, CTE‑based deduplication, partial and BRIN indexes, correlation tuning, and transactional DDL tricks—to help developers reduce database bottlenecks, improve query speed, and avoid common performance pitfalls.
Introduction
DBAs are often a bottleneck in development pipelines; DevOps promotes cross‑role learning to mitigate this. The article, originally from hakibenita.com, shares practical DBA techniques for application developers to reduce reliance on expert DBAs and accelerate delivery.
Infrastructure vs. Application DBA
An infrastructure DBA provisions databases, handles backups and replication, and occasionally tweaks instance settings. An application DBA receives a clean database, designs schemas, creates indexes, writes ETL scripts, and maintains stored procedures, typically working as part of a development team.
Update Only Needed Rows
UPDATE is expensive; limiting it to rows that actually need changes can cut execution time dramatically. Updating 1,010,000 rows took about 1.5 s, while updating only the 10,000 rows that required changes reduced the time to under 300 ms.
db=# UPDATE users SET email = lower(email);
UPDATE 1010000
Time: 1583.935 ms (00:01.584)
db=# UPDATE users SET email = lower(email) WHERE email != lower(email);
UPDATE 10000
Time: 299.470 ms
Disable Constraints and Indexes During Bulk Load
Loading large volumes of data is faster when constraints and indexes are temporarily removed. The example creates three tables, inserts one million rows, then adds constraints and indexes, showing a drop from 15.4 s to 3.1 s when the data is loaded first and the indexes added afterward.
DROP TABLE IF EXISTS product CASCADE;
CREATE TABLE product (
id serial PRIMARY KEY,
name TEXT NOT NULL,
price INT NOT NULL
);
INSERT INTO product (name, price)
SELECT random()::text, (random()*1000)::int FROM generate_series(0,10000);
DROP TABLE IF EXISTS sale CASCADE;
CREATE TABLE sale (
id serial PRIMARY KEY,
created timestamptz NOT NULL,
product_id int NOT NULL,
customer_id int NOT NULL
);
INSERT INTO sale (created, product_id, customer_id)
SELECT now() - interval '1 hour' * random() * 1000,
(random()*10000)::int + 1,
(random()*100000)::int + 1
FROM generate_series(1,1000000);
-- The customer table referenced by the foreign key below
DROP TABLE IF EXISTS customer CASCADE;
CREATE TABLE customer (
id serial PRIMARY KEY,
name TEXT NOT NULL
);
INSERT INTO customer (name)
SELECT random()::text FROM generate_series(0,100000);
-- Adding constraints and indexes afterwards
ALTER TABLE sale ADD CONSTRAINT sale_product_fk FOREIGN KEY (product_id) REFERENCES product(id);
ALTER TABLE sale ADD CONSTRAINT sale_customer_fk FOREIGN KEY (customer_id) REFERENCES customer(id);
CREATE INDEX sale_created_ix ON sale(created);
Using UNLOGGED Tables
UNLOGGED tables skip WAL writes, making them ideal for temporary staging tables in ETL pipelines where durability is not required.
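UNLOGGED is usually applied at creation time, but an existing table's WAL behavior can also be toggled (PostgreSQL 9.5 and later). A minimal sketch, using a hypothetical staging_table and file path:

```sql
-- Skip WAL while bulk-loading into an existing staging table
ALTER TABLE staging_table SET UNLOGGED;
COPY staging_table FROM '/path/to/data.csv' WITH (FORMAT csv);
-- SET LOGGED rewrites the table into the WAL, restoring durability
ALTER TABLE staging_table SET LOGGED;
```

Note that an UNLOGGED table is truncated after a crash and is not replicated, so it should only hold data that can be reloaded.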
CREATE UNLOGGED TABLE staging_table (/* table definition */);
WITH and RETURNING
Common Table Expressions (CTEs) let you perform an entire deduplication process in a single statement. The example builds a CTE to find duplicate users, updates orders to reference the canonical user, deletes the duplicates, and uses RETURNING to report affected rows.
WITH duplicate_users AS (
SELECT min(id) AS convert_to_user,
array_remove(array_agg(id), min(id)) AS convert_from_users
FROM users
GROUP BY lower(email)
HAVING count(*) > 1
),
update_orders_of_duplicate_users AS (
UPDATE orders o
SET user_id = du.convert_to_user
FROM duplicate_users du
WHERE o.user_id = ANY(du.convert_from_users)
RETURNING o.id
),
delete_duplicate_user AS (
DELETE FROM users
WHERE id IN (SELECT unnest(convert_from_users) FROM duplicate_users)
RETURNING id
)
SELECT (SELECT count(*) FROM update_orders_of_duplicate_users) AS orders_updated,
(SELECT count(*) FROM delete_duplicate_user) AS users_deleted;
Note: In PostgreSQL, the sub‑statements inside a WITH clause are executed concurrently, so their order is not guaranteed. Dependencies between them must be expressed explicitly.
Avoid Indexes on Low‑Selectivity Columns
When a boolean column such as activated is true for 90 % of rows, an index on it helps only queries that filter for the rare false value; for the common true case the planner ignores the index, which then only adds write overhead. The article creates an index on activated and compares execution plans for queries selecting activated vs. not‑activated users.
CREATE INDEX users_activated_ix ON users(activated);
EXPLAIN SELECT * FROM users WHERE NOT activated;
-- Bitmap Heap Scan using the index
EXPLAIN SELECT * FROM users WHERE activated;
-- Seq Scan because most rows are activated
Partial Indexes
Partial indexes index only a subset of rows, reducing size and improving performance for selective queries. Example creates an index on id for rows where activated = false.
CREATE INDEX users_unactivated_partial_ix ON users(id) WHERE NOT activated;
Load Data Sorted for Better Index Usage
Loading data ordered by the indexed column improves correlation, allowing the planner to choose a plain index scan instead of a bitmap scan. The article demonstrates this with a sale_fact table and a date range query.
-- Assumed setup for this example: the sale_fact table and its index
CREATE TABLE IF NOT EXISTS sale_fact (id serial, username text, sold_at date);
CREATE INDEX IF NOT EXISTS sale_fact_sold_at_ix ON sale_fact(sold_at);
TRUNCATE sale_fact;
INSERT INTO sale_fact (username, sold_at)
SELECT md5(random()::text),
'2020-01-01'::date + (interval '1 day') * round(random()*365*2)
FROM generate_series(1,100000)
ORDER BY sold_at;
EXPLAIN ANALYZE SELECT * FROM sale_fact WHERE sold_at BETWEEN '2020-07-01' AND '2020-07-31';
-- Index Scan, ~2.3 ms
Correlation
Correlation measures how well the physical order of rows matches the logical order of a column. A correlation near 1 means rows are stored in order, making index scans cheap; near 0 means they are scattered, favoring bitmap scans.
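A quick way to see this (a hypothetical demo, not from the article) is to load identical data sorted and shuffled, then compare the correlation statistic in pg_stats:

```sql
-- Hypothetical demo tables: same data, different physical order
CREATE TABLE t_sorted   AS SELECT g AS n FROM generate_series(1, 100000) AS g;
CREATE TABLE t_shuffled AS SELECT g AS n FROM generate_series(1, 100000) AS g ORDER BY random();
ANALYZE t_sorted;
ANALYZE t_shuffled;

SELECT tablename, attname, correlation
FROM pg_stats
WHERE attname = 'n' AND tablename IN ('t_sorted', 't_shuffled');
-- t_sorted reports correlation near 1; t_shuffled near 0
```

Range queries on n would get a plain index scan on t_sorted but a bitmap scan on t_shuffled.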
CLUSTER Command
The CLUSTER command physically reorders a table based on a specified index, improving correlation without needing to sort data on insert. The article shows correlation before and after clustering the sale_fact table on sold_at.
CLUSTER sale_fact USING sale_fact_sold_at_ix;
ANALYZE sale_fact;
SELECT tablename, attname, correlation FROM pg_stats WHERE tablename='sale_fact';
-- sold_at correlation becomes 1
BRIN Indexes for Highly Correlated Columns
BRIN (Block Range Index) stores min/max values per page range and is ideal for very large tables where a column has natural ordering (e.g., timestamps, serial IDs). The article creates a BRIN index on sold_at with the default pages_per_range = 128, then shows query plans and the effect of reducing pages_per_range to 64, 8, etc., on rows removed by index recheck and execution time.
CREATE INDEX sale_fact_sold_at_bix ON sale_fact USING BRIN(sold_at) WITH (pages_per_range = 128);
EXPLAIN ANALYZE SELECT * FROM sale_fact WHERE sold_at BETWEEN '2020-07-01' AND '2020-07-31';
-- Bitmap Heap Scan with many rows removed by recheck
CREATE INDEX sale_fact_sold_at_bix64 ON sale_fact USING BRIN(sold_at) WITH (pages_per_range = 64);
EXPLAIN ANALYZE SELECT * FROM sale_fact WHERE sold_at BETWEEN '2020-07-01' AND '2020-07-31';
-- Fewer rows removed, faster execution
Make Indexes “Invisible” with Transactional DDL
PostgreSQL’s transactional DDL lets you drop an index inside a transaction, generate an execution plan as if the index didn’t exist, and then roll back, leaving the index intact. This is useful for testing planner behavior without permanently removing the index.
BEGIN;
DROP INDEX sale_fact_sold_at_ix;
EXPLAIN SELECT * FROM sale_fact WHERE sold_at BETWEEN '2020-07-01' AND '2020-07-31';
-- Shows Seq Scan
ROLLBACK;
-- Index is still present
Schedule Long-Running Jobs Off the Hour
Scheduling periodic jobs on round‑hour boundaries can cause load spikes. Adding a random delay (e.g., RandomizedDelaySec in systemd timers) spreads the load and avoids contention.
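As a sketch, a hypothetical systemd timer unit (nightly-cleanup.timer is an assumed name) could spread a 02:00 job across a random window:

```ini
# /etc/systemd/system/nightly-cleanup.timer (hypothetical unit)
[Unit]
Description=Nightly cleanup with a randomized start offset

[Timer]
OnCalendar=*-*-* 02:00:00
# Delay each activation by a random amount up to 15 minutes,
# so many hosts do not hit the database at exactly 02:00
RandomizedDelaySec=900

[Install]
WantedBy=timers.target
```

The same idea applies to cron: adding a short random sleep before the job starts avoids every instance landing on the same second.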
Conclusion
The presented DBA tricks range from simple updates to advanced indexing strategies. Understanding and applying them can significantly improve PostgreSQL performance, reduce operational bottlenecks, and make developers more self‑sufficient.