Achieving Sub‑Second Queries on 1.2 B‑Row PostgreSQL Using BRIN, pg_cron & Query Folding
This article recounts how a PostgreSQL instance holding 1.2 billion rows on a modest 2‑CPU, 4 GB VM was brought to sub‑second times for range and aggregation queries without additional hardware. The changes were BRIN indexes, maintenance scheduled with pg_cron, query folding, memory and parallelism tuning, and declarative partitioning.
Background: A PostgreSQL server running on a tiny 2‑core, 4 GB virtual machine was tasked with analyzing a table containing more than 1.2 billion rows. Initial performance was poor – a simple COUNT(*) took 27 seconds and complex joins exceeded one minute.
Why hardware upgrade isn’t the only answer
The team’s instinct was to provision a larger server, but the author demonstrated that PostgreSQL’s built‑in features can unlock massive speedups on the existing hardware.
Using BRIN indexes
Traditional B‑Tree indexes become bulky and memory‑hungry on tables with hundreds of millions of rows. A BRIN (Block Range Index) stores only a compact summary, by default the minimum and maximum value, for each range of adjacent table blocks (128 pages per range by default), so the index occupies megabytes instead of gigabytes.
CREATE INDEX idx_logs_brin_ts ON logs USING brin(timestamp);
This single statement reduced the index size from 24 GB to 32 MB. A count query on the timestamp range dropped from 11.8 seconds to 0.9 seconds, and after further tuning stabilized around 0.7 seconds.
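One way to check these effects on your own data is sketched below. It assumes the logs table and the idx_logs_brin_ts index from above; the date range, the second index name and the pages_per_range value are illustrative assumptions, not figures from the original article. The column name is quoted only as a precaution, since timestamp is also a type name.

```sql
-- How closely does physical row order follow the column? Values near 1.0 or
-- -1.0 suit BRIN; values near 0 mean most block ranges match most queries.
-- (pg_stats is populated by ANALYZE.)
SELECT correlation
FROM pg_stats
WHERE tablename = 'logs' AND attname = 'timestamp';

-- On-disk size of the BRIN index.
SELECT pg_size_pretty(pg_relation_size('idx_logs_brin_ts'));

-- Check that the planner uses the index: look for a Bitmap Index Scan on
-- idx_logs_brin_ts in the output. The one-week range is an assumed example.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM logs
WHERE "timestamp" >= '2025-01-01' AND "timestamp" < '2025-01-08';

-- Optional experiment: smaller block ranges give finer summaries at the cost
-- of a larger index. The name and the value 32 are hypothetical.
CREATE INDEX idx_logs_brin_ts_fine ON logs
USING brin ("timestamp") WITH (pages_per_range = 32);
```

If the correlation is low, a BRIN index will still be small, but it will exclude few blocks and the query will behave much like a sequential scan.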
Automating maintenance with pg_cron
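Out‑of‑date statistics and bloated tables hurt the planner. pg_cron is a separately installed extension rather than part of core PostgreSQL, but setting it up is straightforward. On a Debian or Ubuntu system running PostgreSQL 15, install the package, add pg_cron to shared_preload_libraries in postgresql.conf, restart the server, and then create the extension:
sudo apt install postgresql-15-cron
CREATE EXTENSION pg_cron;
Scheduled jobs keep the table lean and statistics fresh, here a nightly VACUUM ANALYZE at 02:00 and a weekly REINDEX on Sundays at 03:00:
SELECT cron.schedule('vacuum_logs', '0 2 * * *', 'VACUUM ANALYZE logs');
SELECT cron.schedule('repack_logs', '0 3 * * 0', 'REINDEX TABLE logs;');
Query folding for earlier filtering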
Even with indexes, the planner sometimes chooses a sequential scan. Rewriting the query so that the selective filter is applied first can improve performance dramatically. The original query:
SELECT date_trunc('day', l.timestamp), COUNT(*)
FROM logs l
JOIN users u ON l.user_id = u.id
WHERE u.country = 'US'
AND l.timestamp >= now() - interval '30 days'
GROUP BY 1;
was replaced with:
WITH filtered_users AS (
SELECT id FROM users WHERE country = 'US'
)
SELECT date_trunc('day', l.timestamp), COUNT(*)
FROM logs l
WHERE l.user_id IN (SELECT id FROM filtered_users)
AND l.timestamp >= now() - interval '30 days'
GROUP BY 1;
The execution time fell from 4.5 seconds to 0.73 seconds on the same hardware.
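Whether a rewrite like this helps depends on the plan PostgreSQL actually picks, so it is worth comparing plans with EXPLAIN. The sketch below assumes the same logs and users tables. The EXISTS variant is an alternative added for comparison, not the author's query; both forms return the same result as the original JOIN provided users.id is unique (for example, a primary key).

```sql
-- Plan and timing for the rewritten query. On PostgreSQL 12 and later a
-- side-effect-free CTE that is referenced once is inlined, so the planner
-- can treat it like an ordinary subquery.
EXPLAIN (ANALYZE, BUFFERS)
WITH filtered_users AS (
    SELECT id FROM users WHERE country = 'US'
)
SELECT date_trunc('day', l."timestamp") AS day, count(*)
FROM logs l
WHERE l.user_id IN (SELECT id FROM filtered_users)
  AND l."timestamp" >= now() - interval '30 days'
GROUP BY 1;

-- The same filter written with EXISTS and no CTE (alternative formulation).
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_trunc('day', l."timestamp") AS day, count(*)
FROM logs l
WHERE l."timestamp" >= now() - interval '30 days'
  AND EXISTS (
      SELECT 1
      FROM users u
      WHERE u.id = l.user_id AND u.country = 'US'
  )
GROUP BY 1;
```

Running the same EXPLAIN on the original JOIN shows whether the speedup comes from a different join strategy or simply from warmer caches.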
Memory and parallelism tuning
shared_buffers = 1GB
work_mem = 64MB
maintenance_work_mem = 256MB
effective_cache_size = 3GB
max_parallel_workers_per_gather = 2
These settings size PostgreSQL's memory areas for a 4 GB machine, tell the planner how much cache it can expect through effective_cache_size, and allow parallel scans across both cores.
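As a rough sketch, the same values can be applied from a superuser SQL session instead of editing postgresql.conf by hand. ALTER SYSTEM writes them to postgresql.auto.conf; shared_buffers takes effect only after a server restart, while the other settings are picked up on reload.

```sql
ALTER SYSTEM SET shared_buffers = '1GB';             -- requires a restart
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '256MB';
ALTER SYSTEM SET effective_cache_size = '3GB';
ALTER SYSTEM SET max_parallel_workers_per_gather = 2;

-- Reload the configuration for the settings that do not need a restart.
SELECT pg_reload_conf();

-- Verify what is active. pending_restart stays true for shared_buffers
-- until the server has been restarted.
SELECT name, setting, unit, pending_restart
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'effective_cache_size', 'max_parallel_workers_per_gather');
```

Note that pg_settings reports shared_buffers and effective_cache_size in 8 kB pages and the work_mem settings in kB, so the numbers will not look like the values entered.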
Native declarative partitioning
Partitioning the logs table by month lets PostgreSQL prune irrelevant partitions automatically.
CREATE TABLE logs (
id BIGSERIAL,
timestamp TIMESTAMP,
user_id BIGINT,
message TEXT
) PARTITION BY RANGE (timestamp);
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
Monthly queries became five times faster.
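A minimal sketch of how pruning can be verified, assuming the partitioned logs table above. The February partition and the index name are hypothetical additions for illustration, not part of the original setup.

```sql
-- A second monthly partition, so there is something to prune.
CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- An index created on the partitioned parent is created on every
-- partition, including partitions added later.
CREATE INDEX idx_logs_part_brin_ts ON logs USING brin ("timestamp");

-- With constant bounds, pruning happens at plan time: the plan should
-- mention logs_2025_01 and not logs_2025_02.
EXPLAIN
SELECT count(*)
FROM logs
WHERE "timestamp" >= '2025-01-01' AND "timestamp" < '2025-02-01';
```

When the bound is computed from now(), as in the 30-day query earlier, pruning is deferred to executor startup and EXPLAIN reports the skipped partitions as "Subplans Removed" rather than omitting them from the plan.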
Performance results
Daily aggregation: 4.5 s → 0.73 s
Range query: 11.8 s → 0.9 s
Full‑table count: 27 s → 3.2 s
Disk usage: 89 GB → 42 GB
Trade‑offs and cautions
BRIN indexes excel on naturally ordered, append‑only data but lose their selectivity when updates or out‑of‑order inserts break the correlation between a column's values and the physical row order.
pg_cron tasks should be spaced to avoid I/O contention.
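Setting work_mem too high on a small server can cause out‑of‑memory failures, because the limit applies to each sort or hash operation in each session rather than to the server as a whole.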
Always run ANALYZE after bulk loads or partition changes.
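The checks below are one way to act on these cautions. They assume pg_cron 1.4 or later, which records each run in cron.job_run_details, and the logs table and job names from the earlier sections. The 128MB figure is an arbitrary illustration.

```sql
-- Scheduled jobs and their cron expressions, to confirm they are spaced out.
SELECT jobid, jobname, schedule, command, active
FROM cron.job
ORDER BY jobid;

-- How long recent runs took. A nightly VACUUM that runs past 03:00 on a
-- Sunday would overlap with the weekly REINDEX.
SELECT j.jobname, d.status, d.start_time,
       d.end_time - d.start_time AS duration
FROM cron.job_run_details d
JOIN cron.job j USING (jobid)
ORDER BY d.start_time DESC
LIMIT 20;

-- When statistics were last refreshed and how many dead rows remain.
SELECT relname, n_live_tup, n_dead_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'logs';

-- Raise work_mem for one heavy session only, instead of server-wide.
SET work_mem = '128MB';
```

A session-level SET lasts until the connection closes or RESET work_mem is issued, so the server-wide default can stay conservative.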
Key takeaways
Use BRIN indexes for cold, ordered data.
Leverage pg_cron for proactive maintenance.
Apply query folding to help the planner filter early.
Tune memory parameters and enable parallel workers.
Adopt declarative partitioning to reduce scanned data.
Result: a PostgreSQL database with 1.2 billion rows answering range and aggregation queries in under a second on a 2‑core, 4 GB machine, with the full‑table count down from 27 seconds to 3.2 seconds. Careful tuning can go a long way before a distributed solution is needed.
