Boost PostgreSQL Query Performance with CREATE STATISTICS: Real‑World Examples
PostgreSQL lets you define extended statistics with CREATE STATISTICS (available since version 10; the expression statistics used in Example 1 require version 14 or later). These statistics can substantially improve query plans for large tables and complex predicates. The step-by-step examples below cut execution time by up to 300× and illustrate statistic kinds such as ndistinct and dependencies.
Why Use Extended Statistics in PostgreSQL
Unlike MySQL, PostgreSQL has supported user-defined (extended) statistics since version 10. For large tables or queries with complex predicates, the default per-column statistics do not capture relationships between columns, so the planner can choose suboptimal plans even when indexes exist. Creating extended statistics gives the planner better estimates and, in turn, more efficient execution strategies.
Example 1: Simple Date‑Based Statistics
create table test_t (time_d timestamp, value_d numeric DEFAULT random());
insert into test_t (time_d)
SELECT * FROM generate_series('2022-01-01', '2023-06-30', '5 second'::interval);
create statistics test_t_day on (date_trunc('day', time_d)) from test_t;
analyze test_t;
explain analyze SELECT date_trunc('day', time_d) as days, count(*)
FROM test_t GROUP BY 1;

Statistics on an expression, as used here, require PostgreSQL 14 or later. After creating the statistic, the plan switches to parallel execution with aggregation under a Gather Merge node, the row count drops to 1,090, and execution time is roughly halved compared with the plan produced without the statistic. The table has no indexes, so both plans scan it in full.
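To confirm what ANALYZE collected for the expression, the statistic can be inspected directly. The following is a minimal sketch, assuming the test_t table and the test_t_day statistics object from the example above, and PostgreSQL 14 or later (where the pg_stats_ext_exprs view is available). Dropping the statistics object and re-running EXPLAIN gives the baseline plan for comparison.

```sql
-- Inspect the expression statistics gathered by ANALYZE.
-- n_distinct should be close to the number of distinct days in the data.
SELECT statistics_name, expr, n_distinct
FROM pg_stats_ext_exprs
WHERE tablename = 'test_t';

-- Baseline: drop the statistics object and compare the estimated
-- group count in the plan with the one seen earlier.
DROP STATISTICS test_t_day;
EXPLAIN SELECT date_trunc('day', time_d) AS days, count(*)
FROM test_t GROUP BY 1;

-- Restore the statistics object; ANALYZE is needed to populate it again.
CREATE STATISTICS test_t_day ON (date_trunc('day', time_d)) FROM test_t;
ANALYZE test_t;
```

Plain EXPLAIN is used for the baseline so that only the estimates are compared; switch to EXPLAIN ANALYZE to compare timings as well.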
Example 2: Dependency Statistics on Multiple Columns
-- Example 1 already created test_t, so drop it first.
drop table if exists test_t;
create table test_t (id serial primary key, age int, ages int);
insert into test_t (age, ages)
SELECT i/100, i/500 FROM generate_series(1,2000000) s(i);
create statistics test_t_s (dependencies) on age, ages from test_t;
analyze test_t;
explain analyze select * from test_t where age = 1 and ages = 0 limit 1;

Dropping the statistic and re-running the query shows a dramatic slowdown. With the dependency statistic in place, the query runs over 300 times faster because the planner knows that age and ages are correlated.
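The effect is easiest to see in the planner's row estimates. In this data set, age = 1 matches 100 rows (i from 100 to 199) and ages = 0 matches 499 rows (i from 1 to 499), and every row with age = 1 also has ages = 0, so the combined predicate matches 100 rows. Without the statistic, the planner treats the two conditions as independent and multiplies their selectivities, giving roughly 2,000,000 × (100 / 2,000,000) × (499 / 2,000,000) ≈ 0.025, which is rounded up to 1 row. The sketch below, which assumes the test_t table and test_t_s statistics object from the example above, compares the two estimates; exact figures will vary because ANALYZE works from a sample.

```sql
-- With the dependencies statistic: note the estimated rows on the scan node.
EXPLAIN SELECT * FROM test_t WHERE age = 1 AND ages = 0;

-- Without it: the estimate falls back to the independence assumption.
DROP STATISTICS test_t_s;
EXPLAIN SELECT * FROM test_t WHERE age = 1 AND ages = 0;

-- Restore the statistic and look at the stored dependency coefficients.
CREATE STATISTICS test_t_s (dependencies) ON age, ages FROM test_t;
ANALYZE test_t;

SELECT statistics_name, dependencies
FROM pg_stats_ext
WHERE tablename = 'test_t';
```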
Other Statistic Types
The main built‑in statistic kinds are:
ndistinct : useful for multi‑column GROUP BY queries, helping the planner estimate distinct value counts.
dependencies : captures correlation between columns, beneficial for predicates involving several related columns.
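expressions : statistics collected on an expression rather than a plain column, such as date_trunc('day', time_d) in the first example (PostgreSQL 14+).
mcv : multivariate most-common-value lists (PostgreSQL 12+), which record frequent combinations of column values; powerful but more complex, and generally not recommended for routine use.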
In practice, dependencies and ndistinct cover most performance‑critical scenarios.
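As an illustration of the ndistinct and mcv kinds listed above, here is a small sketch. The table stats_demo and its columns city_id and zip_id are hypothetical names introduced only for this example, and the mcv kind assumes PostgreSQL 12 or later. Each zip_id belongs to exactly one city_id, so the number of distinct (city_id, zip_id) pairs is far smaller than the product of the per-column distinct counts.

```sql
-- Hypothetical demo table: zip_id determines city_id.
DROP TABLE IF EXISTS stats_demo;
CREATE TABLE stats_demo (city_id int, zip_id int);
INSERT INTO stats_demo (city_id, zip_id)
SELECT i / 1000, i / 100 FROM generate_series(1, 1000000) AS s(i);

-- One statistics object can carry several kinds at once.
CREATE STATISTICS stats_demo_s (ndistinct, mcv) ON city_id, zip_id FROM stats_demo;
ANALYZE stats_demo;

-- The estimated number of groups should now reflect the combined
-- distinct count rather than a product of per-column estimates.
EXPLAIN SELECT city_id, zip_id, count(*)
FROM stats_demo
GROUP BY city_id, zip_id;

-- Show which kinds were built and the stored multi-column ndistinct values.
SELECT statistics_name, kinds, n_distinct
FROM pg_stats_ext
WHERE tablename = 'stats_demo';
```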
work_mem, Hash vs. Group Aggregation
If the planner estimates that the hash table will fit within work_mem, PostgreSQL can use a hash aggregation (HashAggregate); otherwise it tends to fall back to a sort-based grouped aggregation (GroupAggregate), whose sort may spill to disk. Extended statistics can influence this choice by giving the planner more accurate estimates of row and group counts.
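One way to observe this is to run the same GROUP BY under two work_mem settings and compare the aggregate node in each plan. This is a rough sketch that assumes the test_t table from Example 2; the values chosen are arbitrary, and the plan you actually get depends on your PostgreSQL version, the current statistics, and other settings.

```sql
-- Very small work_mem (64kB is the minimum allowed value).
SET work_mem = '64kB';
EXPLAIN SELECT age, ages, count(*) FROM test_t GROUP BY age, ages;

-- Generous work_mem for the same query.
SET work_mem = '256MB';
EXPLAIN SELECT age, ages, count(*) FROM test_t GROUP BY age, ages;

-- Return to the session default.
RESET work_mem;
```

Look for HashAggregate versus Sort followed by GroupAggregate in the output, along with the estimated row count on the aggregate node, which is where an ndistinct statistic on (age, ages) would show its effect.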
Conclusion
Creating extended statistics with CREATE STATISTICS is a powerful, low-overhead technique to improve query plans for large tables, especially when column correlations exist. By selecting the appropriate statistic kind (typically ndistinct or dependencies) and running ANALYZE so the statistics are populated and kept current, execution times can be reduced dramatically, sometimes by hundreds of times.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
