Boost PostgreSQL Query Performance with CREATE STATISTICS: Real‑World Examples

Since PostgreSQL 10, CREATE STATISTICS lets you define extended statistics that can dramatically improve query plans for large tables and complex predicates. The step-by-step examples below cut execution time by up to 300× and illustrate statistic kinds such as ndistinct and dependencies.

ITPUB

Aug 30, 2023

Why Use Extended Statistics in PostgreSQL

Unlike MySQL, PostgreSQL has supported user-defined (extended) statistics, created with CREATE STATISTICS, since version 10; statistics on expressions were added in version 14. For large tables or queries with complex predicates, the default per-column statistics do not capture relationships between columns, which can lead to suboptimal plans even when suitable indexes exist. Extended statistics give the planner this missing information so it can choose more efficient execution strategies.

Example 1: Simple Date‑Based Statistics

create table test_t (time_d timestamp, value_d numeric DEFAULT random());

insert into test_t (time_d)
  SELECT * FROM generate_series('2022-01-01'::timestamp, '2023-06-30'::timestamp, '5 second'::interval);

-- statistics on an expression require PostgreSQL 14 or later
create statistics test_t_day on (date_trunc('day', time_d)) from test_t;

analyze test_t;

explain analyze SELECT date_trunc('day', time_d) as days, count(*)
FROM test_t GROUP BY 1;

After the statistic is created, the planner switches to a parallel plan with a Gather Merge node, the row count shown in the plan drops to 1,090, and execution time is roughly halved. The table has no indexes, so both plans read the whole table.

Figure: EXPLAIN ANALYZE output before and after CREATE STATISTICS
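A minimal sketch for inspecting what ANALYZE stored for an expression statistic, assuming PostgreSQL 14 or later (the pg_stats_ext_exprs view was added in that release). To stay self-contained and fast, it uses a hypothetical table expr_demo holding one month of one-minute readings instead of the roughly 9.4 million rows of test_t; the names expr_demo and expr_demo_day are illustrative only.

```sql
-- Hypothetical demo table: January 2023 at 1-minute steps (43,201 rows).
drop table if exists expr_demo;
create table expr_demo (time_d timestamp);
insert into expr_demo (time_d)
select generate_series('2023-01-01'::timestamp, '2023-01-31'::timestamp, '1 minute'::interval);

-- Statistics on an expression (PostgreSQL 14+); date_trunc on timestamp is immutable.
create statistics expr_demo_day on (date_trunc('day', time_d)) from expr_demo;
analyze expr_demo;

-- What ANALYZE stored for the expression (n_distinct is a sample-based estimate).
select expr, n_distinct
from pg_stats_ext_exprs
where statistics_name = 'expr_demo_day';

-- Compare the rows= estimate on the aggregate node with the true group count below.
explain select date_trunc('day', time_d) as days, count(*)
from expr_demo group by 1;

-- Ground truth: 31 distinct days (Jan 31 contributes only its 00:00 row).
select count(distinct date_trunc('day', time_d)) from expr_demo;
```

Because n_distinct comes from a sample, it may differ slightly from the true 31 (the single Jan 31 row can be missed); what matters is that the planner now has a distinct-count estimate for the expression itself.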

Example 2: Dependency Statistics on Multiple Columns

drop table if exists test_t;  -- the test_t table from Example 1 would otherwise clash
create table test_t (id serial primary key, age int, ages int);

insert into test_t (age, ages)
  SELECT i/100, i/500 FROM generate_series(1,2000000) s(i);

create statistics test_t_s (dependencies) on age, ages from test_t;

analyze test_t;

explain analyze select * from test_t where age = 1 and ages = 0 limit 1;

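Dropping the statistic and re-running the query shows a dramatic slowdown: with the dependency statistic in place, the query runs more than 300 times faster. The reason is that age determines ages (every row with age = 1 also has ages = 0), and the statistic tells the planner so; without it, the planner treats the two predicates as independent and multiplies their selectivities.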

Figure: performance gain with dependency statistics
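To look at the statistic the planner relies on, and to reproduce the before/after comparison, here is a sketch on a hypothetical table dep_demo that mirrors Example 2 at one tenth of the size (200,000 rows). The pg_stats_ext view requires PostgreSQL 12 or later; dep_demo and dep_demo_s are illustrative names, and the exact estimates will vary because ANALYZE works from a sample.

```sql
-- Hypothetical table with Example 2's column shape, at 200,000 rows.
drop table if exists dep_demo;
create table dep_demo (id serial primary key, age int, ages int);
insert into dep_demo (age, ages)
select i / 100, i / 500 from generate_series(1, 200000) s(i);

create statistics dep_demo_s (dependencies) on age, ages from dep_demo;
analyze dep_demo;

-- Degree of each functional dependency found by ANALYZE (pg_stats_ext, PostgreSQL 12+).
-- age determines ages, so the age => ages degree should be 1.
select statistics_name, attnames, dependencies
from pg_stats_ext
where statistics_name = 'dep_demo_s';

-- With the statistic: the rows= estimate should be close to the actual 100.
explain analyze select * from dep_demo where age = 1 and ages = 0;

-- Without it: the planner multiplies the per-column selectivities again.
drop statistics dep_demo_s;
explain analyze select * from dep_demo where age = 1 and ages = 0;
```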

Other Statistic Types

The main statistic kinds are:

ndistinct: useful for multi-column GROUP BY queries; it helps the planner estimate the number of distinct value combinations.

dependencies: captures functional dependencies between columns, which helps with predicates that combine several related columns.

expressions: since PostgreSQL 14, statistics can also be built on expressions, including calls to immutable functions, as in the first example. They are collected automatically when the statistics object lists an expression.

mcv: multivariate most-common-value lists (PostgreSQL 12+); powerful but more complex, and generally not recommended for routine use.
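The arithmetic behind Example 2 shows what the dependencies kind changes. This is an idealized sketch: it assumes the planner's per-column selectivities are exact (in practice they come from ANALYZE's sample), and the combination formula for a dependency a => b of degree f is the one described in the PostgreSQL planner's source comments for recent releases. Actual matches: age = 1 covers i = 100..199, all of which have ages = 0, so 100 rows.

```latex
\begin{aligned}
N &= 2\,000\,000, \quad P(\text{age}=1) = \tfrac{100}{N}, \quad P(\text{ages}=0) = \tfrac{499}{N} \\[4pt]
\hat{n}_{\text{indep}} &= N \cdot P(\text{age}=1) \cdot P(\text{ages}=0)
  = \tfrac{100 \cdot 499}{2\,000\,000} \approx 0.025 \;\to\; 1 \text{ row (clamped)} \\[4pt]
P(a,b) &= f \cdot \min\bigl(P(a), P(b)\bigr) + (1-f)\,P(a)\,P(b) \\[4pt]
\hat{n}_{\text{dep}} &= N \cdot \min\bigl(\tfrac{100}{N}, \tfrac{499}{N}\bigr) = 100 \text{ rows} \quad (f = 1)
\end{aligned}
```

The independence estimate of 1 row is two orders of magnitude below the 100 rows that actually match, which is exactly the kind of misestimate a dependency statistic corrects.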

In practice, dependencies and ndistinct cover most performance‑critical scenarios.
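The ndistinct kind can be tried with a minimal sketch, assuming a hypothetical table nd_demo shaped like Example 2 (200,000 rows) and an illustrative statistic name nd_demo_nd. It compares the planner's group-count estimate for a two-column GROUP BY before and after creating the statistic; the exact figures depend on the PostgreSQL version and on ANALYZE's sample.

```sql
-- Hypothetical table with the same column shape as Example 2, at 200,000 rows.
drop table if exists nd_demo;
create table nd_demo (id serial primary key, age int, ages int);
insert into nd_demo (age, ages)
select i / 100, i / 500 from generate_series(1, 200000) s(i);
analyze nd_demo;

-- Without multi-column statistics: note the rows= estimate on the aggregate node.
explain select age, ages, count(*) from nd_demo group by age, ages;

-- Multi-column ndistinct statistic on the grouped columns.
create statistics nd_demo_nd (ndistinct) on age, ages from nd_demo;
analyze nd_demo;

-- Re-check the estimate; the true number of (age, ages) groups is 2,001.
explain select age, ages, count(*) from nd_demo group by age, ages;

-- Stored distinct-count estimates per column combination (pg_stats_ext, PostgreSQL 12+).
select statistics_name, attnames, n_distinct
from pg_stats_ext
where statistics_name = 'nd_demo_nd';
```

Without the statistic, the planner combines the per-column distinct counts heuristically, which can land far from the 2,001 real combinations because age already determines ages.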

work_mem: Hash vs. Group Aggregation

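If the estimated hash table fits in work_mem, PostgreSQL can choose a hash aggregation (HashAggregate); otherwise it tends to fall back to a sort-based group aggregation (GroupAggregate), whose sort may spill to disk. Since PostgreSQL 13, hash aggregation can also spill to disk when it exceeds its memory limit. Because the expected hash table size depends on the estimated number of groups, extended statistics (ndistinct in particular) influence this choice by providing more accurate estimates.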
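A minimal sketch for watching the hash-versus-group decision, assuming a hypothetical table agg_demo. The two work_mem values are arbitrary illustrations (64kB is the smallest value PostgreSQL accepts); whether the plan actually flips between HashAggregate and GroupAggregate depends on the version, the estimated group count and the cost settings, so treat the output as something to observe rather than a guaranteed result.

```sql
-- Hypothetical table for the experiment (2,001 distinct values of age).
drop table if exists agg_demo;
create table agg_demo (age int, ages int);
insert into agg_demo (age, ages)
select i / 100, i / 500 from generate_series(1, 200000) s(i);
analyze agg_demo;

-- Tiny memory budget: a hash table for ~2,001 groups may not be expected to fit.
set work_mem = '64kB';
explain select age, count(*) from agg_demo group by age;

-- Generous budget: the planner can cost a HashAggregate as fitting in memory.
-- (Since PostgreSQL 13 the hash budget is work_mem * hash_mem_multiplier.)
set work_mem = '64MB';
explain select age, count(*) from agg_demo group by age;

-- Restore the session default.
reset work_mem;
```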

Conclusion

Creating extended statistics with CREATE STATISTICS is a powerful, low-overhead technique for improving query plans on large tables, especially when columns are correlated. Choose the appropriate statistic type (ndistinct or dependencies), keep the statistics current with ANALYZE, and execution times can drop dramatically, sometimes by a factor of several hundred.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
