Databases 8 min read

Why PostgreSQL Misestimates Rows When Types Are Cast – and How to Fix It

A recent production issue showed that PostgreSQL dramatically over‑estimated row counts because filter predicates performed implicit type casts, leading to sequential scans and 18‑second queries, but correcting column types or rewriting the predicates restored accurate estimates and reduced execution time to under a second.

dbaplus Community

Sep 19, 2017

Why PostgreSQL Misestimates Rows When Types Are Cast – and How to Fix It

Problem

A newly released feature caused a surge of slow queries in PostgreSQL logs. One representative query took 18 seconds, and its EXPLAIN ANALYZE plan revealed that most of the time was spent in two nested joins that performed sequential scans on large tables (over 20 million rows each) for every iteration.

Analysis

The obvious suspicion was that adding an index on the join columns would help, but the tables already had indexes and PostgreSQL could also use a hash join. A deeper look at the plan’s cost estimates showed that PostgreSQL expected only one row, while the actual row count was 5 429. The huge discrepancy stemmed from the filter clause:

Filter: (((zsize)::text = '2'::text) AND ((tmall_flg)::text = '1'::text))

The column zsize is defined as char(10) and tmall_flg as int. The implicit casts to text caused PostgreSQL to fall back to a default selectivity of 0.005, dramatically under‑estimating the number of matching rows.

When the casts were removed in a test environment, execution time dropped from over ten seconds to under one second.

In production, a quick workaround was to change the column type of zsize from char(10) to varchar(10):

alter table bi_dm.tdm_wh_zl057_rt alter zsize type varchar(10);

After the change, the planner estimated 79 rows (still inaccurate) but chose an index scan, reducing the query time to 311 ms.

How PostgreSQL Estimates Row Counts

PostgreSQL gathers statistics (distinct values, most‑common values, histograms, correlation, etc.) from pg_class and pg_stats. For a table without a WHERE clause, the estimated row count comes directly from reltuples in pg_class, which in this case is 2 026 121 rows, matching the actual count within about 5 %.

When a simple equality predicate is present, PostgreSQL uses the column’s value‑distribution statistics to compute a selectivity. For example, the column tmall_flg has a most‑common frequency of 0.258133322, leading to an estimated row count of roughly 523 000, close to the actual 532 630 rows.

However, if the predicate involves a type cast or any expression that is not a plain column reference, PostgreSQL cannot use the statistics and falls back to the constant DEFAULT_EQ_SEL = 0.005 defined in src/include/utils/selfuncs.h and implemented in src/backend/utils/adt/selfuncs.c:

#define DEFAULT_EQ_SEL    0.005

if (!get_restriction_variable(root, args, varRelid, &vardata, &other, &varonleft))
    PG_RETURN_FLOAT8(DEFAULT_EQ_SEL);

This default selectivity leads to wildly inaccurate row estimates when the predicate includes casts, functions, or other calculations.

Mitigation Strategies

Normalize column data types to avoid unnecessary casts.

Move calculations to constants instead of columns, e.g., rewrite where c1 + 1 = 1000 as where c1 = 999.

Create expression indexes that match the casted expression, allowing the planner to use an index even with casts.

Rewrite predicates to avoid functions, e.g., replace where substring(c2,2) = 'ZC' with a range condition where c2 >= 'ZC' and c2 < 'ZD' or a simple regex where c2 ~ '^ZC'.

These steps prevent the planner from falling back to the default selectivity, improve index usage, and produce more accurate cost estimates, ultimately avoiding the kind of severe performance degradation demonstrated in the original query.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

PostgreSQL Query Planning type casting Index Scan row estimation

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.