Databases 8 min read

How to Efficiently Retrieve Top N Rows per Group in PostgreSQL

This article explains how to extract the top N records for each group in PostgreSQL, compares the slow window‑function approach with a fast recursive‑CTE and set‑returning function solution, and shows performance results dropping from over 20 seconds to under half a second.

ITPUB
ITPUB
ITPUB
How to Efficiently Retrieve Top N Rows per Group in PostgreSQL

Background

Extracting the top N rows for each group (e.g., top 10 songs per singer) is a common requirement.

Traditional approach using window functions

The usual solution uses a window query with row_number() over (partition by c1 order by c2) and filters on rn <= 10. Example setup creates a table tbl(c1 int, c2 int, c3 int) with an index on (c1,c2), inserts 10 000 000 rows spread across 10 000 groups, and runs the query.

postgres=# create table tbl(c1 int, c2 int, c3 int);
postgres=# create index idx1 on tbl(c1,c2);
postgres=# insert into tbl select mod(trunc(random()*10000)::int,10000), trunc(random()*10000000) from generate_series(1,10000000);

postgres=# explain select * from (select row_number() over(partition by c1 order by c2) as rn, * from tbl) t where t.rn <= 10;
QUERY PLAN
----------------------------------------------------------------------------------------
Subquery Scan on t  (cost=0.43..770563.03 rows=3333326 width=20)
  Filter: (t.rn <= 10)
  ->  WindowAgg  (cost=0.43..645563.31 rows=9999977 width=12)
  ->  Index Scan using idx1 on tbl  (cost=0.43..470563.72 rows=9999977 width=12)
(4 rows)

Execution time: 20833.747 ms

Optimized method with recursive CTE and set‑returning function

When the number of groups is moderate, a recursive CTE can enumerate distinct group identifiers and an index can fetch the top N rows per group. PostgreSQL does not allow ORDER BY inside the recursive seed, so the CTE is wrapped in a PL/pgSQL function that returns a set of rows.

with recursive t1 as (
  select min(c1) c1 from tbl
  union all
  select (select min(tbl.c1) from tbl where tbl.c1 > t.c1) c1
  from t1 t where t.c1 is not null
);

create or replace function f() returns setof tbl as $$
declare v int;
begin
  for v in with recursive t1 as (
    select min(c1) c1 from tbl
    union all
    select (select min(tbl.c1) from tbl where tbl.c1 > t.c1) c1
    from t1 t where t.c1 is not null
  ) loop
    return query select * from tbl where c1 = v order by c2 limit 10;
  end loop;
  return;
end;
$$ language plpgsql strict;

Calling the function returns the same top‑10 rows per group, but the execution plan shows a dramatic speed improvement.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from f();
QUERY PLAN
---------------------------------------------------------------
Function Scan on public.f  (cost=0.25..10.25 rows=1000 width=12) (actual time=419.218..444.810 rows=100000 loops=1)
  Buffers: shared hit=170407, temp read=221 written=220
Planning time: 0.037 ms
Execution time: 464.257 ms

Key points

The window‑function method scans the entire table; on 10 million rows it takes about 20 seconds.

With a limited number of groups, a recursive CTE combined with an index can retrieve the top N per group in sub‑second time.

PostgreSQL currently does not support ordering in the recursive seed or using ORDER BY directly inside the recursive part, so a wrapper function is required.

Source: Database Kernel Monthly, http://www.kancloud.cn/taobaomysql/monthly/181768

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

PostgreSQLWindow FunctionsRecursive CTETop N per group
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.