How to Speed Up Latest‑Record Queries in Large PostgreSQL Time‑Series Tables

This article explains why querying the most recent record for a specific device in massive time‑series tables can be painfully slow in PostgreSQL, demonstrates the impact of using only a timestamp index, and presents several indexing and query‑design strategies—including composite indexes, lateral joins, SkipScan, recursive CTEs, and logging tables with triggers—to dramatically improve performance.

ITPUB

Mar 10, 2022

Many applications need to fetch the newest timestamped row for a given device from huge time‑series datasets, but PostgreSQL often scans millions of rows when only a B‑tree index on the timestamp column exists.

Problem Overview

When the query

SELECT * FROM truck_reading WHERE truck_id = 1234 ORDER BY ts DESC LIMIT 1;

runs on a table whose only index is the default timestamp index CREATE INDEX ix_ts ON truck_reading (ts DESC);, PostgreSQL must walk that index in timestamp order and check every row against the truck_id filter. If the device has not reported recently, the scan may pass over hundreds of thousands or millions of rows and touch hundreds of megabytes of buffers before it finds a match.

Demonstration of the Issue

An EXPLAIN (ANALYZE, BUFFERS) on a 20-million-row sample shows:

Limit  (cost=0.44..289.59 rows=1) (actual time=189.343..189.344 rows=1)
  -> Index Scan using ix_ts on truck_reading
       Filter: (truck_id = 1234)
       Rows Removed by Filter: 1532532
       Buffers: shared hit=23168

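This scan touches 23,168 shared buffers, roughly 181 MiB at PostgreSQL's default 8 kB block size, and it discards more than 1.5 million rows to return one. The query becomes a bottleneck as the dataset grows.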
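A plan of this shape can be reproduced on a scratch database. The sketch below is not the article's benchmark: the column types, the 1,000,000-row size, the 2,000 trucks, and the way truck 1234 is made "stale" are all assumptions chosen for illustration. Only the table, column, and index names come from the article.

```sql
-- Assumed schema: names follow the article, types and sizes are guesses.
CREATE TABLE trucks (
    truck_id int PRIMARY KEY
);

CREATE TABLE truck_reading (
    ts        timestamptz NOT NULL,
    truck_id  int NOT NULL REFERENCES trucks (truck_id),
    milage    int,
    fuel      int,
    latitude  float8,
    longitude float8
);

-- 2,000 hypothetical trucks, so that truck 1234 exists.
INSERT INTO trucks SELECT g FROM generate_series(1, 2000) AS g;

-- 1,000,000 synthetic readings, one per second going back in time.
-- The newest 900,000 rows belong to trucks 1..1000; the oldest 100,000
-- all belong to truck 1234, so its latest reading is far from the
-- "recent" end of a timestamp index.
INSERT INTO truck_reading (ts, truck_id, milage, fuel, latitude, longitude)
SELECT now() - g * interval '1 second',
       CASE WHEN g > 900000 THEN 1234 ELSE 1 + (g % 1000) END,
       g, 50, 0.0, 0.0
FROM generate_series(1, 1000000) AS g;

-- The timestamp-only index from the article.
CREATE INDEX ix_ts ON truck_reading (ts DESC);
ANALYZE truck_reading;

-- ANALYZE executes the query; BUFFERS adds the buffer counters.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM truck_reading
WHERE truck_id = 1234
ORDER BY ts DESC
LIMIT 1;
```

With this data layout the interesting figures in the output are "Rows Removed by Filter" and "Buffers"; the exact plan and numbers depend on the PostgreSQL version and configuration.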

Effective Indexing Strategy

Creating a composite index that starts with the device identifier dramatically reduces work:

CREATE INDEX ix_truck_id_ts ON truck_reading (truck_id, ts DESC);

With this index, the same query reads only a few index entries, dropping execution time to sub‑millisecond levels and reducing buffer reads by a factor of ~4600.
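On a table that is already receiving writes, the index can be built without blocking them. This is a minimal sketch, assuming the truck_reading table from the article. CONCURRENTLY and IF NOT EXISTS are standard PostgreSQL options, not something the article prescribes.

```sql
-- Build the index without blocking concurrent writes.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_truck_id_ts
    ON truck_reading (truck_id, ts DESC);

-- Refresh planner statistics, then check which index the plan uses.
ANALYZE truck_reading;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM truck_reading
WHERE truck_id = 1234
ORDER BY ts DESC
LIMIT 1;
```

If the plan names ix_truck_id_ts and no longer reports rows removed by a filter, the query is reading only the entries it needs.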

Why Some Queries Remain Slow

Even with both indexes in place, an open-ended query with no time filter, such as

SELECT * FROM truck_reading WHERE truck_id = 1234 ORDER BY ts DESC LIMIT 1;

can still be slow on a partitioned table or a TimescaleDB hypertable. Without a bound on ts, the planner cannot rule out any partition, so it has to consider every one of them, and planning time grows with the partition count.
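A bounded version of the query gives the planner a range of ts to work with. This is a sketch only: the one-day window is an assumed value and should be tuned to how often devices actually report.

```sql
-- Latest reading for one truck, limited to an assumed look-back window.
SELECT *
FROM truck_reading
WHERE truck_id = 1234
  AND ts > now() - interval '1 day'   -- assumption: devices report at least daily
ORDER BY ts DESC
LIMIT 1;
```

The trade-off is that a device silent for longer than the window returns no row, so callers may need to fall back to a wider window or to the unbounded query.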

Alternative Query Patterns

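Naïve GROUP BY : SELECT truck_id, max(ts) FROM truck_reading GROUP BY truck_id; works, but it returns only the timestamp rather than the full row, and on large datasets it typically reads every row or index entry instead of jumping straight to the newest one per truck.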

LATERAL JOIN : Allows each outer row to drive an inner query that can use the composite index.

SELECT * FROM trucks t
INNER JOIN LATERAL (
  SELECT * FROM truck_reading
  WHERE truck_id = t.truck_id
  ORDER BY ts DESC
  LIMIT 1
) l ON TRUE
ORDER BY t.truck_id DESC;

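This works well when the outer table is small, since each truck costs one short index probe. The total cost grows linearly with the number of rows in trucks, so it becomes expensive at high cardinality.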
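One detail of the INNER JOIN form is that trucks with no readings at all disappear from the result. A hedged variant, assuming the same trucks and truck_reading tables, uses LEFT JOIN LATERAL to keep them with NULL reading columns; the selected column list is only an example.

```sql
-- Latest reading per truck, keeping trucks that have never reported.
SELECT t.truck_id, l.ts, l.milage, l.fuel
FROM trucks t
LEFT JOIN LATERAL (
    SELECT r.ts, r.milage, r.fuel
    FROM truck_reading r
    WHERE r.truck_id = t.truck_id   -- probes the (truck_id, ts DESC) index once per truck
    ORDER BY r.ts DESC
    LIMIT 1
) l ON TRUE
ORDER BY t.truck_id;
```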

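TimescaleDB SkipScan : With TimescaleDB 2.3+, the planner can use a SkipScan node for DISTINCT ON (truck_id) queries when the appropriate index exists. Instead of reading every index entry, it jumps from one truck_id to the next and fetches only the first row for each.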
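The query shape that benefits here is DISTINCT ON. This is a minimal sketch, assuming the composite index ix_truck_id_ts from the article exists. The statement itself is plain PostgreSQL and also runs without TimescaleDB, only without the skip optimization.

```sql
-- One row per truck_id. Because rows are ordered by ts DESC within each
-- truck, the row that DISTINCT ON keeps is the newest reading.
SELECT DISTINCT ON (truck_id) *
FROM truck_reading
ORDER BY truck_id, ts DESC;
```

Whether a SkipScan node is actually chosen depends on the installed version and the planner's cost estimates, so it is worth confirming with EXPLAIN rather than assuming it.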

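Loose Index Scan (Recursive CTE) : Emulates the same skip pattern in plain PostgreSQL by hopping from one distinct truck_id to the next through the composite index. The query is more complex and, in this form, returns only a single column.

WITH RECURSIVE t AS (
  -- Anchor: the smallest truck_id in the table.
  SELECT min(truck_id) AS truck_id FROM truck_reading
  UNION ALL
  -- Step: the next truck_id above the previous one; NULL ends the recursion.
  SELECT (SELECT min(truck_id) FROM truck_reading WHERE truck_id > t.truck_id)
  FROM t
  WHERE t.truck_id IS NOT NULL
)
SELECT truck_id FROM t WHERE truck_id IS NOT NULL;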
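The single-column limitation can be worked around by feeding the distinct identifiers into a LATERAL lookup. The following is a sketch of that combination, not something the article shows. It assumes the composite index on (truck_id, ts DESC), and the CTE name ids is made up for the example.

```sql
WITH RECURSIVE ids AS (
    -- Anchor: smallest truck_id present in truck_reading.
    SELECT min(truck_id) AS truck_id FROM truck_reading
    UNION ALL
    -- Step: next larger truck_id; yields NULL once the list is exhausted.
    SELECT (SELECT min(r.truck_id)
            FROM truck_reading r
            WHERE r.truck_id > ids.truck_id)
    FROM ids
    WHERE ids.truck_id IS NOT NULL
)
SELECT l.*
FROM ids
CROSS JOIN LATERAL (
    -- Newest full row for this truck_id.
    SELECT *
    FROM truck_reading r
    WHERE r.truck_id = ids.truck_id
    ORDER BY r.ts DESC
    LIMIT 1
) l
WHERE ids.truck_id IS NOT NULL;
```

Unlike the LATERAL join over trucks, this version needs no separate device table, at the price of a query that is harder to read and maintain.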

Logging Table with Trigger : Maintain a separate table that stores the latest reading per device, updated via a trigger on truck_reading.

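-- One row per truck, holding the values of its most recently written reading.
CREATE TABLE truck_log (
  truck_id int PRIMARY KEY REFERENCES trucks(truck_id),
  milage int,
  fuel int,
  latitude float8,
  longitude float8
);
-- Leave free space in each page for the frequent in-place row updates.
ALTER TABLE truck_log SET (fillfactor = 90);

-- Upsert the incoming reading into truck_log, keyed on truck_id.
CREATE OR REPLACE FUNCTION create_truck_trigger_fn()
RETURNS TRIGGER LANGUAGE plpgsql AS $BODY$
BEGIN
  INSERT INTO truck_log VALUES (NEW.truck_id, NEW.milage, NEW.fuel, NEW.latitude, NEW.longitude)
  ON CONFLICT (truck_id) DO UPDATE SET
    milage = EXCLUDED.milage,
    fuel = EXCLUDED.fuel,
    latitude = EXCLUDED.latitude,
    longitude = EXCLUDED.longitude;
  -- Return NEW so the original write to truck_reading proceeds unchanged.
  RETURN NEW;
END;
$BODY$;

-- Fire once for every row inserted into or updated in truck_reading.
CREATE TRIGGER create_truck_trigger
BEFORE INSERT OR UPDATE ON truck_reading
FOR EACH ROW EXECUTE PROCEDURE create_truck_trigger_fn();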

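Queries against truck_log are tiny and fast, at the cost of extra work on every write to truck_reading. Because truck_log as defined here stores no timestamp, a late-arriving older reading overwrites a newer one.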
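If readings can arrive out of order, one option is to store the reading time and only overwrite when the incoming row is at least as new. This is a sketch under stated assumptions: the last_ts column is hypothetical and is not part of the article's truck_log, and the function body below replaces the one defined above.

```sql
-- Hypothetical extra column holding the timestamp of the stored reading.
ALTER TABLE truck_log ADD COLUMN IF NOT EXISTS last_ts timestamptz;

CREATE OR REPLACE FUNCTION create_truck_trigger_fn()
RETURNS TRIGGER LANGUAGE plpgsql AS $BODY$
BEGIN
  INSERT INTO truck_log (truck_id, milage, fuel, latitude, longitude, last_ts)
  VALUES (NEW.truck_id, NEW.milage, NEW.fuel, NEW.latitude, NEW.longitude, NEW.ts)
  ON CONFLICT (truck_id) DO UPDATE SET
    milage    = EXCLUDED.milage,
    fuel      = EXCLUDED.fuel,
    latitude  = EXCLUDED.latitude,
    longitude = EXCLUDED.longitude,
    last_ts   = EXCLUDED.last_ts
  -- Skip the update when the stored reading is newer than the incoming one.
  WHERE truck_log.last_ts IS NULL OR truck_log.last_ts <= EXCLUDED.last_ts;
  RETURN NEW;
END;
$BODY$;

-- Reading the latest values is then a primary-key lookup.
SELECT * FROM truck_log WHERE truck_id = 1234;
```

The existing trigger keeps working because it refers to the function by name; only the function body changes.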

Practical Recommendations

For most workloads, create the composite index (truck_id, ts DESC) to enable fast look‑ups of the latest record per device.

Use a time filter whenever possible so the planner can exclude partitions instead of considering all of them.

Consider LATERAL JOINs for small outer tables, but remember that the work grows with the number of outer rows.

If you have TimescaleDB installed, leverage SkipScan for efficient per‑device retrieval.

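For read-heavy workloads that need the latest value constantly, a logging table with a trigger turns each read into a single primary-key lookup. The trigger adds work to every insert, so measure its impact before using it on high-throughput ingestion.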

Conclusion

Choosing the right indexing strategy and query pattern is essential for performant retrieval of the most recent rows in large PostgreSQL time‑series tables; the options above give you a toolbox to match your data volume, cardinality, and operational constraints.
