How to Speed Up Latest‑Record Queries in Large PostgreSQL Time‑Series Tables
This article explains why querying the most recent record for a specific device in massive time‑series tables can be painfully slow in PostgreSQL, demonstrates the impact of using only a timestamp index, and presents several indexing and query‑design strategies—including composite indexes, lateral joins, SkipScan, recursive CTEs, and logging tables with triggers—to dramatically improve performance.
Many applications need to fetch the newest timestamped row for a given device from huge time‑series datasets, but PostgreSQL often scans millions of rows when only a B‑tree index on the timestamp column exists.
Problem Overview
When the query
SELECT * FROM truck_reading WHERE truck_id = 1234 ORDER BY ts DESC LIMIT 1;
runs on a table whose only index is the default timestamp index
CREATE INDEX ix_ts ON truck_reading (ts DESC);
PostgreSQL must walk that index from the newest timestamp backward and check every row against the truck_id filter until one matches. If the device has not reported recently, the scan may visit hundreds of thousands or millions of rows and touch hundreds of megabytes of buffers before it finds a match.
Demonstration of the Issue
Running EXPLAIN (ANALYZE, BUFFERS) on a 20‑million‑row sample shows:
Limit (cost=0.44..289.59 rows=1) (actual time=189.343..189.344 rows=1)
-> Index Scan using ix_ts on truck_reading
Filter: (truck_id = 1234)
Rows Removed by Filter: 1532532
Buffers: shared hit=23168
This scan touches 23168 buffers, roughly 181 MB at the default 8 kB block size, so the query becomes a bottleneck as the dataset grows.
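To reproduce a measurement like this on your own data, a minimal sketch is to run the query under EXPLAIN with the ANALYZE and BUFFERS options and then convert the reported buffer count into a size. The buffer count below is the one from the plan above; nothing else is assumed beyond the server's own block_size setting. Note that EXPLAIN ANALYZE actually executes the query.

```sql
-- Execute the query and report actual timings plus buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM truck_reading
WHERE truck_id = 1234
ORDER BY ts DESC
LIMIT 1;

-- Convert a buffer count from the plan into a human-readable size.
-- block_size is 8192 bytes unless the server was built differently.
SELECT pg_size_pretty(23168 * current_setting('block_size')::bigint);
```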
Effective Indexing Strategy
Creating a composite index that starts with the device identifier dramatically reduces work:
CREATE INDEX ix_truck_id_ts ON truck_reading (truck_id, ts DESC);
With this index, the same query reads only a few index entries, which drops execution time to sub‑millisecond levels and cuts buffer reads by a factor of about 4600.
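One way to confirm that the composite index is actually used, sketched here for a plain (non-partitioned) table, is to refresh statistics and look at the plan again. The expectation stated in the comments is what one would normally hope to see, not a guaranteed outcome; the planner's choice depends on your data and settings.

```sql
-- Refresh planner statistics after building the index.
ANALYZE truck_reading;

-- Ideally the plan now shows an Index Scan on ix_truck_id_ts with an
-- "Index Cond" on truck_id rather than a "Filter".
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM truck_reading
WHERE truck_id = 1234
ORDER BY ts DESC
LIMIT 1;

-- The extra index costs disk space and write time; check its size.
-- On partitioned tables or hypertables the parent index itself holds
-- no data, so sizes have to be checked per partition or chunk.
SELECT pg_size_pretty(pg_relation_size('ix_truck_id_ts'));
```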
Why Some Queries Remain Slow
Even with both indexes, open‑ended queries that lack a time filter, for example
SELECT * FROM truck_reading WHERE truck_id = 1234 ORDER BY ts DESC LIMIT 1;
give the planner no way to exclude partitions, so it must consider every one of them. That increases planning time, especially on heavily partitioned tables such as TimescaleDB hypertables.
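A time-bounded variant of the query might look like the sketch below. The one-day window is a hypothetical value chosen only for illustration; the right bound depends on how often your devices report.

```sql
-- Hypothetical one-day window: only partitions that can contain rows
-- newer than the bound need to be considered.
SELECT * FROM truck_reading
WHERE truck_id = 1234
  AND ts > now() - interval '1 day'
ORDER BY ts DESC
LIMIT 1;
```

The trade-off is that a device silent for longer than the window returns no row, so the caller needs a fallback such as retrying with a wider window.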
Alternative Query Patterns
Naïve GROUP BY : SELECT truck_id, max(ts) FROM truck_reading GROUP BY truck_id; works, but it has to aggregate over every row and cannot use the indexes efficiently on large datasets.
LATERAL JOIN : Allows each outer row to drive an inner query that can use the composite index.
SELECT * FROM trucks t
INNER JOIN LATERAL (
SELECT * FROM truck_reading
WHERE truck_id = t.truck_id
ORDER BY ts DESC
LIMIT 1
) l ON TRUE
ORDER BY t.truck_id DESC;
Works well when the outer table is small, but the cost grows with the number of outer rows because the inner query runs once per truck.
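Because the query above uses INNER JOIN LATERAL, trucks that have no readings at all drop out of the result. If those trucks should still be listed, a LEFT JOIN LATERAL variant is one option. The sketch below assumes the same trucks and truck_reading tables; the last_ts alias is introduced here for illustration.

```sql
-- Every truck appears once; last_ts is NULL for trucks that have
-- never reported.
SELECT t.truck_id, l.ts AS last_ts
FROM trucks t
LEFT JOIN LATERAL (
    SELECT ts FROM truck_reading
    WHERE truck_id = t.truck_id
    ORDER BY ts DESC
    LIMIT 1
) l ON TRUE
ORDER BY t.truck_id DESC;
```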
TimescaleDB SkipScan : With TimescaleDB 2.3+, the planner can use a SkipScan node to efficiently fetch the first row per truck_id when the appropriate index exists.
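SkipScan is typically exercised through a DISTINCT ON query whose ORDER BY matches the index. The sketch below assumes the composite index ix_truck_id_ts (truck_id, ts DESC) exists. Whether the planner actually picks a SkipScan node depends on the TimescaleDB version and its cost estimates, so it is worth checking the plan.

```sql
-- Latest reading per truck. The ORDER BY must lead with the
-- DISTINCT ON column and should match the index definition.
SELECT DISTINCT ON (truck_id) *
FROM truck_reading
ORDER BY truck_id, ts DESC;

-- Inspect the plan; with SkipScan in use, a "Custom Scan (SkipScan)"
-- node appears in the output.
EXPLAIN
SELECT DISTINCT ON (truck_id) *
FROM truck_reading
ORDER BY truck_id, ts DESC;
```

The same query is valid on plain PostgreSQL, where it generally has to read every index entry instead of skipping between devices.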
Loose Index Scan (Recursive CTE) : Implements a similar pattern without TimescaleDB, but the query is more complex and returns only a single column.
WITH RECURSIVE t AS (
-- anchor: smallest value in the index
SELECT min(ts) AS ts FROM truck_reading
UNION ALL
-- step: jump to the next distinct value above the previous one
SELECT (SELECT min(ts) FROM truck_reading WHERE ts > t.ts)
FROM t WHERE t.ts IS NOT NULL
) SELECT ts FROM t WHERE ts IS NOT NULL;
Logging Table with Trigger : Maintain a separate table that stores the latest reading per device, updated via a trigger on truck_reading.
CREATE TABLE truck_log (
truck_id int PRIMARY KEY REFERENCES trucks(truck_id),
milage int,
fuel int,
latitude float8,
longitude float8
);
ALTER TABLE truck_log SET (fillfactor = 90);
CREATE OR REPLACE FUNCTION create_truck_trigger_fn()
RETURNS TRIGGER LANGUAGE plpgsql AS $BODY$
BEGIN
INSERT INTO truck_log VALUES (NEW.truck_id, NEW.milage, NEW.fuel, NEW.latitude, NEW.longitude)
ON CONFLICT (truck_id) DO UPDATE SET
milage = EXCLUDED.milage,
fuel = EXCLUDED.fuel,
latitude = EXCLUDED.latitude,
longitude = EXCLUDED.longitude;
RETURN NEW;
END;
$BODY$;
CREATE TRIGGER create_truck_trigger
BEFORE INSERT OR UPDATE ON truck_reading
FOR EACH ROW EXECUTE PROCEDURE create_truck_trigger_fn();
Queries against truck_log are tiny and fast, at the cost of additional write overhead on every insert into truck_reading.
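The trigger only captures rows written after it is created, so an existing table needs a one-time seed. The sketch below is one way to do that, assuming truck_reading has the columns the trigger function references and that every truck_id already exists in trucks (required by the foreign key).

```sql
-- One-time seed: copy the newest existing reading per truck.
-- DO NOTHING keeps any row the trigger has already written.
INSERT INTO truck_log (truck_id, milage, fuel, latitude, longitude)
SELECT DISTINCT ON (truck_id)
       truck_id, milage, fuel, latitude, longitude
FROM truck_reading
ORDER BY truck_id, ts DESC
ON CONFLICT (truck_id) DO NOTHING;

-- Reading the latest values is then a primary-key lookup.
SELECT * FROM truck_log WHERE truck_id = 1234;
```

As defined above, truck_log stores no timestamp and the trigger overwrites unconditionally, so a late-arriving or backfilled row would replace a newer one. This pattern fits best when rows arrive in time order.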
Practical Recommendations
For most workloads, create the composite index (truck_id, ts DESC) to enable fast look‑ups of the latest record per device.
Use a time filter whenever possible so the planner can exclude partitions instead of considering all of them.
Consider LATERAL JOINs for small outer tables, but watch cardinality.
If you have TimescaleDB installed, leverage SkipScan for efficient per‑device retrieval.
For read‑heavy workloads that can absorb the extra write overhead, a logging table with a trigger turns the latest‑value read into a single primary‑key lookup.
Conclusion
Choosing the right indexing strategy and query pattern is essential for performant retrieval of the most recent rows in large PostgreSQL time‑series tables; the options above give you a toolbox to match your data volume, cardinality, and operational constraints.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
