
Why SELECT * Can Kill Your API Performance: Lessons from a 2012 Database Mishap

This article recounts a 2012 incident in which a backend API slowed dramatically after large BLOB columns were added to a table it queried. It explains why SELECT * should be avoided in favor of explicit column lists: it blocks index‑only scans, forces extra I/O, adds deserialization, network, and client processing costs, and makes schema maintenance harder.

ITPUB

Dec 26, 2024


Background: a real‑world slowdown

In 2012, a backend API that normally responded in single‑digit milliseconds suddenly began taking 500 ms to 2 seconds. The code had not changed, but three large BLOB columns had been added to the table for another application. The API still executed SELECT *, so every request pulled those BLOB fields even though they were never returned to the client, adding heavy database, network, and serialization overhead.

How row‑oriented storage works

Row‑store engines keep rows in fixed‑size pages. Each page has a header, followed by rows that each carry a small row header and their column data. The page is the unit of I/O: when it is read into the shared buffer pool, all rows on that page become accessible. If a query needs every column, the engine must read the full row data from the heap rather than answering from a narrower index.
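To make the page arithmetic concrete, here is a minimal sketch of how row width affects the number of pages a scan has to touch. The 8192‑byte page size matches PostgreSQL's default, but the header sizes and the function name `pages_needed` are simplifying assumptions for illustration, not an exact model of any engine's on‑disk format.

```python
import math

def pages_needed(row_count, row_data_bytes,
                 page_bytes=8192, page_header_bytes=24, row_header_bytes=24):
    """Estimate pages required to hold row_count rows of a given width.

    Simplified model (assumption): fixed-width rows, one header per page,
    one header per row, no alignment padding or free-space management.
    """
    usable = page_bytes - page_header_bytes
    rows_per_page = usable // (row_header_bytes + row_data_bytes)
    if rows_per_page == 0:
        raise ValueError("row does not fit in a page in this simplified model")
    return math.ceil(row_count / rows_per_page)

# Two integer columns (about 16 bytes of data) versus a 1000-byte row.
narrow_pages = pages_needed(100_000, 16)
wide_pages = pages_needed(100_000, 1000)
print(narrow_pages, wide_pages)
```

Under these assumptions the wide table occupies many times more pages than the narrow one, so a query that must read full rows does proportionally more I/O.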

Why SELECT * hurts performance

Index‑only scans are ruled out – Unless an index happens to cover every column in the table, the optimizer cannot use an index‑only plan for a query that requests all columns, so it must perform a heap lookup for each matching row.

Extra I/O – Fetching columns the application does not need means reading heap pages that a narrower query could skip, often as random reads, which increases disk I/O.

Deserialization cost – The database must decode every column’s raw bytes into internal types, even for columns the application never uses.

Large columns are often stored out of line – Text, JSON, or BLOB data may be stored in TOAST tables (PostgreSQL) or similar external storage. Fetching them requires additional reads and, when the values are compressed, decompression.

Network overhead – More columns mean more bytes to serialize, transmit over TCP/IP, and ultimately deserialize on the client side.

Client‑side deserialization – The application must parse the full result set, increasing CPU time and latency.
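The index‑only point can be checked on a small scale. The sketch below uses Python's built‑in sqlite3 module as a stand‑in engine (an assumption; the article's discussion is PostgreSQL‑flavored, and SQLite calls the equivalent plan a "covering index"). The `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail string is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER,"
    " status TEXT,"
    " payload BLOB)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id, status)")

# SELECT * needs payload, which the index does not contain.
star_plan = query_plan(
    conn, "SELECT * FROM orders WHERE customer_id = ?", (42,)
)
# The explicit list asks only for columns stored in the index.
explicit_plan = query_plan(
    conn, "SELECT customer_id, status FROM orders WHERE customer_id = ?", (42,)
)

print("SELECT *      :", star_plan)
print("explicit list :", explicit_plan)
```

The plan for the explicit column list should mention a covering index, while the SELECT * plan should not, because the engine has to visit the table rows to fetch `payload`. In PostgreSQL the comparable check would be looking for an Index Only Scan node in EXPLAIN output.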

Unpredictability of schema changes

Even if a table originally had only two integer columns, adding a new JSON or BLOB column can instantly double query latency because the same SELECT * now pulls the extra data, even though the application code has not changed.
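A small experiment illustrates the effect of a schema change on unchanged code. This sketch again uses sqlite3 as a stand‑in, with a hypothetical `users` table and an `avatar` column added later; the size estimate is deliberately rough and only meant to show the trend.

```python
import sqlite3

def fetch_user_star(conn, user_id):
    # Application code that never changes during the experiment.
    return conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()

def fetch_user_explicit(conn, user_id):
    return conn.execute(
        "SELECT id, age FROM users WHERE id = ?", (user_id,)
    ).fetchone()

def approx_size(row):
    # Rough estimate: actual length for bytes/str values, 8 bytes otherwise.
    return sum(len(v) if isinstance(v, (bytes, str)) else 8 for v in row)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER)")
conn.execute("INSERT INTO users (id, age) VALUES (1, 30)")

before_star = approx_size(fetch_user_star(conn, 1))
before_explicit = approx_size(fetch_user_explicit(conn, 1))

# Another team adds a large binary column for its own purposes.
conn.execute("ALTER TABLE users ADD COLUMN avatar BLOB")
conn.execute("UPDATE users SET avatar = ?", (b"\x00" * 1_000_000,))

after_star = approx_size(fetch_user_star(conn, 1))
after_explicit = approx_size(fetch_user_explicit(conn, 1))

print("SELECT *     :", before_star, "->", after_star)
print("explicit list:", before_explicit, "->", after_explicit)
```

The SELECT * function starts returning roughly a megabyte per row after the column is added, while the explicit query returns the same two values as before.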

Benefits of explicit column lists

Allows index‑only scans when an index covers the listed columns, reducing I/O.

Limits deserialization and network traffic to only needed data.

Makes schema evolution safer; developers can search the codebase for column references and understand the impact of renames or drops.
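The maintenance benefit can be sketched with a simple text search. The query names and SQL strings below are hypothetical, and the regular expressions are a rough heuristic rather than a SQL parser (for example, they would not catch `o.*` on an aliased table).

```python
import re

# Hypothetical queries as they might appear in an application codebase.
QUERIES = {
    "list_orders": "SELECT id, status FROM orders WHERE customer_id = ?",
    "order_detail": "SELECT id, status, payload FROM orders WHERE id = ?",
    "legacy_export": "SELECT * FROM orders",
}

def queries_mentioning(column, queries):
    """Names of queries whose SQL text references the column explicitly."""
    pattern = re.compile(r"\b" + re.escape(column) + r"\b", re.IGNORECASE)
    return sorted(name for name, sql in queries.items() if pattern.search(sql))

def queries_using_star(queries):
    """Names of queries that use SELECT * and so hide their dependencies."""
    pattern = re.compile(r"\bSELECT\s+\*", re.IGNORECASE)
    return sorted(name for name, sql in queries.items() if pattern.search(sql))

payload_users = queries_mentioning("payload", QUERIES)
star_users = queries_using_star(QUERIES)
print("references payload:", payload_users)
print("uses SELECT *     :", star_users)
```

Searching for `payload` finds `order_detail` but not `legacy_export`, even though the latter also reads the column. Before a rename or drop, every SELECT * query has to be reviewed by hand because the search cannot see what it depends on.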

Conclusion

While a SELECT * on a tiny, simple table may have negligible cost, in real‑world systems it often introduces hidden performance penalties. Selecting only the required columns is a best practice for efficient, predictable, and maintainable database queries.


