Why SELECT * Can Kill Your API Performance: Lessons from a 2012 Database Mishap
This article recounts a 2012 incident in which a backend API slowed dramatically after large BLOB columns were added to a table it queried with SELECT *. It then explains why SELECT * should be avoided in favor of explicit column lists: it blocks index‑only scans, forces extra I/O, increases deserialization, network, and client processing costs, and makes schema changes harder to manage.
Background: a real‑world slowdown
In 2012, a backend API that normally responded in single‑digit milliseconds suddenly began taking 500 ms to 2 seconds. The code had not changed, but three large BLOB columns had been added to the table for another application. The API still executed SELECT *, so it pulled those BLOB fields on every request even though they were never returned to the client. The result was massive database, network, and serialization overhead.
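The effect can be reproduced in miniature. The sketch below uses Python's built‑in sqlite3 module rather than the database from the incident (which the source does not name), and the table name accounts, the columns doc_a, doc_b, doc_c, and the helper payload_bytes are all hypothetical. It counts how many bytes of text and BLOB data each query hands to the client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY, name TEXT,"
    " doc_a BLOB, doc_b BLOB, doc_c BLOB)"  # three BLOBs added for another app
)
blob = bytes(10_000)  # 10 KB of zero bytes per BLOB column (arbitrary size)
conn.executemany(
    "INSERT INTO accounts (name, doc_a, doc_b, doc_c) VALUES (?, ?, ?, ?)",
    [(f"user{i}", blob, blob, blob) for i in range(50)],
)

def payload_bytes(sql):
    """Sum the length of every text/bytes value the query returns."""
    total = 0
    for row in conn.execute(sql):
        for value in row:
            if isinstance(value, (bytes, str)):
                total += len(value)
    return total

star_bytes = payload_bytes("SELECT * FROM accounts")
explicit_bytes = payload_bytes("SELECT id, name FROM accounts")
print(f"SELECT *: {star_bytes} bytes, explicit columns: {explicit_bytes} bytes")
```

The API in this sketch only ever needs id and name, yet the SELECT * variant moves every BLOB through the driver on each call.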
How row‑oriented storage works
Row‑store engines store rows in fixed‑size pages. Each page has a header, followed by rows that each carry their own row header and column data. Pages are the unit of I/O: when a page is read into the shared buffer pool, every row on that page becomes accessible, whether or not the query needs it. If a query requests every column, the engine must read the full row data from the heap.
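A rough back‑of‑the‑envelope calculation shows why row width matters in this layout. The page size, header size, and per‑row overhead below are assumed round numbers chosen for illustration, not the exact figures of any particular engine, and the model ignores out‑of‑line storage and alignment.

```python
import math

# Illustrative sizes only; real engines differ in the details.
PAGE_SIZE = 8192     # bytes per page (assumed)
PAGE_HEADER = 24     # bytes of per-page header (assumed)
ROW_OVERHEAD = 28    # bytes of per-row header and bookkeeping (assumed)

def rows_per_page(row_data_bytes: int) -> int:
    """How many rows of the given data width fit in one page."""
    usable = PAGE_SIZE - PAGE_HEADER
    return usable // (ROW_OVERHEAD + row_data_bytes)

def pages_for(row_count: int, row_data_bytes: int) -> int:
    """How many pages are needed to hold row_count rows."""
    return math.ceil(row_count / rows_per_page(row_data_bytes))

narrow_pages = pages_for(100_000, 8)      # two 4-byte integers per row
wide_pages = pages_for(100_000, 1_000)    # same rows plus ~1 KB of column data
print(narrow_pages, wide_pages)
```

Under these assumptions the wide table needs roughly thirty times as many pages for the same number of rows, so a full scan touches correspondingly more data.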
Why SELECT * hurts performance
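Index‑only scans are disabled – The optimizer can use an index‑only plan only when an index contains every column the query requests. A query that requests all columns almost never meets that condition, so the engine must do a heap lookup for each matching row.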
Extra I/O – Retrieving non‑required columns adds random page reads, increasing disk I/O.
Deserialization cost – The database must decode every column’s raw bytes into internal types, even for columns the application never uses.
Large columns are often out‑of‑line – Text, JSON, or BLOB data may be stored in TOAST tables (PostgreSQL) or similar external storage. Fetching them requires additional reads and decompression.
Network overhead – More columns mean more bytes to serialize, transmit over TCP/IP, and ultimately deserialize on the client side.
Client‑side deserialization – The application must parse the full result set, increasing CPU time and latency.
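The first point in the list above can be observed directly in a query plan. The following sketch uses SQLite through Python's sqlite3 module as a stand‑in; SQLite calls the technique a "covering index" rather than an index‑only scan, and the table users, the index idx_users_email, and the helper plan are hypothetical names. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, profile BLOB)"
)
conn.execute("CREATE INDEX idx_users_email ON users (email)")

def plan(sql):
    """Return the human-readable detail text of SQLite's query plan."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + sql, ("a@example.com",)
    ).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th field is the detail string

# id is the rowid, which every SQLite index stores, so the index alone
# can answer this query without touching the table.
explicit_plan = plan("SELECT id, email FROM users WHERE email = ?")

# profile is not in the index, so each match needs a table lookup.
star_plan = plan("SELECT * FROM users WHERE email = ?")

print(explicit_plan)
print(star_plan)
```

The explicit query is reported as using a covering index, while the SELECT * query uses the same index only to locate rows and then reads each one from the table.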
Unpredictability of schema changes
Even if a table originally had only two integer columns, adding a JSON or BLOB column can sharply increase query latency: the same SELECT * now pulls the extra data, even though the application code has not changed.
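A minimal sketch of that drift, again using Python's sqlite3 module with a hypothetical events table and a hypothetical helper named columns: the query text stays identical, but the shape of the SELECT * result changes after an ALTER TABLE, while the explicit query is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind INTEGER)")
conn.execute("INSERT INTO events (kind) VALUES (7)")

def columns(sql):
    """Return the column names the query currently produces."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

before_star = columns("SELECT * FROM events")
before_explicit = columns("SELECT id, kind FROM events")

# Another team adds a large column; no application code changes.
conn.execute("ALTER TABLE events ADD COLUMN payload BLOB")

after_star = columns("SELECT * FROM events")
after_explicit = columns("SELECT id, kind FROM events")

print(before_star, "->", after_star)
print(before_explicit, "->", after_explicit)
```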
Benefits of explicit column lists
Enables index‑only scans, reducing I/O.
Limits deserialization and network traffic to only needed data.
Makes schema evolution safer; developers can search the codebase for column references and understand the impact of renames or drops.
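One way to move a codebase toward explicit column lists is to search it for remaining SELECT * queries. The helper below is a simplified sketch, not a SQL parser: the name find_select_star and the regular expression are assumptions made for illustration, and the pattern deliberately ignores forms such as SELECT DISTINCT * or queries split across lines.

```python
import re

# Matches "SELECT *" and "SELECT alias.*", but not "SELECT COUNT(*)".
SELECT_STAR = re.compile(r"\bSELECT\s+(?:\w+\.)?\*", re.IGNORECASE)

def find_select_star(source: str):
    """Return (line_number, line) pairs for lines containing SELECT *."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if SELECT_STAR.search(line)
    ]

sample = (
    'q1 = "SELECT * FROM users"\n'
    'q2 = "SELECT id, email FROM users"\n'
    'q3 = "select u.* from users u"\n'
    'q4 = "SELECT COUNT(*) FROM users"\n'
)
hits = find_select_star(sample)
for number, line in hits:
    print(number, line)
```

In this sample only the first and third queries are flagged; the explicit column list and the COUNT(*) aggregate pass.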
Conclusion
While a SELECT * on a tiny, simple table may have negligible cost, in real‑world systems it often introduces hidden performance penalties. Selecting only the required columns is a best practice for efficient, predictable, and maintainable database queries.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
