Why SELECT * Slows Down Your Database and How to Avoid It
The article recounts a 2012 incident in which a normally fast backend API became sluggish after another application added BLOB columns to a table the API queried with SELECT *. It explains how SELECT * rules out index-only scans and adds deserialization work, network overhead, and unpredictable performance, and it advises selecting only the columns you need.
Story from 2012
A developer recounts a real case from 2012‑2013 where a backend API that normally responded in a few milliseconds suddenly became slow for users.
Code reviews showed no abnormal changes, and even after rolling back all commits the slowdown persisted.
Diagnosing the slowdown
API response times occasionally rose to 500 ms–2 s, whereas they used to be a few milliseconds.
The team investigated the database queries and discovered that another application had added three BLOB columns to a table that originally had only two integer columns.
How database reads work
In row‑store engines, rows are stored in pages, each page containing a header and multiple rows with column data.
When a page is loaded into the shared buffer pool, all rows and columns become accessible.
Even though the backend API never returns the extra BLOB columns to its own clients, a SELECT * query still makes it fetch them from the database, increasing database, network, and serialization overhead.
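The effect described above can be reproduced in miniature. The following is a rough sketch using Python's built-in sqlite3 module, not the system from the story: the table name `events`, its column names, and the 100 KB payload size are all assumptions chosen for illustration. It counts how many BLOB bytes the driver hands back for each query shape.

```python
import sqlite3

# Hypothetical schema mirroring the story: two integer columns plus three
# BLOB columns that some other application added later.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY, status INTEGER,"
    " blob_a BLOB, blob_b BLOB, blob_c BLOB)"
)
payload = b"x" * 100_000  # 100 KB per BLOB, an arbitrary illustrative size
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [(i, i % 3, payload, payload, payload) for i in range(50)],
)


def fetched_blob_bytes(sql: str) -> int:
    """Sum the size of every bytes value the driver returns for a query."""
    total = 0
    for row in conn.execute(sql):
        total += sum(len(value) for value in row if isinstance(value, bytes))
    return total


star_bytes = fetched_blob_bytes("SELECT * FROM events")
narrow_bytes = fetched_blob_bytes("SELECT id, status FROM events")
print(star_bytes, narrow_bytes)  # 15000000 0
```

With 50 rows, the SELECT * version moves about 15 MB of data the caller never uses, while the explicit column list moves none of it.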
Losing index-only scans
Using SELECT * usually prevents the optimizer from using an index-only scan. For example, if you need the IDs of students with scores above 90 and an index covers both the score and the ID, the index alone can satisfy the query without touching the heap.
Because SELECT * requests all columns, the database must also read the heap pages for the remaining columns, causing many random I/O operations.
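A small experiment can make the difference visible. The sketch below uses SQLite through Python's sqlite3 module because it runs anywhere; the `students` table and index name are hypothetical. In SQLite an index implicitly carries the integer primary key, so an index on `score` alone covers the ID as well. Other engines may need the ID listed in the index explicitly, and their plan output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id INTEGER PRIMARY KEY, name TEXT, score INTEGER, photo BLOB)"
)
conn.execute("CREATE INDEX idx_students_score ON students (score)")


def query_plan(sql: str) -> str:
    """Return SQLite's plan description for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


narrow_plan = query_plan("SELECT id FROM students WHERE score > 90")
star_plan = query_plan("SELECT * FROM students WHERE score > 90")
print("narrow:", narrow_plan)
print("star:  ", star_plan)
```

The plan for the narrow query should mention a covering index, meaning the table itself is never read. The plan for SELECT * cannot say that, because `name` and `photo` live only in the table.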
Deserialization cost
Deserialization converts raw bytes into data types, a process that adds CPU work.
When executing SELECT *, the database must deserialize every column, even those not needed by the application, increasing computational overhead and reducing query performance.
Not all columns are inline
Large columns such as text or BLOBs are often stored out‑of‑line (e.g., PostgreSQL TOAST tables) and fetched only on demand.
Fetching many such columns forces the database to retrieve, decompress, and serialize additional data, adding load.
Network cost
Result rows are serialized according to the database protocol before being sent over TCP/IP; more data means more CPU work and larger packets, increasing latency.
Returning all columns can also force clients to receive and parse large fields they never use, further slowing deserialization on the client side.
Unpredictability
Even a table with one or two simple columns can become slow if administrators later add XML, JSON, or BLOB columns that the application never uses.
The query remains fast until those extra columns are added, at which point SELECT * starts pulling unnecessary data.
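The following sketch illustrates that drift, again with Python's sqlite3 module and a hypothetical `accounts` table. It compares the columns each query returns before and after someone adds a column the application knows nothing about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")


def column_names(sql: str) -> list:
    """Return the names of the columns a query produces."""
    cursor = conn.execute(sql)
    return [description[0] for description in cursor.description]


before_star = column_names("SELECT * FROM accounts")
before_narrow = column_names("SELECT id, balance FROM accounts")

# A schema change made elsewhere; the application code is untouched.
conn.execute("ALTER TABLE accounts ADD COLUMN audit_log BLOB")

after_star = column_names("SELECT * FROM accounts")
after_narrow = column_names("SELECT id, balance FROM accounts")

print(before_star, "->", after_star)
print(before_narrow, "->", after_narrow)
```

The SELECT * result silently grows to include `audit_log`, while the explicit query keeps returning exactly the two columns it asked for.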
Grepping the code
Explicit column lists make it easy to grep the codebase for column usage, simplifying schema refactoring and DDL changes.
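As a toy illustration of that search, the sketch below scans a few query strings for a column name. The file names, the queries, and the `orders` table are invented for the example; a real codebase would be searched with grep or a similar tool.

```python
import re

# Hypothetical query strings standing in for SQL scattered across a codebase.
queries = {
    "orders_api.py": "SELECT id, customer_id, total FROM orders WHERE id = ?",
    "report_job.py": "SELECT * FROM orders WHERE created_at > ?",
    "billing.py": "SELECT total, currency FROM orders WHERE customer_id = ?",
}


def files_referencing(column: str) -> list:
    """Return the files whose SQL mentions the column as a whole word."""
    pattern = re.compile(r"\b" + re.escape(column) + r"\b")
    return sorted(name for name, sql in queries.items() if pattern.search(sql))


hits = files_referencing("total")
print(hits)  # ['billing.py', 'orders_api.py']
```

Note that `report_job.py` does not show up even though its SELECT * also reads `total`. That hidden dependency is exactly what makes renaming or dropping a column risky.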
Conclusion
In summary, SELECT * incurs many hidden costs—extra I/O, deserialization, network overhead, and unpredictability—so it is best to select only the columns you truly need, unless the table is tiny with simple data types.
21CTO
