Why FastAPI Slows Down with Millions of Rows—and How to Keep It Fast
FastAPI feels lightning‑fast on small datasets, but returning millions of rows can exhaust memory, block the event loop, and cripple the database; this article explains why that happens and provides concrete design rules—selective fields, pagination, cursor‑based queries, streaming, and chunked processing—to keep APIs stable at scale.
FastAPI’s Speed Illusion
FastAPI appears extremely fast for small to medium data sets, but when the data volume jumps to millions of rows, requests start to hang, worker memory spikes to 100%, and the database slows down for everyone.
Why the Performance Degrades
Async helps with waiting, not with the amount of data. Returning millions of rows forces Python to keep all rows in memory, makes Pydantic serialize every record, blocks the event loop during serialization, and finally sends the huge payload over the network.
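To make that cost concrete, here is a small self-contained sketch (hypothetical helpers, not the article's database code) contrasting a fully materialized result list with a lazy generator:

```python
import sys

def fetch_all_rows(n):
    # Materializes every row up front, like fetch_all on a huge table.
    return [{"id": i, "status": "ok"} for i in range(n)]

def iterate_rows(n):
    # Yields one row at a time; memory use does not grow with n.
    for i in range(n):
        yield {"id": i, "status": "ok"}

rows = fetch_all_rows(100_000)
gen = iterate_rows(100_000)

# The list's footprint scales with the row count; the generator's does not.
print(sys.getsizeof(rows) > 100 * sys.getsizeof(gen))  # True
```

The list alone holds hundreds of kilobytes of pointers before counting the row objects themselves; the generator stays a fixed few hundred bytes no matter how many rows it will produce.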
Common Mistake: Fetch Everything at Once
@app.get("/events")
async def get_events():
    rows = await database.fetch_all("SELECT * FROM events")
    return rows

This pattern works for tiny data sets but quickly leads to:
RAM exhaustion
Worker process freeze
Timeouts
Down‑stream service collapse
In large‑scale scenarios the endpoint becomes a denial‑of‑service attack on your own server.
Principle #1 – Never Retrieve Unnecessary Data
The hidden cost of SELECT * includes extra memory, slower serialization, higher network load, and wasted CPU cycles.
SELECT id, status, created_at
FROM events
WHERE created_at >= NOW() - INTERVAL '7 days';

Specifying only the columns you need reduces memory usage, improves cache efficiency, and makes the intent clear for future readers.
Principle #2 – Pagination Is a Requirement, Not a Feature
Allowing an endpoint to return an unlimited number of rows means it is already broken. The real question is *when* it will fail.
Why Offset Pagination Fails Silently
SELECT * FROM logs
ORDER BY id
LIMIT 1000 OFFSET 500000;

The query looks harmless, but the database must walk past and discard half a million rows before returning the next thousand. The cost grows linearly with the offset, so deep pages cause slowdowns and CPU spikes.
Cursor‑Based Pagination – The Scalable Default
SELECT * FROM logs
WHERE id > :last_id
ORDER BY id
LIMIT 1000;

This approach leverages indexes efficiently, keeps each page’s cost roughly constant, and prevents performance decay over time.
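The same keyset logic can be sketched in plain Python. The in-memory LOGS list below is a stand-in for an indexed logs table, not the article's actual schema:

```python
LOGS = [{"id": i, "message": f"event {i}"} for i in range(1, 26)]

def fetch_page(last_id, limit):
    """Mirror of: SELECT * FROM logs WHERE id > :last_id ORDER BY id LIMIT :limit."""
    page = [row for row in LOGS if row["id"] > last_id][:limit]
    next_cursor = page[-1]["id"] if page else None
    return page, next_cursor

# Walk the table page by page, the way a client following the cursor would.
cursor, pages = 0, 0
while True:
    page, next_cursor = fetch_page(cursor, 10)
    if not page:
        break
    pages += 1
    cursor = next_cursor

print(pages)  # 25 rows in pages of 10, 10, and 5, so 3
```

Each request names an exact starting point (the last id seen), so the database never re-scans rows it already served, regardless of how deep the client has paged.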
Principle #3 – Stream Large Responses
Returning a massive JSON list is the worst choice for export‑type endpoints. Streaming changes the model:
You don’t wait for all data.
You don’t keep everything in memory.
You can “produce‑while‑sending”.
from fastapi.responses import StreamingResponse

async def stream_rows():
    async for row in database.iterate("SELECT id, message FROM logs"):
        yield f"{row['id']},{row['message']}\n"

@app.get("/export")
async def export_logs():
    return StreamingResponse(
        stream_rows(),
        media_type="text/csv",
        headers={"Content-Disposition": "attachment; filename=logs.csv"},
    )

Streaming uses constant memory, improves first‑byte latency, and is essential for surviving large‑scale data sets.
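One caveat with f-string formatting: a message containing a comma, quote, or newline corrupts the CSV. The stdlib csv module handles quoting; here is a minimal sketch of a safe row formatter (the helper name is ours, not part of FastAPI):

```python
import csv
import io

def format_csv_row(*fields):
    """Serialize one row to a properly quoted CSV line (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()

print(format_csv_row(42, 'hello, "world"'))  # prints: 42,"hello, ""world"""
```

Yielding format_csv_row(row['id'], row['message']) from the generator instead of a raw f-string keeps the export valid no matter what the message column contains.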
Principle #4 – Chunk Everything (Even When Streaming)
Databases dislike long‑running queries; they hold locks and can fail under pressure. Process data in bounded chunks.
CHUNK_SIZE = 100_000
last_id = 0

while True:
    rows = await database.fetch_all(
        """
        SELECT * FROM events
        WHERE id > :last_id
        ORDER BY id
        LIMIT :limit
        """,
        {"last_id": last_id, "limit": CHUNK_SIZE},
    )
    if not rows:
        break
    process(rows)
    last_id = rows[-1]["id"]

Chunking yields short transactions, easier failure recovery, and controllable memory growth, turning dangerous operations into manageable work units.
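The chunk loop can be exercised end to end without a real database. In this runnable sketch, EVENTS mimics the table and fetch_page mimics the keyset query issued with database.fetch_all (all names here are stand-ins):

```python
import asyncio

EVENTS = [{"id": i} for i in range(1, 11)]

async def fetch_page(last_id, limit):
    # Pretend I/O latency; mirrors the keyset query in the article's loop.
    await asyncio.sleep(0)
    return [e for e in EVENTS if e["id"] > last_id][:limit]

async def process_in_chunks(chunk_size):
    processed, last_id = 0, 0
    while True:
        rows = await fetch_page(last_id, chunk_size)
        if not rows:
            break
        processed += len(rows)  # stand-in for process(rows)
        last_id = rows[-1]["id"]
    return processed

print(asyncio.run(process_in_chunks(3)))  # 10
```

Every iteration is a short, bounded query, so a crash midway loses at most one chunk of work; restarting from the last committed last_id resumes cleanly.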
Data‑Pipeline Mental Model
When data grows, treat each endpoint as a pipeline:
Database → Filter → Chunk → Stream → Client

Each stage should have bounded memory, be independently optimizable, and fail without cascading effects.
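As a toy illustration of that pipeline, each stage below is a generator, so every stage keeps bounded memory (in-memory stand-ins, not real database code):

```python
from itertools import islice

def db_stage(n):                      # Database: emits raw rows lazily
    for i in range(n):
        yield {"id": i, "status": "ok" if i % 2 == 0 else "error"}

def filter_stage(rows):               # Filter: keep only the error rows
    return (r for r in rows if r["status"] == "error")

def chunk_stage(rows, size):          # Chunk: group rows into bounded batches
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def stream_stage(chunks):             # Stream: serialize one chunk at a time
    for batch in chunks:
        yield "\n".join(str(r["id"]) for r in batch)

# The client consumes the stream; nothing is materialized all at once.
out = list(stream_stage(chunk_stage(filter_stage(db_stage(10)), 2)))
print(out)  # ['1\n3', '5\n7', '9']
```

Because every stage pulls from the previous one on demand, you can tune or replace a single stage (a bigger chunk size, a different filter) without touching the others, which is exactly the property the pipeline model promises.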
Conclusion (Part 1)
FastAPI itself does not crash under large loads; unstructured data handling does. Remember: large data sets require a structured pipeline, not clever shortcuts.
In Part 2 we will explore why async alone isn’t enough, when to move work out of the request, background tasks, parallel processing, and keeping APIs responsive under load.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.