Why FastAPI Slows Down with Millions of Rows—and How to Keep It Fast

FastAPI feels lightning‑fast on small datasets, but returning millions of rows can exhaust memory, block the event loop, and cripple the database; this article explains why that happens and provides concrete design rules—selective fields, pagination, cursor‑based queries, streaming, and chunked processing—to keep APIs stable at scale.

FastAPI’s Speed Illusion

FastAPI appears extremely fast for small to medium data sets, but when the data volume jumps to millions of rows, requests start to hang, worker memory spikes to 100%, and the database slows down for everyone.

Why the Performance Degrades

Async helps with waiting, not with the amount of data. Returning millions of rows forces Python to keep all rows in memory, makes Pydantic serialize every record, blocks the event loop during serialization, and finally sends the huge payload over the network.

Common Mistake: Fetch Everything at Once

@app.get("/events")
async def get_events():
    # Anti-pattern: pulls every row in the table into worker memory,
    # then serializes the entire list before a single byte is sent.
    rows = await database.fetch_all("SELECT * FROM events")
    return rows

This pattern works for tiny data sets but quickly leads to:

RAM exhaustion

Worker process freeze

Timeouts

Downstream service collapse

In large‑scale scenarios the endpoint becomes a denial‑of‑service attack on your own server.

Principle #1 – Never Retrieve Unnecessary Data

The hidden cost of SELECT * includes extra memory, slower serialization, higher network load, and wasted CPU cycles.

SELECT id, status, created_at
FROM events
WHERE created_at >= NOW() - INTERVAL '7 days';

Specifying only the columns you need reduces memory usage, improves cache efficiency, and makes the intent clear for future readers.
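
As a minimal sketch, here is the same query behind an endpoint, assuming the database connection object used above (the route path is illustrative):

@app.get("/events/recent")
async def recent_events():
    # Fetch only the three columns the response needs, not SELECT *.
    query = """
        SELECT id, status, created_at
        FROM events
        WHERE created_at >= NOW() - INTERVAL '7 days'
    """
    return await database.fetch_all(query)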

Principle #2 – Pagination Is a Requirement, Not a Feature

Allowing an endpoint to return an unlimited number of rows means it is already broken. The real question is when it will fail.
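
One way to make the requirement non-negotiable is a hard cap at the framework level. A minimal sketch using FastAPI's Query validation (the cap of 1000 is an arbitrary illustration):

from fastapi import Query

@app.get("/logs")
async def list_logs(limit: int = Query(100, ge=1, le=1000)):
    # Requests asking for more than 1000 rows are rejected with a
    # 422 validation error before any SQL runs.
    rows = await database.fetch_all(
        "SELECT id, message FROM logs ORDER BY id LIMIT :limit",
        {"limit": limit},
    )
    return rows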

Why Offset Pagination Fails Silently

SELECT * FROM logs
ORDER BY id
LIMIT 1000 OFFSET 500000;

Even though the query looks harmless, the database must scan and discard half a million rows before returning the next thousand. The cost grows linearly with the offset, so deep pages cause slow responses and CPU spikes even when the ORDER BY column is indexed.

Cursor‑Based Pagination – The Scalable Default

SELECT * FROM logs
WHERE id > :last_id
ORDER BY id
LIMIT 1000;

This approach leverages indexes efficiently, keeps each page’s cost roughly constant, and prevents performance decay over time.
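
Wired into an endpoint, the pattern looks roughly like this; last_id as the cursor parameter and the response shape are illustrative choices:

from fastapi import Query

@app.get("/logs")
async def list_logs(last_id: int = 0, limit: int = Query(1000, ge=1, le=1000)):
    # Seek past the cursor with an indexed comparison instead of OFFSET.
    rows = await database.fetch_all(
        """
        SELECT id, message FROM logs
        WHERE id > :last_id
        ORDER BY id
        LIMIT :limit
        """,
        {"last_id": last_id, "limit": limit},
    )
    # The client sends back the last id it saw to request the next page.
    next_cursor = rows[-1]["id"] if rows else None
    return {"items": rows, "next_cursor": next_cursor}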

Principle #3 – Stream Large Responses

Returning a massive JSON list is the worst choice for export‑type endpoints. Streaming changes the model:

You don’t wait for all data.

You don’t keep everything in memory.

You can “produce‑while‑sending”.

from fastapi.responses import StreamingResponse

async def stream_rows():
    # Yield one CSV line per row; rows are fetched lazily, so memory
    # stays constant no matter how large the logs table is.
    async for row in database.iterate("SELECT id, message FROM logs"):
        yield f"{row['id']},{row['message']}\n"

@app.get("/export")
async def export_logs():
    return StreamingResponse(
        stream_rows(),
        media_type="text/csv",
        headers={"Content-Disposition": "attachment; filename=logs.csv"}
    )

Streaming uses constant memory, improves first‑byte latency, and is essential for surviving large‑scale data sets.

Principle #4 – Chunk Everything (Even When Streaming)

Databases dislike long‑running queries; they hold locks and can fail under pressure. Process data in bounded chunks.

CHUNK_SIZE = 100_000

async def process_all_events():
    last_id = 0
    while True:
        # Cursor-based chunking: each iteration is a short, index-backed query.
        rows = await database.fetch_all(
            """
            SELECT * FROM events
            WHERE id > :last_id
            ORDER BY id
            LIMIT :limit
            """,
            {"last_id": last_id, "limit": CHUNK_SIZE}
        )
        if not rows:
            break
        process(rows)  # application-specific work on one bounded batch
        last_id = rows[-1]["id"]

Chunking yields short transactions, easier failure recovery, and controllable memory growth, turning dangerous operations into manageable work units.

Data‑Pipeline Mental Model

When data grows, treat each endpoint as a pipeline:

Database → Filter → Chunk → Stream → Client

Each stage should have bounded memory, be independently optimizable, and fail without cascading effects.
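
As an end-to-end sketch, the stages compose directly: the WHERE clause is the filter, the cursor query is the chunk stage, and the generator is the stream stage (the table, the status filter, and the chunk size are all illustrative):

from fastapi.responses import StreamingResponse

CHUNK = 10_000

async def export_pipeline():
    last_id = 0
    while True:
        # Chunk: short, bounded, index-backed reads.
        rows = await database.fetch_all(
            """
            SELECT id, status FROM events
            WHERE id > :last_id AND status = 'error'
            ORDER BY id
            LIMIT :limit
            """,
            {"last_id": last_id, "limit": CHUNK},
        )
        if not rows:
            break
        # Stream: emit each chunk as soon as it is ready.
        for row in rows:
            yield f"{row['id']},{row['status']}\n"
        last_id = rows[-1]["id"]

@app.get("/events/export")
async def export_events():
    return StreamingResponse(export_pipeline(), media_type="text/csv")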

Conclusion (Part 1)

FastAPI itself does not crash under large loads; unstructured data handling does. Remember: large data sets require a structured pipeline, not clever shortcuts.

In Part 2 we will explore why async alone isn’t enough, when to move work out of the request, background tasks, parallel processing, and keeping APIs responsive under load.
