Why Async FastAPI Still Blocks and How to Offload Heavy Work
After fixing unlimited queries and pagination issues, this article reveals why async FastAPI still stalls under load, outlines the hidden bottlenecks in the request lifecycle, and provides practical rules and code examples for offloading heavy work to background workers, ensuring scalability, idempotence, and observability.
The Myth of Async FastAPI
Many believe that because FastAPI is asynchronous it can efficiently handle long‑running tasks. This is only half‑true. Async helps when the code is waiting for external resources such as databases, networks, or disks. It does not help with CPU‑bound work like file compression, large‑scale data transformation, serializing huge objects, or intensive calculations. In those cases the event loop is blocked just like in synchronous code.
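To see the blocking concretely, here is a small self‑contained sketch (no FastAPI required, names are illustrative) that measures how a CPU‑bound coroutine freezes the event loop while a concurrent heartbeat task is starved:

```python
import asyncio
import time

async def cpu_bound() -> int:
    # Pure computation with no await points: the loop cannot switch away.
    return sum(i * i for i in range(3_000_000))

async def heartbeat(ticks: list) -> None:
    # Should tick roughly every 10 ms if the loop is free.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)

async def main() -> float:
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)   # let the heartbeat start
    await cpu_bound()        # blocks every other task until it finishes
    await hb
    # The largest gap between ticks ≈ how long the loop was frozen.
    return max(b - a for a, b in zip(ticks, ticks[1:]))

worst_gap = asyncio.run(main())
```

On a typical machine the worst gap is dominated by the CPU‑bound sum, not the 10 ms sleep — exactly the stall an async FastAPI worker experiences when heavy work runs inside a handler.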
Real Bottleneck: Request Lifecycle
A typical endpoint that does everything in one request illustrates the problem:
```python
@app.post("/generate-report")
async def generate_report():
    data = await fetch_millions_of_rows()
    processed = heavy_transformation(data)
    write_to_file(processed)
    return {"status": "done"}
```

The request performs four steps: retrieve massive data, process it in memory, write it to disk, and finally respond. Under load this leads to request timeouts, workers blocked for seconds or minutes, queued requests, unnecessary autoscaling, and unrecoverable failures. The architecture is fragile, not simply a “slow API”.
Rule #5 – Heavy Work Must Leave the Request Path
The API should accept work, not execute it. A proper model looks like:
```
Client Request → FastAPI → Job Queue / Background Workers → Long‑running Processing → Storage / Callback / Polling
```

This design yields fast API responses, robust job handling, retry mechanisms, observability, and horizontal scalability.
Option 1 – FastAPI BackgroundTasks (Lightweight)
FastAPI provides a built‑in way to run background jobs, suitable for lightweight tasks such as logging, small notifications, sending a single email, or other light I/O.
```python
from fastapi import BackgroundTasks

def write_audit_log(user_id: int):
    with open("audit.log", "a") as f:
        f.write(f"{user_id} accessed report\n")

@app.post("/report")
async def generate_report(background_tasks: BackgroundTasks):
    background_tasks.add_task(write_audit_log, user_id=42)
    return {"status": "processing"}
```

These tasks still run in the same worker process, so they cannot protect the service from memory or CPU exhaustion.
Option 2 – Dedicated Worker Architecture (Production)
For heavy workloads you need a queue, one or more worker processes, and task isolation.
```
FastAPI → Queue → Worker → Storage → Client
```

Example job‑submission endpoint:
```python
import uuid

@app.post("/export")
async def start_export():
    job_id = str(uuid.uuid4())
    await enqueue_export_job(job_id)
    return {"job_id": job_id, "status": "queued"}
```

Clients poll a status endpoint:
```python
@app.get("/export/{job_id}")
async def export_status(job_id: str):
    status = await get_job_status(job_id)
    return {"job_id": job_id, "status": status}
```

This pattern returns immediately, never blocks, and remains responsive under load.
Rule #6 – Concurrency ≠ Parallelism
Async provides concurrency, but CPU‑intensive jobs need true parallelism via multiple processes. If a job involves file compression, writing Parquet files, aggregating millions of records, or machine‑learning inference, you must spawn several processes.
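A sketch of this using the standard library's `ProcessPoolExecutor`, bridged into the event loop with `run_in_executor` — `compress_chunk` is a stand‑in for whatever CPU‑bound work the job actually does:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def compress_chunk(n: int) -> int:
    # Stand-in for real CPU work: compression, aggregation, inference.
    return sum(i * i for i in range(n))

async def run_cpu_jobs(sizes: list) -> list:
    loop = asyncio.get_running_loop()
    # Each job runs in its own process; the event loop only awaits results.
    with ProcessPoolExecutor() as pool:
        futures = [loop.run_in_executor(pool, compress_chunk, n) for n in sizes]
        return await asyncio.gather(*futures)

if __name__ == "__main__":
    print(asyncio.run(run_cpu_jobs([10, 100])))
```

With `max_workers` left to its default the pool sizes itself to the CPU count. A dedicated worker service (Celery, RQ, Dramatiq) is the longer‑term answer, but this pattern already gives true parallelism without freezing the loop.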
Rule #7 – Prevent Worker Starvation
Sharing the same process pool between API handlers and background work leads to high latency, lost requests, and unpredictable timeouts.
Anti‑pattern (single pool handling everything):
```
FastAPI Worker:
  - handle request
  - generate file
  - compress data
  - transform dataset
```

Correct separation:

```
FastAPI Worker     → only handle HTTP requests
Background Worker  → CPU / file / data processing
```

Rule #8 – Make Jobs Idempotent
Failures are inevitable at scale (process crashes, node restarts, partial writes). Jobs must be safely retryable.
```python
def export_job(job_id: str):
    if export_exists(job_id):
        return
    data = fetch_data(job_id)
    write_export(job_id, data)
```

This ensures no duplicates, safe retries, and consistency after restarts.
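The `export_exists` / `write_export` helpers are assumptions. One way to implement them so that a crash mid‑write can never leave a half‑finished export is to write to a temporary file and atomically rename it into place; the `exports/` directory and file names here are illustrative:

```python
import json
import os
import tempfile

EXPORT_DIR = "exports"   # assumed output location for this sketch

def export_path(job_id: str) -> str:
    return os.path.join(EXPORT_DIR, f"{job_id}.json")

def export_exists(job_id: str) -> bool:
    # A completed export is only ever visible under its final name.
    return os.path.exists(export_path(job_id))

def write_export(job_id: str, data: dict) -> None:
    os.makedirs(EXPORT_DIR, exist_ok=True)
    # Write to a temp file, then rename: os.replace is atomic on POSIX,
    # so a crash mid-write never leaves a partial file at the final path.
    fd, tmp = tempfile.mkstemp(dir=EXPORT_DIR)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, export_path(job_id))
    except BaseException:
        os.unlink(tmp)
        raise
```

Because a partially written file never occupies the final path, the `export_exists` check in the retry loop stays trustworthy after any crash.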
Observability
Once work leaves the request path you need to monitor:
- Job queue size
- Job duration
- Failure count per job type
- CPU / memory usage of workers
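A minimal in‑process sketch of such counters — a real setup would export them via a metrics library such as the Prometheus Python client, and `track_job` is an illustrative name:

```python
import time
from collections import Counter

failures = Counter()   # failure count per job type
durations: dict = {}   # last observed duration per job type

def track_job(job_type: str, fn, *args):
    """Run a job function while recording duration and failures."""
    start = time.monotonic()
    try:
        return fn(*args)
    except Exception:
        failures[job_type] += 1
        raise
    finally:
        # Runs on success and failure alike, so duration is always recorded.
        durations[job_type] = time.monotonic() - start
```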
Minimal logging example:
```python
import logging

logger = logging.getLogger(__name__)

def export_job(job_id: str):
    logger.info("Starting export %s", job_id)
    try:
        run_export(job_id)
        logger.info("Completed export %s", job_id)
    except Exception:
        logger.exception("Export failed %s", job_id)
        raise
```

Visibility equals stability in large‑scale systems.
Mental Model: API as Orchestrator, Not Worker
When datasets grow, the API should:
- Accept requests
- Validate input
- Schedule work
- Expose results
It should not process large datasets, write huge files, or perform heavy computation directly.
Recap of Part 2
- Async cannot solve CPU or memory pressure.
- Heavy work must leave the request lifecycle.
- Background workers protect API responsiveness.
- Parallelism requires multiple processes.
- Job systems must be idempotent and observable.
What’s Coming in Part 3
The next article will dive into memory‑control strategies, data formats (JSON, CSV, Parquet), large‑scale streaming pipelines, production monitoring, and diagnosing real performance failures.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.