5 Proven API Performance Hacks to Turn Slow Endpoints into Lightning‑Fast Services

This article shares five practical techniques—cursor‑based pagination, buffered asynchronous logging, server‑side caching, gzip compression, and database connection pooling—that together can shrink API latency from seconds to milliseconds and cut infrastructure costs dramatically.

DevOps Coach
DevOps Coach
DevOps Coach
5 Proven API Performance Hacks to Turn Slow Endpoints into Lightning‑Fast Services

1. Pagination: Stop Overloading Your Database

When a client requests user data, returning all 50,000 rows forces the database to work hard and raises response time to about 8 seconds. The fix is simple: implement cursor‑based pagination.

@app.get("/users")
def get_users():
    return db.query(User).all()  # fetch all data

@app.get("/users")
def get_users(cursor: str = None, limit: int = 50):
    query = db.query(User).limit(limit)
    if cursor:
        query = query.filter(User.id > cursor)
    users = query.all()
    next_cursor = users[-1].id if users else None
    return {"data": users, "next_cursor": next_cursor}

Benchmark on a 100 k‑record set shows average latency dropping from 4.2 s (no pagination) to 0.18 s (50 rows per page), a 23× speed‑up.

2. Asynchronous Logging: Eliminate Request Blocking

Synchronous logging blocks the request thread while writing to disk, creating a bottleneck under high traffic.

// Synchronous logging (blocking)
app.post('/api/orders', (req, res) => {
    logger.info('收到订单'); // block
    processOrder(req.body);
    logger.info('订单已处理'); // block
    res.send({status: 'success'});
});

// Buffered async logging
const logBuffer = [];
const FLUSH_INTERVAL = 5000;
function asyncLog(message) {
    logBuffer.push({timestamp: Date.now(), message});
}
setInterval(() => {
    if (logBuffer.length > 0) {
        fs.appendFile('app.log', logBuffer.join('
'), () => {});
        logBuffer.length = 0;
    }
}, FLUSH_INTERVAL);
app.post('/api/orders', (req, res) => {
    asyncLog('收到订单');
    processOrder(req.body);
    asyncLog('订单已处理');
    res.send({status: 'success'});
});

In tests, throughput rose from 450 req/s (synchronous) to 2,100 req/s (async), a 4.6× improvement.

3. Caching: Your First Line of Defense

When 80 % of requests hit the same data, repeatedly querying the database wastes resources. Adding a simple Redis cache eliminates redundant reads.

import redis
import json
cache = redis.Redis(host='localhost', port=6379, db=0)

@app.get("/products/{product_id}")
def get_product(product_id: int):
    cached = cache.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    product = db.query(Product).filter(Product.id == product_id).first()
    cache.setex(f"product:{product_id}", 3600, json.dumps(product.dict()))
    return product

On a product‑catalog API, average latency fell from 85 ms (no cache) to 3 ms with a 90 % hit rate, and database queries dropped from 150 qps to 15 qps.

4. Compression: Slim Down Your Payloads

Transmitting a 500 KB JSON over 3G can take >2 s. Gzip compression reduces size by ~76 % and cuts transfer time to 0.5 s.

from flask import Flask, request, jsonify
import gzip, json

app = Flask(__name__)

def compress_response(data):
    json_str = json.dumps(data)
    return gzip.compress(json_str.encode('utf-8'))

@app.route('/api/data')
def get_data():
    data = generate_large_dataset()  # returns ~500KB JSON
    if 'gzip' in request.headers.get('Accept-Encoding', ''):
        compressed = compress_response(data)
        response = app.response_class(response=compressed, status=200, mimetype='application/json')
        response.headers['Content-Encoding'] = 'gzip'
        return response
    return jsonify(data)

5. Connection Pooling: Reuse Instead of Re‑creating

Opening a new DB connection per request adds 50‑100 ms overhead. A pool of reusable connections removes that cost.

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    'postgresql://user:pass@localhost/db',
    poolclass=QueuePool,
    pool_size=10,          # keep 10 connections alive
    max_overflow=20,       # allow up to 20 extra connections
    pool_pre_ping=True,    # validate before use
    pool_recycle=3600      # recycle after 1 hour
)

Under 1,000 concurrent requests, average latency dropped from 145 ms (no pool) to 42 ms (pool of 10), a 3.4× gain.

Putting It All Together

These five techniques are standard practice at large‑scale operators. While each introduces trade‑offs—pagination adds contract complexity, async logging can lose messages, caching may serve stale data, compression consumes CPU, and pools need tuning—the performance gains are undeniable. In a recent project, applying all five reduced P95 latency from 3.2 s to 0.4 s and cut infrastructure costs by 40 %.

Start with caching and pagination for the biggest bang‑for‑buck, then progressively adopt async logging, compression, and connection pooling as traffic grows.

performanceCachingPaginationcompressionConnection Pooling
DevOps Coach
Written by

DevOps Coach

Master DevOps precisely and progressively.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.