5 Proven API Performance Hacks to Turn Slow Endpoints into Lightning‑Fast Services
This article shares five practical techniques—cursor‑based pagination, buffered asynchronous logging, server‑side caching, gzip compression, and database connection pooling—that together can shrink API latency from seconds to milliseconds and cut infrastructure costs dramatically.
1. Pagination: Stop Overloading Your Database
When a client requests user data, returning all 50,000 rows forces the database to work hard and raises response time to about 8 seconds. The fix is simple: implement cursor‑based pagination.
```python
# Before: every request pulls the entire table into memory
@app.get("/users")
def get_users():
    return db.query(User).all()  # fetches all 50,000 rows
```

```python
# After: cursor-based pagination returns one page per request
@app.get("/users")
def get_users(cursor: int = None, limit: int = 50):
    query = db.query(User).order_by(User.id)  # a stable sort order keeps the cursor reliable
    if cursor:
        query = query.filter(User.id > cursor)  # resume where the last page ended
    users = query.limit(limit).all()
    next_cursor = users[-1].id if users else None
    return {"data": users, "next_cursor": next_cursor}
```

A benchmark on a 100k-record set shows average latency dropping from 4.2 s (no pagination) to 0.18 s (50 rows per page), a 23× speed-up.
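On the client side, the `next_cursor` contract is consumed by requesting pages until the cursor comes back empty. A minimal sketch, where `fetch_page` is a stand-in for whatever HTTP call hits the `/users` endpoint above:

```python
def fetch_all_users(fetch_page, limit=50):
    """Yield every user by following next_cursor until the pages run out."""
    cursor = None
    while True:
        page = fetch_page(cursor=cursor, limit=limit)
        yield from page["data"]
        cursor = page["next_cursor"]
        if cursor is None:  # an empty page returns no cursor
            break
```

Streaming pages this way keeps client memory flat even when the full result set is large.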
2. Asynchronous Logging: Eliminate Request Blocking
Synchronous logging blocks the request thread while writing to disk, creating a bottleneck under high traffic.
```javascript
// Synchronous logging (blocking)
app.post('/api/orders', (req, res) => {
  logger.info('Order received');   // blocks on disk I/O
  processOrder(req.body);
  logger.info('Order processed');  // blocks again
  res.send({status: 'success'});
});
```

```javascript
// Buffered async logging
const fs = require('fs');

const logBuffer = [];
const FLUSH_INTERVAL = 5000; // flush every 5 seconds

function asyncLog(message) {
  logBuffer.push({timestamp: Date.now(), message}); // returns immediately
}

setInterval(() => {
  if (logBuffer.length > 0) {
    const lines = logBuffer.map((e) => JSON.stringify(e)).join('\n') + '\n';
    logBuffer.length = 0;                      // clear before the write completes
    fs.appendFile('app.log', lines, () => {}); // fire-and-forget write
  }
}, FLUSH_INTERVAL);

app.post('/api/orders', (req, res) => {
  asyncLog('Order received');
  processOrder(req.body);
  asyncLog('Order processed');
  res.send({status: 'success'});
});
```

In tests, throughput rose from 450 req/s (synchronous) to 2,100 req/s (asynchronous), a 4.6× improvement.
3. Caching: Your First Line of Defense
When 80 % of requests hit the same data, repeatedly querying the database wastes resources. Adding a simple Redis cache eliminates redundant reads.
```python
import redis
import json
from fastapi import HTTPException

cache = redis.Redis(host='localhost', port=6379, db=0)

@app.get("/products/{product_id}")
def get_product(product_id: int):
    cached = cache.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)  # cache hit: no database round trip
    product = db.query(Product).filter(Product.id == product_id).first()
    if product is None:
        raise HTTPException(status_code=404, detail="Product not found")
    cache.setex(f"product:{product_id}", 3600, json.dumps(product.dict()))  # expire after 1 hour
    return product
```

On a product-catalog API, average latency fell from 85 ms (no cache) to 3 ms with a 90 % hit rate, and database queries dropped from 150 qps to 15 qps.
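The trade-off is staleness: once a product changes, the cached copy is wrong for up to an hour unless the write path refreshes it. A minimal write-through sketch; the helper and its `db_update` callback are illustrative, not part of the endpoint above:

```python
import json

def refresh_product_cache(cache, db_update, product_id, payload, ttl=3600):
    """Apply the database update, then refresh (or drop) the cached copy."""
    updated = db_update(product_id, payload)  # persist first, so the cache never leads the DB
    key = f"product:{product_id}"
    if updated is None:
        cache.delete(key)  # row is gone: drop the stale entry
    else:
        cache.setex(key, ttl, json.dumps(updated))  # write-through refresh
    return updated
```

Deleting the key instead of rewriting it is the lazier alternative: the next read repopulates the cache, at the cost of one extra miss.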
4. Compression: Slim Down Your Payloads
Transmitting a 500 KB JSON over 3G can take >2 s. Gzip compression reduces size by ~76 % and cuts transfer time to 0.5 s.
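The saving is easy to verify offline before touching the endpoint; a small self-contained check (the exact ratio depends on how repetitive the payload is):

```python
import gzip
import json

# A repetitive JSON payload, roughly the shape a list endpoint returns
data = [{"id": i, "status": "active", "region": "us-east-1"} for i in range(5000)]
raw = json.dumps(data).encode('utf-8')
packed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes, "
      f"{100 * (1 - len(packed) / len(raw)):.0f}% smaller")
```

The endpoint below applies the same `gzip.compress` conditionally, keyed off the client's Accept-Encoding header.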
```python
from flask import Flask, request, jsonify
import gzip, json

app = Flask(__name__)

def compress_response(data):
    json_str = json.dumps(data)
    return gzip.compress(json_str.encode('utf-8'))

@app.route('/api/data')
def get_data():
    data = generate_large_dataset()  # returns ~500KB JSON
    if 'gzip' in request.headers.get('Accept-Encoding', ''):
        compressed = compress_response(data)
        response = app.response_class(response=compressed, status=200,
                                      mimetype='application/json')
        response.headers['Content-Encoding'] = 'gzip'
        return response
    return jsonify(data)  # client didn't ask for gzip: send plain JSON
```

5. Connection Pooling: Reuse Instead of Re‑creating
Opening a new DB connection per request adds 50‑100 ms overhead. A pool of reusable connections removes that cost.
```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    'postgresql://user:pass@localhost/db',
    poolclass=QueuePool,
    pool_size=10,        # keep 10 connections alive
    max_overflow=20,     # allow up to 20 extra connections
    pool_pre_ping=True,  # validate before use
    pool_recycle=3600    # recycle after 1 hour
)
```

Under 1,000 concurrent requests, average latency dropped from 145 ms (no pool) to 42 ms (pool of 10), a 3.4× gain.
Putting It All Together
These five techniques are standard practice at large‑scale operators. While each introduces trade‑offs—pagination adds contract complexity, async logging can lose messages, caching may serve stale data, compression consumes CPU, and pools need tuning—the performance gains are undeniable. In a recent project, applying all five reduced P95 latency from 3.2 s to 0.4 s and cut infrastructure costs by 40 %.
Start with caching and pagination for the biggest bang‑for‑buck, then progressively adopt async logging, compression, and connection pooling as traffic grows.