Optimizing a Python Flask Backend: Reducing Response Time from 37.6 s to 1.47 s with Profiling and Database Refactoring

This article walks through a systematic performance investigation of a Python Flask backend, using Chrome DevTools' Network panel, flame‑graph profiling, and a MySQL query redesign to cut a 37.6‑second page load down to 1.47 seconds, while highlighting practical code‑level and architectural optimizations.

NetEase Game Operations Platform

Background – A widely used internal tool suffered from an extremely slow settings page, taking up to 36 seconds to load, prompting a performance‑optimization effort.

Initial investigation – Chrome’s Network panel revealed that a large share of the latency (≈17.6 s) was spent in the Waiting (TTFB) phase, indicating a server‑side processing bottleneck.

Profiling with flame graphs – A Python + Flask endpoint was profiled, exposing heavy time consumption in functions that fetched CPU‑max values per group (gid) using many threads.
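The profiling step can be reproduced in miniature with the standard library's cProfile and pstats modules (the flame graphs themselves are typically rendered from this same data with gprof2dot). The functions below are hypothetical stand‑ins for the real endpoint, not the platform's actual code:

```python
import cProfile
import io
import pstats
import time

def get_max_cpu(gid):
    """Stand-in for the real per-gid lookup (hypothetical)."""
    time.sleep(0.01)  # simulate a slow remote/database call
    return 42

def settings_view(gids):
    """Stand-in for the slow settings-endpoint body."""
    return {gid: get_max_cpu(gid) for gid in gids}

# Profile one simulated request
profiler = cProfile.Profile()
profiler.enable()
settings_view([str(i) for i in range(20)])
profiler.disable()

# Sort by cumulative time so the hotspot bubbles to the top
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

In the printed table, get_max_cpu dominates cumulative time, which is exactly the kind of signal that led the investigation to the per‑gid CPU‑max fetching.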

First wave of optimization – functional redesign

The original code created and destroyed a thread for each gid, leading to high overhead. The redesign removed the default loading of CPU‑max values, making the data load user‑initiated and eliminating the multithreaded implementation.

def get_max_cpus(project_code, gids):
    """..."""
    threads = []
    max_cpus = {}
    # ... (elided setup from the original)
    for gid in gids:
        # One short-lived thread is created (and destroyed) per gid
        t = Thread(target=get_max_cpu, args=(...))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return max_cpus

Issues identified:

Thread creation/destruction per request is costly; a thread pool would be preferable.

The CPU‑max value is a historical maximum, not a real‑time metric, so loading it eagerly adds little value.

Solutions applied:

Make the CPU‑max loading optional, triggered by user interaction.

Remove the per‑gid threading entirely.
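For cases where the eager load could not simply be dropped, the thread pool mentioned above is the cheaper concurrency model: workers are reused across gids instead of being created and destroyed per task. A minimal sketch with concurrent.futures, where get_max_cpu is a hypothetical stand‑in for the real lookup:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_max_cpu(project_code, gid):
    """Stand-in for the real per-gid query (hypothetical signature)."""
    return len(project_code) + len(gid)  # placeholder value

def get_max_cpus(project_code, gids, max_workers=8):
    """Fetch per-gid CPU maxima with a reusable worker pool
    instead of one short-lived Thread per gid."""
    max_cpus = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all lookups; the pool caps concurrency at max_workers
        futures = {pool.submit(get_max_cpu, project_code, gid): gid
                   for gid in gids}
        for future in as_completed(futures):
            max_cpus[futures[future]] = future.result()
    return max_cpus

result = get_max_cpus("demo", ["g1", "g2", "g3"])
```

The pool also bounds concurrency, so a request with hundreds of gids no longer spawns hundreds of simultaneous threads.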

The post‑optimization flame graph showed a much healthier distribution of time.

Second wave of optimization – MySQL query refactor

Profiling highlighted utils.py:get_group_profile_settings as a hotspot due to repeated ORM queries inside a loop over gids.

def get_group_profile_settings(project_code, gids):
    ProfileSetting = unpurview(sandman.endpoint_class('profile_settings'))
    session = get_postman_session()
    profile_settings = {}
    for gid in gids:
        compound_name = project_code + ':' + gid
        # One separate SQL query per gid (the classic N+1 pattern)
        result = session.query(ProfileSetting).filter(ProfileSetting.name == compound_name).first()
        ...
    return profile_settings

Problems:

No batch query – each gid triggers a separate SQL request.

ORM objects are recreated repeatedly, adding overhead.

Frequent attribute access via getattr inside the loop.
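The first problem, the N+1 query pattern, can be reproduced in miniature with SQLite standing in for MySQL. The table name and columns below are illustrative, not the platform's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile_settings (name TEXT, interval INTEGER)")
gids = ["g1", "g2", "g3"]
conn.executemany("INSERT INTO profile_settings VALUES (?, ?)",
                 [(f"demo:{g}", 60) for g in gids])

# N+1 pattern: one round trip per gid
per_gid = {g: conn.execute(
    "SELECT interval FROM profile_settings WHERE name = ?",
    (f"demo:{g}",)).fetchone()[0] for g in gids}

# Batched pattern: a single IN (...) query covering every gid
placeholders = ",".join("?" for _ in gids)
rows = conn.execute(
    f"SELECT name, interval FROM profile_settings "
    f"WHERE name IN ({placeholders})",
    [f"demo:{g}" for g in gids]).fetchall()
batched = {name.split(":")[1]: interval for name, interval in rows}

assert per_gid == batched  # same data, one round trip instead of N
```

Both approaches return identical data; the batched form simply replaces N round trips with one, which is what the refactor below does via SQLAlchemy's in_ filter.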

Refactor applied:

def get_group_profile_settings(project_code, gids):
    ProfileSetting = unpurview(sandman.endpoint_class('profile_settings'))
    session = get_postman_session()
    # Batch query all needed rows at once
    query_results = session.query(ProfileSetting).filter(
        ProfileSetting.name.in_([
            f"{project_code}:{gid}" for gid in gids
        ])
    ).all()
    profile_settings = {}
    for result in query_results:
        if not result:
            continue
        result = result.as_dict()
        gid = result['name'].split(':')[1]
        profile_settings[gid] = {
            'tag_indexes': result.get('tag_indexes'),
            'interval': result['interval'],
            'status': result['status'],
            'profile_machines': result['profile_machines'],
            'thread_settings': result['thread_settings'],
        }
    return profile_settings

After this change, the flame graph showed a dramatic reduction in database‑related hot spots.

Optimization results – The same API endpoint’s response time dropped from 37.6 seconds to 1.47 seconds.

Takeaways

Eliminate unnecessary features when possible; removing the eager CPU‑max load gave the biggest win.

Reduce the frequency and complexity of expensive operations (e.g., batch database queries, thread pools).

Use profiling tools (cProfile + gprof2dot for Python, pprof for Go) to locate true bottlenecks before optimizing.

Further gains can be achieved by front‑end rendering tricks, deeper code refactoring, or even rewriting performance‑critical parts in a faster language.

Overall, a disciplined, data‑driven approach to performance tuning—starting with measurement, then targeted code and design changes—delivered a >25× speedup with relatively low development effort.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

NetEase Game Operations Platform

The NetEase Game Automated Operations Platform delivers stable services for thousands of NetEase titles, focusing on efficient ops workflows, intelligent monitoring, and virtualization.