How Refactoring a Decade‑Old Query Engine Cut Code by 80% and Boosted Performance

After taking over a 10‑year‑old query‑understanding module, the team cut its codebase by 80% by systematically eliminating classic code smells. The refactor also improved iteration speed, stability, observability, and memory usage, and made the module deployable on both the company’s self‑built cloud and on‑premise data centers.

ITPUB

Background

Following an organizational restructuring, our team inherited three low‑level search‑chain modules, including the Query Optimizer (QO), which handles tokenization, term weighting, relevance scoring, and intent detection. The original code, written over a decade ago, suffered from poor performance, long startup times, high memory consumption, and limited observability.

Why Refactor

Iteration efficiency was low – adding a simple operator required three person‑days.

Stability was poor – occasional P99 latency spikes.

Startup time was excessive – the service needed 18 minutes to launch.

Memory usage was high – a single process consumed 114 GB.

Observability was missing – there were no monitoring or tracing tools.

The compiler was outdated – GCC 4.8 prevented the use of modern C++ features.

Deployment was constrained – the module could not run on the company’s self‑built cloud platform.

These pain points motivated a three‑month “big cleanup” effort to redesign and refactor the module.

Code Smells and Fixes

1. Duplicate Code

Two conversion functions for GBK↔UTF‑8 differed only in argument order, leading to duplicated logic.

Motivation: developers copy‑pasted existing code (Ctrl+C, Ctrl+V) to save time.

Prevention/Rescue: use CodeCC to detect duplicates, extract common logic into shared utilities, increase unit‑test coverage, and enforce reuse policies.

After the refactor, the duplicated functions were merged into a single reusable component.
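A minimal sketch of what such a merged component could look like, assuming the original conversions were built on POSIX `iconv` (the article does not say). The names `ConvertEncoding`, `GbkToUtf8`, and `Utf8ToGbk` are hypothetical. The direction becomes a pair of arguments to one shared routine, so the two public entry points shrink to one‑line wrappers.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical shared helper: converts `input` from encoding `from` to `to`.
// Assumes a stateless target encoding (true for GBK and UTF-8), so no final
// flush call is issued.
std::string ConvertEncoding(const std::string& input, const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note: iconv_open takes the destination first
    if (cd == (iconv_t)-1) throw std::runtime_error("unsupported conversion");

    std::string output;
    char chunk[256];
    char* in = const_cast<char*>(input.data());  // iconv does not modify the input
    size_t in_left = input.size();
    while (in_left > 0) {
        char* out = chunk;
        size_t out_left = sizeof(chunk);
        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        int err = errno;  // capture before any other call can overwrite it
        output.append(chunk, sizeof(chunk) - out_left);
        if (rc == (size_t)-1 && err != E2BIG) {  // E2BIG only means "chunk is full"
            iconv_close(cd);
            throw std::runtime_error("invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return output;
}

// The two former copies become thin wrappers around the shared routine.
std::string GbkToUtf8(const std::string& s) { return ConvertEncoding(s, "GBK", "UTF-8"); }
std::string Utf8ToGbk(const std::string& s) { return ConvertEncoding(s, "UTF-8", "GBK"); }
```

Keeping the encoding names in the wrappers means the easy‑to‑swap argument order of `iconv_open` is written once, and unit tests only need to cover one conversion loop.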

2. Long Functions

One function spanned 1,380 lines (including 300 lines of comments), making it unreadable.

Motivation: fear of breaking existing logic caused developers to avoid refactoring.

Prevention/Rescue: keep functions small, split them by responsibility, add comprehensive tests, and use feature flags instead of commenting out code.

After the refactor, the function was broken into smaller, well‑named units, reducing it to under 200 lines.
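The splitting and feature‑flag advice can be sketched as follows. This is a toy illustration, not the module's real pipeline: `QueryFlags`, `NormalizeQuery`, `SplitTerms`, `DropStopwords`, and `AnalyzeQuery` are hypothetical names, and the stopword list is invented for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical runtime switches; they replace blocks that would otherwise be commented out.
struct QueryFlags {
    bool drop_stopwords = false;
};

// Step 1: lowercase the raw query.
std::string NormalizeQuery(std::string query) {
    std::transform(query.begin(), query.end(), query.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return query;
}

// Step 2: split on whitespace.
std::vector<std::string> SplitTerms(const std::string& query) {
    std::istringstream stream(query);
    std::vector<std::string> terms;
    for (std::string term; stream >> term;) terms.push_back(term);
    return terms;
}

// Step 3 (optional): remove terms found in a toy stopword list.
std::vector<std::string> DropStopwords(std::vector<std::string> terms) {
    static const std::vector<std::string> kStopwords = {"a", "the", "of"};
    auto is_stopword = [](const std::string& term) {
        return std::find(kStopwords.begin(), kStopwords.end(), term) != kStopwords.end();
    };
    terms.erase(std::remove_if(terms.begin(), terms.end(), is_stopword), terms.end());
    return terms;
}

// The former monolith shrinks to a readable outline of its steps.
std::vector<std::string> AnalyzeQuery(const std::string& raw, const QueryFlags& flags) {
    std::vector<std::string> terms = SplitTerms(NormalizeQuery(raw));
    if (flags.drop_stopwords) terms = DropStopwords(std::move(terms));
    return terms;
}
```

Each step can now be unit‑tested in isolation, and an experimental step stays in the codebase behind its flag rather than as a commented‑out block.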

3. Bloated Classes

A request‑handling class bundled HTTP service instances, cache objects, and the logic for dozens of strategies, becoming a “god class”.

Motivation: the class started with a single strategy and grew as new features were added, without anyone reconsidering class boundaries.

Prevention/Rescue: clearly document class responsibilities, split large classes into smaller ones, and delegate sub‑tasks to dedicated helper classes.

After the refactor, the responsibilities were distributed across multiple focused classes, improving readability and testability.
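One possible shape for such a split, shown as a sketch with hypothetical classes (`ResultCache`, `QueryStrategy`, `TagStrategy`, `RequestHandler`) that do not come from the original module. The handler only coordinates; caching and each strategy live in their own small classes.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache with a single responsibility: remember finished results.
class ResultCache {
public:
    bool Lookup(const std::string& query, std::string* result) const {
        auto it = entries_.find(query);
        if (it == entries_.end()) return false;
        *result = it->second;
        return true;
    }
    void Store(const std::string& query, const std::string& result) { entries_[query] = result; }

private:
    std::unordered_map<std::string, std::string> entries_;
};

// Each strategy lives in its own class behind a small interface.
class QueryStrategy {
public:
    virtual ~QueryStrategy() = default;
    virtual std::string Apply(const std::string& query) const = 0;
};

// Toy strategy for illustration: appends a fixed tag to the query.
class TagStrategy : public QueryStrategy {
public:
    explicit TagStrategy(std::string tag) : tag_(std::move(tag)) {}
    std::string Apply(const std::string& query) const override { return query + tag_; }

private:
    std::string tag_;
};

// The handler delegates to its helpers instead of containing their logic.
class RequestHandler {
public:
    void AddStrategy(std::unique_ptr<QueryStrategy> strategy) {
        strategies_.push_back(std::move(strategy));
    }
    std::string Handle(const std::string& query) {
        std::string result;
        if (cache_.Lookup(query, &result)) return result;  // served from cache
        result = query;
        for (const auto& strategy : strategies_) result = strategy->Apply(result);
        cache_.Store(query, result);
        return result;
    }

private:
    ResultCache cache_;
    std::vector<std::unique_ptr<QueryStrategy>> strategies_;
};
```

With this layout, adding a strategy means adding one class and one `AddStrategy` call, and each piece can be tested without constructing the others.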

4. Long Parameter Lists

A method accepted 56 parameters, making calls error‑prone.

Motivation: developers copied existing parameter structures for convenience.

Prevention/Rescue: group related parameters into configuration objects, increase unit‑test coverage, and prefer passing domain objects over long primitive lists.

The refactored code passes a single config object instead of dozens of individual arguments.
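A small sketch of the parameter‑object idea. The struct and field names (`TokenizerOptions`, `RankingOptions`, `QueryConfig`, `ScoreDocument`) are assumptions made for illustration; the article does not list the original 56 parameters.

```cpp
#include <cstdint>
#include <string>

// Hypothetical option groups; related settings travel together.
struct TokenizerOptions {
    bool keep_punctuation = false;
    std::uint32_t max_tokens = 64;
};

struct RankingOptions {
    double title_weight = 1.0;
    double body_weight = 0.5;
};

// One object replaces a long list of loose primitives.
struct QueryConfig {
    std::string locale = "zh_CN";
    TokenizerOptions tokenizer;
    RankingOptions ranking;
};

// Callers override only the fields they care about; defaults cover the rest.
double ScoreDocument(double title_hits, double body_hits, const QueryConfig& config) {
    return title_hits * config.ranking.title_weight +
           body_hits * config.ranking.body_weight;
}
```

Named fields also remove the risk of swapping two adjacent arguments of the same type, which is what makes very long call signatures error‑prone.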

5. Confusing Temporary Fields

Variables like is_second were introduced without clear meaning, obscuring intent.

Motivation: quick fixes without proper naming.

Prevention/Rescue: give variables expressive names, eliminate unnecessary temporaries, and consolidate related logic into functions.

After the cleanup, the code uses self‑explanatory names and fewer temporary variables.

6. Overly Broad Parameter Ranges

Large structs (e.g., worker and proc_node) were passed through many functions, even though many of those functions needed only a subset of the fields.

Motivation: it was easier to pass the whole struct than to work out exactly what each function depended on.

Prevention/Rescue: apply the “least knowledge” principle, pass only the required data, and consider creating smaller data transfer objects (DTOs).

The refactor resulted in leaner interfaces that take only the parameters they need.
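The narrowing can be sketched like this. `Worker` here is a stand‑in with invented fields, since the article does not show the real `worker` layout, and `HasTooManyTokens` is a hypothetical function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a large legacy struct; all fields are illustrative.
struct Worker {
    std::string query;
    std::vector<std::string> tokens;
    std::vector<double> term_weights;
    int shard_id = 0;
};

// Before (for comparison): bool HasTooManyTokens(const Worker& worker);
// That signature hides which of the struct's fields the function reads.

// After: the signature states exactly what the function depends on.
bool HasTooManyTokens(const std::vector<std::string>& tokens, std::size_t limit) {
    return tokens.size() > limit;
}
```

The narrow version can be tested with a plain vector, and changes to unrelated `Worker` fields no longer touch it.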

7. Unnecessary Serial Execution

Two tokenization steps (with and without punctuation) were performed sequentially despite being independent.

Motivation: belief that request‑level parallelism was sufficient.

Prevention/Rescue: use a DAG scheduler to run independent sub‑tasks in parallel.

Parallelizing these steps reduced the main‑flow latency from 13.19 ms to 9.71 ms (a reduction of about 26%).
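The team used a DAG scheduler, which the article does not show. As a much simpler stand‑in for the same idea, the sketch below runs two independent tokenizations concurrently with `std::async`. `Tokenize` is a toy whitespace tokenizer and `TokenizeBothWays` is a hypothetical name.

```cpp
#include <cctype>
#include <future>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Tokens = std::vector<std::string>;

// Toy tokenizer: splits on whitespace, optionally replacing punctuation with spaces first.
Tokens Tokenize(std::string text, bool keep_punctuation) {
    if (!keep_punctuation) {
        for (char& c : text) {
            if (std::ispunct(static_cast<unsigned char>(c))) c = ' ';
        }
    }
    std::istringstream stream(text);
    Tokens tokens;
    for (std::string token; stream >> token;) tokens.push_back(token);
    return tokens;
}

// Runs the two independent tokenizations concurrently and joins the results.
// first = tokens with punctuation kept, second = tokens with punctuation removed.
std::pair<Tokens, Tokens> TokenizeBothWays(const std::string& query) {
    std::future<Tokens> with_punct =
        std::async(std::launch::async, Tokenize, query, true);  // separate thread
    Tokens without_punct = Tokenize(query, false);              // calling thread
    return {with_punct.get(), std::move(without_punct)};
}
```

Because neither call reads the other's output, the combined latency approaches that of the slower step rather than the sum of both. A DAG scheduler generalizes this by deriving such independent groups from declared dependencies.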

8. Ignored Compilation Warnings

Upgrading GCC exposed missing return statements and buffer overflows caused by unsafe sprintf usage.

Motivation: warnings were treated as harmless.

Prevention/Rescue: enable -Wall so that common warnings are reported and -Werror so that they fail the build, enforce strict compilation, and adopt safer string‑handling functions.

After these fixes, the code compiled cleanly with a modern GCC and exhibited no runtime crashes.
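A sketch of the two kinds of fix, using hypothetical helpers (`FormatTermWeight`, `CompareWeights`) rather than the module's real code. It swaps `sprintf` for the bounded `snprintf` and makes every control path return a value, which GCC checks through `-Wreturn-type` (part of `-Wall`).

```cpp
#include <cstddef>
#include <cstdio>

// Formats "term:weight" into a caller-supplied buffer.
// Returns true only if the whole string fit, so truncation is never silent.
bool FormatTermWeight(char* buffer, std::size_t size, const char* term, double weight) {
    // snprintf writes at most `size` bytes including the terminating NUL,
    // whereas sprintf has no bound and can run past the end of `buffer`.
    int written = std::snprintf(buffer, size, "%s:%.2f", term, weight);
    return written >= 0 && static_cast<std::size_t>(written) < size;
}

// Every path returns a value; dropping the final return would trigger
// -Wreturn-type and, with -Werror, fail the build.
int CompareWeights(double lhs, double rhs) {
    if (lhs < rhs) return -1;
    if (lhs > rhs) return 1;
    return 0;
}
```

Returning a success flag from the formatter pushes the truncation decision to the caller, which is a design choice of this sketch and not something the article prescribes.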

Overall Impact

The systematic removal of code smells cut the code volume by 80%, halved the service startup time, reduced memory consumption from 114 GB to a manageable level, and eliminated long‑standing stability issues. The refactored module now supports deployment on both the company’s self‑built cloud and traditional data‑center environments.

Conclusion

The smells we encountered, their motivations, and the concrete remediation steps show that disciplined refactoring can dramatically improve the performance, maintainability, and reliability of legacy backend systems.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
