Databases 15 min read

How to Refactor Complex SQL Like Java Code: A Real‑World Case Study

This article demonstrates how to refactor a complex, production‑level SQL query used in a routing system by formatting, decomposing into layers, merging temporary tables, pushing predicates, optimizing joins, and validating performance, ultimately reducing nesting from four levels to one and cutting execution time from 4.75 s to 0.6 s.

JD Cloud Developers

May 21, 2024

How to Refactor Complex SQL Like Java Code: A Real‑World Case Study

Introduction

Software evolves, adding features and fixing bugs, which leads to code growth and increased complexity. Without proper management, technical debt accumulates, causing “code rot”. Although SQL itself is a stable language, database applications written with SQL can suffer similar “SQL rot” over time.

The Platform Technology team focuses on stability; slow SQL is a core metric. When governance reaches a deep‑water stage, “SQL rot” appears as complex queries that need refactoring. This article uses a real case to show how to refactor SQL the way Java code is refactored.

JDL Routing System Complex SQL Governance Case

Routing planning ensures customer experience by designing operation times for each node in a logistics network, linking nodes to guarantee end‑to‑end connectivity and optimal overall performance. It is essentially an operations‑research optimization problem built on a network of routes.

1. Problem SQL

SELECT COUNT(*) total_count FROM (
    SELECT *
    FROM (
        SELECT *
        FROM (
            SELECT ...
        )
    )
) t_total;

This production‑level complex SQL was identified by slow‑SQL metrics. It is both poorly performing and hard to read, indicating low maintainability.

2. Begin Governance

step1. Formatting

For engineers, formatting is essential for readability before refactoring.

SELECT COUNT(*) total_count FROM (
    SELECT * FROM (
        SELECT a.line_store_goods_id AS line_resource_id, a.group_num, a.approval_erp, ...
        FROM join_table
    )
) t_total;

After formatting, the purpose of the query becomes clear: count lines that satisfy certain conditions.

Note: Formatting can be applied at any stage.

step2. Layered Decomposition

level0 SELECT COUNT(*) total_count FROM t_total; level1 – t_total

SELECT * FROM temp WHERE start_node_code = '311F001' AND disable_time > '2023-11-15 00:00:00' AND enable_time < disable_time;

level2 – temp

SELECT a.line_store_goods_id AS line_resource_id, a.group_num, a.approval_erp, a.approval_person, a.approval_status, a.approval_time, a.approval_remark, a.master_slave, a.parent_line_code, a.remarks, a.operator_time, a.same_stowage, b.start_org_id, b.start_org_name, b.start_province_id, b.start_province_name, b.start_city_id, b.start_city_name, b.start_node_code, b.start_node_name, b.end_org_id, b.end_org_name, b.end_province_id, b.end_province_name, b.end_city_id, b.end_city_name, b.end_node_code, b.end_node_name, b.depart_time, b.arrive_time, b.depart_wave_code, b.arrive_wave_code, b.enable_time, b.disable_time, a.store_enable_time, a.store_disable_time, a.update_name AS operator_name, b.line_code, b.line_type, b.transport_type,
    IF(a.store_enable_time > b.enable_time, IF(a.store_enable_time > c.enable_time, a.store_enable_time, c.enable_time), IF(b.enable_time > c.enable_time, b.enable_time, c.enable_time)) AS insect_start_time,
    IF(a.store_disable_time < b.disable_time, IF(a.store_disable_time < c.disable_time, a.store_disable_time, c.disable_time), IF(b.disable_time < c.disable_time, b.disable_time, c.disable_time)) AS insect_end_time
FROM join_table;

level3 – join_table

(SELECT * FROM line_store_goods WHERE yn = 1 AND master_slave = 1) a
JOIN (
    SELECT start_org_id, start_org_name, start_province_id, start_province_name, start_city_id, start_city_name, start_node_code, start_node_name, end_org_id, end_org_name, end_province_id, end_province_name, end_city_id, end_city_name, end_node_code, end_node_name, depart_time, arrive_time, depart_wave_code, arrive_wave_code, line_code, line_type, transport_type, MIN(enable_time) AS enable_time, MAX(disable_time) AS disable_time
    FROM line_resource
    WHERE line_code IN (SELECT line_code FROM line_store_goods WHERE yn = 1) AND yn = 1
    GROUP BY line_code
) b ON a.line_code = b.line_code AND a.start_node_code = b.start_node_code
JOIN (
    SELECT line_code, start_node_code, MIN(enable_time) AS enable_time, MAX(disable_time) AS disable_time
    FROM line_resource
    WHERE yn = 1
    GROUP BY line_code
) c ON a.parent_line_code = c.line_code AND a.start_node_code = c.start_node_code;

step3. Refactoring

Java developers know “Refactoring” from Martin Fowler’s book; the same principles apply to SQL, with tools to implement them.

Toolset:

Merge layers – reduce temporary tables

Push predicates – reduce rows scanned and temporary‑table size

Join optimization – reduce rows scanned and temporary‑table size

Remove subqueries – reduce temporary tables

Convert subqueries ↔ joins – reduce rows scanned

Refactor 1 – Layer Merge

level0 & level1

Two equivalent queries; the second avoids an extra subquery and performs better.

SELECT COUNT(*) total_count FROM (SELECT * FROM temp WHERE a = "1");

SELECT COUNT(*) FROM temp WHERE a = "1";

Benefits: less compute overhead, lower memory usage, reduced disk I/O.

Refactor 2 – Predicate Pushdown

Push start_node_code = '311F001' down to level4

Moving the filter earlier reduces the size of intermediate results and saves time and space.

Refactor 3 – Join Optimization

Formatting to personal preference

Predicate pushdown

Remove redundant fields

Delete invalid conditions (e.g., start_node_code in the JOIN ON clause after pushdown) and unnecessary subqueries, then merge duplicate joins.

The optimized join becomes a simple two‑table join with clear readability.

Step4. Theoretical Validation

SELECT COUNT(*) FROM (
  (SELECT line_code FROM line_store_goods WHERE yn = 1 AND parent_line_code = line_code AND master_slave = 1 AND start_node_code = '311F001') a
  JOIN (SELECT line_code, MIN(enable_time) AS enable_time, MAX(disable_time) AS disable_time FROM line_resource WHERE yn = 1 AND start_node_code = '311F001' GROUP BY line_code) b
  ON a.line_code = b.line_code
) WHERE disable_time > '2023-11-15 00:00:00' AND enable_time < disable_time;

The refactored SQL is readable and its business intent easy to infer; however, production SQL may still be incorrect despite being usable.

step5. Index Optimization

Refer to extensive index‑optimization articles for details.

step6. Result Testing

As with code refactoring, functional and performance testing are required after SQL changes.

3. Effect Comparison

Nested levels reduced from 4 to 1

Number of joins reduced from 3 to 2

Subqueries reduced from 7 to 2

Execution time reduced from 4.75 s to 0.6 s

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance Optimization SQL database JOIN refactoring Slow Query

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.