How to Refactor Complex SQL Like Java Code: A Real‑World Case Study
This article demonstrates how to refactor a complex, production‑level SQL query used in a routing system by formatting, decomposing into layers, merging temporary tables, pushing predicates, optimizing joins, and validating performance, ultimately reducing nesting from four levels to one and cutting execution time from 4.75 s to 0.6 s.
Introduction
Software evolves, adding features and fixing bugs, which leads to code growth and increased complexity. Without proper management, technical debt accumulates, causing “code rot”. Although SQL itself is a stable language, database applications written with SQL can suffer similar “SQL rot” over time.
The Platform Technology team focuses on stability; slow SQL is a core metric. When governance reaches a deep‑water stage, “SQL rot” appears as complex queries that need refactoring. This article uses a real case to show how to refactor SQL the way Java code is refactored.
JDL Routing System Complex SQL Governance Case
Routing planning ensures customer experience by designing operation times for each node in a logistics network, linking nodes to guarantee end‑to‑end connectivity and optimal overall performance. It is essentially an operations‑research optimization problem built on a network of routes.
1. Problem SQL
SELECT COUNT(*) total_count FROM (
SELECT *
FROM (
SELECT *
FROM (
SELECT ...
)
)
) t_total;This production‑level complex SQL was identified by slow‑SQL metrics. It is both poorly performing and hard to read, indicating low maintainability.
2. Begin Governance
step1. Formatting
For engineers, formatting is essential for readability before refactoring.
SELECT COUNT(*) total_count FROM (
SELECT * FROM (
SELECT a.line_store_goods_id AS line_resource_id, a.group_num, a.approval_erp, ...
FROM join_table
)
) t_total;After formatting, the purpose of the query becomes clear: count lines that satisfy certain conditions.
Note: Formatting can be applied at any stage.
step2. Layered Decomposition
level0 SELECT COUNT(*) total_count FROM t_total; level1 – t_total
SELECT * FROM temp WHERE start_node_code = '311F001' AND disable_time > '2023-11-15 00:00:00' AND enable_time < disable_time;level2 – temp
SELECT a.line_store_goods_id AS line_resource_id, a.group_num, a.approval_erp, a.approval_person, a.approval_status, a.approval_time, a.approval_remark, a.master_slave, a.parent_line_code, a.remarks, a.operator_time, a.same_stowage, b.start_org_id, b.start_org_name, b.start_province_id, b.start_province_name, b.start_city_id, b.start_city_name, b.start_node_code, b.start_node_name, b.end_org_id, b.end_org_name, b.end_province_id, b.end_province_name, b.end_city_id, b.end_city_name, b.end_node_code, b.end_node_name, b.depart_time, b.arrive_time, b.depart_wave_code, b.arrive_wave_code, b.enable_time, b.disable_time, a.store_enable_time, a.store_disable_time, a.update_name AS operator_name, b.line_code, b.line_type, b.transport_type,
IF(a.store_enable_time > b.enable_time, IF(a.store_enable_time > c.enable_time, a.store_enable_time, c.enable_time), IF(b.enable_time > c.enable_time, b.enable_time, c.enable_time)) AS insect_start_time,
IF(a.store_disable_time < b.disable_time, IF(a.store_disable_time < c.disable_time, a.store_disable_time, c.disable_time), IF(b.disable_time < c.disable_time, b.disable_time, c.disable_time)) AS insect_end_time
FROM join_table;level3 – join_table
(SELECT * FROM line_store_goods WHERE yn = 1 AND master_slave = 1) a
JOIN (
SELECT start_org_id, start_org_name, start_province_id, start_province_name, start_city_id, start_city_name, start_node_code, start_node_name, end_org_id, end_org_name, end_province_id, end_province_name, end_city_id, end_city_name, end_node_code, end_node_name, depart_time, arrive_time, depart_wave_code, arrive_wave_code, line_code, line_type, transport_type, MIN(enable_time) AS enable_time, MAX(disable_time) AS disable_time
FROM line_resource
WHERE line_code IN (SELECT line_code FROM line_store_goods WHERE yn = 1) AND yn = 1
GROUP BY line_code
) b ON a.line_code = b.line_code AND a.start_node_code = b.start_node_code
JOIN (
SELECT line_code, start_node_code, MIN(enable_time) AS enable_time, MAX(disable_time) AS disable_time
FROM line_resource
WHERE yn = 1
GROUP BY line_code
) c ON a.parent_line_code = c.line_code AND a.start_node_code = c.start_node_code;step3. Refactoring
Java developers know “Refactoring” from Martin Fowler’s book; the same principles apply to SQL, with tools to implement them.
Toolset:
Merge layers – reduce temporary tables
Push predicates – reduce rows scanned and temporary‑table size
Join optimization – reduce rows scanned and temporary‑table size
Remove subqueries – reduce temporary tables
Convert subqueries ↔ joins – reduce rows scanned
Refactor 1 – Layer Merge
level0 & level1
Two equivalent queries; the second avoids an extra subquery and performs better.
SELECT COUNT(*) total_count FROM (SELECT * FROM temp WHERE a = "1"); SELECT COUNT(*) FROM temp WHERE a = "1";Benefits: less compute overhead, lower memory usage, reduced disk I/O.
Refactor 2 – Predicate Pushdown
Push start_node_code = '311F001' down to level4
Moving the filter earlier reduces the size of intermediate results and saves time and space.
Refactor 3 – Join Optimization
Formatting to personal preference
Predicate pushdown
Remove redundant fields
Delete invalid conditions (e.g., start_node_code in the JOIN ON clause after pushdown) and unnecessary subqueries, then merge duplicate joins.
The optimized join becomes a simple two‑table join with clear readability.
Step4. Theoretical Validation
SELECT COUNT(*) FROM (
(SELECT line_code FROM line_store_goods WHERE yn = 1 AND parent_line_code = line_code AND master_slave = 1 AND start_node_code = '311F001') a
JOIN (SELECT line_code, MIN(enable_time) AS enable_time, MAX(disable_time) AS disable_time FROM line_resource WHERE yn = 1 AND start_node_code = '311F001' GROUP BY line_code) b
ON a.line_code = b.line_code
) WHERE disable_time > '2023-11-15 00:00:00' AND enable_time < disable_time;The refactored SQL is readable and its business intent easy to infer; however, production SQL may still be incorrect despite being usable.
step5. Index Optimization
Refer to extensive index‑optimization articles for details.
step6. Result Testing
As with code refactoring, functional and performance testing are required after SQL changes.
3. Effect Comparison
Nested levels reduced from 4 to 1
Number of joins reduced from 3 to 2
Subqueries reduced from 7 to 2
Execution time reduced from 4.75 s to 0.6 s
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
