Taming a Million‑Row Log Table: Real‑World SQL Performance Optimization
This case study describes how a rapidly growing edit-log table pushed query times past 30 seconds. It walks through the investigation step by step: identifying a custom-function bottleneck, analyzing the data volume, and finally introducing partitioning, a mandatory time filter, and a composite index to restore acceptable performance.
Background
The "edit log query" feature initially handled a small amount of data without issue, but as batch edits increased, daily log increments reached about 1 million rows, growing to 60 million rows within two months. Performance concerns surfaced when a user reported query times exceeding 30 seconds.
Initial Investigation
The author habitually reads the SQL itself before examining the execution plan. In this query, a custom function, TimeZone_Date_Translator, stood out as a likely bottleneck. Removing it dramatically reduced execution time, which confirmed the suspicion.
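The source does not say where in the query the function appeared. One common reason a function like this hurts is that it wraps an indexed column in the WHERE clause, so the database has to evaluate it for every row instead of seeking through an index. The sketch below illustrates that effect with Python's sqlite3 module. The table layout, the tz_shift function (a hypothetical stand-in for TimeZone_Date_Translator), and the fixed 8-hour offset are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical stand-in for TimeZone_Date_Translator: shifts an epoch-seconds
# timestamp by a fixed offset. The real function's logic is not in the source.
OFFSET_SECONDS = 8 * 3600

def tz_shift(epoch_seconds):
    return epoch_seconds + OFFSET_SECONDS

conn = sqlite3.connect(":memory:")
conn.create_function("tz_shift", 1, tz_shift)
conn.execute(
    "CREATE TABLE rp_plan_log_t ("
    "id INTEGER PRIMARY KEY, operate_time INTEGER, detail TEXT)"
)
conn.execute("CREATE INDEX idx_log_time ON rp_plan_log_t (operate_time)")
conn.executemany(
    "INSERT INTO rp_plan_log_t (operate_time, detail) VALUES (?, ?)",
    [(minute * 60, "edit") for minute in range(10_000)],
)

def plan(sql, params):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

local_start, local_end = 100_000, 200_000

# Function applied to the column: every row must be read and converted.
slow_sql = (
    "SELECT detail FROM rp_plan_log_t "
    "WHERE tz_shift(operate_time) >= ? AND tz_shift(operate_time) < ?"
)
slow_params = (local_start, local_end)

# Same predicate with the conversion moved onto the constants instead,
# which leaves the bare column available for an index range search.
fast_sql = (
    "SELECT detail FROM rp_plan_log_t "
    "WHERE operate_time >= ? AND operate_time < ?"
)
fast_params = (local_start - OFFSET_SECONDS, local_end - OFFSET_SECONDS)

slow_plan = plan(slow_sql, slow_params)
fast_plan = plan(fast_sql, fast_params)
slow_rows = conn.execute(slow_sql, slow_params).fetchall()
fast_rows = conn.execute(fast_sql, fast_params).fetchall()

print("with function:   ", slow_plan)
print("without function:", fast_plan)
```

Both queries return the same rows here, but only the second lets the planner search the index on operate_time. Other engines offer additional options, such as function-based indexes, that this sketch does not cover.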
Escalation and Root Cause
Further testing with realistic conditions (querying a year’s worth of logs for a specific project) reproduced the 30‑second delay, revealing a result set of over 5 million rows and a base table size approaching 100 million rows. The primary cause was the sheer data volume.
Additional Bottleneck: Subquery for subtitlename
A scalar subquery retrieves subtitlename by looking up different tables depending on operate_type. When subtitlename is used as a filter, the subquery must be evaluated for every candidate row, so it executes millions of times against large tables and becomes a serious performance hotspot.
Proposed Solutions
Introduce Table Partitioning: Partition RP_PLAN_LOG_T by operate_time on a monthly basis to enable partition pruning.
Make operate_time a Mandatory Filter: Require users to specify a time range, preferably within a single month, to limit scanned partitions.
Create a Composite Index: Add an index on (project_number, operate_time) to support the filtered queries efficiently.
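The three proposals work together, and a minimal sketch can make that concrete. The code below uses Python's sqlite3 module. SQLite has no native partitioning, so monthly partitions are only modeled by the hypothetical months_touched helper, which lists the months a time range overlaps. The build_log_query helper, the max_partitions limit, the column types, and the index name are likewise assumptions, not the author's implementation.

```python
import sqlite3
from datetime import date

def months_touched(start, end):
    """List the (year, month) partitions overlapping the range [start, end)."""
    if end <= start:
        raise ValueError("end must be after start")
    months = []
    year, month = start.year, start.month
    while date(year, month, 1) < end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months

def build_log_query(project_number, start, end, max_partitions=1):
    """Build the log query, refusing requests without a bounded time range."""
    if start is None or end is None:
        raise ValueError("an operate_time range is mandatory")
    touched = months_touched(start, end)
    if len(touched) > max_partitions:
        raise ValueError(
            f"range spans {len(touched)} monthly partitions, "
            f"limit is {max_partitions}"
        )
    sql = (
        "SELECT id, operate_time FROM rp_plan_log_t "
        "WHERE project_number = ? AND operate_time >= ? AND operate_time < ?"
    )
    return sql, (project_number, start.isoformat(), end.isoformat())

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rp_plan_log_t ("
    "id INTEGER PRIMARY KEY, project_number TEXT, operate_time TEXT)"
)
# Equality column first, range column second, matching the query's predicates.
conn.execute(
    "CREATE INDEX idx_log_project_time "
    "ON rp_plan_log_t (project_number, operate_time)"
)

sql, params = build_log_query("P-001", date(2024, 1, 15), date(2024, 2, 1))
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
plan_detail = " | ".join(row[3] for row in plan_rows)
print(plan_detail)
```

Putting project_number before operate_time lets the index first narrow to a single project and then walk a contiguous time range within it. In a database with real partitioning, the same time predicate is what allows the planner to skip partitions outside the requested months.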
Organizational Challenges
Implementation required coordination among developers, business analysts (BA), and DBAs. Responsibilities for partitioning and index creation were debated, leading to a consensus that DBAs would handle partitioning in the next release, while developers would add the composite index and enforce the time filter.
Further Optimizations
To address the subquery bottleneck, the author suggested splitting business_id into separate activity_id and attribute_id fields, allowing direct joins to smaller tables and avoiding costly scalar subqueries.
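The rewrite can be sketched as follows, again with Python's sqlite3 module. The source names only RP_PLAN_LOG_T, operate_type, business_id, activity_id, attribute_id, and subtitlename. The lookup tables activity_t and attribute_t, their columns, the operate_type values, and the sample rows are hypothetical. For comparison, the sketch keeps both the old business_id column and the proposed split columns on the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity_t  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attribute_t (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rp_plan_log_t (
    id           INTEGER PRIMARY KEY,
    operate_type TEXT,     -- decides which table business_id points into
    business_id  INTEGER,  -- original overloaded key
    activity_id  INTEGER,  -- proposed split columns
    attribute_id INTEGER
);
INSERT INTO activity_t  VALUES (1, 'Design review'), (2, 'Kickoff');
INSERT INTO attribute_t VALUES (1, 'Budget'), (2, 'Owner');
INSERT INTO rp_plan_log_t VALUES
    (1, 'ACTIVITY',  1, 1,    NULL),
    (2, 'ATTRIBUTE', 1, NULL, 1),
    (3, 'ACTIVITY',  2, 2,    NULL),
    (4, 'ATTRIBUTE', 2, NULL, 2),
    (5, 'ACTIVITY',  1, 1,    NULL);
""")

# Original shape: a correlated scalar subquery chosen by operate_type,
# with the derived subtitlename then used as a filter.
old_sql = """
SELECT id FROM (
    SELECT l.id AS id,
           CASE l.operate_type
               WHEN 'ACTIVITY'  THEN (SELECT a.name FROM activity_t a
                                      WHERE a.id = l.business_id)
               WHEN 'ATTRIBUTE' THEN (SELECT b.name FROM attribute_t b
                                      WHERE b.id = l.business_id)
           END AS subtitlename
    FROM rp_plan_log_t l
)
WHERE subtitlename = ?
ORDER BY id
"""

# Proposed shape: dedicated key columns joined directly to each lookup table.
new_sql = """
SELECT l.id
FROM rp_plan_log_t l
LEFT JOIN activity_t  a ON a.id = l.activity_id
LEFT JOIN attribute_t b ON b.id = l.attribute_id
WHERE COALESCE(a.name, b.name) = ?
ORDER BY l.id
"""

old_ids = [row[0] for row in conn.execute(old_sql, ("Design review",))]
new_ids = [row[0] for row in conn.execute(new_sql, ("Design review",))]
print(old_ids, new_ids)
```

The sketch only shows that the two query shapes return the same rows. How much faster the join form runs depends on the engine, the indexes on the lookup tables, and the data distribution, none of which the source specifies.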
Outcome and Open Questions
After the function was removed and partitioning, the mandatory time filter, and the composite index were applied, query times stabilized below the 5-second threshold. Open questions remain: whether the data model could have been designed more comprehensively from the start, what the true value of logging such massive datasets is, and how to handle dynamic query conditions without excessive indexing.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
