How We Cut Data Deletion Time from Hours to Minutes in a High‑Volume OLTP System
Through a step‑by‑step case study of a high‑traffic OLTP system, this article explains how evolving deletion logic—from simple full‑table scans to indexed intermediate tables and parallel processing—reduced daily expired‑data cleanup from several hours to under an hour, while addressing ORA‑01555 and clustering‑factor issues.
Case Background
In a core OLTP transaction system with dozens of master‑detail tables, expired data must be cleaned daily before the 6 am peak. The original Java program performed serial deletions of each child table followed by the master table.
Problem Description
As business grew, the deletion window expanded from 30 minutes to 2‑10 hours. The bottleneck was a child table B with 50 million rows; about 7 million rows needed deletion each day. The initial design scanned the master table A to find IDs, then deleted matching rows in B, performing full‑table scans each run.
Deep Analysis and Evolution
Stage 1 – Early Business : Data volume was small; full scans of A were acceptable, and batch deletes were limited by ROWNUM to avoid large transactions.
Stage 2 – Growing Load : Introduced an intermediate table C to store IDs to delete, allowing parallel deletion of multiple child tables. This reduced the number of full scans on A but B still required full scans.
Stage 3 – Rapid Expansion : Added a PKID column to C and created an index, enabling index‑range scans and reducing execution cycles. However, long‑running queries triggered ORA‑01555 (undo‑segment overflow) because the queries held locks for too long.
Further testing showed that the effectiveness of the index depended on the clustering factor: when PKID ranges (e.g., 1 900 001‑1 920 001) aligned with ordered a_id values, index range scans were efficient; when ranges were unordered (e.g., 1‑10 001), the optimizer reverted to full scans.
Solution: construct C with
ORDER BY idso that inserts are sequential, keeping the index tightly correlated with physical row order and avoiding costly full scans.
Summary Review
Initially, any SQL that met functional requirements was acceptable, but full‑table scans soon became a performance risk. By moving to parallel execution, adding the PKID index, and ordering the intermediate table, the daily deletion window dropped from 2‑3 hours to about 50 minutes, cutting the overall maintenance window roughly in half.
Beyond these specific tweaks, other options such as partitioning, removing foreign‑key constraints for concurrent deletes, or using automated SQL audit tools (e.g., the in‑house "Sherlock" platform) can further mitigate hidden risks.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.