How to Shrink a 130GB Logistics JSON Column in MySQL
This article examines a MySQL table storing large JSON logistics data, analyzes why its size grows to 130 GB, and presents practical optimizations such as data archiving, JSON field reduction, and schema redesign to keep the table manageable.
Background
The e‑commerce system records a logistics JSON string for each order in the goods_order_express table. Over time the table grew to 130 GB because every API call appends a new JSON payload to the data TEXT column, and no rows are ever deleted.
Original Table Structure
CREATE TABLE `goods_order_express` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`express_id` int(10) unsigned NOT NULL,
`message` varchar(200) NOT NULL,
`status` varchar(20) NOT NULL,
`state` tinyint(3) unsigned NOT NULL,
`data` text NOT NULL,
`created_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_expid` (`express_id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4;Business Analysis
Each logistics event generates a JSON array like:
[{"time":"2016-03-16 11:16:20","ftime":"2016-03-16 11:16:20","context":"...","areaCode":"","areaName":"","status":"在途"}, ...]
The raw string is 411 bytes. The system periodically polls the API and inserts or updates this JSON into the data column. Because rows are never removed, the table continuously expands.
Optimization 1 – Archive Historical Data
Logistics records have a natural lifecycle: after delivery they are rarely needed for display. Archive rows older than 90 days to HBase (or another cold‑storage system) and delete them from MySQL, keeping the active table size bounded.
Optimization 2 – Reduce JSON Size
Remove redundant fields : ftime duplicates time, and areaCode / areaName are not used in the UI. Dropping them cuts the example from 411 bytes to 237 bytes (≈47 % reduction).
Trim node count : Typical shipments have 15‑20 nodes. Estimating 100 bytes per node, a maximum of 20 nodes fits comfortably in VARCHAR(2048). For shipments with >25‑30 nodes, keep only key milestones (first, last, and any status changes) and discard intermediate nodes.
Optimization 3 – Schema Change to VARCHAR
After trimming, the JSON fits within 2 KB. Changing the column type from TEXT to VARCHAR(2048) reduces storage overhead and improves indexability.
Optimization 4 – Sharding (Optional)
If the table still grows rapidly, consider horizontal sharding (e.g., by month or by express_id) using middleware such as Cobar, Mycat, or ProxySQL. However, this adds operational complexity and should be a last resort.
Result
After applying the above steps—archiving old rows, simplifying the JSON, and converting the column to VARCHAR(2048) —the table size stabilized and the system runs smoothly without requiring massive disk expansion.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
