
Order Table Splitting Project: Architecture, Implementation, and Performance Improvements

This article describes the background, current-state analysis, solution selection, architecture redesign, execution steps, and results of a large-scale order table splitting project that separates hot and cold data and horizontally shards the cold historical tables (after evaluating MySQL partitioning) to keep table sizes under control and boost query performance.


Background – As Hujiang's product lines expand, transaction data volume grows exponentially, especially in the order domain. Single-table size is projected to exceed reasonable limits within six months, driving up CPU and I/O costs for inserts and queries.

Current Situation Analysis – The order system consists of six internal subsystems (order creation, modification, delivery, front‑end query, operations query, after‑sale) and several external systems (checkout page, settlement, BI, CRM, operations). Data is classified into hot data (frequently updated within two months) and cold data (historical orders used mainly by after‑sale and query services). Hot data is accessed by the order creation, modification, and delivery subsystems, while cold data is accessed by query and after‑sale subsystems.

Solution Options

MySQL table partitioning – provides coarse‑grained partitioning but offers limited optimization for the multi‑dimensional queries required by the order system.

Horizontal sharding – improves concurrency and controllability but requires a globally unique order-ID scheme, extensive service refactoring, and a downtime deployment.

Cold‑hot data separation – stores hot and cold data in separate databases, reducing impact scope and allowing independent scaling, but introduces large historical tables and potential single‑point bottlenecks.

Cold‑hot separation + horizontal sharding of cold data – combines the benefits of both approaches, keeping hot data in the active database and horizontally sharding the cold historical tables.

Chosen Architecture – After evaluation, the team adopted the fourth option (cold‑hot separation plus horizontal sharding of cold data). The final storage consists of four parts: active real‑time data, historical data older than two months, a full‑copy warehouse, and an Elasticsearch index for historical orders.

Data Storage Diagram – (Images omitted for brevity.)

Execution Details

Cold Data Migration & Horizontal Splitting

Cold data is migrated to a historical database by scheduled tasks that process orders in order-ID sequence. Migration halts on errors, builds no indexes during the copy, and does not delete migrated rows from the active database (deletion happens later, after verification). Horizontal sharding uses ID ranges (e.g., 1‑10000 → tb_order_0001, 10001‑20000 → tb_order_0002), and related tables follow the same sharding rule so that all rows belonging to one order land in the same shard.
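A minimal sketch of the ID-range routing, assuming a fixed range size of 10,000 IDs per table and an illustrative naming pattern (the real range size and table names may differ):

public final class ShardRouter {

    // Assumed number of order IDs per historical table; illustrative only.
    private static final long RANGE_SIZE = 10_000L;

    // Resolve the historical table name for a given order ID.
    public static String tableForOrderId(long orderId) {
        if (orderId <= 0) {
            throw new IllegalArgumentException("orderId must be positive: " + orderId);
        }
        long shardIndex = (orderId - 1) / RANGE_SIZE + 1;   // 1-based shard number
        return String.format("tb_order_%04d", shardIndex);
    }

    public static void main(String[] args) {
        System.out.println(tableForOrderId(1L));       // tb_order_0001
        System.out.println(tableForOrderId(10_000L));  // tb_order_0001
        System.out.println(tableForOrderId(10_001L));  // tb_order_0002
    }
}

Any related table (order items, payments, and so on) would resolve its shard from the same order ID, which is what keeps a whole order's rows co-located.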

Cold Data Index Creation

Indexes are built in Elasticsearch either by full data pull during initialization or by incremental tasks that push newly migrated orders to a message queue. A periodic inspection task verifies that indexes have been created successfully.
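A rough sketch of that incremental flow, using an in-memory queue and hypothetical callbacks in place of the real message queue and Elasticsearch client (not the project's actual code):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.LongConsumer;
import java.util.function.LongPredicate;

public class IncrementalIndexer {

    private final Queue<Long> pending = new ConcurrentLinkedQueue<>(); // order IDs awaiting indexing
    private final LongConsumer indexIntoEs;   // e.g. a call into the Elasticsearch client
    private final LongPredicate existsInEs;   // does the index already contain this order?

    public IncrementalIndexer(LongConsumer indexIntoEs, LongPredicate existsInEs) {
        this.indexIntoEs = indexIntoEs;
        this.existsInEs = existsInEs;
    }

    // Called by the migration task after an order is copied to the historical database.
    public void enqueue(long orderId) {
        pending.add(orderId);
    }

    // Consumer loop body: drain the queue and index each order.
    public void drain() {
        Long orderId;
        while ((orderId = pending.poll()) != null) {
            indexIntoEs.accept(orderId);
        }
    }

    // Periodic inspection task: re-queue migrated orders that never reached the index.
    public void inspect(Iterable<Long> migratedOrderIds) {
        for (long orderId : migratedOrderIds) {
            if (!existsInEs.test(orderId)) {
                enqueue(orderId);
            }
        }
    }
}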

Full‑Warehouse Synchronization

Otter, an open-source MySQL binlog sync tool, is used for near-real-time synchronization to the full warehouse. DELETE statements are filtered out so the warehouse keeps rows that are later purged from the active database, and a health-check task monitors sync consistency.
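Purely as an illustration of the filtering rule (generic code, not Otter configuration): row changes replayed into the warehouse skip DELETE events so that migrated-and-removed orders remain queryable there.

import java.util.function.Consumer;

public class WarehouseReplayFilter {

    public enum EventType { INSERT, UPDATE, DELETE }

    public static class RowChange {
        public final EventType type;
        public final long orderId;
        public RowChange(EventType type, long orderId) {
            this.type = type;
            this.orderId = orderId;
        }
    }

    // Forward a binlog row change to the warehouse writer unless it is a DELETE.
    public static void apply(RowChange change, Consumer<RowChange> warehouseWriter) {
        if (change.type != EventType.DELETE) {
            warehouseWriter.accept(change);
        }
    }
}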

Multi‑Source Consistency

Consistency checks include pausing migration on errors, alerting on failed modifications, and comparing data before deletion from the active database. Additional hourly checks compare the full warehouse with the active database.
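A minimal sketch of the compare-before-delete step, with hypothetical loaders that return a column-to-value map for one order from each database:

import java.util.Map;
import java.util.Objects;
import java.util.function.LongFunction;

public class PreDeleteChecker {

    private final LongFunction<Map<String, Object>> loadFromActive;
    private final LongFunction<Map<String, Object>> loadFromHistory;

    public PreDeleteChecker(LongFunction<Map<String, Object>> loadFromActive,
                            LongFunction<Map<String, Object>> loadFromHistory) {
        this.loadFromActive = loadFromActive;
        this.loadFromHistory = loadFromHistory;
    }

    // Returns true only if the historical copy exists and matches the active row exactly;
    // only then is the row deleted from the active database.
    public boolean safeToDelete(long orderId) {
        Map<String, Object> active = loadFromActive.apply(orderId);
        Map<String, Object> history = loadFromHistory.apply(orderId);
        return active != null && Objects.equals(active, history);
    }
}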

Order Query Logic

Queries are routed to the active or historical database based on a global split point: orders with ID ≤ n live in the historical store, orders with ID > n in the active database. Pagination supports both "previous/next" and page-number styles, and Elasticsearch's 1000-record window is handled by rewriting offset and limit, as the helpers below show.
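A minimal sketch of the split-point routing, with a hypothetical SPLIT_POINT constant standing in for the real cut-over order ID:

public final class OrderQueryRouter {

    // Global split point fixed at migration time; the value here is illustrative only.
    private static final long SPLIT_POINT = 50_000_000L;

    public enum DataSource { ACTIVE_DB, HISTORY_DB }

    // Orders at or below the split point were migrated to the historical store.
    public static DataSource route(long orderId) {
        return orderId <= SPLIT_POINT ? DataSource.HISTORY_DB : DataSource.ACTIVE_DB;
    }
}

For historical queries served from Elasticsearch, the paging entry point below rewrites offset and limit to stay within that window: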

// MAX_OFFSET and MAX_LIMIT are the largest offset and page size Elasticsearch accepts per query.
public static void paging(int offset, int limit) {
    List<Integer> result;
    if (offset <= MAX_OFFSET) {
        // Offset fits within the allowed window: query directly.
        result = pagingForLimit(offset, limit, null);
    } else {
        // Offset is too deep: skip ahead to the ID just before the page,
        // then query from that anchor with a zero offset.
        Integer currentItem = skip(offset);
        result = pagingForLimit(0, limit, currentItem);
    }
    if (result != null) {
        System.out.println(result);
    } else {
        System.out.println("no results");
    }
}

Skipping to the Previous Order ID

The skip method walks forward to the order ID just before the requested page so the final query can run with a zero offset and that ID as its anchor, reducing the number of Elasticsearch queries needed.

// Walk forward in steps of (MAX_OFFSET + MAX_LIMIT) until the remaining distance fits in a
// single query, carrying the last seen ID as the anchor for the next step.
// get(n, anchor) and query(offset, limit, anchor) are helpers wrapping the Elasticsearch calls.
private static Integer skip(int size) {
    int maxSkipStep = MAX_OFFSET + MAX_LIMIT;   // the farthest a single query can reach
    if (size <= maxSkipStep) {
        return get(size, null);                 // close enough: fetch the anchor directly
    } else {
        Integer preInteger = null;
        while (size > maxSkipStep) {
            // Jump one full step and remember the last ID of the batch as the next anchor.
            List<Integer> tmp = query(MAX_OFFSET, MAX_LIMIT, preInteger);
            preInteger = tmp.get(tmp.size() - 1);
            size -= maxSkipStep;
        }
        return get(size, preInteger);           // final partial step from the last anchor
    }
}

Pagination with Limit Constraints

When limit exceeds the maximum allowed, the algorithm repeatedly queries smaller chunks and aggregates results until the requested number of records is satisfied.

// Fetch a page larger than MAX_LIMIT by issuing several MAX_LIMIT-sized queries and
// stitching the results together, using the last ID of each chunk as the next anchor.
private static List<Integer> pagingForLimit(int offset, int limit, Integer currentItem) {
    if (limit <= MAX_LIMIT) {
        return query(offset, limit, currentItem);       // fits in a single query
    } else {
        // The first chunk keeps the caller's offset; later chunks use offset 0 plus an anchor.
        List<Integer> result = query(offset, MAX_LIMIT, currentItem);
        limit -= MAX_LIMIT;
        while (limit > MAX_LIMIT) {
            Integer pre = null;
            if (result != null && !result.isEmpty()) {
                pre = result.get(result.size() - 1);    // anchor: last ID of the previous chunk
            }
            result.addAll(query(0, MAX_LIMIT, pre));
            limit -= MAX_LIMIT;
        }
        if (limit > 0) {
            // Final partial chunk for the remaining records.
            Integer preInteger = result.get(result.size() - 1);
            result.addAll(query(0, limit, preInteger));
        }
        return result;
    }
}

Release Plan – The rollout is divided into six steps across four releases, covering cold‑hot data migration, Otter deployment, incremental migration tasks, Elasticsearch integration, backend query refactoring, front‑end query refactoring, and cleanup of hot data.

Result – The project keeps single-table size under control, doubles order-query QPS for users, and quadruples historical-query performance for operations. Remaining challenges include the large size of the full warehouse and the need to reduce other systems' dependence on it.
