
Design and Architecture of a Scalable Transaction Platform for Vivo Global Mall

This article presents the design philosophy, overall architecture, multi‑tenant sharding strategy, state‑machine workflow, distributed transaction handling, and high‑availability measures of Vivo’s global e‑commerce transaction platform, sharing practical challenges and solutions encountered during its development and ongoing evolution.


Vivo's official mall has evolved from a monolithic system to a micro‑service architecture over seven years, accumulating valuable technical experience and deep e‑commerce domain knowledge. To support new O2O initiatives, a gift‑center, and offline delivery, the team decided to build a platform‑based foundation for product, transaction, and inventory capabilities.

The transaction platform’s overall architecture aims for high concurrency, performance, and availability, while also emphasizing low cost, high extensibility, and easy horizontal scaling.

The data model is illustrated below.

Multi‑Tenant Design

Each tenant (business line) stores a large volume of order data and requires highly available, high‑performance service. Data is sharded by userId, stored in MySQL, with ShardingSphere handling database and table partitioning.

Tenant‑to‑resource mapping: tenantCode → {dbCount, tableCount, dbStart, tableStart}. This allows flexible allocation of storage per tenant, reusing existing databases for low‑volume tenants.

Examples:

A low‑volume tenant reuses the existing DB0/Table0 (mapping: 1, 1, 0, 0).

A moderate‑volume tenant adds 8 new tables to DB0, starting at table 16 (mapping: 1, 8, 0, 16).

A high‑volume tenant gets 4 new databases with 8 tables each, starting at DB4 (mapping: 4, 8, 4, 0).

Order‑to‑DB/Table calculation:

dbIndex = Hash(userId) / tableCount % dbCount + dbStart
tableIndex = Hash(userId) % tableCount + tableStart

Dividing by tableCount first avoids skew when dbCount and tableCount share a common factor.
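A minimal Java sketch of this routing rule, using the high‑volume tenant mapping from the examples above (the ShardRouter class and its hash function are illustrative, not from the platform's code):

```java
// Illustrative sketch of the shard-routing rule described above.
public class ShardRouter {
    // Per-tenant resource mapping: tenantCode -> {dbCount, tableCount, dbStart, tableStart}
    final int dbCount, tableCount, dbStart, tableStart;

    public ShardRouter(int dbCount, int tableCount, int dbStart, int tableStart) {
        this.dbCount = dbCount;
        this.tableCount = tableCount;
        this.dbStart = dbStart;
        this.tableStart = tableStart;
    }

    // A stable, non-negative hash of the shard key (userId).
    static int hash(String userId) {
        return userId.hashCode() & Integer.MAX_VALUE;
    }

    // Dividing by tableCount first avoids skew when dbCount and
    // tableCount share a common factor.
    public int dbIndex(String userId) {
        return hash(userId) / tableCount % dbCount + dbStart;
    }

    public int tableIndex(String userId) {
        return hash(userId) % tableCount + tableStart;
    }

    public static void main(String[] args) {
        // High-volume tenant: 4 new databases with 8 tables each, starting at DB4.
        ShardRouter r = new ShardRouter(4, 8, 4, 0);
        System.out.println("db=" + r.dbIndex("user-1001")
                + " table=" + r.tableIndex("user-1001"));
    }
}
```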

Global Unique ID

To keep order numbers globally unique after sharding, a Snowflake‑like ID is used, embedding the DB and table indices (each 5 bits) so the routing information can be extracted from the ID.
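As a sketch, assuming the two 5‑bit index fields sit in the low bits of the ID (the article specifies only the field widths; a real Snowflake‑style ID also carries timestamp, worker, and sequence bits):

```java
// Sketch: embedding 5-bit DB and table indices into an order ID so that
// routing information can be recovered from the ID alone. The exact bit
// layout is an assumption, not taken from the platform's code.
public class OrderId {
    // Pack the routing fields into the low 10 bits.
    static long compose(long sequence, int dbIndex, int tableIndex) {
        return (sequence << 10) | ((long) dbIndex << 5) | tableIndex;
    }

    static int dbIndex(long orderId)    { return (int) ((orderId >> 5) & 0x1F); }
    static int tableIndex(long orderId) { return (int) (orderId & 0x1F); }

    public static void main(String[] args) {
        long id = compose(123456789L, 6, 17);
        // Routing can be extracted without consulting any lookup table.
        System.out.println(dbIndex(id) + " / " + tableIndex(id));  // 6 / 17
    }
}
```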

Full‑Table Search

For admin queries that need to filter across all orders, order data is duplicated into Elasticsearch to provide fast, flexible search capabilities.

State Machine Design

Order and after‑sale processes are modeled as configurable state machines stored as JSON in a configuration center or database, enabling each tenant to define custom workflows without code changes.

/**
 * Order flow configuration
 */
@Data
public class OrderFlowConfig implements Serializable {
    /**
     * Initial order status code
     */
    private String initStatus;
    /**
     * Map<current status, Map<operation, target status>>
     */
    private Map<String, Map<String, String>> operations;
}
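A minimal sketch of how such a configuration could drive transitions; the nested‑map layout (current status → operation → target status) and the engine class are assumptions inferred from the fields above:

```java
import java.util.Map;

// Minimal sketch of a config-driven state machine. The nested-map layout
// (current status -> operation -> target status) is an assumption.
public class OrderStateMachine {
    private final Map<String, Map<String, String>> operations;

    public OrderStateMachine(Map<String, Map<String, String>> operations) {
        this.operations = operations;
    }

    // Returns the target status, or throws if the operation is not
    // permitted from the current status.
    public String fire(String currentStatus, String operation) {
        Map<String, String> allowed = operations.get(currentStatus);
        if (allowed == null || !allowed.containsKey(operation)) {
            throw new IllegalStateException(
                operation + " not allowed from " + currentStatus);
        }
        return allowed.get(operation);
    }

    public static void main(String[] args) {
        // A tenant-specific workflow, loaded from config in the real system.
        OrderStateMachine sm = new OrderStateMachine(Map.of(
            "CREATED", Map.of("PAY", "PAID", "CANCEL", "CLOSED"),
            "PAID",    Map.of("SHIP", "SHIPPED")));
        System.out.println(sm.fire("CREATED", "PAY"));  // PAID
    }
}
```

Because the whole workflow lives in configuration, a tenant can add or reorder states by editing JSON rather than shipping new code.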

Generic Trigger Mechanism

Common delayed actions (e.g., auto‑close unpaid orders, auto‑approve refunds) are implemented via configurable triggers that send delayed messages; upon receipt, conditions are re‑checked before executing the defined operation.
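The pattern can be sketched with an in‑process scheduler standing in for the delayed message queue (all names are illustrative); the key point is the re‑check at fire time:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Sketch of the trigger mechanism: schedule a delayed check (standing in
// for a delayed MQ message) and re-verify the condition before acting.
public class DelayedTrigger {
    final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // Re-check the condition when the message arrives: the order may have
    // been paid (or the refund already handled) while it was in flight.
    public void schedule(long delayMs, BooleanSupplier stillApplies, Runnable action) {
        scheduler.schedule(() -> {
            if (stillApplies.getAsBoolean()) {
                action.run();
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        DelayedTrigger trigger = new DelayedTrigger();
        AtomicBoolean paid = new AtomicBoolean(false);
        AtomicBoolean closed = new AtomicBoolean(false);
        // Auto-close the order after 50 ms unless it was paid in the meantime.
        trigger.schedule(50, () -> !paid.get(), () -> closed.set(true));
        Thread.sleep(200);
        System.out.println("closed=" + closed.get());
        trigger.scheduler.shutdown();
    }
}
```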

Distributed Transactions

Two consistency models are applied:

Strong consistency: critical actions such as order creation/cancellation that affect inventory and coupons use Seata's AT mode.

Eventual consistency: actions like notifying the shipping system after payment or awarding points after receipt use a local message table with retry/compensation.
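The local‑message‑table flow can be sketched as follows. This is an in‑memory stand‑in for illustration only; in the real pattern the message is inserted in the same local database transaction as the business write, and a background job delivers it with retries:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the local-message-table pattern for eventual consistency.
// All names are illustrative.
public class LocalMessageTable {
    static class Msg {
        final String payload;
        int attempts;
        boolean delivered;
        Msg(String payload) { this.payload = payload; }
    }

    private final List<Msg> table = new ArrayList<>();

    // Step 1: record the message alongside the business data
    // (same local transaction in the real system).
    public void record(String payload) {
        table.add(new Msg(payload));
    }

    // Step 2: a background job retries undelivered messages until they
    // succeed; compensation kicks in past a retry threshold (not shown).
    public void deliverPending(Consumer<String> sender) {
        for (Msg m : table) {
            if (!m.delivered) {
                try {
                    sender.accept(m.payload);
                    m.delivered = true;
                } catch (RuntimeException e) {
                    m.attempts++;
                }
            }
        }
    }

    public static void main(String[] args) {
        LocalMessageTable t = new LocalMessageTable();
        t.record("notify-shipping:order-1");
        // First delivery attempt fails; the message stays in the table.
        t.deliverPending(p -> { throw new RuntimeException("MQ unavailable"); });
        // A later retry succeeds.
        t.deliverPending(System.out::println);
    }
}
```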

High Availability & Security

Circuit breaking with Hystrix to isolate failing downstream services.

Rate limiting based on performance testing results.

Row‑level database locks to prevent concurrent order updates.

Idempotent APIs to allow safe retries.

Network isolation: only a few third‑party interfaces are exposed externally with whitelist, encryption, and signature verification.

Comprehensive monitoring and alerting via log platforms, tracing, and middleware metrics.
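The idempotent‑API measure above can be sketched as a request‑key guard (illustrative names; a production version would persist keys in the database rather than hold them in memory):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of an idempotent API guard: the first call with a given request
// key performs the work; retries return the cached result instead of
// repeating the side effect.
public class IdempotentHandler {
    private final ConcurrentHashMap<String, String> results = new ConcurrentHashMap<>();

    public String handle(String requestKey, Supplier<String> work) {
        // computeIfAbsent runs the work at most once per key, so a client
        // retrying after a timeout cannot create a duplicate order.
        return results.computeIfAbsent(requestKey, k -> work.get());
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        String first  = handler.handle("req-1", () -> "ORDER-1001");
        String retry  = handler.handle("req-1", () -> "ORDER-9999");
        System.out.println(first + " / " + retry);  // ORDER-1001 / ORDER-1001
    }
}
```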

Other Considerations

Domain‑Driven Design was not adopted due to team structure.

Peak‑traffic bottlenecks (e.g., flash‑sale spikes) may trigger rate limiting; mitigation includes graceful degradation such as async persistence, cache‑first reads, and limiting query windows.

Conclusion & Outlook

The platform balances pragmatic technology choices with business needs, avoiding unnecessary over‑engineering. After more than a year in production, the transaction platform serves three business lines and remains extensible for new requirements. Future work includes extracting the fulfillment module into an independent service to further decouple the system.

Tags: e-commerce, microservices, sharding, state machine, distributed transactions
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
