How 58 Express Scaled from Startup to Industry Leader: Architecture, Sharding, and AI Dispatch
This article recounts the technical evolution of 58 Express from its early startup days through rapid growth to an intelligent dispatch era, detailing challenges, database sharding, service decomposition, big‑data analytics, AI‑driven order routing, monitoring, and lessons learned for building a high‑performance backend system.
1. Startup – Rapid Iteration (2014)
58 Express was launched as one of more than 20 incubated services under the 58 Group. All services initially shared a single MySQL database, distinguished only by tag fields, which allowed a new service to go live within two weeks.
Pain points
Single‑point failure: a slow SQL query could stall the whole platform.
Many services sharing the order table each added their own indexes, degrading performance.
Schema changes were painful due to field redundancy and lock contention.
Order volume quickly grew beyond 10,000 orders per day, turning the database into a bottleneck.
First technical evolution – Database migration & cluster decoupling
Extract the order table into a dedicated database.
Set up bidirectional sync between the original and the new DB.
Use distinct primary‑key prefixes (e.g., 80 for Express, 10 for other services) to avoid key collisions.
Log updates during sync and validate the data after migration so that no records are overwritten.
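The prefix scheme is what lets both databases keep accepting writes during bidirectional sync. A minimal sketch, assuming a hypothetical layout of a 2-digit service prefix followed by a 16-digit sequence (only the prefixes 80 and 10 come from the text; `SEQ_DIGITS`, `make_order_id`, and `owning_service` are illustrative names):

```python
# Hypothetical key layout: a 2-digit service prefix followed by a 16-digit sequence.
SEQ_DIGITS = 16
SERVICE_PREFIXES = {"express": 80, "other": 10}  # prefixes from the example in the text


def make_order_id(service: str, seq: int) -> int:
    """Combine a service prefix with a per-service sequence number."""
    if not 0 <= seq < 10 ** SEQ_DIGITS:
        raise ValueError("sequence does not fit the assumed layout")
    return SERVICE_PREFIXES[service] * 10 ** SEQ_DIGITS + seq


def owning_service(order_id: int) -> str:
    """Recover the service that created a key by reading its prefix."""
    prefix = order_id // 10 ** SEQ_DIGITS
    for name, p in SERVICE_PREFIXES.items():
        if p == prefix:
            return name
    raise ValueError(f"unknown prefix {prefix}")


# The same sequence number yields distinct keys in each service, so rows written
# on either side of the bidirectional sync cannot collide.
express_id = make_order_id("express", 42)
other_id = make_order_id("other", 42)
```

Because the prefix also identifies the writer, the post-migration validation step can attribute every row to its source service.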
After several migration cycles the monolithic DB was split into four logical databases: order, settlement, configuration, and tracking, each sized according to its workload.
2. High‑Growth Stage (2015)
Rapid order growth, heavy subsidy competition, and frequent releases introduced new challenges:
Escalating operational cost due to blanket subsidies.
Merge conflicts and high bug rates from a growing codebase.
Order volume multiplied, pushing the system toward its performance ceiling.
Complex operational analytics required richer data access.
Second technical evolution – Service‑oriented architecture, caching, sharding, and big‑data platform
Service decoupling
More than 20 independent services (settlement, recharge, push, driver‑task, etc.) were created, each with its own database and owner, enabling isolated development and deployment.
Multi‑channel push
Push notifications use three channels (Xiaomi, GeTui, self‑built TCP). The system selects the channel with the highest delivery rate based on the driver’s device type, providing redundancy if one channel fails.
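Channel selection with fallback can be sketched as ranking channels by delivery rate for the driver's device type and trying them in order. The rate table, device-type names, and the caller-supplied `send` function are all hypothetical; a real system would derive rates from delivery receipts.

```python
# Hypothetical per-device delivery rates (fraction of pushes acknowledged).
DELIVERY_RATES = {
    "xiaomi": {"xiaomi": 0.97, "getui": 0.90, "tcp": 0.85},
    "other_android": {"xiaomi": 0.40, "getui": 0.92, "tcp": 0.88},
}


def rank_channels(device_type, rates=DELIVERY_RATES):
    """Return channels ordered from highest to lowest delivery rate for this device."""
    table = rates[device_type]
    return sorted(table, key=table.get, reverse=True)


def push(device_type, message, send):
    """Try channels in ranked order, falling back to the next one if a send fails.

    `send(channel, message)` is a caller-supplied function returning True on success.
    """
    for channel in rank_channels(device_type):
        if send(channel, message):
            return channel
    return None  # every channel failed


# Simulate an outage of the best channel for a non-Xiaomi Android device.
down = {"getui"}
used = push("other_android", "new order nearby", lambda ch, msg: ch not in down)
```

Here `used` ends up as `"tcp"`, the second-best channel for that device type.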
Quadrant‑based dispatch
Dispatch logic evolved from simple distance‑based push to a quadrant model:
Push to drivers within 1 km without subsidy.
If no driver grabs the order, expand to the next quadrant and add a graduated subsidy (e.g., ¥1 for the first quadrant, ¥2 for the second, and so on).
Rank drivers by a quality score and select the best candidate.
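The expanding-quadrant loop can be sketched as follows, assuming each expansion widens the search radius by a hypothetical 1 km ring and that `try_push` (also hypothetical) returns the drivers who grabbed the order. The subsidy grows by ¥1 per expansion, as in the example above, and the best grabber by quality score wins.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    driver_id: str
    distance_km: float
    quality: float  # hypothetical quality score, higher is better


def dispatch(drivers, try_push, max_rounds=3, ring_km=1.0):
    """Expand outward until some driver grabs the order.

    Round 0 covers drivers within ring_km with no subsidy; each later round
    widens the radius by ring_km and raises the subsidy by 1 yuan.
    try_push(candidates, subsidy) returns the drivers who grabbed, or [].
    """
    for rnd in range(max_rounds):
        radius = ring_km * (rnd + 1)
        subsidy = rnd  # ¥0, then ¥1, ¥2, ...
        candidates = [d for d in drivers if d.distance_km <= radius]
        grabbed = try_push(candidates, subsidy)
        if grabbed:
            best = max(grabbed, key=lambda d: d.quality)
            return best, subsidy
    return None, None


drivers = [Driver("a", 0.5, 0.6), Driver("b", 1.5, 0.9), Driver("c", 1.8, 0.7)]
# Simulated behaviour: nobody grabs until a subsidy is offered.
best, subsidy = dispatch(drivers, lambda cands, s: [d for d in cands if s >= 1])
```

In this run the first round finds only driver `a`, who declines; the second round reaches all three with a ¥1 subsidy and selects `b` for its higher quality score.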
Database sharding
Two‑step approach:
Vertical split: frequently accessed user attributes are moved to a separate table.
Horizontal split: user IDs are partitioned either by range (0‑10M, 10M‑20M…) or by hash, and each shard receives its own read replicas.
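The two horizontal partitioning rules reduce to simple routing functions. A minimal sketch: the 10M range size comes from the text, while the shard count of 4 for the hash scheme is an assumption, and a plain modulo stands in for the hash.

```python
RANGE_SIZE = 10_000_000  # 0-10M, 10M-20M, ... as described in the text
NUM_HASH_SHARDS = 4      # hypothetical shard count for the hash scheme


def shard_by_range(uid: int) -> int:
    """Range partitioning: each consecutive block of 10M user IDs maps to one shard."""
    return uid // RANGE_SIZE


def shard_by_hash(uid: int, num_shards: int = NUM_HASH_SHARDS) -> int:
    """Hash partitioning, using the UID modulo the shard count as a simple hash."""
    return uid % num_shards
```

Range routing makes growth easy (new IDs simply land on a new shard) but concentrates traffic on the newest range; hash routing spreads load evenly but requires moving data when the shard count changes.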
Post‑sharding issues:
Queries that do not use the partition key became slower because they had to scan every shard.
Cross‑shard joins were impossible, complicating complex operational reports.
Solutions for cross‑shard analytics
Build an index table mapping non‑partition keys (e.g., email, phone) to the partition key (UID).
Cache the UID‑partition mapping for high‑hit‑rate lookups.
Serve heavy analytical queries from a separate read‑only "backend" database that is synchronized via MySQL binlog, Canal, or MQ and carries no production traffic.
Integrate external search engines (Elasticsearch / Solr) for ad‑hoc queries.
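The index-table-plus-cache lookup path can be sketched in memory: a phone number is resolved to a UID (cache first, then the index table), and only the single shard that owns that UID is queried. Dictionaries stand in for the shards, the index table, and the cache; `ShardedUserStore` and its methods are hypothetical names.

```python
class ShardedUserStore:
    """In-memory stand-in for sharded user tables plus a phone -> UID index table."""

    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]  # uid -> user row
        self.phone_index = {}  # the "index table": phone -> uid
        self.cache = {}        # stand-in for an external cache of hot mappings
        self.index_reads = 0   # counts index-table reads to show the cache working

    def _shard(self, uid):
        return self.shards[uid % len(self.shards)]

    def insert(self, uid, phone, **fields):
        self._shard(uid)[uid] = {"uid": uid, "phone": phone, **fields}
        self.phone_index[phone] = uid  # index maintained alongside the write

    def find_by_phone(self, phone):
        """Resolve phone -> uid via cache or index table, then query one shard."""
        uid = self.cache.get(phone)
        if uid is None:
            self.index_reads += 1
            uid = self.phone_index.get(phone)
            if uid is None:
                return None
            self.cache[phone] = uid
        return self._shard(uid).get(uid)


store = ShardedUserStore()
store.insert(12345, "13800000000", name="Li")
first = store.find_by_phone("13800000000")   # reads the index table
second = store.find_by_phone("13800000000")  # served from the cache
```

The cost of this design is an extra write to the index table on every insert, which is why analytical workloads are pushed to the synchronized backend database or a search engine instead.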
3. Intelligent Era (2016)
The “Battle‑Axe” project introduced AI‑driven pricing, driver selection, and churn mitigation.
Model training pipeline
Data from the order, user, driver, and relationship tables is streamed to a big‑data platform. Features are engineered using XGBoost, one‑hot encoding, and feature crossing, producing more than 400,000 features per order.
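One-hot encoding and feature crossing explain how the feature count grows so large: every categorical value becomes its own sparse feature, and every combination of values becomes another. A minimal sketch with hypothetical field names (`city`, `vehicle`, `hour`); the XGBoost-based part of the pipeline is not shown.

```python
def one_hot(name, value):
    """Encode a categorical field as a single sparse feature key."""
    return f"{name}={value}"


def cross(*keys):
    """Feature crossing: combine several one-hot keys into one conjunction feature."""
    return "&".join(keys)


def order_features(order):
    """Build a sparse feature set (feature key -> 1.0) for one order.

    Field names are hypothetical; the real pipeline drew on the order, user,
    driver, and relationship tables.
    """
    base = [one_hot(k, order[k]) for k in ("city", "vehicle", "hour")]
    feats = {k: 1.0 for k in base}
    # Pairwise crosses let a linear model learn interactions such as "city X at hour Y".
    for i in range(len(base)):
        for j in range(i + 1, len(base)):
            feats[cross(base[i], base[j])] = 1.0
    return feats


features = order_features({"city": "bj", "vehicle": "van", "hour": 18})
```

With dozens of categorical fields, each having many values, the space of possible keys quickly reaches hundreds of thousands even though each order activates only a few.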
The feature pipeline runs four stages on parallel thread pools (prepare → transform → fetch → compute) and keeps latency under 50 ms per order.
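The staged design can be sketched with `concurrent.futures`: the stages run in sequence, and within each stage a thread pool processes many orders concurrently. The stage bodies below are placeholders, and the worker count of 4 is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Placeholder stages; real ones would parse requests, build feature keys,
# read feature stores, and score models.
def prepare(order_id):
    return {"order": order_id}


def transform(ctx):
    ctx["keys"] = [f"order:{ctx['order']}:f{i}" for i in range(3)]
    return ctx


def fetch(ctx):
    ctx["values"] = {key: 1.0 for key in ctx["keys"]}
    return ctx


def compute(ctx):
    ctx["score"] = sum(ctx["values"].values())
    return ctx


STAGES = [prepare, transform, fetch, compute]


def run_pipeline(orders, stages=STAGES, workers=4):
    """Run orders through the stages in sequence; within a stage, orders run in parallel."""
    start = time.perf_counter()
    items = list(orders)
    for stage in stages:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            items = list(pool.map(stage, items))  # map preserves input order
    elapsed_ms = (time.perf_counter() - start) * 1000
    return items, elapsed_ms


results, elapsed_ms = run_pipeline([101, 102, 103])
```

A long-running service would keep one pool per stage alive instead of creating pools on each call. Threads pay off mainly in the I/O-bound fetch stage, where most of the latency budget goes to remote reads.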
Model‑driven workflow
Order creation: adjust the price based on the supply‑demand imbalance in the target area.
Push stage: rank drivers by willingness score and match quality.
Grab stage: estimate the expected grab count; high‑value orders receive no subsidy, while low‑value orders receive targeted subsidies.
Assignment stage: choose the driver with the best historical performance.
Completion stage: predict user churn and issue coupons if churn risk is high.
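Once the models have produced their outputs, the stage decisions reduce to simple rules. A minimal sketch: the model outputs (expected grab count, churn probability) are passed in as plain numbers, and the ratio-based price multiplier, the 1.5x cap, and every threshold and subsidy amount are hypothetical values.

```python
def adjust_price(base_price, open_orders, idle_drivers, cap=1.5):
    """Order creation: raise the price when demand outstrips supply in the area."""
    ratio = open_orders / max(idle_drivers, 1)
    return round(base_price * min(max(ratio, 1.0), cap), 2)


def grab_subsidy(expected_grabs, min_grabs=1.0, subsidy=2.0):
    """Grab stage: orders the model expects to be grabbed anyway get no subsidy."""
    return 0.0 if expected_grabs >= min_grabs else subsidy


def completion_action(churn_prob, threshold=0.6):
    """Completion stage: issue a coupon only when predicted churn risk is high."""
    return "coupon" if churn_prob >= threshold else None
```

Keeping the rules this thin means each model can be retrained or swapped without touching the dispatch flow itself.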
Online A/B experiments run on 5‑10% of traffic under real‑time monitoring, and faulty algorithms are rolled back automatically.
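Traffic splitting with automatic rollback can be sketched as stable hash bucketing plus a metric check called by monitoring. The MD5 bucketing, the 2-point tolerance, and the `Experiment` class are assumptions for illustration, not the production mechanism.

```python
import hashlib


class Experiment:
    """Route a fixed slice of traffic to a new algorithm and roll back on regression."""

    def __init__(self, name, percent=5, tolerance=0.02):
        self.name = name
        self.percent = percent      # share of traffic, e.g. 5-10 as in the text
        self.tolerance = tolerance  # allowed drop in the monitored metric (assumed)
        self.enabled = True

    def bucket(self, user_id):
        """Stable 0-99 bucket so a user always sees the same variant."""
        digest = hashlib.md5(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def variant(self, user_id):
        if self.enabled and self.bucket(user_id) < self.percent:
            return "treatment"
        return "control"

    def check(self, control_rate, treatment_rate):
        """Called by real-time monitoring; disables the experiment on a regression."""
        if treatment_rate < control_rate - self.tolerance:
            self.enabled = False
        return self.enabled


exp = Experiment("dispatch_v2", percent=10)
share = sum(exp.variant(u) == "treatment" for u in range(10_000)) / 10_000
```

Salting the hash with the experiment name keeps bucket assignments independent across concurrent experiments. Once `check` reports a regression, every request falls back to the control algorithm.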
Monitoring & tracing
Comprehensive metrics cover JVM, CPU, threads, cache, DB, service health, and business KPIs (conversion, cancellation, abnormal orders). A call‑trace system visualizes the full request path across services for rapid fault isolation.
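The core idea of call tracing is that every hop records its own span while carrying the request's trace ID and its parent's span ID, so the full path can be reassembled afterwards. A minimal in-process sketch using `contextvars`: the service names are hypothetical, and a real system would propagate the IDs across processes (for example, in request headers) and ship spans to a trace store instead of a list.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

_current = contextvars.ContextVar("current_span", default=None)
SPANS = []  # collected spans; a real system would export these


@contextmanager
def span(name):
    """Record one hop of a request, linked to its parent by trace and span IDs."""
    parent = _current.get()
    record = {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:8],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
    }
    token = _current.set(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["ms"] = (time.perf_counter() - start) * 1000
        _current.reset(token)
        SPANS.append(record)


def settlement():
    with span("settlement"):
        pass


def create_order():
    with span("order-service"):
        with span("dispatch"):
            pass
        settlement()


create_order()
```

Grouping spans by `trace_id` and linking them through `parent_id` rebuilds the tree order-service → {dispatch, settlement}, and the per-span `ms` values show where a slow request spent its time.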
4. Key Takeaways
Adopt architecture that matches the business stage: monolith → service‑oriented → AI‑enabled.
Use dual or triple push channels to guarantee high delivery rates.
Horizontal sharding (splitting by range or hash) is preferred when resources allow, but plan for the impact on non‑key queries.
Online algorithm traffic splitting must be backed by real‑time monitoring and automatic fallback.
Robust, multi‑dimensional monitoring is essential for early problem detection and impact mitigation.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
