How Shopee Automates Southeast Asian Last‑Mile Sorting with AI and Big Data
This article analyzes the inefficiencies of Southeast Asian last‑mile logistics and explains Shopee's AI‑driven, data‑centric solution that builds a trusted address library, uses offline training and online inference, and adopts AOI‑based matching to automate parcel sorting and driver assignment.
Problem Overview
Southeast Asian e‑commerce faces fragmented, manual last‑mile logistics due to diverse languages, missing corpora, and weak GIS data, resulting in low efficiency, high error rates, and limited scalability. Shopee processes over 2 billion items during major sales events, exposing these challenges.
Business Model and Pain Points
Shopee’s delivery flow involves sellers, Shopee logistics, and buyers, with three stages: First Mile (FM), Middle Mile (SOC), and Last Mile (LM). Manual address reading takes about 10 seconds per parcel, leading to low sorting speed, human error, high training costs, and driver inefficiencies caused by overlapping service areas.
Technical Challenges
Non‑standard, dialect‑rich address texts lacking a comprehensive corpus.
Coarse or changing administrative boundaries causing mis‑filled addresses.
Identifying and correcting erroneous addresses without a reliable lexicon.
Assessing confidence of historical address records.
Geolocation offsets and duplicate orders with multiple coordinates.
Solution Architecture
1. Offline Training
Clean historical address orders to build a trusted address library.
Train matching models on the cleaned data.
2. Online Inference
Pre‑process incoming order addresses into standardized text.
Apply the trained model to infer the best Hub and driver.
Use inference results for sorting and dispatch.
The trusted address library and matching model form the foundation for real‑time services.
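The offline/online split described above can be sketched as follows. This is a minimal illustration, not Shopee's implementation: the function names, the `min_support` and `min_agreement` thresholds, and the sample orders are all assumptions, and a dictionary lookup stands in for the trained matching model.

```python
from collections import Counter, defaultdict


def normalize(address: str) -> str:
    """Lowercase and collapse whitespace; a stand-in for real pre-processing."""
    return " ".join(address.lower().split())


def build_trusted_library(history, min_support=2, min_agreement=0.8):
    """Offline step: keep an address only if past orders agree on its hub.

    `history` is an iterable of (raw_address, hub_id) pairs.
    Both thresholds are hypothetical values chosen for illustration.
    """
    votes = defaultdict(Counter)
    for raw, hub in history:
        votes[normalize(raw)][hub] += 1

    library = {}
    for addr, counter in votes.items():
        hub, count = counter.most_common(1)[0]
        total = sum(counter.values())
        if count >= min_support and count / total >= min_agreement:
            library[addr] = hub
    return library


def infer_hub(library, raw_address):
    """Online step: standardize the incoming address, then look it up."""
    return library.get(normalize(raw_address))


# Made-up historical orders for illustration.
history = [
    ("Jl Merdeka 1", "HUB-A"),
    ("jl  merdeka 1", "HUB-A"),
    ("Jl Merdeka 1", "HUB-A"),
    ("Jl Sudirman 9", "HUB-B"),  # seen only once, so it is not trusted
]
library = build_trusted_library(history)
print(infer_hub(library, "JL MERDEKA 1"))
print(infer_hub(library, "Jl Sudirman 9"))
```

Keeping the expensive aggregation offline means the online path is reduced to normalization plus a lookup, which is what makes real-time sorting feasible.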
Data Sources
Two main sources feed the system: massive historical order addresses and manually verified address data from field teams.
Processing Pipeline
Standardized address texts are segmented using administrative information, aggregated into Areas of Interest (AOI), and cleaned to produce the trusted library.
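As a rough sketch of this pipeline, the code below strips a known administrative name from each address, groups the remaining local text into candidate AOIs, and keeps only groups with enough supporting orders. The gazetteer `ADMIN_NAMES`, the "first three tokens" grouping key, the `min_orders` threshold, and the coordinates are all hypothetical; the article does not specify how Shopee performs these steps.

```python
from collections import defaultdict
from statistics import median

# Hypothetical gazetteer of administrative names (assumption, not real data).
ADMIN_NAMES = {"loa janan", "balikpapan"}


def split_admin(address, admin_names=ADMIN_NAMES):
    """Return (admin_name, local_text); admin_name is None if nothing matches."""
    text = " ".join(address.lower().replace(",", " ").split())
    # Try longer names first so a longer name wins over a shorter substring.
    for name in sorted(admin_names, key=len, reverse=True):
        if name in text:
            local = " ".join(text.replace(name, " ").split())
            return name, local
    return None, text


def aggregate_aoi(orders, min_orders=2):
    """Group orders into AOIs and represent each by its median coordinate."""
    groups = defaultdict(list)
    for address, lat, lon in orders:
        admin, local = split_admin(address)
        if admin is None:
            continue  # cleaning: drop addresses with no recognizable region
        key = (admin, " ".join(local.split()[:3]))  # crude street-level key
        groups[key].append((lat, lon))

    aois = {}
    for key, points in groups.items():
        if len(points) >= min_orders:  # cleaning: require enough evidence
            aois[key] = (
                median(p[0] for p in points),
                median(p[1] for p in points),
            )
    return aois


# Made-up coordinates for illustration only.
orders = [
    ("Jalan Petinggi Umar RT 23, Loa Janan", -0.60, 117.10),
    ("jalan petinggi umar rt 05 Loa Janan", -0.62, 117.12),
    ("Jalan Petinggi Umar, Loa Janan", -0.61, 117.11),
    ("Unknown place", 0.0, 0.0),
]
aois = aggregate_aoi(orders)
print(aois)
```

The median is used rather than the mean so that a single badly offset coordinate does not drag the AOI's representative point away from the cluster.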
Technical Stack
The system is split into an online address inference service and an offline training service, communicating via message queues. Core components include:
Address Service: Provides address libraries and matching models, employing rule‑based, similarity‑based, keyword‑based, and locally annotated data matching.
Sorting Service: Exposes OpenAPI for address assignment and text segmentation, serving both Shopee logistics and third‑party logistics.
Operations Platform: Offers tools for zone generation, data annotation upload, monitoring, and strategy configuration.
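One way the Address Service's four matching strategies could be combined is as a cascade that tries the most reliable source first. The sketch below is an assumption about ordering and structure, not a description of Shopee's service; `match_address`, the zone identifiers, and the `min_ratio` threshold are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever similarity measure is actually used.

```python
from difflib import SequenceMatcher


def match_address(address, annotated, library, keywords, min_ratio=0.85):
    """Return (zone, strategy) using the first strategy that succeeds."""
    text = " ".join(address.lower().split())

    # 1. Locally annotated data: manual overrides from field teams.
    if text in annotated:
        return annotated[text], "annotated"

    # 2. Rule-based: exact hit in the trusted address library.
    if text in library:
        return library[text], "rule"

    # 3. Keyword-based: a known landmark or village name appears in the text.
    for keyword, zone in keywords.items():
        if keyword in text:
            return zone, "keyword"

    # 4. Similarity-based: closest library entry above a threshold.
    best_zone, best_ratio = None, 0.0
    for known, zone in library.items():
        ratio = SequenceMatcher(None, text, known).ratio()
        if ratio > best_ratio:
            best_zone, best_ratio = zone, ratio
    if best_ratio >= min_ratio:
        return best_zone, "similarity"

    return None, "unmatched"


# Hypothetical lookup tables for illustration.
annotated = {"pasar baru blok c": "ZONE-9"}
library = {"jl marsma r iswahyudi rt 15": "ZONE-1"}
keywords = {"loa duri ilir": "ZONE-2"}

print(match_address("Jl Marsma R Iswahyudi RT 15", annotated, library, keywords))
print(match_address("Depan Kantr Desa Loa Duri Ilir", annotated, library, keywords))
print(match_address("jl marsma iswahyudi rt 15", annotated, library, keywords))
```

Returning the strategy name alongside the zone makes it possible to monitor which strategies carry the load and to route unmatched addresses back to annotation.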
AOI vs. POI
Sorting only needs to resolve an address to the service area it falls in, so matching to an exact point (POI) is unnecessary; matching to an area (AOI) that is finer than the large administrative regions suffices. AOI offers higher matching performance with moderate collection difficulty and maintenance cost.
Key techniques for AOI extraction include TF‑IDF, BM25, and TextRank to derive keywords from addresses.
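To illustrate the first of these techniques, here is a minimal TF‑IDF keyword extractor in plain Python. It is a sketch under simple assumptions: whitespace tokenization, purely numeric tokens dropped, the basic `tf * log(N / df)` weighting, and made-up sample addresses. BM25 and TextRank are not shown.

```python
import math
from collections import Counter


def tfidf_keywords(addresses, top_k=2):
    """Return the top_k highest-scoring TF-IDF tokens for each address."""
    # Tokenize on whitespace and drop purely numeric tokens (house numbers etc.).
    docs = [[t for t in a.lower().split() if not t.isdigit()] for a in addresses]
    n_docs = len(docs)

    # Document frequency: in how many addresses each token appears.
    df = Counter(token for doc in docs for token in set(doc))

    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            token: (count / len(doc)) * math.log(n_docs / df[token])
            for token, count in tf.items()
        }
        # Sort by score descending; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Made-up addresses for illustration.
addresses = [
    "jl merdeka rt 01 perumahan griya asri",
    "jl merdeka rt 02 perumahan griya asri",
    "jl merdeka rt 03 pasar baru",
]
for address, keywords in zip(addresses, tfidf_keywords(addresses)):
    print(address, "->", keywords)
```

Tokens that appear in every address, such as "jl" and "rt", score zero, so the tokens that distinguish one area from another rise to the top.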
Address Cleaning Workflow
Data passes through the following stages:
Strategy Center: Configures cleaning strategies.
Pre‑processing: Normalizes format, splits text, and processes records in batches.
Cleaning Engine: Applies rule‑based, machine‑learning, or deep‑learning methods.
Validation Engine: Checks accuracy and coverage, and feeds bad cases back for further cleaning.
Trusted Address Library: Stores versioned, region‑aware address data.
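The stages above can be pictured as a small pipeline. The sketch below is illustrative only: the `Strategy` fields, the abbreviation table, the token-count validation rule, and the version label are assumptions, and a simple rule-based cleaner stands in for the machine-learning and deep-learning options.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Strategy:
    """Strategy Center: hypothetical, configurable cleaning settings."""
    abbreviations: dict = field(
        default_factory=lambda: {"jl": "jalan", "jln": "jalan"}
    )
    min_tokens: int = 3


def preprocess(raw):
    """Pre-processing: lowercase, replace punctuation with spaces, split."""
    return re.sub(r"[^\w\s]", " ", raw.lower()).split()


def clean(tokens, strategy):
    """Cleaning Engine (rule-based variant): expand known abbreviations."""
    return [strategy.abbreviations.get(t, t) for t in tokens]


def validate(tokens, strategy):
    """Validation Engine: reject addresses too short to be usable."""
    return len(tokens) >= strategy.min_tokens


def run_pipeline(raw_addresses, strategy, version):
    """Return a versioned library plus the bad cases to feed back."""
    library = {"version": version, "addresses": []}
    bad_cases = []
    for raw in raw_addresses:
        tokens = clean(preprocess(raw), strategy)
        if validate(tokens, strategy):
            library["addresses"].append(" ".join(tokens))
        else:
            bad_cases.append(raw)
    return library, bad_cases


library, bad_cases = run_pipeline(
    ["Jl. Merdeka No.1", "Jln Sudirman, Blok C", "??"],
    Strategy(),
    version="v1",
)
print(library)
print(bad_cases)
```

Keeping the strategy in a separate object mirrors the idea of a Strategy Center: rules can be tuned per region without changing the pipeline code, and the returned bad cases give the feedback loop something concrete to work on.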
Practical Outcomes
By partitioning service ranges into finer AOIs per driver, Shopee reduces manual sorting time, improves driver dispatch efficiency, and enables automation equipment in warehouses. The approach also supports multilingual address handling and scalable model management across diverse regions.
Example of problematic address texts:
Jalan Petinggi Umar, RT.23, Depan Kantr Desa Loa Duri Ilir, Loa Janan. Translated: "Jalan Petinggi Umar, RT.23, in front of the Loa Duri Ilir village office, Loa Janan." Google Maps mis‑matches the auxiliary information, causing sorting errors.
Jl marsma r iswahyudi RT 15 (masuk 75 M dari Jembatan sungai sepinggan rumah didepan sungai tingkat warna pink biru). Translated: "Jl marsma r iswahyudi RT 15 (go in 75 m from the Sepinggan river bridge; the multi‑story house in front of the river, painted pink and blue)." The address is a free‑form description rather than a structured address, leading to unreliable geocoding.
Shopee Tech Team
How to innovate and solve technical challenges in diverse, complex overseas scenarios? The Shopee Tech Team will explore cutting‑edge technology concepts and applications with you.
