How Lalamove Built an AI‑Powered Edge‑Cloud Review System for Global Driver Verification
Lalamove tackled the scalability and accuracy challenges of worldwide driver onboarding by designing a layered edge‑cloud AI architecture. Lightweight mobile models filter low‑quality inputs on the device, while cloud‑based large language models, computer‑vision models, OCR, and multimodal LLMs automate identity checks, reducing manual effort while maintaining data compliance.
Background
Lalamove’s Driver Operations (DOP) team processes massive driver‑onboarding verification tasks (face matching, document validation, vehicle inspection) across many markets. Data formats vary by country and fraud tactics evolve, creating latency, cost, and accuracy challenges.
Edge‑Cloud Collaborative Architecture
A layered governance model separates responsibilities:
Edge (mobile): Ultra‑lightweight models run on the driver’s device to reject low‑quality or non‑compliant images (blurred, no face, multiple faces, off‑center, incomplete license plate) before upload.
Cloud: High‑capacity resources (large language models, high‑precision computer‑vision models) handle complex, long‑tail cases and make final decisions.
This reduces invalid uploads, lowers third‑party API costs, and lets human reviewers focus on high‑risk cases.
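The layered governance model above can be sketched as a simple routing function: the edge filter rejects invalid inputs before upload, the cloud scores the rest, and only ambiguous cases reach human reviewers. Function and field names, and the threshold values, are illustrative assumptions, not Lalamove's APIs:

```python
# Illustrative sketch of the edge-cloud decision flow (names and
# thresholds are hypothetical, not Lalamove's actual system).

def route(submission: dict) -> str:
    # Edge: ultra-lightweight check runs on the driver's device.
    if not submission["edge_check_passed"]:
        return "rejected_on_device"   # no upload, no third-party API cost
    # Cloud: high-capacity models score the remaining cases.
    score = submission["cloud_score"]
    if score >= 0.90:
        return "auto_approved"
    if score < 0.60:
        return "auto_rejected"
    return "manual_review"            # long-tail / high-risk cases
```

Anything rejected on-device never incurs upload or API cost, which is the source of the savings described above.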
Scenario 1 – Face‑Matching Verification
Business background
To prevent impersonation, a live driver photo is compared with the stored registration image using Face Recognition Technology (FRT).
Solution
Front‑end (edge): A millisecond‑level lightweight model performs a health check on the uploaded photo and blocks images that meet any of the following conditions:
No face detected
More than one face detected
Face not centered (to avoid side‑views or distant shots)
Result: early feedback to the driver and fewer calls to downstream FRT services.
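The three rejection rules can be sketched as follows. The on-device detector itself is abstracted away: `faces` stands in for the list of bounding boxes a lightweight model would emit, and the centering tolerance is an assumed value:

```python
# Sketch of the edge-side health check: no face, multiple faces, or an
# off-center face causes rejection before upload. `faces` is a list of
# (x, y, w, h) boxes from an on-device detector (abstracted here);
# center_tol is an illustrative tolerance.

def health_check(faces, img_w, img_h, center_tol=0.25):
    if len(faces) == 0:
        return "reject: no face detected"
    if len(faces) > 1:
        return "reject: more than one face detected"
    x, y, w, h = faces[0]
    cx, cy = x + w / 2, y + h / 2
    # The face center must lie near the image center; this filters
    # side views and distant shots.
    if abs(cx / img_w - 0.5) > center_tol or abs(cy / img_h - 0.5) > center_tol:
        return "reject: face not centered"
    return "pass"
```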
Back‑end (cloud): Accepted images are sent to AWS Rekognition (or equivalent) to extract a high‑dimensional embedding and compute a 1:1 similarity score against the stored profile.
Face detection & feature extraction → embedding vector.
Similarity computation between live embedding and stored embedding.
Threshold decision: score ≥ pass‑threshold → verification success; score < reject‑threshold → failure; scores between the two thresholds are escalated to human review.
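The three steps above can be sketched in a few lines: compute a similarity between the live and stored embeddings, then apply a two-threshold decision. Cosine similarity and the threshold values are assumptions for illustration; the article does not specify the metric Rekognition-style services use internally:

```python
import math

# Sketch of the 1:1 matching decision. The similarity metric (cosine)
# and thresholds (0.90 / 0.60) are illustrative assumptions.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def decide(live_emb, stored_emb, pass_thr=0.90, reject_thr=0.60):
    score = cosine_similarity(live_emb, stored_emb)
    if score >= pass_thr:
        return "success"
    if score < reject_thr:
        return "failure"
    return "manual_review"   # borderline scores escalate to a human
```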
Scenario 2 – Driver Document Verification
Business background
Drivers submit various identity documents (ID, driver’s license, etc.) that differ across regions, leading to high rejection rates due to wrong document type or non‑conforming photos.
Solution
The pipeline adds OCR on both edge and cloud, then uses a large language model (LLM) on the cloud to interpret raw OCR text, classify document type, and flag anomalies.
Limitations of traditional OCR + rule‑based extraction:
Highly sensitive to layout changes.
Regex/keyword rules explode across languages.
OCR errors cause rule failures.
Inability to reason about semantics (e.g., mixed document types).
LLM‑enhanced approach:
Run OCR to obtain raw text.
Prompt LLM with the text to extract structured fields (e.g., name, number, expiry) and determine document type.
Apply business‑level validation on the extracted JSON.
This reduces manual review volume and improves accuracy across multilingual markets.
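The three steps of the LLM-enhanced approach can be sketched as below. The prompt wording, field names, and the `call_llm` callable are all assumptions; the production schema differs by market and document type:

```python
import json

# Sketch of the LLM-enhanced extraction: OCR text goes into a prompt,
# the model returns structured JSON, and business rules validate it.
# Prompt text, schema, and call_llm are hypothetical.

PROMPT = """You are a document-verification assistant.
From the OCR text below, return only JSON with keys:
doc_type, name, number, expiry (YYYY-MM-DD or null).
OCR text:
{ocr_text}"""

def extract_fields(ocr_text: str, call_llm) -> dict:
    raw = call_llm(PROMPT.format(ocr_text=ocr_text))
    fields = json.loads(raw)
    # Business-level validation on the extracted JSON.
    required = {"doc_type", "name", "number", "expiry"}
    if not required <= fields.keys():
        raise ValueError("missing required fields")
    return fields
```

Because the LLM reasons over semantics rather than layout, the same prompt works across regional document formats where regex rules would have to be duplicated per language.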
Scenario 3 – Overseas Vehicle Monthly Review
Business background
Monthly vehicle inspections are required for driver payouts. Manual review suffered from inconsistent standards and a ~40 % rejection rate, primarily because license‑plate photos were incomplete.
Solution
Edge: A custom lightweight model embedded in the driver app detects missing or poorly framed plates in real time. The model achieves >99.9 % detection accuracy while keeping the app responsive.
Cloud: A multimodal LLM processes the remaining images, extracting structured JSON (e.g., {"text_visible": true, "side_view": true}) that downstream rule engines consume. The LLM is used only for information extraction, not direct decision making, to avoid hallucinations.
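The extraction/decision split can be sketched as a deterministic rule engine consuming the LLM's JSON, so a hallucinated field can never directly approve a vehicle. The rule names are illustrative:

```python
# Sketch of the rule engine that consumes the multimodal LLM's JSON.
# The LLM only reports observable facts; this deterministic layer makes
# the actual pass/fail call. Rule names are hypothetical.

def review(extracted: dict) -> list:
    failures = []
    if not extracted.get("text_visible", False):
        failures.append("plate text not readable")
    if extracted.get("side_view", False):
        failures.append("photo taken from the side")
    return failures   # empty list means the photo passes
```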
Iterative prompt engineering:
Version 1 – Separate LLM call per rule (high token cost, impractical).
Version 2 – Single massive prompt (attention dilution, frequent hallucinations).
Version 3 – Group‑wise, relevance‑based prompts with structured output (stable accuracy, manageable cost).
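The Version 3 strategy (group-wise, relevance-based prompts) can be sketched as partitioning rules by the visual cue they depend on, then issuing one focused prompt per group instead of one call per rule (Version 1) or one huge prompt (Version 2). The groupings and rule names below are invented for illustration:

```python
# Sketch of group-wise prompt construction: rules that depend on the
# same visual cue share one prompt. Groupings and rule names are
# hypothetical.

RULE_GROUPS = {
    "license plate": ["plate_fully_visible", "plate_text_readable"],
    "framing": ["vehicle_centered", "no_side_view"],
}

def build_prompts(rule_groups: dict) -> list:
    prompts = []
    for cue, rules in rule_groups.items():
        prompts.append(
            f"Examine the {cue} in the image. "
            f"Return only JSON with boolean keys: {', '.join(rules)}."
        )
    return prompts
```

This keeps each prompt short enough to avoid attention dilution while making far fewer calls than the per-rule approach.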
Key engineering choices:
Parameter tuning: Temperature set to 0.1 for near‑deterministic outputs.
Data compliance: Sensitive fields (plate numbers, faces) are anonymized on‑device or in a middle layer before reaching the LLM; the model only sees non‑sensitive visual cues.
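The compliance step can be sketched as a middle-layer filter that strips sensitive fields from the payload before it reaches the LLM. The field names are hypothetical:

```python
# Sketch of the middle-layer anonymization pass: sensitive fields are
# removed before the payload is sent to the LLM, which only ever sees
# non-sensitive cues. Field names are illustrative.

SENSITIVE = {"plate_number", "driver_face_crop", "driver_name"}

def anonymize(payload: dict) -> dict:
    return {k: v for k, v in payload.items() if k not in SENSITIVE}
```

In production this layer would typically also mask pixels (blurring plate regions) rather than only dropping metadata, but the principle is the same: the LLM never receives identifying data.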
Future Outlook
The architecture will be extended to additional verification scenarios, deeper model capabilities, and stronger security/compliance controls as the system scales globally.