How Lalamove Built an AI‑Powered Edge‑Cloud Review System for Global Driver Verification

Lalamove tackled the scalability and accuracy challenges of worldwide driver onboarding by designing a layered edge‑cloud AI architecture that combines lightweight mobile models, cloud‑based large‑language and computer‑vision models, OCR, and multimodal LLMs to filter low‑quality inputs, automate identity checks, and reduce manual effort while maintaining data compliance.

Huolala Tech

Background

Lalamove's Driver Operations (DOP) team handles a massive volume of driver-onboarding verification tasks (face matching, document validation, vehicle inspection) across many markets. Data formats vary by country and fraud tactics keep evolving, creating latency, cost, and accuracy challenges.

Edge‑Cloud Collaborative Architecture

A layered governance model separates responsibilities:

Edge (mobile): Ultra-lightweight models run on the driver's device to reject low-quality or non-compliant images (blurred, no face, multiple faces, off-center, incomplete license plate) before upload.

Cloud: High-capacity resources (large language models, high-precision computer-vision models) handle complex, long-tail cases and make final decisions.

This reduces invalid uploads, lowers third‑party API costs, and lets human reviewers focus on high‑risk cases.
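The article does not publish implementation code, but the routing logic can be sketched roughly as follows. The function names `run_edge_check` and `submit_to_cloud` and the `CloudResult` fields are hypothetical stand-ins for Lalamove's internal components, not the real interfaces.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    REJECTED_ON_DEVICE = "rejected_on_device"  # instant feedback; nothing is uploaded
    AUTO_APPROVED = "auto_approved"
    AUTO_REJECTED = "auto_rejected"
    MANUAL_REVIEW = "manual_review"            # long-tail cases go to human reviewers


@dataclass
class CloudResult:
    confident_pass: bool
    confident_fail: bool


def run_edge_check(image_bytes: bytes) -> bool:
    """Placeholder for the on-device lightweight model (blur / face / framing checks)."""
    return len(image_bytes) > 0


def submit_to_cloud(image_bytes: bytes) -> CloudResult:
    """Placeholder for the cloud CV/LLM verification service."""
    return CloudResult(confident_pass=False, confident_fail=False)


def review_submission(image_bytes: bytes) -> Verdict:
    # Edge layer: filter obviously bad inputs before any upload or paid API call.
    if not run_edge_check(image_bytes):
        return Verdict.REJECTED_ON_DEVICE

    # Cloud layer: heavyweight models decide; ambiguous cases fall through to humans.
    result = submit_to_cloud(image_bytes)
    if result.confident_pass:
        return Verdict.AUTO_APPROVED
    if result.confident_fail:
        return Verdict.AUTO_REJECTED
    return Verdict.MANUAL_REVIEW
```

The key design point is that the cheap on-device check runs before any upload or third-party API call, and only ambiguous cloud results fall through to human reviewers.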

Scenario 1 – Face‑Matching Verification

Business background

To prevent impersonation, a live driver photo is compared with the stored registration image using Face Recognition Technology (FRT).

Solution

Front-end (edge): A millisecond-level lightweight model performs a health check on the photo before upload and blocks images that meet any of the following conditions:

No face detected

More than one face detected

Face not centered (to avoid side‑views or distant shots)

Result: early feedback to the driver and fewer calls to downstream FRT services.
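As a rough illustration of these three checks, the sketch below uses OpenCV's bundled Haar-cascade face detector as a stand-in for the proprietary millisecond-level on-device model; the centering tolerance is an arbitrary assumption.

```python
import cv2

# Pre-trained frontal-face detector shipped with OpenCV (stand-in for the
# proprietary on-device model described in the article).
_FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)


def face_health_check(image_path: str, center_tolerance: float = 0.25) -> str:
    """Return 'ok' or the reason the photo should be blocked before upload."""
    img = cv2.imread(image_path)
    if img is None:
        return "unreadable_image"
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = _FACE_CASCADE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    if len(faces) == 0:
        return "no_face_detected"
    if len(faces) > 1:
        return "multiple_faces_detected"

    # Centering check: the face centre must lie within a tolerance band
    # around the image centre, which filters side views and distant shots.
    x, y, w, h = faces[0]
    img_h, img_w = gray.shape
    dx = abs((x + w / 2) - img_w / 2) / img_w
    dy = abs((y + h / 2) - img_h / 2) / img_h
    if dx > center_tolerance or dy > center_tolerance:
        return "face_not_centered"
    return "ok"
```

In production this logic lives inside the driver app, so rejected photos never leave the device.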

Back-end (cloud): Accepted images are sent to AWS Rekognition (or equivalent) to extract a high-dimensional embedding and compute a 1:1 similarity score against the stored profile.

Face detection & feature extraction → embedding vector.

Similarity computation between live embedding and stored embedding.

Threshold decision: score ≥ pass-threshold → verification success; score ≤ reject-threshold → failure; scores between the two thresholds are routed to manual review.
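A minimal sketch of the cloud-side comparison using the AWS Rekognition CompareFaces API via boto3; the two thresholds are illustrative values, not the ones Lalamove actually uses.

```python
import boto3

PASS_THRESHOLD = 90.0    # illustrative values; the real thresholds are not published
REJECT_THRESHOLD = 60.0

rekognition = boto3.client("rekognition")


def verify_face(live_photo: bytes, registration_photo: bytes) -> str:
    """1:1 comparison between the live selfie and the stored registration image."""
    resp = rekognition.compare_faces(
        SourceImage={"Bytes": registration_photo},
        TargetImage={"Bytes": live_photo},
        SimilarityThreshold=0,          # return a similarity score even for weak matches
    )
    matches = resp.get("FaceMatches", [])
    similarity = max((m["Similarity"] for m in matches), default=0.0)

    if similarity >= PASS_THRESHOLD:
        return "verified"
    if similarity <= REJECT_THRESHOLD:
        return "rejected"
    return "manual_review"              # grey-zone scores go to a human reviewer
```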

Scenario 2 – Driver Document Verification

Business background

Drivers submit various identity documents (ID, driver’s license, etc.) that differ across regions, leading to high rejection rates due to wrong document type or non‑conforming photos.

Solution

The pipeline adds OCR on both edge and cloud, then uses a large language model (LLM) on the cloud to interpret raw OCR text, classify document type, and flag anomalies.

Limitations of traditional OCR + rule‑based extraction:

Highly sensitive to layout changes.

Regex/keyword rules explode across languages.

OCR errors cause rule failures.

Inability to reason about semantics (e.g., mixed document types).

LLM‑enhanced approach:

Run OCR to obtain raw text.

Prompt LLM with the text to extract structured fields (e.g., name, number, expiry) and determine document type.

Apply business‑level validation on the extracted JSON.

This reduces manual review volume and improves accuracy across multilingual markets.
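A condensed sketch of that pipeline, using pytesseract for OCR and the OpenAI Python client as stand-ins (the article does not name the OCR engine or LLM provider); the prompt, model name, and field list are illustrative.

```python
import json

import pytesseract                      # OCR stand-in
from PIL import Image
from openai import OpenAI               # LLM client stand-in

client = OpenAI()

PROMPT = """You are a document-verification assistant.
Given the raw OCR text of a driver document, return JSON with keys:
document_type (id_card | driver_license | other), name, number, expiry_date,
and anomalies (list of strings, e.g. "mixed document types", "expired").
OCR text:
{ocr_text}"""


def extract_document_fields(image_path: str) -> dict:
    # Step 1: OCR the submitted photo into raw text.
    ocr_text = pytesseract.image_to_string(Image.open(image_path))

    # Step 2: ask the LLM to classify the document and pull out structured fields.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                        # placeholder model name
        temperature=0.1,                            # low temperature for stable output
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT.format(ocr_text=ocr_text)}],
    )
    return json.loads(resp.choices[0].message.content)
```

Keeping the business-level validation (step 3) outside the LLM means rules such as expiry checks and allowed document types per market stay auditable and easy to change.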

Scenario 3 – Overseas Vehicle Monthly Review

Business background

Monthly vehicle inspections are required for driver payouts. Manual review suffered from inconsistent standards and a ~40 % rejection rate, primarily because license‑plate photos were incomplete.

Solution

Edge: A custom lightweight model embedded in the driver app detects missing or poorly framed plates in real time. The model achieves >99.9% detection accuracy while keeping the app responsive.

Cloud: A multimodal LLM processes the remaining images, extracting structured JSON (e.g., {"text_visible": true, "side_view": true}) that downstream rule engines consume. The LLM is used only for information extraction, not direct decision-making, to avoid hallucinations.
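Because the LLM only reports visual cues, the final decision can stay in a small deterministic rule engine. A sketch, with rules invented for illustration around the cue names from the example above:

```python
def evaluate_vehicle_photo(cues: dict) -> tuple[bool, list[str]]:
    """Deterministic rule engine consuming LLM-extracted visual cues.

    The LLM only reports what is visible; pass/fail logic stays in plain code,
    which keeps hallucinations out of the final decision.
    """
    failures = []
    if not cues.get("text_visible", False):
        failures.append("license plate text not readable")
    if cues.get("side_view", False):
        failures.append("photo taken from the side; front view required")
    return (len(failures) == 0, failures)


# Example cues as returned by the multimodal LLM for one submission
approved, reasons = evaluate_vehicle_photo({"text_visible": True, "side_view": True})
```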

Iterative prompt engineering:

Version 1 – Separate LLM call per rule (high token cost, impractical).

Version 2 – Single massive prompt (attention dilution, frequent hallucinations).

Version 3 – Group‑wise, relevance‑based prompts with structured output (stable accuracy, manageable cost).
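A sketch of the Version 3 approach, assuming the OpenAI vision API as the multimodal backend; the rule groups, cue keys, and model name are illustrative, not Lalamove's actual configuration.

```python
import base64
import json

from openai import OpenAI

client = OpenAI()

# Rules are grouped by what they inspect, so each prompt stays short and focused.
RULE_GROUPS = {
    "plate_readability": ["text_visible", "plate_fully_in_frame"],
    "shot_angle": ["side_view", "vehicle_fully_in_frame"],
}


def extract_cues(image_path: str) -> dict:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    cues = {}
    for group, keys in RULE_GROUPS.items():
        prompt = (
            f"Inspect the vehicle photo and answer only about {group}. "
            f"Return a JSON object with boolean keys: {', '.join(keys)}."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",               # placeholder multimodal model
            temperature=0.1,                   # low temperature for stable outputs
            response_format={"type": "json_object"},
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }],
        )
        cues.update(json.loads(resp.choices[0].message.content))
    return cues
```

Each call sees only the rules relevant to one aspect of the photo, which avoids the attention dilution of the single massive prompt while keeping token cost far below one call per rule.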

Key engineering choices:

Parameter tuning: Temperature is set to 0.1 to keep outputs near-deterministic and repeatable.

Data compliance: Sensitive fields (plate numbers, faces) are anonymized on-device or in a middle layer before reaching the LLM, so the model only sees non-sensitive visual cues (see the sketch below).
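A minimal sketch of that anonymization step, assuming bounding boxes for the sensitive regions are already produced by the on-device detectors:

```python
import cv2


def anonymize(image, regions):
    """Blur sensitive regions (plate numbers, faces) before the image leaves the
    trusted layer, so the LLM only ever sees non-sensitive visual cues.

    `regions` is a list of (x, y, w, h) boxes from the on-device detectors.
    """
    out = image.copy()
    for (x, y, w, h) in regions:
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return out
```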

Future Outlook

The architecture will be extended to additional verification scenarios, deeper model capabilities, and stronger security/compliance controls as the system scales globally.
