How DiDi’s OpenClaw Skill Automates Ride‑Hailing: Design, Challenges & Lessons
The article details the creation of the didi-ride-skill for OpenClaw: how a single voice command triggers a full ride‑hailing workflow, the underlying MCP toolset, engineering trade‑offs such as file splitting, attention handling, cron isolation, and key management, plus the testing strategy and future roadmap.
Background and Goal
DiDi released an OpenClaw‑specific ride‑hailing Skill called didi-ride-skill (GitHub: https://github.com/didi/didi-ride-skill). The Skill enables a user to say “help me book a car to the airport” and have the AI automatically perform address parsing, price estimation, user confirmation, order creation, and status tracking without opening the DiDi app.
Why a Skill?
DiDi’s MCP (Model Context Protocol) already offers 13 tool capabilities covering address resolution, pricing, order creation, and driver tracking. However, ordinary users cannot manually compose JSON‑RPC calls, creating a usability gap that the Skill fills by orchestrating the entire linear ride‑hailing flow.
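To see the gap the Skill papers over, consider what a single raw MCP call looks like. This is a minimal sketch using the standard MCP tools/call envelope; the tool name and argument fields (estimate_price, start_poi, end_poi) are invented for illustration, not taken from DiDi's actual schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "estimate_price",
    "arguments": {
      "start_poi": "Zhongguancun Software Park",
      "end_poi": "Beijing Capital International Airport T3"
    }
  }
}
```

Composing a dozen of these by hand, in the right order, is exactly the work the Skill takes off the user's plate.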
Core Capabilities
The Skill supports three main scenarios:
Scenario 1 – First‑time ride booking: After installing the Skill and configuring the API key, a single utterance triggers the full end‑to‑end process.
Scenario 2 – Scheduled rides: Using OpenClaw’s cron mechanism, the AI creates a timed task that runs in an isolated session, automatically invoking the ride‑hailing flow at the specified time.
Scenario 3 – Reasoning‑enabled itinerary understanding: The AI can infer missing start points, calculate actual departure times from a landing time, and generate a coherent travel plan.
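As a concrete instance of Scenario 3's arithmetic, the sketch below works backward from a landing time to a pickup and dispatch time. The buffer values are illustrative assumptions, not constants from the Skill:

```python
from datetime import datetime, timedelta

# Illustrative reasoning for "pick me up after my 14:30 landing":
# landing + deplane/luggage buffer = earliest realistic pickup time.
landing = datetime(2026, 3, 1, 14, 30)
deplane_buffer = timedelta(minutes=40)   # assumed time to exit the terminal
pickup = landing + deplane_buffer

# Fire the scheduled task a little early so the car is already waiting.
dispatch_lead = timedelta(minutes=10)    # assumed dispatch lead time
task_fire_at = pickup - dispatch_lead

print(f"pickup at {pickup:%H:%M}, schedule task for {task_fire_at:%H:%M}")
```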
Underlying MCP Toolset
The Skill is built on DiDi’s MCP services, which provide standardized tool‑calling interfaces. The 13 tools are split into two categories:
Ride‑hailing tools (6): price estimation, order creation, order query, driver location, etc.
Map services (7): POI search, driving, public‑transport, walking, and cycling route planning.
The Skill’s responsibility is to invoke these tools in the correct order with proper parameters.
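That linear flow can be pictured as ordinary sequential calls. The sketch below uses hypothetical tool names (resolve_poi, estimate_price, create_order, query_order), since the repository's real identifiers aren't reproduced in the article:

```python
# Hypothetical orchestration of the ride-hailing flow; call_tool stands in
# for one MCP tools/call round trip, and all tool names are assumed.
def book_ride(call_tool, start_text: str, end_text: str, confirm) -> dict:
    start = call_tool("resolve_poi", {"query": start_text})   # map service
    end = call_tool("resolve_poi", {"query": end_text})       # map service
    quote = call_tool("estimate_price", {"start": start, "end": end})
    if not confirm(quote):                # user must approve the price first
        return {"status": "cancelled"}
    order = call_tool("create_order", {"start": start, "end": end,
                                       "quote_id": quote["quote_id"]})
    return call_tool("query_order", {"order_id": order["order_id"]})
```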
Stability Design – Key Engineering Decisions
File Splitting & Attention Distribution: The initial monolithic SKILL.md exceeded 500 lines, diluting attention weights over critical instructions (the “Lost in the Middle” problem), while pushing everything into sub‑files made the AI pay a read‑time latency cost on every lookup. The compromise is a concise main file that keeps the AI’s focus on the critical steps, with detailed logic moved to sub‑files that are read only when needed.
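A plausible shape for the resulting layout (the sub‑file names here are invented to illustrate the split, not copied from the repository):

```
didi-ride-skill/
├── SKILL.md            # concise main file: trigger phrases, flow outline
├── booking-flow.md     # detailed step logic, read only when a booking starts
├── scheduled-rides.md  # cron setup details
└── error-handling.md   # recovery and retry rules
```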
Restatement (explicit constraint restatement): Because LLMs tend to forget the middle sections of long prompts, the team repeats essential constraints immediately before critical nodes. This mitigates attention drift and keeps execution stable.
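In practice this looks like repeating the rule verbatim at the point where it matters most. A made‑up excerpt of such a prompt file:

```
Step 4: Create the order.
REMINDER (restated from Step 3): never call the order-creation tool
before the user has explicitly confirmed the estimated price.
```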
Cron Isolated Mode – Cost and Trade‑offs
Cron tasks can run in main (shared context) or isolated (fresh context) mode. Isolated mode is required to avoid context clashes in long‑running tasks, but each trigger must rebuild the entire Skill context, adding more than a minute of startup latency. The team compromised by limiting the post‑order check to a single callback five minutes after the order, instead of high‑frequency polling.
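The shape of such a task might look like the following. This JSON is an assumed illustration of the trade‑off, not OpenClaw’s documented cron schema; every field name here is hypothetical:

```json
{
  "schedule": "30 7 * * 1-5",
  "mode": "isolated",
  "prompt": "Book my usual car to the office",
  "followUp": { "checkAfterMinutes": 5, "maxChecks": 1 }
}
```

The single bounded follow‑up is the key line: one check at five minutes caps the per‑trigger cost that isolated mode would otherwise multiply.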
Key Management Challenge
Initially the MCP API key was stored in the Skill directory. In isolated sessions the key was not inherited, causing silent failures. The final solution stores the key in openclaw.json and injects it as the environment variable DIDI_MCP_KEY via the command:
```
openclaw config set 'skills.entries.didi-ride-skill.apiKey' 'YOUR_KEY'
```

This leverages OpenClaw’s automatic injection for isolated sessions.
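After that command, the key presumably lands in openclaw.json roughly as follows; the nesting is inferred from the config path above rather than from documented file contents, and OpenClaw then exposes the value to isolated sessions as DIDI_MCP_KEY:

```json
{
  "skills": {
    "entries": {
      "didi-ride-skill": {
        "apiKey": "YOUR_KEY"
      }
    }
  }
}
```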
Platform Evolution Impact
OpenClaw’s rapid iteration introduced a pre‑hook permission check (released late 2026) that blocks terminal commands unless manually approved. Since the Skill relies heavily on such commands, the update broke the entire ride‑hailing flow, illustrating the risk that platform‑level changes pose to Skill stability.
Testing Framework
Because the Skill’s output is nondeterministic, traditional input‑output assertions are insufficient. The testing pipeline consists of:
Test case design: 24 cases across six scenarios, each specifying expected tool calls and risk markers.
Execution engine: The openclaw agent CLI drives conversations and records full transcripts (JSONL) for precise tool‑call extraction.
Evaluation layer: Automated checks for correct tool usage and risk flags, plus manual review for LLM‑generated quality and cron‑related side effects.
Two approaches were explored: IM‑based message sending (unstable) and direct CLI driving (the currently preferred method). The CLI approach provides synchronous text output but cannot capture image messages or cron push notifications.
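A minimal sketch of the evaluation layer's tool‑call check, assuming each JSONL transcript line is a JSON object and that tool calls appear as records carrying "type": "tool_call" and a "name" field (the article does not specify the exact transcript schema):

```python
import json

def extract_tool_calls(transcript_path: str) -> list[str]:
    """Pull the ordered tool-call names out of a JSONL transcript."""
    calls = []
    with open(transcript_path) as f:
        for line in f:
            record = json.loads(line)
            if record.get("type") == "tool_call":   # assumed schema
                calls.append(record["name"])
    return calls

def check_case(transcript_path: str, expected: list[str]) -> bool:
    """Pass if the expected tools were called in the expected order."""
    actual = extract_tool_calls(transcript_path)
    # Subsequence match: extra calls are tolerated, wrong order is not.
    it = iter(actual)
    return all(name in it for name in expected)

# Example: a booking case must resolve both POIs, quote, then order.
# check_case("case_01.jsonl", ["resolve_poi", "resolve_poi",
#                              "estimate_price", "create_order"])
```

Assertions like these cover tool usage and ordering automatically; output quality and cron side effects still go to manual review, as noted above.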
Future Work
Reduce cron isolated session startup cost.
Adapt the Skill to non‑OpenClaw agent platforms.
Move testing from semi‑automatic to fully automatic, including cron verification.
Implement real‑time trip sharing and driver‑location push after order creation.
Conclusion
Developing an AI Skill differs from traditional software: the “code” is natural language, and the runtime is a probabilistic model. Success requires iterative prompt refinement, careful file organization, explicit constraint restatement, and robust testing to handle both model variability and platform changes.