Implementation and Practice of MRCP in Meituan Voice Interaction

This article details Meituan’s adoption of the Media Resource Control Protocol (MRCP) to standardize ASR and TTS integration, describing its architecture, key components, high‑availability deployment, and measured performance gains such as up to 55% latency reduction and a 15% increase in outbound call success rates.

Meituan Technology Team

Voice bots have become a leading AI application, typically chaining automatic speech recognition (ASR), natural language understanding (NLU), dialogue management, and text-to-speech (TTS). To integrate these media services with telephone systems, Meituan adopts the Media Resource Control Protocol (MRCP), which standardizes control of speech resources and works alongside SIP and SDP for session setup and RTP for audio transport.

What is MRCP

MRCP (Media Resource Control Protocol) defines Request, Response and Event messages for controlling media services such as speech recognition (ASR) and synthesis (TTS). In MRCPv2, a SIP dialog with an SDP offer/answer sets up the session, RTP carries the audio, and the MRCP messages themselves travel over a separate control channel negotiated in that SDP exchange. An example TTS request is shown below:

MRCP/2.0 380 SPEAK 14321
Channel-Identifier: 43b9ae17@speechsynth
Content-Type: application/ssml+xml
Content-Length: 253

您好,有什么能帮助您?

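The body carries the text to synthesize, here a Chinese greeting meaning "Hello, how can I help you?". The server first answers with a response whose request state is IN-PROGRESS, then sends a SPEAK-COMPLETE event once synthesis has finished.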
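The message framing above can be sketched in a few lines of Python. This is a minimal illustration, not Meituan's client code: the function names are hypothetical, the `<speak>` wrapper in the usage note is an assumed SSML body, and the start-line layouts follow the MRCPv2 convention in which the second token is the byte length of the whole message, start line included. Because the length field counts its own digits, the sketch recomputes it until it stops changing.

```python
def build_mrcp_request(method, request_id, channel_id, body,
                       content_type="application/ssml+xml"):
    """Serialize an MRCPv2 request to bytes (illustrative helper)."""
    body_bytes = body.encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body_bytes)}\r\n"
        "\r\n"
    ).encode("ascii")
    # The message-length field covers the start line itself, so iterate
    # until the length written in the start line matches the real length.
    length = 0
    while True:
        start = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start) + len(headers) + len(body_bytes)
        if total == length:
            return start + headers + body_bytes
        length = total


def classify_start_line(line):
    """Tell requests, responses and events apart from the start line."""
    parts = line.split()
    if not parts or parts[0] != "MRCP/2.0":
        raise ValueError(f"not an MRCPv2 start line: {line!r}")
    if len(parts) == 4:
        # MRCP/2.0 <length> <method> <request-id>
        return {"kind": "request", "method": parts[2],
                "request_id": int(parts[3])}
    if len(parts) == 5 and parts[2].isdigit():
        # MRCP/2.0 <length> <request-id> <status-code> <request-state>
        return {"kind": "response", "request_id": int(parts[2]),
                "status": int(parts[3]), "state": parts[4]}
    if len(parts) == 5:
        # MRCP/2.0 <length> <event-name> <request-id> <request-state>
        return {"kind": "event", "event": parts[2],
                "request_id": int(parts[3]), "state": parts[4]}
    raise ValueError(f"unexpected start line: {line!r}")
```

For example, `build_mrcp_request("SPEAK", 14321, "43b9ae17@speechsynth", "<speak>您好</speak>")` yields a message whose declared length equals its actual byte count, and `classify_start_line("MRCP/2.0 109 SPEAK-COMPLETE 14321 COMPLETE")` is reported as an event for request 14321.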

Usage Scenarios

MRCP is supported by major voice platforms such as Asterisk and FreeSWITCH. A simple call‑center architecture connects the telephony gateway (FreeSWITCH) to an MRCP‑enabled speech service, allowing the caller’s audio to be streamed to ASR, processed by dialogue logic, and answered via TTS.
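Before any MRCP message flows, the telephony side has to ask the speech service for a resource. The sketch below builds the kind of SDP offer an MRCPv2 client typically places in its SIP INVITE: one `application` stream for the MRCP control channel and one `audio` stream for RTP. It is modeled on the general shape of offers in the MRCPv2 specification; the function name, IP address, port, and codec choice are placeholders, not values from Meituan's deployment.

```python
def build_mrcp_sdp_offer(client_ip, rtp_port, resource="speechsynth"):
    """Build an illustrative SDP offer for one MRCPv2 resource."""
    # A synthesizer sends audio to the client, so the client only receives;
    # a recognizer consumes the caller's audio, so the client only sends.
    direction = "recvonly" if resource == "speechsynth" else "sendonly"
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {client_ip}",
        "s=-",
        f"c=IN IP4 {client_ip}",
        "t=0 0",
        # Control channel: the client opens the TCP connection, so it
        # offers the placeholder port 9 and declares itself active.
        "m=application 9 TCP/MRCPv2 1",
        "a=setup:active",
        "a=connection:new",
        f"a=resource:{resource}",
        "a=cmid:1",
        # Media channel: 8 kHz G.711 mu-law, common on telephone networks.
        f"m=audio {rtp_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        f"a={direction}",
        "a=mid:1",
    ]
    return "\r\n".join(lines) + "\r\n"
```

Calling `build_mrcp_sdp_offer("192.0.2.10", 49170, "speechrecog")` would request a recognizer instead of a synthesizer; the `a=cmid:1` and `a=mid:1` attributes are what tie the control channel to its audio stream.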

Meituan’s In‑house ASR/TTS

Since 2018 Meituan has built proprietary ASR and TTS engines. In telephone‑call test sets the ASR word‑accuracy reaches 94.6%, compared with ~89% for leading vendors. The TTS system supports a wide range of voice styles and handles billions of daily requests across delivery, ride‑hailing, and customer‑service scenarios.

Why MRCP

Before MRCP, each ASR/TTS vendor required a custom HTTP/RPC integration, leading to high development cost, inconsistent voice quality, and added latency. MRCP provides a uniform interface so that a single client program can work with any MRCP‑compliant engine, achieving “write once, run everywhere”.

Design Goals

Expose ASR/TTS via a standard protocol compatible with industry‑leading solutions.

Decouple dialogue logic from media processing to enable horizontal scaling and easier internal adoption.

Support external commercial partners by offering a private‑deployed MRCP server.

System Architecture

The AI for Contact Center (AICC) platform sits atop Meituan’s speech‑technology stack. MRCP‑TTS and MRCP‑ASR plugins run inside a UniMRCP‑based MRCP server. The SIP proxy Kamailio provides Layer 7 (application‑layer) load balancing and session stickiness, while the internal MGW, a Layer 4 load balancer, offers a stable virtual IP for client access. Figure 3 in the original article compares the traditional HTTP‑based flow with the MRCP‑based flow.

Key Technical Components

MRCP‑TTS and MRCP‑ASR plugins that wrap the internal ASR/TTS engines.

Event/Interface management for channel creation, requests, and responses.

Session management, thread‑pool handling, and configuration loading.

Logging integration with Meituan’s log framework and monitoring via the Raptor platform.

Authentication through Meituan’s unified voice‑service auth service.

Deployment Scheme

High availability is achieved by:

Unified service entry via MGW providing a VIP.

Resource isolation: separate MRCP‑TTS and MRCP‑ASR services.

Cross‑region, multi‑data‑center deployment.

Kamailio hot‑standby with SIP‑level routing to keep the same server for a call’s lifetime.

Layer 7 load balancing for SIP/MRCP traffic, inserting Via and Contact header addresses to preserve stickiness.
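The stickiness requirement in the last two items can be illustrated with a small sketch: every SIP message belonging to one call must reach the same MRCP server, even if the server pool changes mid-call. The class below is a hypothetical Python model of that idea keyed on the SIP Call-ID. It is not Kamailio's configuration or Meituan's routing logic, and the hash-based initial placement is an assumption chosen for simplicity.

```python
import hashlib


class StickyRouter:
    """Pin each call to one backend for the call's lifetime (illustrative)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.sessions = {}  # Call-ID -> backend chosen for that call

    def route(self, call_id):
        # An existing call always returns to the server that holds its state.
        if call_id in self.sessions:
            return self.sessions[call_id]
        if not self.backends:
            raise RuntimeError("no MRCP backends available")
        # New calls are spread across the pool by hashing the Call-ID.
        digest = hashlib.sha256(call_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        backend = self.backends[index]
        self.sessions[call_id] = backend
        return backend

    def end_call(self, call_id):
        # Drop the pinning once the call ends (for example, on SIP BYE).
        self.sessions.pop(call_id, None)
```

Keeping an explicit session table, rather than re-hashing on every message, is what lets in-flight calls survive a change in pool size; a hot-standby proxy would additionally need access to that table, which this sketch leaves out.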

Practice and Effects

MRCP is now used in many internal scenarios:

Outbound call robots (marketing, notification) – more than 1 million synthesis calls per day, with a peak of about 1,000 concurrent sessions.

Inbound call‑center robots – about 10 million synthesis calls per day, with a peak of more than 1,000 concurrent sessions.

External partners (e.g., WeiHu Technology, Zhongtong Tianhong) deploy a private MRCP server capable of handling ~600 concurrent streams in an 8C 16G container.
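As a rough sizing aid, the figures above can be combined into a back-of-the-envelope estimate. The sketch assumes the ~600 concurrent streams per 8C 16G container reported for the private deployment also holds for the workload being sized; the 70% headroom factor and the two-instance minimum are illustrative assumptions, not values from the article.

```python
import math


def instances_needed(peak_concurrent, per_instance_capacity,
                     headroom=0.7, min_instances=2):
    """Estimate how many MRCP server containers a peak load requires.

    headroom: fraction of nominal capacity to actually plan for (assumed).
    min_instances: floor kept for redundancy (assumed).
    """
    usable = per_instance_capacity * headroom
    return max(min_instances, math.ceil(peak_concurrent / usable))


# A peak of 1,000 concurrent sessions at 600 streams per container,
# planned at 70% utilization, rounds up to three containers.
print(instances_needed(1000, 600))
```

Real capacity planning would also need per-region peaks and separate numbers for ASR and TTS, since the two services are deployed in isolation.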

Performance improvements include:

End‑to‑end latency reduced by ~55% for inbound calls and ~33% for outbound calls.

Customer dissatisfaction decreased by 0.25–3.92 percentage points; average call handling time shortened by 2.19–5.30 seconds.

Outbound call success rate increased by 15% in an A/B test after switching to MRCP‑TTS with the “MeiFanNan” voice.

Audio quality issues such as mismatched voice timbre and high synthesis delay were eliminated, as demonstrated by before/after recordings in the original article.

Conclusion

MRCP and its associated TTS/ASR services have become mature, delivering stable, low‑latency speech interaction for Meituan’s internal and external customers. Future work will extend the protocol to support voiceprint recognition (VPR) and other biometric checks, further enriching Meituan’s AI‑driven voice ecosystem.
