
Momo's Service Mesh Architecture: Evolution, Pain Points, and Implementation

This article recounts Momo's journey from its early micro‑service framework MOA to the adoption of a self‑developed Service Mesh, detailing architectural evolution, identified governance challenges, evaluation phases, and the design of data‑plane and control‑plane components for a cloud‑native environment.

Java Architect Essentials

Since Service Mesh entered the public eye in 2016, it has become a widely recognized next‑generation micro‑service architecture and a key CNCF technology for building fault‑tolerant, manageable, and observable cloud‑native applications.

At the end of 2019, after years of micro‑service practice and analysis of existing architectural pain points, Momo launched the Service Mesh project MOA Mesh. This article describes the evolution of Momo's micro‑service architecture and shares lessons for engineers interested in adopting Service Mesh.

1. Momo Micro‑service Architecture Evolution

In 2013, because mature open‑source solutions were lacking at the time, Momo's architecture team built and promoted a micro‑service framework called MOA (Momo Service Oriented Architecture). MOA was adopted across multiple business lines, including the nearby feed, live streaming, IM, and short video.

As the business grew, MOA proved itself at scale: over 2,000 services, 20,000 registered instances, 350 billion calls per day, and peak QPS of 6 million.
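A quick consistency check on these figures (simple arithmetic, not from the original article): 350 billion calls per day averages out to roughly 4 million QPS. The 6 million peak is therefore about 1.5 times the daily average.

```latex
\bar{q} = \frac{3.5 \times 10^{11}\ \text{calls/day}}{86\,400\ \text{s/day}} \approx 4.05 \times 10^{6}\ \text{QPS},
\qquad
\frac{q_{\text{peak}}}{\bar{q}} \approx \frac{6 \times 10^{6}}{4.05 \times 10^{6}} \approx 1.48
```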

To support large‑scale services, MOA had to cooperate with other infrastructure products such as monitoring, async messaging, configuration, logging, distributed tracing, and load testing platforms.

Reviewing Momo's micro‑service timeline: monolithic applications in 2011, system splitting in 2012, MOA framework in 2013, containerization in 2017, and Service Mesh rollout in 2020.

Multi‑language Support

Momo's backend combines a PHP API layer with Java services, pairing PHP's development speed with Java's performance. MOA's core component is a Java SDK. To make SDKs for other languages easier to build, the team introduced three mechanisms:

Reuse of the Redis GET protocol as the cross‑language transport protocol.

A Lookup service that answers address queries by domain name.

MOA Proxy, a Java sidecar that acts as an inbound traffic agent to support non‑Java services.
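Reusing Redis GET means the wire format is standard RESP, which Redis clients in most languages already produce. The minimal sketch below shows how a call could be framed. It assumes the serialized MOA request (service, method, arguments) travels as the single GET argument. The class name `MoaRedisGetCodec` and that payload convention are hypothetical, because the article does not describe MOA's payload format.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: carries a serialized MOA request as the argument of a Redis GET.
 * Only the RESP framing is standard; the payload convention is an assumption.
 */
final class MoaRedisGetCodec {

    private MoaRedisGetCodec() {}

    /** Encodes {@code GET <payload>} as a RESP array of two bulk strings. */
    static byte[] encodeGet(String payload) {
        byte[] arg = payload.getBytes(StandardCharsets.UTF_8);
        // Bulk-string lengths are byte counts, not character counts.
        byte[] prefix = ("*2\r\n$3\r\nGET\r\n$" + arg.length + "\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[prefix.length + arg.length + 2];
        System.arraycopy(prefix, 0, out, 0, prefix.length);
        System.arraycopy(arg, 0, out, prefix.length, arg.length);
        out[out.length - 2] = '\r';   // trailing CRLF terminates the bulk string
        out[out.length - 1] = '\n';
        return out;
    }
}
```

Because the argument is a binary‑safe bulk string, a PHP or Go service could emit the same bytes through its existing Redis client.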

Overall Architecture

After integrating these infrastructure products and adding multi‑language support, MOA 1.0 offered comprehensive service governance. The architecture includes MOA Watcher for centralized health checking, automatic fault tolerance in the client, and a parallel‑call proxy for PHP.
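A centralized checker like MOA Watcher has to decide when an instance stops receiving traffic. The sketch below assumes a simple rule: remove an instance after a fixed number of consecutive failed probes, and restore it on the next success. `HealthWatcher`, the injected `probe` function, and the threshold are hypothetical and are not taken from MOA.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of centralized health checking in the spirit of MOA Watcher.
 * The probe function and the failure threshold are assumptions.
 */
final class HealthWatcher {
    private final Predicate<String> probe;       // returns true if the instance responds
    private final int failureThreshold;          // consecutive failures before removal
    private final Map<String, Integer> failures = new LinkedHashMap<>();

    HealthWatcher(List<String> instances, Predicate<String> probe, int failureThreshold) {
        this.probe = probe;
        this.failureThreshold = failureThreshold;
        for (String addr : instances) {
            failures.put(addr, 0);
        }
    }

    /** Runs one probe round; a success resets the counter, a failure increments it. */
    void checkOnce() {
        for (Map.Entry<String, Integer> e : failures.entrySet()) {
            e.setValue(probe.test(e.getKey()) ? 0 : e.getValue() + 1);
        }
    }

    /** Instances that clients should currently route to. */
    List<String> healthyInstances() {
        List<String> healthy = new ArrayList<>();
        for (Map.Entry<String, Integer> e : failures.entrySet()) {
            if (e.getValue() < failureThreshold) {
                healthy.add(e.getKey());
            }
        }
        return healthy;
    }
}
```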

2. Architecture Pain Points and Service Mesh

Pain Point Analysis

As business scale expands, several unresolved issues persist, collectively described as “lagging service governance capabilities.”

Non‑Java Application Lag

Although mechanisms were built to simplify SDK development for other languages, the middleware team's focus on Java left those SDKs behind. They lacked key features and received fixes slowly.

Java Application Lag

The Java SDK is feature‑rich, but with thousands of Java services, upgrading it is extremely difficult and consumes significant time from both business and infrastructure teams.

Impact of Lag

Lagging governance hampers stability, can cause failures, and becomes a blocker during major architectural changes, forcing sub‑optimal solutions.

Introducing Service Mesh

Service Mesh moves the core logic of the service framework into a local proxy process. This addresses both the difficulty of upgrading the Java SDK and the duplicated effort of maintaining SDKs in multiple languages. Before adopting it, Momo weighed maturity, impact on stability, alternative solutions, cost, and expected value:

Is it mature enough to adopt without affecting stability?

Are there alternative solutions achieving the same goals?

Is the cost acceptable?

Will it truly solve the problems and deliver value?

After careful consideration, Momo decided to adopt Service Mesh.

Observation Phase

Momo closely followed Service Mesh developments while awaiting internal infrastructure maturity such as containerization and logging agents.

Experiment Phase

Attempts to solve the problems with other approaches, such as traffic‑routing agents, revealed complex interactions and insufficient decoupling.

Evaluation Phase

A thorough assessment covered latency overhead, server cost, and manpower investment to ensure impacts stayed within acceptable bounds.

Launch Phase

The primary goal was to improve developer efficiency and reduce team burden by accelerating service‑governance iteration and freeing teams from SDK upgrade work.

3. Momo Service Mesh Practice

Industry Solution Survey

Istio: The most influential open‑source solution, offering a full data‑plane proxy and control‑plane components. However, it is still evolving rapidly, has performance bottlenecks, and is tightly coupled to Kubernetes.

Ant Financial: Developed the SOFA MOSN (Go) data plane and the SOFA Mesh control plane, achieving smooth upgrades and integration with Istio.

Meituan: Extended Envoy (Istio's data plane) to fit internal systems and leveraged its OCTO service framework for control‑plane capabilities.

Momo Solution Selection

Considering compatibility, current pain points (SDK upgrade and multi‑language support), and technical reserves (preference for Java), Momo chose a fully self‑developed solution: a Java‑based data‑plane Agent and a control‑plane gradually aligning with Istio standards.

Overall Architecture

The focus is on the data‑plane Agent. The control‑plane adds a lightweight Pilot Proxy to decouple the Agent from internal systems, using Istio’s standard protocols (xDS, MCP) for future community alignment.
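The decoupling role of Pilot Proxy can be illustrated with a small adapter. The Agent depends only on an abstract endpoint source, and the proxy translates the internal registry's format into that model. This is a minimal sketch. `EndpointSource`, `Endpoints`, `RegistryAdapter`, and the registry's `{host, port}` layout are hypothetical. The real system delivers this data over xDS rather than through a Java interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Agent-facing model: a service name plus "host:port" endpoints, no internal-registry types. */
final class Endpoints {
    final String service;
    final List<String> addresses;

    Endpoints(String service, List<String> addresses) {
        this.service = service;
        this.addresses = List.copyOf(addresses);
    }
}

/** The Agent's only control-plane dependency (hypothetical; stands in for an xDS subscription). */
interface EndpointSource {
    void watch(String service, Consumer<Endpoints> onUpdate);
}

/**
 * Hypothetical Pilot Proxy-style adapter: converts the internal registry's
 * service -> [{host, port}] view into the Agent-facing model.
 */
final class RegistryAdapter implements EndpointSource {
    private final Map<String, List<String[]>> internalRegistry;   // assumed internal format

    RegistryAdapter(Map<String, List<String[]>> internalRegistry) {
        this.internalRegistry = internalRegistry;
    }

    @Override
    public void watch(String service, Consumer<Endpoints> onUpdate) {
        List<String> addrs = new ArrayList<>();
        for (String[] hostPort : internalRegistry.getOrDefault(service, List.of())) {
            addrs.add(hostPort[0] + ":" + hostPort[1]);
        }
        // Delivers the current snapshot once; a real source would also push later changes.
        onUpdate.accept(new Endpoints(service, addrs));
    }
}
```

Swapping the internal registry then only touches `RegistryAdapter`, which is the kind of isolation the Pilot Proxy layer is meant to provide.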

Data‑Plane Design

The data‑plane directly interacts with business processes, emphasizing smooth upgrades, Agent resilience, and proxy performance.

Smooth Upgrade

The upgrade must be transparent to developers: business processes are not restarted and traffic is unaffected. Momo adopted an FD‑migration scheme in which the old and new Agent processes transfer file descriptors over a Unix domain socket using Linux sendmsg/recvmsg, implemented through Netty's JNI‑wrapped APIs.
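The JDK's standard socket APIs do not expose descriptor passing, which is why JNI‑wrapped Netty APIs are needed. The sketch below therefore models only the ordering of a hot upgrade:

1. The old Agent stops accepting and hands over the listening socket without closing it.
2. The new Agent serves from that socket.
3. The old process exits once its in‑flight requests drain.

`ListenerHandle`, `AgentProcess`, and `HotUpgrade` are hypothetical names, and `ListenerHandle` merely stands in for the migrated descriptor.

```java
/** Stand-in for the listening socket's file descriptor (hypothetical model, no real fd). */
final class ListenerHandle {
    final int fd;

    ListenerHandle(int fd) { this.fd = fd; }
}

/** Hypothetical single-threaded model of one Agent process during a hot upgrade. */
final class AgentProcess {
    private ListenerHandle listener;   // null when this process does not own the socket
    private boolean accepting;
    private int inFlight;

    /** New process: takes ownership of the migrated socket (models the recvmsg side). */
    void adopt(ListenerHandle handle) {
        listener = handle;
        accepting = true;
    }

    /** Old process: stops accepting and gives up the socket without closing it (models sendmsg). */
    ListenerHandle handOver() {
        accepting = false;
        ListenerHandle h = listener;
        listener = null;
        return h;
    }

    /** A new request arrives; only a process that is still accepting takes it. */
    boolean tryAccept() {
        if (!accepting) {
            return false;
        }
        inFlight++;
        return true;
    }

    void complete() { inFlight--; }

    boolean owns(ListenerHandle h) { return listener == h; }

    /** The old process may exit only after handing over the socket and draining its requests. */
    boolean canExit() { return listener == null && inFlight == 0; }
}

final class HotUpgrade {
    private HotUpgrade() {}

    /** Moves the listening socket from the old Agent to the new one; it is never closed. */
    static void upgrade(AgentProcess oldAgent, AgentProcess newAgent) {
        newAgent.adopt(oldAgent.handOver());
    }
}
```

Because the socket stays open throughout, connections queued by the kernel during the switch are picked up by the new process instead of being refused.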

Agent Resilience

Agent failures are handled by leveraging existing health‑check mechanisms to detach faulty instances, or by switching traffic to other Agents within the same application, ensuring minimal disruption.
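The second strategy is switching to another Agent of the same application. A client‑side version could look like the sketch below. It assumes the caller already knows the addresses of peer Agents. `AgentTransport`, `AgentFailover`, and the try‑in‑order policy are hypothetical illustrations, not the article's design.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical transport: sends a request through the Agent at {@code agentAddr}. */
interface AgentTransport {
    String send(String agentAddr, String request) throws Exception;
}

/**
 * Hypothetical sketch: prefer the co-located Agent and fall back to Agents
 * of other instances of the same application when it fails.
 */
final class AgentFailover {
    private final String localAgent;
    private final List<String> peerAgents;   // Agents of other instances of this application
    private final AgentTransport transport;

    AgentFailover(String localAgent, List<String> peerAgents, AgentTransport transport) {
        this.localAgent = localAgent;
        this.peerAgents = List.copyOf(peerAgents);
        this.transport = transport;
    }

    /** Tries the local Agent first, then each peer in order; rethrows the last failure. */
    String call(String request) throws Exception {
        List<String> candidates = new ArrayList<>();
        candidates.add(localAgent);
        candidates.addAll(peerAgents);
        Exception last = null;
        for (String agent : candidates) {
            try {
                return transport.send(agent, request);
            } catch (Exception e) {
                last = e;   // remember the failure and try the next Agent
            }
        }
        throw last;         // non-null: at least the local Agent was attempted
    }
}
```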

Proxy Performance

Because the original MOA protocol lacks extensible headers, Momo designed a new transport protocol. It lets the proxy avoid decoding the full request body and enables connection reuse for Redis GET traffic, which improves proxy throughput.
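An extensible header lets the proxy route a request after reading a few bytes instead of deserializing the whole body. The frame layout below is purely hypothetical, since the article does not publish the new protocol:

- Routing metadata sits in length‑prefixed headers.
- The body is left as an opaque slice.
- An assumed `requestId` field shows one common way headers make connection reuse possible: responses can be matched to requests on a shared connection.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical frame layout (not MOA's actual wire format):
 * magic(2) | requestId(8) | headerCount(2) | {keyLen(2) key valLen(2) val}* | bodyLen(4) | body
 */
final class MeshFrame {
    static final short MAGIC = (short) 0xA0A0;   // assumed magic number

    final long requestId;                        // matches responses to requests on a shared connection
    final Map<String, String> headers;           // routing metadata the proxy needs
    final ByteBuffer body;                       // opaque to the proxy; never deserialized here

    private MeshFrame(long requestId, Map<String, String> headers, ByteBuffer body) {
        this.requestId = requestId;
        this.headers = headers;
        this.body = body;
    }

    static ByteBuffer encode(long requestId, Map<String, String> headers, byte[] body) {
        int size = 2 + 8 + 2 + 4 + body.length;
        for (Map.Entry<String, String> h : headers.entrySet()) {
            size += 2 + utf8(h.getKey()).length + 2 + utf8(h.getValue()).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putShort(MAGIC).putLong(requestId).putShort((short) headers.size());
        for (Map.Entry<String, String> h : headers.entrySet()) {
            byte[] k = utf8(h.getKey());
            byte[] v = utf8(h.getValue());
            buf.putShort((short) k.length).put(k).putShort((short) v.length).put(v);
        }
        buf.putInt(body.length).put(body);
        buf.flip();
        return buf;
    }

    /** Parses only the fixed fields and headers; the body is exposed as a slice, not decoded. */
    static MeshFrame decodeHeadersOnly(ByteBuffer buf) {
        if (buf.getShort() != MAGIC) {
            throw new IllegalArgumentException("not a mesh frame");
        }
        long requestId = buf.getLong();
        int count = buf.getShort() & 0xFFFF;
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String key = readString(buf);
            headers.put(key, readString(buf));
        }
        int bodyLen = buf.getInt();
        ByteBuffer body = buf.slice();
        body.limit(bodyLen);
        buf.position(buf.position() + bodyLen);   // skip past the body without copying it
        return new MeshFrame(requestId, headers, body);
    }

    private static String readString(ByteBuffer buf) {
        byte[] b = new byte[buf.getShort() & 0xFFFF];
        buf.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

The proxy's cost then scales with the header size rather than the body size, and the body bytes can be forwarded as‑is.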

4. Outlook

Momo's Service Mesh practice is still in its early stages. The data plane is complete and is being rolled out gradually through gray release. Future work includes building the control plane, enriching data‑plane features, and addressing the challenges of large‑scale deployment. The team plans to share its experience widely.
