How Meituan’s Shepherd API Gateway Achieves High‑Performance, Cloud‑Native Scalability
This article examines the background, design principles, high‑availability techniques, extensibility, and future roadmap of Meituan’s internally built Shepherd API gateway, showing how it streamlines microservice API management, improves developer productivity, and evolves toward a cloud‑native architecture.
Background
API gateway definition
An API gateway is a traffic entry point between external clients and internal micro‑services. It performs protocol conversion, authentication, rate limiting, parameter validation, monitoring and other cross‑cutting functions.
Motivation for Shepherd
Before Shepherd, each business line had to maintain a separate web application to expose HTTP APIs, resulting in low development efficiency and poor resource utilization. Shepherd provides a unified, high‑performance, highly available gateway that can be configured without writing code.
Architecture
Overall structure
The system consists of three modules:
Control plane – management console and monitoring center for API lifecycle and metric collection.
Configuration center – stores DSL‑based API definitions in the Lion configuration service and pushes hot‑reloaded configs to the data plane.
Data plane – receives requests after Nginx load balancing, executes built‑in and custom components, and forwards calls to backend RPC/HTTP/function services.
Control plane
The management console handles API creation, DSL generation, versioning, approval workflow and configuration distribution. The monitoring center aggregates request metrics and triggers alerts.
Configuration center
API definitions are written in a domain‑specific language (DSL) that describes:
Name, Group – identifier and logical grouping.
Request – domain, path, query and header parameters.
Response – result assembly, error handling, headers, cookies.
Filters, FilterConfigs – functional components (e.g., auth, rate‑limit) and their parameters.
Invokers – backend invocation rules for RPC, HTTP or function services.
Data plane
Routing is performed using two structures for low‑latency lookup:
A direct MAP for static paths (key = full host+path, value = API config).
A prefix‑tree for paths containing variables, enabling fast prefix matching without regular expressions.
After routing, a chain of functional components is executed, including tracing, real‑time monitoring, access logging, parameter validation, authentication, rate limiting, circuit breaking and gray release.
Protocol conversion uses JsonPath expressions to map HTTP request fields to backend service parameters and vice‑versa.
High‑availability design
Asynchronous processing
All request handling is fully asynchronous. Requests arrive on Jetty (later Netty) IO threads, are handed off to a business thread pool, and backend calls use asynchronous RPC/HTTP clients. This eliminates thread‑blocking bottlenecks. After tuning Nginx long‑connections and switching from Jetty to Netty, end‑to‑end QPS increased from ~2,000 to >15,000.
Service isolation
Isolation is achieved on two levels:
Cluster isolation – separate clusters per business line, optionally dedicated deployments for critical services.
Request‑level thread‑pool isolation – fast and slow pools; APIs whose processing time exceeds a configurable threshold are routed to the slow pool to protect the fast pool.
Stability mechanisms
Shepherd provides a set of safeguards:
Multi‑dimensional flow control (UUID, app, IP, cluster).
Request caching for idempotent, read‑heavy APIs.
Per‑API timeout management with fast‑fail handling.
Circuit‑breaker that trips on configurable failure rates and returns fallback values.
Gray release support
Traffic can be split at the API level, downstream service level, or both. Strategies include percentage‑based rollout and condition‑based routing.
Usability features
Automatic DSL generation
Developers provide service interface information (AppKey, service name, method). Shepherd fetches the latest JSON Schema from the service framework, generates mock data, and produces a DSL with parameter mappings, eliminating manual DSL authoring.
API creation acceleration
Quick‑create wizards support four API types (RPC, HTTP, SSO callback, Nest). Batch operations allow bulk updates of common configurations (filters, error codes, CORS). Import/export tools enable moving API definitions between environments.
Extensibility
Custom components
Through a plugin model, developers can load custom components (e.g., signature verification, custom result handling). The component implements getName() and invoke() methods and is registered via SPI.
Service orchestration
Shepherd integrates with the internal Pirate middleware to orchestrate multiple backend calls. Shepherd sends orchestration configuration to Pirate via RPC; Pirate executes the composed calls and returns aggregated results, keeping the gateway lightweight.
Future roadmap
Cloud‑native migration
Shepherd will be migrated to Meituan’s Serverless platform Nest, and core functionality will be extracted into an SDK. This reduces the WAR package size, improves security, and enables elastic scaling.
Static site hosting
Shepherd will offer a high‑availability static‑site hosting service with custom domain support, authentication, CI/CD integration and scalable storage.
Component marketplace
A marketplace will allow teams to publish reusable custom components, fostering an ecosystem and avoiding duplicate development.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
