How iQIYI Scaled Its Microservices with Dubbo: Architecture, Extensions, and Future Plans

This article details iQIYI's adoption of Apache Dubbo for its microservice architecture. It covers the adoption history; SDK extensions such as health‑check isolation, region‑aware routing, authentication, and protobuf support; the migration of the service registry to Nacos; monitoring and fault tolerance built on Prometheus and Sentinel; and planned cloud‑native and service‑mesh initiatives.

Alibaba Cloud Native

1. Apache Dubbo Overview

Apache Dubbo is an open‑source, high‑performance RPC framework from Alibaba that also provides built‑in microservice governance features such as service registration, discovery, and routing. Since its revival in 2017, the Dubbo ecosystem has grown to include Nacos, Sentinel, and implementations in other languages, such as dubbo‑go for Go as well as versions for Python and Node.js.

iQIYI introduced Dubbo in June 2019, integrated it with internal infrastructure (registry, monitoring, container platform), and released the first internal version in August 2019. Rather than maintaining a fork, iQIYI used Dubbo’s extension mechanism to add new features without blocking community upgrades.

2. Dubbo SDK Extensions at iQIYI

Infrastructure Adaptation: integration with internal registry, monitoring, and container platforms.

Availability Enhancements: non‑healthy instance isolation and region‑aware routing.

Security Enhancements: service‑to‑service authentication using digital signatures and AK/SK.

Serialization: added protobuf support.

2.1 Non‑Healthy Instance Isolation

Dubbo’s default random load‑balancing can still route requests to providers that are unhealthy (e.g., disk full) because they remain registered. iQIYI introduced a client‑side health‑check that tracks success rates, isolates unhealthy providers for a configurable period, and exponentially increases isolation time for repeatedly failing instances. A fallback mechanism disables isolation when the proportion of unhealthy providers exceeds a threshold.
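The isolation logic described above can be sketched roughly as follows. This is a minimal illustration, not iQIYI's implementation and not a Dubbo API: the class names, the windowing scheme (evaluate the success rate every `min_calls` calls), and all threshold values are assumptions chosen for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class ProviderHealth:
    """Per-provider call statistics and isolation state (hypothetical structure)."""
    successes: int = 0
    failures: int = 0
    isolated_until: float = 0.0  # clock value at which isolation ends
    isolation_count: int = 0     # consecutive isolations, drives the backoff


class IsolationTracker:
    """Client-side health tracking with exponential isolation and a fallback."""

    def __init__(self, min_success_rate=0.5, min_calls=10,
                 base_isolation=5.0, max_isolation=300.0,
                 max_isolated_ratio=0.5, clock=time.monotonic):
        self.min_success_rate = min_success_rate
        self.min_calls = min_calls
        self.base_isolation = base_isolation
        self.max_isolation = max_isolation
        self.max_isolated_ratio = max_isolated_ratio
        self.clock = clock
        self.stats = {}

    def record(self, provider, ok):
        """Record one call result; evaluate the provider once a window is full."""
        h = self.stats.setdefault(provider, ProviderHealth())
        if ok:
            h.successes += 1
        else:
            h.failures += 1
        total = h.successes + h.failures
        if total < self.min_calls:
            return
        if h.successes / total < self.min_success_rate:
            # Isolation time doubles for each consecutive failing window.
            duration = min(self.base_isolation * 2 ** h.isolation_count,
                           self.max_isolation)
            h.isolated_until = self.clock() + duration
            h.isolation_count += 1
        else:
            h.isolation_count = 0
        h.successes = h.failures = 0  # start a new window

    def is_isolated(self, provider):
        h = self.stats.get(provider)
        return h is not None and self.clock() < h.isolated_until

    def candidates(self, providers):
        """Return providers eligible for load balancing."""
        if not providers:
            return []
        healthy = [p for p in providers if not self.is_isolated(p)]
        isolated_ratio = 1 - len(healthy) / len(providers)
        if isolated_ratio > self.max_isolated_ratio:
            # Fallback: too many providers are isolated, so isolation is ignored.
            return list(providers)
        return healthy
```

The list returned by `candidates` would then be handed to the normal load balancer. Injecting the clock keeps the backoff behaviour easy to test.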

2.2 Region‑Aware Routing

To reduce latency across iQIYI’s multi‑region data centers, providers publish their region information in the service URL. Consumers compare this with their own region and preferentially select nearby providers. If local healthy providers fall below a threshold, the routing rule is ignored and normal load‑balancing resumes, providing automatic failover.
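A rough sketch of this routing rule is shown below. The `Provider` type, the region codes, and the choice of threshold (the share of healthy instances among all instances in the consumer's region) are assumptions made for illustration; the source does not say how the threshold is defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """A provider instance; `region` stands in for the value published in the service URL."""
    address: str
    region: str
    healthy: bool = True


def route_by_region(providers, consumer_region, min_local_healthy_ratio=0.5):
    """Prefer healthy providers in the consumer's region, with automatic failover.

    If the consumer's region has no providers, or too few of them are healthy,
    the region rule is ignored and every healthy provider is returned.
    """
    healthy = [p for p in providers if p.healthy]
    local_all = [p for p in providers if p.region == consumer_region]
    local_healthy = [p for p in local_all if p.healthy]
    if not local_all:
        return healthy
    if len(local_healthy) / len(local_all) < min_local_healthy_ratio:
        return healthy
    return local_healthy
```

The result is the candidate list passed on to load balancing, so the failover needs no change on the consumer's calling code.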

2.3 Authentication Mechanism

For services requiring restricted access, iQIYI built a signature‑based authentication system using AK/SK. Providers enable authentication per service, consumers obtain AK/SK via an authorization service, and each request includes a timestamp, AK, and a digital signature of the payload. Providers verify the signature to ensure authenticity and integrity.
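The request flow above could look roughly like the following sketch. It assumes HMAC‑SHA256 as the signature algorithm, a newline‑separated signing string, and a five‑minute timestamp tolerance; none of these details are given in the source, and the function and field names are hypothetical.

```python
import hashlib
import hmac
import time


def sign_request(access_key, secret_key, payload, timestamp=None):
    """Consumer side: build the auth fields attached to a request.

    `payload` is the serialized request body as bytes.
    """
    ts = int(time.time() * 1000) if timestamp is None else timestamp
    # Assumed signing string: AK, timestamp, then the raw payload.
    message = f"{access_key}\n{ts}\n".encode() + payload
    signature = hmac.new(secret_key.encode(), message, hashlib.sha256).hexdigest()
    return {"ak": access_key, "timestamp": ts, "signature": signature}


def verify_request(auth, payload, secret_lookup, now_ms=None, max_skew_ms=300_000):
    """Provider side: check the AK is known, the timestamp is fresh, and the signature matches."""
    secret_key = secret_lookup.get(auth["ak"])
    if secret_key is None:
        return False
    now = int(time.time() * 1000) if now_ms is None else now_ms
    if abs(now - auth["timestamp"]) > max_skew_ms:
        return False  # stale or future-dated request, possibly a replay
    expected = sign_request(auth["ak"], secret_key, payload, auth["timestamp"])["signature"]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, auth["signature"])
```

Because the payload is part of the signed message, any tampering in transit changes the expected signature and the provider rejects the request.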

3. Microservice Ecosystem Construction

3.1 Registry Evolution

Initially, iQIYI used ZooKeeper as its registry, but its limited horizontal scalability and its susceptibility to network partitions prompted a migration to Nacos. Nacos offers high performance, horizontal scaling, cloud‑native compatibility (including Istio), and a Nacos‑Sync component for seamless data migration.

The migration steps are:

Deploy Nacos‑Sync to replicate data from ZooKeeper to Nacos.

Upgrade consumers to discover services from Nacos.

Upgrade providers to register with Nacos.

Decommission Nacos‑Sync and the old registry.

3.2 Monitoring System

The monitoring stack consists of three layers:

Metric Monitoring: QPS, latency, error rate, JVM metrics, and infrastructure metrics collected via Prometheus. A custom Nacos adapter enables Prometheus to discover services registered in Nacos.

Log Monitoring: error log counting and AI‑assisted pattern analysis.

Trace Monitoring: distributed tracing using agents that emit call‑chain data to Kafka, stored in Elasticsearch/HBase for raw traces, Druid for time‑series metrics, and a graph database for topology.

Grafana dashboards visualize metrics, while Alertmanager handles alerts routed to the internal monitoring platform.
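The source does not describe how the Nacos adapter in the metric layer works. One plausible approach, sketched here purely as an assumption, is to translate the instance lists read from the registry into the target‑group JSON that Prometheus file‑based service discovery consumes. The input shape (`ip`, `port`, `healthy` per instance, with `port` taken to be the metrics port) and the `service` label name are illustrative.

```python
import json


def to_file_sd(instances_by_service):
    """Convert {service: [instance, ...]} into Prometheus file_sd target groups.

    Each instance is a dict with "ip" and "port" and an optional "healthy" flag
    (an assumed shape, not a guaranteed registry response format).
    """
    groups = []
    for service, instances in sorted(instances_by_service.items()):
        targets = sorted(
            f"{inst['ip']}:{inst['port']}"
            for inst in instances
            if inst.get("healthy", True)
        )
        if targets:
            groups.append({"targets": targets, "labels": {"service": service}})
    return groups


def render_file_sd(groups):
    """Serialize target groups to the JSON text written to the file_sd file."""
    return json.dumps(groups, indent=2)
```

An adapter built this way would refresh the file periodically, and Prometheus would pick up the changes through a `file_sd_configs` entry in its scrape configuration.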

3.3 Sentinel for Fault Tolerance

Sentinel provides circuit breaking and rate limiting. iQIYI extended Sentinel to support complex parameter‑based limiting (e.g., limiting based on an object’s id field) by exposing an abstraction for custom resource extraction. Configuration changes made in the Sentinel dashboard are persisted to the internal config center, allowing dynamic rule updates without service restarts. A hosted Sentinel dashboard is also offered in the internal management platform.
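To illustrate the idea of parameter‑based limiting with a custom extractor, here is a small stand‑alone sketch. It does not use Sentinel's API: the `ParamRateLimiter` class, the fixed‑window counting, and the `extract_key` callback are all assumptions standing in for the abstraction the article mentions.

```python
import time


class ParamRateLimiter:
    """Fixed-window rate limiter keyed by a value extracted from the call argument."""

    def __init__(self, limit_per_window, extract_key,
                 window_seconds=1.0, clock=time.monotonic):
        self.limit = limit_per_window
        self.extract_key = extract_key  # e.g. lambda order: order["id"]
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # key -> (window_start, count)

    def try_acquire(self, arg):
        """Return True if the call is allowed for this argument's key."""
        key = self.extract_key(arg)
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired
        if count >= self.limit:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True
```

With `extract_key=lambda order: order["id"]`, requests are limited per object id rather than per method, which is the kind of rule the extension is described as enabling.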

4. Current Status and Open‑Source Contributions

Within a year, iQIYI deployed over 100 services with more than 5,000 instances. The team contributed ~30 patches to the Dubbo project, including the authentication mechanism that became a feature in Dubbo 2.7.6, and one member became a Dubbo committer.

5. Future Plans

Integrate Dubbo with service mesh technologies to enable smooth cloud‑native migration.

Develop a unified control plane that works for both traditional microservices and service mesh environments.

Provide project scaffolding and online debugging tools to improve developer productivity and simplify production issue diagnosis.
