iQIYI Microservice Standard Architecture: Design Principles, Components, and Practices
iQIYI’s middleware team introduced a unified microservice standard architecture that combines a single SDK, centralized infrastructure (Nacos registry, Kong gateway, Apollo config, Prometheus and SkyWalking monitoring, ChaosBlade), the QDAS platform, and extensible open‑source practices. Its goals are to eliminate redundant builds, ensure high availability, streamline governance, and pave the way for cloud‑native service‑mesh evolution.
iQIYI's technology product team serves hundreds of millions of video users. To keep up with rapid business iteration and massive request volume, many teams independently migrated their systems to a microservice architecture.
During this migration, teams adopted various open‑source frameworks such as Apache Dubbo and Spring Cloud, as well as some internally developed frameworks. They also built their own monitoring systems and other infrastructure.
As the practice deepened, several problems emerged:
Redundant construction of basic infrastructure, leading to resource waste and stability issues.
Lack of a unified technology stack and SDK, making it difficult to propagate best practices.
Non‑uniform architectures forced teams to add custom gateways, which lengthened service chains, increased latency, and slowed troubleshooting.
To address these issues, iQIYI's middleware team collected requirements from business units and released a Microservice Standard Architecture based on the following principles:
Architecture Unification : Consolidate technology choices so that each domain uses at most one framework (e.g., Dubbo or Spring Cloud).
Extensibility : Provide extensible SDKs; when open‑source versions cannot meet internal needs, maintain a unified customized version.
High Availability : Centralize common infrastructure (service registry, monitoring, etc.) on a public platform, conduct architecture reviews, and evaluate service maturity with the SMMI model.
Technology Evolution : Prefer actively maintained open‑source projects (e.g., Sentinel over Hystrix, which is in maintenance mode and no longer actively developed) and define a standardized process for adopting new technologies.
Internal Open‑Source : Encourage internal teams to contribute to and maintain shared services, fostering both business relevance and industry leadership.
The standard architecture consists of:
Unified Microservice SDK : Core development frameworks (Dubbo/Spring Cloud) and resilience components (Sentinel).
Unified Infrastructure :
Service Registry: Nacos/Consul.
API Gateway: Kong‑based gateway with authentication and rate‑limiting.
Configuration Center: Apollo‑based platform.
Metrics Monitoring: Prometheus cluster.
Tracing: SkyWalking‑based tracing platform.
Chaos Engineering: ChaosBlade‑based fault‑injection platform.
Unified Microservice Platform : QDAS (QIYI Distributed Application Service) for lifecycle management, service governance, and marketplace.
Key ecosystem details :
SDK Customization : The Dubbo SDK was extended for infrastructure adaptation, availability (health‑check isolation, region‑aware routing), security (service‑to‑service authentication), and protobuf serialization.
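To make the availability extensions more concrete, here is a minimal sketch of how health‑check isolation and region‑aware routing could be combined when choosing provider instances. It is not iQIYI's SDK code and not a Dubbo API: the `Instance` type, the `select_instances` function, the region names, and the rule of falling back to other regions when no local instance is healthy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    """Hypothetical view of a provider instance as seen by a consumer."""
    address: str
    region: str
    healthy: bool = True


def select_instances(instances: List[Instance], local_region: str) -> List[Instance]:
    """Return the candidates a load balancer should choose from.

    1. Health-check isolation: drop instances marked unhealthy.
    2. Region-aware routing: prefer instances in the caller's region and
       fall back to other regions only when no local instance is healthy.
    """
    healthy = [i for i in instances if i.healthy]
    local = [i for i in healthy if i.region == local_region]
    return local if local else healthy


providers = [
    Instance("10.0.0.1:20880", "bj"),
    Instance("10.0.0.2:20880", "bj", healthy=False),
    Instance("10.1.0.1:20880", "sh"),
]
print([i.address for i in select_instances(providers, "bj")])
print([i.address for i in select_instances(providers, "gz")])
```

A caller in region "bj" only sees its healthy local instance, while a caller in a region with no providers falls back to every healthy instance instead of failing outright.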
Registry Evolution : Migrated from heterogeneous registries (ZooKeeper, Eureka, Consul) to Nacos for its horizontal scalability, cloud‑native support, and Nacos‑Sync data‑migration tool. High‑availability deployment spans multiple availability zones with VIPs and multi‑region MySQL.
Migration Strategy : Deploy Nacos‑Sync to synchronize registration data incrementally during the transition; upgrade consumers first, then providers; finally decommission the old registries.
Monitoring System : Combines three layers – metrics (QPS, latency, error rate, JVM, host resources), logs, and tracing. Metrics are collected via a customized SkyWalking agent and scraped by Prometheus (with a Nacos adapter). Dashboards are built with Grafana and integrated into an internal full‑link monitoring platform.
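The article does not describe the Nacos adapter in detail. Assuming its role is to turn registered service instances into Prometheus scrape targets, one way to sketch it is to emit the target‑group JSON that Prometheus file‑based service discovery reads. The `to_file_sd` function, the instance field names (`ip`, `port`, `healthy`), the `service` label, and the choice to skip unhealthy instances are illustrative assumptions, not the actual adapter.

```python
import json
from typing import Dict, List


def to_file_sd(service: str, instances: List[Dict]) -> List[Dict]:
    """Convert registry instances into Prometheus file_sd target groups.

    Each instance dict is assumed to carry "ip", "port", and "healthy";
    these field names are illustrative, not a documented adapter contract.
    Skipping unhealthy instances is a choice made for this sketch.
    """
    targets = [
        f'{inst["ip"]}:{inst["port"]}'
        for inst in instances
        if inst.get("healthy", True)
    ]
    if not targets:
        return []
    return [{"targets": sorted(targets), "labels": {"service": service}}]


instances = [
    {"ip": "10.0.0.2", "port": 9100, "healthy": True},
    {"ip": "10.0.0.1", "port": 9100, "healthy": True},
    {"ip": "10.0.0.3", "port": 9100, "healthy": False},
]
print(json.dumps(to_file_sd("video-api", instances), indent=2))
```

Writing this JSON to a file listed under `file_sd_configs` lets Prometheus pick up instance changes without a restart.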
Tracing follows the Dapper model: agents emit trace data to Kafka; the data is then processed and stored in Elasticsearch/HBase (raw data), Druid (time‑series data), and a graph database (topology). Features include dependency analysis, dual‑view metrics, exception analysis, and log correlation.
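Dependency analysis in a Dapper‑style system can be pictured as follows: every span records its parent, so a child span that runs in a different service than its parent represents one service calling another. The sketch below counts such caller‑to‑callee edges. The `Span` fields and the `service_dependencies` function are simplified assumptions for illustration; they do not reflect SkyWalking's data model or iQIYI's processing pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Span:
    """Minimal Dapper-style span; real spans also carry timing and tags."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str


def service_dependencies(spans: Iterable[Span]) -> Dict[Tuple[str, str], int]:
    """Count caller -> callee edges between services across all traces."""
    spans = list(spans)
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    edges: Counter = Counter()
    for span in spans:
        if span.parent_id is None:
            continue  # root span: it has no caller
        parent = by_id.get((span.trace_id, span.parent_id))
        if parent is None:
            continue  # parent span missing from this batch
        if parent.service != span.service:
            edges[(parent.service, span.service)] += 1
    return dict(edges)


spans = [
    Span("t1", "a", None, "gateway"),
    Span("t1", "b", "a", "video-api"),
    Span("t1", "c", "b", "video-api"),  # in-process child, not a dependency
    Span("t1", "d", "c", "user-service"),
    Span("t2", "a", None, "gateway"),
    Span("t2", "b", "a", "video-api"),
]
print(service_dependencies(spans))
```

The resulting edge counts are the kind of data a graph database could store to render the service topology.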
Circuit‑Breaking & Rate‑Limiting : Sentinel is used, with extensions for complex parameter‑based limiting (e.g., limiting by an object’s id field). Sentinel rules are dynamically pushed via the internal configuration center, enabling hot‑updates without restarts. Sentinel dashboards are hosted on the QDAS platform using Kubernetes.
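The idea of limiting by a field of a call argument can be sketched without Sentinel itself. The example below is not Sentinel's API: `ParamRateLimiter`, its fixed‑window counting, the `key_of` extractor, and the injectable clock are all hypothetical choices made to keep the illustration small and deterministic. It keeps a separate quota per extracted key, so a hot `id` is throttled without affecting the others.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ParamRateLimiter:
    """Fixed-window limiter keyed by a value extracted from the call argument.

    Not thread-safe; a production limiter would need locking and eviction
    of idle keys.
    """

    def __init__(self, limit: int, window_seconds: float,
                 key_of: Callable[[Any], Any],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.key_of = key_of
        self.clock = clock
        # key -> (window start time, calls admitted in that window)
        self._state: Dict[Any, Tuple[float, int]] = {}

    def try_acquire(self, arg: Any) -> bool:
        """Return True if the call is admitted, False if it is limited."""
        key = self.key_of(arg)
        now = self.clock()
        start, count = self._state.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a new one
        if count >= self.limit:
            self._state[key] = (start, count)
            return False
        self._state[key] = (start, count + 1)
        return True


now = [0.0]  # manual clock so the example is deterministic
limiter = ParamRateLimiter(limit=2, window_seconds=1.0,
                           key_of=lambda video: video["id"],
                           clock=lambda: now[0])
print([limiter.try_acquire({"id": 42}) for _ in range(3)])
print(limiter.try_acquire({"id": 7}))
now[0] = 1.0
print(limiter.try_acquire({"id": 42}))
```

Here the third call for id 42 is rejected within the first window, id 7 is unaffected, and id 42 is admitted again once the window has elapsed.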
API Gateway : Built on Kong, offering authentication, rate‑limiting, and access control without code changes or manual ticketing.
QDAS Platform : Provides a one‑stop solution for application information, traditional service governance (instance management, Grafana dashboards, Sentinel dashboard), lifecycle management across containers and VMs, service marketplace, and Swagger‑based contract management.
Chaos Engineering : An internal platform (based on ChaosBlade) enables fault injection for servers, containers, databases, middleware, networks, and Kubernetes clusters. Exercises are backed by real‑time monitoring, logging, and alerting, and the platform automatically generates a report after each exercise.
Future Plans focus on cloud‑native evolution: introducing a service mesh with smooth migration paths, extending QDAS to support both service mesh and traditional microservices, and providing developer tooling such as project scaffolding and online debugging.
