How iQIYI Selects, Optimizes, and Manages Its Diverse Database Stack
This article explains iQIYI's multi‑dimensional approach to database selection, details the practical use and optimization of MySQL, TiDB, Redis, Couchbase, and the internally built HiKV, and offers concrete recommendations for choosing the right database in different scenarios.
Database Selection Dimensions
When evaluating a database, iQIYI first identifies the stakeholder (procurement, DBA, or application developer) because each group prioritises different factors. The core criteria are:
Operational cost: monitoring, backup and recovery, upgrade and migration effort, community stability, ease of tuning, and ease of troubleshooting.
Stability: multi-replica support, high availability, and multi-active (multi-write) deployment.
Performance: latency, QPS, and advanced features such as tiered storage.
Scalability: ability to scale horizontally or vertically as business demands evolve.
Security: audit compliance and resistance to SQL injection and data leakage.
Developer friendliness: ease of schema changes, convenient interfaces, and integration with existing tooling.
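To make the stakeholder point concrete, the Python sketch below scores candidate databases against the six criteria using stakeholder-specific weights. The weights, the 1 to 5 scores, and the names `WEIGHTS`, `weighted_score`, `rank`, and `CANDIDATES` are hypothetical and do not come from iQIYI. The sketch only shows that the same scores can produce different rankings for a DBA and for an application developer.

```python
# Hypothetical stakeholder weights over the six selection criteria (each row sums to 1.0).
CRITERIA = ("ops_cost", "stability", "performance", "scalability", "security", "dev_friendliness")

WEIGHTS = {
    "dba": {"ops_cost": 0.30, "stability": 0.30, "performance": 0.15,
            "scalability": 0.10, "security": 0.10, "dev_friendliness": 0.05},
    "developer": {"ops_cost": 0.05, "stability": 0.20, "performance": 0.25,
                  "scalability": 0.15, "security": 0.05, "dev_friendliness": 0.30},
}


def weighted_score(scores: dict[str, float], stakeholder: str) -> float:
    """Weighted sum of criterion scores. Every score is 'higher is better',
    so a high ops_cost score means the system is cheap to operate."""
    weights = WEIGHTS[stakeholder]
    return sum(weights[c] * scores[c] for c in CRITERIA)


def rank(candidates: dict[str, dict[str, float]], stakeholder: str) -> list[str]:
    """Candidate names ordered from best to worst for the given stakeholder."""
    return sorted(candidates, key=lambda name: weighted_score(candidates[name], stakeholder), reverse=True)


# Hypothetical 1-5 scores for two unnamed candidates.
CANDIDATES = {
    "candidate_a": {"ops_cost": 5, "stability": 5, "performance": 3,
                    "scalability": 2, "security": 4, "dev_friendliness": 3},
    "candidate_b": {"ops_cost": 2, "stability": 3, "performance": 5,
                    "scalability": 5, "security": 3, "dev_friendliness": 5},
}

print(rank(CANDIDATES, "dba"))        # ['candidate_a', 'candidate_b']
print(rank(CANDIDATES, "developer"))  # ['candidate_b', 'candidate_a']
```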
iQIYI Database Portfolio
MySQL – core relational store for transactional workloads.
TiDB – HTAP database (covered in a separate session).
Redis – key‑value store and cache.
Couchbase – high‑performance KV system.
Other systems: MongoDB, graph databases, the self-developed KV store HiKV, Hive, Impala, etc.
MySQL Architecture and Optimisations
iQIYI runs MySQL in a master-slave topology with semi-synchronous replication. It takes a full backup weekly and an incremental backup daily.
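A weekly full backup combined with daily incrementals means that a restore must apply the most recent full backup and then every incremental taken after it, in order. The Python sketch below computes that chain. It assumes each incremental is based on the previous backup, as with chained xtrabackup incrementals. The function name `restore_chain` and the schedule dates are hypothetical. Reaching an exact point in time later than the last incremental would also require replaying binlogs, and the sketch does not cover that step.

```python
from datetime import datetime, timedelta


def restore_chain(backups: list[tuple[datetime, str]], target: datetime) -> list[tuple[datetime, str]]:
    """Return the backups to apply, oldest first, to restore to the newest backup taken at or before `target`.

    Each backup is (taken_at, kind) with kind "full" or "incr". Incrementals are assumed
    to be chained, so every incremental after the last full backup must be applied.
    """
    usable = sorted(b for b in backups if b[0] <= target)
    full_positions = [i for i, (_, kind) in enumerate(usable) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup at or before the target time")
    last_full = full_positions[-1]
    return [usable[last_full]] + [b for b in usable[last_full + 1:] if b[1] == "incr"]


# Hypothetical schedule: full backup every Sunday at 02:00, incrementals on the other days.
start = datetime(2024, 1, 7, 2, 0)  # a Sunday
backups = [(start + timedelta(days=d), "full" if d % 7 == 0 else "incr") for d in range(14)]

chain = restore_chain(backups, datetime(2024, 1, 10, 12, 0))
print([f"{ts:%Y-%m-%d} {kind}" for ts, kind in chain])
```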
Backup/restore acceleration: iQIYI optimised xtrabackup by reducing redundant disk writes and parallelising I/O. Full-database restore time dropped from about 5 hours to about 100 minutes, and single-table restores are now supported.
DDL/DML tooling: gh-ost and oak-online-alter-table are integrated with real-time replication-lag monitoring. When lag exceeds a threshold, the tool is paused automatically.
High availability: iQIYI adopted an MHA-style agent architecture. An agent on each physical node sends heartbeats to a central master (manager) process. Failover can be triggered within a data centre, across data centres, or across regions. The master role is held by a Raft-based group, which provides multi-region resilience.
Scalability: iQIYI used the ShardingSphere SDK and various proxy solutions. Their complexity and limited SQL support led some workloads to migrate to TiDB.
Audit pipeline: Full SQL statements are streamed to Kafka and then consumed by ClickHouse for statistical analysis. A custom plugin buffers metrics in a two-level RingBuffer, keeping overhead below 2%.
Security: Real-time detection of SQL-injection and data-exfiltration patterns, with alerting.
Tiered storage: Hot data stays in MySQL, while older data is migrated automatically to TiDB or TokuDB. This reduces storage cost while preserving query latency on hot data.
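The lag-triggered pause described for gh-ost and oak-online-alter-table can be sketched as a small throttle with hysteresis. Copying stops when the worst replica lag exceeds `max_lag_s` and resumes only once lag falls back to `resume_lag_s`, so the tool does not flap around a single threshold. This Python illustration is hypothetical: `LagThrottle`, `copy_chunks`, and the lag samples are invented names, and this is not iQIYI's integration. gh-ost itself also offers a built-in `--max-lag-millis` throttle.

```python
class LagThrottle:
    """Pause a row-copy loop while replication lag is too high, with hysteresis."""

    def __init__(self, max_lag_s: float, resume_lag_s: float) -> None:
        if resume_lag_s > max_lag_s:
            raise ValueError("resume_lag_s must not exceed max_lag_s")
        self.max_lag_s = max_lag_s
        self.resume_lag_s = resume_lag_s
        self.paused = False

    def update(self, replica_lags: dict[str, float]) -> bool:
        """Feed the latest lag (seconds) per replica; return True while copying should pause."""
        worst = max(replica_lags.values(), default=0.0)
        if self.paused:
            if worst <= self.resume_lag_s:
                self.paused = False
        elif worst > self.max_lag_s:
            self.paused = True
        return self.paused


def copy_chunks(chunks, lag_samples, throttle):
    """Copy one chunk per monitoring tick unless the throttle says to pause."""
    pending = list(chunks)
    copied = []
    for lags in lag_samples:
        if not pending:
            break
        if throttle.update(lags):
            continue  # paused: skip this tick and re-check lag on the next one
        copied.append(pending.pop(0))
    return copied


throttle = LagThrottle(max_lag_s=1.0, resume_lag_s=0.5)
samples = [{"r1": 0.1, "r2": 0.2}, {"r1": 2.0, "r2": 0.3}, {"r1": 0.8, "r2": 0.3},
           {"r1": 0.4, "r2": 0.2}, {"r1": 0.2, "r2": 0.1}]
print(copy_chunks(["chunk1", "chunk2", "chunk3", "chunk4"], samples, throttle))
# ['chunk1', 'chunk2', 'chunk3']
```

At the third sample, lag has dropped below `max_lag_s` but not yet to `resume_lag_s`, so copying stays paused for one more tick.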
Redis Architecture and Optimisations
Redis is deployed in master‑slave mode with a customised Sentinel deployment per data centre to avoid split‑brain scenarios.
Real-time backup: A background process mirrors master data to a ScyllaDB-based KV store, enabling point-in-time recovery.
Latency mitigation: Sentinel parameters are tuned, and buffer sizes are adjusted automatically during master-slave rebuilds so that network jitter does not break replication.
Redis Name Service (RNS): Extracts topology from Sentinel and gives clients the current master IP directly, avoiding DNS TTL delays.
Jedis optimisation: After a failover, only connections to the affected shard are rebuilt rather than those to the whole cluster, preserving overall QPS.
Proxy for async replication: A proxy also forwards writes to Kafka, and downstream consumers replicate the data to other clusters.
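RNS and the Jedis change work together: when the name service learns of a failover, only the shard whose master changed needs new connections. The Python sketch below models this. It assumes the name service consumes Sentinel's `+switch-master` events, whose payload is `<master-name> <old-ip> <old-port> <new-ip> <new-port>`. `NameService`, `ShardedClient`, and the addresses are hypothetical, and the placeholder `_connect` stands in for opening a real connection pool.

```python
from dataclasses import dataclass, field


@dataclass
class NameService:
    """Hypothetical RNS-like registry: shard (Sentinel master name) -> current master address."""
    masters: dict[str, tuple[str, int]] = field(default_factory=dict)

    def on_switch_master(self, payload: str) -> str:
        """Apply a Sentinel +switch-master payload and return the affected shard name."""
        name, _old_ip, _old_port, new_ip, new_port = payload.split()
        self.masters[name] = (new_ip, int(new_port))
        return name


class ShardedClient:
    """Keeps one connection per shard and rebuilds only the shard that failed over."""

    def __init__(self, ns: NameService) -> None:
        self.ns = ns
        self.conns = {shard: self._connect(addr) for shard, addr in ns.masters.items()}
        self.reconnects = {shard: 0 for shard in ns.masters}

    @staticmethod
    def _connect(addr: tuple[str, int]) -> dict:
        return {"addr": addr}  # placeholder for opening a real connection pool

    def handle_failover(self, payload: str) -> None:
        shard = self.ns.on_switch_master(payload)
        self.conns[shard] = self._connect(self.ns.masters[shard])
        self.reconnects[shard] = self.reconnects.get(shard, 0) + 1


ns = NameService({"shard-0": ("192.0.2.10", 6379), "shard-1": ("192.0.2.11", 6379)})
client = ShardedClient(ns)
client.handle_failover("shard-1 192.0.2.11 6379 192.0.2.21 6379")
print(client.reconnects)  # {'shard-0': 0, 'shard-1': 1}
```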
Couchbase Usage
Couchbase is used as a high‑performance KV store with two bucket types:
Memcached bucket – pure in‑memory cache, no persistence.
Couchbase bucket – persistent JSON storage with configurable replicas and automatic rebalancing.
Clients hash each key to a vBucket and look up the node that owns it in the cluster map. The map is updated dynamically during data migration, which makes failover transparent to applications. iQIYI has run Couchbase versions 1.8 through 5.0 and is evaluating 6.0. Problems encountered, such as NTP misconfiguration and high XDCR concurrency, were mitigated through operational tooling.
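The vBucket indirection can be sketched in a few lines of Python. The sketch assumes the CRC32-based key mapping used by Couchbase clients (bits 16 to 30 of the key's CRC32, modulo the vBucket count) and 1024 vBuckets, the usual default on Linux. `ClusterMap` and its round-robin initial layout are hypothetical simplifications; the real cluster map also tracks replicas. Moving one vBucket changes the routing only for keys in that vBucket. This is why a rebalance or failover can stay transparent to clients that refresh the map.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed default vBucket count


def vbucket_for(key: bytes, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a key to a vBucket: take bits 16-30 of its CRC32, then reduce modulo the vBucket count."""
    return ((zlib.crc32(key) >> 16) & 0x7FFF) % num_vbuckets


class ClusterMap:
    """Hypothetical, simplified cluster map: vBucket id -> owning node (no replicas)."""

    def __init__(self, nodes: list[str], num_vbuckets: int = NUM_VBUCKETS) -> None:
        self.nodes = list(nodes)
        self.owner = [i % len(self.nodes) for i in range(num_vbuckets)]  # round-robin layout
        self.revision = 1

    def node_for(self, key: bytes) -> str:
        return self.nodes[self.owner[vbucket_for(key, len(self.owner))]]

    def move_vbucket(self, vb: int, node: str) -> None:
        """Reassign one vBucket (as a rebalance or failover would) and bump the map revision."""
        if node not in self.nodes:
            self.nodes.append(node)
        self.owner[vb] = self.nodes.index(node)
        self.revision += 1


cmap = ClusterMap(["node-a", "node-b"])
vb = vbucket_for(b"user:42")
cmap.move_vbucket(vb, "node-c")
print(vb, cmap.node_for(b"user:42"), cmap.revision)
```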
HiKV – Self‑Developed KV Store
HiKV is built on ScyllaDB and targets workloads where Couchbase cost is prohibitive.
Storage engine: Keys reside in memory and values are stored on SSDs. A fixed-length memory allocator and a red-black-tree index cap each record's index entry at 64 bytes.
Stability mechanisms: Checkpointing, rate limiting, and circuit breaking keep performance consistent under load.
Adoption: HiKV has replaced roughly 30% of Couchbase deployments, reducing storage costs while maintaining low latency.
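The 64-byte index entry and the fixed-length allocator can be illustrated with the Python sketch below. The entry layout is a hypothetical example that happens to total 64 bytes: a 40-byte key field, an SSD offset, a value length, a version, and an expiry. It is not HiKV's actual format. Python's standard library has no balanced tree, so a sorted list searched with `bisect` stands in for the red-black tree. Because every entry occupies exactly one slot, freed slots can be reused without fragmentation.

```python
import bisect
import struct

# Hypothetical 64-byte index entry (NOT HiKV's real layout): 40-byte zero-padded key,
# 8-byte SSD offset, 4-byte value length, 4-byte version, 8-byte expiry timestamp.
ENTRY = struct.Struct("<40sQIIQ")
assert ENTRY.size == 64


class SlotAllocator:
    """Fixed-length allocator: one preallocated arena cut into equal slots, plus a free list."""

    def __init__(self, capacity: int, slot_size: int = ENTRY.size) -> None:
        self.slot_size = slot_size
        self.arena = bytearray(capacity * slot_size)
        self.free = list(range(capacity - 1, -1, -1))  # pop() hands out slot 0 first

    def alloc(self) -> int:
        if not self.free:
            raise MemoryError("index arena is full")
        return self.free.pop()

    def release(self, slot: int) -> None:
        self.free.append(slot)

    def write(self, slot: int, data: bytes) -> None:
        start = slot * self.slot_size
        self.arena[start:start + self.slot_size] = data

    def read(self, slot: int) -> bytes:
        start = slot * self.slot_size
        return bytes(self.arena[start:start + self.slot_size])


class KeyIndex:
    """Keys and fixed-size entries in memory; values live on SSD at the stored offset.
    A sorted list searched with bisect stands in for the red-black tree."""

    def __init__(self, capacity: int) -> None:
        self.slots = SlotAllocator(capacity)
        self.order: list[tuple[bytes, int]] = []  # (key, slot), kept sorted by key

    def _find(self, key: bytes) -> int:
        i = bisect.bisect_left(self.order, (key,))
        return i if i < len(self.order) and self.order[i][0] == key else -1

    def put(self, key: bytes, ssd_offset: int, value_len: int, version: int = 0, expiry: int = 0) -> None:
        if len(key) > 40:
            raise ValueError("key exceeds this sketch's 40-byte key field")
        i = self._find(key)
        if i >= 0:
            slot = self.order[i][1]  # overwrite the existing entry in place
        else:
            slot = self.slots.alloc()
            bisect.insort(self.order, (key, slot))
        self.slots.write(slot, ENTRY.pack(key, ssd_offset, value_len, version, expiry))

    def get(self, key: bytes):
        """Return (ssd_offset, value_len) for a key, or None if it is absent."""
        i = self._find(key)
        if i < 0:
            return None
        _, ssd_offset, value_len, _, _ = ENTRY.unpack(self.slots.read(self.order[i][1]))
        return ssd_offset, value_len

    def delete(self, key: bytes) -> bool:
        i = self._find(key)
        if i < 0:
            return False
        _, slot = self.order.pop(i)
        self.slots.release(slot)
        return True


idx = KeyIndex(capacity=2)
idx.put(b"video:1", 4096, 300)
idx.put(b"album:7", 8192, 120)
print(idx.get(b"video:1"))  # (4096, 300)
```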
Database Operations Management Evolution
iQIYI’s operational workflow has progressed through four stages:
DBA‑written scripts for manual management.
Self‑service private‑cloud portal exposing status dashboards and allowing developers to provision clusters.
Web-based UI that lets about 90% of routine operations be completed with a single click.
One‑click diagnostic utilities that codify DBA expertise for developers.
Additional practices include proactive alerting, an intelligent chatbot for troubleshooting, and resource‑aware scheduling based on instance tagging.
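Resource-aware scheduling by instance tags might look roughly like the Python sketch below. It filters hosts by required tags and free memory, keeps a replica off the host that runs its master, and picks the host with the most free memory. The tags, the memory-only resource model, and the names `Host` and `schedule` are assumptions made for illustration and are not iQIYI's placement rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    name: str
    tags: set[str]
    free_mem_gb: int
    instances: list[str] = field(default_factory=list)


def schedule(hosts: list[Host], instance: str, required_tags: set[str], mem_gb: int,
             avoid_with: Optional[str] = None) -> Optional[Host]:
    """Place `instance` on the eligible host with the most free memory, or return None.

    Eligible hosts carry every required tag, have enough free memory, and do not
    already run `avoid_with` (e.g. the master that this instance will replicate).
    """
    eligible = [
        h for h in hosts
        if required_tags <= h.tags
        and h.free_mem_gb >= mem_gb
        and (avoid_with is None or avoid_with not in h.instances)
    ]
    if not eligible:
        return None
    best = max(eligible, key=lambda h: h.free_mem_gb)
    best.free_mem_gb -= mem_gb
    best.instances.append(instance)
    return best


hosts = [Host("h1", {"ssd", "dc-a"}, 64), Host("h2", {"ssd", "dc-a"}, 128), Host("h3", {"hdd", "dc-a"}, 256)]
print(schedule(hosts, "redis-m1", {"ssd"}, 32).name)                         # h2
print(schedule(hosts, "redis-s1", {"ssd"}, 32, avoid_with="redis-m1").name)  # h1
```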
Practical Database Selection Guidance
Based on experience, iQIYI recommends a decision‑tree approach that considers:
Data volume and scalability requirements.
Backup strategy (cold backups, point-in-time recovery) and the storage engines or archive stores it requires (e.g., TokuDB, TiDB).
Workload type: OLTP (high QPS, low latency, transactional) vs. OLAP (large scans, longer latency).
NoSQL patterns: master‑slave, client‑side sharding, Redis Cluster, Couchbase, or HiKV, chosen according to data size, latency tolerance, and operational complexity.
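A decision tree along these lines could be sketched in Python as follows. The branch order follows the factors listed above: workload type first, then transactional needs, then data size and latency budget. Every threshold, and the names `Workload` and `suggest`, are hypothetical placeholders rather than iQIYI's figures. A real tree would also weigh backup requirements and operational complexity.

```python
from dataclasses import dataclass

# Placeholder thresholds for illustration only; they are not iQIYI's figures.
MYSQL_SINGLE_NODE_GB = 2000
REDIS_SINGLE_NODE_GB = 32
IN_MEMORY_MAX_GB = 512
SSD_LATENCY_FLOOR_MS = 2.0


@dataclass
class Workload:
    analytical: bool          # large scans, latency-tolerant (OLAP)
    transactional: bool       # needs multi-row transactions (OLTP)
    data_gb: float
    latency_budget_ms: float  # acceptable p99 read latency


def suggest(w: Workload) -> str:
    if w.analytical:
        return "TiDB"  # HTAP option from the portfolio above
    if w.transactional:
        return "MySQL" if w.data_gb <= MYSQL_SINGLE_NODE_GB else "TiDB"
    # Key-value workloads: start from the simplest option and escalate with data size.
    if w.data_gb <= REDIS_SINGLE_NODE_GB:
        return "Redis master-slave"
    if w.data_gb <= IN_MEMORY_MAX_GB:
        return "Redis Cluster / client-side sharding"
    # Beyond this size RAM cost dominates: pay for Couchbase only when the latency
    # budget demands it; otherwise serve values from SSD with HiKV.
    return "Couchbase" if w.latency_budget_ms < SSD_LATENCY_FLOOR_MS else "HiKV"


print(suggest(Workload(analytical=False, transactional=False, data_gb=2000, latency_budget_ms=5)))  # HiKV
```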
Key takeaways:
Validate true business needs before pushing requirements onto the database layer.
Avoid blind adoption of popular technologies; choose the simplest solution that meets the requirements.
Embrace open‑source solutions when appropriate, and only develop custom stores when clear cost or performance benefits are demonstrated.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
