Inside LeTV’s Private Cloud RDS: Architecture, Monitoring, and Backup Strategies
This article details LeTV's diverse database landscape, explains the design and implementation of its private‑cloud RDS built on Docker and Percona XtraDB Cluster, outlines monitoring solutions such as Lepus, Prometheus and Open‑Falcon, and shares practical backup approaches and operational lessons learned.
1. LeTV Database Overview
LeTV operates a heterogeneous database platform that includes MySQL, Oracle, MongoDB, and Redis, and it distinguishes between traditional DB instances and cloud‑based RDS services. MySQL is the most widely used internally, so this article focuses on its operational practices.
Database product status
Multiple MySQL versions: official MySQL 5.5, MariaDB 10, Percona XtraDB Cluster (PXC) 5.6, etc.
Architectures: one master with N slaves, one master with multi‑layer (cascaded) replication, and PXC clusters.
Hardware: SAS disks and SATA SSDs, with SATA SSDs being the primary storage medium.
Master‑Slave structures
1 master + N slaves
1 master + N slaves + MB (master‑backup)
1 master + N slaves + Relay (adds a relay node for cross‑datacenter bandwidth reduction)
A relay node helps with cross‑datacenter high availability, but it adds architectural complexity, and topologies deeper than three layers become difficult to manage.
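The three-layer limit mentioned above can be checked mechanically. The following is a minimal sketch, assuming the topology is available as a mapping from each node to its upstream source; the node names and the `TOPOLOGY` dictionary are hypothetical and not taken from LeTV's tooling.

```python
from typing import Dict, List, Optional


def replication_depth(upstream: Dict[str, Optional[str]], node: str) -> int:
    """Return the layer of `node`: the master is layer 1, its replicas layer 2, and so on."""
    depth = 1
    seen = {node}
    current = upstream[node]
    while current is not None:
        if current in seen:
            raise ValueError(f"replication loop detected at {current}")
        seen.add(current)
        depth += 1
        current = upstream[current]
    return depth


def too_deep(upstream: Dict[str, Optional[str]], max_layers: int = 3) -> List[str]:
    """List the nodes that sit below the allowed number of layers."""
    return sorted(n for n in upstream if replication_depth(upstream, n) > max_layers)


# Hypothetical topology: node name -> the node it replicates from (None for the master).
TOPOLOGY = {
    "master": None,
    "slave-a": "master",
    "relay-idc2": "master",
    "slave-idc2": "relay-idc2",
    "slave-idc2-leaf": "slave-idc2",
}

if __name__ == "__main__":
    print(too_deep(TOPOLOGY))
```

In this example the relay sits at layer 2 and its replica at layer 3, so only the node hanging off that replica is flagged.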
2. Database Monitoring
LeTV relies mainly on open‑source monitoring tools:
Lepus: a Python + PHP‑based enterprise monitoring system supporting MySQL, Oracle, MongoDB, Redis, etc.; it suits modest database scales and reduces in‑house development effort.
Prometheus: collects metrics over HTTP, stores them in a time‑series database, and provides a web UI, a powerful query language, and an HTTP API.
Open‑Falcon: used for server‑level monitoring; extended with phone‑call alerts and IDC concepts.
Additional alerts via Zabbix, WeChat, email, SMS, and voice calls.
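As an illustration of the Prometheus HTTP API mentioned above, the sketch below builds an instant-query URL for the `/api/v1/query` endpoint and parses the JSON response into a per-instance mapping. The host name, the metric name (as exposed by the community mysqld_exporter), and the threshold are assumptions for illustration; the article does not say which exporters or queries LeTV uses.

```python
import json
from typing import Dict
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> Dict[str, float]:
    """Turn an instant-vector response into {instance label: sample value}."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {labels...}, "value": [timestamp, "value as string"]}.
    return {s["metric"].get("instance", ""): float(s["value"][1]) for s in data["result"]}


# Hypothetical response body, shaped like a real instant-vector result.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "db1:9104"}, "value": [1700000000.0, "150"]},
        ],
    },
})

if __name__ == "__main__":
    # Assumed server address and metric name; adjust to the local setup.
    print(instant_query_url("http://prom.example:9090",
                            "mysql_global_status_threads_connected > 100"))
    print(parse_vector(SAMPLE_BODY))
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so the sketch runs without a Prometheus server.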
3. Database Backup Strategies
Backup is a critical focus. LeTV treats real‑time replica slaves as a form of backup and employs both physical and logical methods.
Real‑time replica slaves
Large‑storage machines host 20‑30+ MySQL instances per host.
Important services have off‑site replica slaves.
Multi‑source replication for analytics databases.
Cold backup
Physical backups taken online (hot) with xtrabackup.
Full backup plus incremental backups (weekly full backup, multiple incrementals).
Mount large storage (≈30 TB) for backup data.
Scripts periodically compress and purge old backups to avoid capacity exhaustion.
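A purge script for this scheme has to respect backup chains: an incremental is only restorable together with the full backup it is based on, so whole chains should expire together. The sketch below shows one way to select purge candidates under that rule. The `Backup` record, the naming scheme, and the retention period are hypothetical; the article does not describe LeTV's actual scripts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Backup:
    name: str
    kind: str  # "full" or "incr"
    taken_at: datetime


def group_into_chains(backups: List[Backup]) -> List[List[Backup]]:
    """Group backups by time into chains: one full backup plus the incrementals after it."""
    chains: List[List[Backup]] = []
    for b in sorted(backups, key=lambda b: b.taken_at):
        if b.kind == "full":
            chains.append([b])
        elif chains:
            chains[-1].append(b)
        # An incremental older than every full backup is left alone for manual review.
    return chains


def purgeable(backups: List[Backup], now: datetime, keep_days: int = 14) -> List[Backup]:
    """Return backups that are safe to delete without breaking a restorable chain."""
    cutoff = now - timedelta(days=keep_days)
    doomed: List[Backup] = []
    # The newest chain is never purged, whatever its age.
    for chain in group_into_chains(backups)[:-1]:
        # A chain expires only when its most recent member is past the cutoff.
        if chain[-1].taken_at < cutoff:
            doomed.extend(chain)
    return doomed


if __name__ == "__main__":
    inventory = [
        Backup("full-0101", "full", datetime(2024, 1, 1)),
        Backup("incr-0103", "incr", datetime(2024, 1, 3)),
        Backup("full-0108", "full", datetime(2024, 1, 8)),
        Backup("incr-0112", "incr", datetime(2024, 1, 12)),
        Backup("full-0115", "full", datetime(2024, 1, 15)),
    ]
    for b in purgeable(inventory, now=datetime(2024, 1, 20), keep_days=10):
        print("purge", b.name)
```

With weekly full backups, a retention of N days keeps every chain that still contains a backup newer than N days, which is usually a little more than N days of history.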
4. Private‑Cloud RDS Practice
Project background: The private‑cloud RDS was driven by the rise of PaaS, containerization, cost optimization, and user‑experience demands. Put simply, the team was too “busy”.
RDS definition: An online relational database service built on Docker containers and Mcluster (a private‑cloud MySQL cluster based on Percona XtraDB Cluster).
Mcluster architecture
Multi‑point read/write: any node can accept writes.
Multi‑threaded parallel replication, with parallelism at the transaction level.
Strong consistency across nodes.
High availability – single‑node failures do not affect the cluster.
Near‑full compatibility with standard MySQL.
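Because Mcluster is based on Percona XtraDB Cluster, the health of a node can be judged from the Galera `wsrep_*` status variables returned by `SHOW GLOBAL STATUS LIKE 'wsrep_%'`. The sketch below evaluates such a result once it has been loaded into a dictionary. The variable names are standard Galera ones; the function, the expected cluster size of three, and the sample values are assumptions for illustration and do not describe Mcluster's own checks.

```python
from typing import Dict, List


def node_problems(status: Dict[str, str], expected_size: int = 3) -> List[str]:
    """Return a list of human-readable problems; an empty list means the node looks healthy."""
    problems: List[str] = []
    # "Primary" means this node belongs to the component that holds quorum.
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("node is not part of the primary component")
    # "ON" means the node accepts queries.
    if status.get("wsrep_ready") != "ON":
        problems.append("node is not ready for queries")
    # "Synced" means the node has caught up with the cluster.
    if status.get("wsrep_local_state_comment") != "Synced":
        problems.append("node is not synced")
    size = int(status.get("wsrep_cluster_size", "0"))
    if size < expected_size:
        problems.append(f"cluster has {size} of {expected_size} nodes")
    return problems


if __name__ == "__main__":
    # Hypothetical status of a node after one of its two peers has failed.
    degraded = {
        "wsrep_cluster_status": "Primary",
        "wsrep_ready": "ON",
        "wsrep_local_state_comment": "Synced",
        "wsrep_cluster_size": "2",
    }
    print(node_problems(degraded))
```

The degraded example reflects the high-availability property listed above: with one of three nodes down, the survivors still form the primary component and keep serving traffic, and only the cluster-size check fires.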
RDS overall architecture
Docker containers host the database instances.
Database layer (MySQL, PostgreSQL, etc.).
Matrix component for front‑end creation, management, monitoring, and resource scheduling.
BeeHive (similar to Kubernetes) for resource orchestration.
Data Analysis for log and user‑behavior analytics.
Users request RDS via the Matrix UI; BeeHive selects three suitable machines to deploy an Mcluster DB and a VIP container for high‑availability access.
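The placement step can be pictured as a filter followed by a ranking. The sketch below picks three distinct hosts that have enough free resources and prefers the least-loaded ones. The article does not describe BeeHive's real selection criteria, so the `Host` fields, the ranking rule, and the per-host container cap are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    name: str
    free_mem_gb: int
    free_disk_gb: int
    containers: int  # containers already running on the host


def pick_hosts(hosts: List[Host], need_mem_gb: int, need_disk_gb: int,
               count: int = 3, max_containers: int = 30) -> List[Host]:
    """Choose `count` distinct hosts for one cluster, least-loaded first."""
    eligible = [
        h for h in hosts
        if h.free_mem_gb >= need_mem_gb
        and h.free_disk_gb >= need_disk_gb
        and h.containers < max_containers
    ]
    if len(eligible) < count:
        raise RuntimeError(f"only {len(eligible)} eligible hosts, need {count}")
    # Fewest containers first, then most free memory, then name for a stable order.
    eligible.sort(key=lambda h: (h.containers, -h.free_mem_gb, h.name))
    return eligible[:count]


if __name__ == "__main__":
    pool = [
        Host("h1", 64, 500, 10),
        Host("h2", 8, 500, 2),      # too little free memory
        Host("h3", 32, 800, 5),
        Host("h4", 128, 1000, 30),  # already at the container cap
        Host("h5", 48, 300, 5),
        Host("h6", 20, 250, 12),
    ]
    print([h.name for h in pick_hosts(pool, need_mem_gb=16, need_disk_gb=200)])
```

Placing each of the three nodes on a different machine is what lets a single host failure leave the cluster running.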
Mcluster‑Manager runs inside a Docker container and provides a Tornado‑based web API to start, stop, initialize, monitor, back up, and manage MySQL instances without direct external access.
5. Operational Insights and Pitfalls
Strict database standards and processes are essential for automation.
Private‑cloud RDS significantly reduces hardware and labor costs (20‑30 MySQL instances per machine).
Design differences from public‑cloud RDS: internal users can trigger large‑table modifications that cause failures, requiring DBA intervention.
High write concurrency in multi‑master setups can lead to deadlocks.
Component inter‑dependencies increase system complexity; future versions aim to simplify.
Recovery from host failures is manual and labor‑intensive; scripts exist but are not fully automated.
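For the multi-master deadlock issue noted above, a common application-side mitigation is to retry the whole transaction: in Galera-based clusters, a write that loses a conflict with another node is rolled back and reported to the client as a deadlock (MySQL error 1213). The sketch below shows a retry wrapper with jittered exponential backoff. `DeadlockError` is a hypothetical stand-in for whatever exception the database driver raises, and the article does not say whether LeTV applies this pattern.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock exception (MySQL error 1213)."""


def run_with_retry(txn: Callable[[], T], attempts: int = 3, base_delay: float = 0.05,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `txn`, retrying it from the start when it fails with a deadlock."""
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == attempts:
                raise  # give up and let the caller see the error
            # Exponential backoff with jitter so competing writers do not retry in lockstep.
            sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_transaction() -> str:
        # Simulated transaction that loses the conflict twice before committing.
        calls["n"] += 1
        if calls["n"] < 3:
            raise DeadlockError("simulated conflict")
        return "committed"

    print(run_with_retry(flaky_transaction), "after", calls["n"], "attempts")
```

The callable must contain the entire transaction, from the first statement to the commit, because the rolled-back work is lost. Retries reduce the symptom only; routing writes to a single node, as the Q&A below describes, removes most cross-node conflicts at the source.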
6. Q&A Highlights
Q1: Are backups mainly physical or logical?
A1: Physical backups using xtrabackup are predominant.
Q2: Is incremental backup restoration cumbersome?
A2: LeTV uses incremental backups based on full backups; restoration is straightforward.
Q3: Does the PXC multi‑master architecture use a single write node?
A3: Writes are distributed across multiple nodes; however, to avoid lock issues, a primary‑write‑node plus read‑only nodes pattern is often used.
Q4: What happens if the write node fails?
A4: A VIP (via load balancer) provides automatic failover.
Q5: Are cross‑region replications done over public internet or dedicated lines?
A5: Dedicated lines are used; LeTV operates more than ten data centers nationwide.
Q6: Any licensing concerns with LeTV Cloud RDS?
A6: No; the stack is built on open‑source Docker and MySQL.
Q7: Will LeTV Cloud RDS be open‑sourced or offered as a public service?
A7: Currently internal only, but future public‑cloud offerings are not ruled out.
dbaplus Community
