Data Replication: Fundamentals, Technologies, and Industry Trends
This article explains data replication concepts, processes, and technologies across the storage‑hardware, operating‑system, and database layers, and outlines synchronous, asynchronous, and semi‑synchronous (hybrid) methods. It also discusses industry applications and trends such as hardware‑software decoupling, cloud replication, and real‑time big‑data replication, and highlights challenges and future directions.
Data refers to any electronically recorded information, including numbers, text, images, and sound. Data that can be duplicated, circulated, and utilized becomes an asset, while unduplicated data may become a burden. Data replication, a key application in data circulation, exchange, sharing, protection, integration, analysis, and management, plays a crucial role in maximizing data value.
The replication process consists of three stages: data capture, data transmission, and data restoration. Capture identifies and extracts change data from production systems with fine granularity and minimal impact. Transmission involves segmenting, encrypting, and compressing data for accurate, efficient, and secure transfer. Restoration writes data to the target system, ensuring consistency and availability.
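The three stages can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the row-level diff in `capture`, the tiny `CHUNK_SIZE`, and all function names are assumptions made for the example. Encryption is left out because it would need a third-party library, and deleted keys are not handled.

```python
import hashlib
import json
import zlib

CHUNK_SIZE = 16  # bytes per segment; tiny on purpose so several segments are produced


def capture(source: dict, target: dict) -> list:
    """Capture stage: extract only the entries that differ from the target."""
    return [(key, source[key]) for key in sorted(source) if target.get(key) != source[key]]


def transmit(payload: bytes) -> list:
    """Transmission stage: segment the payload, compress and checksum each segment."""
    segments = []
    for start in range(0, len(payload), CHUNK_SIZE):
        chunk = payload[start:start + CHUNK_SIZE]
        segments.append((zlib.compress(chunk), hashlib.sha256(chunk).hexdigest()))
    return segments


def restore(segments: list) -> bytes:
    """Restoration stage, part 1: decompress and verify every segment."""
    out = bytearray()
    for compressed, digest in segments:
        chunk = zlib.decompress(compressed)
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("segment corrupted in transit")
        out += chunk
    return bytes(out)


def replicate(source: dict, target: dict) -> int:
    """Run all three stages and return the number of changes applied."""
    changes = capture(source, target)
    segments = transmit(json.dumps(changes).encode("utf-8"))
    received = json.loads(restore(segments).decode("utf-8"))
    for key, value in received:  # restoration stage, part 2: write to the target
        target[key] = value
    return len(received)


source = {"a": 1, "b": 2, "c": 3}
target = {"a": 1, "b": 0}
replicate(source, target)  # afterwards target holds the same entries as source
```

Only the two changed entries cross the "network" here, which is the point of capturing change data rather than copying the full data set each time.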
Data replication follows three core principles: compliance (ensuring data security, consistency, encryption, classification, and archiving), timeliness (real‑time or periodic replication, rapid recovery, meeting RTO/RPO requirements), and diversity (supporting multiple data formats and layers such as system, database, and storage).
Major application areas include data compliance, big‑data collection, and system migration. In compliance, replication supports disaster recovery, backup, governance, archiving, encryption, masking, database auditing, classification, and protection. Disaster recovery relies on replication to meet two metrics that shape system design: RPO (Recovery Point Objective), the maximum tolerable data loss measured in time, and RTO (Recovery Time Objective), the maximum tolerable time to restore service.
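As a worked illustration of the two metrics, the following sketch checks a hypothetical incident against RPO and RTO targets. The timestamps, targets, and function names are invented for the example and do not come from the whitepaper.

```python
from datetime import datetime, timedelta


def achieved_rpo(failure_time: datetime, last_replicated_change: datetime) -> timedelta:
    """Data-loss window: changes made after the last replicated one are lost."""
    return failure_time - last_replicated_change


def achieved_rto(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Downtime: how long it took to bring the service back."""
    return service_restored - failure_time


def meets(achieved: timedelta, objective: timedelta) -> bool:
    """An objective is met when the achieved value does not exceed it."""
    return achieved <= objective


# Hypothetical incident timeline, chosen only for illustration.
failure = datetime(2022, 1, 1, 12, 0, 0)
last_replicated = datetime(2022, 1, 1, 11, 59, 30)
restored = datetime(2022, 1, 1, 12, 10, 0)

rpo = achieved_rpo(failure, last_replicated)  # 30 seconds of changes lost
rto = achieved_rto(failure, restored)         # 10 minutes of downtime

print(meets(rpo, timedelta(minutes=1)))  # True: within a 1-minute RPO target
print(meets(rto, timedelta(minutes=5)))  # False: exceeds a 5-minute RTO target
```

In this example the replication lag is acceptable but the recovery procedure is too slow, so the design work would go into failover speed rather than replication frequency.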
1. Basic Knowledge of Data Replication
Data replication copies a set of data from one source to one or more targets. Depending on where in the system stack changes are captured, replication can occur at the storage‑hardware layer, the operating‑system layer, or the database layer.
2. Replication Types
Synchronous replication requires each write to complete on both the source and the target before the application proceeds; this minimizes data loss but adds write latency. Asynchronous replication does not wait for the target, which reduces the impact on production but lets the target lag behind the source, so the most recent writes can be lost on failure. Semi‑synchronous replication waits for at least one replica to acknowledge receipt before returning to the client, balancing safety and latency. For transmission, objects are serialized into a binary form that the target can deserialize back into the original objects.
All methods aim for non‑intrusive data capture that does not disrupt production workloads.
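The difference between the three modes comes down to how many replicas must confirm a write before the client gets a response. The sketch below models that in memory, with no real network or threads. The `Primary` and `Replica` classes are hypothetical, and "acknowledging receipt" is simplified to "applying the record".

```python
from collections import deque


class Replica:
    def __init__(self, name: str):
        self.name = name
        self.data = []          # records already applied
        self.pending = deque()  # records shipped later, in order

    def apply(self, record) -> None:
        self.data.append(record)

    def drain(self) -> None:
        """Catch up by applying everything that was queued."""
        while self.pending:
            self.apply(self.pending.popleft())


class Primary:
    MODES = ("sync", "semi-sync", "async")

    def __init__(self, replicas: list, mode: str):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.replicas = replicas
        self.mode = mode
        self.data = []

    def write(self, record) -> int:
        """Apply a write and return how many replicas confirmed it before returning."""
        self.data.append(record)
        if self.mode == "sync":
            wait_for = len(self.replicas)          # every replica must confirm
        elif self.mode == "semi-sync":
            wait_for = min(1, len(self.replicas))  # one confirmation is enough
        else:
            wait_for = 0                           # return immediately
        for index, replica in enumerate(self.replicas):
            if index < wait_for:
                replica.apply(record)
            else:
                replica.pending.append(record)
        return wait_for


def lag(primary: Primary) -> list:
    """Records each replica is missing, i.e. what would be lost if the primary failed now."""
    return [len(primary.data) - len(replica.data) for replica in primary.replicas]


for mode in Primary.MODES:
    primary = Primary([Replica("r1"), Replica("r2")], mode)
    for record in ("w1", "w2", "w3"):
        primary.write(record)
    print(mode, lag(primary))
```

With two replicas and three writes, the synchronous primary has no lag on either replica, the semi-synchronous one has a fully current first replica and a lagging second, and the asynchronous one lags on both until `drain` is called.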
3. Families of Replication Technologies
A. Storage‑Hardware‑Level Replication – Mirrors data directly between disk arrays, using the array firmware or operating system over IP or fiber links. It offers high performance for critical workloads but requires homogeneous hardware and costly low‑latency links.
B. Operating‑System‑Level Replication – Includes byte‑level and block‑level approaches. Byte‑level replication captures I/O operations in real time and logs them for precise replay, which suits disaster recovery. Block‑level replication tracks changed disk blocks in a bitmap, which suits large‑file systems and scheduled backups.
C. Database‑Level Replication – Logical replication parses redo and archive logs into SQL statements and executes them on the target, enabling cross‑vendor replication, read/write splitting, and high availability.
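The database‑level approach in item C can be sketched as follows. Parsing real redo or archive logs is vendor‑specific, so this example starts from already‑decoded change records. The record layout, the `to_sql` and `apply_changes` helpers, and the assumption that every table has a primary‑key column named `id` are all hypothetical. SQLite stands in for the target database only because it ships with Python.

```python
import sqlite3


def to_sql(change: dict) -> tuple:
    """Turn one decoded change record into a parameterized SQL statement.

    Table and column names are interpolated directly, so the records are
    assumed to come from a trusted log reader, not from user input.
    """
    table = change["table"]
    if change["op"] == "insert":
        columns = list(change["row"])
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        return sql, [change["row"][c] for c in columns]
    if change["op"] == "update":
        columns = list(change["row"])
        assignments = ", ".join(f"{c} = ?" for c in columns)
        sql = f"UPDATE {table} SET {assignments} WHERE id = ?"
        return sql, [change["row"][c] for c in columns] + [change["key"]]
    if change["op"] == "delete":
        return f"DELETE FROM {table} WHERE id = ?", [change["key"]]
    raise ValueError(f"unsupported operation: {change['op']}")


def apply_changes(conn: sqlite3.Connection, changes: list) -> None:
    """Replay the changes in log order inside a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        for change in changes:
            sql, params = to_sql(change)
            conn.execute(sql, params)


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

changes = [
    {"op": "insert", "table": "accounts", "row": {"id": 1, "balance": 100}},
    {"op": "insert", "table": "accounts", "row": {"id": 2, "balance": 50}},
    {"op": "update", "table": "accounts", "key": 1, "row": {"balance": 70}},
    {"op": "delete", "table": "accounts", "key": 2},
]
apply_changes(target, changes)
print(target.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
```

Because the target only receives standard SQL, it does not need to share a storage format with the source, which is what makes cross‑vendor replication possible at this layer.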
Choosing the appropriate technology depends on environment requirements and project goals.
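One factor in that choice is how changes are captured. To make the byte‑level and block‑level distinction from item B concrete, here is a small in‑memory sketch of both. The class names, the four‑byte block size, and the fixed‑size file are assumptions for illustration; real implementations hook into the operating system's I/O path.

```python
BLOCK_SIZE = 4  # bytes per block; tiny so the example is easy to follow


class JournaledFile:
    """Byte-level capture: every write is logged as (offset, data), in order."""

    def __init__(self, size: int):
        self.content = bytearray(size)
        self.journal = []

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.content):
            raise ValueError("write outside the file")
        self.content[offset:offset + len(data)] = data
        self.journal.append((offset, data))


def replay(journal: list, target: bytearray) -> None:
    """Re-run the logged writes on the target in their original order."""
    for offset, data in journal:
        target[offset:offset + len(data)] = data


class BitmapVolume:
    """Block-level capture: a bitmap records which blocks changed, not how."""

    def __init__(self, block_count: int):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(block_count)]
        self.dirty = [False] * block_count

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
        self.dirty[index] = True


def sync_dirty_blocks(volume: BitmapVolume, target_blocks: list) -> list:
    """Copy only the dirty blocks, clear their bits, and return their indexes."""
    copied = []
    for index, is_dirty in enumerate(volume.dirty):
        if is_dirty:
            target_blocks[index] = volume.blocks[index]
            volume.dirty[index] = False
            copied.append(index)
    return copied


# Byte-level: both writes are replayed, including the overlapping one.
source_file = JournaledFile(8)
source_file.write(0, b"abcd")
source_file.write(2, b"XY")
replica_file = bytearray(8)
replay(source_file.journal, replica_file)

# Block-level: block 1 is written twice but copied once, in its latest state.
volume = BitmapVolume(4)
replica_blocks = [bytes(BLOCK_SIZE) for _ in range(4)]
volume.write_block(1, b"aaaa")
volume.write_block(3, b"bbbb")
volume.write_block(1, b"cccc")
copied = sync_dirty_blocks(volume, replica_blocks)
```

The journal preserves every intermediate write, which is why the byte‑level approach allows precise replay. The bitmap keeps only the final state of each changed block, which keeps periodic synchronization cheap.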
4. Development Trends
A. Hardware‑Software Decoupled Replication – moves away from vendor‑locked solutions, allowing replication across heterogeneous storage and databases and supporting system upgrades, tiered storage, and domestically produced equipment.
B. Cloud‑Based Replication – adapts to virtualized resources, addressing bandwidth constraints, unstable networks, and the need for compression, resumable transfers, and data privacy (encryption, masking) in public‑cloud scenarios.
C. Real‑Time Big‑Data Platform Replication – traditional database replication cannot meet the needs of platforms like Hadoop/HDFS; emerging solutions aim to provide real‑time data flow between big‑data ecosystems and traditional databases.
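The compression and resumable‑transfer requirements named under cloud‑based replication (item B) can be illustrated with a small simulation. `FlakyLink` is a hypothetical stand‑in for an unstable network, and the checkpoint is simply the byte offset of the last chunk that was sent successfully. Encryption and masking are not shown.

```python
import zlib


class FlakyLink:
    """Hypothetical link that drops the connection on chosen send attempts."""

    def __init__(self, fail_on_attempts):
        self.fail_on_attempts = set(fail_on_attempts)
        self.attempts = 0
        self.received = []  # compressed chunks that arrived

    def send(self, compressed_chunk: bytes) -> None:
        self.attempts += 1
        if self.attempts in self.fail_on_attempts:
            raise ConnectionError("link dropped")
        self.received.append(compressed_chunk)


def resumable_send(data: bytes, link: FlakyLink, chunk_size: int,
                   checkpoint: int = 0, max_retries: int = 2) -> int:
    """Send data from the checkpoint onward and return the new checkpoint.

    A return value smaller than len(data) means the transfer gave up after
    max_retries failed attempts on one chunk and can be resumed from there.
    """
    offset = checkpoint
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        for _ in range(max_retries):
            try:
                link.send(zlib.compress(chunk))
                break
            except ConnectionError:
                continue
        else:
            return offset  # every retry failed; nothing past this offset was sent
        offset += len(chunk)
    return offset


data = b"replicate me across an unstable link"
link = FlakyLink(fail_on_attempts={2, 3, 4})

first = resumable_send(data, link, chunk_size=8)                     # stops early
second = resumable_send(data, link, chunk_size=8, checkpoint=first)  # resumes
reassembled = b"".join(zlib.decompress(chunk) for chunk in link.received)
```

The second call starts at the saved offset instead of at zero, so chunks that already arrived are not sent again. On a narrow or metered cloud link, that saving is the reason to checkpoint at all.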
5. Heterogeneous Replication Scenarios
Typical scenarios include file‑level migration across different servers, operating systems, NAS, and object storage; database‑level migration via Kafka or direct copy; whole‑machine (OS) migration combining byte‑level and block‑level techniques; and HDFS migration, where real‑time disaster recovery remains a challenge.
For more detailed information, refer to the original "China Data Replication Industry Whitepaper (2022)" and related disaster‑recovery publications.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.