Data Replication: Fundamentals, Technologies, and Future Trends
This article explains the concept of data replication, its three-stage process, key principles of compliance, timeliness, and diversity, various replication methods, layered technologies across storage, operating system, and database levels, emerging cloud and big‑data solutions, and heterogeneous use‑case scenarios.
Data refers to any electronically recorded information such as numbers, text, images, or sound; when data can be copied, circulated, and utilized, it becomes a valuable asset rather than a burden.
Source: China Data Replication Industry White Paper (2022)
The data replication process has three stages: data capture, data transmission, and data restoration. Capture identifies and extracts changed data from production systems at fine granularity and with minimal impact; transmission splits, encrypts, and compresses the data so that it is transferred accurately, efficiently, and securely; restoration writes the data to the target while ensuring consistency and availability.
The three core principles of data replication are compliance (security, consistency, encryption, classification, etc.), timeliness (real‑time or periodic replication, fast recovery, meeting RTO targets), and diversity (support for multiple data formats and layers such as system, database, and storage).
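The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the function names `capture`, `transmit`, and `restore` are hypothetical, snapshots are plain dictionaries, and encryption is left out because the Python standard library has no cipher (a real pipeline would encrypt the payload before sending it).

```python
import hashlib
import json
import zlib


def capture(old: dict, new: dict) -> list:
    """Capture: extract only the entries that changed between two snapshots."""
    changes = []
    for key in sorted(new.keys() - old.keys()):
        changes.append({"op": "insert", "key": key, "value": new[key]})
    for key in sorted(new.keys() & old.keys()):
        if new[key] != old[key]:
            changes.append({"op": "update", "key": key, "value": new[key]})
    for key in sorted(old.keys() - new.keys()):
        changes.append({"op": "delete", "key": key})
    return changes


def transmit(changes: list, chunk_size: int = 64) -> list:
    """Transmission: serialize, compress, checksum, and split into chunks.

    Encryption is omitted here (assumption: it would be applied to `payload`).
    """
    payload = zlib.compress(json.dumps(changes).encode("utf-8"))
    digest = hashlib.sha256(payload).digest()  # 32-byte integrity check
    framed = digest + payload
    return [framed[i:i + chunk_size] for i in range(0, len(framed), chunk_size)]


def restore(chunks: list, target: dict) -> None:
    """Restoration: reassemble, verify integrity, then apply changes in order."""
    framed = b"".join(chunks)
    digest, payload = framed[:32], framed[32:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("payload corrupted in transit")
    for change in json.loads(zlib.decompress(payload)):
        if change["op"] == "delete":
            target.pop(change["key"], None)
        else:
            target[change["key"]] = change["value"]


source_before = {"a": 1, "b": 2, "c": 3}
source_after = {"a": 1, "b": 20, "d": 4}
target = dict(source_before)  # target starts as a copy of the old state
restore(transmit(capture(source_before, source_after)), target)
```

After the call, `target` holds the same contents as `source_after`, even though only the three changed entries were shipped.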
Main application areas include data compliance, big‑data acquisition, and system migration. Compliance scenarios cover disaster recovery, backup, governance, archiving, encryption, masking, database audit, classification, and protection levels. Disaster recovery performance is measured by RPO (Recovery Point Objective) and RTO (Recovery Time Objective), which depend heavily on replication capabilities.
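As a small worked example of the two metrics, the sketch below computes the RPO and RTO actually achieved in one incident and compares them with their targets. The timestamps, the targets, and the helper names are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated_at: datetime, failure_at: datetime) -> timedelta:
    """Data-loss window: changes made after the last replicated point are lost."""
    return failure_at - last_replicated_at


def achieved_rto(failure_at: datetime, restored_at: datetime) -> timedelta:
    """Downtime: how long the service was unavailable after the failure."""
    return restored_at - failure_at


# Hypothetical incident timeline and targets.
last_replicated_at = datetime(2022, 1, 1, 12, 0, 15)
failure_at = datetime(2022, 1, 1, 12, 1, 0)
restored_at = datetime(2022, 1, 1, 12, 21, 0)
rpo_target = timedelta(minutes=1)
rto_target = timedelta(minutes=15)

rpo = achieved_rpo(last_replicated_at, failure_at)  # 45 seconds of changes lost
rto = achieved_rto(failure_at, restored_at)         # 20 minutes of downtime
rpo_met = rpo <= rpo_target
rto_met = rto <= rto_target
```

In this timeline the replication lag keeps the data loss within the RPO target, while the 20-minute recovery misses the 15-minute RTO target.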
1. Basic knowledge of data replication – Replication copies data from one source to one or more targets. Depending on where in the system stack it is implemented, it can occur at the storage‑hardware layer, the operating‑system layer, or the database layer.
Synchronous replication: also called full‑sync replication. Each I/O write must complete on both source and target before the next one proceeds, so data loss is minimal, but production performance suffers unless the target is geographically close.
Asynchronous replication: the next I/O write proceeds without waiting for the data to reach the target, so the target lags behind the source, but the impact on production systems is small.
Semi‑synchronous replication: the primary waits until at least one replica has written the change to its relay log before responding. This is safer than asynchronous replication and adds a delay of at least one TCP round trip.
Serialized transmission replication: because network transmission requires binary data, objects are serialized into a reversible binary format before being sent.
All methods aim for non‑intrusive data capture that does not affect production workloads.
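The difference between the synchronous, semi‑synchronous, and asynchronous modes described above is what the replica holds at the moment the primary acknowledges a write. The toy model below is a sketch under simplifying assumptions: the `Primary` and `Replica` classes are hypothetical, there is no real network, and "crash" just means the primary stops before shipping anything further.

```python
class Replica:
    def __init__(self):
        self.relay_log = []  # records received from the primary
        self.data = []       # records actually applied

    def receive(self, record):
        self.relay_log.append(record)

    def apply_pending(self):
        # Apply every received record that has not been applied yet.
        self.data.extend(self.relay_log[len(self.data):])


class Primary:
    def __init__(self, replica, mode):
        self.replica = replica
        self.mode = mode
        self.data = []
        self.unsent = []  # async mode only: records waiting to be shipped

    def write(self, record):
        self.data.append(record)
        if self.mode == "sync":
            # Wait until the replica has received AND applied the record.
            self.replica.receive(record)
            self.replica.apply_pending()
        elif self.mode == "semi-sync":
            # Wait only until the record is in the replica's relay log.
            self.replica.receive(record)
        elif self.mode == "async":
            # Acknowledge immediately; shipping happens later.
            self.unsent.append(record)
        else:
            raise ValueError(f"unknown mode: {self.mode}")
        return "ack"

    def ship(self):
        """Background shipping step used in async mode."""
        while self.unsent:
            self.replica.receive(self.unsent.pop(0))


def replica_state_at_crash(mode):
    """Write three records, then crash the primary before any background work."""
    replica = Replica()
    primary = Primary(replica, mode)
    for record in ("w1", "w2", "w3"):
        primary.write(record)
    return len(replica.relay_log), len(replica.data)


results = {mode: replica_state_at_crash(mode) for mode in ("sync", "semi-sync", "async")}
```

In this model `results` is `{"sync": (3, 3), "semi-sync": (3, 0), "async": (0, 0)}`: the synchronous replica has applied every acknowledged write, the semi‑synchronous replica has them in its relay log and can still apply them, and the asynchronous replica has lost all three.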
2. Series of data‑replication technologies
A. Storage‑hardware‑layer replication mirrors data directly between disk arrays, driven by the arrays' own firmware or operating system, over IP or fiber links, in synchronous or asynchronous mode. Advantages: no server CPU overhead, suitable for mission‑critical and high‑end transaction scenarios. Disadvantages: limited to homogeneous storage, requires low‑latency, high‑bandwidth links, and is costly for remote replication.
B. Operating‑system‑layer replication includes byte‑level and block‑level techniques. Byte‑level captures file‑system I/O operations in real time, generating serialized logs that are replayed on the target, offering fine granularity and low resource usage. Block‑level captures changes at the disk‑block level, suitable for large‑file or non‑standard file systems, but with coarser granularity.
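To make the granularity trade-off of block‑level replication concrete, the sketch below finds changed blocks by comparing per‑block hashes and ships whole blocks. It is only an illustration: real block‑level tools typically intercept writes rather than rescanning data, and the function names and the tiny 4‑byte block size are assumptions made for readability.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, to keep the example readable


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Return {block index: new block contents} for blocks that differ or are new."""
    old_hashes = block_hashes(old, block_size)
    changes = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        if index >= len(old_hashes) or hashlib.sha256(block).digest() != old_hashes[index]:
            changes[index] = block
    return changes


def apply_blocks(old: bytes, changes: dict, new_len: int,
                 block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new contents on the target from its old copy plus changed blocks."""
    buf = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for index, block in changes.items():
        start = index * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


old_disk = b"AAAABBBBCCCC"
new_disk = b"AAAAbBBBCCCCDD"  # one byte changed in block 1, two bytes appended
delta = changed_blocks(old_disk, new_disk)
rebuilt = apply_blocks(old_disk, delta, len(new_disk))
```

Here a single changed byte causes the whole of block 1 to be shipped, which is the coarser granularity mentioned above; a byte‑level approach would log only the write that touched that byte.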
C. Database‑layer replication typically uses logical replication: redo and archive logs are parsed into SQL statements and replayed on the target, enabling cross‑vendor replication, read/write splitting, and high‑availability scenarios.
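The replay half of logical replication can be sketched with Python's built‑in `sqlite3` module. The change records below stand in for entries already parsed out of a source database's logs; their dictionary layout, the `accounts` table, and the `to_sql` helper are hypothetical, and parsing a real redo or archive log is vendor‑specific and not shown.

```python
import sqlite3


def to_sql(change: dict):
    """Turn one parsed change record into a parameterized SQL statement.

    Table and column names come from the trusted change record; values are
    always passed as parameters.
    """
    table = change["table"]
    if change["op"] == "insert":
        columns = list(change["row"])
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        return sql, [change["row"][c] for c in columns]
    if change["op"] == "update":
        assignments = ", ".join(f"{c} = ?" for c in change["row"])
        sql = f"UPDATE {table} SET {assignments} WHERE id = ?"
        return sql, [*change["row"].values(), change["id"]]
    if change["op"] == "delete":
        return f"DELETE FROM {table} WHERE id = ?", [change["id"]]
    raise ValueError(f"unknown operation: {change['op']}")


# Hypothetical change records, in commit order.
changes = [
    {"op": "insert", "table": "accounts", "row": {"id": 1, "owner": "alice", "balance": 100}},
    {"op": "insert", "table": "accounts", "row": {"id": 2, "owner": "bob", "balance": 50}},
    {"op": "update", "table": "accounts", "id": 1, "row": {"balance": 70}},
    {"op": "delete", "table": "accounts", "id": 2},
]

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")

with target:  # replay all changes in one transaction so the target stays consistent
    for change in changes:
        sql, params = to_sql(change)
        target.execute(sql, params)

rows = target.execute("SELECT id, owner, balance FROM accounts ORDER BY id").fetchall()
```

Because the target only ever sees SQL, it does not have to be the same product as the source, which is what makes cross‑vendor replication possible at this layer.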
3. Development trends of data‑replication technology
A. Software‑hardware decoupled replication breaks the traditional binding between replication tools and specific storage or database products, allowing cross‑vendor data movement and supporting system upgrades, tiered storage, and domestic‑technology adoption.
B. Cloud‑based replication leverages the high‑efficiency, low‑maintenance, multi‑center nature of cloud computing. Cloud replication must cope with narrower bandwidth and unstable links, which calls for stronger compression and resumable transfer, while also addressing data privacy, encryption, and masking, especially in public‑cloud environments.
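Compression combined with resume‑after‑failure can be sketched as an offset‑based transfer: the sender compresses once, and after every dropped connection it asks the receiver how many bytes it already holds and continues from there. The `Receiver` and `FlakyLink` classes and the failure schedule are hypothetical stand‑ins for a real network endpoint and an unstable link.

```python
import zlib


class Receiver:
    def __init__(self):
        self.buffer = bytearray()

    def offset(self) -> int:
        """Number of payload bytes safely received so far (the resume point)."""
        return len(self.buffer)

    def accept(self, offset: int, chunk: bytes) -> None:
        if offset != len(self.buffer):
            raise ValueError("chunk does not start at the expected offset")
        self.buffer.extend(chunk)

    def result(self) -> bytes:
        return zlib.decompress(bytes(self.buffer))


class FlakyLink:
    """Simulated unstable link that drops the connection on chosen send attempts."""

    def __init__(self, receiver: Receiver, fail_on_sends):
        self.receiver = receiver
        self.fail_on_sends = set(fail_on_sends)
        self.sends = 0

    def send(self, offset: int, chunk: bytes) -> None:
        self.sends += 1
        if self.sends in self.fail_on_sends:
            raise ConnectionError("link dropped")
        self.receiver.accept(offset, chunk)


def replicate(data: bytes, link: FlakyLink, receiver: Receiver,
              chunk_size: int = 64, max_retries: int = 5) -> int:
    """Send compressed data in chunks, resuming from the receiver's offset on failure."""
    payload = zlib.compress(data)
    retries = 0
    while receiver.offset() < len(payload):
        offset = receiver.offset()  # never resend bytes the receiver already has
        try:
            link.send(offset, payload[offset:offset + chunk_size])
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
    return retries


data = bytes(range(256)) * 8
receiver = Receiver()
link = FlakyLink(receiver, fail_on_sends={2, 5})
retries = replicate(data, link, receiver)
restored = receiver.result()
```

The transfer survives both simulated drops and `restored` equals `data`. Encryption and masking, which the text also calls for in public‑cloud settings, would be applied to the payload before it is chunked and are not shown here.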
C. Real‑time replication for big‑data platforms such as Hadoop/HDFS requires specialized mechanisms because traditional database replication cannot directly handle distributed file systems; emerging solutions aim to enable real‑time data flow between traditional databases and big‑data ecosystems.
4. Heterogeneous replication scenarios include file‑level migration across different servers, OSes, NAS, or object storage; database‑level migration via Kafka or direct copy; whole‑machine migration combining byte‑ and block‑level techniques; and HDFS migration, which remains challenging for real‑time disaster recovery.
For more detailed information, refer to the original white paper and the linked technical resources.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.