Incremental File Synchronization Using Optimized rsync and zsync Techniques
The article proposes an incremental synchronization scheme for Meituan’s Rhino Cloud Disk that shifts rsync‑style signature and delta generation to the client, leverages zsync‑like HTTP range fetching, and discusses challenges such as block fragmentation, low locality in binary files, server‑side merge overhead, and future optimizations like variable‑size chunking and Rabin fingerprint‑based content‑defined chunking.
Background
Meituan’s internal file collaboration platform, Rhino Cloud Disk, relies heavily on efficient file synchronization. The goal is to accelerate content transfer (upload/download) using incremental techniques.
rsync Incremental Transfer Algorithm
First published in 1996 by Andrew Tridgell and Paul Mackerras, rsync detects differences by dividing the old file into fixed‑size blocks and computing, for each block, a weak rolling checksum (inspired by Adler‑32) and a strong hash (MD4 in the original paper; MD5 in later rsync versions). The new file is then scanned with a sliding window of the same length: a window whose weak and strong checksums both match a block is identical to that block, and the data between matches forms the delta.
The algorithm’s two main features are the combination of fixed‑size block checksums with a sliding‑window scan, and the two‑level comparison: the weak checksum can be updated in constant time as the window moves by one byte, so the expensive strong hash is computed only for windows whose weak checksum already matches.
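To make the "rolling" property concrete, here is a minimal Python sketch of the weak checksum described in the rsync paper: two 16‑bit sums a and b, combined as a + 2^16·b, which can be updated in constant time when the window slides by one byte. The function names are illustrative only, not rsync's own API.

```python
M = 1 << 16  # both sums are kept modulo 2^16, as in the rsync paper


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) sums of the rsync-style weak checksum from scratch."""
    n = len(block)
    a = sum(block) % M
    # b weights each byte by its distance from the end of the window (n, n-1, ..., 1).
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def combine(a: int, b: int) -> int:
    """Pack the two 16-bit sums into one 32-bit value: a + 2^16 * b."""
    return a + (b << 16)


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte right in O(1): drop out_byte, add in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M  # uses the already-updated a
    return a, b
```

Rolling costs a few arithmetic operations per byte regardless of block size, which is what makes checking every offset of the new file affordable.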
rsync Tool Workflow
Typical usage on UNIX‑like systems involves three steps between host A (new file) and host B (old file):
B generates a signature (sign) file from the old file and sends it to A.
A compares its new file with the sign file, creates a delta file containing block indices and new data, and sends the delta to B.
B applies the delta to the old file to reconstruct the new file.
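Because the server takes part in every synchronization, generating signatures or computing and applying deltas for each peer, this workflow can overload a server when many clients synchronize at once.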
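The three steps (sign, delta, patch) can be sketched end to end in Python. This is a simplified, in‑memory illustration under stated assumptions, not rsync's wire format: it uses `zlib.adler32` as the weak checksum and recomputes it at every offset for brevity (real rsync rolls it), MD5 as the strong hash, and a hypothetical delta representation of `("copy", block_index)` and `("data", literal_bytes)` tuples.

```python
import hashlib
import zlib

Signature = dict[int, list[tuple[bytes, int]]]  # weak -> [(strong, block index)]


def make_signature(old: bytes, block_size: int) -> Signature:
    """Step 1 (host B): checksum every fixed-size block of the old file."""
    sig: Signature = {}
    for index, start in enumerate(range(0, len(old), block_size)):
        block = old[start:start + block_size]
        strong = hashlib.md5(block).digest()
        sig.setdefault(zlib.adler32(block), []).append((strong, index))
    return sig


def compute_delta(new: bytes, sig: Signature, block_size: int) -> list:
    """Step 2 (host A): slide a window over the new file and look up each position."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        # Shorter near the end, so it can still match a short final block.
        window = new[pos:pos + block_size]
        hit = None
        candidates = sig.get(zlib.adler32(window))
        if candidates:  # cheap weak hit; confirm with the strong hash
            strong = hashlib.md5(window).digest()
            hit = next((idx for s, idx in candidates if s == strong), None)
        if hit is None:
            literal.append(new[pos])  # no match: this byte becomes literal data
            pos += 1
        else:
            if literal:
                delta.append(("data", bytes(literal)))
                literal.clear()
            delta.append(("copy", hit))
            pos += len(window)
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block_size: int) -> bytes:
    """Step 3 (host B): rebuild the new file from old blocks and literal data."""
    out = bytearray()
    for op, arg in delta:
        if op == "copy":
            out += old[arg * block_size:(arg + 1) * block_size]
        else:
            out += arg
    return bytes(out)
```

Matched blocks cost only an index in the delta, while unmatched bytes travel verbatim, so the savings depend directly on how many blocks match.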
zsync Tool Workflow
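zsync adapts rsync for HTTP‑based distribution of large, rarely changed files (e.g., Ubuntu ISO images). The server publishes a sign file describing the new file; clients holding an old version download it, compare it with their local copy to find the blocks they already have, and retrieve only the missing blocks via HTTP range requests. The server therefore serves static files and does no per‑client computation.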
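After scanning its old copy against the sign file, a zsync‑style client knows which blocks of the new file it still lacks. The sketch below turns those block indices into one HTTP `Range` header value, merging adjacent blocks so the request stays short. It assumes fixed‑size blocks and that the list of needed indices has already been computed; the function names are hypothetical.

```python
def missing_ranges(needed: list[int], block_size: int,
                   file_size: int) -> list[tuple[int, int]]:
    """Coalesce needed block indices into inclusive (first_byte, last_byte) ranges."""
    ranges: list[tuple[int, int]] = []
    for idx in sorted(set(needed)):
        start = idx * block_size
        end = min(start + block_size, file_size) - 1  # the last block may be short
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return ranges


def range_header(ranges: list[tuple[int, int]]) -> str:
    """Format ranges as an HTTP Range header value, e.g. 'bytes=0-7,12-13'."""
    if not ranges:
        raise ValueError("nothing to fetch")
    return "bytes=" + ",".join(f"{first}-{last}" for first, last in ranges)
```

Range offsets in HTTP are inclusive. When a request names several ranges, the server answers with a `multipart/byteranges` body that the client must split back into blocks.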
Proposed Cloud Disk Incremental Sync Scheme
The design moves both sign and delta generation to the PC client, limiting the server's role to merging patches and storing sign and delta files. If the hit rate (the share of the file matched against existing blocks) is low, the client may fall back to a full transfer.
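The fallback rule can be written as a small policy function. This sketch measures the hit rate as the fraction of new‑file bytes reproduced from existing blocks; both that definition and the 0.5 threshold are hypothetical choices for illustration, not values from the original design.

```python
from dataclasses import dataclass


@dataclass
class DeltaStats:
    copied_bytes: int   # new-file bytes reproduced from matched old blocks
    literal_bytes: int  # new-file bytes that must be sent verbatim


def hit_rate(stats: DeltaStats) -> float:
    """Fraction of the new file covered by matched blocks (0.0 for an empty file)."""
    total = stats.copied_bytes + stats.literal_bytes
    return stats.copied_bytes / total if total else 0.0


def choose_transfer(stats: DeltaStats, threshold: float = 0.5) -> str:
    """Return 'incremental' when enough of the file matched, otherwise 'full'."""
    return "incremental" if hit_rate(stats) >= threshold else "full"
```

A production policy might also weigh the size of the delta's own metadata, since a low hit rate means the client pays for signatures and matching without saving much bandwidth.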
Key design points:
Client‑side computation of sign and delta reduces server load.
Incremental download follows the zsync model, with client‑side processing.
Browser‑based sync is not feasible due to limited processing capability.
Server stores sign files, delta files, and merged new files to ensure fallback paths.
Remaining Issues
Fragmented blocks caused by the sliding‑window approach; reducing block size improves hit rate but increases sign size and computation.
Large binary formats (JPEG, video) exhibit low locality, making incremental sync inefficient.
Server‑side patch merging is resource‑intensive, requiring delta reception, old‑file retrieval, merging, and upload of the new file.
Potential optimizations include streaming merge processing and asynchronous patch handling with client notifications.
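A streaming merge keeps server memory bounded by treating the delta as a sequence of operations and copying matched blocks from the old file in small buffers, instead of loading either file whole. This Python sketch assumes a seekable old file, a hypothetical `("copy", block_index)` / `("data", bytes)` operation format, and an arbitrary 64 KiB buffer size.

```python
import io
from typing import BinaryIO, Iterable, Tuple, Union

Op = Tuple[str, Union[int, bytes]]  # ("copy", block_index) or ("data", literal_bytes)


def stream_merge(old: BinaryIO, ops: Iterable[Op], out: BinaryIO,
                 block_size: int, buf_size: int = 64 * 1024) -> int:
    """Apply delta ops to a seekable old file, writing the result to `out`.

    Memory use is bounded by buf_size plus one literal record, not by file size.
    Returns the number of bytes written.
    """
    written = 0
    for op, arg in ops:
        if op == "copy":
            old.seek(arg * block_size)
            remaining = block_size
            while remaining:
                chunk = old.read(min(buf_size, remaining))
                if not chunk:  # reached EOF: the final block is shorter
                    break
                out.write(chunk)
                written += len(chunk)
                remaining -= len(chunk)
        elif op == "data":
            out.write(arg)
            written += len(arg)
        else:
            raise ValueError(f"unknown delta op {op!r}")
    return written


if __name__ == "__main__":
    def incoming_ops():
        # Stand-in for records decoded one at a time from a network stream.
        yield ("data", b"new first line\n")
        yield ("copy", 0)

    new_file = io.BytesIO()
    stream_merge(io.BytesIO(b"old body"), incoming_ops(), new_file, block_size=8)
    print(new_file.getvalue())
```

Because `ops` can be a generator, the server can start writing the merged file while the delta is still arriving, which also fits the asynchronous variant where the client is notified once the merge completes.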
Future Optimization Items
Store original file length in sign files and use fixed‑size length fields in delta files for simpler processing.
Apply format‑aware variable‑size chunking (e.g., splitting OpenXML files such as .docx and .xlsx at their ZIP entry boundaries) to improve hit rates.
Explore content‑defined chunking (CDC) using Rabin fingerprints to create adaptive block boundaries, balancing chunk size and detection accuracy.
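For the item on fixed‑size length fields, the benefit is that a reader can parse a delta file sequentially without variable‑length decoding. The layout below (a one‑byte tag followed by an 8‑byte big‑endian field, then the literal payload for data records) is a hypothetical format for illustration, not the one used by Rhino Cloud Disk.

```python
import struct

COPY, DATA = 0x01, 0x02
RECORD = struct.Struct(">BQ")  # 1-byte tag + 8-byte big-endian field, 9 bytes, no padding


def encode_delta(ops: list) -> bytes:
    """Serialize ("copy", index) / ("data", bytes) ops into fixed-size records."""
    parts = []
    for op, arg in ops:
        if op == "copy":
            parts.append(RECORD.pack(COPY, arg))       # field = block index
        else:
            parts.append(RECORD.pack(DATA, len(arg)))  # field = payload length
            parts.append(arg)
    return b"".join(parts)


def decode_delta(blob: bytes) -> list:
    """Parse records sequentially; every header is exactly RECORD.size bytes."""
    ops, pos = [], 0
    while pos < len(blob):
        tag, field = RECORD.unpack_from(blob, pos)
        pos += RECORD.size
        if tag == COPY:
            ops.append(("copy", field))
        elif tag == DATA:
            ops.append(("data", blob[pos:pos + field]))
            pos += field
        else:
            raise ValueError(f"bad tag {tag:#x} at offset {pos - RECORD.size}")
    return ops
```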
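For format‑aware chunking of OpenXML documents (.docx, .xlsx, .pptx), which are ZIP containers, a chunker can cut at member boundaries so that editing one part of a document leaves the other members' chunks byte‑identical. This sketch uses Python's `zipfile` module to find each member's local header offset and reads the central directory offset from the end‑of‑central‑directory record. It assumes a plain archive without ZIP64 records or a prepended stub; the function names are hypothetical.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory record signature


def zip_chunk_boundaries(data: bytes) -> list[int]:
    """Cut points at each member's local header, the central directory, and EOF."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        cuts = {info.header_offset for info in zf.infolist()}
    eocd = data.rfind(EOCD_SIG)
    # In the EOCD record, the 4-byte little-endian central directory offset sits at byte 16.
    (cd_offset,) = struct.unpack_from("<I", data, eocd + 16)
    cuts.update({0, cd_offset, len(data)})
    return sorted(cuts)


def zip_chunks(data: bytes) -> list[bytes]:
    """Split an archive into one chunk per member plus the central directory."""
    bounds = zip_chunk_boundaries(data)
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]
```

The central directory records every member's offset, so it changes whenever an earlier member changes size; keeping it in its own chunk confines that churn to one small chunk.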
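Content‑defined chunking places boundaries where a hash of the last few bytes matches a pattern, so an insertion disturbs only nearby chunks instead of shifting every later fixed‑size block. The sketch below uses a Rabin-Karp style polynomial rolling hash modulo a prime as a stand‑in for true Rabin fingerprints (which use irreducible polynomials over GF(2)); the window, mask, and minimum/maximum chunk sizes are hypothetical tuning values.

```python
WINDOW = 48                      # bytes that determine each boundary decision
BASE = 257
MOD = (1 << 61) - 1              # large prime modulus
MASK = (1 << 12) - 1             # cut when the low 12 bits are zero
MIN_CHUNK, MAX_CHUNK = 1024, 16384


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a rolling polynomial hash."""
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # remove the oldest byte
        h = (h * BASE + byte) % MOD                   # append the newest byte
        length = i + 1 - start
        # The hash depends only on the last WINDOW bytes, so boundaries
        # re-align with the unmodified file shortly after an edit.
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

The minimum size prevents tiny chunks, the maximum caps the worst case, and the mask sets the average spacing of boundaries (roughly 2^12 bytes past the minimum on random input). Tuning these is the chunk‑size versus detection‑accuracy balance the item describes.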
The original article cites the rsync paper, detailed explanations of the algorithm, the zsync design, Rabin fingerprinting, and related research.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
