How to Quickly Sync Massive Third-Party Data with SFTP, Jobs, and Rate-Limiting
Learn a step‑by‑step strategy to synchronize millions of records from dozens of third‑party systems across all provinces, using SFTP for bulk transfers, standardized file formats, scheduled jobs, Redis‑based rate limiting, MQ‑driven ingestion, and data‑consistency checks, while addressing security and performance challenges.
Preface
A colleague asked how to quickly synchronize data from third‑party platforms covering all 34 provinces and multiple systems with both full and incremental data.
The challenges include:
Inability to directly access third‑party databases.
Risk of data leakage if exporting historical data to Excel.
How to quickly sync historical data.
How to handle incremental data.
Whether the interface needs rate limiting.
How to verify data consistency for incremental data.
1. How to Quickly Sync Historical Data?
Direct database access is not possible, and exporting to Excel is insecure. The solution is to use SFTP for secure file transfer.
2. How to Use SFTP?
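SFTP is a secure file-transfer protocol that runs over SSH (TCP port 22 by default), so credentials and file contents are encrypted in transit. Traditional FTP, by contrast, sends its commands over an unencrypted control connection on TCP port 21 and opens separate connections for the file data. The two differ in connection method, security, transfer efficiency, and underlying protocol.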
To implement SFTP synchronization, plan accounts, directories, and file formats.
2.1 Account and Permission Control
Set up an SFTP server with a public domain and port, create a root directory /data, and add one sub-directory per province (e.g., /data/sichuan, /data/guangdong, /data/beijing). Give each sub-directory a dedicated account whose read/write permissions cover only that path, and use strong, random passwords.
Write access is what a third party needs to upload its files. Read access is mainly for troubleshooting and should be granted only where it is actually needed.
Also create a read‑only account for internal use covering the entire /data directory.
2.2 Unified Data Format
Define a uniform file naming convention: the province name in pinyin, an underscore, and the date as YYYYMMDD, with a .txt extension (e.g., sichuan_20230724.txt). Inside the txt file, use fixed-width fields (e.g., id 20 chars, name 30 chars, amount 10 chars) and pad shorter values to the full width, for example with leading zeros. Parsing then reads each line and extracts the columns by position and length.
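The parsing step can be sketched as follows. This is a minimal illustration, not the original author's code: the field widths come from the example above, while the padding rules (leading zeros for id and amount, trailing spaces for name), the integer amount, and the names FIELDS, Record, and parse_line are assumptions made for the example.

```python
from typing import NamedTuple

# Widths from the example in the text: id 20, name 30, amount 10.
FIELDS = (("id", 20), ("name", 30), ("amount", 10))
LINE_WIDTH = sum(width for _, width in FIELDS)


class Record(NamedTuple):
    id: str
    name: str
    amount: int


def parse_line(line: str) -> Record:
    """Slice one fixed-width line into its columns."""
    line = line.rstrip("\r\n")
    if len(line) != LINE_WIDTH:
        raise ValueError(f"expected {LINE_WIDTH} chars, got {len(line)}")
    values = {}
    offset = 0
    for field, width in FIELDS:
        values[field] = line[offset:offset + width]
        offset += width
    return Record(
        # Assumption: ids and amounts are left-padded with zeros.
        id=values["id"].lstrip("0") or "0",
        # Assumption: names are right-padded with spaces.
        name=values["name"].strip(),
        amount=int(values["amount"]),
    )


# Build a sample line the way a sender would, then parse it back.
sample = "42".zfill(20) + "Zhang San".ljust(30) + "1500".zfill(10)
record = parse_line(sample)
print(record)
```

Rejecting lines of the wrong length up front makes a malformed file fail loudly instead of silently shifting every column. Note that the widths here count characters; if the agreed format counts bytes, the slicing would have to operate on encoded bytes instead.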
2.3 Using Jobs to Sync Data
When third‑party systems place their historical txt files in the designated directories, a job reads all txt files under /data, parses the fixed‑width records, applies business logic, and writes them to the database. The job can be multithreaded for speed.
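A rough sketch of such a job is shown below, using a thread pool with one task per file. The directory layout follows section 2.1 and the file names follow section 2.2; the functions load_file and run_job, the worker count, and the placeholder where parsing and database writes would go are all hypothetical. The demo runs against a temporary directory rather than a real /data tree.

```python
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Matches names such as sichuan_20230724.txt.
FILE_PATTERN = re.compile(r"^([a-z]+)_(\d{8})\.txt$")


def load_file(path: Path) -> tuple[str, int]:
    """Process one province file and return (province, record count)."""
    match = FILE_PATTERN.match(path.name)
    if match is None:
        raise ValueError(f"unexpected file name: {path.name}")
    province = match.group(1)
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                # Placeholder: parse the line, apply business logic,
                # and batch-insert into the database here.
                count += 1
    return province, count


def run_job(root: Path, workers: int = 4) -> dict[str, int]:
    """Process every well-named txt file under root/<province>/ in parallel."""
    files = sorted(p for p in root.glob("*/*.txt") if FILE_PATTERN.match(p.name))
    totals: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for province, count in pool.map(load_file, files):
            totals[province] = totals.get(province, 0) + count
    return totals


# Demo against a throwaway directory that mimics /data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for province, rows in (("sichuan", 3), ("beijing", 2)):
        (root / province).mkdir()
        target = root / province / f"{province}_20230724.txt"
        target.write_text(("x" * 60 + "\n") * rows, encoding="utf-8")
    demo_totals = run_job(root)

print(demo_totals)
```

Splitting the work per file keeps the threads independent. Because the real work is mostly file and database I/O, threads can help even in Python, but the pool size should stay within what the database connection pool can absorb.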
3. How to Handle Incremental Data?
Incremental data requires real‑time processing, so a file‑based SFTP approach is insufficient. Provide a unified data‑reporting API that accepts batch uploads (e.g., up to 500 records per request).
Implement rate limiting using Redis: allow each third‑party system up to 10 calls per second, and a total of 500 calls per second across all systems. If the limit is exceeded, return a “request too frequent” error.
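One common way to build this on Redis is a fixed-window counter: increment a key that is unique per caller and per second, and reject the call once the counter passes the limit. The sketch below assumes that approach; the article does not say which algorithm was used. RateLimiter and the key names are hypothetical, and FakeRedis is an in-memory stand-in so the example runs without a server. A real redis-py client can be passed instead, since only its incr and expire methods are used.

```python
import time


class FakeRedis:
    """In-memory stand-in that mimics only incr and expire."""

    def __init__(self):
        self.counters = {}

    def incr(self, key):
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]

    def expire(self, key, seconds):
        # Expiry is not simulated; every key is tied to one second anyway.
        return True


class RateLimiter:
    def __init__(self, client, per_system=10, total=500):
        self.client = client
        self.per_system = per_system  # calls per second for one system
        self.total = total            # calls per second across all systems

    def _hit(self, key, limit):
        count = self.client.incr(key)
        if count == 1:
            # First hit in this window: let the key expire shortly after it.
            self.client.expire(key, 2)
        return count <= limit

    def allow(self, system_id, now=None):
        """Return True if the call is within both limits."""
        second = int(time.time() if now is None else now)
        if not self._hit(f"rl:{system_id}:{second}", self.per_system):
            return False
        return self._hit(f"rl:all:{second}", self.total)


limiter = RateLimiter(FakeRedis())
results = [limiter.allow("sichuan", now=1000) for _ in range(12)]
print(results.count(True), results.count(False))
print(limiter.allow("sichuan", now=1001))
```

When allow returns False, the API would respond with the "request too frequent" error. Two simplifications are worth knowing about: incr followed by expire is not atomic, so a production version would typically wrap both in a Lua script or a transaction, and a fixed window lets a short burst straddle two adjacent seconds.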
To improve write performance, the API should enqueue received data to a message queue (e.g., RocketMQ) instead of writing directly to the database. Consumers read from the queue and persist data asynchronously.
If a consumer fails, enable automatic retries (up to 3 times) and, after repeated failures, move the message to a dead‑letter queue for manual inspection.
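The retry and dead-letter flow can be illustrated without a broker. The sketch below uses an in-memory deque in place of the message queue purely to show the control flow; RocketMQ provides retries and a dead-letter queue itself, so in practice this is configured rather than hand-written. The consume function, the message shape, and the reading of "up to 3 times" as one initial attempt plus three retries are assumptions.

```python
from collections import deque

MAX_RETRIES = 3  # assumption: retries allowed after the first attempt


def consume(messages, handler, max_retries=MAX_RETRIES):
    """Run handler on each message; return the messages that never succeeded."""
    pending = deque((msg, 0) for msg in messages)  # (message, retries so far)
    dead_letters = []
    while pending:
        msg, retries = pending.popleft()
        try:
            handler(msg)
        except Exception as exc:
            if retries < max_retries:
                # Put the message back at the end of the queue for another try.
                pending.append((msg, retries + 1))
            else:
                # Retries exhausted: park it for manual inspection.
                dead_letters.append((msg, repr(exc)))
    return dead_letters


saved = []
attempts = {}


def save_to_db(msg):
    """Hypothetical persistence step with one flaky and one broken record."""
    attempts[msg["id"]] = attempts.get(msg["id"], 0) + 1
    if msg["id"] == "bad":
        raise ValueError("cannot persist")
    if msg["id"] == "flaky" and attempts["flaky"] < 3:
        raise TimeoutError("temporary failure")
    saved.append(msg["id"])


dead = consume([{"id": "ok"}, {"id": "flaky"}, {"id": "bad"}], save_to_db)
print(saved)
print(dead)
```

Because a message may be delivered more than once, the persistence step should be idempotent, for example an upsert keyed on the record ID, so that a retry after a partial failure does not create duplicates.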
4. How to Verify Data Consistency?
Historical data consistency can be checked by comparing the txt files uploaded via SFTP with the database records.
For incremental data, require third‑party systems to generate a nightly txt dump of the previous day's increments and upload it to SFTP. A nightly job (e.g., at 1 am) reads this file and compares each record with the database.
For each ID in the file: if the database record was modified today, skip it, because it has changed again since the dump and will be checked in the next day's run. If the database record was last modified yesterday, compare the fields and, where they differ, overwrite the database record with the data from the txt file. IDs that do not yet exist in the database are inserted.
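The comparison rules can be sketched as a single function. Here the database is modelled as a plain dict keyed by ID, and the record shape (id, modified_at, fields) and the name reconcile are hypothetical. The text does not say what to do with records last modified before yesterday, so this sketch treats every record not modified today the same way and compares its fields.

```python
from datetime import date, datetime


def reconcile(file_records, db, today):
    """Apply the nightly comparison rules; return the IDs touched per action."""
    actions = {"inserted": [], "overwritten": [], "skipped": [], "matched": []}
    for rec in file_records:
        current = db.get(rec["id"])
        if current is None:
            # Unknown ID: insert the record from the file.
            db[rec["id"]] = dict(rec)
            actions["inserted"].append(rec["id"])
        elif current["modified_at"].date() == today:
            # Changed again today: leave it for the next day's run.
            actions["skipped"].append(rec["id"])
        elif current["fields"] == rec["fields"]:
            actions["matched"].append(rec["id"])
        else:
            # Fields differ: the file is treated as the source of truth.
            db[rec["id"]] = dict(rec)
            actions["overwritten"].append(rec["id"])
    return actions


today = date(2023, 7, 25)
yesterday_noon = datetime(2023, 7, 24, 12, 0)
db = {
    "1": {"id": "1", "modified_at": yesterday_noon, "fields": {"amount": 100}},
    "2": {"id": "2", "modified_at": yesterday_noon, "fields": {"amount": 200}},
    "3": {"id": "3", "modified_at": datetime(2023, 7, 25, 0, 30), "fields": {"amount": 999}},
}
dump = [
    {"id": "1", "modified_at": yesterday_noon, "fields": {"amount": 100}},
    {"id": "2", "modified_at": yesterday_noon, "fields": {"amount": 250}},
    {"id": "3", "modified_at": yesterday_noon, "fields": {"amount": 300}},
    {"id": "4", "modified_at": yesterday_noon, "fields": {"amount": 400}},
]
report = reconcile(dump, db, today)
print(report)
```

Returning the IDs per action gives the job something to log or alert on. A large number of overwritten or inserted records would suggest that the real-time path is dropping data.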
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
