Cutting WeChat SQLite Corruption in Half: Strategies and Lessons
Facing a 0.02% SQLite corruption rate that threatened years of chat history, the WeChat mobile team identified three main causes—insufficient space, power loss, and sync failures—and implemented space management, full sync settings, and master‑table backups, halving damage and doubling repair success.
1. Introduction
WeChat does not store chat records on backend servers; all records reside in a local SQLite database on the mobile client. When this database becomes corrupted, years of chat history can be lost. The observed corruption rate in production is 0.02% (2 out of 10,000 users), and the official repair algorithm succeeds only about 30% of the time.
2. SQLite Corruption Causes and Optimizations
While SQLite’s official documentation lists many possible corruption sources, analysis of real cases on mobile devices shows that only three factors dominate:
Insufficient storage space
Device power loss
File sync failures
We address each of these causes directly.
2.1 Optimize Space Usage
We introduced proactive space‑management policies: files must be requested before use, each file declares an explicit lifetime (day, week, month, or permanent), and expired files are automatically cleaned up.
Unrequested files are detected and removed.
All business files declare a validity period.
Expired files are cleared automatically.
For storage used by other apps (photos, videos, etc.), WeChat can only prompt users to clean up their device.
2.2 Optimize File Sync
2.2.1 synchronous = FULL
Set SQLite’s synchronous pragma to FULL so that each transaction is fully flushed to disk.
2.2.2 fullfsync = 1
On iOS, enable the fullfsync pragma (fullfsync=1) to guarantee that write order matches commit order, preventing corruption after sudden power loss.
2.3 Optimization Results
These combined measures reduced the corruption rate by more than half, though some corruption remains and repair success still needs improvement.
3. SQLite Repair Logic Optimization
3.1 master Table
SQLite stores each table’s metadata in the special sqlite_master table, which itself is a B‑tree. The entire database consists of a forest of B‑trees.
3.2 Low Success Rate of Official Repair Algorithm
The official algorithm reads each table’s schema from sqlite_master and copies data to a new database. If any node of the master table is corrupted, all tables under that node become unrecoverable; if the root page (page 0) is corrupted, the entire database cannot be repaired, explaining the low success rate.
3.3 Backup master Table
We introduced periodic backups of the master table using simple array serialization, avoiding B‑tree complexities.
3.3.1 Backup Timing
Analysis shows that the master table’s root node only changes when a table is created or dropped, allowing a straightforward timed polling strategy for backups.
3.3.2 Backup File Validity
To guard against backup file corruption, we maintain two backup files, each with a CRC header. On each backup, we overwrite the older or corrupted file, ensuring at least one valid copy. This mirrors the MVCC approach used by Realm.
When space is insufficient (a common corruption scenario), we pre‑allocate 32 KB for the backup file and grow it in 32 KB increments, enough for roughly 750 tables’ metadata.
3.4 Backup Results
Backing up the master table more than doubled the repair success rate.
4. Other
These reliability improvements, together with earlier concurrency optimizations, will be merged into the upcoming open‑source WCDB (WeChat Database) component, slated for release in mid‑2017.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Tencent TDS Service
TDS Service offers client and web front‑end developers and operators an intelligent low‑code platform, cross‑platform development framework, universal release platform, runtime container engine, monitoring and analysis platform, and a security‑privacy compliance suite.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
