How WeChat Solved SQLite Corruption: Inside WCDB’s Multi‑Layer Recovery Strategies
WeChat’s WCDB component tackles pervasive SQLite corruption on mobile devices by combining dump‑based, backup, and B‑tree parsing recovery methods, achieving up to 78% success while supporting encrypted databases, large data volumes, and seamless user experience.
Introduction
SQLite databases have long suffered from corruption issues across Android, iOS, Windows, and Linux. Since WeChat stores all messages locally and the server does not keep backups, a corrupted DB would erase user messages, which is unacceptable.
WeChat is about to open‑source its mobile database component WCDB (WeChat Database) to address data loss caused by DB corruption.
Our Requirements
High recovery success rate – aiming for 90%‑99% rather than a mere “try”.
Support encrypted DBs – WeChat on Android uses SQLCipher, where a single‑byte change can corrupt the whole decrypted view.
Handle massive data – some heavy users have DBs larger than 2 GB.
Zero impact on user experience – the corruption rate is below 0.01%, so any preparatory work (such as backups) must be invisible to users.
Over the years, WeChat has iterated through three recovery solutions, gradually approaching these goals.
Official Dump Recovery Scheme
The classic SQLite .dump command exports the entire database as SQL statements. It reads the sqlite_master table to generate CREATE TABLE statements, then iterates each table with SELECT * FROM … to produce INSERT statements. Even if a table is corrupted, the process can skip the error and continue, extracting undamaged tables and the readable portion of corrupted tables.
Because this runs entirely on top of SQLite, it naturally supports encrypted SQLCipher databases without extra handling.
The approach requires no preparation; only users with a broken DB spend a few minutes restoring, which most users do not notice. However, the success rate after launch was only about 30% – defined as restoring at least one record.
Failure analysis showed that when the first page of sqlite_master is corrupted, the whole DB becomes unreadable, leading to the low success rate.
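The dump-style loop described above can be sketched with Python's standard sqlite3 module. This is only an illustration of the idea, not WCDB's code: the function name salvage and the choice to write into a second database (instead of emitting SQL text) are assumptions made for the example, and table names are assumed not to contain double quotes.

```python
import sqlite3

def salvage(src_path, dst_path):
    """Copy every readable row from src into a fresh dst database."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    recovered = 0
    try:
        # If this first read of sqlite_master fails, nothing can be salvaged.
        tables = src.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchall()
        for name, create_sql in tables:
            dst.execute(create_sql)
            try:
                for row in src.execute(f'SELECT * FROM "{name}"'):
                    marks = ",".join("?" * len(row))
                    dst.execute(f'INSERT INTO "{name}" VALUES ({marks})', row)
                    recovered += 1
            except sqlite3.DatabaseError:
                # Corrupt page: keep the rows read so far, move to the next table.
                continue
        dst.commit()
    finally:
        src.close()
        dst.close()
    return recovered
```

Note that the very first query reads sqlite_master, so an error there propagates out of the function before any table is visited, which is the failure mode identified in the analysis.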
Backup Recovery Scheme
When data is irreparably damaged, the most straightforward solution is backup. SQLite offers several backup mechanisms:
Copy – Directly copy the DB file (main DB + journal or WAL).
Dump – Use .dump on a healthy DB to produce SQL statements for later restoration.
Backup API – SQLite’s built‑in page‑level hot backup.
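For reference, the Backup API option can be tried from Python, whose sqlite3.Connection.backup method wraps SQLite's page-level backup functions. The sketch below is illustrative only; the function name hot_backup and the step size are assumptions, and it is not the code used in the measurements.

```python
import sqlite3

def hot_backup(src_path, dst_path, pages_per_step=64):
    """Copy src to dst page by page while src stays usable."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Copies pages_per_step pages at a time until the whole DB is copied.
        src.backup(dst, pages=pages_per_step)
    finally:
        dst.close()
        src.close()
```

Because the copy works at page level, the backup is roughly as large as the source file unless it is compressed afterwards.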
For a mobile app, the key metrics are backup size, backup performance, and restore performance. We tested with a 50 MB DB containing ~100 k rows.
The most balanced choice was Dump + compression, which minimized backup size while offering acceptable backup speed; restore speed was slower but acceptable because restore scenarios are rare.
We further optimized this scheme by writing the dump in a custom binary format (avoiding costly SQL text formatting) and running gzip compression on a separate thread. The binary dump is also encrypted. The binary format also removes the need to compile SQL statements during restore. Together, these changes improved backup throughput by 150% and restore throughput by 40%.
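A minimal sketch of the Dump + compression idea, using Python's sqlite3 and gzip modules. It writes plain SQL text, not a binary format, and the function names dump_compressed and restore_compressed are assumptions chosen for the example.

```python
import gzip
import sqlite3

def dump_compressed(db_path, out_path):
    """Write the database as gzip-compressed SQL statements."""
    conn = sqlite3.connect(db_path)
    try:
        with gzip.open(out_path, "wt", encoding="utf-8") as out:
            # iterdump() yields the same kind of statements as the .dump command.
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()

def restore_compressed(dump_path, db_path):
    """Rebuild a database by replaying a compressed dump."""
    with gzip.open(dump_path, "rt", encoding="utf-8") as src:
        script = src.read()
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(script)
    finally:
        conn.close()
```

Restore has to parse and execute every statement again, which is one reason it is slower than the backup step.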
Even with these improvements, backing up very large DBs on a mobile device can be time‑ and power‑consuming, so WeChat schedules backups only when the device is charging and the screen is off.
After deployment, the backup‑based scheme achieved a 72% success rate, but the most data‑sensitive users still suffered losses, prompting a new approach.
B‑tree Parsing Recovery Scheme (RepairKit)
The backup approach’s high cost led us back to the dump method, which essentially tries to read whatever is still readable from a corrupted DB. Two outcomes are possible:
The DB’s basic format is intact but some pages are corrupted; SQLite returns SQLITE_CORRUPT for those pages, yet the readable data can be recovered.
The fundamental structure (file header or sqlite_master) is damaged, causing immediate SQLITE_CORRUPT and making recovery impossible.
Most failures fall into the second case, so we needed a way to bypass the reliance on sqlite_master. This system table stores each object's name, type, creation SQL, and root page number; its own root is always page 1. If sqlite_master is unreadable, we must supply its information from a backup.
Because sqlite_master is tiny and only changes when the schema changes, backing it up costs only a few milliseconds and can be refreshed whenever a DDL statement runs.
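Saving the schema table can be as small as the sketch below. The JSON output and the name backup_master are assumptions made for illustration; the on-disk format WCDB uses is not described here, and for an encrypted database the saved copy would need to be protected as well.

```python
import json
import sqlite3

COLUMNS = ("type", "name", "tbl_name", "rootpage", "sql")

def backup_master(db_path, out_path):
    """Save the rows of sqlite_master so root pages survive corruption."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT type, name, tbl_name, rootpage, sql FROM sqlite_master"
        ).fetchall()
    finally:
        conn.close()
    entries = [dict(zip(COLUMNS, row)) for row in rows]
    with open(out_path, "w", encoding="utf-8") as out:
        json.dump(entries, out)
    return len(entries)
```

Calling a function like this after each schema change keeps the saved root page numbers in step with the live database.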
We built a minimal read‑only system that implements only the necessary VFS functions (open/read/close), SQLCipher decryption, and B‑tree parsing logic, omitting the full SQLite query engine, journaling, and B‑tree balancing.
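To give a feel for what B-tree parsing without SQLite involves, here is a heavily simplified sketch that walks one table B-tree from a known root page using only file reads. It follows the published SQLite file format, but it is not RepairKit: the name walk_table is hypothetical, it handles only unencrypted files, and it counts leaf cells instead of decoding records or overflow pages.

```python
import struct

MAGIC = b"SQLite format 3\x00"
INTERIOR_TABLE, LEAF_TABLE = 0x05, 0x0D  # B-tree page type bytes

def walk_table(path, root_page):
    """Count leaf cells reachable from root_page; return (cells, bad_pages)."""
    with open(path, "rb") as f:
        header = f.read(100)
        if len(header) < 100 or header[:16] != MAGIC:
            raise ValueError("not a plaintext SQLite file")
        page_size = struct.unpack(">H", header[16:18])[0]
        if page_size == 1:  # the stored value 1 encodes 65536
            page_size = 65536
        if page_size < 512 or page_size & (page_size - 1):
            raise ValueError("invalid page size in header")
        cells, bad, seen = 0, [], set()
        stack = [root_page]
        while stack:
            pgno = stack.pop()
            if pgno < 1 or pgno in seen:
                continue  # ignore nonsense pointers and cycles
            seen.add(pgno)
            f.seek((pgno - 1) * page_size)
            page = f.read(page_size)
            if len(page) < page_size:
                bad.append(pgno)  # pointer past the end of the file
                continue
            base = 100 if pgno == 1 else 0  # page 1 begins with the file header
            kind = page[base]
            count = struct.unpack(">H", page[base + 3:base + 5])[0]
            if kind == LEAF_TABLE:
                cells += count
            elif kind == INTERIOR_TABLE:
                pointers = base + 12  # interior page header is 12 bytes
                if pointers + 2 * count > page_size:
                    bad.append(pgno)
                    continue
                for i in range(count):
                    start = pointers + 2 * i
                    off = struct.unpack(">H", page[start:start + 2])[0]
                    if off + 4 <= page_size:
                        # Each interior cell starts with a 4-byte child page number.
                        stack.append(struct.unpack(">I", page[off:off + 4])[0])
                # The right-most child pointer lives in the page header.
                stack.append(struct.unpack(">I", page[base + 8:base + 12])[0])
            else:
                bad.append(pgno)  # unexpected type byte: skip, keep walking
        return cells, bad
```

The root page number passed in is exactly the piece of information that a saved copy of sqlite_master provides when the live one is unreadable. A damaged page is recorded and skipped, so the rest of the tree is still visited.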
One subtle issue is that after an ALTER TABLE ADD COLUMN, existing B‑tree rows lack the new column; SQLite fills missing columns with default values during query. Our parser must detect and pad missing columns during restoration, otherwise inserts would fail.
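The padding step can be sketched as a small helper. The name pad_row and the idea of passing one default per current column are assumptions made for this illustration.

```python
def pad_row(values, defaults):
    """Extend a decoded record to the current column count.

    values   -- column values actually stored in the B-tree record
    defaults -- one default per column of the current schema (None for NULL)
    """
    if len(values) > len(defaults):
        raise ValueError("record has more columns than the schema")
    # Columns added later by ALTER TABLE are missing from old records,
    # so fill them from the schema defaults.
    return list(values) + list(defaults[len(values):])
```

For example, a two-column record read from a table that later gained a third column with default 0 would be padded from [1, "hi"] to [1, "hi", 0] before being inserted.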
After launch, the B-tree parsing scheme achieved roughly a 78% success rate (recovered pages ÷ total pages). It has low preparation cost, negligible backup overhead, and restores the most recent data without the time-lag of backup-based methods. However, if an interior (non-leaf) B-tree page is corrupted, the child pages it points to can no longer be located, so their data becomes unreadable.
Combining Different Schemes
Because the B‑tree parser and backup recovery have different failure modes, WeChat combines them to cover more corruption scenarios. Some data is temporary or can be fetched from the server, so it may be left unrepaired; other data that is critical or costly to recover must be fixed.
If the B‑tree parser succeeds, it is preferred because it restores the latest records without the latency of backups. If it fails partway, the backup can rescue the remaining data.
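One way to picture the combination is as a merge keyed by primary key, where rows parsed from the live file override older rows from the backup. This is an assumed model for illustration only; the name merge_recovered and the dict-based representation are hypothetical and do not describe WCDB's actual merge logic.

```python
def merge_recovered(btree_rows, backup_rows):
    """Merge two {primary_key: row} dicts, preferring the live-file rows."""
    merged = dict(backup_rows)   # start from the (older) backup
    merged.update(btree_rows)    # newer rows from the B-tree parser win
    return merged
```

Rows that the parser could not reach still come from the backup, while anything written after the last backup survives only if the parser recovered it.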
In practice, the original dump‑based method is now largely superseded by the B‑tree parser, with dump recovery serving as a last resort.
The three recovery methods rely only on the SQLite file format and basic file-system operations, making them cross-platform. Platform-specific optimizations are possible, such as using Android's JobScheduler to run backups only while the device is charging and the screen is off.
Our Component
WCDB – WeChat Database – integrates the above recovery schemes along with encryption, connection‑pool concurrency, ORM, and performance optimizations. It will be open‑sourced soon.
For more details, see the previous articles in the WCDB series.
WeChat Client Technology Team
Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.
