Databases 13 min read

How WeChat Supercharged Mobile Full‑Text Search with SQLite FTS Optimizations

This article explains the principles of SQLite’s Full‑Text Search extension, details WeChat’s architecture for mobile search, and shares practical performance‑tuning techniques—including custom tokenizers, offset function improvements, and SQL‑level ranking—to achieve sub‑50 ms query times on billions of records.

WeChat Client Technology Team

Oct 18, 2017

How WeChat Supercharged Mobile Full‑Text Search with SQLite FTS Optimizations

Introduction

Full‑text search (FTS) on mobile devices is critical for local data retrieval, but limited hardware makes performance a challenge. This article uses SQLite FTS Extension, the de‑facto standard on Android and iOS, and shares WeChat’s practical optimization experience.

SQLite FTS Extension

SQLite FTS Extension is a built‑in plugin that provides fast, stable, easy‑to‑integrate full‑text search using inverted indexes. It is compatible across platforms and offers three main versions:

FTS3 : basic version with full features and custom tokenizers.

FTS4 : performance‑enhanced version with relevance functions (MatchInfo).

FTS5 : major redesign with segmented instance‑list storage, larger capacity, and extensible auxiliary functions.

Storage Architecture

WeChat’s full‑text search, launched in 2014, isolates its search database from the main business database, uses read‑only connections for indexing, reduces database operations by batching inserts, and employs delayed updates for hot data.

The schema separates an IndexTable (pure search index) from a MetaTable (business filtering), linking them via DocId and business type fields.

Search Optimization

From version 5.4 to 6.5.7, average query time grew tenfold as data volume increased, prompting a deep analysis of the search pipeline. Data retrieval accounted for over 80 % of total latency.

Reducing Single‑Loop Cost

The Offsets function in FTS4 consumed >70 % of each loop’s time, especially with large result sets. Offsets converts term positions to byte offsets for highlighting.

Input: Query, hit document, target term offsets. Output: target byte offsets.

Example: Query=我, Doc=我和我弟弟去逛街, term offsets=0,2 → byte offsets=0,6.

Optimizing the tokenizer reduced Offsets processing of 100 KB data to 21 ms, but further gains were needed for larger workloads.

Tokenizer Improvements

WeChat replaced the ICU tokenizer with a custom Simple tokenizer that treats English letters and digits as a single token and Chinese characters individually, eliminating unnecessary UTF‑8→Unicode conversion and dictionary lookup.

Reducing Overall Loop Count

Moving sorting logic from Java to SQL required a 64‑bit ranking value ( MMRank ) that encodes multiple ranking factors. The SQL then orders by this single column.

Special Optimization – Chat Record Search

Chat search must handle statistical attributes and massive result sets (up to 200 k hits per keyword). Three optimization strategies were evaluated.

Solution 1: GROUP BY

SQL‑level aggregation counts hits per conversation, but cannot use indexes efficiently and performs full scans.

Solution 2: Stepwise Calculation

First select the three most recent active conversations, then run separate count queries. This reduces work but requires multiple SQL statements.

Solution 3: MessageCount Aggregation

A custom aggregate function computes hit counts for the top conversations in a single query.

Conclusion

After these optimizations, average query latency for all users fell below 50 ms and below 200 ms for heavy users—a five‑fold improvement. Further gains are possible, such as embedding byte offsets directly in the DocList for faster highlighting.

The author hopes the sharing provides value to readers.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Mobile Development sqlite WeChat Full-Text Search

Written by

WeChat Client Technology Team

Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Introduction

SQLite FTS Extension

Storage Architecture

Search Optimization

Reducing Single‑Loop Cost

Tokenizer Improvements

Reducing Overall Loop Count

Special Optimization – Chat Record Search

Solution 1: GROUP BY

Solution 2: Stepwise Calculation

Solution 3: MessageCount Aggregation

Conclusion

WeChat Client Technology Team

How this landed with the community

Was this worth your time?

0 Comments

Solution 1: GROUP BY

Solution 2: Stepwise Calculation

Solution 3: MessageCount Aggregation