Boost Address Keyword Search with PostgreSQL: GIN, pg_trgm, and pg_bigm
This article explains how PostgreSQL's native full‑text search, GIN indexes, and extensions like pg_trgm, pg_bigm, and pg_jieba can dramatically improve the speed and accuracy of Chinese address keyword searches, offering a cost‑effective alternative to external search engines.
In map or GIS scenarios, address keyword search is crucial; users often type a keyword such as "China Agricultural University" instead of the full address.
PostgreSQL is praised as the most advanced open‑source database, offering strong text‑search capabilities, including full‑text, fuzzy, and regex queries, as well as expression and GIN indexes, making it advantageous for address keyword retrieval.
1. Application Background
In address search scenarios, user input is tokenized and matched against a full‑text index of address corpora to obtain normalized address information. Queries are split into keywords, matched to historical address data, and results returned. Matching can take seconds to tens of seconds depending on tokenization and matching methods.
Common approaches include:
Using dedicated search engines such as Elasticsearch or Lucene.
Leveraging built‑in database search capabilities.
While Elasticsearch provides flexible search, it increases development and operational costs. This article explores how to achieve efficient Chinese address search using PostgreSQL’s native features.
2. Technical Solution
GIN (Generalized Inverted Index) stores (key, posting list) pairs, where each key maps to the tuple IDs where it appears, enabling fast lookup of rows containing specific keywords.
The pg_trgm extension implements a trigram (3‑gram) tokenization; it slides a window of three characters over the text, creates tokens, and builds a GIN index for efficient fuzzy matching.
The pg_bigm extension is similar but uses bigrams (2‑gram), which performs better for one‑ or two‑character queries.
We built a dataset of about 100 million Beijing address records and benchmarked PostgreSQL. Results show that both pg_trgm+gin and pg_bigm+gin outperform traditional B‑tree indexes for fuzzy searches. Because logistics keywords are usually three characters or longer, pg_trgm+gin provides millisecond‑level response times.
For Chinese text segmentation, the pg_jieba extension integrates the Jieba library, offering customizable dictionaries, keyword extraction, and part‑of‑speech tagging, further improving search relevance.
3. Conclusion
PostgreSQL’s rich indexing options, powerful full‑text search, and extensive plugin ecosystem allow developers to implement high‑performance address keyword search without synchronizing data to external search engines, simplifying architecture, reducing development cost, and ensuring real‑time query results.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
