Databases 6 min read

Boost Address Keyword Search with PostgreSQL: GIN, pg_trgm, and pg_bigm

This article explains how PostgreSQL's native full‑text search, GIN indexes, and extensions like pg_trgm, pg_bigm, and pg_jieba can dramatically improve the speed and accuracy of Chinese address keyword searches, offering a cost‑effective alternative to external search engines.

JD Cloud Developers

Dec 24, 2021

Boost Address Keyword Search with PostgreSQL: GIN, pg_trgm, and pg_bigm

In map or GIS scenarios, address keyword search is crucial; users often type a keyword such as "China Agricultural University" instead of the full address.

PostgreSQL is praised as the most advanced open‑source database, offering strong text‑search capabilities, including full‑text, fuzzy, and regex queries, as well as expression and GIN indexes, making it advantageous for address keyword retrieval.

1. Application Background

In address search scenarios, user input is tokenized and matched against a full‑text index of address corpora to obtain normalized address information. Queries are split into keywords, matched to historical address data, and results returned. Matching can take seconds to tens of seconds depending on tokenization and matching methods.

Common approaches include:

Using dedicated search engines such as Elasticsearch or Lucene.

Leveraging built‑in database search capabilities.

While Elasticsearch provides flexible search, it increases development and operational costs. This article explores how to achieve efficient Chinese address search using PostgreSQL’s native features.

2. Technical Solution

GIN (Generalized Inverted Index) stores (key, posting list) pairs, where each key maps to the tuple IDs where it appears, enabling fast lookup of rows containing specific keywords.

The pg_trgm extension implements a trigram (3‑gram) tokenization; it slides a window of three characters over the text, creates tokens, and builds a GIN index for efficient fuzzy matching.

The pg_bigm extension is similar but uses bigrams (2‑gram), which performs better for one‑ or two‑character queries.

We built a dataset of about 100 million Beijing address records and benchmarked PostgreSQL. Results show that both pg_trgm+gin and pg_bigm+gin outperform traditional B‑tree indexes for fuzzy searches. Because logistics keywords are usually three characters or longer, pg_trgm+gin provides millisecond‑level response times.

For Chinese text segmentation, the pg_jieba extension integrates the Jieba library, offering customizable dictionaries, keyword extraction, and part‑of‑speech tagging, further improving search relevance.

3. Conclusion

PostgreSQL’s rich indexing options, powerful full‑text search, and extensive plugin ecosystem allow developers to implement high‑performance address keyword search without synchronizing data to external search engines, simplifying architecture, reducing development cost, and ensuring real‑time query results.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

PostgreSQL Full-Text Search GIN Index chinese segmentation pg_trgm address keyword search pg_bigm

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.