Databases 6 min read

Boost Address Keyword Search with PostgreSQL: GIN, pg_trgm, and pg_bigm

This article explains how PostgreSQL's native full‑text search, GIN indexes, and extensions like pg_trgm, pg_bigm, and pg_jieba can dramatically improve the speed and accuracy of Chinese address keyword searches, offering a cost‑effective alternative to external search engines.

JD Cloud Developers
JD Cloud Developers
JD Cloud Developers
Boost Address Keyword Search with PostgreSQL: GIN, pg_trgm, and pg_bigm

In map or GIS scenarios, address keyword search is crucial; users often type a keyword such as "China Agricultural University" instead of the full address.

PostgreSQL is praised as the most advanced open‑source database, offering strong text‑search capabilities, including full‑text, fuzzy, and regex queries, as well as expression and GIN indexes, making it advantageous for address keyword retrieval.

1. Application Background

In address search scenarios, user input is tokenized and matched against a full‑text index of address corpora to obtain normalized address information. Queries are split into keywords, matched to historical address data, and results returned. Matching can take seconds to tens of seconds depending on tokenization and matching methods.

Common approaches include:

Using dedicated search engines such as Elasticsearch or Lucene.

Leveraging built‑in database search capabilities.

While Elasticsearch provides flexible search, it increases development and operational costs. This article explores how to achieve efficient Chinese address search using PostgreSQL’s native features.

2. Technical Solution

GIN (Generalized Inverted Index) stores (key, posting list) pairs, where each key maps to the tuple IDs where it appears, enabling fast lookup of rows containing specific keywords.

The

pg_trgm

extension implements a trigram (3‑gram) tokenization; it slides a window of three characters over the text, creates tokens, and builds a GIN index for efficient fuzzy matching.

The

pg_bigm

extension is similar but uses bigrams (2‑gram), which performs better for one‑ or two‑character queries.

We built a dataset of about 100 million Beijing address records and benchmarked PostgreSQL. Results show that both

pg_trgm+gin

and

pg_bigm+gin

outperform traditional B‑tree indexes for fuzzy searches. Because logistics keywords are usually three characters or longer,

pg_trgm+gin

provides millisecond‑level response times.

For Chinese text segmentation, the

pg_jieba

extension integrates the Jieba library, offering customizable dictionaries, keyword extraction, and part‑of‑speech tagging, further improving search relevance.

3. Conclusion

PostgreSQL’s rich indexing options, powerful full‑text search, and extensive plugin ecosystem allow developers to implement high‑performance address keyword search without synchronizing data to external search engines, simplifying architecture, reducing development cost, and ensuring real‑time query results.

PostgreSQLFull-Text SearchGIN indexChinese segmentationpg_trgmaddress keyword searchpg_bigm
JD Cloud Developers
Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.