Fundamentals 4 min read

How Inverted Indexes Power Fast Full-Text Search

This article explains what an inverted index is, why it’s essential for full‑text search, how it is built and queried, and common token transformations such as stop‑word removal, lemmatization, and stemming.

Java High-Performance Architecture
Java High-Performance Architecture
Java High-Performance Architecture
How Inverted Indexes Power Fast Full-Text Search

Content Overview

What is an inverted index and why is it needed?

How does an inverted index work?

1. What is an inverted index?

Imagine a dating website with a user table. A user might say, “I want to find a PHP developer in Shanghai,” or “I want a Java developer in Beijing who loves travel and food.” Matching on gender, city, and language quickly becomes a combinatorial problem.

Relational‑database indexes struggle with such multi‑field full‑text queries, so a full‑text search engine uses an inverted index.

An inverted index stores a "content → document" mapping, enabling rapid full‑text lookup.

2. How does an inverted index work?

The process consists of two main steps:

Creating the inverted index

Searching the inverted index

2.1 Creating the inverted index

Consider two documents:

"Recipe of pasta with sauce pesto"
"Recipe of delicious carbonara pasta"

Each document is tokenized into individual words (tokens) and the token‑to‑document relationships are stored.

The resulting index looks like:

2.2 Inverted index search

Search example: "pasta recipe"

Tokenization yields two tokens: pasta and recipe . Both tokens match in both documents, so both documents are returned with equal scores.

Another search: "carbonara pasta"

Both documents match, but Document #2 receives a higher score because it matches both tokens ( carbonara and pasta ), whereas Document #1 matches only pasta .

2.3 Token transformations

Before storing or searching, tokens can be transformed to improve relevance:

Stop‑word removal : discard high‑frequency, low‑information words such as "of", "the", "for".

Lemmatization (normalization) : map inflected forms to their dictionary base, e.g., "running" → "run", "walks" → "walk", "thought" → "think".

Stemming : truncate word endings to obtain a root form; useful for words not covered by a dictionary, though it cannot handle irregular verbs.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

search engineinformation retrievalinverted indextokenizationFull‑Text Search
Java High-Performance Architecture
Written by

Java High-Performance Architecture

Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.