Databases 12 min read

Using MySQL 5.7 ngram Full-Text Search for Simple Text Retrieval

This article explains how to leverage MySQL 5.7's built‑in ngram full‑text parser to implement lightweight Chinese full‑text search, covering configuration, index creation, query modes, operator usage, and performance considerations.

政采云技术

Jan 17, 2024

Using MySQL 5.7 ngram Full-Text Search for Simple Text Retrieval

Background Introduction

In development, full‑text search is often needed. While Elasticsearch is common, for small data and low concurrency it adds complexity and waste. MySQL 5.7 includes an ngram full‑text plugin supporting Chinese tokenization for MyISAM and InnoDB, allowing simple full‑text search via MySQL.

MySQL Full‑Text Index Overview

MySQL full‑text indexes work on char, varchar, text fields and support InnoDB and MyISAM. The built‑in ngram parser handles Chinese, Japanese, Korean. Three search modes are supported: Boolean, Natural Language, and Query Expansion.

ngram Parser Overview

The ngram algorithm uses a sliding window of size n to split text into n‑grams. Default n is 2, configurable via ngram_token_size. Example tokenizations for “全文索引” are shown.

ngram_token_size =1 → ‘全’,‘文’,‘索’,‘引’; ngram_token_size =2 → ‘全文’,‘文索’,‘索引’; ngram_token_size =3 → ‘全文索’,‘文索引’; ngram_token_size =4 → ‘全文索引’.

How to View ngram_token_size

# show variables like '%token%';

Result shows innodb_ft_min_token_size=3, innodb_ft_max_token_size=84, ngram_token_size=2. When using the ngram parser, the InnoDB token size settings are ignored.

How to Modify ngram_token_size

Two ways: start mysqld with --ngram_token_size=1 or add ngram_token_size=1 under [mysqld] in my.cnf. Changes require MySQL restart and rebuilding indexes.

Creating Full‑Text Index

1. Create table with full‑text index:

CREATE TABLE `announcement` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `content` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'content',
  `title` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'title',
  PRIMARY KEY (`id`) USING BTREE,
  FULLTEXT INDEX `idx_full_text`(`content`) WITH PARSER `ngram`
) ENGINE=InnoDB AUTO_INCREMENT=6 CHARACTER SET=utf8mb4 COLLATE=utf8mb4_general_ci ROW_FORMAT=Dynamic;

2. Add index via ALTER TABLE:

ALTER TABLE announcement ADD FULLTEXT INDEX idx_full_text(content) WITH PARSER ngram;

3. Or create index directly:

CREATE FULLTEXT INDEX idx_full_text ON announcement(content) WITH PARSER `ngram`;

Full‑Text Search Tests

Insert sample data:

INSERT INTO announcement (id, content, title) VALUES
(1, '杭州市最近有大雪，出门多穿衣服', '杭州天气'),
(2, '杭州市最近温度很低，不适合举办杭州马拉松', '杭州马拉松'),
(3, '杭州市最近有大雪，西湖断桥会很美', '杭州西湖雪景'),
(4, '浙江大学的雪景也很美，周末可以去杭州逛逛', '浙江大学雪景'),
(5, '城北万象城开业，打折力度很大', '城北万象城开业火爆');

Boolean Mode

Examples of operators:

+ (must appear)

- (must not appear)

no operator (higher relevance)

> / < (increase/decrease relevance)

~ (negative relevance)

* (wildcard)

"" (phrase)

Sample queries and screenshots illustrate each operator.

Operator + (must appear)

SELECT * FROM announcement WHERE MATCH(content) AGAINST('+杭州' IN BOOLEAN MODE);

Operator - (must not appear)

SELECT * FROM announcement WHERE MATCH(content) AGAINST('+杭州 -大学' IN BOOLEAN MODE);

No operator (higher relevance)

SELECT * FROM announcement WHERE MATCH(content) AGAINST('杭州 大雪' IN BOOLEAN MODE);

Operator > (increase relevance)

SELECT * FROM announcement WHERE MATCH(content) AGAINST('+杭州 >大学' IN BOOLEAN MODE);

Operator ~ (negative relevance)

SELECT * FROM announcement WHERE MATCH(content) AGAINST('+杭州 ~大学' IN BOOLEAN MODE);

Wildcard *

SELECT * FROM announcement WHERE MATCH(content) AGAINST('杭州*' IN BOOLEAN MODE);

Phrase ""

SELECT * FROM announcement WHERE MATCH(content) AGAINST('"杭州"' IN BOOLEAN MODE);

Natural Language Mode

Default mode, equivalent to Boolean mode without operators. Example:

SELECT * FROM announcement WHERE MATCH(content) AGAINST('杭州 大学' IN NATURAL LANGUAGE MODE);

Query Expansion

First query with keyword “万象城” returns related n‑grams; second query expands the search string with those terms.

SELECT * FROM announcement WHERE MATCH(content) AGAINST('万象城' WITH QUERY EXPANSION);
SELECT * FROM announcement WHERE MATCH(content) AGAINST('城北 北万 万象 象城 城开 开业 打折 折力 力度 度很 很大' IN NATURAL LANGUAGE MODE);

Conclusion

Full‑text indexes using inverted indexes greatly improve search efficiency but increase storage. Proper token size configuration is essential for good results; otherwise performance may suffer or relevance may be unsatisfactory.

References

1. https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html “Boolean Full‑Text Searches”.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql Full-Text Search Boolean Mode database indexing NGram

Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.