Optimizing Small‑Scale Product Title Search with MySQL FULLTEXT and Ngram Indexing
This article explains how to efficiently search product titles in a few‑million‑row MySQL table by using built‑in FULLTEXT indexes, jieba‑php tokenization, and the ngram parser, avoiding unnecessary external search engines and providing practical code examples.
When querying product titles in a relatively small dataset (e.g., a few million rows), many developers instinctively reach for Elasticsearch or XunSearch, but MySQL’s native FULLTEXT index can handle the workload without extra infrastructure.
The author demonstrates a workflow that uses the jieba-php library to tokenize Chinese titles, stores the tokens in a separate product_title table linked to product, and creates a FULLTEXT index on the title column.
Example of initializing jieba and tokenizing a title:
Jieba::init();
Finalseg::init();
$tokens = Jieba::cut($model->title);MySQL can create a FULLTEXT index directly:
CREATE TABLE test (
id INT AUTO_INCREMENT PRIMARY KEY,
title TEXT,
FULLTEXT(title)
);To combine LIKE and FULLTEXT search, the following query can be used:
$searchTerm = '关键词';
$results = YourModel::where('title', 'like', "%{$searchTerm}%")
->orWhereRaw("MATCH(title) AGAINST(? IN NATURAL LANGUAGE MODE)", [$searchTerm])
->get();Because MySQL’s default tokenization does not handle Chinese, the ngram parser is required for multilingual support. Adding an ngram FULLTEXT index looks like this:
ALTER TABLE 表名 ADD FULLTEXT(title) WITH PARSER ngram;After tokenization, searching for a term such as “饼干” will quickly locate the corresponding document IDs via the index.
For compound queries like “饼干包装”, the author suggests storing each token separately and using a BOOLEAN MODE query:
SELECT * FROM test_fulltext WHERE MATCH(title) AGAINST('饼干+包装' IN BOOLEAN MODE);Additional performance tips include adjusting MySQL’s index buffer size and avoiding in‑memory middlewares when the index resides on disk.
The article concludes with a humorous note that most developers will never need such complex setups, but the presented solution is a lightweight alternative to heavyweight search stacks.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
