How to Install and Use the IK Chinese Analyzer Plugin in Elasticsearch
This article explains why Elasticsearch's built‑in tokenizers struggle with Chinese text, introduces the IK analyzer plugin, provides step‑by‑step Docker and file‑based installation methods, shows how to configure custom dictionaries via Nginx, and demonstrates smart and max‑word tokenization queries.
Elasticsearch's built‑in tokenizers do not handle Chinese well, so searching Chinese terms such as “悟空哥” fails.
1. Tokenizer principles in Elasticsearch
1.1 Tokenizer concept
A tokenizer receives a character stream and splits it into individual tokens; combined with character filters and token filters, it forms an analyzer.
1.2 Standard tokenizer
The standard tokenizer splits text on word boundaries as defined by the Unicode Text Segmentation algorithm, removes most punctuation, and records each token's position and offsets, which is useful for phrase queries and highlighting.
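For example, calling the _analyze API on a short phrase returns each token together with its offsets and position (abridged response; values shown are illustrative of the standard analyzer's output):

```
POST _analyze
{
  "analyzer": "standard",
  "text": "study ELK"
}
```

```
{
  "tokens": [
    { "token": "study", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0 },
    { "token": "elk",   "start_offset": 6, "end_offset": 9, "type": "<ALPHANUM>", "position": 1 }
  ]
}
```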
1.3 English and punctuation example
POST _analyze
{
  "analyzer": "standard",
  "text": "Do you know why I want to study ELK? 2 3 33..."
}

Result: do, you, know, why, i, want, to, study, elk, 2, 3, 33

1.4 Chinese tokenization example
POST _analyze
{
  "analyzer": "standard",
  "text": "悟空聊架构"
}

The standard tokenizer splits each Chinese character into its own token, producing 悟, 空, 聊, 架, 构 instead of the desired words.
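For intuition, the standard analyzer's handling of the English sentence above — lowercase everything, then break on runs of non-alphanumeric characters — can be roughly imitated with ordinary shell tools. This is an approximation only: the real tokenizer follows Unicode word-boundary rules, which is exactly why it degrades to single characters on Chinese, where word boundaries are not marked.

```shell
# Rough approximation of the standard analyzer on ASCII text:
# lowercase, then break on any run of non-alphanumeric characters.
echo "Do you know why I want to study ELK? 2 3 33..." \
  | tr '[:upper:]' '[:lower:]' \
  | tr -cs '[:alnum:]' '\n' \
  | sed '/^$/d'
# prints: do you know why i want to study elk 2 3 33 (one token per line)
```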
2. Installing the IK Chinese analyzer plugin
2.1 Plugin source
Download the plugin from:

https://github.com/medcl/elasticsearch-analysis-ik/releases

Match the plugin version to the Elasticsearch version (e.g., 7.4.2). Querying the cluster root endpoint (GET /) shows the running version:

{
  "name" : "8448ec5f3312",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "xC72O3nKSjWavYZ-EPt9Gw",
  "version" : {
    "number" : "7.4.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
    "build_date" : "2019-10-28T20:40:44.881551Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

2.2 Installation methods
2.2.1 Inside the Elasticsearch container
Enter the container:

docker exec -it elasticsearch /bin/bash

Download the plugin zip:

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip

Unzip into a dedicated folder and clean up:

unzip elasticsearch-analysis-ik-7.4.2.zip -d ./ik
chmod -R 777 ik/
rm -rf *.zip

2.2.2 Via a mapped directory
Copy the zip into the mapped plugins folder on the host:

cd /mydata/elasticsearch/plugins
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
unzip elasticsearch-analysis-ik-7.4.2.zip -d ./ik
rm -rf *.zip

2.2.3 Upload with Xftp
Use XShell/Xftp to copy the zip into the container, then unzip as above.
3. Verifying the installation
docker exec -it elasticsearch /bin/bash
elasticsearch-plugin list

The command should output ik, confirming the plugin is installed. Exit and restart the container:

exit
docker restart elasticsearch

4. Using the IK analyzer
The plugin provides two analysis modes: ik_smart, which performs coarse-grained "smart" segmentation, and ik_max_word, which exhaustively produces the finest-grained segmentation.
Smart mode example
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "一颗小星星"
}

Result: “一颗”, “小星星”.
Max‑word mode example
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "一颗小星星"
}

Result: “一颗”, “一”, “颗”, “小星星”, “小星”, “星星”.
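A common pattern is to combine the two modes: index with ik_max_word so every plausible sub-word is searchable, and search with ik_smart so queries stay precise. A sketch, using a hypothetical index my_index and field content:

```
PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
```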
5. Custom dictionary
To keep terms like “悟空哥” intact, add them to a custom dictionary and reference it in IKAnalyzer.cfg.xml (path: /usr/share/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml ).
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer 扩展配置</comment>
  <!-- local extension dictionaries -->
  <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
  <!-- local extension stopword dictionary -->
  <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
  <!-- remote extension dictionary -->
  <entry key="remote_ext_dict">location</entry>
  <!-- remote extension stopword dictionary -->
  <entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>

Place a file (e.g., ik.txt) on a remote Nginx server and set remote_ext_dict to its URL.
Deploying Nginx for remote dictionary
First start a temporary Nginx container and copy its default configuration to the host:

docker run -p 80:80 --name nginx -d nginx:1.10
docker container cp nginx:/etc/nginx ./conf
mkdir nginx
mv conf nginx/
docker stop nginx
docker rm nginx

Then start Nginx again with the html, logs, and conf directories mapped from the host:

docker run -p 80:80 --name nginx \
  -v /mydata/nginx/html:/usr/share/nginx/html \
  -v /mydata/nginx/logs:/var/log/nginx \
  -v /mydata/nginx/conf:/etc/nginx \
  -d nginx:1.10

Create ik.txt containing “悟空哥” under the mapped html directory so that it is accessible at http://192.168.56.10/ik/ik.txt. After updating IKAnalyzer.cfg.xml to point remote_ext_dict to this URL, restart Elasticsearch:
docker restart elasticsearch
docker update elasticsearch --restart=always

Now a query for “悟空哥聊架构” yields the three tokens “悟空哥”, “聊”, “架构”.
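Once IK has re-fetched the remote dictionary (it polls the URL periodically, so the new word may take a moment to appear), the result can be reproduced with an _analyze call — shown here with ik_smart:

```
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "悟空哥聊架构"
}
```

Result: “悟空哥”, “聊”, “架构”.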
- END -
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.