Build Your Own Full‑Text Search Engine with Elasticsearch: A Step‑by‑Step Guide
This tutorial walks you through installing Elasticsearch, understanding its core concepts such as nodes, clusters, indexes, documents and types, configuring Chinese analyzers, performing CRUD operations, and executing various search queries with practical command‑line examples.
Full‑text search is a common requirement, and the open‑source Elasticsearch (referred to below as Elastic) is one of the most widely used engines for it. It can quickly store, search, and analyze large volumes of data, and it is used by Wikipedia, Stack Overflow, and GitHub.
1. Installation
The Elastic version used here (5.5.1) requires Java 8, so install Java first and make sure the JAVA_HOME environment variable points to it. Then download the zip package (elasticsearch-5.5.1.zip), unzip it, and start Elastic from the extracted directory with ./bin/elasticsearch. If startup fails with the error “max virtual memory areas vm.max_map_count [65530] is too low”, raise the kernel limit with sudo sysctl -w vm.max_map_count=262144 and start Elastic again. Once it is running, Elastic listens on port 9200; curl localhost:9200 returns a JSON object describing the node, cluster, and version. By default Elastic accepts only local connections. To allow remote access, set network.host: 0.0.0.0 in config/elasticsearch.yml and restart Elastic; on a production server, bind to a specific IP address instead of 0.0.0.0.
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip
$ unzip elasticsearch-5.5.1.zip
$ cd elasticsearch-5.5.1/
$ ./bin/elasticsearch
$ sudo sysctl -w vm.max_map_count=262144
$ curl localhost:9200
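The vm.max_map_count fix above can be wrapped in a small check so that sysctl runs only when it is needed. The following is a minimal sketch, assuming a Linux host where the current value can be read from /proc/sys/vm/max_map_count; the function name map_count_too_low is made up for illustration.

```shell
# Minimum value used in the sysctl command above.
REQUIRED_MAP_COUNT=262144

# Exit status 0 when the value is below the minimum, non-zero otherwise.
# With no argument, the live kernel setting is read (assumption: Linux host).
map_count_too_low() {
  local current="${1:-$(cat /proc/sys/vm/max_map_count)}"
  [ "$current" -lt "$REQUIRED_MAP_COUNT" ]
}
```

For example, running map_count_too_low && sudo sysctl -w vm.max_map_count=262144 raises the limit only on hosts that need it. A value set with sysctl -w lasts until the next reboot.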
2. Basic Concepts
2.1 Node and Cluster
Elastic is a distributed system. A single Elastic instance is called a node; a group of nodes forms a cluster.
2.2 Index
Elastic indexes every field and writes the result into an inverted index, which is what a search actually consults. The top‑level unit of data in Elastic is therefore called an index. Index names must be lowercase. To list all indexes on the current node:
$ curl -X GET 'http://localhost:9200/_cat/indices?v'
2.3 Document
A document is a single record inside an index, represented as JSON, e.g.:
{ "user": "张三", "title": "工程师", "desc": "数据库管理" }
2.4 Type
Types are logical groupings of documents within an index and can be used to filter them. Documents of different types in the same index should have similar schemas. Types are being phased out: Elasticsearch 6.x allows only one type per index, 7.x deprecates types in its APIs, and 8.x removes them. To list the types defined in each index:
$ curl 'localhost:9200/_mapping?pretty=true'
3. Create and Delete Index
Create an index with a PUT request, e.g. curl -X PUT 'localhost:9200/weather'. The response contains "acknowledged":true. Delete the index with a DELETE request.
$ curl -X PUT 'localhost:9200/weather'
$ curl -X DELETE 'localhost:9200/weather'
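Because index names must be lowercase, a script that creates indexes can reject a bad name before sending the request. This is a rough sketch: is_lowercase_name and create_index are hypothetical helper names, ES_URL is an assumed environment variable that defaults to the local node, and only the lowercase rule stated in this guide is checked.

```shell
# Succeeds when the name is non-empty and contains no uppercase letters.
# This covers only the lowercase rule mentioned in this guide.
is_lowercase_name() {
  [ -n "$1" ] && [ "$1" = "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" ]
}

# Sends the PUT request only if the name passes the local check.
# ES_URL is a hypothetical variable; it defaults to the local node.
create_index() {
  if ! is_lowercase_name "$1"; then
    echo "index name must be lowercase: $1" >&2
    return 1
  fi
  curl -s -X PUT "${ES_URL:-http://localhost:9200}/$1"
}
```

With these definitions, create_index weather sends the same request as the PUT command above, while create_index Weather fails locally without contacting Elastic.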
4. Chinese Analyzer Settings
Install the IK analyzer plugin:
$ ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.1/elasticsearch-analysis-ik-5.5.1.zip
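The plugin version must match the Elastic version. Restart Elastic so that it loads the newly installed plugin, then create an index whose mapping uses the ik_max_word analyzer, which splits text into as many terms as possible, for both indexing (analyzer) and searching (search_analyzer) on each text field: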
$ curl -X PUT 'localhost:9200/accounts' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "desc": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}'
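The three field definitions in this mapping are identical, so the request body can also be generated rather than typed out. The sketch below is one possible way to do that; ik_field and ik_mapping_body are hypothetical helper names, and the field names are assumed to need no JSON escaping.

```shell
# Prints one field definition that uses ik_max_word for indexing and searching.
ik_field() {
  printf '"%s": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"}' "$1"
}

# Prints a mapping body. Arguments: type name, then one or more field names.
# Assumption: names contain no characters that need JSON escaping.
ik_mapping_body() {
  local type_name="$1" fields="" f
  shift
  for f in "$@"; do
    # Join field definitions with ", " (no separator before the first one).
    fields="${fields:+$fields, }$(ik_field "$f")"
  done
  printf '{"mappings": {"%s": {"properties": {%s}}}}\n' "$type_name" "$fields"
}
```

Passing the output to curl, as in curl -X PUT 'localhost:9200/accounts' -d "$(ik_mapping_body person user title desc)", should send a body equivalent to the one shown above.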
5. Data Operations
5.1 Add Document
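To add a record, send a PUT request to /accounts/person/1 with the document as the JSON body; the trailing 1 is the document ID, which can be any string. If you send a POST request to /accounts/person without an ID, Elastic generates a random ID for the new document.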
$ curl -X PUT 'localhost:9200/accounts/person/1' -d ' { "user": "张三", "title": "工程师", "desc": "数据库管理" }'
$ curl -X POST 'localhost:9200/accounts/person' -d ' { "user": "李四", "title": "工程师", "desc": "系统管理" }'
5.2 View Document
GET /accounts/person/1?pretty=true returns the document in a readable format. In the response, the found field is true when the document exists and false when the ID does not, and _source contains the original record.
$ curl 'localhost:9200/accounts/person/1?pretty=true'
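The found field lends itself to a quick existence check in scripts. This is a rough sketch, assuming the response is piped in on standard input; doc_found is a hypothetical helper name, and it matches text rather than parsing the JSON.

```shell
# Reads a GET response on stdin and succeeds if it reports "found": true.
# Whitespace around the colon is optional, so the check works with or
# without pretty=true. This is a plain text match, not a JSON parser.
doc_found() {
  grep -Eq '"found"[[:space:]]*:[[:space:]]*true'
}
```

For example, curl -s 'localhost:9200/accounts/person/1' | doc_found && echo "document exists" prints the message only when the record is present.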
5.3 Delete Document
DELETE /accounts/person/1 removes the record.
$ curl -X DELETE 'localhost:9200/accounts/person/1'
5.4 Update Document
To update a record, send a PUT request to the same path with the new JSON. The document is replaced as a whole, and the _version field in the response increments.
$ curl -X PUT 'localhost:9200/accounts/person/1' -d ' { "user": "张三", "title": "工程师", "desc": "数据库管理,软件开发" }'
6. Data Query
6.1 Return All Records
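GET /accounts/person/_search with no query matches every document of the person type in the accounts index. By default only the first 10 hits are included in the response.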
$ curl 'localhost:9200/accounts/person/_search'
6.2 Full‑Text Search
Use a match query on a field, e.g. searching for “软件” in desc:
$ curl 'localhost:9200/accounts/person/_search' -d ' { "query": {"match": {"desc": "软件"}} }'
6.3 Pagination
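Use size to set how many results are returned (default 10) and from to set the starting offset, which counts from 0 (the default). The query below returns a single result, starting at offset 1, that is, the second match: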
$ curl 'localhost:9200/accounts/person/_search' -d ' { "query": {"match": {"desc": "管理"}}, "size": 1, "from": 1 }'
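Since from is an offset rather than a page number, a page‑based interface has to convert between the two: from = (page - 1) × size. The following sketch shows that arithmetic; search_page_body is a hypothetical helper name, and the search text is assumed to need no JSON escaping.

```shell
# Prints a match-query body for a 1-based page number.
# Arguments: field, search text, page number (starting at 1), page size.
# Assumption: the text contains no double quotes or backslashes, because
# this sketch does no JSON escaping.
search_page_body() {
  local from=$(( ($3 - 1) * $4 ))
  printf '{"query": {"match": {"%s": "%s"}}, "from": %d, "size": %d}\n' \
    "$1" "$2" "$from" "$4"
}
```

Calling search_page_body desc 管理 2 1 produces a body with "from": 1 and "size": 1, the same values as in the command above, and it can be passed to curl with -d "$(search_page_body desc 管理 2 1)".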
6.4 Logical Operations
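If a match query contains several space‑separated terms, such as "软件 系统", Elastic treats them as OR and returns documents that match either term. To require all terms (AND), use a bool query with one must clause per term: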
$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "desc": "软件" } },
        { "match": { "desc": "系统" } }
      ]
    }
  }
}'
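When the number of required terms varies, the must array can be assembled from a list of arguments. This is a minimal sketch of that idea; bool_must_body is a hypothetical helper name, and the terms are assumed to need no JSON escaping.

```shell
# Prints a bool query with one must clause per term (all terms required).
# Arguments: field name, then one or more search terms.
# Assumption: terms contain no double quotes or backslashes.
bool_must_body() {
  local field="$1" clauses="" term
  shift
  for term in "$@"; do
    # Join clauses with ", " (no separator before the first one).
    clauses="${clauses:+$clauses, }$(printf '{"match": {"%s": "%s"}}' "$field" "$term")"
  done
  printf '{"query": {"bool": {"must": [%s]}}}\n' "$clauses"
}
```

For example, curl 'localhost:9200/accounts/person/_search' -d "$(bool_must_body desc 软件 系统)" should send a query equivalent to the AND search above.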
21CTO