Build Your Own Full‑Text Search Engine with Elasticsearch: A Step‑by‑Step Guide

This tutorial walks you through installing Elasticsearch, understanding its core concepts such as nodes, clusters, indexes, documents and types, configuring Chinese analyzers, performing CRUD operations, and executing various search queries with practical command‑line examples.

21CTO

Full‑text search is one of the most common requirements, and the open‑source Elasticsearch (hereafter Elastic) is the leading engine for it. Built on top of the Lucene library and exposed through a REST API, it can store, search, and analyze large volumes of data quickly, and it is used by Wikipedia, Stack Overflow, and GitHub.

1. Installation

Elastic requires a Java 8 environment, so install Java first and make sure JAVA_HOME is set correctly. Then download the zip package (e.g., elasticsearch-5.5.1.zip), unzip it, and start Elastic with ./bin/elasticsearch. If startup fails with the error “max virtual memory areas vm.max_map_count [65530] is too low”, run sudo sysctl -w vm.max_map_count=262144 and start it again. Once it is running, Elastic listens on port 9200; open another terminal and run curl localhost:9200, which returns a JSON object describing the node, cluster, and version. By default Elastic accepts connections from the local machine only. To allow remote access, edit config/elasticsearch.yml, set network.host: 0.0.0.0, and restart. That setting lets any host connect, so on a production server use a specific IP address instead.

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip
$ unzip elasticsearch-5.5.1.zip
$ cd elasticsearch-5.5.1/
$ ./bin/elasticsearch
$ sudo sysctl -w vm.max_map_count=262144
$ curl localhost:9200
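
The vm.max_map_count fix can be applied only when it is needed. The following is a minimal sketch, assuming a Linux host; REQUIRED and needs_raise are hypothetical names introduced here, and 262144 is simply the value used in the command above.

```shell
# Value used in the sysctl command above.
REQUIRED=262144

# needs_raise CURRENT: succeeds (exit status 0) when CURRENT is below REQUIRED.
needs_raise() {
  [ "$1" -lt "$REQUIRED" ]
}

# Usage on Linux (commented out so the snippet has no side effects):
# current=$(sysctl -n vm.max_map_count)
# if needs_raise "$current"; then
#   sudo sysctl -w vm.max_map_count="$REQUIRED"
# fi
```

A value set with sysctl -w is lost on reboot; adding the line vm.max_map_count=262144 to /etc/sysctl.conf is the usual way to make it permanent.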

2. Basic Concepts

2.1 Node and Cluster

Elastic is a distributed system. A single Elastic instance is called a node; a group of nodes forms a cluster.

2.2 Index

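Elastic indexes every field and writes the result to an inverted index, which is what a search actually looks up. The top‑level unit of data management is therefore called an index, roughly the counterpart of a single database, and its name must be lowercase. The following command lists all indexes on the current node.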

$ curl -X GET 'http://localhost:9200/_cat/indices?v'
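
Because an uppercase index name is rejected, it can be worth checking a name before sending the request. This is a small sketch that tests only the lowercase rule stated above; is_lowercase_name is a hypothetical helper, and Elasticsearch enforces further naming rules that are not covered here.

```shell
# is_lowercase_name NAME: succeeds when NAME is non-empty and has no uppercase letters.
is_lowercase_name() {
  case "$1" in
    "" | *[[:upper:]]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage:
# is_lowercase_name weather && curl -X PUT 'localhost:9200/weather'
```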

2.3 Document

A document is a single record inside an index, represented as JSON, e.g.:

{ "user": "张三", "title": "工程师", "desc": "数据库管理" }

2.4 Type

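Types are logical groupings of documents within an index, used for filtering. Documents of different types should have similar schemas; data with entirely different structures belongs in separate indexes rather than separate types of one index. Types are being phased out: Elasticsearch 6.x allows only one type per index, 7.x deprecates type names in the APIs, and 8.x removes them. The following command lists the types defined in each index.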

$ curl 'localhost:9200/_mapping?pretty=true'

3. Create and Delete Index

Create an index with a PUT request, e.g. curl -X PUT 'localhost:9200/weather'. The response contains "acknowledged":true. Delete the index with a DELETE request.

$ curl -X PUT 'localhost:9200/weather'
$ curl -X DELETE 'localhost:9200/weather'

4. Chinese Analyzer Settings

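Elastic's default analyzer does not segment Chinese text into words, so install a Chinese analyzer plugin such as IK. The plugin version must match the Elastic version (5.5.1 here). After installing, restart Elastic so that the plugin is loaded: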

$ ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.1/elasticsearch-analysis-ik-5.5.1.zip

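Then create an index named accounts with a single type, person, whose three fields (user, title, desc) are all text fields analyzed with ik_max_word. In the mapping, analyzer is applied to the field text at index time and search_analyzer to the search terms; ik_max_word, provided by the IK plugin, splits text at the finest granularity: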

$ curl -X PUT 'localhost:9200/accounts' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"},
        "title": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"},
        "desc": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"}
      }
    }
  }
}'
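
To see how an analyzer splits a piece of text, the _analyze endpoint can be called once the plugin is loaded. A minimal sketch: analyze_body is a hypothetical helper that only builds the request body, and the sample text is borrowed from the document example earlier in this guide.

```shell
# analyze_body ANALYZER TEXT: print a request body for the _analyze endpoint.
# Note: TEXT is inserted as-is, so it must not contain double quotes or backslashes.
analyze_body() {
  printf '{ "analyzer": "%s", "text": "%s" }\n' "$1" "$2"
}

# Usage (requires a running node with the IK plugin installed):
# curl 'localhost:9200/_analyze?pretty=true' -d "$(analyze_body ik_max_word 数据库管理)"
```

The IK plugin also ships a coarser analyzer, ik_smart; passing it as the first argument lets you compare the two token lists.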

5. Data Operations

5.1 Add Document

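To add a record, send a PUT request to /accounts/person/1 with the document as the JSON body. The 1 at the end of the path is the record's ID, which can be any string. If you send a POST request to /accounts/person without an ID, Elastic generates a random string ID and returns it in the _id field. If the accounts index does not exist yet, Elastic creates it automatically.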

$ curl -X PUT 'localhost:9200/accounts/person/1' -d ' { "user": "张三", "title": "工程师", "desc": "数据库管理" }'
$ curl -X POST 'localhost:9200/accounts/person' -d ' { "user": "李四", "title": "工程师", "desc": "系统管理" }'
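
The choice between PUT and POST depends only on whether you supply an ID. The sketch below makes that rule explicit; doc_request is a hypothetical helper that prints the method and path and sends nothing. The path layout assumes the typed 5.x URLs used throughout this guide.

```shell
# doc_request INDEX TYPE [ID]: print the HTTP method and path for adding a document.
doc_request() {
  if [ -n "${3:-}" ]; then
    # An explicit ID was given: PUT to the full document path.
    printf 'PUT /%s/%s/%s\n' "$1" "$2" "$3"
  else
    # No ID: POST to the type, and Elastic generates the ID.
    printf 'POST /%s/%s\n' "$1" "$2"
  fi
}

# Usage:
# doc_request accounts person 1
# doc_request accounts person
```

On Elasticsearch 6.0 and later, requests with a body also need the header -H 'Content-Type: application/json'; the 5.5.1 release used here accepts the commands as written.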

5.2 View Document

GET /accounts/person/1?pretty=true returns the document in a readable format. In the response, found is true when the document exists and _source holds the original record; for an ID that does not exist, found is false.

$ curl 'localhost:9200/accounts/person/1?pretty=true'

5.3 Delete Document

DELETE /accounts/person/1 removes the record. If you are following along, keep the record for now, because the later sections use it.

$ curl -X DELETE 'localhost:9200/accounts/person/1'

5.4 Update Document

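To update a record, send a PUT request to the same path with the complete new document. Provided the record still exists, the response shows result set to updated and an incremented _version.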

$ curl -X PUT 'localhost:9200/accounts/person/1' -d ' { "user": "张三", "title": "工程师", "desc": "数据库管理,软件开发" }'
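
A PUT replaces the whole document, so every field has to be sent again. When only one field changes, the _update endpoint accepts a partial document instead. This is a hedged sketch, not part of the original walkthrough: partial_update_body is a hypothetical helper, and the path in the usage comment assumes the typed 5.x URL layout.

```shell
# partial_update_body FIELD VALUE: print a body that changes a single field.
# Note: VALUE is inserted as-is, so it must not contain double quotes or backslashes.
partial_update_body() {
  printf '{ "doc": { "%s": "%s" } }\n' "$1" "$2"
}

# Usage (requires a running node and an existing document with ID 1):
# curl -X POST 'localhost:9200/accounts/person/1/_update' \
#   -d "$(partial_update_body desc '数据库管理,软件开发')"
```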

6. Data Query

6.1 Return All Records

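GET /accounts/person/_search returns the documents of the person type in the accounts index. Only the first 10 hits are returned by default. In the response, took is the time the query took in milliseconds, timed_out says whether it timed out, and hits holds the matching records.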

$ curl 'localhost:9200/accounts/person/_search'

6.2 Full‑Text Search

Elastic has its own query syntax, sent as a JSON request body. A match query searches one field for the given text, e.g. searching for “软件” in desc:

$ curl 'localhost:9200/accounts/person/_search' -d ' { "query": {"match": {"desc": "软件"}} }'
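
The match body has the same shape for any field and search text, so it can be produced by a small helper. This is a minimal sketch; match_body is a hypothetical name, and the helper does no JSON escaping.

```shell
# match_body FIELD TEXT: print a match query body for one field.
# Note: TEXT is inserted as-is, so it must not contain double quotes or backslashes.
match_body() {
  printf '{ "query": {"match": {"%s": "%s"}} }\n' "$1" "$2"
}

# Usage (requires a running node):
# curl 'localhost:9200/accounts/person/_search' -d "$(match_body desc 软件)"
```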

6.3 Pagination

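The size field sets how many results are returned (10 by default) and from sets the offset of the first result (0 by default). The query below returns a single result starting at position 1, that is, the second match: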

$ curl 'localhost:9200/accounts/person/_search' -d ' { "query": {"match": {"desc": "管理"}}, "size": 1, "from": 1 }'
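
Because from is an offset rather than a page number, a page-based interface has to convert: from = (page - 1) * size. The sketch below does that arithmetic; page_body is a hypothetical helper and assumes 1-based page numbers.

```shell
# page_body FIELD TEXT PAGE SIZE: print a match query for the given 1-based PAGE.
page_body() {
  # Offset of the first hit on this page.
  from=$(( ($3 - 1) * $4 ))
  printf '{ "query": {"match": {"%s": "%s"}}, "size": %d, "from": %d }\n' "$1" "$2" "$4" "$from"
}

# Usage (requires a running node); page 2 with 1 result per page
# produces the same body as the command above:
# curl 'localhost:9200/accounts/person/_search' -d "$(page_body desc 管理 2 1)"
```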

6.4 Logical Operations

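If a match query contains several space‑separated terms, for example "软件 系统", Elastic treats them as an OR: a document matches if it contains either term. For an AND search, use a bool query with one must clause per term: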

$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        {"match": {"desc": "软件"}},
        {"match": {"desc": "系统"}}
      ]
    }
  }
}'
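
With more than two terms, writing the must clauses by hand gets tedious. The sketch below builds them from a list of arguments, and also shows the match query's operator option, which should give the same AND behavior for a case like this one. Both and_body and all_terms_body are hypothetical helper names, and neither escapes its input.

```shell
# and_body FIELD TERM...: print a bool query with one must clause per TERM.
and_body() {
  field=$1
  shift
  clauses=""
  for term in "$@"; do
    # Put a comma between clauses, but not before the first one.
    [ -n "$clauses" ] && clauses="$clauses, "
    clauses="$clauses{\"match\": {\"$field\": \"$term\"}}"
  done
  printf '{ "query": { "bool": { "must": [ %s ] } } }\n' "$clauses"
}

# all_terms_body FIELD "TERM TERM...": print one match query with operator set to and.
all_terms_body() {
  printf '{ "query": { "match": { "%s": { "query": "%s", "operator": "and" } } } }\n' "$1" "$2"
}

# Usage (requires a running node):
# curl 'localhost:9200/accounts/person/_search' -d "$(and_body desc 软件 系统)"
# curl 'localhost:9200/accounts/person/_search' -d "$(all_terms_body desc '软件 系统')"
```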

