Introduction to Elasticsearch: Core Concepts, Query Types, Pagination, and Data Synchronization
This article provides a comprehensive overview of Elasticsearch, covering its distributed storage architecture, core data model concepts, analysis and query capabilities, practical next‑token pagination techniques, join strategies, and various data synchronization methods for integrating Elasticsearch with other systems.
Elasticsearch (ES) is a distributed storage and search engine that powers search at sites such as Wikipedia and GitHub. This article introduces its core concepts, including nodes, clusters, shards, replicas, and data‑model elements like index, type, and document.
It explains the analysis capabilities of ES, covering the inverted index, analyzers, tokenization, normalization, and filtering, and discusses the limitations of built‑in analyzers for Chinese text.
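To make the analysis steps concrete, here is a minimal Python sketch of a toy analyzer feeding an inverted index. It is an illustration under stated assumptions, not how ES implements analysis: the `analyze` and `build_inverted_index` functions, the stop-word list, and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of"}

def analyze(text):
    """Toy analyzer: normalize to lowercase, tokenize, then filter stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents.
docs = {
    1: "The quick brown fox",
    2: "A quick search engine",
    3: "Elasticsearch is a search engine",
}
index = build_inverted_index(docs)
print(index["quick"])   # [1, 2]
print(index["search"])  # [2, 3]
```

This toy tokenizer only recognizes ASCII letters and digits, so it would drop Chinese text entirely. That is a cruder version of the limitation mentioned above: Chinese has no whitespace word boundaries, so it needs a dedicated segmenter rather than a split-on-separators tokenizer.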
The article then describes the major query types supported by ES, from term and fuzzy queries at the word level to full‑text queries such as match and match_phrase, and details the bool query structure (must, should, must_not, filter) and relevance scoring using TF‑IDF and field length.
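As a sketch of how the four bool clauses fit together, the Python helper below assembles a request body as a plain dict. The helper name `build_bool_query` and the field names (`description`, `status`, `deprecated`) are hypothetical and chosen only for illustration.

```python
import json

def build_bool_query(must=None, should=None, must_not=None, filter_=None):
    """Assemble a bool query body, omitting clause lists that are empty."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter_,
    }
    return {"query": {"bool": {name: c for name, c in clauses.items() if c}}}

# Hypothetical field names, for illustration only.
body = build_bool_query(
    must=[{"match": {"description": "payment gateway"}}],   # full-text, scored
    filter_=[{"term": {"status": "published"}}],            # exact match, not scored
    must_not=[{"term": {"deprecated": True}}],              # exclusion, not scored
)
print(json.dumps(body, indent=2))
```

Clauses under must and should contribute to the relevance score, while filter and must_not only include or exclude documents, so exact-match conditions usually belong in filter.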
For practical pagination, it presents the sort + search_after (nextToken) approach with example DSL, showing how to construct the request and use the returned cursor for subsequent pages:
GET /service_version_index/service_version_type/_search
{
  "size": 100,
  "sort": [
    {"gmt_modified": "desc"},
    {"score": "desc"},
    {"id": "desc"}
  ],
  ...
}
Example of the cursor returned by ES:
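{
  "sort": [1614561419000, "6FxZJXgBE6QbUWetnarH"]
}
Note that search_after must supply one value per sort key, in the same order as the sort array. The cursor shown here has two values while the sort has three keys, so the score value appears to have been omitted from the example.
Using the cursor for the next page: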
GET /service_version_index/service_version_type/_search
{
  "size": 100,
  "sort": [
    {"gmt_modified": "desc"},
    {"score": "desc"},
    {"id": "desc"}
  ],
  "query": { ... },
  "search_after": [1614561419000, "6FxZJXgBE6QbUWetnarH"]
}
The article also covers strategies for implementing joins in ES, including parent‑child documents, service‑side joins, and the use of wide tables, comparing wide versus narrow table designs.
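Returning to the sort + search_after pagination shown earlier, the client-side loop can be sketched in Python as follows. This is a minimal sketch under assumptions: `paginate` takes any callable that accepts a request body and returns a parsed ES response, and `make_fake_search` is a hypothetical in-memory stand-in used here so the example runs without a cluster. The fake uses a two-key sort for brevity.

```python
def paginate(search, sort, page_size=100):
    """Yield every hit by following the search_after cursor page by page.

    `search` is a stand-in for a real client call: it takes a request
    body (dict) and returns the parsed response (dict).
    """
    body = {"size": page_size, "sort": sort}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The cursor is the "sort" array of the last hit on the page.
        body = {**body, "search_after": hits[-1]["sort"]}

def make_fake_search(rows):
    """Hypothetical in-memory stand-in for ES; `rows` must be pre-sorted."""
    def search(body):
        after = body.get("search_after")
        start = 0
        if after is not None:
            keys = [[r["gmt_modified"], r["id"]] for r in rows]
            start = keys.index(after) + 1
        page = rows[start:start + body["size"]]
        hits = [{"_source": r, "sort": [r["gmt_modified"], r["id"]]} for r in page]
        return {"hits": {"hits": hits}}
    return search

# Five hypothetical documents, already in gmt_modified-descending order.
rows = [{"gmt_modified": 1000 - i, "id": f"doc-{i}"} for i in range(5)]
sort = [{"gmt_modified": "desc"}, {"id": "desc"}]
ids = [h["_source"]["id"] for h in paginate(make_fake_search(rows), sort, page_size=2)]
print(ids)  # ['doc-0', 'doc-1', 'doc-2', 'doc-3', 'doc-4']
```

The last sort key should be unique per document (here `id`), otherwise documents that tie on every sort key can be skipped or repeated across page boundaries.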
Finally, data synchronization methods are discussed, ranging from manual writes and Alibaba Cloud DTS to Logstash and view‑based ETL. An example of creating a SQL view for feeding ES is provided:
CREATE VIEW my_view AS
SELECT sv.*, s.score, sc.category
FROM service_version sv
JOIN service s ON sv.service_id = s.service_id
JOIN service_category sc ON s.service_id = sc.service_id;
Additional references and resources are listed for further reading.
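For the view-based synchronization described above, rows read from a view such as my_view could be turned into a body for the ES _bulk endpoint. The Python sketch below shows only that formatting step. It assumes rows arrive as dicts; the helper name `to_bulk_ndjson`, the column values, and the choice of `id` as the document id are hypothetical.

```python
import json

def to_bulk_ndjson(rows, index, id_field):
    """Render rows as a _bulk request body: one action line plus one source line per row."""
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk body is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical rows, shaped like the output of the view above.
rows = [
    {"id": 1, "service_id": 10, "score": 4.5, "category": "payments"},
    {"id": 2, "service_id": 11, "score": 3.9, "category": "search"},
]
payload = to_bulk_ndjson(rows, index="service_version_index", id_field="id")
print(payload)
```

Using a stable row key as `_id` makes repeated runs overwrite existing documents instead of creating duplicates, which keeps a periodic sync job safe to re-run.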
High Availability Architecture
Official account for High Availability Architecture.