Elasticsearch Logical and Physical Design, Indexing and Search Operations
This article explains Elasticsearch's logical and physical design, how documents are structured and indexed, the role of shards and replicas, and provides practical examples of indexing, searching, aggregations, and retrieving documents using RESTful APIs.
Elasticsearch is a distributed, near‑real‑time search engine built for fast search over very large datasets, often billions of documents.
Logical design treats documents as the basic unit, analogous to rows in a relational table. Documents were grouped into types (deprecated since 6.x and removed in 8.0) and are stored in indices, which act like databases.
Physical design splits each index into primary shards (5 by default before version 7.0, 1 from 7.0 onward) and replica shards, distributing them across cluster nodes for scalability and fault tolerance.
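As a rough sketch of how shard and replica counts are chosen, the snippet below builds the settings body for index creation and works out how many shard copies the cluster would have to place. The index name get-together-v2, the counts, and the variable and function names are illustrative assumptions; create_index is defined but not called, so nothing is sent until you invoke it.

```shell
#!/usr/bin/env bash
ES_HOST='172.16.1.127:9200'   # host used in this article's examples
INDEX='get-together-v2'       # hypothetical index name
PRIMARIES=3                   # illustrative values, not recommendations
REPLICAS=1                    # replica copies per primary shard

# Settings body for the create-index request.
SETTINGS_BODY=$(printf '{"settings": {"number_of_shards": %d, "number_of_replicas": %d}}' \
  "$PRIMARIES" "$REPLICAS")

# Sends the create-index request; call it manually against a live cluster.
create_index() {
  curl -s -XPUT "${ES_HOST}/${INDEX}?pretty" \
    -H 'Content-Type: application/json' -d "$SETTINGS_BODY"
}

# Each primary has REPLICAS copies, so the cluster places this many shards.
TOTAL_SHARDS=$(( PRIMARIES * (1 + REPLICAS) ))

echo "$SETTINGS_BODY"
echo "total shard copies: $TOTAL_SHARDS"   # prints: total shard copies: 6
```

The number of replicas can be changed on a live index, whereas the number of primary shards is fixed when the index is created, so it is worth setting deliberately.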
Documents are JSON objects, self‑contained, hierarchical, and schema‑free; fields are mapped to types (e.g., text, keyword) during indexing.
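To see how fields were mapped, you can ask Elasticsearch for the index mapping. The sketch below assumes the get-together index relies on default dynamic mapping (Elasticsearch 5.x or later), where a string field typically becomes a text field with a keyword sub-field. The variable and function names are illustrative, and show_mapping is not called until you run it yourself.

```shell
#!/usr/bin/env bash
ES_HOST='172.16.1.127:9200'   # host used in this article's examples
INDEX='get-together'

# Fetches the mapping Elasticsearch holds for the index (needs a live cluster).
show_mapping() {
  curl -s "${ES_HOST}/${INDEX}/_mapping?pretty"
}

# Shape that default dynamic mapping usually gives a string field such as
# "name": full-text searchable as text, exact-match and aggregatable through
# the "keyword" sub-field. Shown for comparison, not captured from a cluster.
DEFAULT_STRING_MAPPING='{
  "type": "text",
  "fields": {
    "keyword": { "type": "keyword", "ignore_above": 256 }
  }
}'

echo "$DEFAULT_STRING_MAPPING"
```

Comparing the output of show_mapping with this shape is a quick way to tell whether a field was mapped dynamically or explicitly.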
Indexing a document is performed via an HTTP PUT request (or POST when Elasticsearch should generate the ID); the response includes _index, _type, _id, _version and _shards information.
curl -XPUT '172.16.1.127:9200/get-together/_doc/1?pretty' -H 'Content-Type: application/json' -d '{
  "name": "Elasticsearch Denver",
  "organizer": "Lee"
}'
Search queries can be expressed with query_string, term, or bool queries, optionally limiting the returned fields and adding aggregations for analytics.
curl "172.16.1.127:9200/get-together/_search?pretty" -H 'Content-Type: application/json' -d '{
  "query": {
    "query_string": {
      "query": "elasticsearch",
      "fields": ["name", "title"],
      "default_operator": "AND"
    }
  }
}'
Aggregations run on fields backed by doc values, such as keyword fields; a text field needs either fielddata enabled or a keyword sub-field.
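As a sketch of the keyword sub-field route, the request body below runs a terms aggregation against organizer.keyword, which avoids enabling fielddata on the text field. It assumes the index was created with default dynamic mapping, so that the organizer.keyword sub-field exists. The variable and function names are illustrative, and "size": 0 is added only to suppress the hit list.

```shell
#!/usr/bin/env bash
ES_HOST='172.16.1.127:9200'   # host used in this article's examples
INDEX='get-together'

# Terms aggregation on the keyword sub-field (assumed to exist via dynamic mapping).
AGG_BODY='{
  "size": 0,
  "aggregations": {
    "organizers": {
      "terms": { "field": "organizer.keyword" }
    }
  }
}'

# Sends the aggregation request; call it manually against a live cluster.
run_aggregation() {
  curl -s "${ES_HOST}/${INDEX}/_search?pretty" \
    -H 'Content-Type: application/json' -d "$AGG_BODY"
}

echo "$AGG_BODY"
```

Because the keyword sub-field stores the whole value unanalyzed, buckets come back as complete organizer names rather than as individual lowercase tokens.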
# Enable fielddata on the organizer text field
curl -XPOST "172.16.1.127:9200/get-together/_mapping/_doc?pretty" -H 'Content-Type: application/json' -d '{
  "properties": {
    "organizer": {
      "type": "text",
      "fielddata": true
    }
  }
}'
# Aggregation query
curl '172.16.1.127:9200/get-together/_doc/_search?pretty' -H 'Content-Type: application/json' -d '{
  "aggregations": {
    "organizers": {
      "terms": {"field": "organizer"}
    }
  }
}'
Retrieving a document by ID is faster than searching because the request goes straight to the shard that holds the document and skips the query phase.
curl '172.16.1.127:9200/get-together/_doc/1?pretty'
Understanding both logical and physical design helps optimize Elasticsearch performance, scalability, and reliability.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
