
Elasticsearch Logical and Physical Design, Indexing and Search Operations

This article explains Elasticsearch's logical and physical design: how documents are structured and indexed, and what role shards and replicas play. It then walks through practical REST API examples of indexing, searching, aggregating, and retrieving documents.

Big Data Technology & Architecture

Nov 30, 2020


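Elasticsearch is a distributed search engine built for fast search over very large datasets, scaling to billions of documents. It is near-real-time: a newly indexed document becomes searchable after the next refresh, which by default happens within about one second.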
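A minimal sketch of what near-real-time means in practice. It assumes the article's cluster address, Elasticsearch 7.x URL forms, and a hypothetical ES_HOST variable and helper names of our own. The first helper indexes a document with refresh=wait_for, so the response only arrives once a refresh has made the document searchable. The second forces an immediate refresh with the _refresh API.

```shell
#!/usr/bin/env bash
# Cluster address from the article's examples; override by exporting ES_HOST.
ES_HOST="${ES_HOST:-172.16.1.127:9200}"

# Hypothetical helper: index a document and hold the response until a refresh
# has made it visible to search (refresh=wait_for), so an immediate search finds it.
index_and_wait() {
  local index="$1" id="$2" doc="$3"
  curl -s -XPUT "$ES_HOST/$index/_doc/$id?refresh=wait_for&pretty" \
    -H 'Content-Type: application/json' -d "$doc"
}

# Hypothetical helper: force an immediate refresh of every shard in the index.
refresh_index() {
  curl -s -XPOST "$ES_HOST/$1/_refresh?pretty"
}
```

For example, index_and_wait get-together 2 '{"name": "Elasticsearch Denver"}' returns only after the document is visible to search. wait_for waits for the next scheduled refresh instead of forcing one. This avoids the many small refreshes that refresh=true or frequent _refresh calls would create, at the cost of a slower response.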

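The logical design treats the document as the basic unit, analogous to a row in a relational table. Documents are stored in indices, which play roughly the role of databases. Older versions also grouped documents into types inside an index, but mapping types were deprecated in Elasticsearch 7.0, and each index now has a single type, _doc.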

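The physical design splits each index into primary shards, each of which can have replica shards. The default is 5 primary shards before Elasticsearch 7.0 and 1 since 7.0, with 1 replica per primary. Shards are distributed across the cluster's nodes. Spreading primaries adds indexing and storage capacity. Replicas are never placed on the same node as their primary, so they provide fault tolerance as well as extra search throughput.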
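Elasticsearch routes each document to a primary shard by hashing its routing value (the document ID unless custom routing is given) modulo the number of primary shards. This is why the primary count is fixed when the index is created and can only be changed by reindexing or with the split/shrink APIs. The replica count, by contrast, can be changed at any time.

A sketch under those rules, assuming Elasticsearch 7.x and hypothetical helper names and ES_HOST variable:

```shell
#!/usr/bin/env bash
# Cluster address from the article's examples; override by exporting ES_HOST.
ES_HOST="${ES_HOST:-172.16.1.127:9200}"

# Hypothetical helper: create an index with an explicit shard layout.
# number_of_shards cannot be changed in place once the index exists.
create_index() {
  local index="$1" shards="$2" replicas="$3"
  curl -s -XPUT "$ES_HOST/$index?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"settings\": {\"number_of_shards\": $shards, \"number_of_replicas\": $replicas}}"
}

# Hypothetical helper: change the replica count of an existing index.
set_replicas() {
  local index="$1" replicas="$2"
  curl -s -XPUT "$ES_HOST/$index/_settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"index\": {\"number_of_replicas\": $replicas}}"
}

# List every shard copy of an index: prirep p = primary, r = replica, plus its node.
show_shards() {
  curl -s "$ES_HOST/_cat/shards/$1?v"
}
```

For example, create_index get-together-v2 2 1 (a hypothetical new index) followed by show_shards get-together-v2 lists two primaries and two replicas. On a single-node cluster the replica rows show as UNASSIGNED and cluster health is yellow, because a replica is never allocated to the node holding its primary.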

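Documents are self-contained, hierarchical JSON objects. No schema has to be declared in advance. When a field appears for the first time, dynamic mapping infers its type; a string, for example, becomes a text field with a keyword sub-field. That mapping then applies to the field in every later document of the index.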
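To see dynamic mapping at work, the sketch below indexes a hierarchical document and prints the mapping Elasticsearch inferred. The location object and members array, and the values CO and Mike, are hypothetical sample data. ES_HOST and the helper name are also our own.

With default dynamic mapping, name and organizer become text fields with a keyword sub-field. location becomes an object with its own properties. members is mapped by the type of its elements, because Elasticsearch has no dedicated array type.

```shell
#!/usr/bin/env bash
# Cluster address from the article's examples; override by exporting ES_HOST.
ES_HOST="${ES_HOST:-172.16.1.127:9200}"

# Hypothetical helper: index a hierarchical document (an inner object plus an
# array), then print the mapping that dynamic mapping inferred for the index.
index_and_show_mapping() {
  local index="$1" id="$2"
  curl -s -XPUT "$ES_HOST/$index/_doc/$id?pretty" \
    -H 'Content-Type: application/json' -d '{
      "name": "Elasticsearch Denver",
      "organizer": "Lee",
      "location": {"city": "Denver", "state": "CO"},
      "members": ["Lee", "Mike"]
    }'
  curl -s "$ES_HOST/$index/_mapping?pretty"
}
```

For example, index_and_show_mapping get-together 3 indexes the document and then shows location.city and location.state nested under the location object's properties.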

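A document is indexed with an HTTP PUT to /{index}/_doc/{id}. The response reports:

- the index (_index), type (_type), and ID (_id)
- the version (_version)
- the result of the operation (created or updated)
- shard information (_shards)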

curl -XPUT '172.16.1.127:9200/get-together/_doc/1?pretty' -H 'Content-Type: application/json' -d '{
  "name": "Elasticsearch Denver",
  "organizer": "Lee"
}'
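The PUT request above requires the caller to choose the ID. Sending it again overwrites the document and increments _version. The sketch below shows two alternatives, assuming Elasticsearch 7.x URL forms and hypothetical helper names and ES_HOST variable:

- POST to /{index}/_doc, which lets Elasticsearch generate the ID.
- The _create endpoint, which refuses to overwrite an existing document.

```shell
#!/usr/bin/env bash
# Cluster address from the article's examples; override by exporting ES_HOST.
ES_HOST="${ES_HOST:-172.16.1.127:9200}"

# Hypothetical helper: index a document under an ID generated by Elasticsearch
# (returned in the response as "_id").
index_auto_id() {
  local index="$1" doc="$2"
  curl -s -XPOST "$ES_HOST/$index/_doc?pretty" \
    -H 'Content-Type: application/json' -d "$doc"
}

# Hypothetical helper: create the document only if the ID is unused and print
# just the HTTP status (201 when created, 409 when that ID already exists).
create_only() {
  local index="$1" id="$2" doc="$3"
  curl -s -o /dev/null -w '%{http_code}\n' -XPUT "$ES_HOST/$index/_create/$id" \
    -H 'Content-Type: application/json' -d "$doc"
}
```

For example, after the PUT above has stored document 1, the following command prints 409:

create_only get-together 1 '{"name": "Elasticsearch Denver", "organizer": "Lee"}'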

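Searches are written in the JSON Query DSL. Examples include a query_string query, a term query, or a bool query that combines clauses, including non-scoring filter clauses. A search request can also limit the fields returned in each hit and add aggregations for analytics.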

curl "172.16.1.127:9200/get-together/_search?pretty" -H 'Content-Type: application/json' -d '{
  "query": {
    "query_string": {
      "query": "elasticsearch",
      "fields": ["name", "title"],
      "default_operator": "AND"
    }
  }
}'
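A bool query combines a scored full-text clause with an exact, non-scoring filter. The sketch below also limits the returned fields with _source. It assumes that organizer was dynamically mapped, so that an organizer.keyword sub-field exists. A term query on the text field itself would not match "Lee", because the standard analyzer indexes it as the token lee. The helper name and ES_HOST are hypothetical. Values are interpolated without JSON escaping, so they must not contain double quotes.

```shell
#!/usr/bin/env bash
# Cluster address from the article's examples; override by exporting ES_HOST.
ES_HOST="${ES_HOST:-172.16.1.127:9200}"

# Hypothetical helper: full-text match on "name" (scored) plus an exact filter on
# the organizer's keyword sub-field (not scored, cacheable); only the listed
# fields are returned in each hit's _source. Arguments must not contain quotes.
search_by_organizer() {
  local index="$1" text="$2" organizer="$3"
  curl -s "$ES_HOST/$index/_search?pretty" \
    -H 'Content-Type: application/json' -d "{
      \"_source\": [\"name\", \"organizer\"],
      \"query\": {
        \"bool\": {
          \"must\": {\"match\": {\"name\": \"$text\"}},
          \"filter\": {\"term\": {\"organizer.keyword\": \"$organizer\"}}
        }
      }
    }"
}
```

For example, search_by_organizer get-together elasticsearch Lee returns matching groups organized by Lee, with only name and organizer in each hit.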

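Aggregations read a field's doc_values, which keyword, numeric, and date fields have by default. Text fields have no doc_values, so aggregating on one requires one of two approaches:

- Enable fielddata. This builds an in-memory structure in the JVM heap from the analyzed terms, so buckets hold tokens such as lee rather than Lee, and it can be memory-intensive.
- Aggregate on a keyword sub-field instead.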

# Enable fielddata on the existing text field "organizer" (typeless mapping endpoint, Elasticsearch 7.x)
curl -XPUT "172.16.1.127:9200/get-together/_mapping?pretty" -H 'Content-Type: application/json' -d '{
  "properties": {
    "organizer": {
      "type": "text",
      "fielddata": true
    }
  }
}'

# Aggregation query ("size": 0 skips the search hits and returns only the buckets)
curl "172.16.1.127:9200/get-together/_search?pretty" -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggregations": {
    "organizers": {
      "terms": {"field": "organizer"}
    }
  }
}'
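The keyword sub-field alternative avoids fielddata entirely. A sketch, assuming organizer was dynamically mapped so that organizer.keyword exists, with a hypothetical helper name and ES_HOST variable. It runs the same terms aggregation on doc_values, and the buckets hold the exact original values.

```shell
#!/usr/bin/env bash
# Cluster address from the article's examples; override by exporting ES_HOST.
ES_HOST="${ES_HOST:-172.16.1.127:9200}"

# Hypothetical helper: terms aggregation on the keyword sub-field created by
# default dynamic mapping. It reads doc_values, so no fielddata is needed, and
# buckets hold un-analyzed values ("Lee", not the token "lee").
top_organizers() {
  local index="$1" size="${2:-10}"
  curl -s "$ES_HOST/$index/_search?pretty" \
    -H 'Content-Type: application/json' -d "{
      \"size\": 0,
      \"aggs\": {
        \"organizers\": {
          \"terms\": {\"field\": \"organizer.keyword\", \"size\": $size}
        }
      }
    }"
}
```

For example, top_organizers get-together 5 returns at most five organizer buckets, each with its doc_count.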

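Retrieving a document by ID is faster than searching. The GET request goes straight to the one shard that holds the document, which is computed from its ID, and skips query parsing and scoring. A search, by contrast, must query a copy of every shard in the index. GET is also real-time by default: it returns the latest version of a document even before a refresh has made that version searchable.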

curl '172.16.1.127:9200/get-together/_doc/1?pretty'
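Three related retrieval calls cover common needs. The sketch below uses Elasticsearch 7.x URL forms with hypothetical helper names and ES_HOST variable:

- The _source endpoint returns only the document body.
- A HEAD request checks whether a document exists without transferring it.
- The multi-get API (_mget) fetches several IDs in one round trip.

```shell
#!/usr/bin/env bash
# Cluster address from the article's examples; override by exporting ES_HOST.
ES_HOST="${ES_HOST:-172.16.1.127:9200}"

# Hypothetical helper: return only the document body, without _index/_id/_version.
get_source() {
  curl -s "$ES_HOST/$1/_source/$2?pretty"
}

# Hypothetical helper: HEAD request; prints 200 if the document exists, 404 if not.
doc_exists() {
  curl -s -o /dev/null -w '%{http_code}\n' -I "$ES_HOST/$1/_doc/$2"
}

# Hypothetical helper: fetch several documents by ID with the multi-get API.
get_many() {
  local index="$1"; shift
  local ids
  ids="$(printf '"%s",' "$@")"          # "1","2",
  curl -s "$ES_HOST/$index/_mget?pretty" \
    -H 'Content-Type: application/json' -d "{\"ids\": [${ids%,}]}"
}
```

For example, after the document above has been indexed, doc_exists get-together 1 prints 200. The command get_many get-together 1 2 returns both documents in a single docs array, with "found": false for any ID that does not exist.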

Understanding both logical and physical design helps optimize Elasticsearch performance, scalability, and reliability.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
