
Master Elasticsearch: From Basics to Advanced Search, Indexing, and Aggregations

This comprehensive guide introduces Elasticsearch’s core concepts, installation, REST API operations, and practical examples of indexing, updating, deleting, bulk processing, searching, filtering, and aggregations, helping readers build scalable, near‑real‑time search and analytics solutions for diverse use cases.

MaGe Linux Operations

Elasticsearch is a distributed RESTful search and analytics engine.

Query – supports structured, unstructured, geo, and metric searches.

Analysis – aggregates large data sets to reveal trends and patterns.

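Speed – queries run against prebuilt index structures (such as inverted indices for text), so results return quickly even on large data sets.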

Scalability – runs on a laptop or thousands of servers handling petabytes.

Resilience – designed for distributed environments: the cluster detects failed nodes and keeps data available through replica shards.

Flexibility – works with numeric, text, geo, structured, and unstructured data.

Integration – works with Hadoop and Spark through the Elasticsearch for Apache Hadoop (ES‑Hadoop) connector.

Getting Started

Elasticsearch is a highly scalable open‑source full‑text search and analytics engine that stores, searches and analyzes large volumes of data in near real time.

Typical use cases: e‑commerce product search, log collection and analysis, price‑alert platforms, and business intelligence dashboards.

Basic Concepts

Near Real‑Time (NRT) – a document becomes searchable shortly after it is indexed, typically within about one second (the default index refresh interval).

Cluster – a collection of one or more nodes that share data and provide unified indexing and search.

Node – an individual server that belongs to a cluster, stores data and participates in indexing and search.

Index – a collection of documents with similar characteristics; index name must be lowercase.

Document – the basic unit of information stored in an index, represented in JSON.

Shards & Replicas – an index can be split into multiple primary shards; each shard can have replica copies for high availability and increased throughput.
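To make the shard concept concrete, here is a minimal local sketch of how a document ID can be mapped to one of an index's primary shards, and how many shard copies an index has in total. The function names are hypothetical, and zlib.crc32 is only a stand-in: Elasticsearch uses its own hash function, so real shard numbers will differ.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Map a routing value (by default the document ID) to a primary shard number."""
    # Stand-in hash (assumption): Elasticsearch uses a different hash internally.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def total_shard_copies(num_primary_shards, replicas_per_primary):
    """Count every shard copy: each primary plus its replicas."""
    return num_primary_shards * (1 + replicas_per_primary)


if __name__ == "__main__":
    for doc_id in ("1", "2", "3"):
        print("document", doc_id, "-> shard", pick_shard(doc_id, 5))
    print("shard copies for 5 primaries, 1 replica:", total_shard_copies(5, 1))
```

Because the shard number depends on the number of primary shards, that number is fixed when the index is created, while the replica count can be changed later.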

Installation

tar -zxf elasticsearch-6.3.2.tar.gz
cd elasticsearch-6.3.2
# Start as a regular user; Elasticsearch refuses to start as root.
bin/elasticsearch

By default, Elasticsearch serves its REST API on port 9200. The port can be changed with the http.port setting in config/elasticsearch.yml.

Check that Elasticsearch is running:

curl http://localhost:9200/

The REST API

Cluster health

curl -X GET "localhost:9200/_cat/health?v"

The response reports the cluster status as green, yellow, or red:

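Green – all primary and replica shards are assigned; the cluster is fully functional.

Yellow – all data is available, but some replica shards are unassigned.

Red – some data is unavailable because at least one primary shard is unassigned.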
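The tabular output of the _cat endpoints is easy to process in a script. The sketch below parses a header row plus data rows into dictionaries, assuming that no value contains whitespace. The sample text is illustrative and was not captured from a real cluster, and the function name is hypothetical.

```python
def parse_cat_table(text):
    """Turn `_cat/...?v` output (header line plus rows) into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    # Assumption: columns are whitespace-separated and values contain no spaces.
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Illustrative sample only; real values depend on your cluster.
SAMPLE_HEALTH = """\
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1533000000 01:20:00  elasticsearch yellow          1         1      5   5    0    0        5             0                  -                 50.0%
"""

if __name__ == "__main__":
    row = parse_cat_table(SAMPLE_HEALTH)[0]
    print(row["cluster"], "is", row["status"], "with", row["unassign"], "unassigned shards")
```

For scripted health checks, the same information is also available as JSON from the _cluster/health endpoint, which avoids text parsing altogether.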

List nodes

curl -X GET "localhost:9200/_cat/nodes?v"

Shows node name, IP, roles, etc.

List indices

curl -X GET "localhost:9200/_cat/indices?v"

Lists the existing indices; a fresh cluster has none.

Create an index

curl -X PUT "localhost:9200/customer?pretty"

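The response acknowledges the creation. In Elasticsearch 6.x the index gets 5 primary shards and 1 replica by default (7.0 and later default to 1 primary shard). On a single‑node cluster the replicas cannot be assigned, so the cluster health turns yellow.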
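If the defaults do not fit, shard and replica counts can be set explicitly when the index is created. The following sketch only builds the JSON body for such a request; the helper name is hypothetical, while number_of_shards and number_of_replicas are the index settings involved.

```python
import json


def index_settings(primary_shards, replicas):
    """Build the request body for creating an index with explicit shard counts."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return json.dumps(body)


if __name__ == "__main__":
    # A single-node test cluster stays green with zero replicas.
    print(index_settings(1, 0))
```

The printed JSON would be passed as the -d body of the PUT request, together with the Content-Type: application/json header.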

Index a document

curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d '{"name":"John Doe"}'

For a new document, the response reports "result": "created" and "_version": 1.

Get a document

curl -X GET "localhost:9200/customer/_doc/1?pretty"

Returns the document metadata along with the stored JSON in the _source field.
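The same index and get calls can be issued from a script. Below is a minimal sketch using only Python's standard library; BASE_URL assumes a local node on the default port, and es_request and send are hypothetical helper names, not part of any Elasticsearch client.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node on the default port


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request against the REST API."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    put = es_request("PUT", "/customer/_doc/1", {"name": "John Doe"})
    get = es_request("GET", "/customer/_doc/1")
    print(put.get_method(), put.full_url)
    print(get.get_method(), get.full_url)
    # With a node running: print(send(put)); print(send(get))
```

Note that urlopen raises urllib.error.HTTPError for error statuses, for example a 404 when the requested document does not exist.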

Delete an index

curl -X DELETE "localhost:9200/customer?pretty"

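The index and all of its documents are removed. Index document 1 again before trying the update examples that follow.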

Modifying Data

Update a document

curl -X POST "localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d '{"doc":{"name":"Jane Doe","age":20}}'

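The response shows "result": "updated" and an incremented _version. The name field is replaced and the age field is added. In Elasticsearch 7.x and later, the endpoint is written as customer/_update/1.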

Scripted update (increment age)

curl -X POST "localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d '{"script":"ctx._source.age += 5"}'

Adds 5 to the age field using a Painless script; ctx._source refers to the source of the document being updated.
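The effect of the two update styles can be modelled locally. This is a sketch of the semantics, not Elasticsearch code: a partial doc is merged into the existing source (nested objects are merged recursively, other values are replaced), and the script changes a field in place. Both function names are hypothetical.

```python
import copy


def merge_partial(source, partial):
    """Model a partial-document update: merge `partial` into `source`."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged key by key.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays replace the old value, or add a new field.
            merged[key] = copy.deepcopy(value)
    return merged


def increment_age(source, amount):
    """Model the scripted update `ctx._source.age += amount`."""
    updated = copy.deepcopy(source)
    updated["age"] += amount
    return updated


if __name__ == "__main__":
    doc = merge_partial({"name": "John Doe"}, {"name": "Jane Doe", "age": 20})
    print(doc)
    print(increment_age(doc, 5))
```

As in this model, the scripted update only works when the age field already exists, which is why the partial update that adds it comes first.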

Bulk operations

curl -X POST "localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d '
{ "index":{ "_id":"1" } }
{ "name":"John Doe" }
{ "index":{ "_id":"2" } }
{ "name":"Jane Doe" }
'

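Indexes two documents (IDs 1 and 2) in a single request. Each action line is followed by its document source on the next line, and the body must end with a newline.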

curl -X POST "localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d '
{ "update":{ "_id":"1" } }
{ "doc":{"name":"John Doe becomes Jane Doe"} }
{ "delete":{ "_id":"2" } }
'

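Updates document 1 and deletes document 2. A delete action has no source line. Actions run in order, and a failed action does not stop the remaining ones.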
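Bulk bodies are newline‑delimited JSON, which is tedious to write by hand for many documents. The sketch below generates such a body from a list of actions; bulk_body is a hypothetical helper, and the (action, metadata, payload) tuple layout is an assumption of this example.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, payload) tuples as a bulk request body.

    `payload` is None for actions without a source line, such as delete.
    """
    lines = []
    for action, metadata, payload in actions:
        lines.append(json.dumps({action: metadata}))
        if payload is not None:
            lines.append(json.dumps(payload))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = bulk_body([
        ("index", {"_id": "1"}, {"name": "John Doe"}),
        ("index", {"_id": "2"}, {"name": "Jane Doe"}),
        ("delete", {"_id": "2"}, None),
    ])
    print(body, end="")
```

json.dumps keeps each object on a single line by default, which is what the bulk format requires.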

Retrieving Data

Search API

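The search examples assume an index named bank that holds sample account documents with fields such as account_number, balance, address, age, gender, and state.

A match‑all query passed as URI parameters matches every document, sorted here by account_number; only the first 10 hits are returned by default: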

curl -X GET "localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty"

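The response includes took (execution time in milliseconds), timed_out, _shards (how many shards were searched and how many succeeded or failed), and hits (the total count plus the matching documents).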

Request‑body version:

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} },
  "sort": [ { "account_number": "asc" } ]
}
'
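Once a search response has been decoded from JSON, its top‑level fields (took, timed_out, _shards, hits) can be reduced to a short summary. The sketch below does that on an illustrative response; the sample values are made up, and summarize_search is a hypothetical helper. It accepts hits.total both as a plain number (6.x) and as an object with a value key (7.x and later).

```python
def summarize_search(response):
    """Extract the most useful fields from a decoded search response."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x and later report {"value": ..., "relation": ...}.
        total = total["value"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "shards_failed": response["_shards"]["failed"],
        "total_hits": total,
        "sources": [hit["_source"] for hit in response["hits"]["hits"]],
    }


# Illustrative response; real values depend on your data.
SAMPLE_RESPONSE = {
    "took": 12,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": None,
        "hits": [
            {"_index": "bank", "_id": "0", "_source": {"account_number": 0}},
            {"_index": "bank", "_id": "1", "_source": {"account_number": 1}},
        ],
    },
}

if __name__ == "__main__":
    print(summarize_search(SAMPLE_RESPONSE))
```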

Query DSL

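The query bodies below are sent to bank/_search in the same way as the request‑body example above.

Match query on a numeric field (the account numbered 20):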

{
  "query": { "match": { "account_number": 20 } }
}

Match query on a text field (addresses containing the term mill):

{
  "query": { "match": { "address": "mill" } }
}

Bool query with must (AND), returning addresses that contain both mill and lane:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

Bool queries with should (OR) and must_not (NOT) follow the same pattern.
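Since the three bool clauses share one structure, a small builder makes the should and must_not variants explicit. This is a sketch that only produces the request body as a Python dict; match and bool_query are hypothetical helper names.

```python
import json


def match(field, text):
    """Build a single match clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=()):
    """Build a bool query body, omitting clause types that are empty."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"query": {"bool": clauses}}


if __name__ == "__main__":
    # OR: addresses containing mill or lane.
    either = bool_query(should=[match("address", "mill"), match("address", "lane")])
    # NOT: addresses containing neither mill nor lane.
    neither = bool_query(must_not=[match("address", "mill"), match("address", "lane")])
    print(json.dumps(either, indent=2))
    print(json.dumps(neither, indent=2))
```

Clause types can also be combined in one bool query, for example a must on age together with a must_not on state.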

Filtering

Range filter, which restricts results to balances from 20000 to 30000 (inclusive) without affecting relevance scores:

{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": { "balance": { "gte": 20000, "lte": 30000 } }
      }
    }
  }
}
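The bounds of a range filter are easy to get wrong at the edges: gte and lte include the boundary value, while gt and lt exclude it. The following local model (not Elasticsearch code, with hypothetical function names) applies the same rules to a list of in‑memory documents.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Return True when `value` satisfies every bound that was given."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


def apply_range_filter(docs, field, **bounds):
    """Keep documents whose `field` lies within the bounds; no scoring involved."""
    return [doc for doc in docs if field in doc and in_range(doc[field], **bounds)]


if __name__ == "__main__":
    accounts = [{"balance": b} for b in (19999, 20000, 25000, 30000, 30001)]
    print(apply_range_filter(accounts, "balance", gte=20000, lte=30000))
```

A filter is a yes/no test, so every matching document passes equally; only the query part of a bool query contributes to the score.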

Aggregations

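Terms aggregation that counts accounts per state (size 0 suppresses the search hits so that only aggregation results are returned; the 10 largest buckets are returned by default):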

{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": { "field": "state.keyword" }
    }
  }
}

Average balance per state:

{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": { "field": "state.keyword" },
      "aggs": {
        "average_balance": { "avg": { "field": "balance" } }
      }
    }
  }
}
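A terms aggregation with an avg sub‑aggregation behaves much like a group‑by followed by an average. The sketch below reproduces that result on a handful of in‑memory documents as a local model, not Elasticsearch code; the function name and the sample accounts are made up, and ties between equally sized buckets are broken by key here to keep the output deterministic.

```python
from collections import defaultdict


def terms_with_avg(docs, group_field, value_field, size=10):
    """Group docs by `group_field`, then average `value_field` per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(values), "average": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    # Largest buckets first, like the default terms ordering by doc_count.
    buckets.sort(key=lambda bucket: (-bucket["doc_count"], bucket["key"]))
    return buckets[:size]


if __name__ == "__main__":
    accounts = [
        {"state": "TX", "balance": 100},
        {"state": "TX", "balance": 300},
        {"state": "ID", "balance": 50},
    ]
    for bucket in terms_with_avg(accounts, "state", "balance"):
        print(bucket)
```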

Nested aggregation that buckets by age range, then by gender, then computes the average balance (each range includes its from value and excludes its to value):

{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 30 },
          { "from": 30, "to": 40 },
          { "from": 40, "to": 50 }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": { "field": "gender.keyword" },
          "aggs": {
            "average_balance": { "avg": { "field": "balance" } }
          }
        }
      }
    }
  }
}
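Nested aggregations come back as nested buckets, with each sub‑aggregation stored under its own name inside the parent bucket. The sketch below flattens such a response into rows; the sample is illustrative with made‑up numbers, and flatten_age_gender is a hypothetical helper tied to the aggregation names used in the request above.

```python
def flatten_age_gender(aggregations):
    """Flatten age-range -> gender -> average-balance buckets into tuples."""
    rows = []
    for age_bucket in aggregations["group_by_age"]["buckets"]:
        for gender_bucket in age_bucket["group_by_gender"]["buckets"]:
            rows.append((
                age_bucket["key"],
                gender_bucket["key"],
                gender_bucket["doc_count"],
                gender_bucket["average_balance"]["value"],
            ))
    return rows


# Illustrative "aggregations" section of a response; the numbers are made up.
SAMPLE_AGGREGATIONS = {
    "group_by_age": {
        "buckets": [
            {
                "key": "20.0-30.0", "from": 20.0, "to": 30.0, "doc_count": 3,
                "group_by_gender": {"buckets": [
                    {"key": "M", "doc_count": 2, "average_balance": {"value": 1500.0}},
                    {"key": "F", "doc_count": 1, "average_balance": {"value": 900.0}},
                ]},
            },
        ]
    }
}

if __name__ == "__main__":
    for row in flatten_age_gender(SAMPLE_AGGREGATIONS):
        print(row)
```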

These examples demonstrate how to install Elasticsearch, inspect cluster state, index and modify documents, and run search, filter, and aggregation requests through its RESTful API.
