Mastering Elasticsearch: Painless Scripts for Advanced Array Operations

This article is a step‑by‑step guide to using Elasticsearch's Painless scripting language with array‑type fields. It covers creating an index and inserting data, basic operations such as length and element access, and more advanced filtering, aggregate calculations, and weighted sums, with attention to performance.

Sohu Tech Products

Background and Challenges

Multi‑valued (array) fields are common in Elasticsearch documents. Painless scripts can express advanced queries and data manipulation on such fields, but they are harder to debug than ordinary queries and can be costly at query time.

Painless scripting in Elasticsearch

Painless is a fast, safe, and maintainable scripting language designed for Elasticsearch. It enables calculations, transformations, and conditional logic directly within queries.

Step‑by‑step array operations

3.1 Index creation and data insertion

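Create an index named vehicles. Elasticsearch has no dedicated array type: any field can hold zero or more values of its mapped type, so car_length is mapped as a plain integer and car_type as a keyword, and both can store arrays.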

PUT /vehicles
{
  "mappings": {
    "properties": {
      "issue_date": {"type": "date"},
      "online_date": {
        "properties": {
          "gte": {"type": "date"},
          "lte": {"type": "date"}
        }
      },
      "owner": {"type": "integer"},
      "company_id": {"type": "integer"},
      "goods_type": {"type": "integer"},
      "car_length": {"type": "integer"},
      "car_type": {"type": "keyword"}
    }
  }
}

Bulk insert sample documents where car_length is an array of integers.

POST /vehicles/_bulk
{ "index": {} }
{ "issue_date": "2024-07-07T21:44:56Z", "owner": 6347, "company_id": 2513, "goods_type": 21, "car_length": [16, 18], "car_type": ["sedan", "truck", "trailer"] }
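The _bulk body is newline‑delimited JSON: one action line followed by one source line per document, and the body must end with a newline. The following is a minimal Python sketch of assembling such a body with only the standard library. The helper name build_bulk_body and the second document (with an empty car_length) are hypothetical and exist only for illustration; sending the body to a cluster is left out.

```python
import json


def build_bulk_body(docs):
    """Return an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # empty action: Elasticsearch assigns the _id
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


sample_docs = [
    # The document from the article.
    {"issue_date": "2024-07-07T21:44:56Z", "owner": 6347, "company_id": 2513,
     "goods_type": 21, "car_length": [16, 18], "car_type": ["sedan", "truck", "trailer"]},
    # Hypothetical document with no lengths, useful for testing empty-array handling.
    {"issue_date": "2024-07-08T09:00:00Z", "owner": 6348, "company_id": 2513,
     "goods_type": 21, "car_length": [], "car_type": ["van"]},
]

bulk_body = build_bulk_body(sample_docs)
print(bulk_body, end="")
```

Including a document with an empty array in the test data makes it easier to check how each script below behaves when a field has no values.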

3.2 Array operation examples

Retrieve the first element

Get array length

Sum of elements

Min and max values

Filter elements by condition

Calculate average

Weighted sum (custom business logic)

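3.2.1 Retrieve the first element

Note that doc values for numeric fields are stored in ascending order, so doc['car_length'][0] is the smallest value in the array, not necessarily the first value as written in _source.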

POST /vehicles/_search
{
  "script_fields": {
    "first_car_length": {
      "script": {
        "lang": "painless",
        "source": "if (doc['car_length'].size() > 0) { return doc['car_length'][0]; } else { return 'none'; }"
      }
    }
  }
}
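Because numeric doc values are sorted in ascending order, index 0 refers to the smallest value rather than the first value in the original document. The sketch below is a plain Python model of that behavior, not Elasticsearch code; the function names are hypothetical.

```python
def doc_values(source_values):
    """Model how doc['car_length'] exposes a numeric field: values sorted ascending."""
    return sorted(source_values)


def first_car_length(source_values):
    """Mirror the script above: element 0 of the doc values, or 'none' when empty."""
    values = doc_values(source_values)
    return values[0] if values else "none"


# A document indexed with "car_length": [18, 16] still yields the smallest value.
print(first_car_length([18, 16]))
print(first_car_length([]))
```

If the original order matters, the value has to come from _source or from a separately indexed field, since doc values do not preserve it.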

3.2.2 Get array length

POST /vehicles/_search
{
  "script_fields": {
    "car_length_count": {
      "script": {
        "lang": "painless",
        "source": "doc['car_length'].size()"
      }
    }
  }
}

3.2.3 Sum of elements

POST /vehicles/_search
{
  "script_fields": {
    "car_length_sum": {
      "script": {
        "lang": "painless",
        "source": "long sum = 0; for (long length : doc['car_length']) { sum += length; } return sum;"
      }
    }
  }
}

3.2.4 Min and max values

POST /vehicles/_search
{
  "script_fields": {
    "max_car_length": {
      "script": {
        "lang": "painless",
        "source": "long max = Long.MIN_VALUE; for (long v : doc['car_length']) { if (v > max) { max = v; } } return max;"
      }
    },
    "min_car_length": {
      "script": {
        "lang": "painless",
        "source": "long min = Long.MAX_VALUE; for (long v : doc['car_length']) { if (v < min) { min = v; } } return min;"
      }
    }
  }
}

3.2.5 Filter elements by condition

Using the Stream API:

POST /vehicles/_search
{
  "script_fields": {
    "filtered_lengths": {
      "script": {
        "lang": "painless",
        "source": "doc['car_length'].stream().filter(l -> l > 15).collect(Collectors.toList())"
      }
    }
  }
}

Or an explicit loop:

POST /vehicles/_search
{
  "script_fields": {
    "filtered_lengths": {
      "script": {
        "lang": "painless",
        "source": "List filtered = new ArrayList(); for (long l : doc['car_length']) { if (l > 15) { filtered.add(l); } } return filtered;"
      }
    }
  }
}
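The threshold 15 is hard‑coded in both scripts. Elasticsearch scripts accept a params object, and passing changing values through it keeps the script source identical across requests so the compiled script can be reused. The following Python sketch builds such a request body; the function name filtered_lengths_request and the parameter name min_length are assumptions chosen for this example.

```python
import json


def filtered_lengths_request(min_length):
    """Build a script_fields search body whose threshold is passed via params."""
    source = (
        "List filtered = new ArrayList(); "
        "for (long l : doc['car_length']) { "
        "if (l > params.min_length) { filtered.add(l); } } "
        "return filtered;"
    )
    return {
        "script_fields": {
            "filtered_lengths": {
                "script": {
                    "lang": "painless",
                    "source": source,  # identical for every threshold
                    "params": {"min_length": min_length},  # hypothetical parameter name
                }
            }
        }
    }


# Body for POST /vehicles/_search with a threshold of 15.
print(json.dumps(filtered_lengths_request(15), indent=2))
```

Only the params value differs between calls, which is the property that makes the script reusable.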

3.2.6 Average length

POST /vehicles/_search
{
  "script_fields": {
    "average_car_length": {
      "script": {
        "lang": "painless",
        "source": "double sum = 0; for (long l : doc['car_length']) { sum += l; } return sum / doc['car_length'].size();"
      }
    }
  }
}
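To check script output against known values, it can help to model the count, sum, min, max, and average scripts outside Elasticsearch. The sketch below is a plain Python model under one assumption about edge cases: for a document with no car_length values it returns None for min, max, and average, whereas the scripts as written would return their sentinel values and, following Java floating‑point rules, most likely NaN for the average. The function name length_stats is hypothetical.

```python
def length_stats(source_values):
    """Model the count, sum, min, max and average scripts for one document."""
    values = sorted(source_values)  # doc values are exposed in ascending order
    stats = {"count": len(values), "sum": sum(values)}
    if values:
        stats["min"] = values[0]
        stats["max"] = values[-1]
        stats["average"] = sum(values) / len(values)
    else:
        # Assumption: report missing values explicitly instead of sentinels or NaN.
        stats["min"] = stats["max"] = stats["average"] = None
    return stats


print(length_stats([16, 18]))
print(length_stats([]))
```

For the sample document with car_length [16, 18], the model gives a count of 2, a sum of 34, a minimum of 16, a maximum of 18, and an average of 17.0.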

3.2.7 Weighted sum (custom business logic)

POST /vehicles/_search
{
  "script_fields": {
    "weighted_sum": {
      "script": {
        "lang": "painless",
        "source": "double sum = 0; for (int i = 0; i < doc['car_length'].size(); i++) { sum += doc['car_length'][i] * (i + 1); } return sum;"
      }
    }
  }
}
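The weights in this script depend on position, and positions follow the sorted doc values rather than the order in _source. A small Python model of the calculation, with the hypothetical function name weighted_sum, makes the arithmetic easy to verify by hand.

```python
def weighted_sum(source_values):
    """Model the script: each value is multiplied by its 1-based position."""
    values = sorted(source_values)  # positions follow ascending doc-value order
    return float(sum(v * (i + 1) for i, v in enumerate(values)))


# Sample document: 16 * 1 + 18 * 2 = 52
print(weighted_sum([16, 18]))
# Same result when the source array is written in the opposite order.
print(weighted_sum([18, 16]))
```

If the business rule needs weights tied to the original array order, the weights would have to be stored alongside the values at index time.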

Conclusion

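Painless scripts for array manipulation consume CPU and memory at query time. Script fields are evaluated for every returned hit, so the cost grows with the number of hits and the size of the arrays. Keep scripts small, limit the amount of data they operate on, cache results where possible, and choose index mappings and data models that fit the queries. When a derived value is needed often, computing it before ingestion and indexing it as a regular field is usually more efficient than recomputing it in a script.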
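As one way to apply pre‑ingest processing, derived values can be computed in the application before a document is indexed, so queries read a stored field instead of running a script. The following is a minimal Python sketch; the function name add_length_stats and the field names car_length_count, car_length_sum, car_length_min, car_length_max, and car_length_avg are hypothetical and would need matching entries in the index mapping (or dynamic mapping).

```python
def add_length_stats(doc):
    """Return a copy of doc with precomputed car_length statistics added."""
    lengths = doc.get("car_length", [])
    enriched = dict(doc)  # leave the caller's document unchanged
    enriched["car_length_count"] = len(lengths)
    enriched["car_length_sum"] = sum(lengths)
    if lengths:  # omit min/max/avg when there are no values
        enriched["car_length_min"] = min(lengths)
        enriched["car_length_max"] = max(lengths)
        enriched["car_length_avg"] = sum(lengths) / len(lengths)
    return enriched


doc = {"owner": 6347, "company_id": 2513, "car_length": [16, 18]}
print(add_length_stats(doc))
```

The trade‑off is extra storage and the need to recompute the derived fields whenever car_length changes.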

Further reading

Official Elasticsearch documentation: https://elastic.co/guide/en/elasticsearch/reference/current/index.html

Painless array operators guide: https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-operators-array.html
