How to Set Default Values in Elasticsearch: Pipelines, Scripts, and Workarounds

This article explains three practical methods for assigning default values in Elasticsearch (ingest pipelines, update-by-query scripts, and pipeline scripts) and shows how to maintain create_time and update_time fields much as a relational database would.

Programmer DD

Sep 3, 2021

1. Practical Issues

Developers who come to Elasticsearch from relational databases such as MySQL often carry their habits over. Two questions come up frequently:

Can a default value be set in the mapping when adding a new field?

What is a good way to maintain document create_time and update_time?

This article discusses implementation schemes for default values in Elasticsearch.

2. Default Values at the Mapping Level

Strictly speaking, Elasticsearch does not support setting a default value for a field in the mapping definition.

Some may wonder whether null_value counts as a default; it does not.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

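null_value replaces an explicit null with the string "NULL" at index time so that the null can be indexed and searched. It only affects what is indexed: the _source document still contains null, and a document that omits the field entirely gets no value at all. That is why it does not count as a default value.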
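To make the limits of null_value concrete, here is a small sketch against the my-index-000001 mapping above. The document IDs and the empty-array case are illustrative choices. The first document should match the term query because its explicit null is indexed as "NULL"; the second should not, because an empty array contains no explicit null.

```console
# Explicit null: indexed as "NULL", but _source keeps null
PUT my-index-000001/_doc/1
{
  "status_code": null
}

# Empty array: no explicit null, so nothing is substituted
PUT my-index-000001/_doc/2
{
  "status_code": []
}

# Search for the substituted value
GET my-index-000001/_search
{
  "query": {
    "term": { "status_code": "NULL" }
  }
}
```

In both cases the stored _source is returned exactly as it was sent, which is the reason null_value cannot stand in for a default.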

Since mapping cannot directly define arbitrary defaults, we must look for alternative solutions.

3. Workarounds for Setting Default Values

Three approaches are presented.

3.1 Solution 1: Use an Ingest Pipeline

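# Create a pipeline whose set processor supplies the default.
# "override": false leaves a sale_count that the client already sent untouched;
# without it, the set processor overwrites the field on every document.
PUT _ingest/pipeline/add_default_pipeline
{
  "processors": [
    {
      "set": {
        "field": "sale_count",
        "value": 1,
        "override": false
      }
    }
  ]
}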

# Create index
PUT customer
{
  "mappings": {
    "properties": {
      "sale_count": { "type": "integer" },
      "major": { "type": "keyword", "null_value": "NULL" }
    }
  },
  "settings": {
    "index": { "default_pipeline": "add_default_pipeline" }
  }
}

Insert a document to verify:

POST customer/_doc/1
{
  "major": null
}

Result (the hits section of the response to GET customer/_search):

{
  "max_score": 1.0,
  "hits": [
    {
      "_index": "customer",
      "_type": "_doc",
      "_id": "1",
      "_score": 1.0,
      "_source": { "major": null, "sale_count": 1 }
    }
  ]
}

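Because index.default_pipeline points at the pipeline, every indexing request to customer that does not name its own pipeline passes through it, so a document indexed without sale_count is stored with sale_count set to 1.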
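One detail is worth checking before relying on this: by default the set processor overwrites a field that already has a value, which turns a "default" into a forced value. The sketch below uses the simulate API with an inline pipeline, so it touches no stored pipeline or index. The sample documents are made up for illustration. With "override": false, the first document should come back with sale_count 1 and the second should keep its 5.

```console
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "sale_count",
          "value": 1,
          "override": false
        }
      }
    ]
  },
  "docs": [
    { "_source": { "major": null } },
    { "_source": { "major": "math", "sale_count": 5 } }
  ]
}
```

Removing "override": false from the request and running it again shows the other behavior: both documents end up with sale_count 1.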

Similarly, create_time can be added via a pipeline:

PUT _ingest/pipeline/create_time_pipeline
{
  "description": "Adds create_time timestamp to documents",
  "processors": [
    { "set": { "field": "_source.create_time", "value": "{{_ingest.timestamp}}" } }
  ]
}

PUT my_index_0003
{
  "settings": { "index.default_pipeline": "create_time_pipeline" }
}

POST my_index_0003/_doc/1
{}

update_time can be maintained by application code or scripts that add a timestamp on updates.
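As a rough sketch of how both timestamps might be handled, the pipeline below stamps update_time on every request it processes and fills create_time only when the incoming document does not carry one. The pipeline name timestamps_pipeline and the sample timestamp are hypothetical. An ingest pipeline sees only the incoming document, not the stored one, so create_time survives a full re-index of the same ID only if the client sends the old value back.

```console
# Hypothetical pipeline: create_time only if absent, update_time always
PUT _ingest/pipeline/timestamps_pipeline
{
  "description": "Maintains create_time and update_time",
  "processors": [
    {
      "set": {
        "field": "create_time",
        "value": "{{_ingest.timestamp}}",
        "override": false
      }
    },
    {
      "set": {
        "field": "update_time",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

# Scripted partial update: the application passes the current time as a parameter
POST my_index_0003/_update/1
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.update_time = params.now",
    "params": { "now": "2021-09-03T08:00:00Z" }
  }
}
```

The second request is one way to do the application-side stamping mentioned above; passing the time as a parameter keeps the script text constant across calls.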

3.2 Solution 2: Update‑by‑Query to Add Default Values

POST customer/_doc/2
{
  "major": null
}

POST customer/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.major == null) {ctx._source.major = 'student'}"
  }
}

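The script sets major to "student" in every document whose major is null or missing. Because the request has no query, update-by-query visits every document in the index and rewrites each one, including those the script leaves unchanged.

This approach writes the data first and patches it afterwards, so documents lack the default until the update runs. It is a backfill rather than a true default.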
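On a large index it may be worth narrowing the update to the documents that actually need it. The sketch below is one way to do that for the customer index defined earlier, and it depends on that mapping: because major has "null_value": "NULL", documents with an explicit null are indexed as "NULL" and are found by a term query, while documents that omit the field are found by must_not exists. Setting ctx.op to 'noop' skips any matched document that turns out to have a real value.

```console
POST customer/_update_by_query
{
  "query": {
    "bool": {
      "should": [
        { "term": { "major": "NULL" } },
        { "bool": { "must_not": { "exists": { "field": "major" } } } }
      ],
      "minimum_should_match": 1
    }
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.major == null) { ctx._source.major = 'student' } else { ctx.op = 'noop' }"
  }
}
```

The else branch matters for a document whose major is literally the string "NULL": it matches the term query but is left untouched.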

3.3 Solution 3: Pipeline Script Update

PUT _ingest/pipeline/update_pipeline
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          if (ctx['major'] == null) {ctx['major'] = 'student'}
        """
      }
    }
  ]
}

POST customer/_doc/4
{
  "major": null
}

POST customer/_update_by_query?pipeline=update_pipeline
{
  "query": { "match_all": {} }
}

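The end result matches Solution 2. The difference lies in how the script reaches the document: an update or update-by-query script (Solution 2) reads fields through ctx._source, as in ctx._source.major, whereas a script processor in an ingest pipeline reads them directly from ctx, as in ctx['major'] or ctx.major.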
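The script processor can be tried out without writing any documents by running the stored update_pipeline through the simulate API. The two sample documents below are made up for illustration. The first should come back with major set to "student"; the second should keep "teacher".

```console
POST _ingest/pipeline/update_pipeline/_simulate
{
  "docs": [
    { "_source": { "major": null } },
    { "_source": { "major": "teacher" } }
  ]
}
```

This is also a convenient way to confirm the access syntax: the script works on ctx['major'] directly, with no _source level in between.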

4. Summary

This article presented three ways to emulate relational-database-style default values in Elasticsearch. Only the first applies the default at index time, before the document is written; the other two, as shown here, patch documents after they have been indexed. For most practical scenarios the first, pipeline-based solution is the one to prefer.


Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
