
How to Build a Custom Elasticsearch Query that Sorts by Time Buckets and Relevance

This article shows how to build an Elasticsearch 8.x query that orders results first by time bucket (e.g., within 3 days, 4‑7 days, older), using a time_bucket field pre‑computed at index time, and then by the _score relevance of the content field within each bucket. It covers the ingest pipeline, the index mapping, sample data, the DSL query, and practical considerations.

Mingyi World Elasticsearch

Jan 14, 2025


Problem Statement

We need to sort search results in Elasticsearch 8.x by two criteria: first by a predefined time bucket (e.g., "within 3 days", "4‑7 days", "older") and then, within each bucket, by the relevance score of the match against the content field.

Solution Overview

The approach pre‑computes an integer time_bucket field at index time and consists of four steps:

Define an ingest pipeline whose Painless script calculates time_bucket from createTime.

Create an index whose mapping sets createTime as date (format yyyy - MM - dd) and time_bucket as integer, with the pipeline configured as default_pipeline.

Index the documents so that the pipeline assigns each one a bucket.

Query with a DSL that sorts first on time_bucket (ascending) and then on _score (descending).

Step 1: Define Ingest Pipeline

PUT _ingest/pipeline/add_time_bucket
{
  "description": "Add time_bucket based on createTime",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
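          // Parse createTime as a UTC date (quotes need no escaping inside a triple-quoted string).
          def sdf = new SimpleDateFormat("yyyy - MM - dd");
          sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
          def createDate = sdf.parse(ctx.createTime).getTime();
          def now = System.currentTimeMillis();
          // Whole days elapsed since createTime (integer division drops partial days).
          def diffDays = (now - createDate) / (1000 * 60 * 60 * 24);
          // 1 = within 3 days, 2 = 4 to 7 days, 3 = older.
          if (diffDays <= 3) {
            ctx.time_bucket = 1;
          } else if (diffDays <= 7) {
            ctx.time_bucket = 2;
          } else {
            ctx.time_bucket = 3;
          }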
        """
      }
    }
  ]
}

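The script parses createTime as midnight UTC, computes the number of whole days between that instant and the moment of indexing, and assigns time_bucket 1 (up to 3 days), 2 (4 to 7 days), or 3 (older). Because the division is integer division, partial days are dropped: a document that is 3 days and 23 hours old still lands in bucket 1.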
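To make the bucket boundaries easier to reason about, the same logic can be mirrored outside Elasticsearch. The following is a minimal Python sketch, not part of the original pipeline: the function name time_bucket and the explicit now argument are assumptions made for illustration, and the sample clock (noon UTC on 2025 - 01 - 09) follows the date the article assumes in its result explanation.

```python
from datetime import datetime, timedelta, timezone


def time_bucket(create_time: str, now: datetime) -> int:
    """Mirror the ingest script: bucket by whole days elapsed since createTime."""
    # createTime is interpreted as midnight UTC, like the SimpleDateFormat in the script.
    created = datetime.strptime(create_time, "%Y - %m - %d").replace(tzinfo=timezone.utc)
    # Floor division by one day; for non-negative differences this matches
    # the truncating long division used in the Painless script.
    diff_days = (now - created) // timedelta(days=1)
    if diff_days <= 3:
        return 1
    if diff_days <= 7:
        return 2
    return 3


# Hypothetical "current time" used only for this illustration.
now = datetime(2025, 1, 9, 12, 0, tzinfo=timezone.utc)
for create_time in ["2025 - 01 - 06", "2025 - 01 - 05", "2025 - 01 - 02",
                    "2025 - 01 - 01", "2024 - 12 - 28"]:
    print(create_time, "->", time_bucket(create_time, now))
```

With this clock, 2025 - 01 - 06 is 3.5 days old and stays in bucket 1, while 2025 - 01 - 05 (4.5 days) moves to bucket 2 and 2025 - 01 - 01 (8.5 days) to bucket 3.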

Step 2: Create Index and Mapping

PUT t1
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "createTime": { "type": "date", "format": "yyyy - MM - dd" },
      "content": { "type": "text" },
      "time_bucket": { "type": "integer" }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "default_pipeline": "add_time_bucket"
  }
}

The mapping declares the field types, and the default_pipeline setting applies the ingest pipeline automatically to every document indexed without an explicit pipeline.

Step 3: Index Sample Documents

POST t1/_bulk
{ "index": { "_id": "1" } }
{ "id": "102", "createTime": "2025 - 01 - 06", "content": "这是一个3天内测试内容1" }
{ "index": { "_id": "2" } }
{ "id": "101", "createTime": "2025 - 01 - 06", "content": "这是一个3天内测试内容2" }
{ "index": { "_id": "3" } }
{ "id": "103", "createTime": "2025 - 01 - 06", "content": "这是一个3天内测试内容3" }
{ "index": { "_id": "4" } }
{ "id": "4", "createTime": "2025 - 01 - 02", "content": "另一个测试内容" }
{ "index": { "_id": "5" } }
{ "id": "5", "createTime": "2025 - 01 - 02", "content": "另一个测试内容2" }
{ "index": { "_id": "6" } }
{ "id": "6", "createTime": "2024 - 12 - 28", "content": "更早的测试内容" }

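Assuming the documents are indexed on 2025 - 01 - 09, the pipeline assigns time_bucket 1 to the three documents dated 2025 - 01 - 06 (3 days old), 2 to the two documents dated 2025 - 01 - 02 (7 days old), and 3 to the document dated 2024 - 12 - 28 (12 days old).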
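When the bulk request is generated by a program rather than typed into the console, two details are easy to get wrong: every createTime must match the mapping format, and the body must be newline-delimited JSON ending with a newline. The following Python sketch illustrates both; the helper names check_create_time and build_bulk_body are hypothetical, and only two of the sample documents are used.

```python
import json
import re
from datetime import datetime

# The mapping format "yyyy - MM - dd" expressed as a strict pattern (two-digit month and day).
CREATE_TIME_PATTERN = re.compile(r"\d{4} - \d{2} - \d{2}")


def check_create_time(value: str) -> None:
    """Raise ValueError if value would not fit the createTime mapping format."""
    if not CREATE_TIME_PATTERN.fullmatch(value):
        raise ValueError(f"createTime {value!r} does not match yyyy - MM - dd")
    # Also rejects impossible dates such as month 13.
    datetime.strptime(value, "%Y - %m - %d")


def build_bulk_body(docs) -> str:
    """Return an NDJSON _bulk body: one action line and one source line per document."""
    lines = []
    for doc_id, source in docs:
        check_create_time(source["createTime"])
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


sample = [
    ("1", {"id": "102", "createTime": "2025 - 01 - 06", "content": "这是一个3天内测试内容1"}),
    ("4", {"id": "4", "createTime": "2025 - 01 - 02", "content": "另一个测试内容"}),
]
body = build_bulk_body(sample)
print(body, end="")
```

Validating on the client side is a design choice of this sketch; Elasticsearch would also reject a mismatched date, but only per item inside the bulk response.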

Step 4: Execute the Query

GET t1/_search
{
  "query": { "match": { "content": "测试" } },
  "sort": [
    { "time_bucket": { "order": "asc" } },
    { "_score": { "order": "desc" } }
  ]
}

The query matches documents whose content contains 测试 ("test"), orders the hits by time_bucket (1 → 2 → 3), and then, within each bucket, by descending relevance score.
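The effect of the two sort keys can be reproduced with an ordinary tuple sort. The Python sketch below uses made-up hits: the _score values are hypothetical and were not produced by Elasticsearch. It is meant only to show that the bucket always dominates, so a highly relevant old document still ranks below a weakly relevant recent one.

```python
# Hypothetical hits; the scores are invented for illustration.
hits = [
    {"_id": "6", "time_bucket": 3, "_score": 0.95},
    {"_id": "1", "time_bucket": 1, "_score": 0.30},
    {"_id": "5", "time_bucket": 2, "_score": 0.60},
    {"_id": "2", "time_bucket": 1, "_score": 0.45},
    {"_id": "4", "time_bucket": 2, "_score": 0.80},
]

# Primary key: time_bucket ascending. Secondary key: _score descending (negated).
ordered = sorted(hits, key=lambda h: (h["time_bucket"], -h["_score"]))

for hit in ordered:
    print(hit["_id"], hit["time_bucket"], hit["_score"])

ordered_ids = [hit["_id"] for hit in ordered]
print(ordered_ids)  # ['2', '1', '4', '5', '6']
```

Document 6 has the highest score but comes last because it sits in bucket 3.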

Result Explanation

Assuming the current date is 2025 - 01 - 09, the response shows documents grouped by bucket, with the highest‑scoring documents appearing first inside each group. The sort array in each hit reflects the bucket number and the _score value.

Key Considerations

Time format consistency: Ensure createTime values match the mapping format yyyy - MM - dd.

Timezone handling: The script uses UTC to avoid discrepancies.

Script performance: Keep the ingest script simple to minimize indexing overhead.

Data updates: time_bucket is fixed at index time, so it goes stale as documents age. Run periodic Update By Query jobs that re‑apply the pipeline to refresh the bucket values.

Debugging: If the script fails, reduce it and rebuild it step by step, e.g., first write only createDate to the document, then add the bucket logic.

Index settings: Adjust number_of_shards and number_of_replicas based on data volume and query load.
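Two of the considerations above, debugging and data updates, map onto existing Elasticsearch endpoints: the simulate pipeline API (POST _ingest/pipeline/<pipeline>/_simulate) and Update By Query with its pipeline parameter. The following Python sketch only builds the two HTTP requests and does not send them. The base URL, the helper names, and the use of match_all to refresh every document are assumptions made for illustration.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def json_request(method: str, path: str, body: dict) -> Request:
    """Build (but do not send) a JSON request against the cluster."""
    return Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def simulate_request(pipeline: str, create_time: str) -> Request:
    """Debugging: run the pipeline on a test document without indexing it."""
    body = {"docs": [{"_source": {"createTime": create_time}}]}
    return json_request("POST", f"/_ingest/pipeline/{pipeline}/_simulate", body)


def refresh_request(index: str, pipeline: str) -> Request:
    """Data updates: re-run the pipeline over existing documents to refresh time_bucket."""
    body = {"query": {"match_all": {}}}
    return json_request("POST", f"/{index}/_update_by_query?pipeline={pipeline}", body)


sim = simulate_request("add_time_bucket", "2025 - 01 - 06")
ref = refresh_request("t1", "add_time_bucket")
for req in (sim, ref):
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Either request could be sent with urllib.request.urlopen(req). Elasticsearch 8.x enables security by default, so a real cluster will usually need HTTPS and credentials, which this sketch leaves out.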

Conclusion

By pre‑computing a time_bucket field via an ingest pipeline, defining a proper mapping, and using a two‑level sort in the DSL, Elasticsearch can efficiently satisfy complex sorting requirements that combine temporal bucketing with relevance ranking.

