How to Build a Custom Elasticsearch Query that Sorts by Time Buckets and Relevance
This article shows how to build an Elasticsearch 8.x query that first groups documents into time buckets (for example within 3 days, 4‑7 days, and older) using a pre‑computed time_bucket field, then orders the documents inside each bucket by their _score relevance against the content field. It covers the ingest pipeline definition, the index mapping, sample data, the DSL query, and practical considerations.
Problem Statement
We need to sort search results in Elasticsearch 8.x by two criteria: a predefined time bucket (e.g., "within 3 days", "4‑7 days", "older") and, within each bucket, by relevance score of the content field.
Solution Overview
The approach consists of four steps:
Define an ingest pipeline whose Painless script calculates an integer time_bucket from createTime at index time.
Create an index whose mapping sets createTime as date (format yyyy - MM - dd) and time_bucket as integer, with the pipeline as its default.
Index the documents so that the pipeline pre‑computes time_bucket for each one.
Query with a DSL that sorts first on time_bucket (ascending) and then on _score (descending).
Step 1: Define Ingest Pipeline
PUT _ingest/pipeline/add_time_bucket
{
  "description": "Add time_bucket based on createTime",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          // Parse createTime as a UTC date (no escaping is needed inside triple quotes)
          def sdf = new SimpleDateFormat('yyyy - MM - dd');
          sdf.setTimeZone(TimeZone.getTimeZone('UTC'));
          def createDate = sdf.parse(ctx.createTime).getTime();
          def now = System.currentTimeMillis();
          // Whole days elapsed between createTime and ingest time
          def diffDays = (now - createDate) / (1000 * 60 * 60 * 24);
          if (diffDays <= 3) {
            ctx.time_bucket = 1;
          } else if (diffDays <= 7) {
            ctx.time_bucket = 2;
          } else {
            ctx.time_bucket = 3;
          }
        """
      }
    }
  ]
}
The script parses createTime as a UTC date, computes the number of whole days between it and the current time at ingest, and sets time_bucket to 1 (up to 3 days), 2 (4 to 7 days), or 3 (older).
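The bucketing rule can be checked outside Elasticsearch with a small Python sketch. This is only an approximation of the Painless script, not part of the original setup: the function name time_bucket and the fixed "now" value are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def time_bucket(create_time: str, now: datetime) -> int:
    """Approximate the pipeline script: bucket by whole days elapsed since createTime."""
    # Parse the document's "yyyy - MM - dd" value as midnight UTC.
    created = datetime.strptime(create_time, "%Y - %m - %d").replace(tzinfo=timezone.utc)
    # Floor division by one day, matching the script's integer division
    # for documents that are not dated in the future.
    diff_days = (now - created) // timedelta(days=1)
    if diff_days <= 3:
        return 1
    if diff_days <= 7:
        return 2
    return 3


# Assumed ingest time for illustration: midday UTC on 2025-01-09.
now = datetime(2025, 1, 9, 12, 0, tzinfo=timezone.utc)
for value in ["2025 - 01 - 06", "2025 - 01 - 02", "2024 - 12 - 28"]:
    print(value, "->", time_bucket(value, now))
```

Passing "now" as a parameter keeps the rule deterministic, which makes the bucket boundaries easy to test before the script is deployed.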
Step 2: Create Index and Mapping
PUT t1
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "createTime": { "type": "date", "format": "yyyy - MM - dd" },
      "content": { "type": "text" },
      "time_bucket": { "type": "integer" }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "default_pipeline": "add_time_bucket"
  }
}
The mapping declares the field types, and the default_pipeline index setting applies the ingest pipeline automatically whenever a document is indexed.
Step 3: Index Sample Documents
POST t1/_bulk
{ "index": { "_id": "1" } }
{ "id": "102", "createTime": "2025 - 01 - 06", "content": "这是一个3天内测试内容1" }
{ "index": { "_id": "2" } }
{ "id": "101", "createTime": "2025 - 01 - 06", "content": "这是一个3天内测试内容2" }
{ "index": { "_id": "3" } }
{ "id": "103", "createTime": "2025 - 01 - 06", "content": "这是一个3天内测试内容3" }
{ "index": { "_id": "4" } }
{ "id": "4", "createTime": "2025 - 01 - 02", "content": "另一个测试内容" }
{ "index": { "_id": "5" } }
{ "id": "5", "createTime": "2025 - 01 - 02", "content": "另一个测试内容2" }
{ "index": { "_id": "6" } }
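{ "id": "5", "createTime": "2024 - 12 - 28", "content": "更早的测试内容" }
After the pipeline runs (with indexing taking place on 2025 - 01 - 09, the date assumed throughout this walkthrough), the documents receive these time_bucket values: 1 for the three newest documents, 2 for the two documents dated 2025 - 01 - 02, and 3 for the oldest document.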
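Because both the mapping and the pipeline script expect createTime in the exact pattern used by these documents, a client-side check before bulk indexing can catch malformed values early. The following is a minimal sketch under that assumption; the helper name is_valid_create_time is hypothetical and not part of any Elasticsearch client.

```python
import re
from datetime import datetime

# Strict shape of the "yyyy - MM - dd" pattern: four digits, two digits, two digits,
# separated by " - " exactly as in the sample documents.
CREATE_TIME_SHAPE = re.compile(r"[0-9]{4} - [0-9]{2} - [0-9]{2}")


def is_valid_create_time(value) -> bool:
    """Return True if value has the expected shape and is a real calendar date."""
    if not isinstance(value, str) or not CREATE_TIME_SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y - %m - %d")
    except ValueError:
        # Shape is right but the date does not exist (e.g. February 30).
        return False
    return True


for candidate in ["2025 - 01 - 06", "2025-01-06", "2025 - 02 - 30"]:
    print(repr(candidate), is_valid_create_time(candidate))
```

The regular expression is applied first because Python's strptime is more lenient about whitespace and single-digit months than a fixed-width date pattern.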
Step 4: Execute the Query
GET t1/_search
{
  "query": { "match": { "content": "测试" } },
  "sort": [
    { "time_bucket": { "order": "asc" } },
    { "_score": { "order": "desc" } }
  ]
}
The match query scores documents against 测试 ("test"), which appears in every sample document. The sort clause then orders results by time_bucket (1 → 2 → 3) and, within each bucket, by the relevance score of content in descending order.
Result Explanation
Assuming the documents were indexed on 2025 - 01 - 09, so that the pipeline computed the buckets relative to that date, the response lists documents grouped by bucket, with the highest‑scoring documents first inside each group. The sort array in each hit contains two values: the bucket number followed by the _score.
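The ordering described here can be illustrated with a short Python sketch of the same two-level sort. The hits and their scores below are made-up values chosen for illustration, not real Elasticsearch output, and two_level_sort is a hypothetical helper name.

```python
# Hypothetical hits: the _score values are invented for illustration only.
hits = [
    {"_id": "6", "time_bucket": 3, "_score": 0.9},
    {"_id": "1", "time_bucket": 1, "_score": 0.4},
    {"_id": "4", "time_bucket": 2, "_score": 0.7},
    {"_id": "2", "time_bucket": 1, "_score": 0.6},
    {"_id": "5", "time_bucket": 2, "_score": 0.5},
]


def two_level_sort(hits):
    """Order by time_bucket ascending, then by _score descending."""
    return sorted(hits, key=lambda h: (h["time_bucket"], -h["_score"]))


ordered = two_level_sort(hits)
for hit in ordered:
    # Mirrors the per-hit sort array: [time_bucket, _score].
    print(hit["_id"], [hit["time_bucket"], hit["_score"]])
```

In this sketch the document with the highest score sits in bucket 3 and therefore comes last, which shows that relevance only breaks ties inside a bucket and never moves a document across buckets.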
Key Considerations
Time format consistency: Ensure createTime values match the mapping format yyyy - MM - dd.
Timezone handling: The script uses UTC to avoid discrepancies.
Script performance: Keep the ingest script simple to minimize indexing overhead.
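Data updates: time_bucket is computed once, at ingest time, so stored values go stale as documents age. Run a periodic Update By Query job with the pipeline parameter set to add_time_bucket (for example POST t1/_update_by_query?pipeline=add_time_bucket) to recompute them.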
Debugging: Simplify the script step‑by‑step if errors occur, e.g., first output createDate then add bucket logic.
Index settings: Adjust number_of_shards and number_of_replicas based on data volume and query load.
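To see why the periodic refresh matters, the sketch below recomputes buckets at a later date and reports which stored values no longer match. It is a local simulation only: bucket_for and stale_ids are hypothetical helper names, and the stored values assume indexing on 2025 - 01 - 09 as in the walkthrough.

```python
from datetime import datetime, timedelta, timezone


def bucket_for(create_time: str, now: datetime) -> int:
    """Same 3-day / 7-day rule as the ingest script, evaluated at an arbitrary time."""
    created = datetime.strptime(create_time, "%Y - %m - %d").replace(tzinfo=timezone.utc)
    days = (now - created) // timedelta(days=1)
    return 1 if days <= 3 else 2 if days <= 7 else 3


def stale_ids(docs, now):
    """Return the _id of every document whose stored time_bucket no longer fits its age."""
    return [d["_id"] for d in docs if d["time_bucket"] != bucket_for(d["createTime"], now)]


# Stored state as it would look right after indexing on 2025-01-09 (assumed).
docs = [
    {"_id": "1", "createTime": "2025 - 01 - 06", "time_bucket": 1},
    {"_id": "4", "createTime": "2025 - 01 - 02", "time_bucket": 2},
    {"_id": "6", "createTime": "2024 - 12 - 28", "time_bucket": 3},
]
indexed_at = datetime(2025, 1, 9, tzinfo=timezone.utc)
one_day_later = indexed_at + timedelta(days=1)

print(stale_ids(docs, indexed_at))
print(stale_ids(docs, one_day_later))
```

Documents sitting on a bucket boundary become stale after a single day, so the refresh interval should be no longer than the granularity at which the buckets are expected to be accurate.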
Conclusion
By pre‑computing a time_bucket field via an ingest pipeline, defining a proper mapping, and using a two‑level sort in the DSL, Elasticsearch can efficiently satisfy complex sorting requirements that combine temporal bucketing with relevance ranking.
Mingyi World Elasticsearch
