Sorting Image Filenames with Numbers in Elasticsearch: Script and Ingest‑Pipeline Solutions
This article explains two ways to sort image filenames that contain numbers in Elasticsearch: extracting the numeric part at query time with a Painless _script sort, or preprocessing filenames with an ingest pipeline that stores the number in a numeric field. Sorting on the pre-computed field is faster, so the pipeline is recommended for performance‑critical use cases.
1. Background
In the digital era, managing image data has become part of data architecture. A common challenge is how to index and retrieve image files efficiently, especially when the file names contain numeric parts that need to be sorted in a natural order.
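To make the problem concrete, here is a minimal Python sketch of what plain string ordering does to such names. The filenames are the sample values used later in this article; the snippet only illustrates character-by-character comparison and does not involve Elasticsearch itself.

```python
# Sample file names taken from the bulk request later in the article.
filenames = ["photo12.jpg", "photo2.jpg", "photo111.jpg", "photo1.jpg"]

# Plain string sorting compares character by character, so "photo111.jpg"
# lands before "photo12.jpg" and "photo2.jpg" comes last.
lexicographic = sorted(filenames)
print(lexicographic)
```

A keyword field sorts the same way, which is why the numeric part has to be extracted before sorting.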
The problem originates from a discussion in an Elastic technical group.
2. Solution Discussion
Sorting is a common requirement in Elasticsearch. The article references several previous posts about sorting in different business scenarios.
Two main approaches are explored:
Sorting with a script at query time.
Pre‑processing the file name with an ingest pipeline that extracts a numeric field.
3. Implementation
3.1 Script‑based sorting
The _script sort runs a custom Painless script at query time that extracts the numeric part of photo_id and sorts by it.
GET /my_photos/_search
{
  "query": { "match_all": {} },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": """
          // A document without the field has no doc value; reading .value would throw
          if (doc['photo_id.keyword'].size() == 0) return 0;
          String photoId = doc['photo_id.keyword'].value;
          // Use the first run of digits in the file name as the sort key
          Matcher m = /[0-9]+/.matcher(photoId);
          if (m.find()) {
            return Integer.parseInt(m.group(0));
          } else {
            return 0;
          }
        """
      },
      "order": "asc"
    }
  }
}
The result is correctly ordered according to the numeric part of the file name.
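The extraction logic of the script can be sketched outside Elasticsearch as follows. This is a rough Python equivalent under stated assumptions, not the Painless script itself: numeric_sort_key is a hypothetical helper name, and "cover.jpg" is an invented sample value added to show the fallback.

```python
import re

def numeric_sort_key(photo_id):
    """Hypothetical helper: first run of digits as an int, or 0 if there is none."""
    if photo_id is None:
        return 0
    match = re.search(r"[0-9]+", photo_id)
    return int(match.group(0)) if match else 0

# "cover.jpg" is an invented example of a file name without digits.
photo_ids = ["photo12.jpg", "photo2.jpg", "photo111.jpg", "photo1.jpg", "cover.jpg"]
ordered = sorted(photo_ids, key=numeric_sort_key)
print(ordered)
```

Note the design consequence of the fallback value: names without any digits get the key 0, so they sort first in ascending order.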
3.2 Pre‑processing with an ingest pipeline
A pipeline using the grok processor extracts the number from photo_id into a new photo_number field.
PUT _ingest/pipeline/extract_photo_number
{
  "description": "Extracts numbers from photo_id and stores it in photo_number",
  "processors": [
    { "grok": { "field": "photo_id", "patterns": ["%{NUMBER:photo_number:int}"] } }
  ]
}
DELETE my_photos_20240201
PUT my_photos_20240201
{
  "settings": { "default_pipeline": "extract_photo_number" },
  "mappings": {
    "properties": {
      "photo_id": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
      "photo_number": { "type": "long" },
      "upload_date": { "type": "date" }
    }
  }
}
POST my_photos_20240201/_bulk
{ "index": { "_id": "1" } }
{ "photo_id": "photo1.jpg", "upload_date": "2024-02-01T10:00:00" }
{ "index": { "_id": "2" } }
{ "photo_id": "photo2.jpg", "upload_date": "2024-02-01T10:05:00" }
{ "index": { "_id": "3" } }
{ "photo_id": "photo12.jpg", "upload_date": "2024-02-01T10:10:00" }
{ "index": { "_id": "4" } }
{ "photo_id": "photo111.jpg", "upload_date": "2024-02-01T10:15:00" }
POST my_photos_20240201/_search
{
  "query": { "match_all": {} },
  "sort": [ { "photo_number": { "order": "asc" } } ]
}
The ingest pipeline extracts the numeric value at index time, allowing fast numeric sorting without runtime script execution.
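Before indexing real data, it can help to check what the pipeline produces for a few sample names using the ingest simulate API. The following Python sketch only builds the request path and body; build_simulate_request is a hypothetical helper, and actually sending the request (with whatever HTTP client or Elasticsearch client you use) is left out.

```python
import json

PIPELINE_ID = "extract_photo_number"  # pipeline name from the article

def build_simulate_request(photo_ids):
    """Hypothetical helper: return (path, body) for the ingest simulate API."""
    path = f"/_ingest/pipeline/{PIPELINE_ID}/_simulate"
    # Each sample document is wrapped in "_source", as the simulate API expects.
    body = {"docs": [{"_source": {"photo_id": pid}} for pid in photo_ids]}
    return path, body

path, body = build_simulate_request(["photo1.jpg", "photo12.jpg"])
print("POST", path)
print(json.dumps(body, indent=2))
```

The response to such a request shows each document as it would look after the grok processor runs, which makes it easy to confirm that photo_number is populated as expected.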
4. Comparison
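The pre‑processing approach moves the parsing work to the indexing phase: each file name is parsed once when the document is indexed instead of on every search, which reduces query latency.
Sorting on the numeric photo_number field uses the stored values directly, so it is more efficient and consumes fewer resources than running a script for every matching document.
It also makes the data model clearer, because the sort key is an explicit mapped field rather than logic embedded in each query.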
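The difference in parsing work can be illustrated with a toy Python model that counts how often a file name is parsed under each approach. It is a sketch, not a benchmark: the number of queries is an arbitrary assumption, extract_number is a hypothetical helper, and real Elasticsearch costs depend on many other factors.

```python
import re

calls = {"n": 0}  # counts how many times a file name is parsed

def extract_number(photo_id):
    """Hypothetical helper: parse the first run of digits and count the call."""
    calls["n"] += 1
    match = re.search(r"[0-9]+", photo_id)
    return int(match.group(0)) if match else 0

photo_ids = ["photo1.jpg", "photo2.jpg", "photo12.jpg", "photo111.jpg"]
QUERIES = 100  # assumed number of sorted searches

# Query-time approach: every search parses every matching document again.
calls["n"] = 0
for _ in range(QUERIES):
    sorted(photo_ids, key=extract_number)
query_time_parses = calls["n"]

# Index-time approach: parse once per document, then sort on the stored number.
calls["n"] = 0
indexed = [{"photo_id": p, "photo_number": extract_number(p)} for p in photo_ids]
for _ in range(QUERIES):
    sorted(indexed, key=lambda d: d["photo_number"])
index_time_parses = calls["n"]

print(query_time_parses, index_time_parses)
```

In this model the query-time cost grows with both the number of documents and the number of searches, while the index-time cost grows only with the number of documents.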
5. Conclusion
The article discusses the challenge of sorting image file names that contain numbers in Elasticsearch and presents two viable solutions. For performance‑critical scenarios, the ingest‑pipeline method is recommended; for more flexible or complex requirements, script‑based sorting offers greater adaptability. Future data modeling should consider anticipated query patterns, such as numeric sorting of file names, to enable efficient indexing and retrieval.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.