How EasySearch Rules Engine Tags Data at Ingest Time
The article walks through EasySearch's Rules plugin, showing how its high‑performance C++ rule engine can automatically match and tag documents during the ingest pipeline, enabling zero‑latency content classification for scenarios like regional, sentiment, and entity tagging.
What is the Rules plugin?
The Rules plugin is a built‑in rule‑matching engine of EasySearch. It performs real‑time matching when data is written, eliminating the need for post‑processing scripts. Its core capabilities include:
Real‑time matching: triggers automatically on data write.
High performance: powered by an optimized C++ rule engine that supports complex and large rule sets.
Flexible configuration: custom fields, regex, numeric ranges.
Multi‑rule matching: a document can match multiple rules, with all tags stored in an array.
Cluster broadcast: rules are compiled and distributed to all ingest nodes in a multi‑node environment.
Typical use cases
Regional tags – automatically recognize cities, provinces, and regions in news.
Sentiment tags – classify content as positive, negative, or neutral.
Person tags – extract and label key person names from large text corpora.
Product recommendation tags – tag items by price range or category.
Environment preparation
Install the plugin first:
./bin/easysearch-plugin install rules
The Rules plugin depends on a native C++ library, so the system‑call filter must be disabled in the EasySearch configuration. Edit config/easysearch.yml and add:
bootstrap.system_call_filter: false
Why? The plugin uses JNI and native code (libruledb‑r.so), which performs system calls such as fork and exec. The default seccomp filter blocks these calls. Disabling the filter weakens the node's security, so do it only on nodes that run the Rules plugin.
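A quick pre-flight check can confirm that the setting is present before the node is started. The following is a minimal sketch, assuming the setting appears as a flat `key: value` line in config/easysearch.yml; the helper name `system_call_filter_disabled` is hypothetical and not part of EasySearch.

```python
def system_call_filter_disabled(config_text: str) -> bool:
    """Return True if the text sets bootstrap.system_call_filter to false.

    Assumption: only the flat `key: value` form is handled; nested YAML is not parsed.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.strip() == "bootstrap.system_call_filter":
            return value.strip().lower() == "false"
    return False  # setting absent: the filter stays enabled by default


sample = "cluster.name: demo\nbootstrap.system_call_filter: false\n"
print(system_call_filter_disabled(sample))
```

In practice the text would come from reading config/easysearch.yml on each node that is meant to run the plugin.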
Quick start: four steps to run the first rule
Step 1 – Import rules
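Use the POST /_match_rules/{repo_id}/_import API to import a rule set into the .match_rules system index. The example below imports a geographic‑tag rule library under the repo_id geo_tags_v1, which the later steps reference:
POST /_match_rules/geo_tags_v1/_import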
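{
  "name": "Regional tag rule library",
  "description": "Rule library for identifying regions in news articles",
  "tags": ["geo", "news-tagging"],
  "rules": [
    {"expression": "北京 or 京城 or 首都", "description": "地域_北京"},
    {"expression": "上海 or 沪上 or 浦东 or 申城", "description": "地域_上海"},
    {"expression": "深圳 or 鹏城 or 南山区 or 前海", "description": "地域_深圳"}
  ]
}
The response confirms success and shows the rule count. Each expression lists a city name and its aliases joined with or (Beijing, Shanghai, and Shenzhen here), and each rule's description, such as 地域_北京 ("Region_Beijing"), is the label that appears in the tags shown in Step 4.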
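The same import can be scripted. The sketch below builds the request with Python's standard library without sending it; the base URL `http://localhost:9200` (no TLS or authentication) and the helper name `build_import_request` are assumptions made for illustration, not details from the article.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, no TLS or auth


def build_import_request(repo_id: str, library: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for an overwrite import."""
    body = json.dumps(library, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/_match_rules/{repo_id}/_import",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


library = {
    "name": "Regional tag rule library",  # free-text metadata
    "tags": ["geo", "news-tagging"],
    "rules": [
        {"expression": "北京 or 京城 or 首都", "description": "地域_北京"},
        {"expression": "上海 or 沪上 or 浦东 or 申城", "description": "地域_上海"},
    ],
}
request = build_import_request("geo_tags_v1", library)
print(request.get_method(), request.full_url)
# To send it against a running node: urllib.request.urlopen(request)
```

Encoding with `ensure_ascii=False` and UTF-8 keeps the Chinese keywords readable in the request body rather than escaping them.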
Step 2 – Compile the rule library
Compiled rules are stored as a binary C++ library. Trigger compilation with:
POST /_match_rules/geo_tags_v1/_compile
{
  "quiet": true
}
The response includes duration_ms and confirms that the library is now in the compiled state.
Step 3 – Create an ingest pipeline
Integrate the rule engine into the write flow:
PUT _ingest/pipeline/geo-tagger
{
  "description": "Automatic regional tagging",
  "processors": [
    {"check_match_rules": {"id": "geo_tags_v1"}}
  ]
}
The id parameter must match the repo_id used during import.
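When pipelines are generated from code, deriving the processor id from the same variable as the import call keeps the two from drifting apart. The sketch below only builds the request body; the helper name `build_pipeline` is hypothetical, and the optional `target_field` argument corresponds to the processor option described under Advanced usage.

```python
import json
from typing import Optional


def build_pipeline(repo_id: str, description: str = "",
                   target_field: Optional[str] = None) -> dict:
    """Build the body for PUT _ingest/pipeline/<name> with one check_match_rules processor."""
    processor = {"id": repo_id}  # must equal the repo_id used at import time
    if target_field is not None:
        processor["target_field"] = target_field  # optional custom output field
    return {
        "description": description,
        "processors": [{"check_match_rules": processor}],
    }


body = build_pipeline("geo_tags_v1", description="Automatic regional tagging")
print(json.dumps(body, indent=2))
```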
Step 4 – Write a document and verify tagging
Index a news article through the pipeline:
POST news-articles/_doc?pipeline=geo-tagger
{
  "title": "北京发布2025年经济数据报告",
  "content": "上海和深圳的GDP增速均超过全国平均水平",
  "timestamp": "2025-12-31T10:00:00Z"
}
The title mentions Beijing, and the content mentions Shanghai and Shenzhen. Retrieving the document shows an automatically generated tags field containing entries such as #0#地域_北京 and #1#地域_上海.
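Downstream code usually wants the bare label rather than the prefixed string. The sketch below splits a tag of the form shown above into its numeric prefix and label. The `#<number>#<label>` shape is inferred from the two sample tags only, and the meaning of the number is not stated in the article, so treat both as assumptions; `parse_tag` is a hypothetical helper.

```python
import re
from typing import Tuple

TAG_PATTERN = re.compile(r"^#(\d+)#(.+)$")  # assumed shape: #<number>#<label>


def parse_tag(tag: str) -> Tuple[int, str]:
    """Split a tag such as '#0#地域_北京' into (0, '地域_北京')."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return int(match.group(1)), match.group(2)


tags = ["#0#地域_北京", "#1#地域_上海"]
labels = [parse_tag(t)[1] for t in tags]
print(labels)
```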
Advanced usage
If you prefer a custom field instead of the default tags, set target_field in the processor:
{
  "check_match_rules": {
    "id": "geo_tags_v1",
    "target_field": "geo_labels"
  }
}
Two import modes
Overwrite import (POST): deletes old rules and writes new ones. Suitable for full updates.
Append import (PUT): keeps existing rules and adds new ones. Suitable for incremental additions.
Note: After any import, the rule set must be re‑compiled before it takes effect.
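Because every import must be followed by a compile, it can help to treat the two calls as one unit. The sketch below returns the ordered REST calls for either mode; the mode names `overwrite` and `append` and the helper `import_plan` are labels chosen for this illustration, while the paths and HTTP methods are the ones given above.

```python
from typing import List, Tuple

IMPORT_METHODS = {"overwrite": "POST", "append": "PUT"}


def import_plan(repo_id: str, mode: str) -> List[Tuple[str, str]]:
    """Return the (method, path) calls needed for an import to take effect."""
    if mode not in IMPORT_METHODS:
        raise ValueError(f"mode must be one of {sorted(IMPORT_METHODS)}")
    return [
        (IMPORT_METHODS[mode], f"/_match_rules/{repo_id}/_import"),
        ("POST", f"/_match_rules/{repo_id}/_compile"),  # re-compile after any import
    ]


for method, path in import_plan("geo_tags_v1", "append"):
    print(method, path)
```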
Managing rule libraries
Query a rule library without the rule bodies:
GET .match_rules/_doc/geo_tags_v1?_source_excludes=rules
Delete a rule library (the system checks pipeline dependencies and aborts if the library is in use):
DELETE /_match_rules/geo_tags_v1
Architecture overview
The data flow is:
Data write → Ingest pipeline triggers → Rules engine matches → Tags written to document → Document persisted.
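The flow above can be imitated with a toy model in plain Python. This is not the C++ engine: it handles only `or` keyword lists, scans every field of the document, and numbers tags by rule position, all of which are assumptions made to keep the sketch short. The function names are hypothetical.

```python
from typing import Dict, List

RULES = [
    {"expression": "北京 or 京城 or 首都", "description": "地域_北京"},
    {"expression": "上海 or 沪上 or 浦东 or 申城", "description": "地域_上海"},
    {"expression": "深圳 or 鹏城 or 南山区 or 前海", "description": "地域_深圳"},
]


def match_rules(text: str, rules: List[Dict[str, str]]) -> List[str]:
    """Toy matcher: a rule fires if any of its `or` keywords occurs in the text."""
    tags = []
    for index, rule in enumerate(rules):
        keywords = [k.strip() for k in rule["expression"].split(" or ")]
        if any(k in text for k in keywords):
            # Assumption: the numeric prefix is the rule's position in the library.
            tags.append(f"#{index}#{rule['description']}")
    return tags


def ingest(doc: Dict[str, str], rules: List[Dict[str, str]],
           target_field: str = "tags") -> Dict[str, object]:
    """Simulate the pipeline step: match, write tags into a copy, return it for persisting."""
    text = " ".join(str(value) for value in doc.values())
    tagged: Dict[str, object] = dict(doc)
    tagged[target_field] = match_rules(text, rules)
    return tagged


doc = {
    "title": "北京发布2025年经济数据报告",
    "content": "上海和深圳的GDP增速均超过全国平均水平",
    "timestamp": "2025-12-31T10:00:00Z",
}
result = ingest(doc, RULES)
print(result["tags"])
```

Since the sample document mentions all three cities, every rule fires and the tags array holds one entry per rule, which is the multi-rule behavior listed among the core capabilities.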
Conclusion
The guide demonstrates the full lifecycle of EasySearch Rules: import → compile → create pipeline → write and verify automatic tagging. Because matching happens in the ingest pipeline, tags are written before the document is persisted, which removes the post‑processing step and gives what the article calls zero‑latency content classification.
Mingyi World Elasticsearch
