
How EasySearch Rules Engine Tags Data at Ingest Time

This article walks through EasySearch's Rules plugin and shows how its high-performance C++ rule engine matches and tags documents inside the ingest pipeline. Classification for scenarios such as regional, sentiment, and entity tagging is therefore complete by the time a document is indexed, with no separate post-processing pass.

Mingyi World Elasticsearch

Mar 27, 2026


What is the Rules plugin?

The Rules plugin is a built‑in rule‑matching engine of EasySearch. It performs real‑time matching when data is written, eliminating the need for post‑processing scripts. Its core capabilities include:

Real-time matching: triggers automatically on data write.

High performance: powered by an optimized C++ rule engine that supports complex and large rule sets.

Flexible configuration: custom fields, regex, numeric ranges.

Multi-rule matching: a document can match multiple rules, with all tags stored in an array.

Cluster broadcast: rules are compiled and distributed to all ingest nodes in a multi-node environment.

Typical use cases

Regional tags – automatically recognize cities, provinces, and regions in news.

Sentiment tags – classify content as positive, negative, or neutral.

Person tags – extract and label key person names from large text corpora.

Product recommendation tags – tag items by price range or category.

Environment preparation

Install the plugin:

./bin/easysearch-plugin install rules

The Rules plugin depends on a native C++ library, so the system-call filter must be disabled in the EasySearch configuration. Edit config/easysearch.yml and add:

bootstrap.system_call_filter: false

Why? The plugin uses JNI and native code (libruledb‑r.so) that performs system calls such as fork and exec. The default seccomp filter blocks these calls. Disabling the filter weakens the node's security, so do it only on nodes that run the Rules plugin.

Quick start: four steps to run the first rule

Step 1 – Import rules

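Use the POST /_match_rules/{repo_id}/_import API to import a rule set into the .match_rules system index. The example below imports a geographic-tag rule library under the repo_id geo_tags_v1, the ID used in the following steps. The name and description fields are free text (here "regional tag rule library" and "rule library for recognizing regions in news"). Each rule pairs a keyword expression with a description such as 地域_北京 ("region_Beijing").

POST /_match_rules/geo_tags_v1/_import
{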
  "name": "地域标签规则库",
  "description": "用于新闻资讯地域识别的规则库",
  "tags": ["geo", "news-tagging"],
  "rules": [
    {"expression": "北京 or 京城 or 首都", "description": "地域_北京"},
    {"expression": "上海 or 沪上 or 浦东 or 申城", "description": "地域_上海"},
    {"expression": "深圳 or 鹏城 or 南山区 or 前海", "description": "地域_深圳"}
  ]
}

The response confirms success and reports the number of rules imported.
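When a rule library grows beyond a handful of entries, it can be easier to generate the import body than to write it by hand. The following is a minimal sketch, assuming rules are simple keyword lists joined with "or" as in the example above. The helper names build_rule and build_rule_library are hypothetical and not part of EasySearch.

```python
import json


def build_rule(tag: str, keywords: list[str]) -> dict:
    """Join keywords into one "a or b or c" expression labeled with the tag."""
    if not keywords:
        raise ValueError(f"rule {tag!r} needs at least one keyword")
    return {"expression": " or ".join(keywords), "description": tag}


def build_rule_library(name: str, description: str, tags: list[str],
                       keyword_map: dict[str, list[str]]) -> dict:
    """Assemble a body with the same fields as the import example above."""
    return {
        "name": name,
        "description": description,
        "tags": list(tags),
        "rules": [build_rule(tag, kws) for tag, kws in keyword_map.items()],
    }


library = build_rule_library(
    name="地域标签规则库",
    description="用于新闻资讯地域识别的规则库",
    tags=["geo", "news-tagging"],
    keyword_map={
        "地域_北京": ["北京", "京城", "首都"],
        "地域_上海": ["上海", "沪上", "浦东", "申城"],
    },
)

# ensure_ascii=False keeps the Chinese keywords readable in the output.
print(json.dumps(library, ensure_ascii=False, indent=2))
```

The printed JSON can be sent as the body of the import request.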

Step 2 – Compile the rule library

Imported rules must be compiled before they can be matched. Compiled rules are stored as a binary library for the C++ engine. Trigger compilation with:

POST /_match_rules/geo_tags_v1/_compile
{
  "quiet": true
}

The response includes duration_ms and confirms that the library is now in the compiled state.

Step 3 – Create an ingest pipeline

Integrate the rule engine into the write flow:

PUT _ingest/pipeline/geo-tagger
{
  "description": "地域标签自动标注",
  "processors": [
    {"check_match_rules": {"id": "geo_tags_v1"}}
  ]
}

The id parameter must match the repo_id used during import.

Step 4 – Write a document and verify tagging

Index a news article through the pipeline (the title mentions Beijing, and the content mentions Shanghai and Shenzhen):

POST news-articles/_doc?pipeline=geo-tagger
{
  "title": "北京发布2025年经济数据报告",
  "content": "上海和深圳的GDP增速均超过全国平均水平",
  "timestamp": "2025-12-31T10:00:00Z"
}

Retrieving the document shows an automatically generated tags field containing entries such as #0#地域_北京 and #1#地域_上海.
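Downstream code often wants the bare label without the numeric prefix. The sketch below assumes every tag follows the "#<number>#<label>" layout seen in the two examples above. That layout is inferred from those examples, not from a documented specification, and parse_tag and labels are hypothetical helper names.

```python
import re

# Assumed tag layout, inferred from the examples above: "#<number>#<label>".
TAG_PATTERN = re.compile(r"^#(\d+)#(.+)$")


def parse_tag(raw: str) -> tuple[int, str]:
    """Split one raw tag into its numeric prefix and its label."""
    match = TAG_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unexpected tag format: {raw!r}")
    return int(match.group(1)), match.group(2)


def labels(doc: dict, field: str = "tags") -> list[str]:
    """Return the labels from a document's tag field, in stored order."""
    return [parse_tag(raw)[1] for raw in doc.get(field, [])]


doc = {
    "title": "北京发布2025年经济数据报告",
    "tags": ["#0#地域_北京", "#1#地域_上海"],
}
print(labels(doc))
```

Raising on an unexpected format, rather than silently skipping, makes it obvious if the real tag layout differs from this assumption.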

Advanced usage

If you prefer a custom field instead of the default tags, set target_field in the processor:

{
  "check_match_rules": {
    "id": "geo_tags_v1",
    "target_field": "geo_labels"
  }
}
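If several pipelines differ only in rule library and output field, the pipeline body can be generated. This is a minimal sketch that uses only the two processor options shown in this article (id and target_field). The function name build_tagging_pipeline is hypothetical.

```python
import json
from typing import Optional


def build_tagging_pipeline(repo_id: str, description: str = "",
                           target_field: Optional[str] = None) -> dict:
    """Build an ingest pipeline body with one check_match_rules processor."""
    processor = {"id": repo_id}
    # Leave target_field out entirely so the processor falls back to "tags".
    if target_field is not None:
        processor["target_field"] = target_field
    return {
        "description": description,
        "processors": [{"check_match_rules": processor}],
    }


default_body = build_tagging_pipeline("geo_tags_v1", "geo tagging")
custom_body = build_tagging_pipeline("geo_tags_v1", "geo tagging",
                                     target_field="geo_labels")
print(json.dumps(custom_body, indent=2))
```

Either body can be sent with PUT _ingest/pipeline/{pipeline_id}, as in Step 3.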

Two import modes

Overwrite import (POST): deletes old rules and writes new ones – suitable for full updates.

Append import (PUT): keeps existing rules and adds new ones – suitable for incremental additions.

Note: After any import, the rule set must be re‑compiled before it takes effect.
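Because every import must be followed by a compile, it can help to treat the two calls as one unit. The sketch below only plans the requests and does not send them. It assumes the append mode uses the same _import path with PUT instead of POST, which this article implies but does not spell out. The name plan_rule_update is hypothetical.

```python
def plan_rule_update(repo_id: str, library: dict,
                     append: bool = False) -> list[tuple[str, str, dict]]:
    """Return the (method, path, body) requests for one rule update.

    Assumption: append mode is PUT on the same _import path as POST.
    """
    method = "PUT" if append else "POST"
    return [
        (method, f"/_match_rules/{repo_id}/_import", library),
        # Rules take effect only after compilation, so always compile last.
        ("POST", f"/_match_rules/{repo_id}/_compile", {"quiet": True}),
    ]


new_rules = {
    "rules": [{"expression": "广州 or 羊城", "description": "地域_广州"}],
}
for method, path, _body in plan_rule_update("geo_tags_v1", new_rules, append=True):
    print(method, path)
```

Returning the plan as data keeps the ordering rule testable without a running cluster. Any HTTP client can then execute the tuples in order.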

Managing rule libraries

Query a rule library without the rule bodies:

GET .match_rules/_doc/geo_tags_v1?_source_excludes=rules

Delete a rule library (the system checks pipeline dependencies and aborts if the library is in use):

DELETE /_match_rules/geo_tags_v1
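To see in advance which pipelines would block a delete, the pipeline definitions can be scanned on the client side. This sketch assumes pipelines are available as a mapping from pipeline ID to definition, the shape returned by the Elasticsearch-style GET _ingest/pipeline API. Whether EasySearch returns exactly that shape is an assumption here. The helper pipelines_using is hypothetical and does not replicate the server's own dependency check.

```python
def pipelines_using(pipelines: dict, repo_id: str) -> list[str]:
    """List IDs of pipelines with a check_match_rules processor on repo_id.

    Only top-level processors are inspected; processors nested inside
    other processors (for example on_failure handlers) are not searched.
    """
    users = []
    for pipeline_id, definition in pipelines.items():
        for processor in definition.get("processors", []):
            config = processor.get("check_match_rules")
            if config is not None and config.get("id") == repo_id:
                users.append(pipeline_id)
                break  # one match is enough for this pipeline
    return sorted(users)


pipelines = {
    "geo-tagger": {
        "description": "地域标签自动标注",
        "processors": [{"check_match_rules": {"id": "geo_tags_v1"}}],
    },
    "lowercase-title": {
        "processors": [{"lowercase": {"field": "title"}}],
    },
}
print(pipelines_using(pipelines, "geo_tags_v1"))
```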

Architecture overview

The data flow is:

Data write → Ingest pipeline triggers → Rules engine matches → Tags written to document → Document persisted.
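The flow can be illustrated with a toy simulation. This is not the plugin's algorithm. It assumes plain substring matching of "or"-separated keywords across all string fields, and it assumes the numeric tag prefix is the rule's position in the library. The real engine also supports regex and numeric ranges, which this sketch ignores.

```python
def matches(expression: str, text: str) -> bool:
    """Toy matcher: true if any "or"-separated keyword occurs in the text."""
    keywords = (kw.strip() for kw in expression.split(" or "))
    return any(kw and kw in text for kw in keywords)


def tag_document(doc: dict, rules: list[dict],
                 target_field: str = "tags") -> dict:
    """Simulate the ingest step: match every rule, write tags, return the doc."""
    text = " ".join(v for v in doc.values() if isinstance(v, str))
    tags = [
        f"#{index}#{rule['description']}"  # assumed tag layout
        for index, rule in enumerate(rules)
        if matches(rule["expression"], text)
    ]
    tagged = dict(doc)  # copy so the incoming document is left untouched
    if tags:
        tagged[target_field] = tags
    return tagged


rules = [
    {"expression": "北京 or 京城 or 首都", "description": "地域_北京"},
    {"expression": "上海 or 沪上 or 浦东 or 申城", "description": "地域_上海"},
    {"expression": "深圳 or 鹏城 or 南山区 or 前海", "description": "地域_深圳"},
]
incoming = {
    "title": "北京发布2025年经济数据报告",
    "content": "上海和深圳的GDP增速均超过全国平均水平",
}
persisted = tag_document(incoming, rules)
print(persisted["tags"])
```

In this toy version all three rules match, because the title mentions Beijing and the content mentions Shanghai and Shenzhen.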

Conclusion

The guide demonstrates the full lifecycle of EasySearch Rules: import → compile → create pipeline → write and verify automatic tagging. Because matching happens at ingest, documents are already tagged when they are indexed, so no separate post-processing job is needed and there is no delay between indexing and classification.


