Big Data 13 min read

How to Build a Scalable Business Operation Log System with Canal and Elasticsearch

This article walks through the design and implementation of a decoupled, high‑performance business operation log solution that captures MySQL binlog changes via Canal, streams them through Kafka, and stores and queries them in Elasticsearch, addressing challenges such as batch operations, multi‑table transactions, and non‑business data filtering.

Mingyi World Elasticsearch

Nov 3, 2024

How to Build a Scalable Business Operation Log System with Canal and Elasticsearch

Problem Origin

Business systems need operation logs to trace user actions, record data changes, and support rollback. The required features include recording user, time, function, log type, description, and before‑after data, providing a visual query page, and supporting erroneous operation rollback.

Log Types

System logs : debug, info, warn, error levels, mainly for developers and ops.

Operation logs : record actual business actions (who, when, what) and are stored in databases for users or admins.

Limitations of Traditional Implementations

Embedding logs in business code

Manually adding log statements around each DB operation increases code complexity and lacks generality.

AOP (Aspect‑Oriented Programming)

While AOP separates logging from business logic, it struggles with capturing before‑and‑after values, batch operations, and multi‑table associations. An example using newData and oldData annotations shows issues such as needing extra queries for old values, difficulty handling List types, and inability to log multi‑table operations.

Solution Exploration

To overcome these problems, the article proposes using Canal , an open‑source Alibaba component that subscribes to MySQL binlog events.

Canal Technical Principle

Canal captures real‑time changes (INSERT, UPDATE, DELETE) from MySQL binlog and converts them into consumable events.

Why Choose Canal?

Decouples business code – no code changes required.

Supports batch operations and multi‑table associations by reading changes at the database layer.

Language‑agnostic – works with any tech stack.

Canal Pros and Cons

Advantages : removes coupling of new/old data, handles batch and multi‑table scenarios, language independent.

Disadvantages : requires unified table design, may face compatibility issues with complex cascades, and needs filtering of non‑business changes.

Implementation

Data Parsing and Transformation

Canal reads the business database binlog, parses it, and pushes the events to Kafka. A sample parsed JSON includes fields such as data, old, database, table, type, and ts:

{
  "data": [{
    "id": "122158992930664499",
    "goodsName": "新商品名称",
    "update_time": "2020-08-26 13:45:46"
  }],
  "old": [{
    "goodsName": "旧商品名称",
    "update_time": "2020-08-26 09:15:13"
  }],
  "database": "db_business",
  "table": "goods",
  "type": "UPDATE",
  "ts": 1587879945698
}

Defining a Generic Interface

A unified API returns before‑and‑after data along with field descriptions. An example payload for a product update includes id, groupID, newObject, oldObject, fieldsDescription, action, description, operator, databaseName, tableName, module, txID, and timestamp:

{
  "id": "10001",
  "groupID": 1700,
  "system": "01",
  "newObject": {"goodsName": "商品名称001", "goodsCode": "商品编码001"},
  "oldObject": {"goodsName": "商品名称", "goodsCode": "商品编码"},
  "fieldsDescription": {"goodsID": "商品ID", "goodsName": "商品名称", "goodsCode": "商品编码"},
  "action": 2,
  "description": "修改商品信息",
  "operator": "user001",
  "databaseName": "db_business",
  "tableName": "goods",
  "module": "商品管理",
  "txID": "36aef98585db4e7a98f9694c8ef28b8c",
  "timestamp": 1587879945698
}

Field explanations (e.g., groupID = group ID, operator = operator, action = 0‑add, 1‑delete, 2‑modify) are listed to aid front‑end display.

Storing Logs in Elasticsearch

Because the schema can vary, a NoSQL store is chosen. Elasticsearch indices are created per month per business line. The mapping defines properties such as id, groupID, bizSource, action, description, operator, databaseName, tableName, bizmodule, txId, and nested objects newObject, oldObject, and fieldsDescription:

PUT goods-nested
{
  "mappings": {
    "properties": {
      "id": {"type": "integer"},
      "groupID": {"type": "integer"},
      "bizSource": {"type": "keyword"},
      "action": {"type": "integer"},
      "description": {"type": "keyword"},
      "operator": {"type": "keyword"},
      "databaseName": {"type": "keyword"},
      "tableName": {"type": "keyword"},
      "bizmodule": {"type": "keyword"},
      "txId": {"type": "keyword"},
      "newObject": {
        "type": "nested",
        "properties": {
          "goodsID": {"type": "integer"},
          "goodsName": {"type": "keyword"},
          "goodsCode": {"type": "keyword"}
        }
      },
      "oldObject": {
        "type": "nested",
        "properties": {
          "goodsID": {"type": "integer"},
          "goodsName": {"type": "keyword"},
          "goodsCode": {"type": "keyword"}
        }
      },
      "fieldsDescription": {
        "type": "nested",
        "properties": {
          "goodsID": {"type": "integer"},
          "goodsName": {"type": "keyword"},
          "goodsCode": {"type": "keyword"}
        }
      }
    }
  }
}

Data Insertion Example

POST goods-nested/_bulk
{ "index": { "_index": "goods-nested", "_id": "10001" } }
{ "id":"10001","groupID":1700,"bizSource":"Scm","newObject":{"goodsID":1001,"goodsName":"商品名称001","goodsCode":"商品编码001"},"oldObject":{"goodsID":1001,"goodsName":"商品名称","goodsCode":"商品编码"},"fieldsDescription":{"goodsName":"商品名称","goodsCode":"商品编码"},"action":2,"description":"修改集团品相","operator":"001","databaseName":"db_supply_chain_basic","tableName":"tbl_chain_distribution","bizmodule":"集团品相","txId":"36aef98585db4e7a98f9694c8ef28b8c" }
... (additional bulk entries omitted for brevity)

Data Query Example

Query logs for a specific product ID:

GET goods-nested/_search
{
  "query": {
    "nested": {
      "path": "newObject",
      "query": {
        "bool": {
          "must": [
            { "match": { "newObject.goodsID": "1001" } }
          ]
        }
      }
    }
  }
}

A more complex query filters by groupID and multiple goodsName values using terms inside a bool must clause.

Handling Multi‑Table Associations

When an operation spans multiple tables, the binlog order may be inconsistent. The solution aggregates operations belonging to the same transaction using the transaction ID ( txID) to enable complete tracing and rollback.

Filtering Non‑Business Changes

Binlog may contain changes from DB maintenance tasks. A filtering step removes these non‑business events, retaining only true business operation logs.

Conclusion

The presented design successfully decouples business logic from logging, captures database‑level changes with Canal, and stores them in Elasticsearch for efficient querying and visualization. Remaining challenges include more sophisticated multi‑table aggregation and ensuring accurate operator attribution, which require ongoing refinement.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

data pipeline Elasticsearch Kafka Binlog Canal NoSQL Operation Log

Written by

Mingyi World Elasticsearch

The leading WeChat public account for Elasticsearch fundamentals, advanced topics, and hands‑on practice. Join us to dive deep into the ELK Stack (Elasticsearch, Logstash, Kibana, Beats).

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.