
Turning Raw Logs into Structured Data with DBus Visual Rule Operators

This article explains how the open‑source DBus platform, combined with the Wormhole streaming engine, captures raw application logs, lets users configure visual rule operators, and transforms the unstructured message part into schema‑driven, Kafka‑ready data for downstream analytics.

dbaplus Community

Background

As internet, IT, and big‑data technologies have grown, enterprise systems have come to generate massive volumes of log data. Traditional log collection tools (Logstash, Filebeat, Flume, Fluentd, etc.) can extract fixed fields such as timestamps and log levels, but the variable message part, which carries user actions, business events, and error details, remains unstructured and under‑utilized.

Common Log Processing Solutions

Typical solutions like ELK (Elasticsearch, Logstash, Kibana) integrate log ingestion, transformation, and visualization. They rely heavily on regular‑expression configurations to parse logs, which can be powerful but are hard to maintain and require deep regex expertise.
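
To make the regex approach concrete, here is a minimal Python sketch of parsing a single log line with a regular expression. The sample line and its layout are hypothetical (they are not taken from DBus or ELK), and the pattern is only one possible way to write it. Note that the fixed header fields come out cleanly while the message is still one opaque string, and that any change to the log layout means editing the pattern.

```python
import re

# Hypothetical log4j-style line; the layout is an assumption for illustration.
LINE = "2017-12-17 13:57:30,877 INFO [main] HeartbeatJob - insert heartbeat ok, ds=edpdb"

# Named groups pull out the fixed header fields; everything after " - "
# is captured as a single unstructured "message" string.
PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the header fields as a dict, or None if the layout differs."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_line(LINE)
print(fields["level"], "|", fields["message"])
```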

DBus & Wormhole Approach

DBus (https://github.com/bridata/dbus) together with the Wormhole streaming platform (https://github.com/edp963/wormhole) provides a visual, "what‑you‑see‑is‑what‑you‑get" environment for defining rule operators that filter, split, trim, and reshape log messages into structured tables.

Architecture Overview

Log collection agents (Logstash, Flume, Filebeat, or custom collectors) read raw logs and push them into Kafka as raw data logs.

Users configure visual rules in DBus to map a log source to one or more target tables. Each rule consists of a chain of rule operators.

The rule‑operator chain runs in the DBus execution engine, producing structured records that are written back to Kafka (or other sinks) in the UMS JSON format.

DBus architecture diagram

Rule Operators

DBus ships with a rich set of operators such as filter, split, trim, toIndex, and saveAs. Operators are independent and can be combined in any order, allowing users to iteratively refine the transformation and instantly preview intermediate results.
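
The idea of independent operators with previewable intermediate results can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DBus execution engine: the function names (`keep_if_contains`, `split_column`, `trim_all`, `preview_chain`) are hypothetical, and rows are modeled simply as lists of strings.

```python
def keep_if_contains(text):
    """Filter-style operator: keep rows where any column contains `text`."""
    return lambda rows: [r for r in rows if any(text in col for col in r)]


def split_column(index, sep):
    """Split-style operator: replace column `index` with its pieces."""
    def op(rows):
        return [r[:index] + r[index].split(sep) + r[index + 1:] for r in rows]
    return op


def trim_all():
    """Trim-style operator: strip surrounding whitespace from every column."""
    return lambda rows: [[col.strip() for col in r] for r in rows]


def preview_chain(rows, operators):
    """Apply operators in order, keeping a snapshot after every step."""
    snapshots = [rows]
    for op in operators:
        rows = op(rows)
        snapshots.append(rows)
    return snapshots


rows = [["a | b "], ["skip me"]]
chain = [keep_if_contains("|"), split_column(0, "|"), trim_all()]
for step, snapshot in enumerate(preview_chain(rows, chain)):
    print(step, snapshot)
```

Because every operator takes rows and returns rows, the chain can be reordered or extended freely, and each snapshot plays the role of the preview a user would inspect before adding the next operator.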

Rule operator diagram

Step‑by‑Step Example

Collect raw logs: Use Logstash (or any collector) to read log4j files and push them to Kafka.

Configure visual rules: Create a rule group for the heartbeat_log_new topic.

Extract fields: Add a toIndex operator to select timestamp, log, etc.

Filter irrelevant rows: Use a filter operator to keep only rows containing "insert heartbeat".

Split composite fields: Apply a split operator to break the message into separate columns.

Trim unwanted data: Use a trim operator to clean each column.

Save output: Finish with a saveAs operator to write the structured columns to a target table.
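
The steps above can be sketched as a declarative rule applied by a small interpreter. Everything here is an assumption made for illustration: the raw records, the `|`-separated message layout, the parameter names, and the `apply_rule` function are hypothetical and do not reflect how DBus stores or executes rules. Only the operator names and the "insert heartbeat" filter text come from the walkthrough.

```python
# Hypothetical raw records as a collector might emit them; the field names
# and the message layout are assumptions, not DBus's real heartbeat format.
RAW = [
    {"timestamp": "2017/12/17 13:57:30.877", "level": "INFO",
     "log": "insert heartbeat | edpdb | ok | node-a "},
    {"timestamp": "2017/12/17 13:57:31.002", "level": "DEBUG",
     "log": "connection pool refreshed"},
]

# One rule = an ordered list of (operator name, parameters).
RULE = [
    ("toIndex", {"fields": ["timestamp", "log"]}),
    ("filter", {"column": 1, "contains": "insert heartbeat"}),
    ("split", {"column": 1, "sep": "|"}),
    ("trim", {}),
    ("saveAs", {"columns": {0: "event_time", 2: "datasource",
                            3: "heartbeat_state", 4: "heartbeat_node"}}),
]


def apply_rule(records, rule):
    """Run each operator in order and return the structured rows."""
    rows = records
    for name, p in rule:
        if name == "toIndex":    # pick the named fields as positional columns
            rows = [[rec[f] for f in p["fields"]] for rec in rows]
        elif name == "filter":   # keep rows whose column contains the text
            rows = [r for r in rows if p["contains"] in r[p["column"]]]
        elif name == "split":    # expand one column into several
            i = p["column"]
            rows = [r[:i] + r[i].split(p["sep"]) + r[i + 1:] for r in rows]
        elif name == "trim":     # strip whitespace from every column
            rows = [[c.strip() for c in r] for r in rows]
        elif name == "saveAs":   # name the selected columns for the target table
            rows = [{col: r[i] for i, col in p["columns"].items()} for r in rows]
        else:
            raise ValueError(f"unknown operator: {name}")
    return rows


structured = apply_rule(RAW, RULE)
print(structured)
```

The DEBUG record is dropped by the filter step, and the surviving row ends up with named columns that line up with the non-`ums_` schema fields of the output format.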

Visual rule configuration

Resulting UMS JSON

{
  "payload": [
    {
      "tuple": ["127046516736228867","2017-12-17 13:57:30.000","i","320171788","2017/12/17 13:57:30.877","edpdb","成功","/DBus/HeartBeat/Monitor/edpdb/TEST1/T1000"]
    },
    {
      "tuple": ["127046516736228869","2017-12-17 13:57:30.000","i","320171790","2017/12/17 13:57:30.946","edpdb","成功","/DBus/HeartBeat/Monitor/edpdb/TEST4/ONEYI"]
    }
  ],
  "protocol": {"type": "data_increment_data","version": "1.3"},
  "schema": {
    "fields": [
      {"name": "ums_id_","type": "long"},
      {"name": "ums_ts_","type": "datetime"},
      {"name": "ums_op_","type": "string"},
      {"name": "ums_uid_","type": "string"},
      {"name": "event_time","type": "datetime"},
      {"name": "datasource","type": "string"},
      {"name": "heartbeat_state","type": "string"},
      {"name": "heartbeat_node","type": "string"}
    ]
  }
}
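
A downstream consumer has to pair each positional tuple with the schema's field names. The following Python sketch shows one way to do that, using an abridged copy of the document above (one tuple, no `protocol` block). The helper name `ums_to_records` is hypothetical, and values are left as strings rather than converted according to the declared `type`.

```python
import json

# Abridged copy of the UMS document above: one payload tuple plus its schema.
UMS_TEXT = """
{
  "payload": [
    {"tuple": ["127046516736228867", "2017-12-17 13:57:30.000", "i", "320171788",
               "2017/12/17 13:57:30.877", "edpdb", "成功",
               "/DBus/HeartBeat/Monitor/edpdb/TEST1/T1000"]}
  ],
  "schema": {
    "fields": [
      {"name": "ums_id_", "type": "long"},
      {"name": "ums_ts_", "type": "datetime"},
      {"name": "ums_op_", "type": "string"},
      {"name": "ums_uid_", "type": "string"},
      {"name": "event_time", "type": "datetime"},
      {"name": "datasource", "type": "string"},
      {"name": "heartbeat_state", "type": "string"},
      {"name": "heartbeat_node", "type": "string"}
    ]
  }
}
"""


def ums_to_records(ums):
    """Zip schema field names with each payload tuple to get one dict per row."""
    names = [field["name"] for field in ums["schema"]["fields"]]
    records = []
    for item in ums["payload"]:
        values = item["tuple"]
        if len(values) != len(names):
            raise ValueError("tuple length does not match schema")
        records.append(dict(zip(names, values)))
    return records


records = ums_to_records(json.loads(UMS_TEXT))
print(records[0]["datasource"], records[0]["heartbeat_node"])
```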

Monitoring

DBus provides a real‑time monitoring dashboard showing total processed rows, error counts (useful for debugging rule mismatches), and latency, plus an __unknown_table__ that captures logs not matched by any rule.
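
The role of `__unknown_table__` can be illustrated with a small routing sketch in Python. The rule predicates, table names other than `__unknown_table__`, and the `route` function are hypothetical, and letting one message count toward several tables is an assumption about how one-to-many mapping might behave, not a description of DBus internals.

```python
from collections import Counter

UNKNOWN_TABLE = "__unknown_table__"

# Hypothetical rules: each target table is paired with a matching predicate.
RULES = {
    "heartbeat": lambda msg: "insert heartbeat" in msg,
    "errors": lambda msg: msg.startswith("ERROR"),
}


def route(messages, rules):
    """Count rows per target table; unmatched messages go to the unknown table."""
    counts = Counter()
    for msg in messages:
        targets = [table for table, matches in rules.items() if matches(msg)]
        for table in targets or [UNKNOWN_TABLE]:
            counts[table] += 1
    return counts


stats = route(
    ["insert heartbeat ok", "ERROR: insert heartbeat failed", "cache warmed"],
    RULES,
)
print(dict(stats))
```

A growing count under `__unknown_table__` is the signal that some log lines are not covered by any rule and that the rule set needs another look.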

Monitoring dashboard

Conclusion

DBus integrates existing log collectors, offers a rich set of visual rule operators, and enables users to transform raw application logs into schema‑driven, Kafka‑ready data without writing code. The platform supports extensible custom operators, one‑to‑many source‑to‑target mappings, and real‑time monitoring, making log‑driven data extraction simple, flexible, and accessible to both developers and analysts.
