
Turning Raw Logs into Structured Data with DBus Visual Rule Operators

This article explains how the open‑source DBus platform, combined with the Wormhole streaming engine, captures raw application logs, lets users configure visual rule operators, and transforms the unstructured message part into schema‑driven, Kafka‑ready data for downstream analytics.

dbaplus Community

Dec 26, 2017

Background

With the explosive growth of internet, IT, and big‑data technologies, enterprise systems generate massive amounts of log data. Traditional log collection tools (Logstash, Filebeat, Flume, Fluentd, etc.) can extract fixed fields such as timestamps and log levels, but the variable message part, which carries user actions, business events, and error details, remains unstructured and under‑utilized.

Common Log Processing Solutions

Typical solutions like ELK (Elasticsearch, Logstash, Kibana) integrate log ingestion, transformation, and visualization. They rely heavily on regular‑expression configurations to parse logs, which can be powerful but are hard to maintain and require deep regex expertise.

DBus & Wormhole Approach

DBus (https://github.com/bridata/dbus) together with the Wormhole streaming platform (https://github.com/edp963/wormhole) provides a visual, "what‑you‑see‑is‑what‑you‑get" environment for defining rule operators that filter, split, trim, and reshape log messages into structured tables.

Architecture Overview

Log collection agents (Logstash, Flume, Filebeat, or custom collectors) read raw logs and push them into Kafka as raw data logs.

Users configure visual rules in DBus to map a log source to one or more target tables. Each rule consists of a chain of rule operators.

The rule‑operator chain runs in the DBus execution engine, producing structured records that are written back to Kafka (or other sinks) in the UMS JSON format.

Rule Operators

DBus ships with a rich set of operators such as filter, split, trim, toIndex, and saveAs. Operators are independent and can be combined in any order, allowing users to iteratively refine the transformation and instantly preview intermediate results.
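The chaining idea can be illustrated with a minimal Python sketch. This is not DBus code: DBus operators are configured in its web UI, and the names `keep_if`, `split_on`, `strip_all`, and `preview` are hypothetical stand-ins chosen for this example. Each operator is modeled as a function from rows to rows, and `preview` collects the rows after every step, roughly what an intermediate-result preview would show.

```python
from typing import Callable, List

Row = List[str]
Operator = Callable[[List[Row]], List[Row]]

def keep_if(col: int, needle: str) -> Operator:
    """Hypothetical filter-like operator: keep rows whose column contains needle."""
    def op(rows: List[Row]) -> List[Row]:
        return [r for r in rows if needle in r[col]]
    return op

def split_on(col: int, sep: str) -> Operator:
    """Hypothetical split-like operator: replace one column by its parts."""
    def op(rows: List[Row]) -> List[Row]:
        return [r[:col] + r[col].split(sep) + r[col + 1:] for r in rows]
    return op

def strip_all() -> Operator:
    """Hypothetical trim-like operator: strip whitespace from every column."""
    def op(rows: List[Row]) -> List[Row]:
        return [[c.strip() for c in r] for r in rows]
    return op

def preview(rows: List[Row], chain: List[Operator]) -> List[List[Row]]:
    """Apply the chain in order and return the rows after each operator."""
    stages = []
    for op in chain:
        rows = op(rows)
        stages.append(rows)
    return stages

# Invented sample rows: a level column and a message column.
rows = [["INFO", "login | alice "], ["DEBUG", "cache | miss"]]
chain = [keep_if(0, "INFO"), split_on(1, "|"), strip_all()]
stages = preview(rows, chain)
for i, stage in enumerate(stages, 1):
    print(f"after step {i}: {stage}")
```

Because every operator has the same rows-in, rows-out shape, reordering or inserting a step only means editing the `chain` list.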

Step‑by‑Step Example

Collect raw logs: Use Logstash (or any collector) to read log4j files and push them to Kafka.

Configure visual rules: Create a rule group for the heartbeat_log_new topic.

Extract fields: Add a toIndex operator to select timestamp, log, etc.

Filter irrelevant rows: Use a filter operator to keep only rows containing "insert heartbeat".

Split composite fields: Apply a split operator to break the message into separate columns.

Trim unwanted data: Use a trim operator to clean each column.

Save output: Finish with a saveAs operator to write the structured columns to a target table.
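The operator steps above can be mimicked in plain Python to show what each one does to the data. This is only a sketch under stated assumptions: the input records, their field names, and the "|" separator are invented for illustration (the article does not show the raw log layout), and the list comprehensions imitate the operators rather than call any DBus API.

```python
# Hypothetical records as a collector might deliver them; the field layout
# and the "|" separator are assumptions made for this sketch.
raw = [
    {"timestamp": "2017/12/17 13:57:30.877", "level": "INFO",
     "log": "insert heartbeat | edpdb | ok | /DBus/HeartBeat/Monitor/edpdb/TEST1/T1000 "},
    {"timestamp": "2017/12/17 13:57:31.002", "level": "INFO",
     "log": "checking master slave delay"},
]

# toIndex-like step: pick the fields of interest as positional columns.
rows = [[r["timestamp"], r["log"]] for r in raw]

# filter-like step: keep only rows whose message contains "insert heartbeat".
rows = [r for r in rows if "insert heartbeat" in r[1]]

# split-like step: break the composite message into separate columns.
rows = [[r[0]] + r[1].split("|") for r in rows]

# trim-like step: remove surrounding whitespace from every column.
rows = [[c.strip() for c in r] for r in rows]

# saveAs-like step: drop the "insert heartbeat" marker column and name the rest.
columns = ["event_time", "datasource", "heartbeat_state", "heartbeat_node"]
table = [dict(zip(columns, [r[0]] + r[2:])) for r in rows]

print(table)
```

The column names are taken from the schema in the UMS output; everything else in the sample is made up.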

Resulting UMS JSON

{
  "payload": [
    {
      "tuple": ["127046516736228867","2017-12-17 13:57:30.000","i","320171788","2017/12/17 13:57:30.877","edpdb","成功","/DBus/HeartBeat/Monitor/edpdb/TEST1/T1000"]
    },
    {
      "tuple": ["127046516736228869","2017-12-17 13:57:30.000","i","320171790","2017/12/17 13:57:30.946","edpdb","成功","/DBus/HeartBeat/Monitor/edpdb/TEST4/ONEYI"]
    }
  ],
  "protocol": {"type": "data_increment_data","version": "1.3"},
  "schema": {
    "fields": [
      {"name": "ums_id_","type": "long"},
      {"name": "ums_ts_","type": "datetime"},
      {"name": "ums_op_","type": "string"},
      {"name": "ums_uid_","type": "string"},
      {"name": "event_time","type": "datetime"},
      {"name": "datasource","type": "string"},
      {"name": "heartbeat_state","type": "string"},
      {"name": "heartbeat_node","type": "string"}
    ]
  }
}
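A downstream consumer has to pair each tuple with the schema to get named fields. The following Python sketch shows one way to do that using only the standard library. It embeds the first tuple of the message above as a string; in practice the text would come from a Kafka consumer, which is omitted here. The `CASTS` table and the choice to convert only `long` values are assumptions of this example, not part of the UMS format.

```python
import json

# The message above, shortened to its first tuple.
ums_text = """
{
  "payload": [
    {"tuple": ["127046516736228867", "2017-12-17 13:57:30.000", "i", "320171788",
               "2017/12/17 13:57:30.877", "edpdb", "成功",
               "/DBus/HeartBeat/Monitor/edpdb/TEST1/T1000"]}
  ],
  "protocol": {"type": "data_increment_data", "version": "1.3"},
  "schema": {
    "fields": [
      {"name": "ums_id_", "type": "long"},
      {"name": "ums_ts_", "type": "datetime"},
      {"name": "ums_op_", "type": "string"},
      {"name": "ums_uid_", "type": "string"},
      {"name": "event_time", "type": "datetime"},
      {"name": "datasource", "type": "string"},
      {"name": "heartbeat_state", "type": "string"},
      {"name": "heartbeat_node", "type": "string"}
    ]
  }
}
"""

# Assumption for this sketch: only "long" is converted; other types stay strings.
CASTS = {"long": int}

msg = json.loads(ums_text)
fields = [(f["name"], f["type"]) for f in msg["schema"]["fields"]]

# Tuple values are positional, so zip them with the schema fields in order.
records = [
    {name: CASTS.get(typ, str)(value)
     for (name, typ), value in zip(fields, item["tuple"])}
    for item in msg["payload"]
]

for rec in records:
    print(rec["event_time"], rec["datasource"], rec["heartbeat_node"])
```

Keeping the schema once per message and the rows as bare tuples keeps the payload compact; the cost is that every consumer needs this small pairing step.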

Monitoring

DBus provides a real‑time monitoring dashboard showing total processed rows, error counts (useful for debugging rule mismatches), and latency, plus an __unknown_table__ that captures logs not matched by any rule.
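The role of `__unknown_table__` can be pictured with a small Python sketch. It is a simplification and not how DBus is implemented: the rule predicates and sample lines are hypothetical, and each line goes to the first matching table only, whereas DBus allows one source to feed several tables.

```python
from collections import Counter

UNKNOWN_TABLE = "__unknown_table__"

# Hypothetical rules: target table name -> predicate on the raw line.
rules = {
    "heartbeat": lambda line: "insert heartbeat" in line,
    "login": lambda line: "user login" in line,
}

def route(line: str) -> str:
    """Return the first table whose rule matches, else the unknown table."""
    for table, matches in rules.items():
        if matches(line):
            return table
    return UNKNOWN_TABLE

# Invented sample lines.
lines = [
    "insert heartbeat | edpdb | ok",
    "user login | alice",
    "insert heartbeat | edpdb | ok",
    "gc pause 120ms",
]

# Per-table row counts, similar in spirit to the dashboard totals.
counts = Counter(route(line) for line in lines)
for table, n in counts.items():
    print(f"{table}: {n}")
```

A growing count for the unknown table is a hint that a rule is missing or that a log format has changed.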

Conclusion

DBus integrates existing log collectors, offers a rich set of visual rule operators, and enables users to transform raw application logs into schema‑driven, Kafka‑ready data without writing code. The platform supports extensible custom operators, one‑to‑many source‑to‑target mappings, and real‑time monitoring, making log‑driven data extraction simple, flexible, and accessible to both developers and analysts.
