Turning Raw Logs into Structured Data with DBus Visual Rule Operators
This article explains how the open‑source DBus platform, combined with the Wormhole streaming engine, captures raw application logs, lets users configure visual rule operators, and transforms the unstructured message part into schema‑driven, Kafka‑ready data for downstream analytics.
Background
With the explosive growth of internet, IT, and big‑data technologies, enterprise systems generate massive amounts of log data. Traditional log collection tools (Logstash, Filebeat, Flume, Fluentd, etc.) can extract fixed fields such as timestamps and log levels, but the variable message part—containing user actions, business events, and error details—remains unstructured and under‑utilized.
Common Log Processing Solutions
Typical solutions like ELK (Elasticsearch, Logstash, Kibana) integrate log ingestion, transformation, and visualization. They rely heavily on regular‑expression configurations to parse logs, which can be powerful but are hard to maintain and require deep regex expertise.
DBus & Wormhole Approach
DBus (https://github.com/bridata/dbus) together with the Wormhole streaming platform (https://github.com/edp963/wormhole) provides a visual, "what‑you‑see‑is‑what‑you‑get" environment for defining rule operators that filter, split, trim, and reshape log messages into structured tables.
Architecture Overview
Log collection agents (Logstash, Flume, Filebeat, or custom collectors) read raw logs and push them into Kafka as raw data logs.
Users configure visual rules in DBus to map a log source to one or more target tables. Each rule consists of a chain of rule operators.
The rule‑operator chain runs in the DBus execution engine, producing structured records that are written back to Kafka (or other sinks) in the UMS JSON format.
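Because a UMS message carries its schema alongside its payload, a downstream consumer can rebuild named rows without any external metadata. The following is a minimal sketch of that idea, assuming the layout seen in the sample message in this article (`schema.fields` plus a list of `payload[].tuple` arrays). The function name `ums_to_rows` is hypothetical and is not part of DBus or Wormhole.

```python
import json
from typing import Dict, List

# Abbreviated UMS message: the schema and first tuple of this article's sample.
UMS_MESSAGE = """
{
  "payload": [
    {"tuple": ["127046516736228867", "2017-12-17 13:57:30.000", "i", "320171788",
               "2017/12/17 13:57:30.877", "edpdb", "成功",
               "/DBus/HeartBeat/Monitor/edpdb/TEST1/T1000"]}
  ],
  "protocol": {"type": "data_increment_data", "version": "1.3"},
  "schema": {"fields": [
    {"name": "ums_id_", "type": "long"},
    {"name": "ums_ts_", "type": "datetime"},
    {"name": "ums_op_", "type": "string"},
    {"name": "ums_uid_", "type": "string"},
    {"name": "event_time", "type": "datetime"},
    {"name": "datasource", "type": "string"},
    {"name": "heartbeat_state", "type": "string"},
    {"name": "heartbeat_node", "type": "string"}
  ]}
}
"""


def ums_to_rows(message: str) -> List[Dict[str, str]]:
    """Pair each payload tuple with the field names declared in the schema."""
    doc = json.loads(message)
    names = [field["name"] for field in doc["schema"]["fields"]]
    rows = []
    for item in doc["payload"]:
        values = item["tuple"]
        # A tuple that does not line up with the schema is treated as an error.
        if len(values) != len(names):
            raise ValueError("tuple length does not match schema length")
        rows.append(dict(zip(names, values)))
    return rows


rows = ums_to_rows(UMS_MESSAGE)
print(rows[0]["datasource"], rows[0]["heartbeat_node"])
```

The values stay as strings here; converting them according to the declared `type` of each field is left out to keep the sketch short.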
Rule Operators
DBus ships with a rich set of operators such as filter, split, trim, toIndex, and saveAs. Operators are independent and can be combined in any order, allowing users to iteratively refine the transformation and instantly preview intermediate results.
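One way to picture independent, freely ordered operators with previewable intermediate results is as plain functions from rows to rows, with a runner that keeps a snapshot after every step. This is only an illustrative sketch: `keep_containing`, `strip_columns`, and `run_chain` are hypothetical names, the sample rows are invented, and nothing here reflects how the DBus engine is implemented internally.

```python
from typing import Callable, List, Optional

Row = List[str]
Operator = Callable[[List[Row]], List[Row]]


def keep_containing(text: str) -> Operator:
    """Hypothetical filter: keep rows in which any column contains `text`."""
    def apply(rows: List[Row]) -> List[Row]:
        return [row for row in rows if any(text in col for col in row)]
    return apply


def strip_columns(chars: Optional[str] = None) -> Operator:
    """Hypothetical trim: strip `chars` (whitespace by default) from every column."""
    def apply(rows: List[Row]) -> List[Row]:
        return [[col.strip(chars) for col in row] for row in rows]
    return apply


def run_chain(rows: List[Row], operators: List[Operator]) -> List[List[Row]]:
    """Apply operators in order and keep the rows after each step for preview."""
    snapshots = []
    for operator in operators:
        rows = operator(rows)
        snapshots.append(rows)
    return snapshots


# Invented sample rows, used only to exercise the chain.
raw = [[" INFO ", " insert heartbeat ok "], [" DEBUG ", " cache refreshed "]]
snapshots = run_chain(raw, [keep_containing("insert heartbeat"), strip_columns()])
for step, rows in enumerate(snapshots, start=1):
    print(step, rows)
```

Because each operator only sees the rows handed to it, reordering the list passed to `run_chain` is enough to try a different transformation.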
Step‑by‑Step Example
Collect raw logs: Use Logstash (or any collector) to read log4j files and push them to Kafka.
Configure visual rules: Create a rule group for the heartbeat_log_new topic.
Extract fields: Add a toIndex operator to select timestamp, log, etc.
Filter irrelevant rows: Use a filter operator to keep only rows containing "insert heartbeat".
Split composite fields: Apply a split operator to break the message into separate columns.
Trim unwanted data: Use a trim operator to clean each column.
Save output: Finish with a saveAs operator to write the structured columns to a target table.
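The operator steps above can be mimicked in a few lines of Python to make the data flow concrete. Everything in this sketch is an assumption made for illustration: the functions are stand-ins named after the operators rather than DBus code, the record keys `timestamp` and `log` follow the wording of the extract step, and the `|`-delimited message format and sample records are invented.

```python
from typing import Dict, List

Row = List[str]


def to_index(records: List[Dict[str, str]], keys: List[str]) -> List[Row]:
    """Select the named fields of each record as positional columns."""
    return [[record[key] for key in keys] for record in records]


def filter_rows(rows: List[Row], column: int, text: str) -> List[Row]:
    """Keep only rows whose given column contains `text`."""
    return [row for row in rows if text in row[column]]


def split_column(rows: List[Row], column: int, sep: str) -> List[Row]:
    """Replace one column with the pieces obtained by splitting it on `sep`."""
    return [row[:column] + row[column].split(sep) + row[column + 1:] for row in rows]


def trim(rows: List[Row]) -> List[Row]:
    """Strip surrounding whitespace from every column."""
    return [[col.strip() for col in row] for row in rows]


def save_as(rows: List[Row], names: Dict[int, str]) -> List[Dict[str, str]]:
    """Name the chosen column indexes, producing rows for the target table."""
    return [{name: row[index] for index, name in names.items()} for row in rows]


# Invented collector output; the message layout is hypothetical.
records = [
    {"timestamp": "2017/12/17 13:57:30.877",
     "log": "insert heartbeat | edpdb | ok | /demo/node1 "},
    {"timestamp": "2017/12/17 13:57:31.002", "log": "connection pool resized"},
]

rows = to_index(records, ["timestamp", "log"])       # extract fields
rows = filter_rows(rows, 1, "insert heartbeat")      # drop irrelevant rows
rows = split_column(rows, 1, "|")                    # split the message
rows = trim(rows)                                    # clean each column
table = save_as(rows, {0: "event_time", 2: "datasource",
                       3: "heartbeat_state", 4: "heartbeat_node"})
print(table)
```

The column names passed to `save_as` mirror the business fields of the UMS schema shown in this article; the `ums_*` system fields are not modeled here.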
Resulting UMS JSON
{
  "payload": [
    {
      "tuple": ["127046516736228867", "2017-12-17 13:57:30.000", "i", "320171788", "2017/12/17 13:57:30.877", "edpdb", "成功", "/DBus/HeartBeat/Monitor/edpdb/TEST1/T1000"]
    },
    {
      "tuple": ["127046516736228869", "2017-12-17 13:57:30.000", "i", "320171790", "2017/12/17 13:57:30.946", "edpdb", "成功", "/DBus/HeartBeat/Monitor/edpdb/TEST4/ONEYI"]
    }
  ],
  "protocol": {"type": "data_increment_data", "version": "1.3"},
  "schema": {
    "fields": [
      {"name": "ums_id_", "type": "long"},
      {"name": "ums_ts_", "type": "datetime"},
      {"name": "ums_op_", "type": "string"},
      {"name": "ums_uid_", "type": "string"},
      {"name": "event_time", "type": "datetime"},
      {"name": "datasource", "type": "string"},
      {"name": "heartbeat_state", "type": "string"},
      {"name": "heartbeat_node", "type": "string"}
    ]
  }
}
Monitoring
DBus provides a real‑time monitoring dashboard showing total processed rows, error counts (useful for debugging rule mismatches), and latency, plus an __unknown_table__ that captures logs not matched by any rule.
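The role of `__unknown_table__` can be illustrated with a small routing sketch: each log line goes to the table of a rule that matches it, and anything left over is counted under the catch-all table. The function `route`, the table name `heartbeat_table`, and the sample lines are hypothetical, and first-match routing is an assumption made for brevity; the article notes that DBus allows one source to map to several target tables.

```python
from collections import Counter
from typing import Callable, Dict, List

UNKNOWN_TABLE = "__unknown_table__"


def route(logs: List[str], rules: Dict[str, Callable[[str], bool]]) -> Counter:
    """Count log lines per target table; unmatched lines go to the catch-all."""
    counts: Counter = Counter()
    for line in logs:
        # Assumption: the first rule that matches wins.
        table = next((name for name, matches in rules.items() if matches(line)),
                     UNKNOWN_TABLE)
        counts[table] += 1
    return counts


# Hypothetical rule set and invented log lines.
rules = {"heartbeat_table": lambda line: "insert heartbeat" in line}
logs = ["insert heartbeat ok", "cache refreshed", "insert heartbeat ok"]

counts = route(logs, rules)
print(dict(counts))
```

A growing count under the catch-all table is a hint that a rule is missing or that the log format has drifted away from what the rules expect.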
Conclusion
DBus integrates existing log collectors, offers a rich set of visual rule operators, and enables users to transform raw application logs into schema‑driven, Kafka‑ready data without writing code. The platform supports extensible custom operators, one‑to‑many source‑to‑target mappings, and real‑time monitoring, making log‑driven data extraction simple, flexible, and accessible to both developers and analysts.
