Technical Overview of Real-time Data Platform (RTDP) Architecture and Component Selection
This article gives a technical overview of the Real-time Data Platform (RTDP): its overall architecture, the components selected (DBus, Kafka, Wormhole, Moonbox, and Davinci), their design ideas and features, and four deployment patterns (synchronous, stream-processing, rotation, and intelligent modes).
1. Technical Selection Overview
The design part of this series introduced the overall RTDP architecture. This technical part recommends a component selection, briefly introduces each component, and focuses on the design ideas of the four self-developed platforms: DBus (unified data ingestion), Wormhole (unified stream processing), Moonbox (unified compute service), and Davinci (unified data visualization). It also discusses cross-cutting aspects such as functional integration, data management, and data security.
1.1 Overall Technical Selection
The overall architecture diagram (Figure 1) shows the RTDP stack. The key layers are:
Data sources and clients covering most common data‑source types.
DBus as the unified data‑bus platform, extracting incremental or full data from sources and publishing unified UMS messages to Kafka.
Kafka as a distributed, high‑availability, high‑throughput messaging system.
Wormhole as the unified stream‑processing platform, consuming Kafka messages, supporting SQL‑based stream logic, and delivering data to various sinks with exactly‑once semantics.
Flexible storage selection in the compute‑storage layer, allowing multiple storage systems to coexist.
Moonbox as the unified compute service platform, providing cross‑heterogeneous data virtualization, unified SQL interface, and unified metadata/permission management.
Davinci as the unified data‑visualization platform, offering configurable visualizations, collaborative editing, and integration with other data applications.
Additional end‑to‑end aspects such as data management, security, DevOps, and driver engines are integrated via DBus, Wormhole, Moonbox, and Davinci APIs.
1.2 Technical Component Introduction
1.2.1 DBus (Data Bus Platform)
DBus connects various data sources, extracts incremental data (using log‑based extraction for databases and agent‑based extraction for logs), and publishes unified UMS JSON messages to Kafka. It also supports full‑load extraction, merges it with incremental streams, and provides metadata such as monotonic IDs, timestamps, and operation types.
Design Ideas
External view: unified data extraction, UMS message format, and full‑load merging.
Internal view: built on Storm for low latency. DBus standardizes messages, generates unique IDs, timestamps, and operation flags, and handles schema versioning and heartbeat monitoring.
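To make the idea of a self-describing message concrete, the following is a minimal sketch of a UMS-like envelope. The layout (a schema block plus a payload of tuples) and the system field names `ums_id_`, `ums_ts_`, and `ums_op_` are assumptions for illustration only; the actual UMS format is defined by DBus and may differ. The helper names `build_ums_message` and `rows_as_dicts` are hypothetical.

```python
import itertools
import json
import time

# Stand-in for DBus's monotonic ID generator (assumption, illustration only).
_id_counter = itertools.count(1)

def build_ums_message(namespace, fields, rows, op):
    """Wrap source rows in a self-describing, UMS-like envelope.

    namespace: dot-separated table identifier
    fields:    list of (name, type) pairs describing the source columns
    rows:      list of tuples, one per changed row
    op:        "i" (insert), "u" (update) or "d" (delete)
    """
    if op not in ("i", "u", "d"):
        raise ValueError(f"unknown operation type: {op!r}")
    # Hypothetical system fields carrying ID, timestamp and operation type.
    system_fields = [("ums_id_", "long"), ("ums_ts_", "datetime"), ("ums_op_", "string")]
    schema_fields = [{"name": n, "type": t} for n, t in system_fields + list(fields)]
    payload = []
    for row in rows:
        ts = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
        payload.append({"tuple": [next(_id_counter), ts, op, *row]})
    return {
        "schema": {"namespace": namespace, "fields": schema_fields},
        "payload": payload,
    }

def rows_as_dicts(message):
    """Decode the payload using only the schema carried inside the message."""
    names = [f["name"] for f in message["schema"]["fields"]]
    return [dict(zip(names, item["tuple"])) for item in message["payload"]]

msg = build_ums_message(
    "oracle.oracle01.db1.table1.v2.dbpar01.tablepar01",
    [("id", "int"), ("name", "string")],
    [(1, "alice"), (2, "bob")],
    "i",
)
print(json.dumps(msg, indent=2))
```

Because the schema travels with the data, a consumer can decode each row with `rows_as_dicts(msg)` without contacting any external service.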
Key Features
Configurable full‑load and incremental extraction.
Configurable online log formatting.
Visual monitoring and alerting.
Multi‑tenant security controls.
Table sharding aggregation.
Technical Architecture
1.2.2 Kafka (Distributed Messaging System)
Kafka is the de facto standard message bus for large-scale stream processing. This article focuses on two concerns layered on top of it: metadata management and schema evolution.
A common approach is Confluent's Schema Registry, which provides centralized schema management so that consumers can interpret message structures, and which applies compatibility checks as schemas evolve. RTDP instead carries this information in the UMS messages themselves.
Metadata Management
DBus records source metadata and publishes it within UMS messages, eliminating the need for a separate schema service.
Wormhole consumes UMS messages, which already contain schema namespace information.
Schema Evolution
UMS includes a seven-level namespace that uniquely identifies a table version, e.g. oracle.oracle01.db1.table1.v2.dbpar01.tablepar01. Wormhole handles compatible schema changes automatically; incompatible changes require manual intervention.
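A short sketch of how a consumer might split such a namespace into its seven parts. The part names (`data_system`, `instance`, and so on) are labels inferred from the example above rather than names confirmed by this article, and the `v1` namespace is made up for the comparison.

```python
from typing import NamedTuple

class Namespace(NamedTuple):
    # Illustrative labels for the seven dot-separated parts (assumed names).
    data_system: str
    instance: str
    database: str
    table: str
    version: str
    db_partition: str
    table_partition: str

def parse_namespace(text: str) -> Namespace:
    """Split a seven-level namespace string, rejecting malformed input."""
    parts = text.split(".")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dot-separated parts, got {len(parts)}: {text!r}")
    return Namespace(*parts)

def same_table(a: Namespace, b: Namespace) -> bool:
    """True when both namespaces name the same table, ignoring version and partitions."""
    return a[:4] == b[:4]

old = parse_namespace("oracle.oracle01.db1.table1.v1.dbpar01.tablepar01")  # hypothetical earlier version
new = parse_namespace("oracle.oracle01.db1.table1.v2.dbpar01.tablepar01")
print(same_table(old, new), old.version, "->", new.version)
```

A version change on the same table is the signal that the schema has evolved; deciding whether that change is compatible is a separate step not shown here.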
1.2.3 Wormhole (Stream Processing Platform)
Design Ideas
Consumes UMS and custom JSON messages from Kafka.
Supports multiple sinks with exactly‑once semantics.
SQL‑based stream logic configuration.
Flow abstraction decouples logical flows from Spark/Flink execution engines.
Supports both Kappa (backfill) and Lambda architectures.
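The flow abstraction can be pictured as several logical flows sharing one physical stream. The sketch below is a simplified model under stated assumptions: the `Flow` class and `run_micro_batch` function are hypothetical names, table and sink names are invented, and a Python function stands in for the SQL logic that Wormhole would run on Spark or Flink.

```python
from collections import defaultdict

class Flow:
    """A logical flow: source table -> optional transform -> named sink."""

    def __init__(self, source_table, sink_name, transform=None):
        self.source_table = source_table
        self.sink_name = sink_name
        # The transform stands in for SQL-based stream logic; default is pass-through.
        self.transform = transform or (lambda rows: rows)

def run_micro_batch(messages, flows):
    """Dispatch one batch from a shared physical stream to every matching flow."""
    by_table = defaultdict(list)
    for table, row in messages:
        by_table[table].append(row)
    output = defaultdict(list)
    for flow in flows:
        rows = by_table.get(flow.source_table, [])
        if rows:
            output[flow.sink_name].extend(flow.transform(rows))
    return dict(output)

flows = [
    Flow("db1.orders", "es_orders"),
    Flow("db1.orders", "hbase_big_orders",
         lambda rows: [r for r in rows if r["amount"] > 100]),
    Flow("db1.users", "mysql_users"),
]
batch = [
    ("db1.orders", {"id": 1, "amount": 50}),
    ("db1.orders", {"id": 2, "amount": 150}),
    ("db1.users", {"id": 7}),
]
result = run_micro_batch(batch, flows)
print(result)
```

Flows are plain data here, so adding or removing one does not require restarting the physical stream, which is the decoupling the design aims for.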
Key Features
Visual, configuration‑driven, SQL‑based development.
Dynamic management, monitoring, and diagnostics.
Unified UMS and semi‑structured JSON handling.
Support for insert/update/delete events.
Parallel processing of multiple logical flows on a single physical stream.
Lookup and push‑down capabilities.
Event‑time processing, UDF registration, multi‑target idempotent writes, data quality management, and both Lambda and Kappa architectures.
Private‑cloud deployment, multi‑tenant resource management, and security controls.
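One common way to make sink writes idempotent is to compare a monotonically increasing ID on each event with the one already stored for the same key, and to keep only the newer record. The sketch below illustrates that general technique; it is not confirmed by this article as Wormhole's actual implementation. The field names `ums_id_` and `ums_op_` are assumed, and a dict stands in for the sink.

```python
def apply_event(store, key, event):
    """Apply one change event only if it is newer than the stored record.

    Returns True when the store changed, False for stale or duplicate events.
    """
    current = store.get(key)
    if current is not None and current["ums_id_"] >= event["ums_id_"]:
        return False
    # Deletes are kept as tombstones so a late, older update cannot resurrect the row.
    store[key] = event
    return True

def visible_rows(store):
    """Rows a reader should see: everything except delete tombstones."""
    return {k: v for k, v in store.items() if v["ums_op_"] != "d"}

events = [
    {"ums_id_": 1, "ums_op_": "i", "id": 1, "name": "alice"},
    {"ums_id_": 2, "ums_op_": "u", "id": 1, "name": "alicia"},
    {"ums_id_": 3, "ums_op_": "i", "id": 2, "name": "bob"},
    {"ums_id_": 4, "ums_op_": "d", "id": 2, "name": "bob"},
]

store = {}
# Replay everything twice, the second time out of order, to mimic redelivery.
for event in events + events[::-1]:
    apply_event(store, event["id"], event)
print(visible_rows(store))
```

Replaying or reordering the same events leaves the sink in the same final state, which is what allows at-least-once delivery from Kafka to produce effectively-once results.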
Technical Architecture
1.2.4 Common Data Compute & Storage Options
RTDP adopts an open‑integration approach, allowing multiple storage systems (relational databases, columnar stores, HBase, Cassandra, ClickHouse, HDFS/Parquet/Hive, MongoDB, Elasticsearch, Druid/Kylin, etc.) to be plugged in as needed.
1.2.5 Moonbox (Compute Service Platform)
Design Ideas
Provides unified access to heterogeneous data sources for ad‑hoc mixed queries.
Supports RESTful, JDBC, and ODBC clients.
Unified metadata, SQL interface, and permission management.
Result write‑back modes: Merge and Replace; execution modes: Batch and Ad‑hoc.
Data virtualization with multi‑tenant support.
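The two write-back modes can be illustrated with a small model. The semantics assumed here are that Replace overwrites the target with the query result, while Merge upserts result rows into the target by key; the article does not define the modes in detail, so treat this as one plausible reading. `write_back` is a hypothetical helper, and a dict keyed by primary key stands in for the target table.

```python
def write_back(target, result_rows, key, mode):
    """Return the new target contents after writing a query result back.

    target:      dict mapping key value -> row (the existing table)
    result_rows: list of row dicts produced by a query
    key:         name of the key column
    mode:        "replace" or "merge" (assumed semantics, see lead-in)
    """
    if mode == "replace":
        # Discard existing rows; the result becomes the whole table.
        return {row[key]: row for row in result_rows}
    if mode == "merge":
        # Keep existing rows; insert new keys and overwrite matching ones.
        merged = dict(target)
        for row in result_rows:
            merged[row[key]] = row
        return merged
    raise ValueError(f"unknown write-back mode: {mode!r}")

target = {1: {"id": 1, "total": 10}, 2: {"id": 2, "total": 20}}
result_rows = [{"id": 2, "total": 25}, {"id": 3, "total": 30}]

replaced = write_back(target, result_rows, "id", "replace")
merged = write_back(target, result_rows, "id", "merge")
print(sorted(replaced), sorted(merged))
```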
Key Features
Cross‑system seamless mixed computation.
Unified SQL query and write capabilities.
RESTful, JDBC, ODBC interfaces.
Batch and ad‑hoc modes, CLI and Zeppelin support.
Fine‑grained permission control (table, column, read/write, UDF).
YARN resource scheduling, metadata services, scheduled tasks, and security policies.
Technical Architecture
1.2.6 Davinci (Visualization Platform)
Design Ideas
Provides various data‑visualization capabilities.
Supports JDBC data sources and CSV uploads.
Organizational hierarchy (Org → Team → Project) with collaborative editing.
SQL‑based data processing, drag‑and‑drop visual editing, and rich chart components.
Embedding, sharing, and permission‑controlled access.
Row/column level security and LDAP integration.
1.3 Aspect Discussions
1.3.1 Data Management
Metadata management via DBus (source) and Moonbox (target).
Data quality through Wormhole’s HDFS log backfill and Moonbox’s mixed‑query capabilities.
Lineage tracking by aggregating SQL definitions from Wormhole and Moonbox.
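Lineage aggregation can be sketched as extracting source and target tables from each SQL definition and linking them into a graph. The example below uses a deliberately naive regular expression that handles only simple `INSERT INTO ... SELECT ... FROM ... JOIN ...` statements; it ignores CTEs, quoted identifiers, and comments, so a real implementation would need a proper SQL parser. The job names, table names, and SQL text are invented for illustration.

```python
import re

_SOURCE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
_TARGET_RE = re.compile(r"\binsert\s+into\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def lineage_edges(jobs):
    """Return (source_table, target_table, job_name) edges from SQL definitions."""
    edges = []
    for job_name, sql in jobs.items():
        target = _TARGET_RE.search(sql)
        if target is None:
            continue  # a pure query writes nothing, so it adds no lineage edge
        # dict.fromkeys removes duplicates while keeping first-seen order.
        for source in dict.fromkeys(_SOURCE_RE.findall(sql)):
            edges.append((source, target.group(1), job_name))
    return edges

def upstream(edges, table):
    """All tables that directly or indirectly feed `table`."""
    seen = set()
    stack = [table]
    while stack:
        current = stack.pop()
        for src, dst, _ in edges:
            if dst == current and src not in seen:
                seen.add(src)
                stack.append(src)
    return seen

jobs = {
    "wormhole:flow_orders": (
        "INSERT INTO es.orders_enriched "
        "SELECT o.id, u.name FROM kafka.orders o JOIN mysql.users u ON o.user_id = u.id"
    ),
    "moonbox:daily_report": "insert into mysql.report select count(*) from es.orders_enriched",
}
edges = lineage_edges(jobs)
print(upstream(edges, "mysql.report"))
```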
1.3.2 Data Security
Each component (DBus, Wormhole, Moonbox, Davinci) provides security controls such as authentication, authorization, and audit logging, ensuring end‑to‑end data protection.
1.3.3 Development & Operations
Visual UI for DBus and Wormhole simplifies operations.
RESTful health‑check, backfill, flow drift, and monitoring APIs enable automated DevOps pipelines.
2. Pattern Scenarios
Based on the component capabilities, RTDP supports several deployment patterns.
2.1 Synchronous Mode
Data is extracted by DBus, published to Kafka, and directly written to sinks by Wormhole without any stream processing logic. Benefits include simple implementation, low operational overhead, and real‑time data sharing across departments.
2.2 Stream‑Processing Mode
On top of the synchronous flow, Wormhole applies SQL‑based stream logic, enabling low‑latency incremental computation, cross‑system lookups, and real‑time enrichment.
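The difference from synchronous mode is that each micro-batch passes through SQL before reaching the sink. The sketch below imitates that with Python's built-in `sqlite3` module standing in for the Spark or Flink SQL engine; the table names, columns, and the SQL itself are invented for illustration, and `process_batch` is a hypothetical helper.

```python
import sqlite3

def process_batch(conn, batch, sql):
    """Load one micro-batch into a scratch table and run the flow's SQL over it."""
    conn.execute("DROP TABLE IF EXISTS batch")
    conn.execute("CREATE TABLE batch (order_id INTEGER, user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO batch VALUES (?, ?, ?)", batch)
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
# A lookup table living in another system, modelled here as a local table.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# Stream logic: enrich each order with the user name and keep large orders only.
FLOW_SQL = """
SELECT b.order_id, u.name, b.amount
FROM batch AS b JOIN users AS u ON u.id = b.user_id
WHERE b.amount >= 100
ORDER BY b.order_id
"""

result = process_batch(conn, [(10, 1, 250.0), (11, 2, 40.0), (12, 1, 120.0)], FLOW_SQL)
print(result)
```

Synchronous mode corresponds to skipping the SQL step and writing the batch to the sink unchanged.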
2.3 Rotation Mode
Combines stream processing with periodic batch jobs via Moonbox. Data flows Kafka → Wormhole → Sink → Moonbox → Kafka, allowing complex multi‑step calculations with low latency.
2.4 Intelligent Mode
Uses automation to convert offline batch logic into stream logic, to move flows between streams automatically (smart flow drift), and to auto-tune Moonbox pre-computations, with the goal of zero-maintenance operations.
Author: Lu Shanwei
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
