Technical Overview of Real-time Data Platform (RTDP) Architecture and Component Selection

This article presents a technical overview of the Real-time Data Platform (RTDP): its overall architecture; the selected components (DBus, Kafka, Wormhole, Moonbox, and Davinci) with their design ideas and key features; and four deployment patterns (synchronous, stream-processing, rotation, and intelligent modes).

Big Data Technology & Architecture

May 20, 2020

1. Technical Selection Overview

The design part of this series introduced the overall RTDP architecture. This technical part recommends a component selection for the whole stack, briefly introduces each component, and focuses on the design ideas behind the four self-developed platforms: DBus (unified data ingestion), Wormhole (unified stream processing), Moonbox (unified compute service), and Davinci (unified data visualization). It also discusses cross-cutting concerns such as functional integration, data management, and data security.

1.1 Overall Technical Selection

The overall architecture diagram (Figure 1) shows the RTDP stack. The key layers are:

Data sources and clients covering most common data‑source types.

DBus as the unified data‑bus platform, extracting incremental or full data from sources and publishing unified UMS messages to Kafka.

Kafka as a distributed, high‑availability, high‑throughput messaging system.

Wormhole as the unified stream‑processing platform, consuming Kafka messages, supporting SQL‑based stream logic, and delivering data to various sinks with exactly‑once semantics.

Flexible storage selection in the compute‑storage layer, allowing multiple storage systems to coexist.

Moonbox as the unified compute service platform, providing cross‑heterogeneous data virtualization, unified SQL interface, and unified metadata/permission management.

Davinci as the unified data‑visualization platform, offering configurable visualizations, collaborative editing, and integration with other data applications.

Cross-cutting aspects such as data management, data security, development and operations, and driver engines are addressed through the APIs that DBus, Wormhole, Moonbox, and Davinci expose.

1.2 Technical Component Introduction

1.2.1 DBus (Data Bus Platform)

DBus connects various data sources, extracts incremental data (using log‑based extraction for databases and agent‑based extraction for logs), and publishes unified UMS JSON messages to Kafka. It also supports full‑load extraction, merges it with incremental streams, and provides metadata such as monotonic IDs, timestamps, and operation types.
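To make the idea of a unified, self-describing message concrete, here is a minimal Python sketch of wrapping one change event in a JSON envelope that carries a namespace, a field list, and the three metadata values mentioned above. The envelope layout, the field names `ums_id_`, `ums_ts_`, `ums_op_`, the operation codes, and the function `to_unified_message` are assumptions for illustration, not the definitive UMS specification.

```python
import itertools
import json
import time

# Assumed names for the three metadata columns; the text above only states
# that an ID, a timestamp, and an operation type accompany each record.
ID_FIELD, TS_FIELD, OP_FIELD = "ums_id_", "ums_ts_", "ums_op_"

# Stand-in for a monotonic ID generator (a real one would be derived from
# the source log position rather than an in-process counter).
_next_id = itertools.count(1)


def to_unified_message(namespace, columns, row, op, ts_ms=None):
    """Wrap one change event in a self-describing JSON envelope."""
    if op not in ("i", "u", "d"):  # assumed codes: insert / update / delete
        raise ValueError(f"unknown operation type: {op!r}")
    ts_ms = int(time.time() * 1000) if ts_ms is None else ts_ms
    fields = [ID_FIELD, TS_FIELD, OP_FIELD] + list(columns)
    values = [next(_next_id), ts_ms, op] + [row[c] for c in columns]
    return json.dumps({
        "schema": {"namespace": namespace, "fields": fields},
        "payload": [values],
    })
```

Because the schema travels with the payload, a consumer in this sketch can interpret a record without calling a separate schema service.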

Design Ideas

External view: unified data extraction, UMS message format, and full‑load merging.

Internal view: DBus is built on Storm for low latency. It standardizes messages, attaches a unique ID, timestamp, and operation flag to each record, and handles schema versioning and heartbeat monitoring.

Key Features

Configurable full‑load and incremental extraction.

Configurable online log formatting.

Visual monitoring and alerting.

Multi‑tenant security controls.

Table sharding aggregation.

Technical Architecture

1.2.2 Kafka (Distributed Messaging System)

Kafka is the de facto standard messaging layer for large-scale stream processing. Rather than describing Kafka itself, this section focuses on metadata management and schema evolution on top of Kafka.

Confluent's Schema Registry is one common approach: it manages schemas centrally so that consumers can interpret message structures, and it applies compatibility checks as schemas evolve. RTDP instead carries schema information inside the UMS messages themselves.
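As a rough illustration of what a compatibility check decides, the Python sketch below applies one deliberately simplified rule: a new schema is compatible if every existing field keeps its type and every added field is nullable. This is not the algorithm used by Schema Registry or by any RTDP component; the function name `is_compatible` and the `(type, nullable)` representation are assumptions.

```python
def is_compatible(old_fields, new_fields):
    """Return True if new_fields is a compatible evolution of old_fields.

    Both arguments map a field name to a (type_name, nullable) tuple.
    Simplified rule: existing fields must survive with the same type,
    and any added field must be nullable so old records remain valid.
    """
    for name, (old_type, _) in old_fields.items():
        if name not in new_fields:
            return False  # a field was dropped
        if new_fields[name][0] != old_type:
            return False  # a field changed type
    for name, (_, nullable) in new_fields.items():
        if name not in old_fields and not nullable:
            return False  # a required field was added
    return True
```

Under this rule, adding an optional column passes, while dropping a column or changing its type is flagged for manual handling.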

Metadata Management

DBus records source metadata and publishes it within UMS messages, eliminating the need for a separate schema service.

Wormhole consumes UMS messages, which already contain schema namespace information.

Schema Evolution

UMS includes a 7-level namespace that uniquely identifies a table version, e.g., oracle.oracle01.db1.table1.v2.dbpar01.tablepar01. Wormhole handles compatible schema changes automatically; incompatible changes require manual intervention.
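The namespace example above can be split into its seven levels with a few lines of Python. The level names used here (`data_system`, `instance`, `database`, `table`, `version`, `db_partition`, `table_partition`) are inferred from the example string and should be read as assumptions, as should the helper names.

```python
from typing import NamedTuple


class Namespace(NamedTuple):
    # Level names are inferred from the example, not taken from a spec.
    data_system: str
    instance: str
    database: str
    table: str
    version: str
    db_partition: str
    table_partition: str


def parse_namespace(text):
    """Split a dot-separated namespace into exactly seven non-empty levels."""
    parts = text.split(".")
    if len(parts) != 7 or not all(parts):
        raise ValueError(f"expected 7 non-empty levels, got {text!r}")
    return Namespace(*parts)


def same_table(a, b):
    """True if two namespaces name the same table, ignoring version and partitions."""
    return a[:4] == b[:4]
```

A consumer could use a comparison like `same_table` to notice that two messages belong to the same table but carry different versions, which is the point at which a schema change needs to be evaluated.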

1.2.3 Wormhole (Stream Processing Platform)

Design Ideas

Consumes UMS and custom JSON messages from Kafka.

Supports multiple sinks with exactly‑once semantics.

SQL‑based stream logic configuration.

Flow abstraction decouples logical flows from Spark/Flink execution engines.

Supports both Kappa (backfill) and Lambda architectures.

Key Features

Visual, configuration‑driven, SQL‑based development.

Dynamic management, monitoring, and diagnostics.

Unified UMS and semi‑structured JSON handling.

Support for insert/update/delete events.

Parallel processing of multiple logical flows on a single physical stream.

Lookup and push‑down capabilities.

Event‑time processing, UDF registration, multi‑target idempotent writes, data quality management, and both Lambda and Kappa architectures.

Private‑cloud deployment, multi‑tenant resource management, and security controls.
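One common way to obtain idempotent writes is to key each row by its primary key and accept an event only if its ID is newer than the one already stored, so that replays and out-of-order deliveries leave the result unchanged. The Python sketch below shows that technique with an in-memory dictionary standing in for a sink. It is an illustration under those assumptions, not Wormhole's actual implementation, and the class `IdempotentSink` is hypothetical.

```python
class IdempotentSink:
    """In-memory stand-in for a sink that tolerates replays and reordering."""

    def __init__(self):
        # primary key -> (event_id, op, row)
        self.rows = {}

    def apply(self, key, event_id, op, row):
        """Apply one event; return True if it changed the stored state."""
        current = self.rows.get(key)
        if current is not None and current[0] >= event_id:
            return False  # duplicate or older than what is stored
        # Deletes are kept as tombstones so that a late, older insert or
        # update cannot resurrect a row that was already deleted.
        self.rows[key] = (event_id, op, row)
        return True

    def visible(self):
        """Rows a reader should see: everything except tombstones."""
        return {k: row for k, (_, op, row) in self.rows.items() if op != "d"}
```

Applying the same batch twice, or in a different order, yields the same visible rows, which is what lets an at-least-once delivery pipeline produce effectively-once results at the sink.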

Technical Architecture

1.2.4 Common Data Compute & Storage Options

RTDP adopts an open‑integration approach, allowing multiple storage systems (relational databases, columnar stores, HBase, Cassandra, ClickHouse, HDFS/Parquet/Hive, MongoDB, Elasticsearch, Druid/Kylin, etc.) to be plugged in as needed.

1.2.5 Moonbox (Compute Service Platform)

Design Ideas

Provides unified access to heterogeneous data sources for ad‑hoc mixed queries.

Supports RESTful, JDBC, and ODBC clients.

Unified metadata, SQL interface, and permission management.

Result write‑back modes: Merge and Replace; execution modes: Batch and Ad‑hoc.

Data virtualization with multi‑tenant support.
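The difference between the two write-back modes can be sketched as follows, assuming that Replace overwrites the target with the query result and that Merge upserts result rows into the target by key. That reading of the two mode names is an assumption, and `write_back` is a hypothetical helper operating on lists of dictionaries rather than Moonbox's real interface.

```python
def write_back(target, result, key, mode):
    """Return the new contents of a target table after writing a query result.

    target, result: lists of row dictionaries; key: name of the key column.
    """
    if mode == "replace":
        # Discard existing rows; the result becomes the whole table.
        return [dict(row) for row in result]
    if mode == "merge":
        # Keep existing rows, overwrite those whose key reappears,
        # and append rows with new keys.
        merged = {row[key]: dict(row) for row in target}
        for row in result:
            merged[row[key]] = dict(row)
        return list(merged.values())
    raise ValueError(f"unknown write-back mode: {mode!r}")
```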

Key Features

Cross‑system seamless mixed computation.

Unified SQL query and write capabilities.

RESTful, JDBC, ODBC interfaces.

Batch and ad‑hoc modes, CLI and Zeppelin support.

Fine‑grained permission control (table, column, read/write, UDF).

YARN resource scheduling, metadata services, scheduled tasks, and security policies.
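A column-level permission check of the kind listed above can be reduced to a set lookup. In this Python sketch the grant structure (a dictionary keyed by user and table, holding the permitted columns per action) and the function `denied_columns` are hypothetical and only illustrate the idea.

```python
def denied_columns(grants, user, table, columns, action):
    """Return the requested columns that the user may not access.

    grants maps (user, table) to {"read": set_of_columns, "write": set_of_columns}.
    An empty result means the request is fully authorized.
    """
    allowed = grants.get((user, table), {}).get(action, set())
    return sorted(c for c in columns if c not in allowed)
```

A query layer could call such a check for every table a statement touches and reject the statement when the returned list is not empty.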

Technical Architecture

1.2.6 Davinci (Visualization Platform)

Design Ideas

Provides various data‑visualization capabilities.

Supports JDBC data sources and CSV uploads.

Organizational hierarchy (Org → Team → Project) with collaborative editing.

SQL‑based data processing, drag‑and‑drop visual editing, and rich chart components.

Embedding, sharing, and permission‑controlled access.

Row/column level security and LDAP integration.

1.3 Aspect Discussions

1.3.1 Data Management

Metadata management via DBus (source) and Moonbox (target).

Data quality through Wormhole’s HDFS log backfill and Moonbox’s mixed‑query capabilities.

Lineage tracking by aggregating SQL definitions from Wormhole and Moonbox.

1.3.2 Data Security

Each component (DBus, Wormhole, Moonbox, Davinci) provides security controls such as authentication, authorization, and audit logging, ensuring end‑to‑end data protection.

1.3.3 Development & Operations

Visual UI for DBus and Wormhole simplifies operations.

RESTful health‑check, backfill, flow drift, and monitoring APIs enable automated DevOps pipelines.

2. Pattern Scenarios

Based on the component capabilities, RTDP supports several deployment patterns.

2.1 Synchronous Mode

Data is extracted by DBus, published to Kafka, and directly written to sinks by Wormhole without any stream processing logic. Benefits include simple implementation, low operational overhead, and real‑time data sharing across departments.

2.2 Stream‑Processing Mode

On top of the synchronous flow, Wormhole applies SQL‑based stream logic, enabling low‑latency incremental computation, cross‑system lookups, and real‑time enrichment.
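A lookup-based enrichment step, roughly what a SQL join against a dimension table would express, might look like the Python sketch below. The function `enrich`, the in-memory `lookup` dictionary, and the inner-join behavior (events without a match are dropped) are assumptions chosen for brevity.

```python
def enrich(events, lookup, key, fields):
    """Join each event with a dimension row and copy selected fields onto it.

    events: iterable of dictionaries; lookup: dimension rows indexed by key value;
    key: event column used for the lookup; fields: dimension columns to add.
    """
    for event in events:
        dim = lookup.get(event[key])
        if dim is None:
            continue  # inner-join semantics: skip events with no match
        yield {**event, **{f: dim[f] for f in fields}}
```

In a real deployment the lookup would go to an external store, so results depend on the state of that store at processing time.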

2.3 Rotation Mode

Combines stream processing with periodic batch jobs via Moonbox. Data flows Kafka → Wormhole → Sink → Moonbox → Kafka, allowing complex multi‑step calculations with low latency.
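The loop Kafka → Wormhole → Sink → Moonbox → Kafka can be mimicked with plain Python containers: a list plays the topic, a dictionary plays the sink, one function plays the stream stage, and another plays the periodic batch stage. All names and the event layout below are hypothetical.

```python
from collections import defaultdict


def stream_step(topic, sink):
    """Stream stage: drain the topic and upsert each event into the sink by key."""
    while topic:
        event = topic.pop(0)
        sink[event["key"]] = event


def batch_step(sink, topic):
    """Batch stage: aggregate raw orders in the sink and publish totals to the topic."""
    totals = defaultdict(int)
    for row in sink.values():
        if row["kind"] == "order":  # ignore previously published totals
            totals[row["region"]] += row["amount"]
    for region, amount in sorted(totals.items()):
        topic.append({
            "key": f"total:{region}",
            "kind": "region_total",
            "region": region,
            "amount": amount,
        })


def rotate_once(topic, sink):
    """One full rotation: stream in, compute in batch, stream the results back in."""
    stream_step(topic, sink)
    batch_step(sink, topic)
    stream_step(topic, sink)
```

Each rotation leaves derived rows in the sink next to the raw ones, so a later batch job can build on the results of an earlier one.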

2.4 Intelligent Mode

Leverages automation tools to convert offline batch logic into stream logic, perform smart flow drift, and auto‑tune Moonbox pre‑computations, aiming for zero‑maintenance operations.

Author: Lu Shanwei

