
Technical Architecture and Component Selection of a Real‑time Data Platform (RTDP)

This article details the technical architecture of a Real‑time Data Platform (RTDP), covering component selection such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses design considerations, data management, security, operational practices, and various deployment modes for big‑data applications.

Big Data Technology & Architecture

Sep 12, 2020


1. Technical Component Selection Overview

The article introduces the overall RTDP architecture (Figure 1) and then recommends concrete technology components for each layer, focusing on four unified platforms: a unified data collection platform (DBus), a unified stream processing platform (Wormhole), a unified compute service platform (Moonbox), and a unified visualization platform (Davinci). It also discusses cross‑cutting topics such as data management, data security, and operations.

Figure 1: Overall RTDP Architecture

2. Component Details

(1) DBus – Data Bus Platform

DBus connects heterogeneous data sources, extracts full snapshots and incremental changes, converts each record into the unified UMS (Unified Message Schema) JSON format, and publishes it to Kafka. It supports configurable full and incremental pulls, online formatting of log data, visual monitoring, multi‑tenant security controls, and table‑level data merging.

Figure 2: DBus Architecture
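The following minimal Python sketch builds and parses a UMS-style message to make the format concrete. Several details are our reading of the Wormhole UMS documentation and should be treated as assumptions:

- the layout (protocol, schema with namespace and fields, payload of tuples);
- the seven-part namespace;
- the ums_id_/ums_ts_/ums_op_ system fields.

`build_ums`, `parse_ums` and `parse_namespace` are hypothetical helpers, not DBus APIs.

```python
import json

# Assumed UMS layout: protocol / schema (namespace + fields) / payload (tuples of strings).
NAMESPACE_PARTS = ("data_system", "instance", "database", "table",
                   "table_version", "db_partition", "table_partition")


def parse_namespace(namespace):
    """Split a seven-part UMS-style namespace such as 'mysql.mysql01.shop.orders.1.0.0'."""
    parts = namespace.split(".")
    if len(parts) != len(NAMESPACE_PARTS):
        raise ValueError(f"expected {len(NAMESPACE_PARTS)} parts, got {len(parts)}")
    return dict(zip(NAMESPACE_PARTS, parts))


def build_ums(namespace, fields, rows, protocol_type="data_increment_data"):
    """Wrap rows into a UMS-style JSON string; fields are (name, type, nullable) triples."""
    parse_namespace(namespace)  # fail early on a malformed namespace
    return json.dumps({
        "protocol": {"type": protocol_type},
        "schema": {
            "namespace": namespace,
            "fields": [{"name": n, "type": t, "nullable": nl} for n, t, nl in fields],
        },
        # Values travel as strings (None for NULL); the schema tells consumers how to cast them.
        "payload": [{"tuple": [None if v is None else str(v) for v in row]} for row in rows],
    })


def parse_ums(message):
    """Return (namespace, records) where each record maps field name -> string value."""
    doc = json.loads(message)
    names = [f["name"] for f in doc["schema"]["fields"]]
    return doc["schema"]["namespace"], [dict(zip(names, p["tuple"])) for p in doc["payload"]]


msg = build_ums(
    "mysql.mysql01.shop.orders.1.0.0",
    [("ums_id_", "long", False), ("ums_ts_", "datetime", False), ("ums_op_", "string", False),
     ("order_id", "long", False), ("amount", "decimal", True)],
    [[1, "2020-09-12 10:00:00", "i", 42, "9.90"], [2, "2020-09-12 10:00:01", "i", 43, None]],
)
namespace, records = parse_ums(msg)
```

In this layout every message carries its own schema, so a consumer can interpret it without consulting another service. The cost is a larger message than a registry-based encoding would produce.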

(2) Kafka – Distributed Messaging System

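Kafka serves as the backbone for high‑throughput, fault‑tolerant message transport. Kafka itself does not manage message schemas, so the article discusses metadata management and schema evolution with reference to Confluent's Schema Registry. The registry stores versioned schemas under subject names, and producers embed a schema ID in each message. Consumers resolve that ID against the registry (usually caching the result) to interpret the message structure.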

Figure 3: Kafka Schema Registry Overview
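The sketch below pairs a toy in-memory registry with the framing Confluent's serializers put on the wire. That framing is a zero magic byte, then a 4-byte big-endian schema ID, then the encoded payload. The rest is illustrative and does not reproduce Schema Registry's REST API:

- the `InMemorySchemaRegistry` class;
- its simplified backward-compatibility rule (fields added in a new version must have defaults);
- the subject name.

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format: magic byte 0, 4-byte big-endian schema ID, payload


class InMemorySchemaRegistry:
    """Toy registry (hypothetical): versioned schemas per subject, globally unique IDs."""

    def __init__(self):
        self._schemas = {}   # schema ID -> {field name: has_default}
        self._subjects = {}  # subject -> list of schema IDs, oldest version first

    def register(self, subject, fields):
        """Register a new version of `subject` under a simplified backward-compatibility rule."""
        versions = self._subjects.setdefault(subject, [])
        if versions:
            latest = self._schemas[versions[-1]]
            # Readers on the new schema must still decode old data, so any field
            # added in this version needs a default value.
            no_default = sorted(f for f in fields if f not in latest and not fields[f])
            if no_default:
                raise ValueError(f"incompatible change, new fields without defaults: {no_default}")
        schema_id = len(self._schemas) + 1
        self._schemas[schema_id] = dict(fields)
        versions.append(schema_id)
        return schema_id

    def lookup(self, schema_id):
        """What a consumer does after reading the ID from a message (real clients cache this)."""
        return self._schemas[schema_id]


def frame(schema_id, payload):
    """Prefix an encoded payload with the magic byte and the schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message):
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, message[5:]


registry = InMemorySchemaRegistry()
v1 = registry.register("orders-value", {"order_id": False, "amount": False})
v2 = registry.register("orders-value", {"order_id": False, "amount": False, "currency": True})
schema_id, body = unframe(frame(v2, b"<encoded record>"))
fields = registry.lookup(schema_id)
```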

(3) Wormhole – Unified Stream Processing Platform

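Wormhole consumes UMS messages from Kafka, supports SQL‑based stream processing, and keeps sinks consistent through idempotent writes. Because a replayed or duplicated message does not change the final result, the sink ends up with effectively exactly‑once outcomes. Wormhole can push data to multiple sinks (e.g., HDFS, Kudu, ClickHouse) and is organized around the Flow abstraction. It supports both Kappa architectures (backfilling by replaying the stream) and Lambda architectures (streaming combined with batch). It runs on Spark Streaming for high‑throughput micro‑batch workloads or on Flink for lower‑latency processing.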

Figure 4: Wormhole Data Flow
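Idempotent sinks are what make replays safe. The following Python sketch assumes, as in UMS, that every change carries a monotonically increasing `ums_id_` and an operation flag `ums_op_` ('i', 'u' or 'd'). `IdempotentSink` is a hypothetical class, not Wormhole code. A change is applied only if it is newer than what the store already holds, so duplicate and out-of-order deliveries converge to the same state.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Hypothetical keyed sink that applies a change only if it is newer than the stored one."""
    rows: dict = field(default_factory=dict)  # primary key -> (ums_id_, row, or None after a delete)

    def apply(self, key, ums_id, ums_op, row=None):
        current = self.rows.get(key)
        if current is not None and current[0] >= ums_id:
            return False  # duplicate or older change: ignoring it keeps replays harmless
        # A delete is kept as a tombstone so a late, older insert cannot bring the row back.
        self.rows[key] = (ums_id, None if ums_op == "d" else row)
        return True

    def snapshot(self):
        """Visible rows, excluding tombstones."""
        return {key: row for key, (_, row) in self.rows.items() if row is not None}


events = [
    ("k1", 1, "i", {"amount": 10}),
    ("k2", 2, "i", {"amount": 5}),
    ("k1", 3, "u", {"amount": 30}),
    ("k2", 4, "d", None),
]
in_order = IdempotentSink()
for event in events:
    in_order.apply(*event)
for event in events:            # replay the whole batch, as after a restart
    in_order.apply(*event)

shuffled = IdempotentSink()
for event in reversed(events):  # out-of-order delivery
    shuffled.apply(*event)
```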

(4) Moonbox – Unified Compute Service Platform

Moonbox provides a virtualized SQL layer over heterogeneous data stores, exposing RESTful, JDBC, and ODBC interfaces. It parses SQL, pushes down supported operations to underlying engines, and merges results for cross‑system analytics. Features include multi‑tenant access control, YARN resource scheduling, and metadata services.

Figure 5: Moonbox Logical Modules
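Pushdown and result merging can be illustrated with a toy federated join. `Source`, `fetch` and `federated_join` are hypothetical names, and real engines such as Moonbox work on SQL plans rather than Python predicates. When a store can filter, the predicate runs there and fewer rows reach the engine. Otherwise the engine fetches everything and filters afterwards.

```python
class Source:
    """Toy data store; `supports_filter` says whether predicates can be pushed down to it."""

    def __init__(self, name, rows, supports_filter):
        self.name = name
        self.supports_filter = supports_filter
        self._rows = rows
        self.rows_transferred = 0  # rows shipped to the engine, to show what pushdown saves

    def scan(self, predicate=None):
        out = [r for r in self._rows if predicate is None or predicate(r)]
        self.rows_transferred += len(out)
        return out


def fetch(source, predicate):
    """Push the predicate down when the store supports it; otherwise filter in the engine."""
    if source.supports_filter:
        return source.scan(predicate)
    return [r for r in source.scan() if predicate(r)]


def federated_join(left, left_pred, right, right_pred, key):
    """Filter both sides (pushing down where possible), then hash-join and merge rows on `key`."""
    index = {}
    for rrow in fetch(right, right_pred):
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow}
            for lrow in fetch(left, left_pred)
            for rrow in index.get(lrow[key], [])]


users = Source("mysql", [{"uid": 1, "city": "BJ"}, {"uid": 2, "city": "SH"},
                         {"uid": 3, "city": "BJ"}], supports_filter=True)
orders = Source("logs", [{"uid": 1, "amt": 10}, {"uid": 2, "amt": 20},
                         {"uid": 3, "amt": 5}], supports_filter=False)
result = federated_join(users, lambda r: r["city"] == "BJ",
                        orders, lambda r: r["amt"] >= 10, "uid")
```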

(5) Davinci – Unified Visualization Platform

Davinci offers drag‑and‑drop visual analytics, supports JDBC and CSV data sources, and provides fine‑grained row/column permissions and LDAP integration. Users can create projects, teams, and dashboards, share them publicly or with specific users, and embed visualizations into other applications.

Figure 6: Davinci Dashboard
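As a rough illustration of row- and column-level permissions, the sketch below keeps only rows whose value in one column is in an allowed set, then strips hidden columns. The `Permission` structure is a hypothetical stand-in for whatever Davinci stores per role. It shows the effect of such rules, not Davinci's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """Hypothetical per-user grant: allowed values for one filter column, plus visible columns."""
    row_column: str
    allowed_values: frozenset
    visible_columns: frozenset


def apply_permission(rows, perm):
    """Drop rows outside the user's scope, then remove columns the user may not see."""
    return [
        {col: val for col, val in row.items() if col in perm.visible_columns}
        for row in rows
        if row.get(perm.row_column) in perm.allowed_values
    ]


rows = [
    {"region": "north", "sales": 100, "cost": 60},
    {"region": "south", "sales": 80, "cost": 50},
]
north_analyst = Permission("region", frozenset({"north"}), frozenset({"region", "sales"}))
visible = apply_permission(rows, north_analyst)
```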

3. Cross‑cutting Topics

Data Management: DBus and Moonbox expose real‑time metadata services, and Wormhole's logs provide lineage information. Together they support enterprise‑level metadata management and lineage tracking.
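Once flow logs are reduced to (source, sink) pairs, lineage tracking becomes a graph walk. The record format and the namespaces below are assumptions for illustration. The code builds an upstream adjacency map and returns every dataset that feeds a given table, directly or indirectly.

```python
from collections import defaultdict, deque


def build_lineage(flow_records):
    """flow_records: iterable of (source_namespace, sink_namespace) pairs, e.g. parsed from flow logs."""
    upstream = defaultdict(set)
    for src, dst in flow_records:
        upstream[dst].add(src)
    return upstream


def all_upstream(upstream, table):
    """Breadth-first walk returning every dataset that feeds `table`; cycle-safe via `seen`."""
    seen, queue = set(), deque([table])
    while queue:
        for parent in upstream.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


records = [
    ("mysql.orders", "kafka.orders_topic"),
    ("kafka.orders_topic", "kudu.orders"),
    ("oracle.customers", "kudu.orders"),
    ("kudu.orders", "clickhouse.daily_report"),
]
lineage = build_lineage(records)
report_sources = all_upstream(lineage, "clickhouse.daily_report")
```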

Data Security: Each component implements its own security controls (e.g., UMS metadata, access control, LDAP integration), and Moonbox's audit logs can feed security alerting systems.
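One way audit logs can feed alerting is a per-user sliding window over denied queries. The assumptions here are as follows; none of them reflects Moonbox's actual audit format:

- each event has the shape (timestamp, user, allowed);
- the alert threshold is 3 denials and the window is 60 seconds;
- timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class DeniedAccessAlerter:
    """Alert when a user accumulates `threshold` denied queries within `window_s` seconds."""

    def __init__(self, threshold=3, window_s=60):
        self.threshold = threshold
        self.window_s = window_s
        self._denied = defaultdict(deque)  # user -> timestamps of recent denied queries

    def observe(self, ts, user, allowed):
        """Feed one audit event; return True if it should raise an alert."""
        if allowed:
            return False
        times = self._denied[user]
        times.append(ts)
        # Evict denials that have fallen out of the window.
        while times and ts - times[0] > self.window_s:
            times.popleft()
        return len(times) >= self.threshold


audit_events = [(0, "bob", False), (5, "bob", True), (10, "bob", False),
                (20, "alice", False), (30, "bob", False), (200, "bob", False)]
alerter = DeniedAccessAlerter(threshold=3, window_s=60)
alerts = [alerter.observe(ts, user, allowed) for ts, user, allowed in audit_events]
```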

Operations & Monitoring: DBus and Wormhole expose health‑check, heartbeat, and statistics APIs, and their web dashboards show throughput and latency metrics, which supports automated operations tooling.
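Heartbeats make end-to-end latency checks straightforward. Suppose a heartbeat record is written at the source with its emission time and observed at the end of the pipeline. The difference between the two times is the pipeline latency, and a long gap since the last arrival signals a stall. The (emitted_at, arrived_at) pairs and the 5-second latency and 30-second silence budgets are illustrative assumptions, not DBus or Wormhole settings.

```python
import statistics


def check_pipeline(heartbeats, now, max_latency_s=5.0, max_silence_s=30.0):
    """heartbeats: (emitted_at, arrived_at) pairs in seconds for heartbeat records written at
    the source and observed at the pipeline's end. Returns a status and the median latency."""
    if not heartbeats or now - max(arrived for _, arrived in heartbeats) > max_silence_s:
        return {"status": "stalled", "median_latency_s": None}
    median = statistics.median([arrived - emitted for emitted, arrived in heartbeats])
    return {"status": "slow" if median > max_latency_s else "ok", "median_latency_s": median}


recent = [(0.0, 1.0), (10.0, 12.0), (20.0, 23.0)]
report = check_pipeline(recent, now=25.0)
```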

4. Deployment Modes

Sync Mode: Direct data extraction (DBus → Kafka) and sink loading (Wormhole) without stream processing; suitable for real‑time data replication.

Stream Mode: Adds configurable SQL processing in Wormhole, enabling low‑latency incremental computation and cross‑system lookups.

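Rotation Mode: Chains stream and batch stages into a loop (Wormhole → Moonbox → Kafka). Wormhole writes streaming results to a data store, and Moonbox periodically runs batch computation over that store. Moonbox then publishes the results back to Kafka, where Wormhole consumes them again. The loop supports complex multi‑step processing with near‑real‑time results.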

Intelligent Mode: Automates Flow drift, the tuning of Moonbox pre‑computation, and the conversion of batch logic into streaming logic, aiming for zero‑maintenance pipelines.
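Stream Mode combines cross-system lookups with incremental computation, which can be sketched in a few lines. The dictionary standing in for the external dimension store and the event fields `uid` and `amount` are hypothetical. Each batch of events updates running totals instead of recomputing them from scratch.

```python
from collections import Counter


def enrich_and_aggregate(events, user_dim, totals=None):
    """Enrich each event with a lookup, then fold it into running per-city totals."""
    totals = Counter() if totals is None else totals
    for event in events:
        # Cross-system lookup; a dict stands in for the external dimension store.
        city = user_dim.get(event["uid"], "unknown")
        totals[city] += event["amount"]
    return totals


user_dim = {1: "BJ", 2: "SH"}
totals = enrich_and_aggregate([{"uid": 1, "amount": 10}, {"uid": 2, "amount": 5}], user_dim)
# A later micro-batch updates the same totals instead of recomputing them.
totals = enrich_and_aggregate([{"uid": 1, "amount": 7}, {"uid": 9, "amount": 1}], user_dim, totals)
```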

In summary, the article provides a comprehensive overview of RTDP’s architectural design, component choices, and practical deployment patterns for building scalable, secure, and maintainable big‑data pipelines.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert dedicated to sharing big data technology.
