
How to Build a Real‑Time Data Platform: Tech Stack & Design Patterns

This article explains the architecture of a Real-Time Data Platform (RTDP), details the technology selection for its core components (DBus, Kafka, Wormhole, Moonbox, and Davinci), and discusses data management, security, operations, and four usage modes (synchronization, flow, rotation, and intelligent), illustrating how each fits different business scenarios.

dbaplus Community

Technical Overview of the Real‑time Data Platform (RTDP)

RTDP is a modular big-data infrastructure that enables continuous OLTP-to-OLAP data flow. It is built from five core components:

DBus (Data Bus Platform) – Connects heterogeneous data sources, extracts incremental changes or full snapshots, formats records into standard UMS (Unified Message Schema) JSON messages, and publishes them to Kafka.

Kafka – Distributed, high‑throughput pub/sub system that carries UMS messages. Metadata management and schema evolution are handled via Confluent Schema Registry.

Wormhole (Streaming Processing Platform) – Consumes UMS messages, executes SQL-based stream logic, and writes results to configurable sinks; idempotent writes make the sink result exactly-once even though upstream delivery is at-least-once. It abstracts processing into Flows, supports Spark or Flink engines, and allows custom sinks, UDFs, and feedback loops.

Moonbox (Compute Service Platform) – Provides a unified query gateway (REST, JDBC, ODBC) across heterogeneous stores, pushes down logical plans to underlying engines, and supports virtual database namespaces for mixed‑engine analytics.

Davinci (Visualization Platform) – Offers configurable dashboards, supports JDBC and CSV data sources, and enables collaborative data exploration through view‑widget abstractions.

RTDP overall architecture diagram

Component Details

DBus

Supports CDC‑based incremental extraction for databases and log‑based agents for file sources.

Generates a unique monotonic ums_id_, event timestamp ums_ts_, and operation type ums_op_ for each message.

Maintains table‑level schema versioning; version changes are reflected in the UMS Namespace string (e.g., oracle.oracle01.db1.table1.v2.dbpar01.tablepar01).

Publishes messages to Kafka with at‑least‑once delivery and strong ordering guarantees.

Provides heartbeat tables for end‑to‑end liveness detection.
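To make the namespace convention concrete, the sketch below splits a UMS namespace into its seven dot-separated parts and detects a table-level schema version change. The field names (datastore, instance, database, table, version, db_partition, table_partition) are assumptions inferred from the example namespace string, not identifiers taken from DBus itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UmsNamespace:
    """Hypothetical field names for the seven dot-separated namespace parts."""
    datastore: str        # e.g. "oracle"
    instance: str         # e.g. "oracle01"
    database: str         # e.g. "db1"
    table: str            # e.g. "table1"
    version: str          # table schema version, e.g. "v2"
    db_partition: str     # e.g. "dbpar01"
    table_partition: str  # e.g. "tablepar01"


def parse_namespace(ns: str) -> UmsNamespace:
    parts = ns.split(".")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dot-separated parts, got {len(parts)}: {ns!r}")
    return UmsNamespace(*parts)


def is_schema_change(old_ns: str, new_ns: str) -> bool:
    """True when both namespaces name the same table but carry different versions."""
    a, b = parse_namespace(old_ns), parse_namespace(new_ns)
    same_table = (a.datastore, a.instance, a.database, a.table) == (
        b.datastore, b.instance, b.database, b.table)
    return same_table and a.version != b.version


ns = parse_namespace("oracle.oracle01.db1.table1.v2.dbpar01.tablepar01")
print(ns.table, ns.version)  # table1 v2
```

Partitions are deliberately ignored in the comparison, so two partitions of the same table at the same version are not reported as a schema change.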

Documentation: https://bridata.github.io/DBus/
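Heartbeat-based liveness detection can be sketched as follows, assuming each source periodically writes a heartbeat row and that the row's source-side write time is visible once it reaches the end of the pipeline. The function names, the status labels, and the one-minute and five-minute thresholds are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def classify(last_heartbeat: Optional[datetime], now: datetime,
             warn_after: timedelta, fail_after: timedelta) -> str:
    """Classify a source by the newest heartbeat that reached the end of the pipeline.

    last_heartbeat is that row's source-side write time, so now - last_heartbeat
    approximates how stale the pipeline's output currently is.
    """
    if last_heartbeat is None:
        return "NO_DATA"
    lag = now - last_heartbeat
    if lag >= fail_after:
        return "CRITICAL"
    if lag >= warn_after:
        return "WARNING"
    return "OK"


def check_all(last_seen: Dict[str, Optional[datetime]], now: datetime,
              warn_after: timedelta = timedelta(minutes=1),
              fail_after: timedelta = timedelta(minutes=5)) -> Dict[str, str]:
    """Classify every monitored source in one pass."""
    return {src: classify(ts, now, warn_after, fail_after) for src, ts in last_seen.items()}


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(check_all({
    "mysql01": now - timedelta(seconds=20),
    "oracle01": now - timedelta(minutes=3),
    "logs01": None,
}, now))
# {'mysql01': 'OK', 'oracle01': 'WARNING', 'logs01': 'NO_DATA'}
```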

DBus architecture

Kafka

Provides high‑throughput, fault‑tolerant messaging.

Schema Registry stores Avro/JSON schemas and enables schema evolution without breaking downstream consumers.

Wormhole

Consumes UMS or custom JSON messages from Kafka.

SQL-based stream processing; supports both Kappa (replaying the stream to backfill) and Lambda (a separate batch path alongside the stream) architectures.

Abstracts a Flow as a pair of source and sink namespaces; a single physical stream can host multiple logical flows.

Implements idempotent writes via ums_id_ and ums_op_.

Supports Spark Streaming (high throughput, batch lookup) and Flink (low latency, CEP).

Extensible interfaces: SinkProcessor, SwiftsInterface, UDF.

Provides real‑time feedback messages for monitoring and alerting.
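The Stream/Flow split can be illustrated with a small in-memory model: one physical Stream receives a batch for a namespace and hands it to every Flow whose source pattern matches. This is a sketch rather than Wormhole's API; the class names and the use of `*` as a wildcard for a whole namespace part are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


def ns_matches(pattern: str, namespace: str) -> bool:
    """Compare part by part; '*' matches any single part (assumed wildcard syntax)."""
    p, n = pattern.split("."), namespace.split(".")
    return len(p) == len(n) and all(a == "*" or a == b for a, b in zip(p, n))


@dataclass
class Flow:
    """Hypothetical logical pipeline from a source namespace pattern to a sink namespace."""
    source_ns: str
    sink_ns: str
    transform: Callable[[Row], Row] = lambda row: row  # identity: plain synchronization


@dataclass
class Stream:
    """Hypothetical physical consumer (e.g. one Spark job) hosting many Flows."""
    flows: List[Flow] = field(default_factory=list)

    def process(self, namespace: str, rows: List[Row]) -> Dict[str, List[Row]]:
        """Fan a batch out to every matching Flow, grouped by sink namespace."""
        out: Dict[str, List[Row]] = {}
        for flow in self.flows:
            if ns_matches(flow.source_ns, namespace):
                out.setdefault(flow.sink_ns, []).extend(flow.transform(r) for r in rows)
        return out


stream = Stream([
    Flow("mysql.m01.shop.orders.*.*.*", "kudu.k01.ods.orders.0.0.0"),
    Flow("mysql.m01.shop.orders.*.*.*", "es.e01.search.orders.0.0.0",
         lambda r: {"id": r["id"]}),
    Flow("mysql.m01.shop.users.*.*.*", "kudu.k01.ods.users.0.0.0"),
])
out = stream.process("mysql.m01.shop.orders.v1.0.0", [{"id": 1, "amount": 9}])
print(sorted(out))  # ['es.e01.search.orders.0.0.0', 'kudu.k01.ods.orders.0.0.0']
```

Sharing one consumer across Flows means a topic is read once even when several sinks need the same table.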

GitHub: https://github.com/edp963/wormhole
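Because DBus delivers at least once, a sink may see the same change twice or out of order. A minimal sketch of ums_id_-based idempotent writes follows, assuming a single primary-key column named `id` and a hypothetical `active` flag that keeps deletes as tombstones; Wormhole's own sink logic may differ in detail.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def upsert(table: Dict[Any, Row], record: Row, key: str = "id") -> bool:
    """Apply one change record idempotently; return True if the table changed."""
    pk = record[key]
    current = table.get(pk)
    # ums_id_ is monotonic, so an equal or smaller id means a duplicate or a stale record.
    if current is not None and current["ums_id_"] >= record["ums_id_"]:
        return False
    # Keep deletes as tombstones (hypothetical 'active' flag) so a late, older
    # insert or update cannot resurrect a deleted row.
    table[pk] = {**record, "active": record["ums_op_"] != "d"}
    return True


def apply_batch(table: Dict[Any, Row], records: Iterable[Row], key: str = "id") -> int:
    """Apply a batch; the return value counts records that changed the table."""
    return sum(upsert(table, r, key) for r in records)


batch = [
    {"id": 7, "ums_id_": 1, "ums_op_": "i", "amount": 10},
    {"id": 7, "ums_id_": 3, "ums_op_": "u", "amount": 25},
    {"id": 7, "ums_id_": 1, "ums_op_": "i", "amount": 10},  # redelivered
]
table: Dict[Any, Row] = {}
print(apply_batch(table, batch), apply_batch(table, batch))  # 2 0
```

Replaying the whole batch a second time changes nothing, which is the property that turns at-least-once delivery into an exactly-once result at the sink.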

Wormhole data flow architecture

Moonbox

Acts as a virtual database layer; parses SQL, applies Catalyst optimizations, and pushes down sub‑trees to underlying stores.

Supports a two-level namespace (database.table) that gives users a virtual database experience.

Provides three client interfaces: REST, JDBC, ODBC.

Offers batch and ad‑hoc execution modes, with result write‑back strategies (Merge, Replace).

Implements multi‑tenant access control, row/column permissions, and integrates with YARN for resource scheduling.

Exposes metadata services for schema discovery and lineage tracking.
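Predicate pushdown can be illustrated with a toy planner that sends the filters a store can evaluate to that store and keeps the rest for the compute layer. Moonbox performs this on Spark Catalyst logical plans; the `Source` class, the per-store operator sets, and the string-built SQL below are simplifications invented for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Predicate = Tuple[str, str, object]  # (column, operator, literal)
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Source:
    name: str
    pushdown_ops: FrozenSet[str]  # operators this store can evaluate itself


def plan_scan(source: Source, table: str,
              preds: List[Predicate]) -> Tuple[str, List[Predicate]]:
    """Split predicates into a source-side query and a residual list for the compute layer."""
    pushed = [p for p in preds if p[1] in source.pushdown_ops]
    residual = [p for p in preds if p[1] not in source.pushdown_ops]
    sql = f"SELECT * FROM {table}"
    if pushed:
        # Naive literal rendering via repr(); a real planner would bind parameters.
        sql += " WHERE " + " AND ".join(f"{c} {op} {v!r}" for c, op, v in pushed)
    return sql, residual


def apply_residual(rows: List[Dict[str, object]],
                   residual: List[Predicate]) -> List[Dict[str, object]]:
    """Evaluate, in the compute layer, the predicates the source could not handle."""
    return [r for r in rows if all(OPS[op](r[c], v) for c, op, v in residual)]


kv = Source("kv_store", frozenset({"="}))  # hypothetical store: equality lookups only
sql, residual = plan_scan(kv, "orders", [("region", "=", "EU"), ("amount", ">", 100)])
print(sql)       # SELECT * FROM orders WHERE region = 'EU'
print(residual)  # [('amount', '>', 100)]
```

The more a store can evaluate itself, the fewer rows cross the network into the compute layer.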

GitHub: https://github.com/edp963/moonbox

Moonbox logical modules

Davinci

Provides configurable dashboards; supports JDBC and CSV data sources.

Uses View (logical data view) and Widget (visual component) abstractions.

Allows SQL template definition, syntax highlighting, testing, and write‑back.

Offers pre‑defined charts, custom styling, full‑screen mode, and interactive filtering/linkage.

Supports row/column level security and LDAP integration.
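SQL templates and row-level security can be sketched together as template rendering: query variables take user-chosen filter values, and permission variables expand to the list of values a given user may see. The `$name$` placeholder syntax and the function names are assumptions for this sketch, and a production system should bind parameters instead of building SQL strings.

```python
import re
from typing import List, Mapping


def sql_literal(value: object) -> str:
    """Render a literal; strings get single quotes with embedded quotes doubled."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def render_view(template: str, query_vars: Mapping[str, object],
                auth_vars: Mapping[str, List[object]]) -> str:
    """Fill $name$ placeholders (assumed syntax) in a View's SQL template.

    auth_vars hold a user's allowed values for row-level security and render as
    an IN-list; an empty grant renders as (NULL), which matches no rows.
    """
    def substitute(m: re.Match) -> str:
        name = m.group(1)
        if name in auth_vars:
            values = auth_vars[name]
            if not values:
                return "(NULL)"
            return "(" + ", ".join(sql_literal(v) for v in values) + ")"
        if name in query_vars:
            return sql_literal(query_vars[name])
        raise KeyError(f"unbound template variable: {name}")

    return re.sub(r"\$(\w+)\$", substitute, template)


TEMPLATE = ("SELECT city, SUM(sales) AS sales FROM orders "
            "WHERE dt >= $start$ AND city IN $cities$ GROUP BY city")
print(render_view(TEMPLATE, {"start": "2024-01-01"}, {"cities": ["Beijing", "O'Fallon"]}))
```

Rendering the permission list on the server side keeps the restriction out of the user's hands, since the dashboard only supplies the query variables.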

GitHub: https://github.com/edp963/davinci

Davinci architecture

Data Management, Security, and Operations

Metadata: DBus and Moonbox expose RESTful metadata services for real-time schema discovery and lineage collection.

Security: Multi-tenant access control in DBus, audit logging in Moonbox, and end-to-end encryption policies across the pipeline.

Operations: Visual UI and health-check APIs in DBus and Wormhole; backfill, flow migration, and heartbeat services enable automated monitoring and alerting.

Mode Scenarios

RTDP supports four typical usage patterns, each built on the same core components.

1. Synchronization Mode

Data is extracted by DBus, streamed via Kafka, and written directly to sinks by Wormhole without transformation. Use cases include real‑time replication, decoupling OLTP from OLAP, and building ODS layers.
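In synchronization mode the streaming step only decodes and forwards. The sketch below decodes a simplified UMS-style JSON message (a schema section with a namespace and typed fields, plus payload tuples) and appends the rows to an in-memory sink unchanged. The exact message layout and the type names in `CASTS` are assumptions here; the DBus and Wormhole documentation describes the real UMS format.

```python
import json
from decimal import Decimal
from typing import Any, Callable, Dict, List, Tuple

# Assumed type names; tuple values arrive as strings and are cast per field type.
CASTS: Dict[str, Callable[[str], Any]] = {
    "long": int, "int": int, "decimal": Decimal, "string": str, "datetime": str,
}


def decode_ums(raw: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Turn one UMS-style message (schema + payload tuples) into (namespace, rows)."""
    msg = json.loads(raw)
    fields = msg["schema"]["fields"]
    rows = []
    for item in msg["payload"]:
        rows.append({
            f["name"]: None if v is None else CASTS.get(f["type"], str)(v)
            for f, v in zip(fields, item["tuple"])
        })
    return msg["schema"]["namespace"], rows


def sync(raw: str, sink: Dict[str, List[Dict[str, Any]]]) -> int:
    """Synchronization mode: forward decoded rows to the sink with no transformation."""
    namespace, rows = decode_ums(raw)
    sink.setdefault(namespace, []).extend(rows)
    return len(rows)


RAW = json.dumps({
    "schema": {
        "namespace": "mysql.mysql01.shop.orders.0.0.0",
        "fields": [
            {"name": "ums_id_", "type": "long"},
            {"name": "ums_ts_", "type": "datetime"},
            {"name": "ums_op_", "type": "string"},
            {"name": "id", "type": "long"},
            {"name": "amount", "type": "decimal"},
        ],
    },
    "payload": [{"tuple": ["101", "2024-01-01 10:00:00.000", "i", "7", "19.90"]}],
})
sink: Dict[str, List[Dict[str, Any]]] = {}
print(sync(RAW, sink))  # 1
print(sink["mysql.mysql01.shop.orders.0.0.0"][0]["amount"])  # 19.90
```

Decoding money columns to `Decimal` rather than `float` keeps the replica byte-for-byte faithful to the source values.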

2. Flow (Streaming) Mode

Extends synchronization by configuring SQL logic in Wormhole. This reduces latency for incremental analytics, enables cross‑store lookups, and powers low‑latency dashboards.
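Flow mode adds SQL on top of the same path. To show the kind of logic a Flow might carry (drop deletes, look up a dimension held in another store, aggregate), the sketch runs one micro-batch through in-memory SQLite. SQLite is only a stand-in for Wormhole's Spark or Flink engine, and the table and column names are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical Flow logic: drop deletes, look up each order's user in a
# dimension table (standing in for another store), then aggregate per city.
FLOW_SQL = """
SELECT u.city AS city, SUM(o.amount) AS total, COUNT(*) AS order_count
FROM orders o JOIN users u ON o.user_id = u.user_id
WHERE o.ums_op_ != 'd'
GROUP BY u.city
ORDER BY u.city
"""


def run_flow_sql(batch: List[Row], lookup: List[Row], sql: str = FLOW_SQL) -> List[Row]:
    """Run one micro-batch through SQL, using in-memory SQLite as a stand-in engine."""
    con = sqlite3.connect(":memory:")
    try:
        con.row_factory = sqlite3.Row
        con.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER, amount REAL, ums_op_ TEXT)")
        con.execute("CREATE TABLE users (user_id INTEGER, city TEXT)")
        con.executemany("INSERT INTO orders VALUES (:id, :user_id, :amount, :ums_op_)", batch)
        con.executemany("INSERT INTO users VALUES (:user_id, :city)", lookup)
        return [dict(r) for r in con.execute(sql)]
    finally:
        con.close()


orders_batch = [
    {"id": 1, "user_id": 10, "amount": 20.0, "ums_op_": "i"},
    {"id": 2, "user_id": 11, "amount": 5.0, "ums_op_": "i"},
    {"id": 3, "user_id": 10, "amount": 7.5, "ums_op_": "i"},
    {"id": 4, "user_id": 11, "amount": 99.0, "ums_op_": "d"},
]
users = [{"user_id": 10, "city": "Beijing"}, {"user_id": 11, "city": "Shanghai"}]
print(run_flow_sql(orders_batch, users))
# [{'city': 'Beijing', 'total': 27.5, 'order_count': 2}, {'city': 'Shanghai', 'total': 5.0, 'order_count': 1}]
```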

3. Rotation Mode

Combines streaming and batch: after a short‑interval batch job in Moonbox, results are fed back to Kafka for another round of stream processing. This supports complex multi‑step pipelines with near‑real‑time latency.
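One rotation cycle can be simulated with in-memory queues: a first streaming pass lands valid events in a store, a short-interval batch job aggregates the store and publishes the results back to a topic, and a second streaming pass consumes those aggregates. The stage functions, the `deque` standing in for a Kafka topic, and the alert threshold are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List

Event = Dict[str, object]


def stream_pass_1(topic: Deque[Event], store: List[Event]) -> None:
    """First streaming pass: drain raw events and land only valid ones in the store."""
    while topic:
        event = topic.popleft()
        if event["amount"] > 0:
            store.append(event)


def batch_job(store: List[Event], feedback_topic: Deque[Event]) -> None:
    """Short-interval batch job (Moonbox's role here): aggregate, then publish back."""
    totals: Dict[str, float] = defaultdict(float)
    for row in store:
        totals[row["shop"]] += row["amount"]
    for shop, total in sorted(totals.items()):
        feedback_topic.append({"shop": shop, "total": total})


def stream_pass_2(feedback_topic: Deque[Event], threshold: float) -> List[str]:
    """Second streaming pass over the batch results: flag shops above a threshold."""
    alerts = []
    while feedback_topic:
        agg = feedback_topic.popleft()
        if agg["total"] > threshold:
            alerts.append(agg["shop"])
    return alerts


def rotate_once(raw_events: List[Event], threshold: float) -> List[str]:
    """Run stream -> store -> batch -> topic -> stream once."""
    raw, store, feedback = deque(raw_events), [], deque()
    stream_pass_1(raw, store)
    batch_job(store, feedback)
    return stream_pass_2(feedback, threshold)


events = [
    {"shop": "a", "amount": 60},
    {"shop": "a", "amount": 50},
    {"shop": "b", "amount": 30},
    {"shop": "b", "amount": -5},  # invalid, dropped by the first pass
]
print(rotate_once(events, threshold=100))  # ['a']
```

Feeding batch output back through the stream lets logic that needs a full-window aggregate still trigger streaming-style reactions at near-real-time latency.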

4. Intelligent Mode

Leverages rule engines or machine-learning models to automate Flow drift (moving a running Flow to a different Stream), optimize Moonbox pre-computations, and automatically convert batch logic to streaming, aiming for zero-maintenance pipelines.

Key repository references:

DBus manual: https://bridata.github.io/DBus/

Wormhole source: https://github.com/edp963/wormhole

Moonbox source: https://github.com/edp963/moonbox

Davinci source: https://github.com/edp963/davinci

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
