
Real‑time Data Platform (RTDP): Concepts, Architecture and Design Considerations

This article examines the design of a real‑time data platform, discussing its background concepts, modern data‑warehouse perspective, architectural layers, unified data‑collection, streaming, compute and visualization platforms, and the functional, quality, stability, cost and agility considerations required for building an end‑to‑end real‑time pipeline.

Big Data Technology & Architecture

The article opens a two‑part series on the Real‑time Data Platform (RTDP), a common and important piece of big‑data infrastructure. The first part focuses on design, viewing RTDP from the modern data‑warehouse and typical data‑processing perspectives; the second part will cover technology selection and component details.

1. Related Concepts

From a modern data‑warehouse viewpoint, RTDP inherits the modules of a traditional warehouse and adds multi‑source ingestion, near‑real‑time (T+0) processing, diverse usage patterns, and four advanced capabilities: data real‑timeization (making data available in real time), virtualization, democratization, and collaboration.

Key capabilities identified are:

Data real‑timeization (synchronization and stream processing)

Data virtualization (virtual compute and unified services)

Data democratization (visual UI and self‑service)

Data collaboration (multi‑tenant and workflow sharing)

From a typical data‑processing angle, the article contrasts OLTP (online transaction processing) with OLAP (online analytical processing), describes the data pipeline (ETL) that moves data from transactional stores to analytical stores, and discusses the challenge of keeping that pipeline's latency low, which it calls OLPP (Online Pipeline Processing).
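The difference between a periodic ETL job and a low‑latency pipeline can be illustrated with a minimal sketch. Everything here is hypothetical (the `oltp_orders` rows, the `batch_etl` function, and the `PipelineAggregate` class are illustrative names, not part of the article): the batch job rebuilds the analytical aggregate by rescanning all rows, while the pipeline folds each change into the aggregate as it arrives.

```python
from collections import defaultdict

# Hypothetical OLTP rows: one record per order, shaped for point writes.
oltp_orders = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 50.0},
]


def batch_etl(rows):
    """Classic ETL: rescan every row and rebuild the aggregate from scratch."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


class PipelineAggregate:
    """Pipeline style: update the aggregate once per incoming change."""

    def __init__(self):
        self.totals = defaultdict(float)

    def on_insert(self, row):
        self.totals[row["region"]] += row["amount"]


# Batch result, available only after the whole job has run.
olap_batch = batch_etl(oltp_orders)

# Pipeline result, kept current after every single event.
live = PipelineAggregate()
for row in oltp_orders:
    live.on_insert(row)
olap_live = dict(live.totals)
```

Both paths reach the same totals; the pipeline version simply has a usable answer after each event instead of after each job run, which is the latency gap the OLPP idea targets.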

2. Architectural Design

RTDP aims to provide end‑to‑end real‑time processing (millisecond/second/minute latency), supporting multi‑source ingestion, real‑time consumption, and the four core capabilities mentioned above.

The overall conceptual architecture is divided into four unified layers:

Unified Data Collection Platform

Unified Stream Processing Platform

Unified Compute Service Platform

Unified Data Visualization Platform

Each layer is described in detail:

2.1 Unified Data Collection Platform supports full and incremental extraction, uses a custom Unified Message Schema (UMS) to decouple messages from physical transports, and offers multi‑tenant, configurable cleaning.
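The article does not spell out the UMS format here, so the following is only a sketch of the general idea: a self‑describing envelope that carries its own schema, so consumers do not depend on the transport that delivered it. The `FieldSpec` and `UnifiedMessage` classes, the field names, and the `mysql.shop.orders` namespace are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List


@dataclass
class FieldSpec:
    """Hypothetical column description carried inside every message."""
    name: str
    type: str
    nullable: bool = True


@dataclass
class UnifiedMessage:
    """Hypothetical envelope: logical source, schema, and row payload."""
    namespace: str
    fields: List[FieldSpec]
    rows: List[List[Any]] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested FieldSpec dataclasses.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "UnifiedMessage":
        data = json.loads(raw)
        specs = [FieldSpec(**f) for f in data["fields"]]
        return cls(namespace=data["namespace"], fields=specs, rows=data["rows"])

    def records(self) -> List[Dict[str, Any]]:
        # Pair each row with the schema's column names.
        names = [f.name for f in self.fields]
        return [dict(zip(names, row)) for row in self.rows]


message = UnifiedMessage(
    namespace="mysql.shop.orders",
    fields=[FieldSpec("order_id", "long", nullable=False), FieldSpec("status", "string")],
    rows=[[1, "created"], [2, "paid"]],
)

# The same bytes could travel over any transport (queue, file, HTTP).
wire_bytes = message.to_json().encode("utf-8")
decoded = UnifiedMessage.from_json(wire_bytes.decode("utf-8"))
```

Because the schema travels with the payload, a consumer can interpret `decoded.records()` without knowing which physical channel produced `wire_bytes`.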

2.2 Unified Stream Processing Platform consumes UMS or JSON messages, provides visual/configurable/SQL‑based development, idempotent writes to heterogeneous targets, and multi‑tenant isolation.
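One common way to make sink writes idempotent is a versioned upsert: each record carries a key and a monotonically increasing version, and the sink ignores anything that is not newer than what it already holds. The sketch below assumes that approach; the article does not say which mechanism the platform uses, and `IdempotentSink` is a hypothetical in‑memory stand‑in for a real target store.

```python
class IdempotentSink:
    """In-memory stand-in for a target store that accepts versioned upserts."""

    def __init__(self):
        self.rows = {}  # key -> (version, value)

    def upsert(self, key, version, value):
        """Apply the write only if it is newer than the stored version."""
        current = self.rows.get(key)
        if current is not None and current[0] >= version:
            return False  # duplicate or stale delivery: safely ignored
        self.rows[key] = (version, value)
        return True


# Hypothetical change events: (key, version, value).
events = [
    ("order-1", 1, "created"),
    ("order-1", 2, "paid"),
    ("order-2", 1, "created"),
]

sink = IdempotentSink()
for event in events:
    sink.upsert(*event)
snapshot = dict(sink.rows)

# Simulate a retry that redelivers the whole batch, out of order.
for event in reversed(events):
    sink.upsert(*event)
```

After the replay the sink state is unchanged, so at‑least‑once delivery upstream still produces a correct final result in the target.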

2.3 Unified Compute Service Platform implements data virtualization/federation, supports push‑down and pull‑up computation across heterogeneous sources, and exposes unified JDBC/REST interfaces with SQL.
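Push‑down and pull‑up can be sketched with two independent in‑memory SQLite databases standing in for heterogeneous sources. The tables, the `federated_large_orders` function, and the sample data are assumptions for illustration: the filter is pushed down into the orders source, while the cross‑source join is pulled up into the federation layer because neither source holds both tables.

```python
import sqlite3

# Source 1: an orders store.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 120.0), (2, 11, 15.0), (3, 10, 300.0)],
)

# Source 2: a separate users store.
users_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(10, "alice"), (11, "bob")])


def federated_large_orders(min_amount):
    # Push-down: the amount filter runs inside the orders source,
    # so only matching rows leave that system.
    orders = orders_db.execute(
        "SELECT order_id, user_id, amount FROM orders "
        "WHERE amount >= ? ORDER BY order_id",
        (min_amount,),
    ).fetchall()

    # Pull-up: the join spans two sources, so it runs in this layer.
    names = dict(users_db.execute("SELECT user_id, name FROM users").fetchall())
    return [(order_id, names.get(user_id), amount) for order_id, user_id, amount in orders]


result = federated_large_orders(100.0)
```

A real federation engine would decide per operator where to execute; the sketch hard‑codes that decision to keep the two cases visible.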

2.4 Unified Data Visualization Platform adds multi‑tenant governance and enables cross‑department collaboration through visual tools.

3. Specific Issues and Design Thoughts

Functional considerations discuss which ETL operators are supported in streaming (e.g., left‑join, union, filter, map, project) and the need for hybrid processing to handle complex logic.
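The operators named above can be sketched as small Python generators that process one record at a time. This is an illustration only, not the platform's operator API: the `s_*` function names and the sample streams are hypothetical, and the left join is shown as a stream‑to‑lookup‑table join, which is one of the simpler forms to support in streaming.

```python
from itertools import chain


def s_filter(stream, predicate):
    return (row for row in stream if predicate(row))


def s_map(stream, fn):
    return (fn(row) for row in stream)


def s_project(stream, columns):
    return ({c: row[c] for c in columns} for row in stream)


def s_union(*streams):
    return chain(*streams)


def s_left_join(stream, table, key, right_columns):
    """Join each stream row to a lookup table; unmatched rows get None."""
    for row in stream:
        match = table.get(row[key], {})
        yield {**row, **{c: match.get(c) for c in right_columns}}


# Hypothetical input streams and a lookup table.
web = [{"user_id": 1, "amount": 30}, {"user_id": 2, "amount": 5}]
app = [{"user_id": 3, "amount": 70}]
users = {1: {"name": "alice"}, 2: {"name": "bob"}}

pipeline = s_union(web, app)
pipeline = s_filter(pipeline, lambda r: r["amount"] >= 10)
pipeline = s_left_join(pipeline, users, "user_id", ["name"])
pipeline = s_map(pipeline, lambda r: {**r, "amount_cents": r["amount"] * 100})
pipeline = s_project(pipeline, ["user_id", "name", "amount_cents"])

output = list(pipeline)
```

Each operator here needs only the current record (plus a lookup table), which is why such operators fit streaming well; logic that needs the full data set is where the hybrid processing mentioned above comes in.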

Quality considerations mention Lambda and Kappa architectures, data consistency, and future discussions on data‑quality mechanisms.
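As a reminder of how the two architectures differ, here is a minimal sketch over a toy event log. The log contents and the `BATCH_HORIZON` cutoff are made up for illustration: Lambda answers a query by merging a batch view with a speed view, while Kappa runs a single streaming job over the whole log and reprocesses by replaying it.

```python
from collections import Counter

# Hypothetical event log: (key, event_time).
log = [("page_a", 1), ("page_b", 2), ("page_a", 3), ("page_a", 4)]

# Assumption: the batch layer has processed events up to this time.
BATCH_HORIZON = 2


def count_keys(events):
    return Counter(key for key, _ in events)


# Lambda: two code paths whose views are merged at query time.
batch_view = count_keys(e for e in log if e[1] <= BATCH_HORIZON)
speed_view = count_keys(e for e in log if e[1] > BATCH_HORIZON)
lambda_result = batch_view + speed_view

# Kappa: one streaming path; reprocessing means replaying the log from the start.
kappa_result = Counter()
for key, _ in log:
    kappa_result[key] += 1
```

The results agree only as long as the batch and speed paths implement the same logic, which is the consistency burden Lambda carries and Kappa avoids by keeping a single path.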

Stability considerations cover high availability, SLA guarantees, elastic resilience, monitoring, automated operations, and metadata change resistance.

Cost considerations address labor, resource utilization, operational expenses, and reducing trial‑and‑error through agile development.

Agility emphasizes configuration, SQL‑based development, and democratization, while management focuses on metadata and data‑security governance.

The article concludes that the presented RTDP design offers a comprehensive, modular, and extensible blueprint for building modern real‑time data platforms, with a forthcoming technical part that will detail concrete technology choices and open‑source implementations.

Author: Lu Shanwei

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
