
From BI to Kappa: How Data Architecture Evolved in the Big Data Era

This article traces the evolution of data architecture from early BI systems through traditional big‑data stacks, streaming, Lambda and Kappa designs, and explains how a unified stream‑batch model simplifies development while keeping logic consistent across data‑analysis and pipeline applications.

Alibaba Cloud Developer

Jul 31, 2023


1. Pre‑Big Data Era

Before the big‑data era, BI (Business Intelligence) systems were the primary tools for data analysis. Early BI systems combined data cleaning, analysis, mining, and reporting, and were often built on relational databases, using cube models queried with MDX. Their main limitations were that they inherited the constraints of the underlying RDBMS and did not support unstructured data.

2. Evolution of Big Data Architecture

Traditional Big Data Architecture

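To handle massive data volumes, companies adopted distributed storage and compute. The best‑known designs are Google's file system (GFS) and MapReduce, whose published ideas were later implemented in the open‑source Hadoop project. A typical pipeline reads from sources (e.g., MySQL, flat files), passes the data through the ODS (operational data store), DWD (data warehouse detail), and ADS (application data service) layers, and finally serves consumers.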
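The layered flow described above can be sketched in a few lines. This is a minimal illustration only: the function names (`load_ods`, `build_dwd`, `build_ads`) and the record fields (`user`, `amount`) are assumptions made for the example, and a real warehouse would run each layer as a scheduled distributed job rather than an in-memory function.

```python
from collections import defaultdict


def load_ods(raw_rows):
    """ODS layer: land the source rows unchanged (hypothetical helper)."""
    return list(raw_rows)


def build_dwd(ods_rows):
    """DWD layer: clean and normalize rows, dropping those with no amount."""
    cleaned = []
    for row in ods_rows:
        if row.get("amount") is None:
            continue
        cleaned.append({
            "user": row["user"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return cleaned


def build_ads(dwd_rows):
    """ADS layer: aggregate into a result shaped for consumers."""
    totals = defaultdict(float)
    for row in dwd_rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)


# Illustrative source rows, as they might arrive from a database export.
raw = [
    {"user": " Alice ", "amount": "10.5"},
    {"user": "bob", "amount": None},
    {"user": "alice", "amount": "4.5"},
]

ads = build_ads(build_dwd(load_ods(raw)))
```

Each layer only reads the output of the previous one, which is why results become available only after the whole chain has run.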

Streaming Architecture

Streaming architecture discards the offline ETL chain, letting a stream engine consume incremental data directly from business databases and produce real‑time results. Early implementations struggled to guarantee both low latency and high accuracy, limiting their use to scenarios where precision was less critical.
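As a rough sketch of this idea, the generator below consumes change events one at a time and emits an updated result after each event, with no offline chain in between. The event shape and the name `stream_totals` are assumptions for illustration; a production stream engine would also handle state persistence, ordering, and failure recovery, which this toy omits.

```python
def stream_totals(change_events):
    """Consume change events one by one and emit the running total per user."""
    totals = {}
    for event in change_events:
        user = event["user"]
        totals[user] = totals.get(user, 0.0) + event["amount"]
        # A result is emitted immediately after every event.
        yield user, totals[user]


# Illustrative increments, as they might be captured from a business database.
events = [
    {"user": "alice", "amount": 10.0},
    {"user": "bob", "amount": 3.0},
    {"user": "alice", "amount": 4.5},
]

results = list(stream_totals(events))
```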

Lambda Architecture

Lambda combines a real‑time stream layer with a batch layer that processes daily snapshots, merging their outputs to achieve low latency and eventual consistency. However, it requires maintaining two separate codebases and systems, leading to high development and operational costs and potential inconsistencies.
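The merge step and the duplicated logic can be illustrated with a small sketch. The three function names and the order records are hypothetical; the point is only that the batch and speed layers implement the same counting rule twice, and a serving step combines their views.

```python
def batch_layer(snapshot):
    """Batch codebase: recompute order counts from the full daily snapshot."""
    counts = {}
    for order in snapshot:
        counts[order["user"]] = counts.get(order["user"], 0) + 1
    return counts


def speed_layer(events_since_snapshot):
    """Streaming codebase: separately written logic for newer events."""
    counts = {}
    for event in events_since_snapshot:
        user = event["user"]
        counts[user] = counts.get(user, 0) + 1
    return counts


def serving_layer(batch_view, realtime_view):
    """Merge the two views so queries see both historical and fresh data."""
    merged = dict(batch_view)
    for user, count in realtime_view.items():
        merged[user] = merged.get(user, 0) + count
    return merged


snapshot = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
recent = [{"user": "alice"}, {"user": "carol"}]

merged = serving_layer(batch_layer(snapshot), speed_layer(recent))
```

If the counting rule changes, both `batch_layer` and `speed_layer` have to change in step, which is where the inconsistencies mentioned above can creep in.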

Kappa Architecture

Kappa removes the batch layer, relying solely on a distributed message queue (e.g., Kafka) and a single stream processing engine. When data errors or logic changes occur, the system replays the queue to recompute results. While this simplifies the stack, its applicability is limited because the queue usually retains data for only a limited time, so a replay can cover only the data that is still in the queue.
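A toy in-memory log can show both the replay mechanism and the retention limit. `RetainedLog` and `recompute` are hypothetical names and do not correspond to the Kafka API; retention here is a simple record count, whereas real queues are typically configured by time or size.

```python
class RetainedLog:
    """Toy append-only log that keeps only the newest `retention` records."""

    def __init__(self, retention):
        self.retention = retention
        self.records = []

    def append(self, value):
        self.records.append(value)
        # Older records fall out of the log once retention is exceeded.
        self.records = self.records[-self.retention:]

    def replay(self):
        """Return every record that is still retained, oldest first."""
        return list(self.records)


def recompute(log, logic):
    """Rebuild a result from scratch by replaying the log through `logic`."""
    return sum(logic(value) for value in log.replay())


log = RetainedLog(retention=3)
for value in [1, 2, 3, 4]:
    log.append(value)

original = recompute(log, lambda v: v)       # original logic
corrected = recompute(log, lambda v: v * 2)  # changed logic, same replay
```

Here the record `1` has already expired, so neither run can account for it: the replay is only as complete as the data the queue still holds.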

3. Stream‑Batch Unified Model and Data Architecture

The stream‑batch unified model lets a single codebase (Java or SQL) be executed either as a streaming job (incremental) or a batch job (full snapshot) based on configuration or automatic detection, producing identical results. This capability is valuable for both data‑analysis and data‑pipeline applications.
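The following sketch illustrates the idea under simplifying assumptions: one piece of logic (`update`) and a hypothetical `run` function that executes it in either mode depending on a `mode` argument. It is not any engine's actual API; it only shows that the final results agree when the same logic is applied to the same data.

```python
def update(state, word):
    """The single piece of business logic: count occurrences per word."""
    state[word] = state.get(word, 0) + 1


def run(logic, records, mode):
    """Execute the same logic as a batch job or as a streaming job."""
    state = {}
    if mode == "batch":
        # Batch: process the full snapshot, emit one final result.
        for record in records:
            logic(state, record)
        return [dict(state)]
    if mode == "streaming":
        # Streaming: emit an updated result after every record.
        outputs = []
        for record in records:
            logic(state, record)
            outputs.append(dict(state))
        return outputs
    raise ValueError(f"unknown mode: {mode}")


data = ["a", "b", "a"]
batch_out = run(update, data, "batch")
stream_out = run(update, data, "streaming")
```

The streaming run produces intermediate results along the way, but its last output matches the single batch output.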

Data‑analysis Applications

Most data‑analysis workloads benefit from combining real‑time and batch processing, effectively implementing a refined Lambda architecture where the same logic runs in both modes, reducing duplication and ensuring consistent semantics.

Data‑pipeline Applications

For data synchronization tasks, the unified engine can first perform a batch load and then switch to a streaming mode to capture ongoing changes, using connectors (e.g., Flink CDC) to keep source and target in sync without separate tools.
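The batch-then-stream handover can be sketched as follows. This is not the Flink CDC API: `initial_load`, `apply_changes`, and the offset-tagged change records are assumptions used to show one way the switch might work, namely by remembering the log position of the snapshot and applying only later changes.

```python
def initial_load(snapshot):
    """Batch phase: copy the full snapshot and note its log position."""
    return dict(snapshot["rows"]), snapshot["offset"]


def apply_changes(target, changes, from_offset):
    """Streaming phase: apply only changes newer than the snapshot."""
    for change in changes:
        if change["offset"] <= from_offset:
            continue  # already reflected in the snapshot
        if change["op"] == "delete":
            target.pop(change["key"], None)
        else:
            target[change["key"]] = change["value"]
        from_offset = change["offset"]
    return from_offset


snapshot = {"rows": {1: "alice", 2: "bob"}, "offset": 2}
changes = [
    {"offset": 1, "op": "upsert", "key": 1, "value": "alice"},
    {"offset": 2, "op": "upsert", "key": 2, "value": "bob"},
    {"offset": 3, "op": "upsert", "key": 2, "value": "robert"},
    {"offset": 4, "op": "delete", "key": 1},
]

target, offset = initial_load(snapshot)
offset = apply_changes(target, changes, offset)
```

Skipping changes at or before the snapshot offset avoids applying the same change twice when the job moves from the batch phase to the streaming phase.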

4. Summary

Early BI → Traditional big‑data architecture (solves data volume, but with high latency)

Traditional → Streaming (low latency, lower accuracy)

Streaming → Lambda (adds batch for accuracy, but complex)

Lambda → Kappa (removes redundancy, but limited by message‑queue lifespan)

By adopting a stream‑batch unified engine on top of Lambda‑style designs, system complexity is reduced and computation logic remains consistent across real‑time and batch workloads.
