Why Lambda, Kappa, and Lambda+ Are Shaping Modern Big Data Architecture
This article examines the technical challenges of large‑scale data processing, compares the classic Lambda and Kappa architectures, introduces the unified stream‑batch Lambda+ design built on Tablestore and Blink, and outlines suitable scenarios and practical solutions for modern big‑data systems.
Big Data Processing Challenges
Many industries now need big‑data analysis systems, including finance (risk modeling), retail (sales decision support), IoT (time‑series aggregation), and technology companies (analytics platforms). The common technical challenges include:
Handling both low‑latency real‑time data and petabyte‑scale historical data.
Ensuring reliability and scalability while keeping costs under control.
Integrating a deep tech stack of streaming, storage, and compute components.
Maintaining high operability for complex architectures.
Evolution of Big Data Architectures
Lambda Architecture
The Lambda architecture feeds the same immutable input data to a batch layer and a streaming layer, runs the same logic on both, and merges their results at query time. It assumes batch processing is simple and reliable, while the streaming layer can use approximate algorithms to deliver fast updates.
All data is written to both batch and streaming layers.
The batch layer manages the master dataset (immutable, append‑only) and pre‑computes a batch view.
The serving layer indexes the batch view for low‑latency ad‑hoc queries.
The streaming layer creates an approximate real‑time view to complement the batch view.
Queries merge the batch view and the real‑time view.
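The query‑time merge can be illustrated with a minimal Python sketch. It assumes both views are per‑key counts held in plain dicts, and that the real‑time view covers only events that arrived after the last batch recomputation; the names `compute_view`, `merge_views`, `master_dataset`, and `recent_events` are hypothetical. The streaming layer here counts exactly for simplicity, whereas a real one might use an approximate algorithm.

```python
from collections import Counter
from typing import Iterable, Mapping


def compute_view(events: Iterable[tuple[str, int]]) -> Counter:
    """Same aggregation logic run by both layers: sum values per key."""
    view: Counter = Counter()
    for key, value in events:
        view[key] += value
    return view


def merge_views(batch_view: Mapping[str, int],
                realtime_view: Mapping[str, int]) -> dict[str, int]:
    """Query-time merge: the batch view covers data up to the last batch run,
    the real-time view covers only what arrived since, so values are added."""
    merged = dict(batch_view)
    for key, value in realtime_view.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Master dataset: immutable, append-only (key, value) events.
master_dataset = [("page_a", 1), ("page_b", 1), ("page_a", 1)]
# Hypothetical events that arrived after the last batch recomputation.
recent_events = [("page_a", 1), ("page_c", 1)]

batch_view = compute_view(master_dataset)    # recomputed periodically
realtime_view = compute_view(recent_events)  # updated incrementally
print(merge_views(batch_view, realtime_view))
```

Once the next batch run absorbs `recent_events` into the master dataset, the corresponding real‑time entries can be discarded, which is what keeps the approximate view short‑lived.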
The Lambda design promotes immutable event streams and view recomputation, satisfying evolving historical and real‑time analysis needs.
Four Challenges of Lambda Architecture
Because every record is written to both layers, keeping the two writes consistent is pushed to the upstream application.
HDFS‑based master datasets do not support updates, leading to high latency and cost.
Developing, debugging, and troubleshooting on two separate frameworks, one for streaming and one for batch, is complex.
Result views must support low‑latency queries, often requiring additional column‑store systems.
Unified Stream‑Batch Lambda
Frameworks like Spark and Flink aim to unify stream and batch processing, which requires:
The same engine for real‑time processing and historical replay.
Exactly‑once semantics.
Event‑time windowing.
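Event‑time windowing is what lets one engine produce the same answer for live data and for replayed history. The minimal Python sketch below assigns events to tumbling windows by their event timestamp rather than their arrival order; the 60‑second window size and the names `window_start` and `count_by_event_time` are hypothetical. It omits watermarks, which a real engine such as Flink uses to decide when a window is complete.

```python
from collections import defaultdict

WINDOW_SIZE_S = 60  # hypothetical 1-minute tumbling window


def window_start(event_time_s: int, size_s: int = WINDOW_SIZE_S) -> int:
    """Return the start of the tumbling window containing this event time."""
    return event_time_s - (event_time_s % size_s)


def count_by_event_time(events):
    """events: iterable of (event_time_s, key), possibly out of order.
    Returns {(window_start, key): count}, independent of arrival order."""
    counts = defaultdict(int)
    for event_time_s, key in events:
        counts[(window_start(event_time_s), key)] += 1
    return dict(counts)


# Arrival order differs from event-time order (late/out-of-order events).
arrivals = [(125, "sensor-1"), (59, "sensor-1"), (61, "sensor-1"), (3, "sensor-1")]
print(count_by_event_time(arrivals))
```

Because windows are keyed by event time, replaying the same history in sorted order yields identical counts, which is the property a unified stream‑batch engine relies on.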
Kappa Architecture
Kappa, proposed by Jay Kreps, processes data only through a streaming engine. Historical analysis is achieved by re‑processing the immutable log stream.
Kappa simplifies the data pipeline but still faces storage and serving challenges: retaining the log long‑term (e.g., in Kafka) is costly, and tiered storage (e.g., in Pulsar) helps only with back‑fill jobs.
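Re‑processing in Kappa is commonly described as: start a new version of the streaming job from the beginning of the retained log, write its output to a new table, and switch readers over once it has caught up. The minimal Python sketch below models that with an in‑memory list as the log; `run_job`, `sum_v1`, `sum_v2`, and the "ignore negative values" change are all hypothetical.

```python
from typing import Callable

Record = tuple[str, int]                  # (key, value)
Process = Callable[[dict, Record], None]  # updates an output table in place


def run_job(log: list[Record], start_offset: int, process: Process, output: dict) -> int:
    """Consume the log from start_offset, apply `process` to each record,
    and return the next offset to read (the job's position in the log)."""
    for record in log[start_offset:]:
        process(output, record)
    return len(log)


def sum_v1(output: dict, record: Record) -> None:
    key, value = record
    output[key] = output.get(key, 0) + value


def sum_v2(output: dict, record: Record) -> None:
    # Hypothetical new logic: negative values are now treated as invalid.
    key, value = record
    if value >= 0:
        output[key] = output.get(key, 0) + value


# Immutable, append-only log retained long enough to cover re-processing.
log: list[Record] = [("a", 5), ("b", -2), ("a", 1)]

# v1 has been running all along and serves queries from table_v1.
table_v1: dict = {}
offset_v1 = run_job(log, 0, sum_v1, table_v1)

# Re-processing: start v2 from offset 0, writing to a new output table.
table_v2: dict = {}
offset_v2 = run_job(log, 0, sum_v2, table_v2)

# Once v2 has caught up with the head of the log, switch readers to it;
# v1 and table_v1 can then be retired.
serving_table = table_v2 if offset_v2 >= offset_v1 else table_v1
print(serving_table)
```

The cost noted above shows up directly in this pattern: re‑processing can only reach as far back as the log is retained.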
When to Use Lambda vs. Kappa
Kappa excels for append‑only, time‑series workloads where streaming alone satisfies both real‑time and historical needs.
Lambda is better for ad‑hoc exploratory analysis on large historical datasets that also require low latency.
Hybrid Kappa (Kappa+)
Uber’s Kappa+ reads directly from the data warehouse (e.g., Hudi) to perform both real‑time and back‑fill computation, so logs need not be retained for back‑fill.
Lambda Plus: Tablestore + Blink
Lambda Plus combines Alibaba Cloud’s Tablestore (a serverless NoSQL store) with Blink (an enhanced Flink‑based real‑time engine) to create a fully serverless, low‑maintenance big‑data solution.
Tablestore provides PB‑scale structured storage, millions of TPS, millisecond latency, and multi‑model indexing. Blink offers unified stream‑batch SQL with near‑native Flink performance.
In this architecture, Tablestore serves as the master dataset, batch view, and stream view. Blink reads directly from Tablestore for batch processing and consumes real‑time data via the TunnelService API.
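The consumption pattern attributed to TunnelService, reading existing (historical) data first and then new changes in order, can be modeled in a few lines. This minimal Python sketch does not use the Tablestore SDK: `Table`, `ChangeRecord`, `consume`, and the callback names are hypothetical in‑memory stand‑ins that only illustrate full‑then‑incremental, ordered consumption from a single store.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ChangeRecord:
    seq: int                     # position in the table's change log
    op: str                      # "PUT" or "DELETE"
    key: str
    value: Optional[int] = None  # None for deletes


@dataclass
class Table:
    """Hypothetical stand-in for one table that is the master dataset
    and also exposes an ordered change log."""
    rows: dict = field(default_factory=dict)
    changes: list = field(default_factory=list)

    def put(self, key: str, value: int) -> None:
        self.rows[key] = value
        self.changes.append(ChangeRecord(len(self.changes), "PUT", key, value))

    def delete(self, key: str) -> None:
        self.rows.pop(key, None)
        self.changes.append(ChangeRecord(len(self.changes), "DELETE", key))

    def snapshot(self) -> tuple[dict, int]:
        """Copy of the current rows plus the change-log position it reflects."""
        return dict(self.rows), len(self.changes)


def consume(table: Table,
            on_row: Callable[[str, int], None],
            on_change: Callable[[ChangeRecord], None],
            from_seq: Optional[int] = None) -> int:
    """Full phase (snapshot) on first run, then change records strictly
    in log order. Returns the next seq to resume from."""
    if from_seq is None:
        rows, from_seq = table.snapshot()
        for key, value in rows.items():
            on_row(key, value)
    for record in table.changes[from_seq:]:
        on_change(record)
    return len(table.changes)


view: dict = {}  # a downstream "stream view" kept in sync with the table


def apply_row(key: str, value: int) -> None:
    view[key] = value


def apply_change(record: ChangeRecord) -> None:
    if record.op == "PUT":
        view[record.key] = record.value
    else:
        view.pop(record.key, None)


table = Table()
table.put("d1", 10)
table.put("d2", 20)
next_seq = consume(table, apply_row, apply_change)                     # full phase
table.put("d1", 11)
table.delete("d2")
next_seq = consume(table, apply_row, apply_change, from_seq=next_seq)  # incremental
print(view)
```

Because the snapshot records the log position it reflects, the incremental phase resumes exactly where the full phase ended, with no second write path or external queue.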
Component Breakdown
Lambda batch layer: Tablestore is the master dataset; Blink pushes SQL to Tablestore to compute the batch view and writes it back.
Streaming layer: Blink reads real‑time data from Tablestore via TunnelService; Kappa‑style back‑fill can re‑process stored data.
Serving layer: Global secondary and multi‑model indexes enable low‑latency ad‑hoc queries on both batch and stream views.
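A secondary index lets the serving layer answer queries on a non‑primary attribute without scanning the whole view. The minimal Python sketch below keeps an in‑memory index from one attribute value to primary keys and updates it on every upsert; `IndexedView` and the `city`/`clicks` attributes are hypothetical, and the sketch only approximates the idea behind a global secondary index, not Tablestore's implementation or its multi‑model (search) indexes.

```python
from collections import defaultdict


class IndexedView:
    """Hypothetical view keyed by primary key, plus a secondary index
    on one attribute (attribute value -> set of primary keys)."""

    def __init__(self, indexed_attr: str):
        self.indexed_attr = indexed_attr
        self.rows: dict[str, dict] = {}
        self.index: defaultdict = defaultdict(set)

    def upsert(self, pk: str, row: dict) -> None:
        old = self.rows.get(pk)
        if old is not None:
            # Keep the index consistent when the indexed attribute changes.
            self.index[old[self.indexed_attr]].discard(pk)
        self.rows[pk] = row
        self.index[row[self.indexed_attr]].add(pk)

    def query(self, value) -> list[dict]:
        """Look up rows by the indexed attribute via the index, not a scan."""
        return [self.rows[pk] for pk in sorted(self.index.get(value, ()))]


users = IndexedView("city")
users.upsert("u1", {"city": "Hangzhou", "clicks": 3})
users.upsert("u2", {"city": "Beijing", "clicks": 5})
users.upsert("u1", {"city": "Beijing", "clicks": 4})  # u1 moves to Beijing
print(users.query("Beijing"))
```

The extra work happens at write time (maintaining the index), which is the trade‑off that makes low‑latency ad‑hoc reads possible on both batch and stream views.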
How Lambda Plus Solves the Four Lambda Issues
Data is written only to Tablestore, and Blink reads real‑time data from it directly, eliminating both the double write and the separate message queue.
Tablestore’s low‑latency reads/writes and built‑in indexing keep storage costs controllable while supporting high throughput.
Blink provides a unified stream‑batch engine, simplifying code development.
Tablestore’s indexing capabilities give flexible query options for the serving layer.
Tablestore’s Full‑Feature Support
High‑concurrency, low‑latency storage that scales TPS horizontally for batch and back‑fill workloads.
Tunnel Service (TunnelService) enables ordered, streaming consumption of both historical and real‑time data without an external message queue.
Secondary and multi‑model indexes allow ad‑hoc queries and aggregations directly on stored views.
Applicable Scenarios for Lambda Plus
Lambda Plus suits big‑data analytics on distributed NoSQL stores at TB scale, such as IoT telemetry, time‑series logs, web‑crawler data, and user‑behavior data. Example: a big‑data sentiment analysis system (diagram omitted).
References
https://yq.aliyun.com/articles/704171
http://lambda-architecture.net
http://shop.oreilly.com/product/0636920032175.do
https://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry
https://www.oreilly.com/ideas/questioning-the-lambda-architecture
http://milinda.pathirage.org/kappa-architecture.com/
https://eng.uber.com/hoodie/
https://eng.uber.com/uber-big-data-platform/
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.