Tagged articles
9 articles
Page 1 of 1
Big Data Technology & Architecture
Big Data Technology & Architecture
Jan 2, 2020 · Big Data

Structured Streaming: Design, Challenges, Programming Model, and Performance Evaluation

This article provides a comprehensive overview of Apache Spark Structured Streaming, describing its declarative API, the challenges of stream processing, the programming model with code examples, query planning, execution modes, production use cases, and performance benchmarks compared with other streaming systems.

Big DataSparkStreaming
0 likes · 42 min read
Structured Streaming: Design, Challenges, Programming Model, and Performance Evaluation
Big Data Technology & Architecture
Big Data Technology & Architecture
Jun 19, 2019 · Big Data

Understanding Spark Structured Streaming StateStore: Architecture, Operations, and Fault Recovery

This article explains the design and implementation of Spark Structured Streaming's StateStore module, covering its distributed architecture, state sharding, versioning, batch read/write, migration, update/query APIs, maintenance compaction, and fault‑tolerance mechanisms that enable incremental continuous queries with exactly‑once guarantees.

Big DataSparkStateStore
0 likes · 8 min read
Understanding Spark Structured Streaming StateStore: Architecture, Operations, and Fault Recovery
Big Data Technology & Architecture
Big Data Technology & Architecture
Jun 5, 2019 · Big Data

Real-Time Advertising Click Counting with Spark Structured Streaming and Redis Streams

This article presents a complete solution for real‑time advertising click counting using Spark Structured Streaming combined with Redis Streams, detailing the business scenario, data flow, input/output formats, and step‑by‑step implementation including data extraction, processing, storage, and query via Spark‑SQL.

Big DataRedis StreamScala
0 likes · 11 min read
Real-Time Advertising Click Counting with Spark Structured Streaming and Redis Streams
Big Data Technology & Architecture
Big Data Technology & Architecture
Jan 1, 2019 · Big Data

Insights from the Real-Time Big Data Meetup: Spark Structured Streaming Overview

The meetup on September 8, co‑hosted by InfoQ and Huawei Cloud, featured Databricks engineer Tathagata Das explaining Spark Structured Streaming’s concepts, fault‑tolerance, performance, event‑time handling, and real‑world use cases such as Apple’s security platform, highlighting its scalability and integration with various data sources.

Big DataSparkStructured Streaming
0 likes · 8 min read
Insights from the Real-Time Big Data Meetup: Spark Structured Streaming Overview
Qunar Tech Salon
Qunar Tech Salon
Mar 9, 2018 · Big Data

New Features in Apache Spark 2.3: Continuous Streaming, Kubernetes Scheduler, Pandas UDFs, and MLlib Enhancements

Apache Spark 2.3 introduces major upgrades such as millisecond‑latency continuous streaming, stream‑to‑stream joins, a native Kubernetes scheduler backend, accelerated Pandas UDFs, and several MLlib improvements, all aimed at making big‑data processing faster, easier, and smarter.

Apache SparkBig DataContinuous Processing
0 likes · 7 min read
New Features in Apache Spark 2.3: Continuous Streaming, Kubernetes Scheduler, Pandas UDFs, and MLlib Enhancements
High Availability Architecture
High Availability Architecture
May 19, 2016 · Big Data

Comprehensive Overview of Apache Spark: Architecture, RDD Principles, Execution Modes, and Spark 2.0 Features

This article provides an in‑depth technical overview of Apache Spark, covering its core concepts such as RDDs, transformation and action operations, execution models, Spark 2.0 enhancements like unified DataFrames/Datasets, whole‑stage code generation, Structured Streaming, and practical performance‑tuning guidance.

DataFramesPerformance OptimizationRDD
0 likes · 20 min read
Comprehensive Overview of Apache Spark: Architecture, RDD Principles, Execution Modes, and Spark 2.0 Features