Streaming 102: The World Beyond Batch

This article builds on Streaming 101 with a closer look at data processing patterns for unbounded data. It covers windowing, watermarks, triggers, and accumulation modes, and what they mean in practice for building robust, low‑latency streaming pipelines.

Byte Quality Assurance Team

The article first revisits the three core topics from Streaming 101 (precise terminology, the comparison between batch and stream processing, and data processing patterns for bounded and unbounded data) before moving on to more advanced streaming concepts.

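It begins by reviewing the distinction between event time (when an event actually occurred) and processing time (when the system observes it), and introduces windowing as a way to slice an unbounded data stream into finite chunks along temporal boundaries. Three window types are described: fixed, sliding, and session windows.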
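The three window types can be illustrated with a minimal Python sketch. This is not code from the article: the function names, the integer-second timestamps, and the half-open `[start, end)` window representation are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumption: a window is a half-open interval [start, end) in event-time seconds.
Window = Tuple[int, int]


def fixed_window(ts: int, size: int) -> Window:
    """Return the single fixed (tumbling) window that contains ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, period: int) -> List[Window]:
    """Return every window of length `size`, starting every `period`, that contains ts."""
    windows = []
    start = ts - ts % period  # latest window start at or before ts
    while start > ts - size:  # window still covers ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps: List[int], gap: int) -> List[Window]:
    """Group timestamps into sessions; a new session starts after `gap` or more of inactivity."""
    sessions: List[Window] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # ts lands before the current session times out, so extend the session.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions


print(fixed_window(125, 60))
print(sliding_windows(125, 60, 30))
print(session_windows([10, 15, 100, 12], gap=30))
```

Note that a single element belongs to exactly one fixed window, to several sliding windows when the period is shorter than the window size, and to a session whose boundaries depend on neighbouring elements.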

Next, it adds three concepts that are essential for handling unbounded data: watermarks (a measure of input completeness with respect to event time), triggers (which decide when a window's results are emitted, based on watermark progress, processing‑time progress, element counts, or special markers in the data), and accumulation modes (discarding, accumulating, and accumulating with retraction).
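To make the interplay between watermarks and triggers concrete, here is a minimal sketch of a watermark-driven trigger over fixed windows. It is an illustration under stated assumptions, not the article's implementation: the watermark is a simple heuristic (largest event time seen so far minus a hypothetical `max_skew` bound), and data arriving after its window has fired is simply dropped.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Window = Tuple[int, int]  # half-open [start, end) in event time


def run_with_watermark(
    events: List[Tuple[int, int]], window_size: int, max_skew: int
) -> List[Tuple[Window, int]]:
    """Sum values per fixed window; emit a window once the watermark passes its end.

    events: (event_time, value) pairs in arrival (processing-time) order.
    max_skew: assumed bound on out-of-orderness used by the heuristic watermark.
    """
    sums: Dict[Window, int] = defaultdict(int)
    fired: Set[Window] = set()
    emitted: List[Tuple[Window, int]] = []
    max_event_time: Optional[int] = None

    for event_time, value in events:
        start = event_time - event_time % window_size
        window = (start, start + window_size)
        if window in fired:
            continue  # late data: the window already fired, so this sketch drops it
        sums[window] += value

        # Heuristic watermark: we assume nothing older than this will still arrive.
        max_event_time = event_time if max_event_time is None else max(max_event_time, event_time)
        watermark = max_event_time - max_skew

        # Watermark trigger: fire every unfired window whose end the watermark has reached.
        for w in sorted(sums):
            if w not in fired and w[1] <= watermark:
                fired.add(w)
                emitted.append((w, sums[w]))
    return emitted


events = [(1, 5), (4, 7), (12, 3), (8, 4), (17, 1), (3, 9), (25, 2)]
print(run_with_watermark(events, window_size=10, max_skew=5))
```

In this run the element `(3, 9)` arrives after window `(0, 10)` has already fired, so it is lost. That is the weakness of a heuristic watermark used on its own, and the reason early and late firings exist.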

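The article then frames streaming pipelines around four questions, "What", "Where", "When", and "How":

What: which results are calculated. This is determined by the transformations in the pipeline, such as sums, histograms, or model training.

Where: where in event time results are calculated. This is determined by windowing (fixed, sliding, or session).

When: when in processing time results are materialized. This is determined by watermarks and triggers, which allow early and late outputs in addition to the on‑time result.

How: how refinements of results relate. This is determined by the accumulation mode, which specifies how successive panes of a window relate: discarding previous state, accumulating it, or accumulating it and retracting prior results.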
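The three accumulation modes are easiest to compare on a single window that fires more than once. The sketch below is a simplified model, assuming the "What" is an integer sum and that each inner list holds the new elements that arrived before one trigger firing. The function names are illustrative only.

```python
from typing import List


def discarding(pane_inputs: List[List[int]]) -> List[List[int]]:
    """Each pane reports only the elements that arrived since the previous firing."""
    return [[sum(pane)] for pane in pane_inputs]


def accumulating(pane_inputs: List[List[int]]) -> List[List[int]]:
    """Each pane reports the running total for the window so far."""
    out, total = [], 0
    for pane in pane_inputs:
        total += sum(pane)
        out.append([total])
    return out


def accumulating_and_retracting(pane_inputs: List[List[int]]) -> List[List[int]]:
    """Each pane reports the running total plus a retraction (negation) of the previous total."""
    out, total, previous = [], 0, None
    for pane in pane_inputs:
        total += sum(pane)
        pane_output = [total]
        if previous is not None:
            pane_output.append(-previous)  # retract the value emitted by the last pane
        out.append(pane_output)
        previous = total
    return out


panes = [[3, 4], [8]]  # two firings of the same window
print(discarding(panes))                   # [[7], [8]]
print(accumulating(panes))                 # [[7], [15]]
print(accumulating_and_retracting(panes))  # [[7], [15, -7]]
```

A downstream consumer that sums everything it receives gets the correct total of 15 from the discarding and retracting outputs, but overcounts (22) if it naively sums the accumulating output. That difference is why the choice of mode depends on what the consumer does with the panes.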

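Finally, the article discusses allowed lateness and garbage collection. Allowed lateness places a bound, relative to the watermark, on how late data may arrive and still be processed. Once the watermark has passed the end of a window by more than that bound, the window's state can be discarded, which limits the lifetime of window state and prevents resource exhaustion in a long‑running pipeline.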
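The garbage-collection rule can be sketched as a small state store. This is a hypothetical illustration rather than any real framework's API: `WindowStore`, its method names, and the choice to treat a window as expired when `watermark >= window_end + allowed_lateness` are all assumptions.

```python
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # half-open [start, end) in event time


class WindowStore:
    """Per-window running sums with state lifetime bounded by allowed lateness."""

    def __init__(self, window_size: int, allowed_lateness: int) -> None:
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.state: Dict[Window, int] = {}
        self.dropped = 0  # count of elements that arrived too late to be processed

    def _expired(self, window: Window, watermark: int) -> bool:
        # Assumed rule: state is dead once the watermark reaches end + allowed lateness.
        return watermark >= window[1] + self.allowed_lateness

    def add(self, event_time: int, value: int, watermark: int) -> bool:
        """Add a value to its window; return False if the window has already expired."""
        start = event_time - event_time % self.window_size
        window = (start, start + self.window_size)
        if self._expired(window, watermark):
            self.dropped += 1
            return False
        self.state[window] = self.state.get(window, 0) + value
        return True

    def collect_garbage(self, watermark: int) -> List[Window]:
        """Delete state for expired windows and return the windows that were removed."""
        expired = sorted(w for w in self.state if self._expired(w, watermark))
        for w in expired:
            del self.state[w]
        return expired


store = WindowStore(window_size=10, allowed_lateness=5)
store.add(3, 1, watermark=0)    # on time
store.add(4, 2, watermark=12)   # late, but still within allowed lateness
print(store.collect_garbage(watermark=15))  # window (0, 10) is now expired
print(store.add(5, 1, watermark=15))        # too late: dropped
```

Because lateness is measured against the watermark rather than the wall clock, a stalled pipeline does not cause windows to be collected prematurely.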

In summary, the piece equips readers with the principles and tools needed to design robust, low‑latency streaming systems that go beyond traditional batch processing.
