
How to Handle Data Delay in Flink: Watermarks, Late Events, and Window Strategies

This article explains how out‑of‑order events turn into late data in Flink and outlines the impact on result accuracy and timeliness. It identifies root causes such as network latency and watermark misconfiguration, then walks through concrete watermark and allowed‑lateness settings and the step‑by‑step window‑triggering procedure, with examples.

JavaEdge

1 Introduction

In streaming systems such as Apache Flink, events may arrive out of order. A delayed event (e.g., Data‑1) has an earlier event time than events such as Data‑4 and Data‑5, yet it is received after them. The event is not lost; it is merely late.
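As a minimal illustration, the following plain-Java sketch (not Flink code) replays events in arrival order and flags every event whose event time is earlier than the largest event time seen so far. The `Event` class, the event names, and the millisecond timestamps are hypothetical, chosen to mirror the Data‑1/Data‑4/Data‑5 scenario.

```java
import java.util.ArrayList;
import java.util.List;

public class OutOfOrderDemo {

    /** Hypothetical event: a name plus its event time in milliseconds. */
    static final class Event {
        final String name;
        final long eventTime;

        Event(String name, long eventTime) {
            this.name = name;
            this.eventTime = eventTime;
        }
    }

    /**
     * Returns the names of events that arrived after an event with a later
     * event time, i.e. the out-of-order ("delayed") events.
     */
    static List<String> outOfOrder(List<Event> arrivalOrder) {
        List<String> delayed = new ArrayList<>();
        long maxSeen = Long.MIN_VALUE;
        for (Event e : arrivalOrder) {
            if (e.eventTime < maxSeen) {
                delayed.add(e.name); // happened earlier, but was received later
            } else {
                maxSeen = e.eventTime;
            }
        }
        return delayed;
    }

    public static void main(String[] args) {
        // Arrival order: Data-4 and Data-5 overtake Data-1 on the way in.
        List<Event> arrivals = List.of(
                new Event("Data-4", 4_000),
                new Event("Data-5", 5_000),
                new Event("Data-1", 1_000));
        System.out.println(outOfOrder(arrivals)); // prints [Data-1]
    }
}
```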

2 Impact of Data Delay

2.1 Incorrect Computation Results

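When the watermark advances faster than the events actually arrive (for example, because the out‑of‑order bound is set too small), a window can fire before all of its events have arrived. Events for that window that show up afterwards are either left out of the result or, if allowed lateness is configured, force an updated result, so the first aggregate emitted is inaccurate.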
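The effect of a watermark that runs ahead of the data can be shown with a small simulation. This is a plain-Java sketch under simplifying assumptions, not Flink's implementation: a single tumbling window, the watermark recomputed after every event as max(eventTime) minus a hypothetical `bound`, no allowed lateness, and made-up arrival times.

```java
import java.util.List;

public class EarlyCloseDemo {

    /**
     * Replays event times (ms, in arrival order) against one tumbling window
     * [windowStart, windowEnd). The watermark is maxSeen - bound, recomputed after
     * every event, and the window fires once watermark >= windowEnd. Window events
     * that arrive after it has fired are dropped (no allowed lateness here).
     * Returns how many events made it into the window.
     */
    static int countIncluded(List<Long> eventTimes, long windowStart, long windowEnd, long bound) {
        long maxSeen = Long.MIN_VALUE;
        boolean fired = false;
        int included = 0;
        for (long ts : eventTimes) {
            boolean inWindow = ts >= windowStart && ts < windowEnd;
            if (inWindow && !fired) {
                included++; // window has not fired yet: the event is counted
            }
            maxSeen = Math.max(maxSeen, ts);
            long watermark = maxSeen - bound;
            if (!fired && watermark >= windowEnd) {
                fired = true; // result emitted; later events for this window are lost
            }
        }
        return included;
    }

    public static void main(String[] args) {
        // Hypothetical arrival order: the 7 s event is overtaken by the 12 s event.
        List<Long> arrivals = List.of(2_000L, 9_000L, 12_000L, 7_000L);
        System.out.println(countIncluded(arrivals, 0, 10_000, 0));     // 2: window fired too early
        System.out.println(countIncluded(arrivals, 0, 10_000, 3_500)); // 3: the 7 s event is included
    }
}
```

With a bound of 0 the 12 s event pushes the watermark past 10 s and the window fires without the 7 s event; a 3.5 s bound keeps the watermark at 8.5 s, so the window is still collecting when the 7 s event arrives.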

2.2 Degraded Real‑time Responsiveness

Late events postpone the availability of complete results, reducing the timeliness of downstream decisions that depend on up‑to‑date analytics.

2.3 Potential Data Loss

If a window has already been finalized and a late event arrives after the allowed lateness period, the event cannot be re‑processed and is effectively lost.

3 Common Causes of Data Delay

Network transmission latency: congestion, packet loss, or retransmission.

Source‑side generation delay: slow database queries, sensor sampling intervals, or batch extraction.

Flink job processing bottlenecks: insufficient parallelism, CPU/memory pressure, or back‑pressure.

Improper watermark configuration: a watermark strategy that does not reflect the actual maximum out‑of‑order bound.

4 Mitigation Strategies

Use **event time** as the time basis for all window calculations.

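Define a watermark generation strategy that reflects the maximum expected out‑of‑order delay (e.g., watermark = max(eventTime) - maxOutOfOrderness). This bound is separate from allowed lateness: it holds back the watermark itself, whereas allowed lateness keeps already‑fired windows around afterwards. In Flink, WatermarkStrategy.forBoundedOutOfOrderness(...) provides such a strategy.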

Configure an **allowed lateness** interval so that windows remain open for a configurable period after the watermark passes the window end.
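The sketch below classifies an event for a window that ends at `windowEnd`, depending on the watermark at the moment the event arrives. It is a hedged model, not Flink code: the `Outcome` names and the lateness value are hypothetical, the watermark follows this article's max(eventTime) - delay convention, and the boundaries reflect our understanding of how Flink's window operator treats late elements when allowedLateness(...) is set.

```java
public class AllowedLatenessDemo {

    enum Outcome { ON_TIME, LATE_REFIRE, DROPPED }

    /**
     * Decides what happens to an event that belongs to a window ending at
     * windowEnd, given the current watermark and the allowed lateness (all ms).
     */
    static Outcome classify(long watermark, long windowEnd, long allowedLateness) {
        if (watermark < windowEnd) {
            return Outcome.ON_TIME;     // window has not fired yet
        }
        if (watermark < windowEnd + allowedLateness) {
            return Outcome.LATE_REFIRE; // window fired, state still kept: emit an updated result
        }
        return Outcome.DROPPED;         // window state purged: the event is discarded
    }

    public static void main(String[] args) {
        long windowEnd = 10_000;
        long allowedLateness = 2_000; // hypothetical value
        System.out.println(classify(9_000, windowEnd, allowedLateness));  // ON_TIME
        System.out.println(classify(11_000, windowEnd, allowedLateness)); // LATE_REFIRE
        System.out.println(classify(12_000, windowEnd, allowedLateness)); // DROPPED
    }
}
```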

5 Implementation Steps

Determine the window size that matches the business semantics (e.g., 10 seconds).

Choose the maximum out‑of‑order bound based on the observed maximum delay (e.g., 3.5 seconds).

Implement a watermark generator that emits watermark = maxObservedEventTime - maxOutOfOrderness.

Configure the window operator with the chosen size and, if needed, an allowed lateness for events that exceed the out‑of‑order bound.
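A minimal plain-Java sketch of the watermark-related steps (choosing the delay from observed data, then tracking the watermark) is shown below. All names and sample values are hypothetical. Flink ships a built-in equivalent, WatermarkStrategy.forBoundedOutOfOrderness(Duration), whose generator emits max - bound - 1 ms while its event-time trigger fires at windowEnd - 1 ms; the sketch uses the simpler max - bound form from this article, which leads to the same firing point.

```java
import java.util.List;

public class WatermarkGeneratorSketch {

    /** Largest amount by which an event trailed the max event time seen before it. */
    static long observedMaxOutOfOrderness(List<Long> arrivalOrderedEventTimes) {
        long maxSeen = Long.MIN_VALUE;
        long worst = 0;
        for (long ts : arrivalOrderedEventTimes) {
            if (ts < maxSeen) {
                worst = Math.max(worst, maxSeen - ts);
            }
            maxSeen = Math.max(maxSeen, ts);
        }
        return worst;
    }

    /** Tracks the max event time and derives watermark = maxSeen - maxOutOfOrderness. */
    static final class BoundedOutOfOrdernessGenerator {
        private final long maxOutOfOrderness;
        private long maxSeen = Long.MIN_VALUE;

        BoundedOutOfOrdernessGenerator(long maxOutOfOrderness) {
            this.maxOutOfOrderness = maxOutOfOrderness;
        }

        void onEvent(long eventTime) {
            maxSeen = Math.max(maxSeen, eventTime); // late events never move the watermark back
        }

        long currentWatermark() {
            // Before any event, report the lowest possible watermark (avoids underflow).
            return maxSeen == Long.MIN_VALUE ? Long.MIN_VALUE : maxSeen - maxOutOfOrderness;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample of event times (ms) in the order they arrived.
        List<Long> sample = List.of(2_000L, 9_000L, 5_500L, 12_000L, 8_500L);
        long bound = observedMaxOutOfOrderness(sample);
        System.out.println("observed max out-of-orderness = " + bound + " ms"); // 3500 ms

        BoundedOutOfOrdernessGenerator gen = new BoundedOutOfOrdernessGenerator(bound);
        gen.onEvent(8_000);
        System.out.println("watermark = " + gen.currentWatermark() + " ms"); // 4500 ms
    }
}
```

The maximum observed in a sample is only a lower bound on the real out-of-orderness, so some safety margin on top of it is probably advisable.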

5.1 Trigger Conditions for Window Computation

The watermark must reach the window’s end timestamp (watermark >= windowEnd). At that point, every event within the configured out‑of‑order bound is assumed to have arrived.

The window must contain at least one element. Flink creates a window only when its first element arrives, so an empty window never fires, even after the watermark has passed its end.
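The two trigger conditions can be sketched with lazily created windows: a window entry exists only after an element lands in it, so the second condition holds automatically. This is a simplified, hypothetical model in plain Java (tumbling windows, no allowed lateness, watermark >= windowEnd as the firing rule), not Flink's window operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LazyWindowTrigger {

    private final long windowSize;
    // Window start -> element count; an entry is created only when an element arrives.
    private final Map<Long, Integer> windows = new TreeMap<>();

    LazyWindowTrigger(long windowSize) {
        this.windowSize = windowSize;
    }

    /** Assigns the event to its tumbling window, creating the window on first use. */
    void onEvent(long eventTime) {
        long start = eventTime - Math.floorMod(eventTime, windowSize);
        windows.merge(start, 1, Integer::sum);
    }

    /** Emits and removes every existing window whose end the watermark has reached. */
    List<Long> onWatermark(long watermark) {
        List<Long> emitted = new ArrayList<>();
        windows.entrySet().removeIf(entry -> {
            boolean ready = watermark >= entry.getKey() + windowSize; // condition 1
            if (ready) {
                emitted.add(entry.getKey()); // condition 2 holds: the entry exists, so it has data
            }
            return ready;
        });
        return emitted;
    }

    public static void main(String[] args) {
        LazyWindowTrigger trigger = new LazyWindowTrigger(10_000);
        trigger.onEvent(3_000);  // creates window [0 s, 10 s)
        trigger.onEvent(25_000); // creates window [20 s, 30 s); [10 s, 20 s) is never created
        System.out.println(trigger.onWatermark(30_000)); // [0, 20000]
    }
}
```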

5.2 Concrete Example

Assume the following parameters:

Window size = 10 seconds (the window covers [0 s, 10 s))

Maximum out‑of‑order bound = 3.5 seconds

Watermark formula = max(eventTime) - maxOutOfOrderness

Trigger logic:

Compute the watermark after each incoming event.

If watermark >= windowEnd **and** the window holds data, emit the window result.

Illustration of three events:

Event 1: event time = 8 s. Watermark = 8 - 3.5 = 4.5 s, which is below the window end (10 s). No trigger.

Event 2: another out‑of‑order event. Its event time does not push max(eventTime) to 13.5 s or beyond, so the watermark stays below 10 s and the window is still not emitted.

Event 3: arrives with an event time of at least 13.5 s (e.g., 14 s gives a watermark of 10.5 s), so the watermark reaches 10 s. Both trigger conditions are satisfied and the window result is emitted.
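The three-event walk-through can be replayed with the sketch below, using a window [0 s, 10 s) and the example's 3.5 s delay in the watermark formula. Only the 8 s timestamp comes from the example; the 5 s time for Event 2 and the 14 s time for Event 3 are hypothetical values that match the described behavior, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowTriggerTrace {

    /**
     * Replays event times (ms, in arrival order) against the window [0, windowEnd).
     * Watermark = max(eventTime) - delay; the window is emitted once the
     * watermark reaches windowEnd and the window holds at least one element.
     */
    static List<String> trace(long[] eventTimes, long windowEnd, long delay) {
        List<String> log = new ArrayList<>();
        long maxSeen = Long.MIN_VALUE;
        int elementsInWindow = 0;
        boolean emitted = false;
        for (long ts : eventTimes) {
            if (ts >= 0 && ts < windowEnd && !emitted) {
                elementsInWindow++; // element belongs to [0, windowEnd)
            }
            maxSeen = Math.max(maxSeen, ts);
            long watermark = maxSeen - delay;
            boolean fire = !emitted && watermark >= windowEnd && elementsInWindow > 0;
            if (fire) {
                emitted = true;
            }
            log.add("event=" + ts + " watermark=" + watermark
                    + (fire ? " -> emit window (" + elementsInWindow + " elements)" : " -> wait"));
        }
        return log;
    }

    public static void main(String[] args) {
        // Event 1 = 8 s (from the example); Events 2 and 3 use hypothetical times.
        trace(new long[] {8_000, 5_000, 14_000}, 10_000, 3_500).forEach(System.out::println);
    }
}
```

Running it prints `event=8000 watermark=4500 -> wait`, `event=5000 watermark=4500 -> wait`, and `event=14000 watermark=10500 -> emit window (2 elements)`. The late 5 s event leaves the watermark unchanged because the watermark only follows the maximum event time.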

5.3 Handling Extremely Late Data

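If an event arrives after its window's allowed lateness period has expired (watermark >= windowEnd + allowed lateness), the window state has already been purged and the event cannot be processed by the original window. Typical handling approaches are: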

Collect such events in a side output for later batch processing (in Flink, via sideOutputLateData on the windowed stream).

Discard them if they are not critical to the application.
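A hedged plain-Java sketch of the side-output approach follows: events whose window has already been purged are routed to a separate list instead of the main output. The class, field names, window size, and allowed lateness are hypothetical, and the watermark is passed in directly; in Flink itself the equivalent routing is done by sideOutputLateData with an OutputTag.

```java
import java.util.ArrayList;
import java.util.List;

public class LateDataRouter {

    private final long windowSize;
    private final long allowedLateness;
    final List<Long> mainOutput = new ArrayList<>(); // still accepted by their window
    final List<Long> lateOutput = new ArrayList<>(); // too late: kept for later batch processing

    LateDataRouter(long windowSize, long allowedLateness) {
        this.windowSize = windowSize;
        this.allowedLateness = allowedLateness;
    }

    /** Routes one event (ms), given the watermark current at its arrival. */
    void route(long eventTime, long watermark) {
        long windowEnd = eventTime - Math.floorMod(eventTime, windowSize) + windowSize;
        if (watermark >= windowEnd + allowedLateness) {
            lateOutput.add(eventTime); // window state already purged
        } else {
            mainOutput.add(eventTime);
        }
    }

    public static void main(String[] args) {
        LateDataRouter router = new LateDataRouter(10_000, 2_000); // hypothetical values
        router.route(9_000, 13_000);  // window [0 s, 10 s) expired once the watermark hit 12 s
        router.route(11_000, 13_000); // window [10 s, 20 s) is still open
        System.out.println("main = " + router.mainOutput + ", late = " + router.lateOutput);
        // main = [11000], late = [9000]
    }
}
```

Discarding instead of collecting amounts to skipping the `lateOutput.add` call.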

Written by

JavaEdge

First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
