
Master Flink Optimizations: TTL, Mini‑Batch, Two‑Phase Aggregation, Lookup Join & More

This article reviews the most effective Flink optimization techniques since 2022, including operator‑level TTL, mini‑batch processing, two‑phase aggregation, multi‑dimensional DISTINCT with FILTER, lookup join caching strategies, and TopN implementations, each rated with recommendation stars for production use.

Big Data Technology & Architecture

Since its inception, Flink has evolved through several major versions, and its optimization methods have continuously advanced. Based on community-shared documents from 2022 onward, we summarize the most commonly used optimizations and rate them with recommendation stars for practical production guidance.

Operator‑Level TTL Setting (⭐️⭐️⭐️)

Starting from Flink 1.18, the Table API & SQL support fine‑grained, operator‑level state TTL configuration to reduce state size; on some cloud platforms this can also be set via SQL hints. For example, a job with a sorting operator and an aggregation operator can keep the job‑level state TTL at 4 h while assigning a 24 h TTL to the sorting operator to prevent disorder.
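As a minimal sketch, the operator‑level TTL can be expressed with the STATE_TTL join hint introduced in Flink 1.18 (table and column names below are illustrative, not from the original article):

```sql
-- Sketch: per-input state TTL via the STATE_TTL hint (Flink 1.18+).
-- 'orders' and 'payments' are hypothetical table aliases.
SELECT /*+ STATE_TTL('o' = '24h', 'p' = '4h') */
       o.order_id,
       p.pay_amount
FROM orders AS o
JOIN payments AS p
  ON o.order_id = p.order_id;
```

The hint overrides the job‑wide `table.exec.state.ttl` setting for the named inputs only, so long‑retention state can be limited to the operators that actually need it.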

Mini‑Batch & Two‑Phase Aggregation (⭐️⭐️⭐️⭐️⭐️)

When latency requirements are modest (e.g., minute‑level updates), enabling mini‑batch reduces state access and update frequency. Suitable scenarios include:

The job has no strict latency requirement (mini‑batch adds delay);
Aggregate‑operator state access is a bottleneck;
Downstream operators have limited processing capacity (mini‑batch reduces output volume).

Key configuration:

table.exec.mini-batch.enabled: true
table.exec.mini-batch.allow-latency: 5s   # user‑defined
table.exec.mini-batch.size: 2000          # user‑defined

Note that mini‑batch mainly benefits state‑intensive operators (e.g., aggregation, join) and may increase memory usage. It can also conflict with checkpointing, causing duplicate computation, and changes the DAG, potentially leading to state incompatibility.

Flink SQL’s two‑phase (local/global) aggregation mitigates data skew. Enable it, together with distinct‑aggregation splitting, via:

'table.optimizer.agg-phase-strategy' = 'TWO_PHASE';
'table.optimizer.distinct-agg.split.enabled' = 'true';
'table.optimizer.distinct-agg.split.bucket-num' = '2048';  -- increase bucket count as needed

When the execution plan contains both LocalAggregate and GlobalAggregate, the two‑phase aggregation is active.
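A quick way to check is to print the plan with EXPLAIN (table and column names here are illustrative; the exact operator names vary by Flink version, e.g. LocalGroupAggregate / GlobalGroupAggregate in recent streaming plans):

```sql
-- Sketch: verify two-phase aggregation by inspecting the optimized plan.
EXPLAIN
SELECT user_id, COUNT(DISTINCT item_id) AS uv
FROM clicks
GROUP BY user_id;
-- Look for a local aggregate node followed by a global aggregate node.
```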

Multi‑Dimensional DISTINCT Using FILTER (⭐️⭐️)

Instead of multiple CASE WHEN clauses, FILTER can simplify distinct counts across dimensions, allowing Flink to share a single state instance and reduce state size and access. Improper use may cause data hotspots, so apply cautiously.

SELECT a,
       COUNT(DISTINCT b) AS b,
       COUNT(DISTINCT b) FILTER (WHERE c IN ('A', 'B')) AS aa,
       COUNT(DISTINCT b) FILTER (WHERE c IN ('C', 'D')) AS bb
FROM T
GROUP BY a;

Lookup Join Optimization (⭐️⭐️⭐️⭐️⭐️)

Flink offers three cache strategies for lookup joins:

Full Caching – cache all data in memory (suitable for small datasets).

Partial Caching – use an LRU cache for large datasets.

No Caching – disable caching.

Full and Partial caching are highly recommended; configuring appropriate TTL and size can greatly reduce pressure on dimension tables.
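As a sketch, partial (LRU) caching can be declared on the dimension table's DDL; the option names below follow the JDBC connector (Flink 1.16+) and are connector‑dependent, and all table names and the URL are placeholders:

```sql
-- Hypothetical dimension table with partial (LRU) lookup caching.
CREATE TABLE dim_users (
  user_id BIGINT,
  city    STRING
) WITH (
  'connector'  = 'jdbc',
  'url'        = 'jdbc:mysql://localhost:3306/db',  -- placeholder URL
  'table-name' = 'users',
  'lookup.cache' = 'PARTIAL',
  'lookup.partial-cache.max-rows' = '10000',
  'lookup.partial-cache.expire-after-write' = '1h'
);

-- Lookup join against the cached dimension table.
SELECT o.order_id, u.city
FROM orders AS o
JOIN dim_users FOR SYSTEM_TIME AS OF o.proc_time AS u
  ON o.user_id = u.user_id;
```

With the cache in place, repeated lookups for hot keys are served from memory instead of hitting the dimension store on every row.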

TopN Optimization (⭐️⭐️⭐️⭐️⭐️)

Flink implements TopN via ROW_NUMBER() in an OVER window, with three physical implementations: AppendRank, UpdateFastRank, and RetractRank, whose performance decreases in that order. Key considerations:

Avoid outputting the row_number field itself to reduce downstream data volume; the TopN operator also supports mini‑batch, which can be combined with this optimization.

Process row_number at the earliest ODS layer to minimize disorder and downstream load.
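The points above can be sketched as follows (table and column names are illustrative); note the outer SELECT omits the rn column, which is what allows Flink to skip emitting ranking updates:

```sql
-- Sketch: Top-3 items per category without outputting the rank itself.
SELECT category, item_id, sales
FROM (
  SELECT category, item_id, sales,
         ROW_NUMBER() OVER (PARTITION BY category
                            ORDER BY sales DESC) AS rn
  FROM item_sales
)
WHERE rn <= 3;
```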

Other Optimizations

Additional tweaks include primary‑key optimization for dual‑stream joins, adjusting multi‑stream join order to mitigate state explosion, and DAG sub‑graph reuse. Their impact varies and should be evaluated per workload.

Written by Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.