Tencent Cloud Oceanus: Flink SQL Optimization and Extension Practices

Tencent Cloud Oceanus is a real-time computing service used by internal products such as WeChat and by external customers such as Bilibili. It has scaled to over 30,000 cores, more than 5 PB of daily data intake, and over 500,000 jobs. To address Flink SQL's gaps in syntax, function coverage, and operational support, the team added table-valued functions, incremental and enhanced tumble windows, and cache-based retraction optimization, which cuts downstream data volume by up to 30× and improves inner join performance by about 20%.

Tencent Cloud Developer

May 21, 2021

This article introduces Tencent Cloud's real-time computing service Oceanus and its Flink SQL optimization work. Oceanus currently serves both internal clients (WeChat, QQ, QQ Music, Tencent Video) and external clients (Bilibili, Dingdong Maicai). It has scaled to over 30,000 cores, with daily data intake exceeding 5 PB and more than 500,000 real-time computation jobs.

The service provides a one-stop development platform with integrated testing, deployment to TKE containers, and comprehensive operational tools. The ecosystem supports connectivity with major big data components and Tencent Cloud services.

Flink SQL Pain Points: The article identifies three main challenges: (1) SQL syntax support: Flink SQL does not yet cover all standard SQL features, and its syntax is inconsistent across versions; (2) Function coverage: SQL cannot express everything the DataStream/DataSet APIs can, and complex DAGs are difficult to write in SQL; (3) Operational support: there is limited room to optimize for business logic, and problems are hard to diagnose because the job that SQL is translated into is opaque to the user.

Optimization Solutions: The team implemented several key enhancements: (1) Table-valued function syntax based on the SQL:2016 standard, which allows window operations directly in the FROM clause and supports multi-stream JOINs with window semantics; (2) Incremental windows with custom triggers for real-time metrics such as PV curves; (3) Enhanced tumble windows for handling late-arriving data; (4) Retraction optimization through caching, which reduces downstream data volume by up to 30× for idempotent sinks and improves performance by about 20% in inner join scenarios.
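The retraction optimization in item (4) can be illustrated with a small sketch. The summary only says that caching is used, so everything below is an assumption for illustration rather than the Oceanus implementation: the names Change and RetractionCache are hypothetical, and the model assumes an idempotent upsert sink, where a retract followed by an accumulate for the same key is equivalent to overwriting the key. Under that assumption, a cache can hold only the latest state per key and drop the intermediate retract/accumulate pairs when it flushes.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    """One changelog record (hypothetical model, not a Flink class).

    is_add=True is an accumulate (+) message, is_add=False is a retract (-).
    """
    key: Hashable
    value: int
    is_add: bool


class RetractionCache:
    """Buffers changelog records and keeps only the latest state per key.

    Assumes the downstream sink is idempotent (upsert by key), so the
    intermediate retract/accumulate pairs do not need to be forwarded.
    """

    def __init__(self) -> None:
        # key -> latest value, or None if the last message was a retract
        self._latest: Dict[Hashable, Optional[int]] = {}

    def add(self, change: Change) -> None:
        # An accumulate overwrites the buffered state; a retract leaves a
        # tombstone that becomes a delete if no accumulate follows it.
        self._latest[change.key] = change.value if change.is_add else None

    def flush(self) -> List[Tuple[str, Hashable, Optional[int]]]:
        """Emit one message per buffered key, then clear the buffer."""
        out: List[Tuple[str, Hashable, Optional[int]]] = []
        for key, value in self._latest.items():
            if value is None:
                out.append(("delete", key, None))
            else:
                out.append(("upsert", key, value))
        self._latest.clear()
        return out


# Usage: a running count for key "a" is updated three times, and key "b"
# is added and then retracted, for seven changelog messages in total.
cache = RetractionCache()
stream = [
    Change("a", 1, True), Change("a", 1, False),
    Change("a", 2, True), Change("a", 2, False),
    Change("a", 3, True),
    Change("b", 1, True), Change("b", 1, False),
]
for change in stream:
    cache.add(change)
flushed = cache.flush()
```

In this toy run seven input messages collapse into two output messages, one upsert for "a" and one delete for "b". The actual reduction depends on how often each key is updated within one flush interval, and the trade-off is added latency of up to one flush interval.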

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
