Tencent Cloud Oceanus: Flink SQL Optimization and Extension Practices

Tencent Cloud Developer

This article introduces Oceanus, Tencent Cloud's real-time computing service, and the team's work on optimizing and extending Flink SQL. Oceanus currently serves internal clients (WeChat, QQ, QQ Music, Tencent Video) as well as external clients (Bilibili, Dingdong Maicai). It has scaled to over 30,000 cores, with daily data intake exceeding 5 PB and more than 500,000 real-time computing jobs.

The service provides a one-stop development platform that integrates testing, deployment to TKE (Tencent Kubernetes Engine) containers, and comprehensive operations tooling. Its ecosystem connects to the major big data components as well as other Tencent Cloud services.

Flink SQL Pain Points: The article identifies three main challenges: (1) SQL syntax support: Flink SQL does not yet cover all standard SQL features, and its syntax is inconsistent across versions; (2) Functional coverage: SQL cannot express everything the DataStream/DataSet APIs can, and complex DAGs are hard to express in SQL; (3) Operational support: there is limited room to optimize business logic, and problems are hard to diagnose because the jobs translated from SQL are opaque to users.

Optimization Solutions: The team implemented several key enhancements:

(1) Table-Valued Function (TVF) syntax based on the SQL:2016 standard. It allows window operations directly in the FROM clause and supports multi-stream JOINs with window semantics.

(2) Incremental windows with custom triggers, for real-time metrics such as PV (page view) curves.

(3) Enhanced Tumble windows for handling late-arriving data.

(4) Retraction optimization through caching. This reduces downstream data volume by up to 30x for idempotent sinks and improves performance by about 20% in Inner Join scenarios.
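
The ideas behind items (2) and (4) can be illustrated with a small offline simulation. Everything in the sketch is hypothetical and is not Oceanus's implementation or any Flink API. That includes the function names `incremental_pv`, `changelog`, and `cached_upserts`, the `(kind, key, value)` record shape, and the flush-every-N policy. It assumes an append-only count aggregation (no deletes) and a sink that upserts by key.

The first function mimics a tumbling window whose custom trigger fires every `step` time units and reports the running PV total. The second and third show how buffering the latest value per key lets an idempotent sink skip retraction records and intermediate updates.

```python
from collections import defaultdict

# Hypothetical sketch, not Oceanus or Flink code: a plain-Python simulation
# of two ideas from the article, using assumed names and record shapes.


def incremental_pv(timestamps, window_size, step):
    """Incremental window: a tumbling window of `window_size` whose
    (assumed) custom trigger fires every `step` time units, emitting the
    cumulative page-view count so far. Returns (window_start, fire_time, pv)."""
    assert window_size % step == 0, "step must divide the window size"
    buckets_per_window = window_size // step
    # Per-window list of event counts, one slot per trigger interval
    counts = defaultdict(lambda: [0] * buckets_per_window)
    for ts in timestamps:
        start = ts - ts % window_size
        counts[start][(ts - start) // step] += 1
    results = []
    for start in sorted(counts):
        running = 0
        for k, c in enumerate(counts[start]):
            running += c  # each firing reports the running total, not the delta
            results.append((start, start + (k + 1) * step, running))
    return results


def changelog(keys):
    """Naive retraction stream of a running count per key: the first event
    emits +I, every later event emits -U (old value) then +U (new value)."""
    current = {}
    out = []
    for key in keys:
        if key in current:
            out.append(("-U", key, current[key]))
            current[key] += 1
            out.append(("+U", key, current[key]))
        else:
            current[key] = 1
            out.append(("+I", key, 1))
    return out


def cached_upserts(records, flush_every):
    """Hypothetical retraction optimization for an idempotent (upsert-by-key)
    sink: buffer only the latest value per key and emit one UPSERT per key
    on each flush. -U records are dropped because the sink overwrites by key.
    Assumes the stream contains no deletes."""
    cache = {}
    out = []
    for i, (kind, key, value) in enumerate(records, start=1):
        if kind in ("+I", "+U"):
            cache[key] = value  # a newer value replaces the buffered one
        if i % flush_every == 0:  # assumed flush policy: every N input records
            out.extend(("UPSERT", k, v) for k, v in cache.items())
            cache.clear()
    out.extend(("UPSERT", k, v) for k, v in cache.items())  # final flush
    return out
```

Consider `changelog(["a"] * 10 + ["b"] * 5)`. It produces 28 records, while `cached_upserts(..., flush_every=28)` forwards only `("UPSERT", "a", 10)` and `("UPSERT", "b", 5)`. In this sketch, the reduction depends on how many updates per key arrive within one flush interval. The price is extra latency and buffered state.

Stock Flink addresses related problems with mini-batch aggregation (`table.exec.mini-batch.enabled`) and the `CUMULATE` window TVF.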
