How Flink SQL Powered Real‑Time Learning Analytics at Zuoyebang

Zuoyebang's big-data team describes how it moved from Spark Streaming to a real-time platform built around Flink SQL. The article covers the three stages of that evolution, the DAG optimization problems the team hit, the Redis-based table design, and the platform features for unified deployment, ease of use, and operational governance.

Zuoyebang Tech Team

May 9, 2022

Development History

Zuoyebang uses AI and big‑data technologies to provide efficient learning solutions. Data from student attendance and knowledge mastery is collected, written to Kafka, and processed by both real‑time and batch jobs before being served via OLAP‑based tools.

The real‑time computing stack is primarily based on Flink and has gone through three stages:

2019: A small number of Spark Streaming jobs were in use, but development efficiency was low and data reuse was poor.

2020: Flink JARs were gradually adopted, then Flink SQL was introduced. By the end of 2020, over 90% of real‑time jobs were implemented with Flink SQL.

Nov 2020: Hundreds of Flink jobs were deployed across multiple cloud clusters, taking the real-time platform from nothing to a working production system.

Flink SQL Practice

The complete data-flow architecture based on Flink SQL starts with binlog and log ingestion into Kafka. The ingested data is automatically registered as a table in the metadata store, so users can query it directly in SQL jobs without writing complex DDL.
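To make the "no hand-written DDL" point concrete, here is a minimal sketch of how a registration step could render a Flink SQL table definition from topic metadata. The TopicMeta record, the render_kafka_ddl helper, the topic name, and the column list are all hypothetical; the article does not describe how the platform actually stores or generates this metadata. Only the connector option keys follow the standard Flink Kafka SQL connector.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TopicMeta:
    """Hypothetical metadata record captured when a topic is ingested."""
    topic: str
    columns: List[Tuple[str, str]]  # (column name, Flink SQL type)
    bootstrap_servers: str
    fmt: str = "json"


def render_kafka_ddl(meta: TopicMeta, table_name: str) -> str:
    """Render the CREATE TABLE statement a user would otherwise write by hand."""
    cols = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in meta.columns)
    options = {
        "connector": "kafka",
        "topic": meta.topic,
        "properties.bootstrap.servers": meta.bootstrap_servers,
        "format": meta.fmt,
    }
    with_clause = ",\n".join(f"  '{k}' = '{v}'" for k, v in options.items())
    return f"CREATE TABLE `{table_name}` (\n{cols}\n) WITH (\n{with_clause}\n)"


# Example values are invented for illustration only.
meta = TopicMeta(
    topic="lesson_attendance_log",
    columns=[("uid", "BIGINT"), ("lesson_id", "STRING"), ("ts", "TIMESTAMP(3)")],
    bootstrap_servers="kafka:9092",
)
ddl = render_kafka_ddl(meta, "lesson_attendance")
print(ddl)
```

With a step like this running at ingestion time, a SQL job can refer to lesson_attendance by name and never see the connector options.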

Two typical problems were encountered when adding trace capabilities to SQL jobs:

SQL DAGs were split into unrelated parts, causing duplicate source reads and multiple UDF invocations.

Performance bottlenecks appeared, both in the load placed on the sources and in computation.

The fix was to merge the transformations into a single StreamGraph. The resulting DAG matches the logical SQL plan, which removes the duplicated work and improves performance.
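The effect of merging can be illustrated with a toy model. This is not Flink's planner and not the team's actual change; the plan, node names, and both translate functions are hypothetical. It only shows why translating each sink separately duplicates the shared source and view, while translating everything into one graph builds each node once.

```python
from typing import Dict, List

# Hypothetical logical plan: each node name maps to the names of its inputs.
PLAN: Dict[str, List[str]] = {
    "kafka_source": [],
    "prepare_data": ["kafka_source"],  # a view that calls a UDF
    "sink_a": ["prepare_data"],
    "sink_b": ["prepare_data"],
}
SINKS = ["sink_a", "sink_b"]


def translate_separately(plan: Dict[str, List[str]], sinks: List[str]) -> List[str]:
    """Translate each sink on its own; shared upstream nodes are rebuilt per sink."""
    operators: List[str] = []

    def build(node: str, prefix: str) -> None:
        for parent in plan[node]:
            build(parent, prefix)
        operators.append(f"{prefix}/{node}")

    for sink in sinks:
        build(sink, sink)
    return operators


def translate_merged(plan: Dict[str, List[str]], sinks: List[str]) -> List[str]:
    """Translate all sinks into one graph, reusing nodes that are already built."""
    built: Dict[str, str] = {}

    def build(node: str) -> None:
        if node in built:
            return  # already part of the graph, so reuse it
        for parent in plan[node]:
            build(parent)
        built[node] = node

    for sink in sinks:
        build(sink)
    return list(built)


separate_ops = translate_separately(PLAN, SINKS)
merged_ops = translate_merged(PLAN, SINKS)
separate_sources = sum(op.endswith("kafka_source") for op in separate_ops)
merged_sources = sum(op.endswith("kafka_source") for op in merged_ops)
print(separate_sources, merged_sources)
```

In the separate translation the source and the UDF view each appear twice, once per sink. In the merged translation they appear once and fan out to both sinks.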

Tracing is enabled for selected fields of a view with a single statement, e.g., prepare_data.trace.fields=f0,f1. This gives richer observability than traditional logs.
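As a rough sketch of how such a statement could be interpreted, the code below treats the part before .trace.fields as a view name and the value as a comma-separated field list. That reading, and the parse_trace_config and trace_row helpers, are assumptions; the article does not describe the platform's actual parsing or output format.

```python
from typing import Dict, List


def parse_trace_config(lines: List[str]) -> Dict[str, List[str]]:
    """Map each view name to its traced fields, from '<view>.trace.fields=a,b' lines."""
    suffix = ".trace.fields"
    traced: Dict[str, List[str]] = {}
    for line in lines:
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key.endswith(suffix):
            continue  # not a trace statement
        view = key[: -len(suffix)]
        traced[view] = [f.strip() for f in value.split(",") if f.strip()]
    return traced


def trace_row(view: str, row: dict, traced: Dict[str, List[str]]) -> dict:
    """Return only the traced fields of a row produced by the given view."""
    return {field: row.get(field) for field in traced.get(view, [])}


config = parse_trace_config(["prepare_data.trace.fields=f0,f1"])
sample = trace_row("prepare_data", {"f0": 1, "f1": "x", "f2": 3.0}, config)
print(config, sample)
```

Views without a trace statement produce an empty record, so tracing stays opt-in per view.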

Table Design and Redis Connector

To improve developer efficiency, tables need good layering, reuse, and templating. A Redis‑based solution was chosen, offering high QPS, low latency, automatic TTL handling, and reduced memory pressure through protobuf serialization.

Key features of the Redis table include:

Primary key definition (e.g., uid+lesson_id).

Secondary index fields for fast lookup (e.g., lesson_id).

A Kafka connector that emits trigger messages, keeping Redis and Kafka consistent and in order.
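The three features above can be sketched with an in-memory stand-in. No real Redis or Kafka client is used here, and the key layout (primary-key fields joined with ":"), the class name, and the method names are hypothetical rather than the team's connector. Protobuf serialization is left out to keep the sketch short.

```python
import time
from typing import Dict, List, Optional, Set, Tuple


class RedisTableSketch:
    """In-memory stand-in for a Redis-backed table with TTL and one secondary index."""

    def __init__(self, primary_key: Tuple[str, ...], index_field: str, ttl_seconds: float):
        self.primary_key = primary_key
        self.index_field = index_field
        self.ttl_seconds = ttl_seconds
        self.rows: Dict[str, Tuple[float, dict]] = {}  # key -> (expiry time, row)
        self.index: Dict[object, Set[str]] = {}        # index value -> primary keys
        self.trigger_log: List[str] = []               # stand-in for the Kafka trigger topic

    def _key(self, row: dict) -> str:
        return ":".join(str(row[f]) for f in self.primary_key)

    def upsert(self, row: dict, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        key = self._key(row)
        self.rows[key] = (now + self.ttl_seconds, dict(row))
        self.index.setdefault(row[self.index_field], set()).add(key)
        # Emit the trigger only after the write, so a consumer that reacts
        # to the message finds the new row when it reads the table.
        self.trigger_log.append(key)

    def get(self, key: str, now: Optional[float] = None) -> Optional[dict]:
        now = time.time() if now is None else now
        entry = self.rows.get(key)
        if entry is None or entry[0] <= now:
            return None  # missing or expired
        return entry[1]

    def lookup_by_index(self, value: object, now: Optional[float] = None) -> List[dict]:
        keys = sorted(self.index.get(value, set()))
        rows = (self.get(k, now) for k in keys)
        return [r for r in rows if r is not None]


table = RedisTableSketch(("uid", "lesson_id"), "lesson_id", ttl_seconds=60)
table.upsert({"uid": 1, "lesson_id": "L9", "mastery": 0.8}, now=0)
table.upsert({"uid": 2, "lesson_id": "L9", "mastery": 0.5}, now=0)
by_lesson = table.lookup_by_index("L9", now=10)
print(table.get("1:L9", now=10), len(by_lesson), table.trigger_log)
```

Because the index field is part of the primary key in this example, an upsert never has to move a row between index entries. A design that indexes a mutable field would need to clean up the old entry as well.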

Platform Construction

After the data‑flow architecture was in place, the number of real‑time jobs grew to several hundred by late 2020. The platform was built with three main goals:

Unified: Provide a single entry point for task submission across different cloud providers, Flink versions, and clusters, reducing operational risk and migration cost.

Easy to Use: Offer debugging, semantic checks, version history, and rollback capabilities to improve developer productivity.

Standardized: Enforce permission control, workflow approval, and coding standards to ensure long-term maintainability.

Monitoring and Alerting

Flink jobs run on YARN; Prometheus scrapes YARN containers for metrics. Two challenges were solved:

Dynamic discovery of Prometheus reporters via ZooKeeper registration.

Unified Kafka lag monitoring, achieved by exposing the records-lag metric consistently across Kafka versions.
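One way to picture the dynamic discovery step is to turn the registered reporter endpoints into Prometheus file-based service discovery target groups. This is a sketch under assumptions: the article does not say which discovery mechanism the team uses, and the znode layout, the flink_job label, and the to_file_sd helper are hypothetical. The registrations are passed in as a plain dictionary instead of being read from ZooKeeper.

```python
import json
from typing import Dict, List


def to_file_sd(registrations: Dict[str, str]) -> List[dict]:
    """Group registered reporter endpoints by job name into file_sd target groups."""
    groups: Dict[str, List[str]] = {}
    for path, endpoint in registrations.items():
        # Hypothetical znode layout: /flink/reporters/<job_name>/<container_id>
        job_name = path.strip("/").split("/")[2]
        groups.setdefault(job_name, []).append(endpoint)
    return [
        {"targets": sorted(endpoints), "labels": {"flink_job": job}}
        for job, endpoints in sorted(groups.items())
    ]


# Invented registrations: znode path -> "host:port" of the reporter in that container.
registrations = {
    "/flink/reporters/attendance_etl/container_01": "10.0.0.1:9250",
    "/flink/reporters/attendance_etl/container_02": "10.0.0.2:9251",
    "/flink/reporters/mastery_agg/container_01": "10.0.0.3:9250",
}
target_groups = to_file_sd(registrations)
file_sd_json = json.dumps(target_groups, indent=2)
print(file_sd_json)
```

Regenerating this file whenever registrations change lets Prometheus follow containers as YARN starts and stops them, without a static target list.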

Summary and Outlook

The adoption of Flink SQL dramatically accelerated real‑time job development, but it also obscured low‑level details, making troubleshooting harder. Future work includes supporting elastic scaling, easing upgrades to newer Flink versions, and exploring unified batch‑stream processing in production scenarios.
