
Comprehensive Guide to Flink SQL: Background, New Features, Programming Model, Operators, Functions, and a Practical NBA Scoring Leader Example

This article provides an in‑depth overview of Flink SQL, covering its origins, the latest 1.7.0 and 1.8.0 enhancements, the underlying programming model, common operators and built‑in functions, and a complete end‑to‑end example that analyzes NBA scoring‑leader data using Flink SQL.

Big Data Technology & Architecture

Flink SQL is a SQL‑compatible API built on Apache Flink to simplify real‑time stream and batch processing, lowering the entry barrier for users by offering a declarative, optimizable language.

Its roots go back to 2015, when Alibaba began adapting and improving Flink internally, work that became the Blink project. Alibaba open-sourced Blink in early 2019, and its SQL improvements have since been contributed back to Apache Flink.

Key reasons for choosing SQL include its declarative nature, built‑in optimizers, low learning curve, stability, and ability to unify stream and batch processing.

Latest Features (Flink 1.7.0 & 1.8.0)

Version 1.7.0 adds Scala 2.12 support, state evolution, MATCH_RECOGNIZE streaming SQL, temporal tables, new built‑in functions (TO_BASE64, LOG2, etc.), a Kafka 2.0 connector, and local recovery for faster restarts.

Version 1.8.0 introduces continuous cleanup of expired state under state TTL for the RocksDB and heap backends, a new CSV format descriptor, deprecation of the static builder methods on TableEnvironment, reorganized Table API Maven modules, and a new KafkaDeserializationSchema that gives direct access to the Kafka ConsumerRecord.

Programming Model

A Flink program consists of a source (e.g., MySQL, Kafka), transformation operators (e.g., SELECT, JOIN, WINDOW), and a sink (e.g., a Kafka sink). The classic WordCount example illustrates the difference between the DataSet/DataStream API and Flink SQL. With the DataSet API it looks like this:

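import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Batch execution environment (local or cluster, depending on how the job is launched).
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> text = env.fromElements("Hello", "Flink", "Hello", "Blink");
        DataSet<Tuple2<String, Integer>> counts =
                text.flatMap(new LineSplitter()) // emit (word, 1) pairs
                    .groupBy(0)                  // group by the word (tuple field 0)
                    .sum(1);                     // sum the counts (tuple field 1)
        // print() triggers execution of the batch job and writes the result to stdout.
        counts.print();
    }

    /** Splits each input string into lowercase words and emits (word, 1) for each one. */
    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            String[] tokens = value.toLowerCase().split("\\W+");
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }
}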

A simple SQL query shows the convenience of the API:

SELECT word, COUNT(word) AS cnt FROM words GROUP BY word;
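
To make the semantics of the query above concrete, the following is a minimal plain-Java sketch of the same grouping and counting, applied to the four words from the DataSet example. It uses only the JDK and is not Flink API; the class name GroupByCountSketch and the method countByWord are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java illustration of GROUP BY word with COUNT(word); not Flink API.
final class GroupByCountSketch {

    // Groups equal words together and counts the rows in each group.
    // A LinkedHashMap keeps groups in first-seen order so the output is stable.
    static Map<String, Long> countByWord(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        w -> w, LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "Flink", "Hello", "Blink");
        // Prints one entry per distinct word with its count.
        System.out.println(countByWord(words));
    }
}
```

Unlike the LineSplitter in the DataSet version, this sketch does not lowercase its input, which matches the SQL query: the query groups on the column value exactly as stored.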

SQL Syntax and Operators

Flink SQL follows the ANSI SQL standard and supports INSERT, SELECT, WHERE, DISTINCT, GROUP BY, UNION/UNION ALL, and several JOIN types (INNER, LEFT, RIGHT, FULL). Group window functions such as TUMBLE, HOP, and SESSION are also available. The following query computes a daily total per user with a tumbling window:

SELECT user, TUMBLE_START(rowtime, INTERVAL '1' DAY) AS wStart,
       SUM(amount) FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '1' DAY), user;
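
As a rough illustration of what the tumbling-window query computes, the sketch below assigns each row to a fixed, non-overlapping one-day window and sums the amounts per window and user. It is plain Java rather than Flink API, it assumes timestamps are epoch milliseconds and that windows are aligned to the epoch, and the names TumbleSketch, Order, windowStart, and dailyTotals as well as the sample rows are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of TUMBLE(rowtime, INTERVAL '1' DAY); not Flink API.
final class TumbleSketch {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // Hypothetical row type standing in for the Orders table.
    static final class Order {
        final String user;
        final long rowtimeMs;
        final long amount;

        Order(String user, long rowtimeMs, long amount) {
            this.user = user;
            this.rowtimeMs = rowtimeMs;
            this.amount = amount;
        }
    }

    // Start of the tumbling window that contains the timestamp.
    // Every timestamp falls into exactly one window of length sizeMs.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    // SUM(amount) per (window start, user); the key has the form "<windowStart>/<user>".
    static Map<String, Long> dailyTotals(Iterable<Order> orders) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Order o : orders) {
            String key = windowStart(o.rowtimeMs, DAY_MS) + "/" + o.user;
            totals.merge(key, o.amount, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample rows: two orders on day 0 and one on day 1 for u1, one on day 0 for u2.
        List<Order> orders = Arrays.asList(
                new Order("u1", 1_000L, 5),
                new Order("u1", 2_000L, 7),
                new Order("u1", DAY_MS + 1_000L, 3),
                new Order("u2", 3_000L, 4));
        dailyTotals(orders).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

A HOP window differs in that windows overlap, so one row can contribute to several windows, and a SESSION window has no fixed length at all; neither is covered by this sketch.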

Built‑in Functions

Built-in functions are grouped into comparison, logical, arithmetic, string, and time categories. Examples include value1 = value2, boolean1 OR boolean2, ABS(numeric), UPPER(string), DATE string (a date literal such as DATE '2019-01-01'), and CURRENT_DATE.

Practical Application: NBA Scoring Leaders

The article walks through a complete Flink SQL project that reads a CSV file of NBA scoring‑leader statistics, maps each line to a PlayerData POJO, registers it as a table, and runs a SQL query to find the top three players with the most scoring‑leader titles.

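// Assumes Flink 1.7.x, where the batch table environment is obtained this way.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// Read the CSV file line by line and map each line to a PlayerData POJO.
DataSet<String> input = env.readTextFile("score.csv");
DataSet<PlayerData> topInput = input.map(new MapFunction<String, PlayerData>() {
    @Override
    public PlayerData map(String s) throws Exception {
        String[] split = s.split(",");
        return new PlayerData(split[0], split[1], split[2],
            Integer.valueOf(split[3]), Double.valueOf(split[4]),
            Double.valueOf(split[5]), Double.valueOf(split[6]),
            Double.valueOf(split[7]), Double.valueOf(split[8]));
    }
});

// Register the DataSet as a table named "score" so that SQL can reference it.
Table topScore = tableEnv.fromDataSet(topInput);
tableEnv.registerTable("score", topScore);

// Count scoring titles per player and keep the three largest counts.
Table queryResult = tableEnv.sqlQuery(
    "SELECT player, COUNT(season) AS num FROM score GROUP BY player ORDER BY num DESC LIMIT 3");

// Convert the result back to a DataSet of Result POJOs; print() triggers execution.
DataSet<Result> result = tableEnv.toDataSet(queryResult, Result.class);
result.print();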
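
For readers who want to see what the SQL statement replaces, here is a hedged plain-Java sketch of the same logic: group by player, count the rows with a non-null season, order by the count descending, and keep three rows. It is not Flink API. The names TopScorersSketch, Row, and topPlayers are hypothetical, the sample rows are made-up placeholders rather than real NBA data, and the tie-break on player name is an addition of this sketch, since the SQL query leaves the order of ties unspecified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java illustration of GROUP BY player ... ORDER BY num DESC LIMIT 3; not Flink API.
final class TopScorersSketch {

    // Hypothetical stand-in for the two PlayerData fields the query uses.
    static final class Row {
        final String season;
        final String player;

        Row(String season, String player) {
            this.season = season;
            this.player = player;
        }
    }

    static List<Map.Entry<String, Long>> topPlayers(List<Row> rows, int limit) {
        // COUNT(season) counts only rows whose season is not null.
        Map<String, Long> titles = rows.stream()
                .filter(r -> r.season != null)
                .collect(Collectors.groupingBy(r -> r.player, Collectors.counting()));
        // ORDER BY num DESC, then by name to make ties deterministic (sketch-only choice).
        return titles.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up placeholder rows, one per (season, scoring leader) pair.
        List<Row> rows = Arrays.asList(
                new Row("s1", "player_a"), new Row("s2", "player_a"), new Row("s3", "player_a"),
                new Row("s4", "player_b"), new Row("s5", "player_b"),
                new Row("s6", "player_c"),
                new Row("s7", "player_d"));
        for (Map.Entry<String, Long> e : topPlayers(rows, 3)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The SQL version expresses the same grouping, sorting, and truncation in one statement and leaves the execution plan to Flink's optimizer.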

The project's Maven dependencies are declared as follows (abridged). Instead of printing the result, it can also be persisted through a CSV sink.

<properties>
    <flink.version>1.7.1</flink.version>
    <scala.binary.version>2.11</scala.binary.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>${flink.version}</version>
    </dependency>
    ...
</dependencies>

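Running the program prints the top three players (e.g., Michael Jordan, Kevin Durant, Allen Iverson). Although this particular job is a batch job over a DataSet, it shows how a single SQL statement replaces hand-written grouping, sorting, and truncation logic, which is the same reduction in development effort that Flink SQL aims for in stream processing.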

Conclusion

The guide summarizes Flink SQL’s background, core capabilities, programming model, operators, and functions, and illustrates its practical use with a real‑world example, emphasizing that Flink SQL is an essential, easy‑to‑use tool for solving streaming analytics problems.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
