Comprehensive Guide to Flink SQL: Background, New Features, Programming Model, Operators, Functions, and a Practical NBA Scoring Leader Example
This article provides an in‑depth overview of Flink SQL, covering its origins, the latest 1.7.0 and 1.8.0 enhancements, the underlying programming model, common operators and built‑in functions, and a complete end‑to‑end example that analyzes NBA scoring‑leader data using Flink SQL.
Flink SQL is a SQL‑compatible API built on Apache Flink to simplify real‑time stream and batch processing, lowering the entry barrier for users by offering a declarative, optimizable language.
The framework originated from Alibaba's research starting in 2015, leading to the Blink project and the first open‑source implementation of Flink SQL in early 2019.
Key reasons for choosing SQL include its declarative nature, built‑in optimizers, low learning curve, stability, and ability to unify stream and batch processing.
Latest Features (Flink 1.7.0 & 1.8.0)
Version 1.7.0 adds Scala 2.12 support, state evolution, MATCH_RECOGNIZE streaming SQL, temporal tables, new built‑in functions (TO_BASE64, LOG2, etc.), a Kafka 2.0 connector, and local recovery for faster restarts.
Version 1.8.0 introduces state TTL cleanup for RocksDB and heap backends, new CSV format descriptors, deprecation of static TableEnvironment factories, updated Table API Maven modules, and an enhanced KafkaDeserializationSchema.
Programming Model
A Flink program consists of a source operator (e.g., MySQL, Kafka), transformation operators (e.g., SELECT, JOIN, WINDOW), and a sink operator (e.g., Kafka sink). The classic WordCount example demonstrates the difference between DataSet/DataStream API and Flink SQL.
public class WordCount {
public static void main(String[] args) throws Exception {
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> text = env.fromElements("Hello", "Flink", "Hello", "Blink");
DataSet<Tuple2<String, Integer>> counts =
text.flatMap(new LineSplitter())
.groupBy(0)
.sum(1);
counts.print();
}
public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
String[] tokens = value.toLowerCase().split("\\W+");
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<String, Integer>(token, 1));
}
}
}
}
}A simple SQL query shows the convenience of the API:
SELECT word, COUNT(word) FROM table GROUP BY word;SQL Syntax and Operators
Flink SQL follows ANSI‑SQL standards and supports INSERT, SELECT, WHERE, DISTINCT, GROUP BY, UNION/UNION ALL, and various JOIN types (INNER, LEFT, RIGHT, FULL). Window functions such as TUMBLE, HOP, and SESSION are also available.
SELECT user, TUMBLE_START(rowtime, INTERVAL '1' DAY) AS wStart,
SUM(amount) FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '1' DAY), user;Built‑in Functions
Functions are grouped into comparison, logical, arithmetic, string, and time categories. Examples include value1=value2, A OR B, ABS(n), UPPER(str), DATE(string), and CURRENT_DATE.
Practical Application: NBA Scoring Leaders
The article walks through a complete Flink SQL project that reads a CSV file of NBA scoring‑leader statistics, maps each line to a PlayerData POJO, registers it as a table, and runs a SQL query to find the top three players with the most scoring‑leader titles.
DataSet<String> input = env.readTextFile("score.csv");
DataSet<PlayerData> topInput = input.map(new MapFunction<String, PlayerData>() {
@Override
public PlayerData map(String s) throws Exception {
String[] split = s.split(",");
return new PlayerData(split[0], split[1], split[2],
Integer.valueOf(split[3]), Double.valueOf(split[4]),
Double.valueOf(split[5]), Double.valueOf(split[6]),
Double.valueOf(split[7]), Double.valueOf(split[8]));
}
});
Table topScore = tableEnv.fromDataSet(topInput);
tableEnv.registerTable("score", topScore);
Table queryResult = tableEnv.sqlQuery(
"SELECT player, COUNT(season) AS num FROM score GROUP BY player ORDER BY num DESC LIMIT 3");
DataSet<Result> result = tableEnv.toDataSet(queryResult, Result.class);
result.print();The Maven dependencies required for the project are listed, and an alternative CSV sink is shown for persisting results.
<properties>
<flink.version>1.7.1</flink.version>
<scala.binary.version>2.11</scala.binary.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-core</artifactId>
<version>${flink.version}</version>
</dependency>
...
</dependencies>Running the program prints the top three players (e.g., Michael Jordan, Kevin Durant, Allen Iverson) and demonstrates how Flink SQL dramatically reduces development effort for stream processing tasks.
Conclusion
The guide summarizes Flink SQL’s background, core capabilities, programming model, operators, and functions, and illustrates its practical use with a real‑world example, emphasizing that Flink SQL is an essential, easy‑to‑use tool for solving streaming analytics problems.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
