
Integrating Apache Flink 1.12 with Hive: Configuration, Catalog, Planner, and UDF Usage

This guide explains how to integrate Flink 1.12 with Hive through HiveCatalog. It covers the required dependencies, Blink planner configuration, SQL dialect switching, Hive UDF support, and temporal table joins, with code snippets for a unified streaming and batch data warehouse.

Big Data Technology & Architecture

Jan 10, 2021


Previously, real-time data platforms built with Flink relied on periodic batch synchronization into the offline warehouse. In Flink 1.12, the Hive integration built around HiveCatalog enables genuine batch and stream unification: Flink can read and write Hive tables both as a batch engine and as a streaming source.

Integration consists of two layers: using Hive Metastore as a persistent catalog so that Flink metadata (e.g., Kafka or Elasticsearch tables) can be stored and later reused, and enabling Flink to directly read/write Hive tables.

HiveCatalog is designed for out‑of‑the‑box compatibility; no changes to the existing Hive Metastore, table locations, or partitions are required.

Flink 1.12 Support for Hive

Flink 1.11 introduced the Hive dialect, which lets users write Hive-compatible SQL statements. Which features are available also depends on the Hive version in use. For example, Hive built-in functions require Hive 1.2.0 or later, and primary-key constraints require Hive 3.1.0 or later.
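A job that may run against several Hive installations can check the Metastore's Hive version before relying on version-dependent features. The following is a minimal sketch in plain Java. The class HiveFeatureCheck and its feature names are hypothetical. It also assumes purely numeric dotted version strings, so vendor suffixes are not handled. Only the two minimum versions come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: compares a Hive version against the
// minimum versions quoted in the text. Assumes purely numeric dotted versions.
final class HiveFeatureCheck {
    static final Map<String, String> MIN_VERSION = new HashMap<>();
    static {
        MIN_VERSION.put("built-in functions", "1.2.0");
        MIN_VERSION.put("primary-key constraints", "3.1.0");
    }

    // Numeric, component-wise comparison: "1.10.0" is newer than "1.2.0",
    // which a plain string comparison would get wrong.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0; // missing parts count as 0
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if the given Hive version meets the minimum version for the feature.
    static boolean supports(String hiveVersion, String feature) {
        String min = MIN_VERSION.get(feature);
        if (min == null) {
            throw new IllegalArgumentException("unknown feature: " + feature);
        }
        return compareVersions(hiveVersion, min) >= 0;
    }
}
```

For example, supports("2.3.4", "built-in functions") is true, while supports("2.3.4", "primary-key constraints") is false.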

To enable Hive integration, add the required JARs to Flink's /lib directory and make the Hadoop dependencies visible by setting HADOOP_CLASSPATH:

export HADOOP_CLASSPATH=`hadoop classpath`

There are two ways to add the Hive dependencies. You can use one of the bundled Hive JARs that Flink provides for your Metastore version, or you can add each required JAR manually. The bundled JARs are the recommended option.
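A missing JAR or an unset HADOOP_CLASSPATH typically shows up as a ClassNotFoundException only once the job runs. A small start-up check can report the problem earlier. The sketch below is hypothetical. The three class names in REQUIRED are assumed to be representative of the Flink Hive connector, the Hive JARs (hive-exec), and the Hadoop classpath, respectively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical start-up check, not part of Flink: reports which classes cannot be loaded.
final class HiveClasspathCheck {
    // Assumed representative classes for each dependency source.
    static final List<String> REQUIRED = Arrays.asList(
            "org.apache.flink.table.catalog.hive.HiveCatalog", // flink-connector-hive
            "org.apache.hadoop.hive.conf.HiveConf",            // Hive JARs (e.g. hive-exec)
            "org.apache.hadoop.conf.Configuration");           // Hadoop, via HADOOP_CLASSPATH

    // Returns the subset of classNames that cannot be resolved on the current classpath.
    static List<String> missing(List<String> classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize = false: resolve the class without running its static initializers
                Class.forName(name, false, HiveClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Call HiveClasspathCheck.missing(HiveClasspathCheck.REQUIRED) at the start of the job's main method and fail fast on a non-empty result. That way you get a single message naming every missing dependency instead of the first stack trace.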

Example Maven dependencies:

<!-- Flink Dependency -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-hive_2.11</artifactId>
  <version>1.12.0</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-api-java-bridge_2.11</artifactId>
  <version>1.12.0</version>
  <scope>provided</scope>
</dependency>

<!-- Hive Dependency -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
  <scope>provided</scope>
</dependency>

Using Blink Planner to Connect Hive

Flink 1.12 requires the Blink planner for Hive table read/write. Example Java code to create a Blink‑planner environment and register a HiveCatalog:

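import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

// Hive tables are read and written through the Blink planner
EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

String name            = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir     = "/opt/hive-conf";  // directory containing hive-site.xml

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog(name, hive);

// make the HiveCatalog the session's current catalog
tableEnv.useCatalog(name);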

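For the SQL client, sql-client-defaults.yaml needs three settings:

- Select the Blink planner (execution.planner: blink).
- Declare the catalog under catalogs, with name: myhive, type: hive, and hive-conf-dir pointing to the directory that contains hive-site.xml.
- Set the session's current catalog and database (execution.current-catalog and execution.current-database).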

Switching to Hive SQL Dialect

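Set table.sql-dialect: hive under the configuration section of sql-client-defaults.yaml, or switch the dialect in a running SQL client session:

Flink SQL> set table.sql-dialect=hive;

In the Table API, change the dialect through the table config:

import org.apache.flink.table.api.SqlDialect;
tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);  // SqlDialect.DEFAULT switches back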
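In the Table API the dialect is switched programmatically with setSqlDialect. A program that replays a SQL-client-style script, such as the Hive DDL in the temporal-join section, therefore has to apply the dialect switches itself. The sketch below splits such a script into statements, each tagged with the dialect in force when it appears. The class DialectScript is hypothetical. It also splits naively on ";", so semicolons inside string literals or comments are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Flink: splits a SQL-client style script into statements,
// each tagged with the dialect set by the most recent "SET table.sql-dialect=..." line.
final class DialectScript {
    static final String SET_PREFIX = "settable.sql-dialect=";

    static final class Statement {
        final String dialect; // "hive" or "default", as in table.sql-dialect
        final String sql;

        Statement(String dialect, String sql) {
            this.dialect = dialect;
            this.sql = sql;
        }
    }

    // Naive: splits on every ';', so semicolons in literals or comments break it.
    static List<Statement> parse(String script) {
        List<Statement> statements = new ArrayList<>();
        String dialect = "default";
        for (String raw : script.split(";")) {
            String sql = raw.trim();
            if (sql.isEmpty()) {
                continue;
            }
            // Normalize "SET table.sql-dialect = hive" to "settable.sql-dialect=hive"
            String normalized = sql.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
            if (normalized.startsWith(SET_PREFIX)) {
                dialect = normalized.substring(SET_PREFIX.length());
            } else {
                statements.add(new Statement(dialect, sql));
            }
        }
        return statements;
    }
}
```

Each resulting statement s can then be executed in two steps. First call tableEnv.getConfig().setSqlDialect("hive".equals(s.dialect) ? SqlDialect.HIVE : SqlDialect.DEFAULT), then call tableEnv.executeSql(s.sql).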

Using Hive UDFs in Flink

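Flink can use Hive built-in functions by loading HiveModule. It can also call Hive user-defined functions stored in the Hive Metastore through HiveCatalog. The supported Hive UDF types are UDF, GenericUDF, GenericUDTF, UDAF, and GenericUDAFResolver2, which Flink converts automatically:

- UDF and GenericUDF become scalar functions.
- GenericUDTF becomes a table function.
- UDAF and GenericUDAFResolver2 become aggregate functions.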

Example of loading a Hive module:

import org.apache.flink.table.module.hive.HiveModule;

String name    = "myhive";
String version = "2.3.4";  // should match the Hive version in use
tableEnv.loadModule(name, new HiveModule(version));

Temporal Table Join with Hive

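Flink 1.12 adds processing-time temporal joins against Hive tables. A partitioned Hive table can serve as a dimension table, and Flink automatically reloads its latest partition at the interval set by 'streaming-source.monitor-interval'. The example DDL and query below do three things: create a partitioned Hive table with the streaming-source options enabled, declare an order table with a processing-time attribute, and join the two.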

SET table.sql-dialect=hive;
CREATE TABLE dimension_table (
  product_id STRING,
  product_name STRING,
  unit_price DECIMAL(10,4),
  ...
) PARTITIONED BY (pt_year STRING, pt_month STRING, pt_day STRING)
TBLPROPERTIES (
  'streaming-source.enable'='true',
  'streaming-source.partition.include'='latest',
  'streaming-source.monitor-interval'='12 h',
  'streaming-source.partition-order'='partition-name'
);

SET table.sql-dialect=default;
CREATE TABLE orders_table (
  order_id STRING,
  order_amount DOUBLE,
  product_id STRING,
  log_ts TIMESTAMP(3),
  proctime as PROCTIME()
) WITH (...);

SELECT * FROM orders_table AS o
JOIN dimension_table FOR SYSTEM_TIME AS OF o.proctime AS d
ON o.product_id = d.product_id;
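To make the three streaming-source options more concrete, here is a conceptual model in plain Java of what they ask Flink to do. With 'partition.include'='latest' and 'partition-order'='partition-name', the latest partition is taken to be the one whose name sorts last alphabetically; this assumes Flink compares partition names alphabetically, as its documentation describes for this ordering. The latest partition is then re-selected and reloaded once the monitor interval has elapsed. The class LatestPartitionDimension and its loader callbacks are hypothetical, and this is not Flink's implementation. In particular, reloading lazily on the first lookup after the interval expires is an assumption of the sketch.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Conceptual model only, not Flink's implementation: a dimension table that serves lookups
// from the latest partition and re-selects/reloads it after each monitor interval.
final class LatestPartitionDimension<K, V> {
    private final Supplier<Collection<String>> listPartitions; // e.g. "pt_year=2021/pt_month=01/pt_day=10"
    private final Function<String, Map<K, V>> loadPartition;   // reads one partition into memory
    private final LongSupplier clockMillis;                    // processing-time clock
    private final long monitorIntervalMillis;                  // cf. 'streaming-source.monitor-interval'
    private String currentPartition;
    private Map<K, V> snapshot;
    private long nextReloadAt;

    LatestPartitionDimension(Supplier<Collection<String>> listPartitions,
                             Function<String, Map<K, V>> loadPartition,
                             LongSupplier clockMillis,
                             long monitorIntervalMillis) {
        this.listPartitions = listPartitions;
        this.loadPartition = loadPartition;
        this.clockMillis = clockMillis;
        this.monitorIntervalMillis = monitorIntervalMillis;
        reload(clockMillis.getAsLong());
    }

    // partition-name order: the alphabetically greatest partition name counts as the latest.
    static String latest(Collection<String> partitions) {
        return partitions.stream()
                .max(Comparator.naturalOrder())
                .orElseThrow(() -> new IllegalStateException("table has no partitions"));
    }

    private void reload(long now) {
        currentPartition = latest(listPartitions.get());
        snapshot = loadPartition.apply(currentPartition);
        nextReloadAt = now + monitorIntervalMillis;
    }

    // Called for each probe-side row (e.g. an order) at processing time "now".
    V lookup(K key) {
        long now = clockMillis.getAsLong();
        if (now >= nextReloadAt) {
            reload(now);
        }
        return snapshot.get(key);
    }

    String currentPartition() {
        return currentPartition;
    }
}
```

One practical consequence of name ordering is that latest(Arrays.asList("pt_day=9", "pt_day=10")) returns "pt_day=9". Zero-padded partition values such as 09 and 10 keep "latest by name" aligned with "latest by date".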

Demo Application

The original article includes a complete Scala demo, illustrating a typical unified streaming and batch architecture. It covers:

- environment initialization and Blink planner setup
- HiveCatalog registration
- creation of a Kafka source table
- subsequent Hive reads and writes

Its environment setup looks like this:

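import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.EnvironmentSettings
import org.apache.flink.table.api.bridge.scala._

val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
// Deprecated in Flink 1.12, where event time is already the default
streamEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
streamEnv.setParallelism(3)

val tableEnvSettings = EnvironmentSettings.newInstance()
    .useBlinkPlanner()
    .inStreamingMode()
    .build()
val tableEnv = StreamTableEnvironment.create(streamEnv, tableEnvSettings)
// ... register HiveCatalog, create Kafka source table, etc.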

The article concludes that Flink 1.12 greatly simplifies building a real-time data warehouse. One engine and a shared Hive Metastore now serve both batch and streaming jobs on Hive tables.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
