
Feature Overview of Apache Kyuubi (Incubating) v1.5.0

The article presents a detailed technical walkthrough of Apache Kyuubi 1.5.0, covering its service‑oriented architecture, high‑availability design, multi‑engine extensions for Spark, Flink, Trino and Hive, enhanced engine‑sharing policies, POOL mode configuration, and the project’s future roadmap.

DataFunSummit

Oct 18, 2022

Apache Kyuubi (Incubating) 1.5.0 introduces major architectural and functional updates for serving SQL workloads on Spark, Flink, Trino and upcoming Hive engines.

Architecture Design

The early “fat client” model (e.g., Hive, Spark-SQL) launches a heavyweight driver on the client node, which provides strong isolation but low resource utilization. Kyuubi moves to a “thin client” service model: an API layer exposes a unified endpoint compatible with Hive Thrift and delegates execution to shared engine processes. This improves resource efficiency, and ZooKeeper adds high availability.
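Because the endpoint is compatible with Hive Thrift, a client can in principle reach it with a Hive-style JDBC URL that discovers a live server through ZooKeeper. The sketch below only assembles such a URL. The host names, the `kyuubi` namespace default and the helper name `kyuubi_jdbc_url` are illustrative assumptions, and the URL layout follows Hive JDBC conventions that should be checked against the driver version in use.

```python
def kyuubi_jdbc_url(zk_quorum, namespace="kyuubi", database="default", session_confs=None):
    """Assemble a Hive-JDBC-style URL that uses ZooKeeper service discovery.

    zk_quorum     -- list of "host:port" strings (hypothetical hosts)
    namespace     -- ZooKeeper namespace the servers register under (assumed)
    session_confs -- optional per-session settings appended after '?'
    """
    hosts = ",".join(zk_quorum)
    url = (
        f"jdbc:hive2://{hosts}/{database};"
        f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )
    if session_confs:
        # Per-session settings are separated by ';' in the '?' section.
        url += "?" + ";".join(f"{k}={v}" for k, v in session_confs.items())
    return url


if __name__ == "__main__":
    print(kyuubi_jdbc_url(["zk1:2181", "zk2:2181"]))
```

The client never names a specific server, so a failed instance can drop out of ZooKeeper without the URL having to change.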

Separating the API service from the engine process reduces resource demands, enables RPC-based communication between the two, supports multiple engine types and versions (Spark, Flink, etc.), and lets the API service control engine lifecycles: it starts engines on demand and shuts them down after idle periods.
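The on-demand launch and idle shutdown described here can be pictured with a small in-memory model. This is a simplified sketch and not Kyuubi's implementation: the `EngineManager` class, its method names and the timeout value are all hypothetical, and a real engine would be a separate process rather than a dictionary entry.

```python
import time


class EngineManager:
    """Toy model of on-demand engine launch and idle release (hypothetical)."""

    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock          # injectable so the behaviour can be tested
        self.engines = {}           # engine key -> time of last use

    def get_or_launch(self, key):
        """Return True if a new engine had to be launched for this key."""
        launched = key not in self.engines
        self.engines[key] = self.clock()   # record the use either way
        return launched

    def reap_idle(self):
        """Shut down engines idle for at least the timeout; return their keys."""
        now = self.clock()
        expired = [k for k, last in self.engines.items()
                   if now - last >= self.idle_timeout_s]
        for k in expired:
            del self.engines[k]
        return expired


if __name__ == "__main__":
    manager = EngineManager(idle_timeout_s=60)
    print(manager.get_or_launch("alice"))  # first request launches an engine
    print(manager.get_or_launch("alice"))  # second request reuses it
```

Keeping the timestamp per engine key means an engine is released only when every session routed to it has gone quiet.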

High‑Availability Elastic Architecture

By defining routing rules in ZooKeeper, Kyuubi can balance isolation against sharing. For ad-hoc queries, a USER-level rule routes all of a user’s sessions to one engine dedicated to that user; for batch jobs, a CONNECTION-level rule gives each connection its own engine, which provides the strongest isolation.
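One way to picture the routing rule is as a function from the share level to an engine key: sessions that map to the same key share an engine. The sketch below covers the four levels the article names. The key layout and the `engine_key` helper are illustrative assumptions and do not reproduce the paths Kyuubi actually writes to ZooKeeper.

```python
def engine_key(share_level, user, group=None, session_id=None):
    """Map a session to the key of the engine it should use (hypothetical layout)."""
    level = share_level.upper()
    if level == "CONNECTION":
        # One engine per connection: the session id makes every key unique.
        return f"CONNECTION/{user}/{session_id}"
    if level == "USER":
        # All sessions of the same user share one engine.
        return f"USER/{user}"
    if level == "GROUP":
        # All users of the same group share one engine.
        return f"GROUP/{group}"
    if level == "SERVER":
        # Every session shares a single engine (simplified here to one fixed key).
        return "SERVER/shared"
    raise ValueError(f"unknown share level: {share_level}")


if __name__ == "__main__":
    print(engine_key("USER", "alice"))
    print(engine_key("CONNECTION", "alice", session_id="s1"))
```

Under USER, two sessions from the same user resolve to the same key, while under CONNECTION they never do.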

Engine Extensions

1. Spark Engine – Full lifecycle management across deployment modes, compatibility with Spark 3.0+, and enterprise‑grade plugins such as enhanced AQE, automatic small‑file merging, partition‑scan limits, result‑size caps, and Z‑Order optimization.

2. Flink Engine – Added in 1.5 with support for Flink 1.14. Flink differs from Spark in its deployment models (session versus per-job) and in the role the JobManager plays compared with the Spark Driver. Kyuubi now supports Flink Application mode, which aligns the Flink lifecycle with Kyuubi’s engine management.

3. Trino and Hive Engines – Community contributions add Trino support (treated as a long‑running service) and plans for dynamic HiveServer2 launch and release.

Feature Enhancements

Kyuubi now offers four engine-sharing levels (CONNECTION, USER, GROUP, SERVER), GROUP being the newer addition, and each level has its own idle-release policy. POOL mode can be combined with a sharing level to create a pool of engines, for example for a group, which improves concurrency for workloads such as frequent dashboard refreshes.

Configuration example for POOL mode:

kyuubi.engine.share.level=GROUP
kyuubi.engine.pool.name=normal-pool
kyuubi.engine.pool.size=3

Routing examples illustrate how users A‑D are directed to specific subdomains (e.g., normal-pool-1, sla-pool-2) based on parameters like pool_name and subdomain, enabling fine‑grained isolation or queue‑based routing.
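The pool routing can be sketched as picking one of `pool.size` subdomains under the pool name, so that sessions of the same user or group spread over several engines. The random selection, the 0-based numbering and the `pick_pool_subdomain` helper are assumptions made for illustration; the actual selection policy and numbering may differ.

```python
import random


def pick_pool_subdomain(pool_name, pool_size, rng=random):
    """Choose one engine subdomain from a pool (hypothetical selection policy)."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # Indices run from 0 to pool_size - 1 in this sketch.
    return f"{pool_name}-{rng.randrange(pool_size)}"


if __name__ == "__main__":
    # Mirrors the configuration example: pool name "normal-pool", size 3.
    for _ in range(5):
        print(pick_pool_subdomain("normal-pool", 3))
```

A session that sets an explicit subdomain instead of a pool would always land on the same engine, which is the fine-grained isolation case mentioned above.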

Future Outlook

Since joining the Apache Incubator in June 2021, Kyuubi has released three feature versions, grown to 71 contributors, and shifted its vision from “Serverless Spark” to “Serverless SQL on Lakehouse.” Ongoing community activity and upcoming engine support signal a robust roadmap.
