Feature Overview of Apache Kyuubi (Incubating) v1.5.0
The article presents a detailed technical walkthrough of Apache Kyuubi 1.5.0, covering its service‑oriented architecture, high‑availability design, multi‑engine extensions for Spark, Flink, Trino and Hive, enhanced engine‑sharing policies, POOL mode configuration, and the project’s future roadmap.
Apache Kyuubi (Incubating) 1.5.0 introduces major architectural and functional updates for serving SQL workloads on Spark, Flink, Trino and upcoming Hive engines.
Architecture Design
The early “fat client” model (e.g., Hive, Spark-SQL) launches a heavyweight driver on the client node, which provides strong isolation but low resource utilization. Kyuubi moves toward a “thin client” service model: an API layer exposes a unified endpoint (compatible with the Hive Thrift protocol) and delegates execution to shared engine processes, which improves resource efficiency, while ZooKeeper provides high availability.
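To make the thin-client idea concrete, here is a minimal sketch of how a client might address the service through ZooKeeper instead of a fixed host. It assumes the Hive JDBC driver's ZooKeeper service-discovery URL form; the helper name `kyuubi_jdbc_url`, the host names, and the `kyuubi` namespace value are illustrative assumptions and must match the actual server-side HA configuration.

```python
def kyuubi_jdbc_url(zk_quorum, namespace="kyuubi", database="default"):
    """Build a Hive-JDBC-style URL that locates a server via ZooKeeper.

    zk_quorum is a list of "host:port" strings (placeholders here).
    The namespace is an assumption; use whatever the servers register under.
    """
    hosts = ",".join(zk_quorum)
    return (
        f"jdbc:hive2://{hosts}/{database};"
        f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


url = kyuubi_jdbc_url(["zk1:2181", "zk2:2181", "zk3:2181"])
print(url)
```

Because the client only knows the ZooKeeper quorum, any registered server instance can accept the connection, which is what allows a failed instance to be replaced without changing client configuration.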
Separating the API service from the engine process reduces resource demands on the client side, enables RPC-based communication, supports multiple engine types and versions (Spark, Flink, etc.), and lets the API layer control engine lifecycles: engines are started on demand and shut down after an idle period.
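The on-demand start and idle shutdown described above can be sketched as a small bookkeeping class. This is a toy model under stated assumptions, not Kyuubi's implementation: `EngineManager`, `FakeClock`, and the 60-second timeout are hypothetical names and values, and "starting" an engine is reduced to recording a timestamp.

```python
import time


class EngineManager:
    """Tracks engines by key; starts on first use, reaps after idling."""

    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock
        self.engines = {}  # engine key -> time of last use

    def acquire(self, key):
        """Mark the engine as used; return True if it had to be started."""
        started = key not in self.engines
        self.engines[key] = self.clock()
        return started

    def reap_idle(self):
        """Shut down (here: forget) engines idle for at least the timeout."""
        now = self.clock()
        expired = [k for k, last in self.engines.items()
                   if now - last >= self.idle_timeout_s]
        for k in expired:
            del self.engines[k]
        return expired


class FakeClock:
    """Deterministic clock so the example does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
manager = EngineManager(idle_timeout_s=60, clock=clock)
first = manager.acquire("alice")    # no engine yet, so one is started
second = manager.acquire("alice")   # the running engine is reused
clock.advance(61)
reaped = manager.reap_idle()        # idle past the timeout, so released
third = manager.acquire("alice")    # started again on demand
```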
High‑Availability Elastic Architecture
By defining routing rules in ZooKeeper, Kyuubi can balance isolation and sharing. For ad-hoc queries, a USER-level rule routes all of a user’s sessions to one engine dedicated to that user; for batch jobs, a CONNECTION-level rule gives each connection its own engine, ensuring strong isolation.
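One way to picture these routing rules is as a function from session attributes to an engine key, where sessions with the same key share an engine. The sketch below is illustrative only: the function name `engine_key` and the key layout are assumptions, not the paths Kyuubi actually writes to ZooKeeper. It covers the four share levels named in this article.

```python
def engine_key(share_level, user, group, session_id):
    """Map a session to the key of the engine that should serve it."""
    level = share_level.upper()
    if level == "CONNECTION":
        # One engine per connection: the session id makes the key unique.
        return f"CONNECTION/{user}/{session_id}"
    if level == "USER":
        # All sessions of the same user land on the same engine.
        return f"USER/{user}"
    if level == "GROUP":
        # All users in the same group share an engine.
        return f"GROUP/{group}"
    if level == "SERVER":
        # Everyone shares a single engine.
        return "SERVER"
    raise ValueError(f"unknown share level: {share_level}")


# Two ad-hoc sessions from the same user resolve to one engine ...
adhoc_a = engine_key("USER", "alice", "analysts", "s1")
adhoc_b = engine_key("USER", "alice", "analysts", "s2")
# ... while two batch connections each get their own.
batch_a = engine_key("CONNECTION", "alice", "analysts", "s1")
batch_b = engine_key("CONNECTION", "alice", "analysts", "s2")
```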
Engine Extensions
1. Spark Engine – Full lifecycle management across deployment modes, compatibility with Spark 3.0+, and enterprise‑grade plugins such as enhanced AQE, automatic small‑file merging, partition‑scan limits, result‑size caps, and Z‑Order optimization.
2. Flink Engine – Added in 1.5 with support for Flink 1.14. Differences from the Spark engine include the deployment model (Flink session vs. per-job clusters) and the role of the JobManager compared with the Spark driver. Kyuubi now supports Flink Application mode, aligning its lifecycle with Kyuubi’s engine management.
3. Trino and Hive Engines – Community contributions add Trino support (treated as a long‑running service) and plans for dynamic HiveServer2 launch and release.
Feature Enhancements
Kyuubi now offers four engine-sharing levels (CONNECTION, USER, GROUP, SERVER), of which GROUP is listed as new in 1.5, and each level has its own idle-release policy. POOL mode can be combined with a sharing level so that a user or group is served by a pool of engines instead of a single one, improving concurrency for workloads such as frequent dashboard refreshes.
Configuration example for POOL mode:
kyuubi.engine.share.level=GROUP
kyuubi.engine.pool.name=normal-pool
kyuubi.engine.pool.size=3
Routing examples illustrate how users A-D are directed to specific subdomains (e.g., normal-pool-1, sla-pool-2) based on parameters like pool_name and subdomain, enabling fine-grained isolation or queue-based routing.
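The routing described above can be sketched as a small resolution function. This is a hedged illustration, not Kyuubi's code: `resolve_subdomain` is a hypothetical helper, the random choice of pool member and the zero-based index range are assumptions, and the `default` fallback name is a placeholder.

```python
import random


def resolve_subdomain(pool_name=None, pool_size=0, subdomain=None, rng=random):
    """Pick the engine subdomain for a new session.

    An explicit subdomain pins the session to one engine; otherwise a pool
    member is chosen at random (assumed policy) from <pool_name>-<index>.
    """
    if subdomain:
        return subdomain
    if pool_name and pool_size > 0:
        # Assumed index range: 0 .. pool_size - 1.
        return f"{pool_name}-{rng.randrange(pool_size)}"
    return "default"  # placeholder name for the non-pooled case


# A user pinned to a specific engine of the SLA pool.
pinned = resolve_subdomain(subdomain="sla-pool-2")
# A user spread across the three engines of the normal pool.
pooled = resolve_subdomain(pool_name="normal-pool", pool_size=3,
                           rng=random.Random(0))
```

Pinning by subdomain gives a session a predictable engine, while pooling trades that predictability for concurrency across several engines at the same sharing level.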
Future Outlook
Since joining the Apache Incubator in June 2021, Kyuubi has released three feature versions, grown to 71 contributors, and shifted its vision from “Serverless Spark” to “Serverless SQL on Lakehouse.” Ongoing community activity and upcoming engine support signal a robust roadmap.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.