Apache Kyuubi 1.6.0 Feature Overview and Enhancements

The article provides a comprehensive walkthrough of Apache Kyuubi 1.6.0, detailing server‑side enhancements such as batch (JAR) task submission, metadata store and unified API/authentication, client‑side improvements to the built‑in JDBC driver and Beeline, as well as engine plugins for Spark, Flink, Trino and Hive, and concludes with the community’s roadmap and statistics.

DataFunTalk

Introduction – Apache Kyuubi is an open‑source, enterprise‑grade data‑lake exploration platform that acts as a multi‑tenant gateway for Spark, Flink, Trino and other engines, offering SQL query services for ETL, BI, interactive analytics and batch processing.

1. Server‑Side Enhancements

• Batch (JAR) task submission: Kyuubi 1.6.0 adds a RESTful API for submitting batch JAR jobs. Each submission returns a BatchId that is propagated to Spark and YARN, so the same ID can be used to track the job, retrieve its logs, and terminate it.

• Metadata Store: Stores batch metadata (BatchId, configuration, creator node). This enables HA: behind a load‑balanced service discovery mechanism, any Kyuubi node can look up a batch and either answer the request itself or forward it to the node that created the batch.

• HA & Restart Recovery: The Metadata Store allows unfinished batches to be re‑submitted after a server restart. When the store is unavailable, Kyuubi falls back to YARN for batch status.

• Unified API & Authentication: Supports Thrift, REST, JDBC and ODBC APIs with both Kerberos and password authentication, unifying access methods across protocols.
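
To make the batch submission flow concrete, here is a minimal Python sketch of what a submission request could look like. It assumes the REST frontend exposes a `POST /api/v1/batches` endpoint that accepts the fields `batchType`, `resource`, `className`, `name`, `conf` and `args`; check these against the REST documentation for your Kyuubi version. The host, port, JAR path and job name are placeholders, and authentication is left out. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: base URL of the Kyuubi REST frontend; host and port are placeholders.
KYUUBI_REST = "http://kyuubi.example.com:10099/api/v1"


def build_batch_request(resource, class_name, name, conf=None, args=None):
    """Build the JSON payload for a Spark JAR batch submission.

    Field names are assumed from the Kyuubi REST documentation.
    """
    return {
        "batchType": "SPARK",
        "resource": resource,        # location of the application JAR
        "className": class_name,     # main class inside the JAR
        "name": name,                # human-readable batch name
        "conf": dict(conf or {}),    # engine configuration, e.g. Spark settings
        "args": list(args or []),    # arguments passed to the main class
    }


def make_submit_request(payload, base_url=KYUUBI_REST):
    """Wrap the payload in a POST request object without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_batch_request(
    resource="hdfs:///jars/spark-examples.jar",  # hypothetical path
    class_name="org.apache.spark.examples.SparkPi",
    name="spark-pi-demo",                        # hypothetical name
    conf={"spark.executor.memory": "2g"},
    args=["100"],
)
request = make_submit_request(payload)

# To actually submit (needs a reachable server and valid credentials):
#   with urllib.request.urlopen(request) as resp:
#       batch = json.load(resp)  # the response is expected to carry the BatchId
```

The BatchId in the response is what later status, log and termination requests would refer to, which is why the article stresses that it is propagated to Spark and YARN.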

2. Client‑Side Enhancements

• Improved JDBC driver: The built‑in driver is decoupled from Hive/Hadoop dependencies and adds Kerberos keytab support.

• Enhanced Beeline: Displays Spark stage progress bars, giving users clear visibility into job execution.

• RESTful CLI & SDK: Provides the kyuubi‑ctl command‑line tool (create, get, logs, delete, submit) and a programmable SDK for easier integration.

3. Engine Plugins

• Kyuubi Spark Engine: Supports Spark 3.0‑3.3 and all deployment modes (local, standalone, YARN, K8s). It also includes enterprise plugins for small‑file merging, partition limits, result size caps and Z‑Order optimization, as well as TPC‑DS/TPC‑H connectors and authz plugins.

• Kyuubi Flink Engine: Supports Flink 1.14‑1.15 in local and YARN (per‑job/session) modes; K8s application mode is planned for 1.7.0.

• Kyuubi Trino and Hive/JDBC Engines: Provide production‑ready Trino support and a beta Hive/JDBC engine.
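
Because every engine sits behind the same gateway, a client usually picks one through session configuration rather than a different endpoint. The sketch below builds a connection URL that selects an engine with the `kyuubi.engine.type` setting. The value names (`SPARK_SQL`, `FLINK_SQL`, `TRINO`, `HIVE_SQL`, `JDBC`) and the layout with configuration after `#` are assumptions based on the Kyuubi configuration documentation and the Hive JDBC URL convention, so verify them for your version. The helper `kyuubi_jdbc_url` and the host are hypothetical.

```python
# Assumption: engine names from the article mapped to kyuubi.engine.type values.
ENGINE_TYPES = {
    "spark": "SPARK_SQL",
    "flink": "FLINK_SQL",
    "trino": "TRINO",
    "hive": "HIVE_SQL",
    "jdbc": "JDBC",
}


def kyuubi_jdbc_url(host, port, engine, database="default", extra_conf=None):
    """Build a JDBC URL that asks the gateway for a specific engine type.

    Hypothetical helper: the URL layout (session configs after '#',
    separated by ';') is an assumption to check against the docs.
    """
    if engine not in ENGINE_TYPES:
        raise ValueError(f"unknown engine: {engine!r}")
    conf = {"kyuubi.engine.type": ENGINE_TYPES[engine]}
    conf.update(extra_conf or {})  # extra session-level settings, if any
    conf_part = ";".join(f"{key}={value}" for key, value in conf.items())
    return f"jdbc:hive2://{host}:{port}/{database};#{conf_part}"


# Same gateway address, different engines.
flink_url = kyuubi_jdbc_url("localhost", 10009, "flink")
spark_url = kyuubi_jdbc_url(
    "localhost", 10009, "spark",
    extra_conf={"spark.sql.shuffle.partitions": "2"},
)
print(flink_url)
print(spark_url)
```

Keeping the engine choice in configuration means that switching from Spark to Flink or Trino does not change the endpoint the client connects to.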

4. Community Outlook

Kyuubi has released four major versions since entering Apache incubation, evolving from a serverless Spark project to a serverless SQL‑on‑Lakehouse solution. The community now has 12 PPMC members, 17 committers, over 96 contributors, and has hosted multiple meetups and participated in events such as ApacheCon.

Statistics: 8 released versions, >1400 merged PRs, >900 resolved issues, and ongoing development toward graduation from the Apache incubator.

For more information, users are encouraged to follow the DataFun public account, watch the video replay, and participate in community surveys or collaborations.
