
Apache Kyuubi 1.6.0 Feature Overview and Enhancements

This article walks through Apache Kyuubi 1.6.0. It covers server‑side enhancements (batch JAR task submission, the Metadata Store, and unified APIs and authentication), client‑side improvements to the built‑in JDBC driver and Beeline, and engine plugins for Spark, Flink, Trino and Hive. It concludes with the community's roadmap and statistics.

DataFunTalk

Mar 12, 2023

Introduction – Apache Kyuubi is an open‑source, enterprise‑grade data‑lake exploration platform that acts as a multi‑tenant gateway for Spark, Flink, Trino and other engines, offering SQL query services for ETL, BI, interactive analytics and batch processing.

1. Server‑Side Enhancements

• Batch (JAR) task submission : Kyuubi 1.6.0 adds a RESTful API for submitting batch JAR jobs. Each submission returns a BatchId, which is propagated to Spark and YARN so that the job can be tracked, its logs retrieved, and the job terminated.

• Metadata Store : Stores batch metadata (BatchId, configuration, creator node) and enables HA by allowing any Kyuubi node to query or forward requests via a load‑balanced service discovery mechanism.

• HA & Restart Recovery : The Metadata Store lets a restarted server re‑submit unfinished batches. When the store is unavailable, batch status falls back to YARN.

• Unified API & Authentication : Supports Thrift, REST, JDBC and ODBC APIs with both Kerberos and password authentication, unifying access methods across protocols.
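
The batch workflow described above (submit a JAR, then use the returned BatchId to check status, fetch logs, or terminate the job) can be sketched as plain HTTP requests. This is a minimal sketch, not the project's client: the `/api/v1/batches` path, the `localLog` suffix, the JSON field names, port 10099 and the use of HTTP Basic for password authentication are assumptions that do not appear in this article and should be checked against the Kyuubi REST documentation for your version. The host, user and JAR path are hypothetical. The functions only build the requests and do not send them.

```python
import base64
import json
from urllib.request import Request

# Assumption: batch endpoints live under this prefix on the Kyuubi REST frontend.
API_PREFIX = "/api/v1/batches"


def _auth_header(user, password):
    """Build an HTTP Basic header for password authentication (assumed scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def submit_batch_request(base_url, user, password, resource, class_name, name,
                         conf=None, args=None):
    """Build (but do not send) the POST request that submits a batch JAR job."""
    payload = {
        "batchType": "SPARK",      # assumed field names, see the lead-in
        "resource": resource,      # location of the JAR
        "className": class_name,   # main class inside the JAR
        "name": name,
        "conf": conf or {},
        "args": args or [],
    }
    headers = {"Content-Type": "application/json", **_auth_header(user, password)}
    return Request(
        base_url.rstrip("/") + API_PREFIX,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def batch_request(base_url, user, password, batch_id, action="status"):
    """Build a follow-up request for an existing BatchId: status, log or delete."""
    suffix, method = {
        "status": ("", "GET"),
        "log": ("/localLog", "GET"),   # assumed log endpoint
        "delete": ("", "DELETE"),
    }[action]
    url = f"{base_url.rstrip('/')}{API_PREFIX}/{batch_id}{suffix}"
    return Request(url, headers=_auth_header(user, password), method=method)


# Hypothetical usage; sending is left to the caller (e.g. urllib.request.urlopen).
submit = submit_batch_request(
    "http://kyuubi.example.com:10099", "alice", "secret",
    resource="hdfs:///jobs/spark-examples.jar",
    class_name="org.apache.spark.examples.SparkPi",
    name="spark-pi",
    args=["100"],
)
```

Because any Kyuubi node can look up a batch in the Metadata Store, the follow-up requests can target the load‑balanced address rather than the node that accepted the original submission.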

2. Client‑Side Enhancements

• Improved JDBC driver : Decoupled from Hive/Hadoop dependencies and adds Kerberos keytab support.

• Enhanced Beeline : Displays Spark stage progress bars, giving users clear visibility into job execution.

• RESTful CLI & SDK : Provides a kyuubi-ctl command‑line tool (create, get, logs, delete, submit) and a programmable SDK for easier integration.
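
To make the keytab support in the JDBC driver more concrete, here is a minimal sketch of how a connection URL carrying Kerberos settings might be assembled. The `jdbc:kyuubi://` scheme, the default port 10009 and the three parameter names are assumptions that do not appear in this article. The parameter names in particular may differ between driver versions, so verify them against the driver documentation before use. The host, principals and keytab path are hypothetical.

```python
def build_kyuubi_jdbc_url(host, port=10009, database="default", session_vars=None):
    """Assemble a Hive-style JDBC URL of the form
    jdbc:kyuubi://host:port/db;key1=value1;key2=value2
    """
    url = f"jdbc:kyuubi://{host}:{port}/{database}"
    for key, value in (session_vars or {}).items():
        url += f";{key}={value}"
    return url


# Assumed parameter names for keytab-based Kerberos login; check your driver version.
kerberos_vars = {
    "kyuubiClientPrincipal": "alice@EXAMPLE.COM",
    "kyuubiClientKeytab": "/etc/security/keytabs/alice.keytab",
    "kyuubiServerPrincipal": "kyuubi/_HOST@EXAMPLE.COM",
}

url = build_kyuubi_jdbc_url("kyuubi.example.com", session_vars=kerberos_vars)
print(url)
```

The resulting string would be passed to a JDBC client such as Beeline. With a keytab in the URL, the client would not need an existing Kerberos ticket cache.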

3. Engine Plugins

• Kyuubi Spark Engine : Supports Spark 3.0‑3.3 and all deployment modes (local, standalone, YARN, K8s). It includes enterprise plugins for small‑file merging, partition limits, result size caps and Z‑Order optimization, along with TPC‑DS/TPC‑H connectors and authz plugins.

• Kyuubi Flink Engine : Supports Flink 1.14‑1.15 in local and YARN (per‑job/session) modes; K8s application mode is planned for 1.7.0.

• Kyuubi Trino and Hive/JDBC Engines : Provide production‑ready Trino support and a beta Hive/JDBC engine.

4. Community Outlook

Kyuubi has released four major versions since entering Apache incubation, evolving from a serverless Spark project into a serverless SQL‑on‑Lakehouse solution. The community now has 12 PPMC members, 17 committers and more than 96 contributors. It has hosted multiple meetups and taken part in events such as ApacheCon.

Statistics: 8 released versions, >1400 merged PRs, >900 resolved issues, and ongoing development toward graduation from the Apache incubator.

For more information, users are encouraged to follow the DataFun public account, watch the video replay, and participate in community surveys or collaborations.

