Apache Kyuubi: Architecture, Use Cases, Community, and Deployment on China Mobile Cloud
This article introduces Apache Kyuubi, a multi-tenant Thrift JDBC/ODBC service built on Spark. It covers Kyuubi's architecture, its advantages over Spark Thrift Server, real-world use cases, open-source community progress, and practical deployment on China Mobile Cloud (eCloud), on Kubernetes, and in front of Trino.
Kyuubi (the name refers to the nine-tailed fox of Chinese mythology) is a Thrift JDBC/ODBC service that wraps Apache Spark, providing standard JDBC interfaces, multi-tenant support, and distributed execution for ETL, BI, and other big-data workloads.
The service layer abstracts the underlying compute framework and storage, allowing users to run SQL without writing Spark code. Kyuubi’s engine layer exposes full Spark SQL capabilities and can run on YARN or Kubernetes, enabling lake‑house integration and adaptive query optimization.
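As a rough illustration of how the engine's resource manager and adaptive query execution might be selected, the sketch below renders a few standard Spark properties as key=value lines. The function name and the example API-server address are hypothetical; only the property keys (spark.master, spark.submit.deployMode, spark.sql.adaptive.enabled) are real Spark settings, and the article does not say which values any given deployment uses.

```python
def render_engine_conf(resource_manager: str, k8s_api_server: str = "") -> str:
    """Render illustrative engine settings as key=value lines."""
    if resource_manager == "yarn":
        master = "yarn"
    elif resource_manager == "kubernetes":
        if not k8s_api_server:
            raise ValueError("k8s_api_server is required for kubernetes")
        # Spark addresses a Kubernetes cluster as k8s://<api-server-url>.
        master = f"k8s://{k8s_api_server}"
    else:
        raise ValueError(f"unsupported resource manager: {resource_manager}")

    conf = {
        "spark.master": master,
        "spark.submit.deployMode": "cluster",
        # Adaptive Query Execution re-optimizes plans using runtime statistics.
        "spark.sql.adaptive.enabled": "true",
    }
    return "\n".join(f"{key}={value}" for key, value in conf.items())


if __name__ == "__main__":
    # Hypothetical API-server address, for illustration only.
    print(render_engine_conf("kubernetes", "https://k8s.example:6443"))
```

Users still submit plain SQL over JDBC; settings like these live on the service side.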
Compared with the traditional Spark Thrift Server, Kyuubi decouples the service from the Spark driver: a persistent KyuubiServer process accepts connections, and a two-layer service-discovery module (backed by ZooKeeper or etcd) provides high availability, load balancing, and support for multiple Spark applications.
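From the client side, the difference between connecting to one server and connecting through service discovery shows up mainly in the JDBC URL. The sketch below builds both forms, assuming the Hive-compatible jdbc:hive2 scheme, Kyuubi's default port 10009, and the default "kyuubi" discovery namespace. Host names are placeholders, the helper names are hypothetical, and only the ZooKeeper case is covered, since that is the discovery mode the Hive JDBC driver understands.

```python
from typing import Sequence


def direct_url(host: str, port: int = 10009, database: str = "default") -> str:
    """JDBC URL that pins the client to a single Kyuubi server."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def zookeeper_url(
    quorum: Sequence[str],
    namespace: str = "kyuubi",
    database: str = "default",
) -> str:
    """JDBC URL that lets the driver pick a live server registered in ZooKeeper."""
    if not quorum:
        raise ValueError("at least one ZooKeeper address is required")
    hosts = ",".join(quorum)
    return (
        f"jdbc:hive2://{hosts}/{database}"
        f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


if __name__ == "__main__":
    print(direct_url("kyuubi.example"))
    print(zookeeper_url(["zk1:2181", "zk2:2181", "zk3:2181"]))
```

With the second form, a client does not need to know which KyuubiServer instances are alive; the driver resolves one from the registry at connect time.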
Key architectural improvements include:
Separation of service and compute engine, allowing independent lifecycle management of engines.
Engine discovery via Engine Space KV store, supporting isolation by server, version, share level, engine type, user, and sub‑domain.
Multiple resource‑isolation strategies (connection‑level, user‑level, group‑level, server‑level) to balance isolation and sharing.
Dynamic resource allocation and executor TTL for elastic resource usage.
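To make the interaction between the engine space and the share levels more concrete, here is a simplified model of how a lookup key for an engine could be derived from the isolation dimensions listed above. The path layout, default values, and function name are assumptions chosen for illustration and may not match the layout of any particular Kyuubi release; the point is only that sessions mapping to the same key would reuse one engine.

```python
from enum import Enum
from typing import Optional


class ShareLevel(Enum):
    CONNECTION = "CONNECTION"  # one engine per connection
    USER = "USER"              # one engine per user
    GROUP = "GROUP"            # one engine per user group
    SERVER = "SERVER"          # one engine shared by everyone


def engine_space(
    share_level: ShareLevel,
    user: str,
    *,
    group: Optional[str] = None,
    connection_id: Optional[str] = None,
    subdomain: str = "default",
    server_space: str = "kyuubi",
    version: str = "1.4.0",
    engine_type: str = "SPARK_SQL",
    server_user: str = "kyuubi",
) -> str:
    """Return an illustrative KV-store path under which an engine registers."""
    leaf = subdomain
    if share_level is ShareLevel.CONNECTION:
        if connection_id is None:
            raise ValueError("connection_id is required at CONNECTION level")
        # A unique leaf per connection means engines are never shared.
        owner, leaf = user, connection_id
    elif share_level is ShareLevel.USER:
        owner = user
    elif share_level is ShareLevel.GROUP:
        if group is None:
            raise ValueError("group is required at GROUP level")
        owner = group
    else:
        owner = server_user
    root = f"{server_space}_{version}_{share_level.value}_{engine_type}"
    return f"/{root}/{owner}/{leaf}"


if __name__ == "__main__":
    print(engine_space(ShareLevel.USER, "alice"))
    print(engine_space(ShareLevel.GROUP, "bob", group="etl"))
```

In this model, two sessions opened by the same user at USER level resolve to the same path and share an engine, while two connections at CONNECTION level never do.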
Kyuubi’s use cases span replacing HiveServer2 for better performance, enabling Spark‑on‑K8s lake‑house solutions, and integrating with data‑lake formats such as Hudi and Iceberg. Notable adopters include Bilibili, iQIYI, Tencent Cloud, and China Mobile.
The open-source project started in 2018, entered the Apache Incubator in 2021, and has released versions 1.3.0, 1.3.1, and 1.4.0. The community welcomes code contributions, bug reports, and documentation improvements, and publishes Chinese-language guides through a public WeChat account.
On China Mobile Cloud (eCloud), Kyuubi is deployed on the eCloud Lakehouse platform with the following customizations:
Unified resource scheduling with Lakehouse, prohibiting user‑specified engine parameters.
Authentication based on AccessKey/SecretKey.
SQL audit integration with the Lakehouse audit platform.
SQL pre‑analysis and object‑store dynamic loading via Kyuubi plugins.
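The first customization, prohibiting user-specified engine parameters, can be pictured as a filter applied to the settings a client sends when opening a session. The article does not describe how the platform implements this, so the prefix list and function below are purely hypothetical; they only show the general shape of such a policy.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical policy: resource-related keys are reserved for the platform.
RESTRICTED_PREFIXES = (
    "spark.executor.",
    "spark.driver.",
    "spark.dynamicAllocation.",
    "kyuubi.engine.",
)


def split_session_conf(
    requested: Dict[str, str],
    restricted_prefixes: Iterable[str] = RESTRICTED_PREFIXES,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split client-supplied settings into (allowed, rejected) dictionaries."""
    prefixes = tuple(restricted_prefixes)
    allowed: Dict[str, str] = {}
    rejected: Dict[str, str] = {}
    for key, value in requested.items():
        target = rejected if key.startswith(prefixes) else allowed
        target[key] = value
    return allowed, rejected


if __name__ == "__main__":
    allowed, rejected = split_session_conf(
        {"spark.executor.memory": "64g", "spark.sql.shuffle.partitions": "200"}
    )
    print("allowed:", allowed)
    print("rejected:", rejected)
```

A server could silently drop the rejected keys or fail the connection; which behavior fits depends on how strictly the platform wants to enforce unified scheduling.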
Deployment on Kubernetes uses Helm 3 to manage the Kyuubi services, Deployment resources for the server, LoadBalancer Services for high availability, and etcd instead of ZooKeeper for service discovery.
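The Kubernetes objects involved can be sketched as plain data. The snippet below builds a minimal Deployment and a LoadBalancer Service and serializes them as a JSON List that kubectl can apply. In a real Helm 3 chart these would be templates with values substituted in. The resource name, image reference, and replica count are placeholders rather than values from the article, and the etcd discovery settings are omitted.

```python
import json
from typing import Any, Dict, List


def kyuubi_manifests(
    name: str = "kyuubi",
    image: str = "apache/kyuubi:latest",  # placeholder image reference
    replicas: int = 2,
    port: int = 10009,  # Kyuubi's default Thrift frontend port
) -> List[Dict[str, Any]]:
    """Build a minimal Deployment and LoadBalancer Service for the server."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": labels,  # routes traffic to the Deployment's pods
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return [deployment, service]


def to_kube_list(manifests: List[Dict[str, Any]]) -> str:
    """Wrap manifests in a v1 List so they can be applied in one call."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)


if __name__ == "__main__":
    print(to_kube_list(kyuubi_manifests()))
```

Running several replicas behind one LoadBalancer Service is what gives clients a single stable endpoint while individual server pods come and go.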
Kyuubi can also act as a gateway to Trino, providing unified JDBC access, centralized authentication, SQL preprocessing, and audit while delegating query execution to the Trino engine.
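The gateway role can be summarized as a small pipeline: authenticate, preprocess, audit, then delegate. The class below is a toy model of that flow and not Kyuubi's implementation; the credential store, the audit record shape, and the execute callback (which stands in for the Trino engine) are all hypothetical.

```python
import hmac
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SqlGateway:
    secrets: Dict[str, str]                # user -> secret (hypothetical store)
    execute: Callable[[str], List[tuple]]  # stands in for the query engine
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def authenticate(self, user: str, secret: str) -> bool:
        expected = self.secrets.get(user)
        # compare_digest avoids leaking information through comparison timing.
        return expected is not None and hmac.compare_digest(
            expected.encode(), secret.encode()
        )

    def preprocess(self, sql: str) -> str:
        # Normalize whitespace and drop trailing semicolons.
        statement = sql.strip().rstrip(";").strip()
        if not statement:
            raise ValueError("empty statement")
        return statement

    def submit(self, user: str, secret: str, sql: str) -> List[tuple]:
        if not self.authenticate(user, secret):
            self.audit_log.append((user, sql, "DENIED"))
            raise PermissionError(f"authentication failed for {user}")
        statement = self.preprocess(sql)
        self.audit_log.append((user, statement, "ACCEPTED"))
        # The gateway itself does no query execution; it only delegates.
        return self.execute(statement)


if __name__ == "__main__":
    gateway = SqlGateway({"alice": "s3cret"}, execute=lambda sql: [(sql,)])
    print(gateway.submit("alice", "s3cret", "SELECT 1;"))
    print(gateway.audit_log)
```

Keeping authentication and audit in the gateway means the same policy applies whether the statement ends up on Spark or on Trino.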
Overall, Kyuubi offers a service‑oriented, multi‑tenant, and highly available big‑data SQL platform suitable for cloud‑native lakehouse architectures.
DataFunTalk