Building a Unified Data Empowerment Layer with Apache Kyuubi at GF Securities

The article describes how GF Securities designed and implemented a unified big‑data empowerment layer based on Apache Kyuubi to address data‑centric challenges, improve efficiency, ensure controllable governance, and support agile data scenarios across ingestion, processing, storage, and security.

DataFunTalk
In November 2023, GF Securities became one of the first brokerages to be certified under the national DCMM (Data Management Capability Maturity Assessment Model) standard. It now runs tens of thousands of Kyuubi jobs, which form the core of its data-governance and key-data systems.

The author, a senior big‑data platform architect, explains the strategic background of the “digital‑middle‑platform” initiative, the four transformation goals of “efficiency and controllability”, and how Apache Kyuubi is used to build a unified data‑empowerment layer.

Current platform bottlenecks include inconsistent query and processing interfaces, overly complex component integration, difficulty evolving service versions, and fragmented permission controls that hinder financial‑industry compliance.

To solve these issues, the proposed empowerment layer targets six macro challenges: data ingestion, output, collaborative computing, complex association, governance, and systematic processing. It also addresses micro-level needs such as hybrid data shapes, metric-oriented outputs, and elastic workloads.

Apache Kyuubi serves as a multi-tenant, serverless SQL gateway that unifies access through HiveServer2-compatible Thrift and JDBC interfaces. It supports Spark SQL (compatible with HiveQL), Iceberg DML, dynamic resource allocation, and fine-grained Ranger-based authentication and authorization, including row-level filtering and column masking.
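To make the single access point more concrete, here is a minimal sketch of how a client might assemble a connection URL for the Hive JDBC driver when talking to Kyuubi. The helper `build_kyuubi_url` and the host name are hypothetical. The URL layout (session variables after `;`, configuration overrides after `?`) and the default port 10009 follow the pattern shown in the Kyuubi documentation as best understood here, so check them against the deployed version.

```python
def build_kyuubi_url(host, port=10009, database="default",
                     session_vars=None, conf_overrides=None):
    """Assemble a HiveServer2-style JDBC URL for a Kyuubi server.

    Assumed layout: jdbc:hive2://host:port/db;session_vars?conf_overrides
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if session_vars:
        # Session variables are appended after ';' as key=value pairs.
        url += ";" + ";".join(f"{k}={v}" for k, v in session_vars.items())
    if conf_overrides:
        # Per-session engine and Spark settings are appended after '?'.
        url += "?" + ";".join(f"{k}={v}" for k, v in conf_overrides.items())
    return url


url = build_kyuubi_url(
    "kyuubi.example.internal",  # hypothetical host name
    conf_overrides={
        # One shared engine per user, with Spark dynamic allocation enabled.
        "kyuubi.engine.share.level": "USER",
        "spark.dynamicAllocation.enabled": "true",
    },
)
print(url)
```

Because the gateway speaks the HiveServer2 protocol, the same URL shape should work for any tool that already ships a Hive JDBC driver, which is one plausible reason a gateway like this can replace several component-specific interfaces.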

The implementation follows a four‑stage rollout: replace ad‑hoc queries, pilot batch jobs, mature large‑scale data processing, and finally controlled open access with integrated Ranger policies.
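The effect of the Ranger policies mentioned above, row-level filtering and column masking, can be pictured with a small pure-Python simulation. This is only a conceptual sketch: in a real deployment the policies are defined in Ranger and enforced inside the SQL engine, not in application code. The function names, the sample rows, and the "show last four characters" mask are all illustrative assumptions.

```python
def mask_show_last4(value):
    """Replace all but the last four characters with 'x'."""
    s = str(value)
    return "x" * max(len(s) - 4, 0) + s[-4:]


def apply_policies(rows, user, row_filter, masks):
    """Return the rows `user` may see, with masked columns rewritten."""
    # Row-level filter: drop rows the user is not entitled to see.
    visible = [r for r in rows if row_filter(user, r)]
    # Column masking: rewrite sensitive columns, pass the rest through.
    return [
        {col: masks[col](val) if col in masks else val for col, val in r.items()}
        for r in visible
    ]


# Hypothetical sample data and user.
accounts = [
    {"branch": "GZ", "account_no": "6222021234", "balance": 100},
    {"branch": "SZ", "account_no": "6222025678", "balance": 250},
]
user = {"name": "analyst_gz", "branch": "GZ"}

result = apply_policies(
    accounts,
    user,
    row_filter=lambda u, r: r["branch"] == u["branch"],
    masks={"account_no": mask_show_last4},
)
print(result)
```

The same query returns different rows and different column values depending on who runs it, which is what lets a platform open access in a controlled way without maintaining a separate copy of the data for each audience.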

Reported benefits include up to a 50% reduction in runtime, a 100% improvement in development speed, and a 100% improvement in operations and governance, attributed to standardized SQL interfaces, dynamic resource management, and unified permission services.

Future work will explore adding Flink support, extending Kyuubi's JDBC capabilities, improving the frontend (FE) protocols (MySQL, PostgreSQL, FlightSQL), and advancing the Authz plugin to handle multi-catalog policies and additional data-lake commands.

Tags: Big Data, Data Platform, Data Governance, Ranger, Spark, Apache Kyuubi, Data Empowerment