Architecture Evolution and Capability Building of the Smart Acceleration Engine in the 58 Big Data Platform
The article details the background, architectural challenges, and comprehensive redesign of the Smart Acceleration Engine—including multi‑tenant support, cross‑datacenter scheduling, enriched engine selection, parsing and forwarding enhancements, compatibility adaptations, stability fixes, containerized deployment, and performance gains—demonstrating significant operational improvements and future directions for the platform.
The Smart Acceleration Engine is a self‑developed complex‑computation component of the 58 Big Data Platform, playing a crucial role in supporting business growth and platform stability. As big data technologies mature and AIGC develops rapidly, the engine is being iteratively upgraded to deliver notable cost reductions and efficiency gains.
Architecture Analysis
The platform’s overall architecture (see Fig. 1) includes the Smart Acceleration Engine, which provides efficient parsing, flexible forwarding, and strong execution capabilities for ad‑hoc query scenarios. However, growing data volume and business scale exposed several issues:
High code coupling across modules, leading to complex maintenance.
Strong coupling with Hive source code, limiting multi‑engine extensibility and hindering unified SQL entry.
Gateway service also handling compute tasks, causing node overload and long query times.
Resource and business isolation needs across data‑center pools, with cross‑datacenter scheduling bandwidth constraints.
To address these, the architecture is upgraded to use Apache Kyuubi as an independent gateway and introduce StarRocks as an additional compute engine (Fig. 3).
Capability Building
3.1 Multi‑tenant Architecture Refactor
Instead of proxy‑user authentication, Kyuubi’s engine startup now uses doAs to simulate real user sessions, enabling true multi‑tenant support.
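Kyuubi’s impersonation model combines standard Hadoop proxy‑user settings with Kyuubi’s engine share level. A minimal configuration sketch (the service‑user name `kyuubi` and wildcard values are illustrative; consult the Kyuubi and Hadoop docs for your versions):

```properties
# core-site.xml (Hadoop side): allow the Kyuubi service user to
# impersonate end users when launching engines via doAs
# hadoop.proxyuser.kyuubi.hosts=*
# hadoop.proxyuser.kyuubi.groups=*

# kyuubi-defaults.conf: one engine per real end user
kyuubi.engine.share.level=USER
```

With `USER`-level sharing, each account gets its own engine instance running under its own identity, which is what makes true multi‑tenant resource accounting possible.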
3.2 Group Isolation and Multi‑engine Support
Namespaces provide physical isolation for different data‑centers, while logical refactoring allows multiple nodes and services within the same namespace.
3.3 Enhanced Cross‑Datacenter Scheduling
SQL dispatch logic now considers account and data volume, routing queries to appropriate data‑centers and engines (Fig. 7).
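The dispatch idea can be sketched in a few lines. This is a hypothetical illustration, not 58’s code: the function name `pick_route`, the row‑count threshold, and the small‑query‑to‑StarRocks / large‑query‑to‑Spark split are assumptions standing in for the real account‑ and volume‑aware rules.

```python
# Hypothetical routing sketch: keep queries inside the submitting
# account's home datacenter, and send only small queries to StarRocks
# so large scans don't saturate cross-datacenter bandwidth.

LARGE_SCAN_ROWS = 100_000_000  # assumed cutoff for a "big" query


def pick_route(account_dc: str, estimated_rows: int) -> dict:
    """Return the target datacenter and engine for one SQL statement."""
    if estimated_rows < LARGE_SCAN_ROWS:
        return {"datacenter": account_dc, "engine": "starrocks"}
    return {"datacenter": account_dc, "engine": "spark"}


print(pick_route("dc-east", 5_000_000))   # small query -> starrocks
print(pick_route("dc-east", 2_000_000_000))  # large scan -> spark
```

The real dispatcher would also weigh queue depth and engine health, but the shape is the same: route on who is asking and how much data the query touches.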
3.4 Rich Engine Selection Strategies
Beyond the fixed engine choice, strategies such as RANDOM, LEASTACTIVE, and WEIGHT are added to balance load and avoid single‑node bottlenecks (Fig. 8).
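The three added strategies behave like their counterparts in common load balancers. A self‑contained Python sketch (node names, `active` counts, and `weight` values are made up for illustration):

```python
import random

# Example node pool: "active" = in-flight queries, "weight" = capacity.
nodes = [
    {"name": "sr-fe-1", "active": 12, "weight": 3},
    {"name": "sr-fe-2", "active": 4,  "weight": 1},
    {"name": "sr-fe-3", "active": 9,  "weight": 2},
]


def select(nodes, strategy="RANDOM"):
    if strategy == "RANDOM":
        # uniform pick; simple but ignores load
        return random.choice(nodes)
    if strategy == "LEASTACTIVE":
        # node with the fewest in-flight queries wins
        return min(nodes, key=lambda n: n["active"])
    if strategy == "WEIGHT":
        # weighted random pick proportional to declared capacity
        return random.choices(nodes, weights=[n["weight"] for n in nodes])[0]
    raise ValueError(f"unknown strategy: {strategy}")


print(select(nodes, "LEASTACTIVE")["name"])  # -> sr-fe-2
```

LEASTACTIVE avoids the single‑node bottleneck directly; WEIGHT lets heterogeneous nodes take load in proportion to their capacity.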
3.5 Parsing and Forwarding Improvements
SQL parsing now fully supports StarRocks syntax; SQLGlot handles dialect rewriting; history‑based optimization (HBO) leverages historical run‑time data to select the best plan; and a matrix of machine‑learning models predicts the optimal execution route.
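The HBO idea reduces to: fingerprint each recurring SQL, remember how long each engine took on it, and route to the historical winner. A minimal sketch under assumed names (`fingerprint`, `record`, `choose_engine` are all illustrative, as is the normalization scheme):

```python
import hashlib
from statistics import mean

# Hypothetical history-based-optimization (HBO) sketch:
# fingerprint -> {engine: [observed runtimes in seconds]}
history = {}


def fingerprint(sql: str) -> str:
    """Normalize whitespace/case so recurring SQL maps to one key."""
    normalized = " ".join(sql.lower().split())
    return hashlib.md5(normalized.encode()).hexdigest()


def record(sql: str, engine: str, seconds: float) -> None:
    """Store one observed runtime for this SQL on this engine."""
    history.setdefault(fingerprint(sql), {}).setdefault(engine, []).append(seconds)


def choose_engine(sql: str, default: str = "spark") -> str:
    """Prefer the engine with the best average runtime; else the default."""
    runs = history.get(fingerprint(sql))
    if not runs:
        return default
    return min(runs, key=lambda e: mean(runs[e]))


record("SELECT c FROM t", "spark", 42.0)
record("select c from t", "starrocks", 3.5)
print(choose_engine("SELECT c FROM t"))  # -> starrocks
```

The ML layer described in the article generalizes this further, predicting a route even for SQL with no exact history.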
Compatibility Adaptation
Extensive adaptations ensure Spark‑StarRocks compatibility across syntax, metadata binding, query optimization, and execution phases, including Java UDF support, Hive catalog cache disabling, and handling of StarRocks‑unsupported functions.
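One compatibility tactic is a pre-flight check: if a query calls functions the StarRocks deployment cannot run, fall back to Spark. The sketch below is illustrative only; the function names in `UNSUPPORTED_IN_STARROCKS` are assumed examples, and a real implementation would inspect the parsed AST rather than use a regex.

```python
import re

# Assumed examples of Hive functions a StarRocks deployment might lack.
UNSUPPORTED_IN_STARROCKS = {"reflect", "java_method"}


def needs_spark_fallback(sql: str) -> bool:
    """Crude check: extract called function names and test for overlap."""
    called = set(re.findall(r"\b(\w+)\s*\(", sql.lower()))
    return bool(called & UNSUPPORTED_IN_STARROCKS)


print(needs_spark_fallback("SELECT reflect('java.lang.Math','max',1,2)"))  # True
print(needs_spark_fallback("SELECT max(c) FROM t"))                        # False
```

Routing such queries back to Spark keeps the unified SQL entry point intact without waiting for every function to be adapted.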
Stability Fixes
Issues such as FE node hangs, BE crashes, and memory overflow caused by oversized CBO‑generated SQL are mitigated, and Hive statistics collection is bypassed for better stability (Fig. 11).
Usability Enhancements
Java UDFs can now be fetched from HDFS, and SQL black‑list persistence is moved to metadata for reliability.
Containerized Deployment Exploration
A hybrid cloud‑on‑premise deployment is adopted: FE remains on physical machines, BE partially in the cloud with local storage, and CN as stateless compute nodes with resource isolation and disk‑spilling to address container memory limits (Fig. 14).
Performance Improvements
Data Cache activation and slow HDFS DataNode avoidance significantly improve query latency, with measurable reductions in long‑tail queries (Fig. 15).
Landing Results
Smart Acceleration Engine processes over 100k SQLs daily.
StarRocks lake‑warehouse architecture drives ad‑hoc query growth, surpassing 60k daily SQLs.
ETL instances exceed 10k, with average efficiency gains of 41%.
HiveSQL migration reaches 92.4%, with P95 latency improved by 43.8%.
AI‑driven HBO models increase accuracy by 82% and halve failover rates.
CPU consumption reduced by over 15,000 cores through resource‑efficient architecture.
Summary and Outlook
The project benefited greatly from the Apache Kyuubi and StarRocks communities. Future work will focus on continuous Kyuubi/StarRocks iteration, exploiting Spark 3.5 capabilities, advancing vectorized processing, and exploring AI/algorithm innovations for smarter data‑driven decisions.
Authors: Ma Ruili, Wang Shifa, Liu Kai, Zhou Heming, Wu Yanxing, and the Data Architecture Computing Team.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.