
How Kuaishou Scales SQL on Hadoop: Architecture, Optimizations, and Lessons Learned

This article explains the SQL‑on‑Hadoop ecosystem—including Hive, Spark, SparkSQL, Presto and other solutions—then details Kuaishou's large‑scale platform architecture, performance bottlenecks, routing logic, high‑availability mechanisms, and a series of concrete optimizations that improve query speed, resource utilization, and operational stability.

dbaplus Community

SQL on Hadoop Overview

SQL on Hadoop refers to a family of SQL engines that translate queries into execution plans for Hadoop-compatible runtimes such as MapReduce, Spark, Tez, or native in-memory engines. The most common implementations are:

Hive

Hive is a data-warehouse system that compiles SQL into jobs for MapReduce (MR), Spark, or Tez. It exposes two Thrift services: HiveServer2 for remote query submission and the Metastore server for metadata management.

Spark & SparkSQL

Spark is a DAG-based unified analytics engine with modules for SQL, streaming, ML, and graph processing. SparkSQL builds on Spark's compute engine: it parses SQL, creates a logical plan, optimizes it, and finally generates a physical plan that runs as RDD operations.

Presto

Presto is an open‑source, fully in‑memory distributed SQL engine that delivers lower latency than MR‑based engines and is widely used in large‑scale production.

Kuaishou SQL on Hadoop Platform

Scale

The platform processes roughly 700,000 SQL statements per day, of which about 180,000 are DQL (data query language, i.e. read-only queries). The AdHoc cluster serves interactive analysis (average DQL latency ≈ 300 s), while the ETL cluster handles batch jobs (average DQL latency ≈ 1,000 s, P50 = 100 s, P90 = 4,000 s).

Service Layers

Four upper‑level modules—synchronization service, ETL platform, AdHoc platform, and user programs—feed into a service layer that collects logs, metrics, and metadata into HDFS or a dedicated metadata store. Web‑crawled data is first stored in HBase and then cleaned.

Key Components

HUE/Notebook for interactive queries.

Reporting/BI systems for ETL and reporting.

Metadata service.

HiveServer2 for engine selection (supports MR, Presto, Spark).

HiveServer2 Multi‑Cluster Architecture

Multiple HiveServer2 clusters (AdHoc, ETL, and smaller auxiliary clusters) are registered in ZooKeeper. Clients resolve the appropriate cluster via ZooKeeper, enabling priority‑based routing (core vs. general clusters).
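The lookup described above can be sketched as follows. This is a minimal illustration, not Kuaishou's implementation: a plain dictionary stands in for the ZooKeeper namespace, and the znode paths, host names, and the `resolve_server` helper are all hypothetical.

```python
import random

# Hypothetical stand-in for the ZooKeeper namespace. A real client would list
# the child znodes under each path instead of reading a dict.
REGISTRY = {
    "/hiveserver2/core": ["hs2-core-1:10000", "hs2-core-2:10000"],
    "/hiveserver2/general": ["hs2-gen-1:10000"],
}


def resolve_server(priority, registry, rng=random):
    """Pick one registered HiveServer2 instance for the given priority tier."""
    path = "/hiveserver2/core" if priority == "core" else "/hiveserver2/general"
    servers = registry.get(path, [])
    if not servers:
        raise LookupError(f"no HiveServer2 instance registered under {path}")
    # Random choice spreads connections across the instances of one cluster.
    return rng.choice(servers)
```

Because clients only see whatever is registered under a path, adding or removing an instance changes routing without any client-side configuration change.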

BeaconServer Extension

BeaconServer runs as a stateless hook service attached to HiveServer2. It provides:

SQL routing based on configurable rules.

Auditing, SQL rewriting, task control, error analysis, and optimization suggestions.

Dynamic module loading and graceful backend upgrades without affecting HiveServer2.

Intelligent Engine Solution

To reduce the operational cost of maintaining multiple engines, Kuaishou implemented a custom execution engine inside HiveServer2 with automatic routing:

Two integration modes: JDBC (SQL sent to a remote accelerated cluster) and PROXY (SQL pushed to a local accelerated client).

Both modes share YARN resources, allowing night‑time AdHoc resources to be reclaimed for batch workloads.

Routing is performed by HiveServer2 hooks that invoke BeaconServer’s rule engine. Different clusters can have distinct routing policies.

Routing Rule Example

Rules can match SQL patterns, engine capabilities, or cluster load, and can automatically add a LIMIT clause to prevent large scans.
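The article does not show the rule format, so the following is only a sketch of how such a rule engine could look. The `Rule` class, the three example rules, the load threshold of 0.9, and the default limit of 1000 are assumptions made for illustration. The LIMIT check is a naive textual match, not a SQL parser.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[str, dict], bool]  # (sql, context) -> bool
    engine: str


def add_default_limit(sql, limit=1000):
    """Append a LIMIT to a SELECT that lacks one (naive textual check)."""
    stripped = sql.rstrip().rstrip(";")
    is_select = re.match(r"\s*select\b", stripped, re.IGNORECASE)
    has_limit = re.search(r"\blimit\s+\d+\s*$", stripped, re.IGNORECASE)
    if is_select and not has_limit:
        return f"{stripped} LIMIT {limit}"
    return stripped


# Hypothetical rules, evaluated in order; the first match wins.
RULES = [
    Rule("non-select-stays-on-mr",
         lambda sql, ctx: not re.match(r"\s*select\b", sql, re.IGNORECASE), "mr"),
    Rule("accelerated-cluster-busy",
         lambda sql, ctx: ctx.get("accelerated_load", 0.0) > 0.9, "mr"),
    Rule("simple-select",
         lambda sql, ctx: " join " not in sql.lower(), "presto"),
]


def route(sql, ctx, rules=RULES, default="spark"):
    """Return (engine, possibly rewritten SQL) for the first matching rule."""
    for rule in rules:
        if rule.matches(sql, ctx):
            return rule.engine, add_default_limit(sql)
    return default, add_default_limit(sql)
```

For example, `route("SELECT * FROM t", {"accelerated_load": 0.1})` selects the hypothetical `presto` engine and rewrites the statement to end in `LIMIT 1000`.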

Benefits

Seamless integration with mainstream engines via JDBC or PROXY.

Automatic engine selection reduces user effort and migration risk.

Falls back to MR when an accelerated engine is overloaded.

Reuse of existing HiveServer2 modules (lineage, permission, concurrency) lowers development cost.

Dynamic resource sharing improves cluster utilization.
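The fallback to MR listed above could be wired up along these lines. The `EngineOverloaded` exception and the two callables are hypothetical; the article does not describe how overload is actually detected.

```python
class EngineOverloaded(Exception):
    """Raised by an accelerated engine that cannot accept more work (hypothetical)."""


def run_with_fallback(sql, accelerated, fallback):
    """Try the accelerated engine first; use the MR path if it is overloaded.

    `accelerated` and `fallback` are callables that take a SQL string and
    return a result. Returns (engine label, result).
    """
    try:
        return "accelerated", accelerated(sql)
    except EngineOverloaded:
        # The user still gets a result, just with higher latency.
        return "mr", fallback(sql)
```

Only the overload signal triggers the fallback in this sketch; other errors propagate to the caller so that genuine query failures are not silently retried on a slower engine.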

Performance Optimizations

FetchTask Acceleration

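Before fetching, result files are pre-sorted by size. When a result consists of many small files, this reduces the number of HDFS round-trips (presumably because larger files are read first, so a row limit is reached after opening fewer files) and can improve latency by orders of magnitude.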
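A small simulation shows why the ordering matters. It assumes, beyond what the article states, that files are sorted largest first and that fetching stops once a row limit is reached. Files are modeled as in-memory lists, with row count standing in for file size.

```python
def fetch_rows(files, limit, presort=True):
    """Collect up to `limit` rows and report how many files were opened.

    `files` maps a file name to its list of rows.
    """
    names = list(files)
    if presort:
        # Largest files first, so the limit is reached after fewer opens.
        names.sort(key=lambda name: len(files[name]), reverse=True)
    rows, opened = [], 0
    for name in names:
        if len(rows) >= limit:
            break
        opened += 1  # each open stands for at least one HDFS round-trip
        rows.extend(files[name][: limit - len(rows)])
    return rows, opened
```

With 100 empty files listed ahead of one 50-row file and a limit of 10, the unsorted scan opens 101 files while the pre-sorted scan opens one.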

DESC Table Optimization

DESC TABLE on a large table no longer enumerates partitions through the metadata service, since its output only contains table-level information. The time saved grows with the number of sub-partitions, which dramatically reduces latency under heavy metadata load.

Additional Optimizations

Reuse the input statistics already gathered during split calculation instead of recomputing them when estimating the number of reducers, boosting scheduling speed by ~50% for large inputs.

Accelerate ParquetSerDe initialization to avoid repeated column pruning.

Introduce LazyOutputFormat to prevent creation of empty files.

Multi‑threaded statsTask aggregation to avoid slow merges.

Enable parallel compilation for AdHoc queries to reduce overall latency.

High‑Availability Improvements

HiveServer2 Startup

Materialized view initialization is deferred to a background thread, reducing startup time from >5 minutes to <5 seconds.
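The pattern of moving slow initialization off the startup path can be sketched as below. The `Server` class and the `load_views` callable are illustrative only and do not reflect HiveServer2's actual classes.

```python
import threading


class Server:
    def __init__(self, load_views):
        self._load_views = load_views  # slow callable, e.g. scans all views
        self.views = None
        self.views_ready = threading.Event()

    def start(self):
        """Return immediately; view initialization continues in the background."""
        worker = threading.Thread(target=self._init_views, daemon=True)
        worker.start()
        return worker

    def _init_views(self):
        self.views = self._load_views()
        self.views_ready.set()  # code that needs the views can wait on this
```

The trade-off is that any code path depending on the views must check or wait on `views_ready` rather than assume they are loaded once the server accepts connections.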

Hot Configuration Reload

A ThriftServer interface allows configuration changes to be pushed and applied instantly without restarting HiveServer2.
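On the receiving side, a pushed change might be applied roughly like this. The `HotConfig` class and the key name in the usage note are hypothetical, and the Thrift transport itself is left out; `apply` is simply the method a remote handler would call.

```python
import threading


class HotConfig:
    """Thread-safe configuration holder that can be updated at runtime."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._values = dict(initial)

    def get(self, key, default=None):
        with self._lock:
            return self._values.get(key, default)

    def apply(self, updates):
        """Merge pushed values; readers see them on their next get()."""
        with self._lock:
            self._values = {**self._values, **updates}
```

A call such as `cfg.apply({"max_concurrent_queries": 50})` takes effect for subsequent reads without restarting the process.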

Scratch Directory Optimization

Separate scratch directories for regular queries and CREATE TEMPORARY TABLE operations, with lazy creation for the latter, prevent HDFS NameNode overload and improve connection handling.
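Lazy creation of the temporary-table directory can be sketched as follows, with the local filesystem standing in for HDFS. The `SessionScratch` class and the `temp_tables` path layout are assumptions made for illustration.

```python
import os


class SessionScratch:
    """Creates the temporary-table scratch directory only on first use."""

    def __init__(self, root, session_id):
        # Opening a session records the path but issues no filesystem call.
        self._temp_table_dir = os.path.join(root, "temp_tables", session_id)
        self._created = False

    def temp_table_dir(self):
        """Called when a CREATE TEMPORARY TABLE statement needs the directory."""
        if not self._created:
            os.makedirs(self._temp_table_dir, exist_ok=True)
            self._created = True
        return self._temp_table_dir
```

Sessions that never create a temporary table never pay for the directory creation, which is the call that would otherwise hit the NameNode on every new connection.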

Stage Concurrency Fixes

Reset non‑running subtasks to the initialized state to avoid duplicate scheduling.

Add a post‑execution verification step that aborts the job if downstream stages are incomplete, ensuring result completeness.
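The verification step could look something like the sketch below. The `StageState` values and the exception name are hypothetical; the article only says that the job is aborted when downstream stages are incomplete.

```python
from enum import Enum


class StageState(Enum):
    INITIALIZED = "initialized"
    RUNNING = "running"
    FINISHED = "finished"


class IncompleteJobError(RuntimeError):
    """Raised when a job is reported done while some stage has not finished."""


def verify_all_stages_finished(stages):
    """Abort instead of returning a partial result.

    `stages` maps a stage name to its StageState.
    """
    unfinished = sorted(
        name for name, state in stages.items() if state is not StageState.FINISHED
    )
    if unfinished:
        raise IncompleteJobError(f"stages not finished: {', '.join(unfinished)}")
```

Failing loudly here trades an occasional aborted job for the guarantee that a successful job never returns incomplete data.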

AB Switching & Dynamic Online/Offline

Using ZooKeeper, an online primary cluster and a standby backup can be toggled without service interruption. Metrics monitoring and a cancel‑request interface enable automatic memory‑pressure handling and graceful task termination.
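The toggle itself reduces to flipping which cluster clients resolve to. In the sketch below an in-memory object stands in for the state kept in ZooKeeper; the `ABSwitch` class and the endpoint names in the usage note are hypothetical.

```python
class ABSwitch:
    """In-memory stand-in for an online/offline flag kept in ZooKeeper."""

    def __init__(self, cluster_a, cluster_b):
        self._clusters = {"A": list(cluster_a), "B": list(cluster_b)}
        self._online = "A"

    def online_endpoints(self):
        """Endpoints that new client connections should use."""
        return list(self._clusters[self._online])

    def switch(self):
        """Bring the standby online and take the current cluster offline."""
        self._online = "B" if self._online == "A" else "A"
        return self._online
```

With `ABSwitch(["hs2-a:10000"], ["hs2-b:10000"])`, calling `switch()` makes new connections resolve to `hs2-b:10000`. Existing sessions on the old cluster would be left to drain before it is upgraded.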

Management UI

A dedicated UI displays cluster versions, startup times, resource usage, and online/offline status, supporting one‑click gray releases and upgrades.

Future Plans

Upgrade the SQL expert system to provide automatic parameter tuning and query optimization.

Introduce caching acceleration for AdHoc queries.

Research and adopt new execution engines.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

dbaplus Community
