How HuoLala Accelerated Ad‑hoc Queries with a Hybrid Offline Engine
This article describes how HuoLala identified slow ad‑hoc query performance in its Hive‑on‑Tez stack, surveyed comparable industry solutions, and built a multi‑engine hybrid offline service that dramatically improves query latency, outlines its architecture, key design decisions, production impact, and future roadmap.
1. Background
HuoLala, a data‑intelligent logistics platform, found that its analysts and developers frequently performed ad‑hoc queries using Hive on Tez, which users reported as slow. To improve SQL query efficiency, the company introduced a hybrid offline engine service that combines multiple query engines.
2. Industry Survey
Similar products in the market include Tencent SuperSQL, Alibaba DLA, Huawei HetuEngine, 360 QuickSQL, and NetEase Kyuubi. They differ in engine composition, open‑source status, SQL standard support, data source connectivity, federation capability, and cloud‑native design.
3. Design and Implementation
3.1 Functional Architecture
The system aims to provide multi‑engine intelligent routing, high stability, scalability, ease of use, and correctness. A Master/Worker architecture is adopted, with HA Master for stability, elastic Workers for scalability, and SDK/JDBC/CLI interfaces for easy client integration. A custom data‑reconciliation tool ensures result correctness across engines.
3.2 System Architecture
Roles include Client, Master, and Worker, communicating via gRPC. The Master uses Zookeeper for high‑availability, persisting service state and handling resource lifecycle, load balancing, and node registration.
Clients (SDK/JDBC/CLI) obtain the Master address, request resources, submit SQL to Workers, and receive results.
Workers manage query execution through several components:
QueryManager tracks query metadata and status.
Dispatcher performs feature analysis (e.g., query type, data volume, operator count) to select the most suitable engine.
SQL Converter rewrites the original SQL to the target engine’s dialect.
QueryExecutor runs the query on the chosen engine, collects logs, reports status, and returns results.
If execution fails, QueryManager falls back to a secondary engine (typically Hive) to ensure completion.
3.2.4 Query Lifecycle
The query state machine includes six states: INITIALIZED, RUNNING, FINISHED, CLOSED, TIMEOUT, and ERROR, governing the progression and termination of each query.
3.3 Other Observations
Client behavior is driven by query state, requiring careful handling to avoid performance overhead.
The HBO service caches routing decisions for frequently repeated queries, reducing dispatch latency.
Engine compatibility challenges (e.g., Hive’s batch output vs. Presto’s pipeline streaming) necessitate fallback mechanisms and dynamic result handling.
4. Production Impact
Since its launch in January 2022, the hybrid engine has delivered significant latency reductions: the 65th percentile query time improved by 82% (from 50 s to 8 s) and the 75th percentile by 64.1% (from 78 s to 28 s).
5. Future Plans
Increase query efficiency by refining routing rules to direct more queries to Presto, especially complex ones.
Add Spark engine support to handle ETL workloads within the same service.
Enable multi‑data‑source federation to break data silos across offline warehouses, OLAP, and online stores.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
