Performance Optimization Practices for KwaiBI Big Data Analysis Platform
This article introduces KwaiBI, the internal data analysis product of Kuaishou, outlines its five major functional areas, details the performance challenges of large‑scale analytics, and presents a comprehensive set of optimization techniques—including cache warming, query rewriting, materialized acceleration, and the Bleem lake‑house engine—along with future directions and a brief Q&A.
1. KwaiBI Product Introduction
KwaiBI is Kuaishou's internal data analysis platform that aims to provide a one‑stop solution for data acquisition and analysis. It currently serves over 15,000 monthly active users, supports more than 50,000 reports, 100,000 models, and integrates more than 150 business data sources.
The platform offers five data‑consumption scenarios: data extraction, multi‑dimensional analysis, visualization, push, and portal, with multi‑dimensional analysis and visualization being the core use cases.
1.1 Analysis Capability Overview
KwaiBI connects to a variety of data sources, including big‑data storage engines, traditional relational databases, and local files. Before analysis, data owners model the sources, build tables and relationships, and create datasets that represent business domains. Standardized metrics and dimensions are managed through a metric middle‑platform before being ingested into KwaiBI.
Once data is ingested, the platform provides basic capabilities such as detail queries, aggregation, and cross‑source calculations, as well as advanced analytics like same‑period comparison, proportion, LOD analysis, and table calculations, enabling downstream users to perform sophisticated analyses.
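To make "same-period comparison" concrete: it compares a metric's current value against the value from an earlier, aligned period (e.g. seven days prior). The sketch below uses hypothetical data and function names, not KwaiBI's actual API, to show a week-over-week variant:

```python
import datetime

# Hypothetical daily metric series: ISO date string -> metric value.
DAILY = {
    "2024-01-01": 100,
    "2024-01-08": 120,
}

def week_over_week(daily, date_str):
    """Same-period comparison: current value vs. the value 7 days earlier.
    Returns the relative change, or None if either period is missing."""
    d = datetime.date.fromisoformat(date_str)
    prev = (d - datetime.timedelta(days=7)).isoformat()
    cur, base = daily.get(date_str), daily.get(prev)
    if cur is None or base is None:
        return None
    return (cur - base) / base

week_over_week(DAILY, "2024-01-08")  # (120 - 100) / 100 = 0.2
```

On the platform, this arithmetic is pushed into generated SQL rather than computed client-side, but the period-alignment logic is the same.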
2. Performance Challenges
Users struggle to diagnose performance bottlenecks: they cannot see where time is spent, have no query profiling, and experience unpredictable latency.
The optimization threshold is high: effective tuning demands strong domain knowledge, so novice analysts find it hard to optimize their own queries.
On the platform side, complex analytical queries (e.g., same‑period comparison, proportion, LOD) account for over 30% of the workload; engine queries involve many joins, large data volumes, and wide time ranges; and the primary engine, ClickHouse, handles join‑heavy workloads poorly and lacks intelligent SQL optimization.
3. Optimization Practices
3.1 Cache Warming
Full‑link tracing is used to pinpoint latency sources and build user query profiles, and a self‑service performance diagnosis tool is built on top of those profiles. Cache warming is triggered on a schedule or ahead of peak usage periods, focusing on high‑frequency dashboards and queries. The warming pipeline consists of a trigger, calculator, executor, and monitor, with concurrency control to protect service stability. Monitoring shows first‑screen cache hit rates up to 90% and non‑first‑screen hit rates around 30%.
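The calculator/executor split above can be sketched as follows. This is a minimal illustration with hypothetical names and data, not KwaiBI's implementation: rank queries by observed frequency, then warm the top candidates with bounded concurrency so warming traffic cannot overwhelm the engine.

```python
import concurrent.futures
from collections import Counter

# Hypothetical query log: (dashboard_id, sql) pairs observed recently.
QUERY_LOG = [
    ("dash-1", "SELECT region, SUM(gmv) FROM sales GROUP BY region"),
    ("dash-1", "SELECT region, SUM(gmv) FROM sales GROUP BY region"),
    ("dash-2", "SELECT day, COUNT(*) FROM events GROUP BY day"),
]

CACHE = {}  # stand-in for the real result cache

def run_query(sql):
    # Placeholder for the real engine call (e.g. ClickHouse).
    return f"result-of({sql})"

def warm(sql):
    """Executor step: run the query and store its result in the cache."""
    CACHE[sql] = run_query(sql)
    return sql

def warm_high_frequency(query_log, top_n=10, max_workers=4):
    """Calculator step: rank queries by frequency, then warm the top N
    with bounded concurrency to protect engine stability."""
    ranked = Counter(sql for _, sql in query_log).most_common(top_n)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(warm, [sql for sql, _ in ranked]))
    return len(ranked)

warm_high_frequency(QUERY_LOG)
```

In production the trigger would be a scheduler and the monitor would track hit rates; here both are elided to keep the core loop visible.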
3.2 Query Optimization
All user queries are expressed in Kwai’s Open Analysis Expression (OAX). The OAX parser extracts high‑level calculations and determines whether they operate on models or physical tables. An AST optimization stage selects the most efficient model or table, performs model search, and translates the optimized AST into engine‑specific SQL with both generic and native optimizations. Over 50 optimization rules have been codified, including complex analysis push‑down, predicate push‑down, aggregation operator tuning, and join order adjustment.
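As a flavor of what one of those rules does, here is a minimal predicate push‑down sketch over a toy relational AST. The node shapes and rule are illustrative assumptions, not the OAX optimizer's actual representation: a filter that touches only one side of a join is moved beneath the join, so the engine filters rows before joining them.

```python
from dataclasses import dataclass

# Toy relational AST: Scan -> Join -> Filter (hypothetical shapes).
@dataclass
class Scan:
    table: str

@dataclass
class Join:
    left: object
    right: object

@dataclass
class Filter:
    child: object
    table: str  # table whose column the predicate references
    pred: str

def push_down(node):
    """One rewrite rule among many: move a filter below a join when
    its predicate references only one side's table."""
    if isinstance(node, Filter) and isinstance(node.child, Join):
        join = node.child
        if isinstance(join.left, Scan) and join.left.table == node.table:
            return Join(Filter(join.left, node.table, node.pred), join.right)
        if isinstance(join.right, Scan) and join.right.table == node.table:
            return Join(join.left, Filter(join.right, node.table, node.pred))
    return node  # rule does not apply; leave the plan unchanged

plan = Filter(Join(Scan("orders"), Scan("users")), "orders", "dt = '2024-01-01'")
optimized = push_down(plan)
# The filter now sits beneath the join, on the orders side only.
```

A real optimizer applies such rules repeatedly over the whole tree with a cost model; this shows a single rule firing once.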
3.3 Materialization Acceleration
Materialized acceleration builds result tables via ETL jobs rather than relying on OLAP engine materialization. High‑frequency metric‑dimension combinations are identified, ranked by query count and latency, and selected for materialization. Generated tasks produce result tables in Hive or ClickHouse, which are automatically ingested back into KwaiBI, yielding up to 50% query performance improvement and streamlined data production.
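The candidate-selection step can be sketched as a ranking over query history. The thresholds, data, and function name below are hypothetical, chosen only to show the idea: keep metric‑dimension combinations that are both frequent and slow, since those repay the cost of an ETL‑built result table.

```python
from collections import defaultdict

# Hypothetical query history: (metric, dimensions, latency_ms) records.
HISTORY = [
    ("gmv", ("region",), 4200),
    ("gmv", ("region",), 3900),
    ("dau", ("day",), 800),
    ("gmv", ("region", "day"), 5100),
]

def materialization_candidates(history, min_count=2, min_latency_ms=1000):
    """Group queries by (metric, dimensions); keep combinations that are
    both frequent and slow, ordered by query count descending."""
    stats = defaultdict(lambda: [0, 0.0])  # combo -> [count, total_latency]
    for metric, dims, latency in history:
        s = stats[(metric, dims)]
        s[0] += 1
        s[1] += latency
    return sorted(
        (combo for combo, (n, total) in stats.items()
         if n >= min_count and total / n >= min_latency_ms),
        key=lambda c: -stats[c][0],
    )

materialization_candidates(HISTORY)  # only (gmv, region) is frequent AND slow
```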
3.4 Engine Optimization – Bleem Lake‑House
Bleem is Kuaishou’s unified lake‑house engine designed to complement ClickHouse. It introduces multi‑level caching (metadata, data, index), vectorized and multi‑threaded execution, and advanced optimizers (RBO & CBO) with join‑order tuning. Bleem aims to achieve ClickHouse‑level performance while enabling direct analytics on the data lake, reducing data movement and production overhead.
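The multi‑level caching idea can be illustrated with a small lookup chain. This is a generic sketch of the pattern, not Bleem's internals: check each level from fastest to slowest, promote hits into the faster levels, and fall through to the data lake only on a full miss.

```python
class MultiLevelCache:
    """Sketch of layered caching (e.g. metadata/data/index levels):
    probe each level in order, promote hits, load from the lake on miss."""

    def __init__(self, levels):
        self.levels = levels  # list of dicts, fastest first

    def get(self, key, load_from_lake):
        for i, level in enumerate(self.levels):
            if key in level:
                value = level[key]
                # Promote the hit into all faster levels.
                for upper in self.levels[:i]:
                    upper[key] = value
                return value
        # Full miss: load from the lake and populate every level.
        value = load_from_lake(key)
        for level in self.levels:
            level[key] = value
        return value

cache = MultiLevelCache([{}, {}])
v = cache.get("part-001", lambda k: f"loaded:{k}")  # miss -> loads from lake
```

Subsequent `get` calls for `"part-001"` hit the fastest level directly, which is the effect Bleem relies on to avoid repeated lake reads.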
4. Future Outlook
Performance optimization requires cross‑team collaboration among the analysis platform, data‑warehouse, and engine teams. Future work will focus on achieving end‑to‑end automated and intelligent performance tuning, combining software and hardware innovations to continuously push the limits of analytical speed.
5. Q&A
Q1: Are there any community or open‑source plans for Bleem? A1: No public plans yet; development is ongoing.
Q2: Where does Bleem fit in the ecosystem? A2: It is an analysis engine for the data lake.
Q3: Can materialization optimize cross‑table joins? A3: Yes, materialization offers both aggregation and full‑table modes to improve join performance.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.