How Baidu’s DATAPILOT Uses NVIDIA RAPIDS to Supercharge SQL Analytics
Baidu’s DATAPILOT platform combines natural‑language interaction with GPU‑accelerated Spark‑RAPIDS to turn complex, multi‑table SQL queries into seconds‑fast results, boosting ad‑revenue analysis efficiency by up to five‑fold while reducing infrastructure costs.
Background and Challenges
In modern business environments, data analysis is critical for success. Baidu’s advertising data team serves thousands of users—including strategy engineers, product managers, analysts, operations, and sales—who previously had to write intricate SQL queries across more than 200 monthly data updates, consuming significant time and deep domain knowledge.
DATAPILOT Platform Overview
To address these pain points, Baidu built the DATAPILOT platform, which integrates advanced natural‑language processing with high‑performance SQL execution. Users can simply type questions in natural language, and the system instantly interprets intent, generates the appropriate SQL, and returns results within seconds.
Key Use Cases
Scenario 1: A user needs weekly ad‑spending data across multiple tables (traffic, ads, conversions, CPC, CPM, etc.). Traditionally this required strong business knowledge and complex SQL; DATAPILOT automates the entire workflow.
Scenario 2: An analyst investigates a drop in search‑ad revenue. The platform guides the user through multi‑table joins, metric calculations, and attribution logic without manual SQL coding.
Architecture and GPU Acceleration
The platform consists of three layers: an interaction layer that converts text to SQL, a scheduling layer that dispatches SQL tasks to the appropriate hardware engine, and a compute layer that executes the queries. Baidu partnered with NVIDIA RAPIDS for Apache Spark, leveraging the RAPIDS Accelerator and GPU‑accelerated kernels to run Spark workloads entirely on GPUs.
RAPIDS provides cuDF for fast dataframe operations, UCX‑based shuffle for GPU‑to‑GPU data transfer, and optimized parquet sub‑row‑group reading to avoid OOM errors. This heterogeneous environment (CPU + GPU) analyzes each SQL statement, decides if it can be accelerated, and routes it to the GPU when possible.
Performance Gains
With the RAPIDS‑enabled stack, the team achieved:
35 % of ad‑analysis workloads covered with an average >2× speedup (some cases up to 5×).
Complex revenue‑analysis queries reduced from day‑level to minute‑level execution.
Improved A/B test and strategy‑research cycles due to faster SQL response.
Overall, the solution lowered IT costs by an estimated 50 % and increased user productivity.
Future Directions
Baidu plans to further explore custom hardware deployments, optimizing the balance among GPU, CPU, memory, and SSD to maximize performance and cost efficiency, while continuing collaboration with NVIDIA to enhance data‑analysis capabilities.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
