How Baidu’s DATAPILOT Uses NVIDIA RAPIDS to Supercharge SQL Analytics

Baidu’s DATAPILOT platform combines natural-language interaction with GPU-accelerated Spark-RAPIDS to return results for complex, multi-table SQL queries within seconds, speeding up ad-revenue analysis by up to five times while reducing infrastructure costs.

Baidu Geek Talk

Oct 22, 2024

Background and Challenges

In modern business environments, data analysis is critical for success. Baidu’s advertising data team serves thousands of users, including strategy engineers, product managers, analysts, operations staff, and sales teams. These users previously had to write intricate SQL queries by hand while keeping up with more than 200 data updates each month, which consumed significant time and required deep domain knowledge.

DATAPILOT Platform Overview

To address these pain points, Baidu built the DATAPILOT platform, which integrates advanced natural‑language processing with high‑performance SQL execution. Users can simply type questions in natural language, and the system instantly interprets intent, generates the appropriate SQL, and returns results within seconds.

Key Use Cases

Scenario 1: A user needs weekly ad‑spending data across multiple tables (traffic, ads, conversions, CPC, CPM, etc.). Traditionally this required strong business knowledge and complex SQL; DATAPILOT automates the entire workflow.

Scenario 2: An analyst investigates a drop in search‑ad revenue. The platform guides the user through multi‑table joins, metric calculations, and attribution logic without manual SQL coding.
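To make Scenario 1 concrete, the sketch below shows the kind of multi-table weekly aggregation a user would otherwise write by hand. It is only an illustration: the table names (ads, spend, conversions), their columns, and the use of SQLite are assumptions made so the example runs anywhere, and none of them reflect Baidu's actual schema or the SQL that DATAPILOT generates.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE ads (ad_id INTEGER PRIMARY KEY, campaign TEXT);
CREATE TABLE spend (ad_id INTEGER, day TEXT, cost REAL, clicks INTEGER);
CREATE TABLE conversions (ad_id INTEGER, day TEXT, conversions INTEGER);
"""

# Weekly spend, cost per click, and conversions per campaign.
# Assumes at most one conversions row per (ad_id, day), so the
# LEFT JOIN does not duplicate spend rows.
WEEKLY_SPEND_SQL = """
SELECT a.campaign,
       strftime('%Y-%W', s.day)     AS week,
       SUM(s.cost)                  AS total_cost,
       SUM(s.cost) / SUM(s.clicks)  AS cpc,
       SUM(c.conversions)           AS total_conversions
FROM spend AS s
JOIN ads AS a ON a.ad_id = s.ad_id
LEFT JOIN conversions AS c
       ON c.ad_id = s.ad_id AND c.day = s.day
GROUP BY a.campaign, week
ORDER BY a.campaign, week
"""


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO ads VALUES (1, 'brand')")
    conn.executemany(
        "INSERT INTO spend VALUES (?, ?, ?, ?)",
        [(1, "2024-10-14", 10.0, 5), (1, "2024-10-15", 20.0, 5)],
    )
    conn.execute("INSERT INTO conversions VALUES (1, '2024-10-14', 2)")
    return conn


def weekly_spend(conn: sqlite3.Connection) -> list:
    """Run the weekly aggregation and return all result rows."""
    return conn.execute(WEEKLY_SPEND_SQL).fetchall()
```

Even in this toy form, the query needs two joins, a date bucket, and a derived metric, which is the kind of detail the platform is meant to take off the user's hands.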

Architecture and GPU Acceleration

The platform consists of three layers: an interaction layer that converts text to SQL, a scheduling layer that dispatches SQL tasks to the appropriate hardware engine, and a compute layer that executes the queries. For the compute layer, Baidu partnered with NVIDIA to adopt the RAPIDS Accelerator for Apache Spark, whose GPU-accelerated kernels let supported Spark SQL workloads run on GPUs.
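For orientation, the RAPIDS Accelerator is enabled on a Spark job through ordinary Spark configuration. The sketch below collects a small set of such settings and turns them into spark-submit arguments. The configuration keys are standard Spark and RAPIDS Accelerator settings, but the values and the helper name to_spark_submit_args are assumptions chosen for illustration, not Baidu's production configuration.

```python
# Illustrative settings for enabling the RAPIDS Accelerator on a Spark job.
# The values are assumptions for the sake of the example, not tuned settings.
RAPIDS_CONF = {
    # Loads the RAPIDS Accelerator SQL plugin.
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    # Turns GPU acceleration of SQL operations on or off.
    "spark.rapids.sql.enabled": "true",
    # Logs which operations could not be placed on the GPU.
    "spark.rapids.sql.explain": "NOT_ON_GPU",
    # One GPU per executor, shared by four concurrent tasks.
    "spark.executor.resource.gpu.amount": "1",
    "spark.task.resource.gpu.amount": "0.25",
}


def to_spark_submit_args(conf: dict) -> list:
    """Turn a config mapping into ['--conf', 'key=value', ...] arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

The resulting list can be appended to a spark-submit command line. The plugin jar also has to be available to the job (for example through --jars), which this sketch leaves out.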

RAPIDS provides cuDF for fast dataframe operations, UCX-based shuffle for GPU-to-GPU data transfer, and optimized Parquet sub-row-group reading to avoid out-of-memory (OOM) errors. In this heterogeneous CPU + GPU environment, the platform analyzes each SQL statement, decides whether it can be accelerated, and routes it to the GPU when possible.
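The routing decision can be pictured with a minimal sketch. The article does not describe how DATAPILOT's scheduling layer is implemented, so everything here is hypothetical: the operator names, the GPU_SUPPORTED_OPS allow-list, the SqlTask type, and the all-or-nothing rule are assumptions used only to illustrate the idea of sending a statement to the GPU when it can be accelerated and to the CPU otherwise.

```python
from dataclasses import dataclass

# Hypothetical allow-list of plan operators assumed to be GPU-capable.
GPU_SUPPORTED_OPS = frozenset(
    {"scan_parquet", "filter", "project", "hash_join", "hash_aggregate", "sort"}
)


@dataclass(frozen=True)
class SqlTask:
    """A SQL statement reduced to the operators in its plan (hypothetical)."""
    task_id: str
    operators: tuple


def choose_engine(task: SqlTask, supported=GPU_SUPPORTED_OPS) -> str:
    """Return 'gpu' only if every operator in the plan is GPU-capable."""
    if not task.operators:
        # Nothing to analyze: fall back to the CPU engine.
        return "cpu"
    unsupported = [op for op in task.operators if op not in supported]
    return "cpu" if unsupported else "gpu"
```

A real scheduler would likely also weigh data size, GPU availability, and queue depth. The sketch keeps only the "can it be accelerated" check that the text describes.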

Performance Gains

With the RAPIDS‑enabled stack, the team achieved:

GPU acceleration covering 35% of ad-analysis workloads, with an average speedup of more than 2× (up to 5× in some cases).

Complex revenue‑analysis queries reduced from day‑level to minute‑level execution.

Improved A/B test and strategy‑research cycles due to faster SQL response.

Overall, the solution lowered IT costs by an estimated 50 % and increased user productivity.
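As a rough way to read the coverage and speedup figures together, the sketch below computes the blended speedup when only part of the total runtime is accelerated. It rests on an assumption the article does not state: that the 35% coverage is measured as a share of total runtime. Under that assumption, a 2× speedup on the covered share works out to roughly a 1.21× speedup overall. The function name blended_speedup is made up for this example.

```python
def blended_speedup(accelerated_share: float, speedup: float) -> float:
    """Overall speedup when only a share of total runtime is accelerated.

    accelerated_share: fraction of total runtime that is accelerated (0 to 1).
    speedup: speedup factor applied to that fraction (greater than 0).
    """
    if not 0.0 <= accelerated_share <= 1.0:
        raise ValueError("accelerated_share must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Unaccelerated part keeps its runtime; accelerated part shrinks.
    remaining = (1.0 - accelerated_share) + accelerated_share / speedup
    return 1.0 / remaining
```

The same function shows why widening coverage matters as much as raising the per-query speedup: the unaccelerated share sets a ceiling on the overall gain.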

Future Directions

Baidu plans to further explore custom hardware deployments, optimizing the balance among GPU, CPU, memory, and SSD to maximize performance and cost efficiency, while continuing collaboration with NVIDIA to enhance data‑analysis capabilities.
