
Interview with Baidu’s Chief Big Data Architect Ma Ruyue on OLAP, HTAP, and Emerging Big Data Technologies

In this interview, Baidu's chief big-data architect Ma Ruyue discusses several topics: his career transition from Hadoop to online databases, the design philosophy behind Baidu's Palo ROLAP system, the future of HTAP, and his views on the evolving big-data ecosystem, including Spark, AI, and containerization.

High Availability Architecture

May 21, 2018


GIAC (the Global Internet Architecture Conference) was scheduled to take place in Shenzhen on June 1-2, with speakers from Tencent, Alibaba, Baidu, and many other companies.

Ahead of the conference, the High Availability Architecture team interviewed Ma Ruyue, producer of the GIAC Big Data Forum and Baidu's chief big-data architect, about big-data topics that are drawing wide attention.

Ma explained that at Baidu the technical and management tracks diverge as seniority increases. Early in one's career, technical depth is essential. Senior leaders, by contrast, must focus on building tiered technical teams and setting long-term strategy.

He described his move from Hadoop-based offline processing to online database research, where he led development of Baidu's Palo system. Palo is a ROLAP solution that pairs a self-designed storage engine with an Impala-based query engine. Its design favors simplicity over complex external dependencies.

Ma compared OLAP (analysis-oriented) and OLTP (transaction-oriented) workloads. He noted the growing interest in HTAP (Hybrid Transactional/Analytical Processing), but cautioned that many real-world scenarios are still better served by keeping the two workloads separate.

On the proliferation of big-data components, he observed that the field is still immature and that integrated solutions have great potential. For offline analytics he recommends Spark, H2O, or TensorFlow. For online analysis he recommends Palo or ELK. In the NewSQL space he suggests watching Apple's open-source FoundationDB.

On Hadoop, Ma argued that Spark can replace Hadoop’s functionality and suggested newcomers start with Spark rather than Hadoop.

He evaluated the open-source projects TiDB and Kylin. He described TiDB as a NewSQL/HTAP product, and Kylin as a new-style OLAP solution held back by its heavy Hadoop dependencies.

Ma noted that containers and serverless technologies are driving micro‑service adoption in big‑data infrastructure, with both AWS and Baidu moving large‑scale data and AI workloads onto container platforms despite some operational challenges.

His current focus is applying machine learning and AI techniques, such as AutoML, to data analysis. His goal is to build a next-generation analytics product similar to SAS.

Finally, Ma said his participation in GIAC aims to share Baidu Cloud’s thinking on big‑data platform architecture.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


