Interview with Wu Li on Columnar Storage, JIT Compilation, and Push Mode in Database Development

The article presents an interview with Wu Li, a senior R&D engineer at Shanghai Yanhuang Data, discussing how columnar storage, JIT compilation, and push‑mode execution are reshaping database performance in the era of big‑data analytics and evolving hardware constraints.

DataFunSummit

Dec 9, 2023

The interview begins with an overview of enterprise digitalization driven by database evolution, noting that hardware bottlenecks have shifted from disk and network bandwidth to CPU performance, making efficient data processing increasingly critical.

Wu Li explains the early adoption of columnar storage, contrasting it with row‑based storage: columnar layouts keep values of the same column together, enabling faster aggregation, better compression, and SIMD‑friendly parallelism, which significantly speeds up OLAP queries.
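
As a rough illustration of the contrast described above, the following Python sketch stores the same records in a row layout and in a columnar layout. The sample data, column names, and helper functions are hypothetical and are not taken from the interview. The point is only that a columnar aggregation touches one typed buffer, and that repeated values within a column lend themselves to simple encodings such as run-length encoding.

```python
from array import array

# Hypothetical sample data: (order_id, region, amount)
rows = [
    (1, "east", 10.0),
    (2, "east", 20.5),
    (3, "west", 5.25),
    (4, "west", 4.25),
]

def sum_amount_rows(records):
    """Row layout: every full record is visited to read a single field."""
    total = 0.0
    for _order_id, _region, amount in records:
        total += amount
    return total

# Columnar layout: one contiguous, homogeneously typed buffer per column.
columns = {
    "order_id": array("q", (r[0] for r in rows)),
    "region": [r[1] for r in rows],
    "amount": array("d", (r[2] for r in rows)),
}

def sum_amount_columns(cols):
    """Columnar layout: only the 'amount' buffer is scanned."""
    return sum(cols["amount"])

def run_length_encode(values):
    """Compress runs of equal adjacent values into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs
```

Both functions return the same total. In a real engine the columnar version also lets the CPU apply SIMD instructions over the contiguous buffer, which plain Python does not demonstrate.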

After evaluating formats such as Parquet and Avro, the team selected Apache Arrow for its in‑memory, language‑agnostic columnar format with support for nested (hierarchical) data, and integrated it with Gandiva, Arrow's LLVM‑based expression compiler, to accelerate projection and filtering.
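
To make "projection and filtering" over columnar batches concrete, here is a minimal plain-Python sketch. It does not use the Arrow or Gandiva APIs; the batch contents and function names are hypothetical. It assumes a common design in which the filter produces a selection vector (the indices of qualifying rows) and the projection is then evaluated only for those rows.

```python
from array import array

# Hypothetical in-memory batch: one typed buffer per column.
batch = {
    "price": array("d", [4.0, 12.5, 7.0, 20.0]),
    "qty": array("q", [3, 1, 10, 2]),
}

def filter_to_selection(column, predicate):
    """Return the indices of rows that pass the predicate (a selection vector)."""
    return array("q", (i for i, value in enumerate(column) if predicate(value)))

def project_total(columns, selection):
    """Evaluate price * qty for the selected rows only."""
    price, qty = columns["price"], columns["qty"]
    return array("d", (price[i] * qty[i] for i in selection))

# Filter: price > 5.0, then project the derived column for surviving rows.
selection = filter_to_selection(batch["price"], lambda p: p > 5.0)
totals = project_total(batch, selection)
```

Keeping the filter result as indices avoids copying the columns that survive the filter, which is one reason this pattern suits columnar data.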

Recognizing limitations in Gandiva, Yanhuang Data contributed patches upstream and added custom functions, leveraging open‑source collaboration to improve the toolset.

The discussion then moves to JIT (just‑in‑time) compilation, describing how runtime code generation reduces interpretation overhead and improves expression evaluation, especially when combined with SIMD instructions, while noting the need for careful CPU‑architecture compatibility.
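
The idea of replacing interpretation with runtime code generation can be sketched in Python. This is only an analogy under stated assumptions: the expression format, the `OPS` table, and the function names are hypothetical, and the generated code is Python bytecode rather than the native machine code a database JIT would emit. The interpreter walks the expression tree for every row, while the compiled version walks the tree once to generate source and then reuses the resulting function.

```python
import operator

# Hypothetical operator table: (callable for interpretation, symbol for codegen).
OPS = {
    "add": (operator.add, "+"),
    "mul": (operator.mul, "*"),
    "gt": (operator.gt, ">"),
}

def interpret(node, row):
    """Tree-walking evaluation: the tree is traversed again for every row."""
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "const":
        return node[1]
    func, _symbol = OPS[kind]
    return func(interpret(node[1], row), interpret(node[2], row))

def to_source(node):
    """Translate the expression tree into a Python source fragment."""
    kind = node[0]
    if kind == "col":
        return f"row[{node[1]!r}]"
    if kind == "const":
        return repr(node[1])
    _func, symbol = OPS[kind]
    return f"({to_source(node[1])} {symbol} {to_source(node[2])})"

def compile_expr(node):
    """Generate code once at runtime and return a reusable function."""
    source = f"lambda row: {to_source(node)}"
    return eval(compile(source, "<generated>", "eval"))

# Expression: price * qty > 50.0
expr = ("gt", ("mul", ("col", "price"), ("col", "qty")), ("const", 50.0))
data = [{"price": 12.5, "qty": 1}, {"price": 7.0, "qty": 10}]

compiled = compile_expr(expr)
interpreted_results = [interpret(expr, row) for row in data]
compiled_results = [compiled(row) for row in data]
```

The two result lists are identical; the difference is that the compiled path pays the tree traversal once rather than once per row.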

Finally, Wu Li describes the transition from pull‑mode execution (consumer‑driven data fetching) to push‑mode execution (producer‑driven data delivery), highlighting the engineering effort required to replace the execution operators and the resulting broad query‑performance gains across the product.
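
The difference between the two execution modes can be sketched with a small scan, filter, and sum pipeline. The operator names below are hypothetical and are not the product's actual operators. In the pull version, each operator is an iterator and the top of the plan requests values from below; in the push version, the scan drives the loop and hands each value to its downstream operator.

```python
# Pull mode: the consumer at the top of the plan asks upstream for data.
def scan(data):
    for value in data:
        yield value

def pull_filter(child, predicate):
    for value in child:  # pulls from the upstream operator
        if predicate(value):
            yield value

def pull_sum(child):
    return sum(child)  # the top-level consumer drives the pipeline

# Push mode: the producer drives the loop and pushes data downstream.
class SumSink:
    def __init__(self):
        self.total = 0

    def push(self, value):
        self.total += value

class PushFilter:
    def __init__(self, predicate, downstream):
        self.predicate = predicate
        self.downstream = downstream

    def push(self, value):
        if self.predicate(value):
            self.downstream.push(value)

def push_scan(data, downstream):
    for value in data:  # the source controls the iteration
        downstream.push(value)

def is_even(value):
    return value % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]

pull_result = pull_sum(pull_filter(scan(numbers), is_even))

sink = SumSink()
push_scan(numbers, PushFilter(is_even, sink))
push_result = sink.total
```

Both pipelines compute the same result. The sketch also hints at why the migration is costly: every operator must be rewritten from an iterator interface to a `push`-style interface.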

Throughout, the interview emphasizes that these technical advances—columnar storage, JIT compilation, and push‑mode execution—are pursued to deliver faster, more reliable query experiences for end users in large‑scale data environments.
