
Interview with Wu Li on Columnar Storage, JIT Compilation, and Push Mode in Database Development

This article presents an interview with Wu Li, a senior R&D engineer at Shanghai Yanhuang Data, on how columnar storage, JIT compilation, and push-mode execution are reshaping database performance as big-data analytics grows and hardware constraints evolve.

DataFunSummit

The interview opens with an overview of how database evolution drives enterprise digitalization, noting that the main hardware bottleneck has shifted from disk and network bandwidth to CPU performance, so CPU-efficient data processing matters more and more.

Wu Li explains why the team adopted columnar storage early, contrasting it with row-based storage. A columnar layout stores the values of each column contiguously, so a query reads only the columns it references. Runs of same-typed, often repetitive values also compress better and map naturally onto SIMD instructions; together these effects significantly speed up OLAP queries.
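To make the layout difference concrete, here is a minimal Python sketch under simplifying assumptions: the four-row table and the helper names are hypothetical, and Python lists only model the access pattern, not the physical memory layout of a real engine. The row store iterates over whole records to sum one field, while the column store reads just the `amount` column; a simple run-length encoding of the `region` column shows why repetitive column values compress well.

```python
# Minimal sketch: row-oriented vs column-oriented layout (hypothetical table).
# Python lists model the access pattern only, not real memory layout.
from typing import Any

# Row layout: each record keeps all of its fields together.
rows: list[tuple[int, str, float]] = [
    (1, "east", 10.0),
    (2, "east", 20.0),
    (3, "west", 5.0),
    (4, "west", 7.5),
]

# Column layout: the values of each column are stored together.
columns: dict[str, list[Any]] = {
    "id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def sum_amount_row_store(table):
    # Iterates over whole records and picks the amount field out of each one.
    return sum(record[2] for record in table)


def sum_amount_column_store(cols):
    # Touches only the single column the query needs.
    return sum(cols["amount"])


def run_length_encode(values):
    """Compress runs of repeated values into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded


print(sum_amount_row_store(rows))            # 42.5
print(sum_amount_column_store(columns))      # 42.5
print(run_length_encode(columns["region"]))  # [('east', 2), ('west', 2)]
```

Both layouts give the same answer; the difference is how much unrelated data each access path has to walk past, and how much regularity a single column exposes to a compressor.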

After evaluating formats such as Parquet and Avro, the team chose Apache Arrow for its in-memory, language-agnostic, hierarchical (nested) data model, and integrated it with Gandiva, Arrow's LLVM-based expression compiler, to accelerate projection and filtering.
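The projection-and-filtering path can be illustrated with a hypothetical sketch of filter-then-project over a columnar batch. It follows the general selection-vector pattern: the filter emits the indices of matching rows, and the projection computes results only for those rows. The batch, column names, and functions are illustrative assumptions; the code does not use Arrow's or Gandiva's actual API.

```python
# Hypothetical sketch of filter-then-project over a columnar batch.
# Illustrates the selection-vector pattern only; not Arrow's or Gandiva's API.

batch = {
    "price": [12.0, 3.5, 40.0, 8.0, 25.0],
    "qty": [2, 10, 1, 4, 3],
}


def filter_to_selection(batch, column, threshold):
    """Return indices of rows where batch[column] > threshold (a selection vector)."""
    return [i for i, v in enumerate(batch[column]) if v > threshold]


def project_revenue(batch, selection):
    """Compute price * qty only for the rows listed in the selection vector."""
    price, qty = batch["price"], batch["qty"]
    return [price[i] * qty[i] for i in selection]


sel = filter_to_selection(batch, "price", 10.0)
print(sel)                          # [0, 2, 4]
print(project_revenue(batch, sel))  # [24.0, 40.0, 75.0]
```

Passing indices instead of copying the surviving rows lets the projection read the original columns in place.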

Recognizing limitations in Gandiva, Yanhuang Data contributed patches upstream and added custom functions, leveraging open‑source collaboration to improve the toolset.

The discussion then turns to JIT (just-in-time) compilation. Generating code at runtime removes the overhead of interpreting an expression row by row and speeds up expression evaluation, especially when the generated code uses SIMD instructions. Wu Li also notes that the generated code must stay compatible with the CPU architectures it runs on.
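The difference between interpreting an expression and compiling it at runtime can be sketched in Python, with the built-in `compile()` and `exec()` standing in for an LLVM-style JIT. This is a simplified illustration, not Gandiva's implementation: the tuple-based expression format and the helper names are hypothetical. The interpreter re-walks the expression tree for every row, whereas `jit_compile` walks it once to generate a specialized function.

```python
# Minimal sketch: tree-walking interpretation vs runtime code generation.
# compile()/exec() stand in for a real JIT; the expression format is hypothetical.

# Expression (a * 2) + b as a nested tuple tree.
EXPR = ("add", ("mul", ("col", "a"), ("const", 2)), ("col", "b"))


def interpret(node, row):
    """Walk the tree for every row: dispatch cost is paid per node, per row."""
    op = node[0]
    if op == "col":
        return row[node[1]]
    if op == "const":
        return node[1]
    left, right = interpret(node[1], row), interpret(node[2], row)
    return left + right if op == "add" else left * right


def to_source(node):
    """Translate the expression tree into a Python source expression."""
    op = node[0]
    if op == "col":
        return f"row[{node[1]!r}]"
    if op == "const":
        return repr(node[1])
    symbol = "+" if op == "add" else "*"
    return f"({to_source(node[1])} {symbol} {to_source(node[2])})"


def jit_compile(node):
    """Generate and compile a specialized function; the tree is walked only once."""
    source = f"def evaluate(row):\n    return {to_source(node)}\n"
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["evaluate"]


rows = [{"a": 1, "b": 10}, {"a": 5, "b": -3}]
evaluate = jit_compile(EXPR)
print([interpret(EXPR, r) for r in rows])  # [12, 7]
print([evaluate(r) for r in rows])         # [12, 7]
```

A production JIT emits machine code (possibly vectorized with SIMD) rather than Python source, which is where the dependence on the target CPU's instruction set comes from.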

Finally, Wu Li describes the move from pull mode, in which each operator requests data from its child (consumer-driven), to push mode, in which each operator sends data on to its consumer (producer-driven). Replacing the execution operators took considerable engineering effort but yielded query-performance gains across the product.
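A minimal Python sketch of the same scan, filter, and sum pipeline in both execution models, under simplifying assumptions: the operator names are hypothetical and values flow one at a time, whereas real engines typically pass batches. In pull mode the root operator drives execution by requesting values from its children through iterators; in push mode the scan loop drives execution and each operator forwards qualifying values to its consumer through a `consume` method.

```python
# Minimal sketch: one pipeline (scan -> filter -> sum) in pull and push mode.
# Operator names are hypothetical; real engines usually pass batches, not values.

DATA = [4, 9, 1, 12, 7]


# Pull mode (iterator style): each operator asks its child for the next value.
def scan_pull(data):
    for value in data:
        yield value


def filter_pull(child, predicate):
    for value in child:  # each iteration pulls one value from the child
        if predicate(value):
            yield value


def sum_pull(child):
    return sum(child)  # the root drives execution by pulling


# Push mode: the producer drives execution and hands values to its consumer.
class SumSink:
    def __init__(self):
        self.total = 0

    def consume(self, value):
        self.total += value


class FilterPush:
    def __init__(self, predicate, downstream):
        self.predicate = predicate
        self.downstream = downstream

    def consume(self, value):
        if self.predicate(value):
            self.downstream.consume(value)


def scan_push(data, downstream):
    for value in data:  # the only loop: values flow toward the sink
        downstream.consume(value)


def is_large(value):
    return value > 5


print(sum_pull(filter_pull(scan_pull(DATA), is_large)))  # 28

sink = SumSink()
scan_push(DATA, FilterPush(is_large, sink))
print(sink.total)  # 28
```

Both versions compute the same result; what changes is which side owns the loop. In pull mode control starts at the root, and in push mode it starts at the scan.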

Throughout, the interview emphasizes that columnar storage, JIT compilation, and push-mode execution all serve one goal: faster, more reliable queries for end users in large-scale data environments.

Tags: OLAP, databases, columnar storage, Apache Arrow, JIT compilation, push-mode, Gandiva
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
