Interview with Wu Li on Database Evolution: Columnar Storage, JIT Compilation, and Push Mode

In this technical interview, Wu Li, a research engineer at Shanghai Yanhuang Data, explains how hardware constraints drive database evolution, why columnar storage and SIMD acceleration are crucial for OLAP, and how JIT compilation and push‑mode processing improve query performance and product experience.

DataFunSummit

The interview opens with an overview of enterprise digital transformation and argues that database evolution is driven by hardware bottlenecks. Early systems were limited by disk and network bandwidth; today the CPU has become the primary constraint.

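Wu Li introduces columnar storage and contrasts it with traditional row-based storage. In a columnar layout, an aggregation over a single column reads only that column's values, which are stored contiguously, instead of scanning every full row. This reduces random seeks, and because the values in a column share one type, they also compress better. Both properties make columnar layouts well suited to large-scale analytical workloads.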
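
The contrast between the two layouts can be sketched in a few lines of Python. This is only an illustration under assumed names: the sample data, `sum_amount_rows`, `sum_amount_columns`, and `run_length_encode` are hypothetical and do not come from the interview, and run-length encoding stands in for whatever compression a real engine would apply.

```python
from array import array
from itertools import groupby

# Hypothetical sample data: (order_id, region, amount).
rows = [
    (1, "east", 120.0),
    (2, "east", 80.5),
    (3, "west", 42.25),
    (4, "west", 10.0),
]

# Row layout keeps each record together; column layout keeps each field together.
columns = {
    "order_id": array("q", (r[0] for r in rows)),
    "region": [r[1] for r in rows],
    "amount": array("d", (r[2] for r in rows)),
}

def sum_amount_rows(rows):
    # Walks every full record even though only one field is needed.
    return sum(r[2] for r in rows)

def sum_amount_columns(columns):
    # Reads a single contiguous array of doubles.
    return sum(columns["amount"])

def run_length_encode(values):
    # Adjacent repeated values in a column collapse into (value, count) pairs.
    return [(value, len(list(group))) for value, group in groupby(values)]
```

Both sum functions return the same total, but the columnar version never touches `order_id` or `region`. Calling `run_length_encode(columns["region"])` shows why a column of similar values tends to compress well.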

After evaluating formats such as Parquet and Avro, the team selected Apache Arrow. Arrow's in-memory, language-agnostic design allows fast data exchange between components, and its columnar layout supports SIMD parallelism.
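
To make the idea of an in-memory columnar layout concrete, here is a simplified sketch in plain Python of a nullable 64-bit integer column stored as a validity bitmap plus a contiguous values buffer, which is the general shape of Arrow's fixed-width arrays. It does not use the Arrow library, and the helper names `build_int64_column` and `is_valid` are hypothetical.

```python
from array import array

def build_int64_column(values):
    """Return (validity_bitmap, values_buffer) for a list of ints or None."""
    # Null slots still occupy a fixed-width cell, so every value sits at a
    # predictable offset; the bitmap records which cells are meaningful.
    data = array("q", (0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant-bit numbering
    return bytes(bitmap), data

def is_valid(bitmap, i):
    # True when slot i holds a real value rather than a null.
    return bool((bitmap[i // 8] >> (i % 8)) & 1)

bitmap, data = build_int64_column([7, None, 42])

# A memoryview exposes the same buffer without copying it, which hints at how
# a shared layout lets different components exchange data cheaply.
view = memoryview(data)
```

Because every value has the same width and sits next to its neighbours, a loop over such a buffer is the kind of code that compilers and libraries can map onto SIMD instructions.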

The discussion then moves to JIT (Just-In-Time) compilation. Wu Li describes how a JIT compiles a query's expression trees at runtime instead of interpreting them node by node. This removes interpretation overhead and speeds up filters and projections, especially in complex SQL queries.
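
The difference between interpreting an expression tree and compiling it once can be imitated in Python. The sketch below is an analogy only: it generates Python source and compiles it with the built-in `compile`, whereas a database JIT emits machine code. The tuple-based expression format and the names `interpret`, `to_source`, and `jit_compile` are assumptions made for illustration.

```python
import operator

# Expression nodes: ("col", name) | ("lit", value) | (op, left, right)
OPS = {
    "+": operator.add,
    "*": operator.mul,
    ">": operator.gt,
    "and": lambda a, b: a and b,
}

def interpret(expr, row):
    # Walks the tree again for every row: one dispatch per node per row.
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "lit":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))

def to_source(expr):
    # Flattens the tree into a single Python expression string.
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "lit":
        return repr(expr[1])
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"

def jit_compile(expr):
    # Generates and compiles the code once; the result is reused for all rows.
    source = f"lambda row: {to_source(expr)}"
    return eval(compile(source, "<generated>", "eval"))

# Hypothetical filter: price * qty > 100 AND qty > 1
expr = ("and",
        (">", ("*", ("col", "price"), ("col", "qty")), ("lit", 100)),
        (">", ("col", "qty"), ("lit", 1)))

rows = [
    {"price": 30.0, "qty": 4},
    {"price": 200.0, "qty": 1},
    {"price": 5.0, "qty": 2},
]

predicate = jit_compile(expr)
matches = [predicate(row) for row in rows]
```

The interpreted and compiled paths return the same answers; the compiled path pays the tree walk once at compile time rather than once per row, which is where the saving comes from as expressions grow more complex.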

To further accelerate query execution, the team adopted the Gandiva library from Arrow, extending it with custom functions and contributing improvements back to the open‑source project.

Finally, Wu Li explains the shift from pull‑mode to push‑mode execution. By replacing pull‑mode operators with push‑mode ones, the system achieves broader performance gains across diverse scenarios, though the transition required extensive engineering effort and careful coordination.
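
The two execution styles can be contrasted with a small filter-and-sum pipeline. This is a minimal sketch assuming a batch-at-a-time engine; the operator names (`scan`, `filter_pull`, `sum_pull`, `FilterPush`, `SumSink`, `run_push`) are hypothetical and do not reflect the team's actual operators.

```python
# Pull mode: each operator asks its child for the next batch.
def scan(batches):
    for batch in batches:
        yield batch

def filter_pull(child, predicate):
    for batch in child:
        yield [x for x in batch if predicate(x)]

def sum_pull(child):
    # The root operator drives execution by pulling from below.
    return sum(sum(batch) for batch in child)

# Push mode: the source drives execution and hands batches downstream.
class SumSink:
    def __init__(self):
        self.total = 0

    def push(self, batch):
        self.total += sum(batch)

class FilterPush:
    def __init__(self, predicate, downstream):
        self.predicate = predicate
        self.downstream = downstream

    def push(self, batch):
        self.downstream.push([x for x in batch if self.predicate(x)])

def run_push(batches, first_operator):
    for batch in batches:
        first_operator.push(batch)

def is_even(x):
    return x % 2 == 0

batches = [[1, 2, 3], [4, 5, 6]]

pull_total = sum_pull(filter_pull(scan(batches), is_even))

sink = SumSink()
run_push(batches, FilterPush(is_even, sink))
push_total = sink.total
```

Both pipelines compute the same result. What changes is who controls the loop: in pull mode the consumer at the top requests data, while in push mode the producer at the bottom feeds data forward. That inversion touches every operator, which is consistent with the engineering effort described above.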

The interview concludes with a reflection on product philosophy: the technical optimizations discussed (columnar storage, JIT compilation, and push-mode execution) are pursued to give users who handle massive data volumes a faster, more reliable query experience.
