Interview with Wu Li on Database Evolution: Columnar Storage, JIT Compilation, and Push Mode

In this technical interview, Wu Li, a research engineer at Shanghai Yanhuang Data, explains how hardware constraints drive database evolution, why columnar storage and SIMD acceleration are crucial for OLAP, and how JIT compilation and push‑mode processing improve query performance and product experience.

DataFunSummit

Dec 10, 2023

The interview opens with an overview of enterprise digital transformation and argues that database evolution is driven by hardware bottlenecks: disk and network bandwidth were the early limits, whereas today the CPU has become the primary constraint.

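Wu Li introduces columnar storage and contrasts it with traditional row-based storage. In a columnar layout the values of each column are stored contiguously, so an aggregation over a single column reads only that column rather than every full row. This reduces random seeks and, because neighboring values share a type and often repeat, enables better compression. These properties make columnar layouts well suited to large-scale analytical workloads.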
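The contrast can be sketched in a few lines of Python. This is only an illustration under assumed data: the `rows` records, the field names, and the helper functions are hypothetical and are not taken from the interview or from any particular database.

```python
from array import array

# Hypothetical sample data: (order_id, region, amount) records.
rows = [
    (1, "east", 120.0),
    (2, "west", 80.5),
    (3, "east", 42.0),
    (4, "north", 310.25),
]


def sum_amount_rows(records):
    """Row layout: every whole record is visited to reach one field."""
    total = 0.0
    for _order_id, _region, amount in records:
        total += amount
    return total


# Columnar layout: each field lives in its own contiguous sequence.
columns = {
    "order_id": array("q", (r[0] for r in rows)),
    "region": [r[1] for r in rows],
    "amount": array("d", (r[2] for r in rows)),
}


def sum_amount_columns(cols):
    """Columnar layout: only the 'amount' column is read."""
    return sum(cols["amount"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


print(sum_amount_rows(rows), sum_amount_columns(columns))
print(run_length_encode(sorted(columns["region"])))
```

Both sums agree, but the columnar version never touches `order_id` or `region`. The run-length encoding step hints at why a single column, which holds values of one type that often repeat, tends to compress better than interleaved rows.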

After evaluating file formats such as Parquet and Avro, the team selected Apache Arrow. Arrow is an in-memory, language-agnostic columnar format, which makes data exchange between components fast, and its contiguous column buffers lend themselves to SIMD parallelism.
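To make the idea of an in-memory columnar buffer concrete, here is a simplified sketch using only the Python standard library. It imitates one aspect of Arrow's layout for a nullable fixed-width column (a contiguous values buffer plus a validity bitmap in which a set bit means "not null", least-significant bit first). It is not the Arrow library itself, it omits details such as buffer padding and alignment, and the function names are hypothetical.

```python
from array import array


def build_int64_column(values):
    """Pack ints (None means null) into a values buffer and a validity bitmap."""
    data = array("q", (0 if v is None else v for v in values))
    validity = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            # Set bit i, counting from the least-significant bit of each byte.
            validity[i // 8] |= 1 << (i % 8)
    return data, validity


def is_valid(validity, i):
    """Return True if slot i holds a non-null value."""
    return bool((validity[i // 8] >> (i % 8)) & 1)


def column_sum(data, validity):
    """Sum the non-null slots of the column."""
    return sum(data[i] for i in range(len(data)) if is_valid(validity, i))


data, validity = build_int64_column([10, None, 30, 40])
print(column_sum(data, validity))

# A memoryview shares the same bytes instead of copying them, which is the
# kind of zero-copy hand-off a common in-memory format makes possible.
view = memoryview(data)
print(view.nbytes)
```

Because every value occupies the same number of bytes at a predictable offset, a consumer in another language or a vectorized kernel can process the buffer directly without parsing or converting it first.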

The discussion then moves to JIT (Just-In-Time) compilation. Wu Li describes how a JIT compiler turns a query's expression trees into executable code at runtime instead of walking the tree for every row. This removes interpretation overhead and speeds up filters and projections, especially in complex SQL queries.
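The difference between interpreting an expression tree and compiling it once can be sketched as follows. This is an analogy, not the team's implementation: a production engine would emit native code (for example through LLVM), whereas this sketch generates Python source and uses the built-in `compile()` as a stand-in. The tuple-based tree encoding and all names are assumptions made for illustration.

```python
import operator

# Operators supported by the toy interpreter.
OPS = {
    "+": operator.add,
    "*": operator.mul,
    ">": operator.gt,
    "and": lambda a, b: a and b,
}

# Hypothetical filter: (price * qty > 100) and (qty > 2)
EXPR = (
    "and",
    (">", ("*", ("col", "price"), ("col", "qty")), ("lit", 100)),
    (">", ("col", "qty"), ("lit", 2)),
)


def interpret(node, row):
    """Walk the tree for each row: branching and dispatch on every node."""
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "lit":
        return node[1]
    return OPS[kind](interpret(node[1], row), interpret(node[2], row))


def to_source(node):
    """Translate the tree into a Python expression string."""
    kind = node[0]
    if kind == "col":
        return f"row[{node[1]!r}]"
    if kind == "lit":
        return repr(node[1])
    return f"({to_source(node[1])} {kind} {to_source(node[2])})"


def jit_compile(node):
    """Compile the tree once; the result is reused for every row."""
    source = f"lambda row: {to_source(node)}"
    return eval(compile(source, "<generated>", "eval"))


rows = [
    {"price": 30.0, "qty": 4},
    {"price": 10.0, "qty": 3},
    {"price": 60.0, "qty": 2},
]
compiled = jit_compile(EXPR)
print([interpret(EXPR, r) for r in rows])
print([compiled(r) for r in rows])
```

Both paths return the same results. The interpreter pays the tree-walking cost on every row, while the compiled function pays the translation cost once per query, which is why the benefit grows with expression complexity and row count.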

To further accelerate query execution, the team adopted Gandiva, the LLVM-based expression compiler in the Apache Arrow project, extended it with custom functions, and contributed improvements back to the open-source project.

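Finally, Wu Li explains the shift from pull-mode to push-mode execution. In pull mode each operator requests data from its upstream operator; in push mode the upstream operator hands data to its consumer as soon as it is produced. By replacing pull-mode operators with push-mode ones, the system achieves performance gains across a broader range of scenarios, though the transition required extensive engineering effort and careful coordination.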
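The two execution styles can be sketched side by side. This is a minimal illustration under assumed names (`scan`, `FilterPush`, `SumSink`, and so on are hypothetical), not the operator interface of the system discussed in the interview.

```python
# Pull mode: the consumer drives execution by requesting batches
# from its child, which in turn requests from its own child.
def scan(batches):
    for batch in batches:
        yield batch


def filter_pull(child, predicate):
    for batch in child:
        yield [x for x in batch if predicate(x)]


def sum_pull(child):
    return sum(sum(batch) for batch in child)


# Push mode: the source drives execution by handing each batch
# to the downstream operator.
class SumSink:
    def __init__(self):
        self.total = 0

    def push(self, batch):
        self.total += sum(batch)

    def finish(self):
        return self.total


class FilterPush:
    def __init__(self, predicate, downstream):
        self.predicate = predicate
        self.downstream = downstream

    def push(self, batch):
        self.downstream.push([x for x in batch if self.predicate(x)])

    def finish(self):
        return self.downstream.finish()


def run_push(batches, pipeline):
    """Feed every batch into the head of the pipeline, then collect the result."""
    for batch in batches:
        pipeline.push(batch)
    return pipeline.finish()


def is_even(x):
    return x % 2 == 0


batches = [[1, 2, 3], [4, 5, 6], [7, 8]]
print(sum_pull(filter_pull(scan(batches), is_even)))
print(run_push(batches, FilterPush(is_even, SumSink())))
```

Both pipelines compute the same answer. The difference lies in who holds control: in the push version the loop in `run_push` owns the control flow, which makes it easier for a scheduler to decide when and where each batch is processed.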

The interview concludes with a reflection on product philosophy: the technical optimizations (columnar storage, JIT compilation, and push-mode execution) are pursued to deliver faster, more reliable query experiences for users handling massive data volumes.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
