Interview with Wu Li on Columnar Storage, JIT Compilation, and Push Mode in Modern Database Systems
The article presents an interview with Wu Li, a senior engineer at Shanghai Yanhuang Data, discussing how columnar storage, JIT compilation, and push‑mode processing are reshaping database performance and product strategy in the era of large‑scale data analytics.
The interview begins by tracing how the hardware bottleneck for databases has shifted: earlier limits were disk and network bandwidth, whereas the modern constraint is CPU performance, which has prompted a move toward distributed and columnar architectures.
Wu Li explains the advantages of columnar storage over row‑based storage, emphasizing faster aggregation, better compression, and more efficient SIMD parallelism, and describes the selection of Apache Arrow as a versatile data format for their use cases.
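The aggregation advantage can be illustrated with a minimal sketch in plain Python. It does not use Apache Arrow itself; the sample data, the column names, and both function names are hypothetical and chosen only to contrast the two layouts.

```python
from array import array

# Hypothetical sample data: three orders stored row by row.
rows = [
    {"order_id": 1, "price": 10.0, "qty": 2},
    {"order_id": 2, "price": 4.5, "qty": 10},
    {"order_id": 3, "price": 7.25, "qty": 4},
]

# Columnar layout: one contiguous, typed array per column.
columns = {
    "order_id": array("q", (r["order_id"] for r in rows)),
    "price": array("d", (r["price"] for r in rows)),
    "qty": array("q", (r["qty"] for r in rows)),
}


def total_price_row_store(rows):
    # Every record is visited even though only one field is needed.
    return sum(r["price"] for r in rows)


def total_price_column_store(columns):
    # Only the "price" column is scanned; the other columns are never touched.
    return sum(columns["price"])


row_total = total_price_row_store(rows)
col_total = total_price_column_store(columns)
```

Both functions return the same total. The difference is in what gets read: the columnar version walks a single homogeneous array, which is the kind of access pattern that compression and SIMD instructions tend to benefit from in a real engine.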
To further accelerate query execution, Yanhuang Data adopted JIT (Just‑In‑Time) compilation, leveraging the open‑source Gandiva library from Arrow to compile expression DAGs at runtime, while also contributing improvements back to the project.
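The idea of compiling an expression at runtime can be sketched as follows. This is only an analogy, not Gandiva's API: Gandiva targets LLVM, whereas this sketch uses Python's built-in `compile` as a stand-in. The tuple-based expression format and the names `interpret`, `to_source`, and `jit_compile` are assumptions made for illustration.

```python
import operator

# Hypothetical expression tree for: price * qty + 1.0
EXPR = ("add", ("mul", ("col", "price"), ("col", "qty")), ("const", 1.0))

OPS = {"add": operator.add, "mul": operator.mul}
SYMBOLS = {"add": "+", "mul": "*"}


def interpret(node, row):
    """Walk the tree for every row (the per-row overhead JIT tries to avoid)."""
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "const":
        return node[1]
    return OPS[kind](interpret(node[1], row), interpret(node[2], row))


def to_source(node):
    """Translate the tree into a Python expression string."""
    kind = node[0]
    if kind == "col":
        return f"row[{node[1]!r}]"
    if kind == "const":
        return repr(node[1])
    return f"({to_source(node[1])} {SYMBOLS[kind]} {to_source(node[2])})"


def jit_compile(node):
    """Compile the tree once at runtime and return a callable."""
    source = f"lambda row: {to_source(node)}"
    return eval(compile(source, "<jit>", "eval"))


rows = [{"price": 10.0, "qty": 2}, {"price": 4.5, "qty": 10}]
compiled = jit_compile(EXPR)  # compiled once, reused for every row
interpreted_results = [interpret(EXPR, r) for r in rows]
compiled_results = [compiled(r) for r in rows]
```

The interpreter re-dispatches on node types for each row, while the compiled callable pays the translation cost once. A real engine would apply the same principle to whole column batches rather than to individual rows.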
The team also moved from a traditional pull‑mode data consumption model to a push‑mode architecture, citing gains in streaming performance and cache efficiency despite the higher engineering complexity.
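The contrast between the two models can be sketched with a tiny filter-then-sum pipeline. This is a simplified illustration, not Yanhuang Data's implementation; the operator names and the sample batches are hypothetical.

```python
# Hypothetical input: batches of integers arriving from a source.
BATCHES = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]


# Pull mode: the consumer drives, asking each upstream operator for the next batch.
def scan(batches):
    for batch in batches:
        yield batch


def filter_even_pull(upstream):
    for batch in upstream:
        yield [x for x in batch if x % 2 == 0]


def sum_pull(upstream):
    return sum(sum(batch) for batch in upstream)


# Push mode: the source drives, handing each batch to the downstream operator.
class SumSink:
    def __init__(self):
        self.total = 0

    def push(self, batch):
        self.total += sum(batch)


class FilterEvenPush:
    def __init__(self, downstream):
        self.downstream = downstream

    def push(self, batch):
        self.downstream.push([x for x in batch if x % 2 == 0])


def run_push(batches, first_operator):
    for batch in batches:
        first_operator.push(batch)


pull_total = sum_pull(filter_even_pull(scan(BATCHES)))

sink = SumSink()
run_push(BATCHES, FilterEvenPush(sink))
push_total = sink.total
```

Both pipelines compute the same result. In the push version each operator has to keep its own state and know its downstream neighbor, which hints at the extra engineering effort; in exchange, a batch flows through the whole chain as soon as it arrives, which suits streaming input.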
Overall, the three technical choices (columnar storage, JIT compilation, and push‑mode processing) are driven by the goal of delivering faster, more reliable OLAP query experiences for end users, reflecting a product‑centric approach to database innovation.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.