
Vectorization in Apache Doris: Design, Implementation, Current Status, and Future Plans

This article explains how Apache Doris uses CPU vectorization techniques such as SIMD, columnar storage, and cache-friendly design to boost query performance, covering the architecture of its vectorized engine, recent benchmark results, ongoing work on JOIN, storage, and data import, and planned enhancements.

DataFunTalk

Introduction

The article introduces the concept of vectorization, describing how turning single‑value operations into batch operations improves CPU efficiency and has become a major trend in software development, especially for databases.

CPU Vectorization Principles

Modern CPUs support SIMD (Single Instruction, Multiple Data) instructions that process multiple data elements in a single instruction; for example, a 128-bit register holds four 32-bit integers, so one SIMD addition does the work of four scalar additions, giving up to a four-fold speedup over scalar execution.
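The four-lane case described above can be sketched with SSE2 intrinsics; this is an illustrative helper, not Doris code:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Adds four 32-bit integers with a single SSE2 add instruction.
// Unaligned loads/stores keep the helper usable on any input pointer.
void add4_i32(const int32_t* a, const int32_t* b, int32_t* out) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    __m128i vc = _mm_add_epi32(va, vb);  // one instruction, four additions
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), vc);
}
```

In a real engine the same pattern is applied across whole column batches, either via intrinsics or by writing loops the compiler can auto-vectorize.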

Database Perspective

In a database engine, traditional row‑wise processing handles one tuple at a time, while a vectorized engine processes batches of columnar data, reducing memory accesses and improving cache locality.

Doris Vectorization Design and Implementation

Doris transforms its execution engine from a row‑batch/tuple model to a column‑oriented Block and Column model, redesigns the computation framework to operate on columns, and implements vectorized operators such as aggregation, sorting, and join.
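A hypothetical miniature of that Block/Column model: a Block is a batch of rows stored column by column, and operators consume and produce whole columns. The names mirror the design described above but are not Doris's actual classes:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One named column: a contiguous array of values.
struct Column {
    std::string name;
    std::vector<int64_t> data;
};

// A Block bundles several columns holding the same number of rows.
struct Block {
    std::vector<Column> columns;
    size_t rows() const { return columns.empty() ? 0 : columns[0].data.size(); }
};

// A vectorized operator: transform an entire column in one call,
// rather than evaluating an expression once per row.
Column add_constant(const Column& in, int64_t k) {
    Column out{in.name, {}};
    out.data.reserve(in.data.size());
    for (int64_t v : in.data) out.data.push_back(v + k);
    return out;
}
```

Operators such as aggregation, sort, and join are then written against columns like these, so the per-row dispatch cost of the tuple model disappears.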

Current Status

Since version 0.15, Doris has supported a vectorized engine for single-wide-table queries, with vectorized scan, sort, aggregation, and union operators. Benchmarks show 2-10x performance gains on typical queries, and the engine is enabled with set enable_vectorized_engine = true and set batch_size = 4096.
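The two session variables mentioned above are set per session before running queries:

```sql
-- Enable the vectorized engine for the current session (Doris 0.15+)
set enable_vectorized_engine = true;
-- Number of rows processed per batch; 4096 is the value cited above
set batch_size = 4096;
```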

Future Plans

Key upcoming work includes:

Vectorizing the JOIN operator (currently 30-40% faster than the row-based implementation, with further tuning planned).

Vectorizing the storage layer to eliminate row‑wise aggregation and improve memory layout for Date/DateTime, Decimal, and HLL types.

Vectorizing data import to reduce format conversion and leverage SIMD for aggregation during load.

Expanding SIMD support for SQL functions and developing a vectorized UDF framework.

Refactoring basic data types (Date/DateTime, Decimal, String, Array) for better memory usage and SIMD friendliness.

Improving the vectorized execution engine with a cost‑based optimizer to enable deeper inlining and other optimizations.

The team also acknowledges contributions from the ClickHouse community, which provided column‑store and function‑framework code.

Conclusion

The presentation ends with a thank‑you and a call for audience engagement.

Tags: SIMD, columnar storage, database performance, vectorization, Apache Doris
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
