Columnar Storage vs Row Storage: Overview, Write/Read Comparison, Pros, Cons, and Use Cases

This article explains the differences between row-based and column-based storage, comparing their write and read performance, outlining advantages and disadvantages, and describing suitable scenarios such as OLAP queries, column families, compression, and indexing, to help choose the appropriate storage model.

Architects' Tech Alliance

Nov 20, 2022

01. Overview

Big‑data systems generally store tables in one of two ways: row‑based storage or column‑based storage.

02. What is Columnar Storage?

Column‑based storage is the opposite of traditional row‑based storage in relational databases. The difference lies in how tables are organized.

• Row‑based storage stores a table as a sequence of rows.

• Column‑based storage stores a table as a sequence of columns.

The figure shows that in row storage the data of a whole row is kept together, whereas in column storage each column is stored separately, leading to distinct trade‑offs.
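
The two layouts can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular database lays out its files; the three‑row sales table and the function names are made up for the example.

```python
# Hypothetical three-row table used only for illustration.
rows = [
    {"date": "2022-11-01", "item": "apple", "sales": 120},
    {"date": "2022-11-01", "item": "pear", "sales": 80},
    {"date": "2022-11-02", "item": "apple", "sales": 95},
]

def to_row_layout(rows):
    """Row storage: the values of one row sit next to each other."""
    flat = []
    for row in rows:
        flat.extend(row.values())
    return flat

def to_column_layout(rows):
    """Column storage: the values of one column sit next to each other."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

row_layout = to_row_layout(rows)
column_layout = to_column_layout(rows)

print(row_layout)     # date, item, sales, date, item, sales, ...
print(column_layout)  # all dates, then all items, then all sales
```

In the row layout, fetching one complete record touches one contiguous slice; in the column layout, fetching one attribute for every record does.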

03. Write‑Side Comparison

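1) Row storage writes a whole row in a single operation. Because the row lands in one contiguous location, the write can usually be treated as a single unit that either completes or fails, which makes data integrity easier to guarantee.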

2) Column storage must split a row into individual columns, so one logical insert becomes roughly as many write operations as the table has columns. On spinning disks each extra write can mean another head seek (typically 1 ms–10 ms), so write performance is worse than with row storage.

3) Data modification follows the same pattern: row storage updates a single location, while column storage updates multiple column locations, again favoring row storage.
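
The write amplification described above can be made concrete with a toy model. The sketch below assumes a naive engine that issues one write per touched region and does no buffering; the `RowStore` and `ColumnStore` classes are hypothetical and exist only to count operations.

```python
class RowStore:
    """Toy model: a single append-only region that holds whole rows."""

    def __init__(self):
        self.region = []
        self.write_ops = 0

    def insert(self, row):
        self.region.append(tuple(row.values()))  # one contiguous write
        self.write_ops += 1


class ColumnStore:
    """Toy model: one append-only region per column."""

    def __init__(self, column_names):
        self.regions = {name: [] for name in column_names}
        self.write_ops = 0

    def insert(self, row):
        for name, value in row.items():
            self.regions[name].append(value)  # one write per column
            self.write_ops += 1


row = {"date": "2022-11-01", "item": "apple", "sales": 120}
row_store = RowStore()
column_store = ColumnStore(row.keys())

for _ in range(1000):
    row_store.insert(row)
    column_store.insert(row)

print(row_store.write_ops, column_store.write_ops)  # 1000 3000
```

With three columns the columnar model performs three times as many writes. Real columnar systems commonly buffer rows and write them in batches, so this count is best read as the worst case of the naive approach rather than a measurement.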

04. Read‑Side Comparison

1) Row storage reads an entire row even if only a few columns are needed, causing redundant data to be transferred and later filtered in memory.

2) Column storage reads only the required columns or column blocks, eliminating redundancy.

3) Because each column contains homogeneous data types, parsing is straightforward. Row storage mixes types within a row, requiring frequent type conversions that consume CPU cycles.

4) Compression and performance advantages of column storage are illustrated in the following figures.
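
Points 1) to 3) can be illustrated by counting how many bytes each layout has to scan to answer the same aggregate. This is a minimal sketch under stated assumptions: a hypothetical table of four 32‑bit integer columns, packed with Python's `struct` module, and no indexes or caching.

```python
import struct

# Hypothetical table: 4 integer columns, 10,000 rows.
NUM_ROWS = 10_000
COLUMNS = ["order_id", "customer_id", "quantity", "sales"]
table = [(i, i % 100, i % 7, i % 50) for i in range(NUM_ROWS)]

ROW_FORMAT = "<4i"  # four little-endian 32-bit ints per row

# Row layout: rows packed back to back.
row_bytes = b"".join(struct.pack(ROW_FORMAT, *r) for r in table)

# Column layout: each column packed into its own buffer.
column_bytes = {
    name: struct.pack(f"<{NUM_ROWS}i", *(r[idx] for r in table))
    for idx, name in enumerate(COLUMNS)
}

def sum_sales_row_layout(buf):
    """Scan every row in full, then keep only the 'sales' field."""
    total = 0
    for record in struct.iter_unpack(ROW_FORMAT, buf):
        total += record[3]
    return total, len(buf)

def sum_sales_column_layout(cols):
    """Read only the 'sales' column buffer."""
    buf = cols["sales"]
    values = struct.unpack(f"<{NUM_ROWS}i", buf)
    return sum(values), len(buf)

row_total, row_scanned = sum_sales_row_layout(row_bytes)
col_total, col_scanned = sum_sales_column_layout(column_bytes)

print(row_total == col_total)    # True: same answer either way
print(row_scanned, col_scanned)  # 160000 40000
```

Both scans return the same sum, but the row layout reads all four columns to use one of them, so it scans four times as many bytes in this example. The gap widens as the table gets wider.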

06. Advantages and Disadvantages

Both storage formats have clear pros and cons.

1) Row storage writes quickly and ensures data integrity, but reading can produce redundant data, which may affect performance on large datasets.

2) Column storage has slower writes and weaker integrity guarantees, yet it excels at read‑heavy workloads where only a subset of columns is needed, making it ideal for big‑data analytics.

The characteristics of each format dictate their appropriate use cases.

07. Suitable Scenarios for Columnar Storage

1) OLAP queries often scan millions or billions of rows but only need a few columns (e.g., date, item, sales amount). Columnar databases can read just those columns, dramatically improving query efficiency compared to row‑based systems.

2) Many columnar databases support column families (or locality groups). Storing frequently accessed columns together allows a single read to retrieve multiple columns, reducing I/O.

3) Columns with high redundancy compress very well; for example, Google Bigtable is reported to achieve >15× compression on web‑page data.

4) Bitmap indexes can be built on low‑cardinality columns (e.g., gender) to enable fast count queries and further compression.

However, if queries frequently need whole rows or involve small data volumes, columnar storage may not be appropriate.
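
The bitmap index mentioned in point 4) can be sketched with plain Python integers used as bit sets. This is an illustrative toy rather than the on‑disk format of any real database; the `region` column and the helper names are assumptions added for the example.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def count_rows(bitmap):
    """Count matching rows by counting the set bits."""
    return bin(bitmap).count("1")

# Two low-cardinality columns of a hypothetical six-row table.
gender = ["F", "M", "F", "F", "M", "F"]
region = ["east", "east", "west", "east", "west", "west"]

gender_index = build_bitmap_index(gender)
region_index = build_bitmap_index(region)

# COUNT(*) WHERE gender = 'F'
female_count = count_rows(gender_index["F"])

# COUNT(*) WHERE gender = 'F' AND region = 'east': a single bitwise AND
female_east = count_rows(gender_index["F"] & region_index["east"])

print(female_count, female_east)  # 4 2
```

A count query never touches the column values themselves: it only combines and counts bits, which is why the technique suits columns with few distinct values.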

08. Final Summary

Key characteristics of row‑based databases:

① Data is stored row by row.

② Without indexes, queries cause massive I/O; indexes accelerate queries.

③ Building indexes and materialized views consumes significant time and resources.

④ To satisfy query demands, databases often need to be heavily scaled.

Key characteristics of columnar databases:

① Data is stored per column, with each column kept separately.

② Data itself acts as an index.

③ Only columns involved in a query are accessed, greatly reducing I/O.

④ Each column can be processed by a separate thread, offering high concurrency.

⑤ Uniform data types enable efficient compression algorithms (e.g., delta, prefix compression), improving storage and network bandwidth usage.
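
Delta encoding, one of the algorithms named in point ⑤, is easy to sketch. The example below assumes a hypothetical column of timestamps taken one minute apart and uses `zlib` only as a stand‑in for whatever general‑purpose compressor a real system would apply afterwards.

```python
import struct
import zlib

def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    running = 0
    for d in deltas:
        running += d
        values.append(running)
    return values

def packed_size_after_zlib(ints):
    """Pack as 64-bit ints and return the zlib-compressed size in bytes."""
    raw = struct.pack(f"<{len(ints)}q", *ints)
    return len(zlib.compress(raw))

# Hypothetical column: 10,000 timestamps, one per minute.
timestamps = [1_668_900_000 + 60 * i for i in range(10_000)]
deltas = delta_encode(timestamps)

assert delta_decode(deltas) == timestamps  # the encoding is lossless

plain_size = packed_size_after_zlib(timestamps)
delta_size = packed_size_after_zlib(deltas)
print(plain_size, delta_size)
```

After encoding, almost every stored value is the same small number (60), so the compressor has far more repetition to exploit than in the raw timestamps. This only works well because a column holds values of one type with similar magnitudes, which a row of mixed fields does not.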

Source: blog.csdn.nept/Xingxinxinxin/article/details/80939277
