Row Store vs Column Store: Comparison, Advantages, and Experimental Evaluation in openGauss
The article explains the fundamental differences between row‑based and column‑based storage in databases, compares their write and read performance, presents a practical openGauss experiment showing storage size and insert speed differences, and offers guidance on when to choose each model.
Many beginners learn databases using relational tables stored row‑wise (row‑based store), where each record occupies a contiguous block on disk.
Some databases also support column‑wise storage (column‑based store), which stores each column’s values together.
The article compares the two storage models, illustrating differences with a diagram and describing how row storage keeps all attributes of a record together, while column storage keeps values of the same attribute adjacent.
It discusses write‑time characteristics: a row store writes an entire row in a single contiguous operation, keeping the write path simple, whereas a column store must split one logical row into a separate write per column, incurring higher I/O overhead; row stores therefore have the advantage for inserts and updates.
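The write‑path difference can be sketched with two openGauss tables of identical schema (table names `orders_row` and `orders_col` are illustrative, not from the article's experiment):

```sql
-- Illustrative sketch: one logical row, two physical write patterns.
create table orders_row (id integer, amount numeric, note varchar2(50));
create table orders_col (id integer, amount numeric, note varchar2(50))
  with (orientation = column);

-- In the row table this insert lands as one contiguous record;
-- in the column table it is split into per-column writes (id, amount, note).
insert into orders_row values (1, 9.99, 'first');
insert into orders_col values (1, 9.99, 'first');
```

The same `insert` statement is valid against both tables; only the physical layout of the written data differs.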
Read‑time characteristics are also compared: row stores retrieve whole rows, causing redundant column reads when only a few columns are needed, while column stores read only the required columns, making projections efficient and benefiting analytical (OLAP) workloads.
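The read‑path advantage shows up most clearly in aggregation over a single column. A minimal sketch (table name `sales_col` is illustrative; syntax follows openGauss conventions):

```sql
-- A column-oriented table with three columns.
create table sales_col (id integer, amount numeric, note varchar2(50))
  with (orientation = column);
insert into sales_col select n, n * 1.5, 'row'||n from generate_series(1, 100000) n;

-- The column store needs to read only the 'amount' column to answer this;
-- a row-oriented equivalent would scan every full row, id and note included.
select avg(amount) from sales_col;
```

The fewer columns a query touches relative to the table's width, the larger the column store's read advantage.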
Advantages and disadvantages are summarized in a table, highlighting that row stores excel at point queries, frequent inserts/updates, and small tables, whereas column stores suit wide tables, aggregation‑heavy queries, and scenarios where compression improves I/O.
An experiment using openGauss (enterprise edition 3.0.0) on a Huawei Cloud VM demonstrates the differences: a row‑oriented table custom1 and a column‑oriented table custom2 are created, each loaded with 500 000 rows.
# create table custom1 (id integer, name varchar2(20));
# create table custom2 (id integer, name varchar2(20)) with (orientation = column);
# insert into custom1 select n, 'testtt'||n from generate_series(1,500000) n;
# insert into custom2 select * from custom1;

Storage‑size inspection shows that the column table occupies roughly one‑seventh of the row table's space. Insert‑performance tests reveal that the column table is slightly slower for single‑row inserts.
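The storage‑size comparison can be reproduced with the PostgreSQL‑style size functions that openGauss inherits (a sketch; assumes the `custom1` and `custom2` tables from the experiment still exist):

```sql
-- Report each table's on-disk footprint in human-readable form;
-- pg_total_relation_size includes indexes and auxiliary relations.
select pg_size_pretty(pg_total_relation_size('custom1')) as row_table_size,
       pg_size_pretty(pg_total_relation_size('custom2')) as column_table_size;
```

The roughly seven‑fold difference reported in the article comes from column‑wise compression: values of the same type and attribute stored adjacently compress far better than heterogeneous row data.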
# explain analyze insert into custom1 values(1,'zhang3');
# explain analyze insert into custom2 values(1,'zhang3');

The SQL statements used in the experiment are presented in code blocks, and the tables are subsequently dropped.
# drop table custom1;
# drop table custom2;

Finally, the article offers practical selection guidelines (update frequency, insert pattern, column count, proportion of columns read per query, compression trade‑offs) and notes constraints of column stores, such as lack of support for arrays, generated columns, global temporary tables, and foreign keys.
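The listed constraints can be checked directly. For example, declaring a foreign key on a column‑oriented table is expected to be rejected (a sketch; table names are illustrative and the exact error wording varies by openGauss version):

```sql
-- A row-oriented parent table with a primary key.
create table parent_row (id integer primary key);

-- Expected to fail: the openGauss column store does not support foreign keys.
create table child_col (id integer, parent_id integer references parent_row(id))
  with (orientation = column);
```

Trying each unsupported feature this way before committing to a column‑oriented schema avoids surprises during migration.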
IT Architects Alliance