Row vs Column Storage: Comparison, Advantages, and Experimental Evaluation in openGauss
This article explains the fundamental differences between row‑based and column‑based storage in databases, compares their write and read performance, lists pros and cons, presents an openGauss experiment with sample tables, and offers practical guidance on when to choose each storage model.
When beginners learn databases they usually start with relational tables stored row‑by‑row, which is a typical row‑based store. Some databases also support column‑based storage, where each column is stored separately on disk.
Storage comparison
In row storage, the values of a record are stored contiguously; in column storage, all values of a single column are stored together, making column data continuous.
Advantages and disadvantages
Row storage advantages: data kept together, INSERT/UPDATE are simple and atomic, good for frequent updates. Disadvantages: when selecting only a few columns, unnecessary data is read.
Column storage advantages: only the needed columns are read, projection is efficient, any column can be indexed, better compression and query performance for analytical workloads. Disadvantages: INSERT/UPDATE are slower because a row must be split into columns, higher CPU cost for compression, and certain features (arrays, foreign keys, global temporary tables) are not supported.
Experimental evaluation (openGauss)
Environment: Huawei Cloud VM + openGauss Enterprise 3.0.0 + openEuler 20.03. Two tables were created, one row‑oriented (custom1) and one column‑oriented (custom2), each with 500,000 rows.
Table creation and data insertion:
openGauss=# create table custom1 (id integer,name varchar2(20)); CREATE TABLE openGauss=# create table custom2 (id integer,name varchar2(20)) with (orientation = column); CREATE TABLE openGauss=# insert into custom1 select n,'testtt'||n from generate_series(1,500000) n; INSERT 0 500000 openGauss=# insert into custom2 select * from custom1; INSERT 0 500000Storage size comparison:
openGauss=# \d+ public | custom1 | table | omm | 24 MB | {orientation=row,compression=no} public | custom2 | table | omm | 3104 kB | {orientation=column,compression=low}Insert performance test:
openGauss=# explain analyze insert into custom1 values(1,'zhang3'); Total runtime: 0.135 ms openGauss=# explain analyze insert into custom2 values(1,'zhang3'); Total runtime: 0.207 msCleanup:
openGauss=# drop table custom1; DROP TABLE openGauss=# drop table custom2; DROP TABLEFrom the results, column storage uses far less space (about 1/7 of row storage) but incurs slightly higher insert latency. The article then provides selection recommendations (e.g., frequent updates favor row storage, analytical queries favor column storage) and notes on column‑storage limitations.
Recommendation
Choose the storage model based on workload characteristics: point queries and frequent DML operations suit row storage, while OLAP‑style analytical queries, high compression needs, and wide tables with few accessed columns benefit from column storage.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.