Why Columnstore Indexes Supercharge SQL Server Queries

This article explains how columnstore indexes differ from traditional row stores, detailing their batch processing, compression, and storage mechanisms that can boost data‑warehouse query performance by up to ten times while reducing storage size dramatically.

MaGe Linux Operations

Jul 8, 2023

Traditional row‑store databases keep complete rows together, several per page, while a columnstore stores each column’s data separately in its own set of pages, allowing selective column loading and higher compression.

Columnstore indexes are ideal for data‑warehouse workloads, offering up to ten‑fold query speed improvements and up to seven‑fold data size reduction compared with uncompressed row stores.

Batch processing mode handles many rows at once instead of one row at a time.

Only the required columns are read into memory, avoiding unnecessary I/O.

High similarity within a column enables aggressive compression, speeding data reads.

01. Characteristics of Columnstore

SQL Server reads data by loading entire pages; with row stores each page contains all columns of a row, limiting the number of rows per page. In a columnstore, each page holds many values of a single column, allowing roughly ten times more rows per page and dramatically fewer I/O operations for queries that touch only a few columns.
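The page arithmetic above can be sketched in a few lines. This is a back-of-the-envelope model, not a measurement: the table size, column count, and fixed 8-byte values are hypothetical, and page header overhead and compression are ignored.

```python
import math

PAGE_BYTES = 8192      # SQL Server pages are 8 KB; header overhead is ignored here
ROWS = 1_000_000       # hypothetical table size
COLUMNS = 10           # hypothetical column count
BYTES_PER_VALUE = 8    # assumption: every column is a fixed 8-byte value
COLUMNS_NEEDED = 2     # the query touches only two columns


def row_store_pages() -> int:
    # Each page holds whole rows, so all ten columns are read even if unused.
    rows_per_page = PAGE_BYTES // (COLUMNS * BYTES_PER_VALUE)
    return math.ceil(ROWS / rows_per_page)


def column_store_pages() -> int:
    # Each page holds values of one column; only the needed columns are read.
    values_per_page = PAGE_BYTES // BYTES_PER_VALUE
    return COLUMNS_NEEDED * math.ceil(ROWS / values_per_page)


print(row_store_pages(), column_store_pages())
```

Under these assumptions a page holds 102 rows in the row store but 1,024 values in the columnstore, so the two‑column query reads 1,954 pages instead of 9,804, before any compression is applied.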

02. Physical Implementation

Row‑store tables use heap or B‑Tree structures, whereas columnstore indexes store data column‑wise.

1. Advantages

Only selected columns are read into memory, greatly improving query speed for narrow‑column queries.

Columns exhibit higher redundancy, enabling better compression and more rows per page.

Compressed columns take fewer pages, so more data fits in the buffer cache, which raises cache‑hit rates and keeps hot pages in memory.

Batch processing mode provides superior execution performance over traditional row‑by‑row processing.
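The difference between row‑by‑row and batch processing can be illustrated with a toy aggregation. This is only a conceptual sketch: the function names are hypothetical, the batch size of 900 is an assumption chosen for illustration, and the "calls" counter stands in for per‑invocation operator overhead.

```python
from typing import Iterator, List, Tuple

BATCH_SIZE = 900  # illustrative batch size, an assumption of this sketch


def sum_row_mode(values: List[int]) -> Tuple[int, int]:
    """Row mode: the operator is invoked once per row."""
    total, calls = 0, 0
    for value in values:
        total += value
        calls += 1
    return total, calls


def batches(values: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` values."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def sum_batch_mode(values: List[int], size: int = BATCH_SIZE) -> Tuple[int, int]:
    """Batch mode: the operator is invoked once per batch of rows."""
    total, calls = 0, 0
    for batch in batches(values, size):
        total += sum(batch)
        calls += 1
    return total, calls
```

Both functions return the same total; for 10,000 values the row‑mode version makes 10,000 operator calls while the batch version makes 12.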

2. Physical Implementation Steps

Step 1: The columnstore index groups rows into Row Groups, each holding up to 1,048,576 rows.

Step 2: Within each Row Group, each column forms a Column Segment.

Step 3: Each segment is individually compressed and stored.
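Steps 1 and 2 can be sketched as a simple pivot. The function name, sample rows, and the tiny group size of two rows are hypothetical and chosen for readability; compression (Step 3) is left out.

```python
from typing import Dict, List

Row = Dict[str, object]


def build_row_groups(rows: List[Row], rows_per_group: int) -> List[Dict[str, list]]:
    """Split rows into row groups, then pivot each group into per-column segments."""
    row_groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        # One segment per column: all values of that column within this row group.
        segments = {col: [row[col] for row in chunk] for col in chunk[0]}
        row_groups.append(segments)
    return row_groups


rows = [
    {"id": i, "region": "EU" if i % 2 else "US", "amount": i * 10}
    for i in range(5)
]
groups = build_row_groups(rows, rows_per_group=2)
segment_count = sum(len(segments) for segments in groups)
```

Five rows with a group size of two yield three row groups, and three columns per group yield nine segments in total.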

3. Encoding and Compression

Columnstore uses dictionary‑based encoding for low‑cardinality columns with many repeated values and value‑based encoding for numeric types, both applied by the VertiPaq compressor.

Dictionary encoding stores a unique value list and replaces each value with a small index, achieving high compression when many duplicates exist.
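A minimal sketch of dictionary encoding follows. The function names are hypothetical, and it shows only the general technique, not SQL Server's on‑disk format.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, ids): unique values in first-seen order and per-row indexes."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    ids: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        ids.append(positions[value])
    return dictionary, ids


def dictionary_decode(dictionary: List[str], ids: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the index list."""
    return [dictionary[i] for i in ids]


column = ["EU", "US", "EU", "EU", "APAC", "US"]
dictionary, ids = dictionary_encode(column)
```

Here six strings collapse to a three‑entry dictionary plus six small integers; the more duplicates a column has, the larger the saving.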

Value encoding shrinks numeric values by factoring out a common power‑of‑ten exponent and subtracting a base value, which is efficient for tightly distributed integer or decimal data.
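The idea behind value encoding can be sketched for integers as follows. This is a simplified illustration, not SQL Server's actual algorithm: the function names and the exponent sign convention are assumptions, and the input is assumed to be a non‑empty list.

```python
from typing import List, Tuple


def value_encode(values: List[int]) -> Tuple[int, int, List[int]]:
    """Return (exponent, base, encoded) with value == (encoded + base) * 10**exponent."""
    exponent = 0
    # Strip trailing zeros shared by every value (skipped if all values are zero).
    while any(values) and all(v % 10 ** (exponent + 1) == 0 for v in values):
        exponent += 1
    scaled = [v // 10 ** exponent for v in values]
    base = min(scaled)
    return exponent, base, [s - base for s in scaled]


def value_decode(exponent: int, base: int, encoded: List[int]) -> List[int]:
    """Invert value_encode."""
    return [(e + base) * 10 ** exponent for e in encoded]


amounts = [1_200_000, 1_350_000, 1_500_000]
exponent, base, encoded = value_encode(amounts)
```

For these three amounts the sketch stores an exponent of 4, a base of 120, and the small values 0, 15, and 30, which need far fewer bits than the originals.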

03. Columnstore Index

SQL Server 2012 introduced non‑clustered columnstore indexes, which excel at star‑join aggregation queries common in data warehouses.

Typical star‑join query example:

-- Grouping_Columns, AggregationFunction, and Columns are placeholders
select lt.Grouping_Columns,
       AggregationFunction(bt.Columns)
from dbo.LittleTable lt with(nolock)
inner join dbo.BigTable bt with(nolock)
    on lt.Int_Col1 = bt.Int_Col1
where ...
group by lt.Grouping_Columns

Creating a non‑clustered columnstore index:

CREATE NONCLUSTERED COLUMNSTORE INDEX index_name
ON schema_name.table_name (column [, ...])
WITH (DROP_EXISTING = OFF, MAXDOP = max_degree_of_parallelism)
ON partition_scheme_name(column) | filegroup_name;

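In SQL Server 2012, a table with a non‑clustered columnstore index becomes read‑only (clustered columnstore indexes became updatable in SQL Server 2014 and non‑clustered ones in SQL Server 2016). On 2012, updates therefore require disabling the index, performing the data change, then rebuilding the index: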

ALTER INDEX mycolumnstoreindex ON mytable DISABLE;
-- update mytable --
ALTER INDEX mycolumnstoreindex ON mytable REBUILD;

Index creation and rebuild are I/O‑intensive and should be scheduled during low‑usage periods.

04. Space Usage of Columnstore Index

Data is first grouped into Row Groups; each column within a group forms a Segment stored separately. The total storage size equals the sum of all segment sizes.

The system view sys.column_store_segments reports segment information, e.g., for 10 Row Groups with 15 columns each, the view returns 150 rows.

SELECT i.object_id,
       OBJECT_NAME(i.object_id) AS object_name,
       i.name AS index_name,
       i.type_desc AS index_type,
       COL_NAME(i.object_id, ic.column_id) AS index_column_name,
       SUM(s.row_count) AS row_count,
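       -- decimal division: integer division would round small segments down to 0 MB
       CAST(SUM(s.on_disk_size) / 1024.0 / 1024.0 AS decimal(18, 2)) AS on_disk_size_mb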
FROM sys.column_store_segments s
JOIN sys.partitions p ON s.partition_id = p.partition_id
JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id AND s.column_id = ic.index_column_id
GROUP BY i.object_id, i.index_id, i.name, i.type_desc, ic.column_id
ORDER BY i.object_id, i.name, index_column_name;

Because each compressed segment occupies little disk space, reading it costs little I/O and memory. Combined with batch processing, this makes columnstore indexes highly effective for star‑join aggregation queries.
