Databases 15 min read

Understanding B‑Tree Indexes: Clustered vs Non‑Clustered in Databases

This article explains the fundamentals of database indexing, detailing B‑Tree structures, the differences between clustered and non‑clustered indexes, their storage formats, and how they affect query, insert, and delete operations, including the concept of index covering.

Java Interview Crash Guide

Jan 4, 2021

Understanding B‑Tree Indexes: Clustered vs Non‑Clustered in Databases

1. Introduction

Database indexing is a core topic in database design; this article asks what an index is and how clustered and non‑clustered indexes differ.

2. B‑Tree

Most database systems use B‑Tree or B+Tree for indexes (e.g., MsSql uses B+Tree, Oracle and Sybase use B‑Tree). A B‑Tree differs from a binary tree and satisfies several properties for an M‑order tree:

Each node has at most M children.

Except for the root and leaves, each node has at least M/2 children.

The root has at least two children unless the tree consists of a single node.

All leaf nodes are on the same level and contain no key data.

A non‑leaf node with K keys has exactly K+1 children.

Keys inside a node are stored in ascending order. Below is an example of a B‑Tree with M=4:

Each node contains a key array Key[] and a pointer array Son[]. Searching a B‑Tree proceeds by sequential or binary search of Key[]; if the key K is found, the node address and position are returned, otherwise the search continues in the child pointed to by the appropriate Son[i] until a leaf is reached or the search fails.

Insertion of keys 1‑6 (M=4) causes a split when the node becomes full; the middle key is promoted to become the new root.

3. Database Indexes

1. What is an index

An index is a database object that speeds up data access, similar to a book index. Benefits include avoiding full‑table scans, allowing non‑clustered queries to skip data pages, reducing page splits for clustered inserts, and sometimes eliminating the need for sorting.

Indexes can avoid full‑table scans.

Non‑clustered indexes may allow queries without touching data pages.

Clustered indexes prevent data inserts from always targeting the last page.

Indexes can also avoid sorting operations.

However, maintaining indexes adds overhead to data‑modification operations.

2. Index storage

An index record stores the key value (the indexed column values) and a logical pointer to a data page or another index page.

When an empty table receives its first index, the system allocates a single index page that serves as both root and leaf.

When inserting rows, the root node receives a new index record; if the root becomes full, it is split as follows:

Create two child nodes.

Distribute the original root’s records roughly equally between the children.

Insert pointers from the root to the two children.

Because index records contain only the indexed fields and a small pointer (4‑9 bytes), index pages are much denser than data pages, allowing more records per page and reducing I/O.

3. Types of indexes

A) Clustered index : Table rows are stored in the order of the index; leaf nodes contain the actual data rows.

B) Non‑clustered index : Table row order is independent of the index; leaf nodes store the indexed column values plus a pointer to the data row.

Only one clustered index can exist per table; tables without a clustered index are called “heaps”.

4. Clustered index

In a clustered index, leaf nodes are data nodes, so the storage order of rows matches the index order.

Query example: searching for the name “Green” traverses three index pages and finally the data page.

Insert operation: the system locates the target data page via the index, shifts existing rows if necessary, and inserts the new row. If the page is full, it is split, possibly allocating a new extent, adjusting index pointers, and redistributing roughly half the rows to a new page. Special cases include large rows that require two new pages or auto‑increment columns that may avoid a split.

Delete operation: the row is removed, subsequent rows shift up, and empty pages may be reclaimed. Index entries are also removed, and if a leaf contains only one record it may be merged with a neighboring leaf.

5. Non‑clustered index

Differences from clustered indexes:

Leaf nodes are not data nodes.

Each leaf stores a “key‑pointer” pair for every data row.

Leaf nodes also store a pointer offset to locate the exact row.

Root and intermediate index records contain:

Index field value.

RowId (page pointer + offset).

Pointer to the next lower‑level index page.

Leaf‑level records contain the index field value and RowId.

Query example: searching for “Green” requires reading three index pages plus one data page.

Insert: if a table has only a non‑clustered index, new rows are appended to the last data page and the index is updated; if a clustered index also exists, it determines the insertion point and both indexes are updated.

Delete: the non‑clustered index is used to locate the row; after deletion the corresponding leaf entry is removed, and any other non‑clustered indexes on the table are also updated. Empty pages are reclaimed and pointers in the index tree are adjusted.

6. Index covering

Covering indexes improve performance when all columns required by a query are present in a single index, eliminating the need to read data pages. A composite (multi‑column) index can contain up to 31 columns and up to 600 bytes per record.

If a query’s SELECT, WHERE, ORDER BY, GROUP BY, and HAVING columns are all covered by the index, only index pages are accessed.

Two scan types exist:

Matched index scan : used when the leading column of the index appears in the WHERE clause; it can avoid data‑page access, especially beneficial for range queries.

Non‑matched index scan : used when the leading column is absent; the scan must read all leaf nodes, which is still faster than scanning all data pages.

[1] http://manuals.sybase.com/onlinebooks/group-asarc/asg1200e/aseperf/@Generic__BookTextView/3358 [2] http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.adref.doc/adref235.htm

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

B+Tree SQL Performance database index Clustered Index Non-Clustered Index Index Covering

Written by

Java Interview Crash Guide

Dedicated to sharing Java interview Q&A; follow and reply "java" to receive a free premium Java interview guide.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.