Databases 11 min read

Exploring MySQL Database Fundamentals and Indexing Principles

This article examines MySQL database fundamentals, including storage architecture, block-based data access, disk I/O bottlenecks, and detailed explanations of B+Tree and hash indexes in MyISAM and InnoDB, while offering practical guidance on index design, selectivity, left‑most prefix rules, and using EXPLAIN for SQL optimization.

Java Captain

Oct 6, 2018

Exploring MySQL Database Fundamentals and Indexing Principles

Preface

Today I will explore the mysteries of databases from a programmer's perspective, using MySQL as an example.

Database Basic Principles

My understanding of DB

1. Components of a database: storage + instance

Data obviously needs storage; storage alone is insufficient, so an instance is required to encapsulate operations and expose CRUD APIs.

A single storage can serve multiple instances, improving load capacity and high availability; multiple storages can be placed in different data centers for disaster recovery.

2. Reading data by Block or Page

Databases do not read rows directly. They operate on fixed‑size physical blocks (often called pages, e.g., 16 KB). The system locates the block address, then reads the whole block, which reduces I/O operations.

Databases also pre‑fetch neighboring blocks into memory to increase cache hit rates, because nearby blocks are often hot data.

3. Disk I/O is the performance bottleneck

Data resides on disk, so disk I/O is far slower than memory access. Techniques to reduce I/O include increasing memory, using indexes, and employing faster storage.

Loading all data into memory is impractical for large datasets; instead, databases keep hot data in memory (often using hash structures) and rely on indexes to limit disk reads.

4. Thought‑provoking questions

Why is DELETE slower than TRUNCATE? (Row‑by‑row deletion vs. block‑level removal.)

Why do we say a small table should drive a large table? (Because indexes on the small table can efficiently locate rows in the large table.)

Exploring MySQL Index Principles

In most applications the read/write ratio is 10:1 or even 100:1, making SELECT optimization the most critical task, and indexes are essential.

What is an index and what problem does it solve?

MySQL indexes are data structures designed to improve query efficiency by reducing the number of disk I/O operations needed to locate the desired rows.

Hello, B+Tree

Different storage engines implement indexes differently. For MyISAM, the index file stores the physical address of each record (a non‑clustered index). For InnoDB, the leaf nodes store the actual row data (a clustered index).

In InnoDB, the primary key index stores full row data in leaf nodes, while secondary indexes store only the primary key value, requiring a second lookup.

Deep into B+Tree

① Why does B+Tree store actual data only in leaf nodes?

② Why should index columns be as short as possible and preferably monotonic increasing?

③ Why does a composite index follow the left‑most prefix rule?

④ How do range queries affect left‑most prefix matching?

The height h of a B+Tree depends on the number of entries per block (m) and total rows N. Smaller index fields increase m, reducing h and speeding up lookups.

Using a monotonic increasing column as the primary key keeps the tree shallow and reduces frequent node splits.

For a composite index (name, age, sex), the engine first compares the leftmost column (name). If a query starts with a later column, the index cannot be used efficiently.

MySQL will continue matching columns from left to right until it encounters a range condition, after which matching stops.

Keep indexes "clean"

Avoid applying functions to indexed columns, as this prevents the index from being used and forces full scans.

Consider selectivity when creating an index

Calculate selectivity as count(distinct col) / count(*). A selectivity of 1 (as with a primary key) is ideal; low selectivity may not justify an index.

Hash Index

Memory engine uses hash indexes: O(1) lookup by hashing the key, with collisions resolved via linked lists. Hash indexes do not support range queries.

SQL Optimization Tool: EXPLAIN

EXPLAIN shows which indexes are used and how tables are scanned, helping diagnose slow queries.

Demo table student (images omitted) illustrates composite index (age, address) and left‑most prefix matching, as well as cases where the composite index becomes ineffective.

In summary, understanding the underlying data structures of indexes enables effective SQL tuning; use EXPLAIN to verify your assumptions.

PS: If you found this sharing useful, feel free to like and forward.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Indexing MySQL sql-optimization EXPLAIN B+Tree Database Fundamentals

Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.