Understanding MySQL Indexes, B+Tree vs Hash, Partitioning, and Redis Fundamentals

This article explains why auto‑increment primary keys are preferred in MySQL, how B+Tree and hash indexes differ, the benefits of composite and partitioned indexes, isolation levels, MVCC, row‑level locking, triggers, stored procedures, optimization tips, MyISAM vs InnoDB, table design guidelines, and also covers Redis architecture, persistence, replication, and eviction policies.

Java Captain

Jul 7, 2018

Why Use an Auto‑Increment Column as Primary Key

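InnoDB uses the primary key as the clustered index; if none is defined, it falls back to the first UNIQUE index whose columns are all NOT NULL, and otherwise to a hidden internal row ID. Data rows are stored in the leaf nodes of this B+Tree in key order, so inserting records with a sequential auto‑increment key appends to the end of the last page, minimizing page splits and fragmentation. A random key such as a UUID, by contrast, lands in the middle of pages that are already full and forces InnoDB to split them and move rows.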
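The effect of key order on page splits can be sketched with a toy model. The Python below is a simplified simulation, not InnoDB's actual algorithm: `PAGE_CAPACITY`, `insert_all`, and the split rule (start a new page for a rightmost insert, split in half otherwise) are illustrative assumptions.

```python
import bisect
import random

PAGE_CAPACITY = 8  # hypothetical rows per leaf page; real InnoDB pages are 16 KB


def insert_all(keys):
    """Insert distinct keys into a chain of sorted leaf pages.

    Returns (page_count, rows_moved), where rows_moved counts rows that had
    to be relocated because a full page was split in the middle.
    """
    pages = [[]]  # each page is a sorted list of keys
    rows_moved = 0
    for key in keys:
        # Locate the page whose key range should contain `key`.
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Rightmost insert into a full page: open a fresh page, move nothing.
            pages.append([key])
            continue
        # Insert into the middle of a full page: split it and move the upper half.
        mid = len(page) // 2
        upper = page[mid:]
        del page[mid:]
        rows_moved += len(upper)
        pages.insert(idx + 1, upper)
        bisect.insort(page if key < upper[0] else upper, key)
    return len(pages), rows_moved


sequential = insert_all(range(1, 1001))   # auto-increment style keys

shuffled = list(range(1, 1001))
random.Random(0).shuffle(shuffled)        # random keys, e.g. UUID-like order
scattered = insert_all(shuffled)
```

In this model the sequential run fills every page completely and never moves a row, while the shuffled run relocates rows on most splits and ends with more, partly empty pages.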

Why Indexes Improve Efficiency

Indexes store data in sorted order.

Sorted order allows binary‑search‑like lookups without full scans.

Lookup cost drops from O(N) for a full scan to roughly log₂(N) comparisons; in a B+Tree with fan‑out m, that is about log_m(N) page reads.
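To make the difference concrete, the sketch below counts how many values are examined by a full scan versus a binary search over sorted data. It is a plain Python illustration of the principle, not MySQL code; the function names are made up for this example.

```python
def linear_probes(rows, target):
    """Count values examined by a full scan until `target` is found."""
    probes = 0
    for value in rows:
        probes += 1
        if value == target:
            break
    return probes


def binary_probes(sorted_rows, target):
    """Count values examined by a binary search over sorted data."""
    lo, hi, probes = 0, len(sorted_rows) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_rows[mid] == target:
            return probes
        if sorted_rows[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


rows = range(1_000_000)                      # already sorted, like index entries
scan_cost = linear_probes(rows, 999_999)     # examines every value
index_cost = binary_probes(rows, 999_999)    # at most 20 for one million values
```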

B+Tree Index vs Hash Index

B+Tree is a balanced multi‑way tree with ordered leaf nodes linked together, enabling range queries.

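Hash index applies a hash function to the key, providing O(1) average‑case equality lookups but no ordering. In MySQL, explicit hash indexes are mainly a feature of the MEMORY engine; InnoDB only maintains an internal adaptive hash index on top of its B+Trees.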

Hash indexes excel at equality queries when key values are not highly duplicated.

Hash indexes cannot support range queries, index‑based sorting, or left‑most prefix matching.
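The contrast can be sketched in Python with a dict standing in for a hash index and a sorted list standing in for the ordered B+Tree leaves. The table contents and helper names are hypothetical.

```python
import bisect

rows = {1: 30, 2: 25, 3: 41, 4: 25, 5: 36}   # hypothetical row_id -> age

# Hash-style index: age -> row ids. Keys have no useful order.
hash_index = {}
for row_id, age in rows.items():
    hash_index.setdefault(age, []).append(row_id)

# Ordered index: (age, row_id) pairs kept sorted, like B+Tree leaf entries.
ordered_index = sorted((age, row_id) for row_id, age in rows.items())


def equal_lookup(age):
    """Equality lookup: one hash probe."""
    return hash_index.get(age, [])


def range_lookup(low, high):
    """Range lookup: two binary searches, then a sequential walk."""
    start = bisect.bisect_left(ordered_index, (low, float("-inf")))
    end = bisect.bisect_right(ordered_index, (high, float("inf")))
    return [row_id for _, row_id in ordered_index[start:end]]


def range_lookup_with_hash(low, high):
    """With only a hash index, a range query must inspect every key."""
    return sorted(r for age, ids in hash_index.items() if low <= age <= high for r in ids)
```

`range_lookup(25, 36)` returns the row ids already ordered by age, which is also why an ordered index can serve ORDER BY while a hash index cannot.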

Difference Between B‑Tree and B+Tree

B‑Tree stores keys and data in every node, internal and leaf alike; leaf nodes are not linked to one another.

B+Tree stores only keys in internal nodes, while leaf nodes hold all the data (full rows, or pointers to them) and are linked for sequential access.

Why B+Tree Is More Suitable for DBMS and OS File Indexes

Internal nodes are smaller, allowing more keys per page and reducing I/O.

Every lookup descends from the root to a leaf, so all lookups traverse the same number of levels, giving stable performance.
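A back-of-the-envelope calculation shows why the smaller internal nodes matter. The figures below are assumptions chosen for illustration (16 KB pages, an 8-byte key, a 6-byte child pointer, 1 KB rows) and ignore page headers and fill factor.

```python
PAGE_SIZE = 16 * 1024   # InnoDB's default page size in bytes
KEY_BYTES = 8           # assumed BIGINT key
POINTER_BYTES = 6       # assumed size of a child-page pointer
ROW_BYTES = 1024        # assumed average row size

# B+Tree: internal pages hold only (key, pointer) pairs.
bplus_fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
rows_per_leaf = PAGE_SIZE // ROW_BYTES

# B-Tree: every entry also carries the row, so far fewer fit per page.
btree_fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES + ROW_BYTES)


def bplus_capacity(levels):
    """Rows addressable by a B+Tree with the given number of levels."""
    return bplus_fanout ** (levels - 1) * rows_per_leaf


three_level_rows = bplus_capacity(3)   # about 21.9 million rows
```

Under these assumptions a B+Tree internal page holds 1170 entries against 15 for a B‑Tree page, so three page reads are enough to reach any of roughly 21.9 million rows.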

MySQL Composite Index

A composite index (e.g., (a,b,c)) can be used from left to right; queries can use a, a,b, or a,b,c but not b,c alone. It works like a phone‑book where sorting is first by the first column then by the second.
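The left‑most prefix rule follows directly from the sort order. The sketch below models a composite index on (a, b, c) as a sorted list of tuples; the data and function names are hypothetical, and integer columns are assumed so that "next value" is simply +1.

```python
import bisect

# Index entries for columns (a, b, c), kept in sorted order.
entries = sorted([(1, 1, 7), (1, 2, 3), (1, 2, 9), (2, 1, 4), (2, 2, 8), (3, 1, 5)])


def prefix_seek(*prefix):
    """Find entries matching a leading-column prefix with two binary searches."""
    lo = bisect.bisect_left(entries, prefix)
    upper = prefix[:-1] + (prefix[-1] + 1,)   # smallest tuple past the prefix
    hi = bisect.bisect_left(entries, upper)
    return entries[lo:hi]


def scan_on_b(b):
    """Filtering on b alone cannot seek: matches are scattered, so scan everything."""
    return [entry for entry in entries if entry[1] == b]
```

`prefix_seek(1)` and `prefix_seek(1, 2)` each return one contiguous slice, whereas the rows with b = 1 sit at positions 0, 3, and 5 and can only be found by scanning.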

When Not to Build or to Minimize Indexes

Very small tables.

Tables with frequent inserts/updates/deletes, because every index must be maintained on each write.

Columns with low cardinality (e.g., boolean flags).

Columns that are almost always queried together with the primary key, since the primary‑key lookup already narrows the result to a single row.

MySQL Partitioning

What Is Table Partitioning?

Partitioning splits a logical table into multiple physical partitions based on a rule, improving manageability and performance.

Difference Between Partitioning and Sharding

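Partitioning keeps a single logical table that MySQL manages internally; sharding splits the data into multiple independent tables (often on different servers), and the application must route each query to the right one.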

Benefits of Partitioning

Data can reside on different devices, allowing larger datasets.

Queries that filter on the partition key scan only the relevant partitions (partition pruning), and aggregates such as SUM() and COUNT() can in principle be computed per partition and then combined.

Maintenance tasks like bulk deletes become simple by dropping a partition.

Helps avoid certain bottlenecks such as InnoDB index mutex contention.

Limitations

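Maximum 1024 partitions per table before MySQL 5.6.7; 8192 from 5.6.7 onward.

Before MySQL 5.5, partition expressions must evaluate to an integer; 5.5 added RANGE COLUMNS and LIST COLUMNS for other types.

Every column used in the partitioning expression must be part of every primary key and unique key on the table.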

Foreign keys are not supported on partitioned tables.

Partitioning applies to both data and indexes; you cannot partition only one of them.

Checking Partition Support

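Run SHOW VARIABLES LIKE '%partition%'; – in MySQL 5.1 through 5.5 the variable have_partitioning should be YES. MySQL 5.6 removed that variable: run SHOW PLUGINS and look for a partition plugin with status ACTIVE. In MySQL 8.0, partitioning is built into InnoDB.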

Supported Partition Types

RANGE – split by numeric or date ranges.

LIST – split by explicit list values.

HASH – split by the value of a user‑supplied integer expression, taken modulo the number of partitions.

KEY – similar to HASH, but takes one or more columns and uses MySQL's internal hash function.
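How a row is routed to a partition, and how pruning narrows a query, can be sketched as follows. This is an illustrative Python model only: the boundary list, partition numbering, and function names are assumptions, and the HASH rule is shown for non‑negative integers.

```python
import bisect

# RANGE: partitions p0, p1, p2 defined as VALUES LESS THAN 2016, 2017, 2018.
RANGE_BOUNDS = [2016, 2017, 2018]


def range_partition(year):
    """Return the index of the RANGE partition that stores `year`."""
    idx = bisect.bisect_right(RANGE_BOUNDS, year)
    if idx == len(RANGE_BOUNDS):
        raise ValueError(f"no partition accepts value {year}")
    return idx


def hash_partition(value, num_partitions):
    """HASH: the integer expression modulo the partition count."""
    return value % num_partitions


def prune(low_year, high_year):
    """Partitions a query on low_year <= year <= high_year has to scan."""
    return list(range(range_partition(low_year), range_partition(high_year) + 1))
```

A query restricted to 2016 through 2017 touches only partitions 1 and 2 in this model; a query without a condition on the partition key has to visit all of them.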

Four Isolation Levels

Serializable – prevents dirty reads, non‑repeatable reads, and phantom reads.

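Repeatable Read – prevents dirty and non‑repeatable reads; the SQL standard still allows phantom reads at this level. This is InnoDB's default.

Read Committed – prevents dirty reads only.

Read Uncommitted – lowest level; even dirty reads are possible.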

MVCC (Multi‑Version Concurrency Control)

InnoDB uses MVCC to allow lock‑free reads, improving concurrency for read‑heavy workloads.

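Snapshot read – a plain SELECT reads the version visible to the transaction's snapshot without acquiring locks.

Current read – SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, and DELETE read the latest committed version and lock it.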
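The two kinds of read can be illustrated with a minimal version chain. This is a deliberately simplified Python model: real InnoDB decides visibility from transaction IDs and a read view rather than from timestamps, and the class and method names here are invented for the example.

```python
class VersionedRow:
    """One row with a chain of committed versions, oldest first."""

    def __init__(self):
        self.versions = []  # list of (commit_ts, value)

    def commit(self, commit_ts, value):
        """Record a new committed version of the row."""
        self.versions.append((commit_ts, value))

    def snapshot_read(self, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self.versions):
            if commit_ts <= snapshot_ts:
                return value
        return None  # the row did not exist yet for this snapshot

    def current_read(self):
        """Return the latest committed version (a real engine would also lock it)."""
        return self.versions[-1][1] if self.versions else None


row = VersionedRow()
row.commit(10, "v1")
row.commit(20, "v2")

seen_by_old_snapshot = row.snapshot_read(15)   # still sees "v1"
seen_by_current_read = row.current_read()      # sees "v2"
```

In these terms, Repeatable Read keeps one snapshot for the whole transaction, while Read Committed takes a fresh snapshot for each statement.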

Row‑Level Locking Advantages

Low contention when different threads modify different rows.

Rollbacks affect only the rows a transaction changed.

A single row can stay locked for a long time without blocking access to the rest of the table.

Row‑Level Locking Disadvantages

Higher memory usage than page‑ or table‑level locks.

Slower than table‑level locks when a statement touches a large part of the table (large scans, frequent GROUP BY), because many more locks must be acquired.

Simple MySQL Trigger Example

CREATE TRIGGER trigger_name BEFORE INSERT ON table_name FOR EACH ROW BEGIN ... END;

What Is a Stored Procedure?

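A stored procedure is a reusable set of SQL statements stored on the server. It can accept input/output parameters and contain control‑flow statements, and calling it saves network round trips and repeated parsing, since MySQL caches the parsed procedure per connection after the first call.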

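DROP PROCEDURE IF EXISTS `proc_adder`;
DELIMITER ;;
-- Adds two integers, treating NULL inputs as 0.
CREATE PROCEDURE `proc_adder`(IN a INT, IN b INT, OUT sum INT)
BEGIN
  IF a IS NULL THEN SET a = 0; END IF;
  IF b IS NULL THEN SET b = 0; END IF;
  SET sum = a + b;
END;;
DELIMITER ;

-- Example call: @result receives the OUT parameter.
CALL proc_adder(1, 2, @result);
SELECT @result;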

MySQL Optimization Tips

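Enable the query cache only on MySQL 5.7 and earlier, and only for read‑heavy workloads; it was deprecated in 5.7.20 and removed in MySQL 8.0.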

Use EXPLAIN to analyze query plans.

Use LIMIT 1 when only one row is needed.

Index searchable columns.

Prefer ENUM over VARCHAR for fixed‑value columns.

Use prepared statements to improve performance and security.

Consider vertical partitioning for wide tables.

Choose the appropriate storage engine.

Key vs Index

A key is a constraint (primary, unique, foreign) and may also act as an index.

An index is solely an auxiliary data structure to speed up lookups.

MyISAM vs InnoDB

InnoDB supports transactions and foreign keys; MyISAM does not.

InnoDB uses clustered indexes; MyISAM uses non‑clustered indexes.

MyISAM stores the row count in metadata, making COUNT(*) without a WHERE clause fast.

MyISAM supports full‑text indexes; InnoDB has supported them only since MySQL 5.6.

Database Table Creation Best Practices

Field Naming and Configuration

Avoid fields unrelated to the entity the table models.

Use consistent, meaningful names (no mixed languages or ambiguous abbreviations).

Prefer snake_case and avoid case mixing.

Do not use reserved words.

Match field types to their intended data.

Select numeric types carefully.

Allocate sufficient length for text fields.

Special System Fields

Add soft‑delete flags and audit columns.

Implement versioning if needed.

Structural Considerations

Normalize multi‑valued attributes into separate tables.

Separate large text/blob columns into auxiliary tables.

Prefer VARCHAR over CHAR for variable‑length data.

Define a primary key for every table.

Set sensible default values to avoid NULLs in indexed columns.

Create indexes on unique and non‑NULL columns, but limit their number.

Redis Overview

Single‑Threaded Model

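Redis executes all commands in a single thread, eliminating the need for internal concurrency control for most operations. Redis 6 and later can offload socket reads and writes to optional I/O threads, but command execution stays single‑threaded.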

Why Redis Is Fast

Most commands are pure in‑memory operations, and many of them (such as GET and SET) are O(1).

Single‑threaded design avoids context switches and lock contention.

Uses non‑blocking I/O multiplexing (epoll on Linux) for high throughput.

Internal Implementation

Redis relies on a custom event loop built on the operating system's I/O multiplexing facility (epoll on Linux, kqueue on BSD and macOS); read/write/close/accept events are dispatched as they become ready.
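The shape of such a loop can be sketched with Python's standard `selectors` module, which picks epoll or kqueue where available. This is a toy illustration, not Redis source code: the text protocol (`SET key value`, `GET key`), the function names, and the in‑process socket pair used in place of a TCP listener are all assumptions.

```python
import selectors
import socket

store = {}                               # the in-memory keyspace
selector = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS


def handle(conn):
    """Read one request from a ready socket and reply, all on one thread."""
    data = conn.recv(1024)
    if not data:                         # peer closed the connection
        selector.unregister(conn)
        conn.close()
        return
    parts = data.decode().split()
    if len(parts) == 3 and parts[0] == "SET":
        store[parts[1]] = parts[2]
        conn.sendall(b"OK")
    elif len(parts) == 2 and parts[0] == "GET":
        conn.sendall(store.get(parts[1], "(nil)").encode())
    else:
        conn.sendall(b"ERR")


def run_once(timeout=1.0):
    """One loop iteration: serve every socket the OS reports as readable."""
    for key, _events in selector.select(timeout):
        handle(key.fileobj)


# Demo: a connected socket pair stands in for a client connection.
server_side, client = socket.socketpair()
selector.register(server_side, selectors.EVENT_READ)

client.sendall(b"SET greeting hello")
run_once()
first_reply = client.recv(1024)

client.sendall(b"GET greeting")
run_once()
second_reply = client.recv(1024)
```

Because `handle` runs to completion before the next event is served, two requests never touch `store` at the same time, which is the property the single‑threaded model relies on.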

Thread‑Safety

Because all commands run in one thread, each individual command is atomic and Redis is thread‑safe by design. Atomicity across several commands needs MULTI/EXEC, a Lua script, or a lock (a distributed lock when several clients coordinate).

Benefits of Using Redis

In‑memory speed comparable to a HashMap (O(1) lookups).

Rich data types: strings, lists, sets, sorted sets, hashes.

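Transactions (MULTI/EXEC): queued commands run back to back without interleaving from other clients, though there is no rollback.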

Use cases: caching, messaging, key expiration, automatic eviction.

Redis vs Memcached

Redis supports richer data structures.

Performance is in the same range; which one is faster depends on the workload and value sizes.

Redis offers persistence (RDB/AOF).

Redis supports master‑slave replication.

A Redis string value can be up to 512 MB, whereas Memcached limits an item to 1 MB by default.

Redis Master‑Slave Replication

Slave sends SYNC to master (PSYNC since Redis 2.8, which also allows partial resynchronization).

Master creates an RDB snapshot and buffers incoming writes.

After snapshot, master sends the file and buffered writes to the slave.

Slave loads the snapshot and replays the buffered commands.

Subsequent writes are streamed from master to slave.

Full synchronization with many slaves can overload the master; a hierarchical master‑slave topology, where slaves replicate from other slaves, mitigates this.
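The synchronization steps above can be sketched as a small in‑process simulation. It models only the ordering of snapshot, buffered writes, and streaming; the class and method names are hypothetical and nothing here reflects the actual Redis wire protocol.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def load(self, snapshot):
        """Replace local state with the snapshot received from the master."""
        self.data = dict(snapshot)

    def apply(self, key, value):
        """Apply one replicated write."""
        self.data[key] = value


class Master:
    def __init__(self):
        self.data = {}
        self.replicas = []
        self.sync_buffer = None   # collects writes while a snapshot is in flight

    def write(self, key, value):
        self.data[key] = value
        if self.sync_buffer is not None:
            self.sync_buffer.append((key, value))   # buffer for the syncing replica
        for replica in self.replicas:
            replica.apply(key, value)               # stream to attached replicas

    def begin_sync(self):
        """Take a point-in-time snapshot and start buffering new writes."""
        self.sync_buffer = []
        return dict(self.data)

    def finish_sync(self, replica, snapshot):
        """Ship the snapshot, replay the buffered writes, then attach the replica."""
        replica.load(snapshot)
        for key, value in self.sync_buffer:
            replica.apply(key, value)
        self.sync_buffer = None
        self.replicas.append(replica)


master, replica = Master(), Replica()
master.write("a", 1)
snapshot = master.begin_sync()       # snapshot contains only "a"
master.write("b", 2)                 # arrives during the transfer, gets buffered
master.finish_sync(replica, snapshot)
master.write("c", 3)                 # streamed directly from now on
```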

Persistence Options

RDB – periodic point‑in‑time snapshots (compact, good for backups).

AOF – logs every write command; replayed on restart. How much can be lost in a crash depends on the appendfsync setting (always, everysec, or no).

Both can be used together; AOF provides more complete data recovery.
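The difference in recovery can be shown with a toy model in which the "RDB" is a copy of the dataset taken at one moment and the "AOF" is the list of write commands. The command tuples and the `apply` helper are invented for this sketch, and it assumes every logged command reached disk before the crash.

```python
def apply(state, command):
    """Apply one write command, given as a tuple, to an in-memory dict."""
    op, key, *args = command
    if op == "SET":
        state[key] = args[0]
    elif op == "INCR":
        state[key] = int(state.get(key, 0)) + 1
    elif op == "DEL":
        state.pop(key, None)


commands = [
    ("SET", "user", "ann"),
    ("INCR", "hits"),
    ("INCR", "hits"),
    ("DEL", "user"),
    ("INCR", "hits"),
]

live, aof_log, rdb_snapshot = {}, [], {}
for position, command in enumerate(commands):
    apply(live, command)
    aof_log.append(command)            # AOF: append every write as it happens
    if position == 1:
        rdb_snapshot = dict(live)      # RDB: one snapshot, taken after two writes

# Simulated crash: rebuild the dataset from each persistence source.
recovered_from_rdb = dict(rdb_snapshot)
recovered_from_aof = {}
for command in aof_log:
    apply(recovered_from_aof, command)
```

The snapshot restores the state as of the moment it was taken, so the three later writes are lost, while replaying the log reproduces the final dataset.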

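Eviction Policies (6 Types)

volatile-lru – evict least recently used keys among those with an expiration.

volatile-ttl – evict keys with an expiration that have the shortest remaining time to live.

volatile-random – evict random keys among those with an expiration.

allkeys-lru – evict least recently used keys among all keys.

allkeys-random – evict random keys among all keys.

noeviction – never evict (writes that need more memory fail when memory is full).

Redis 4.0 added two more policies, volatile-lfu and allkeys-lfu, which evict the least frequently used keys. The policy is selected with the maxmemory-policy setting.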
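The LRU policies can be sketched with an `OrderedDict` that keeps keys in access order. This is an exact LRU written for illustration; Redis itself approximates LRU by sampling a few keys, and the class below (including its limit expressed as a key count rather than bytes) is a hypothetical simplification.

```python
from collections import OrderedDict


class TinyCache:
    """Key-count-limited cache supporting allkeys-lru, volatile-lru, noeviction."""

    def __init__(self, max_keys, policy="allkeys-lru"):
        self.max_keys = max_keys
        self.policy = policy
        self.data = OrderedDict()   # least recently used key comes first
        self.expiring = set()       # keys that were stored with an expiration

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value, expires=False):
        if key not in self.data and len(self.data) >= self.max_keys:
            self._evict()
        self.data[key] = value
        self.data.move_to_end(key)
        if expires:
            self.expiring.add(key)
        else:
            self.expiring.discard(key)

    def _evict(self):
        if self.policy == "allkeys-lru":
            candidates = self.data
        elif self.policy == "volatile-lru":
            candidates = (k for k in self.data if k in self.expiring)
        else:                       # noeviction
            candidates = ()
        victim = next(iter(candidates), None)
        if victim is None:
            raise MemoryError("cache full and no key may be evicted")
        del self.data[victim]
        self.expiring.discard(victim)


lru = TinyCache(2, "allkeys-lru")
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")            # "a" is now more recently used than "b"
lru.set("c", 3)         # evicts "b"

volatile = TinyCache(2, "volatile-lru")
volatile.set("a", 1)
volatile.set("b", 2, expires=True)
volatile.set("c", 3)    # evicts "b": the only key with an expiration
```

With `volatile-lru`, a cache that holds no expiring keys behaves like `noeviction` and rejects the write, which this sketch models by raising `MemoryError`.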
