Databases 20 min read

Why Your MySQL Queries Are Slow and How to Fix Them with Indexes, ES & HBase

This article explains why MySQL queries become slow—covering index misuse, MDL locks, flush waits, large‑table bottlenecks, and read/write splitting—then shows how ElasticSearch’s inverted index and HBase’s column‑family design can complement MySQL for faster search and scalable storage.

Efficient Ops

Oct 13, 2024

Why Your MySQL Queries Are Slow and How to Fix Them with Indexes, ES & HBase

1. What Is the Experience of a Slow MySQL Query?

Most internet applications are read‑heavy and write‑light, so fast reads are essential. Various factors can cause a query to become painfully slow.

1.1 Indexes

When data volume is modest, most slow queries can be solved with proper indexes, but many slow queries arise from poorly designed indexes.

MySQL indexes are based on B+ trees; interviewers love to ask about left‑most prefix indexes, B+ trees, and related concepts.

Using a composite index (a,b) allows index‑pushdown: after matching a, MySQL can evaluate b within the index, reducing table lookups.

If a query’s columns are fully covered by a composite index, the index becomes a covering index and no table lookup is needed.

1.1.1 Why Do Indexes Fail?

Even with an index, queries can be slow when the index is not used (index‑skip). Common reasons for index‑skip include:

WHERE uses !=, <>, OR, or functions on the indexed column.

LIKE patterns that start with %.

String literals without quotes.

Low‑cardinality columns (e.g., gender).

Not matching the left‑most prefix of a composite index.

1.1.2 Why Do These Practices Break Indexes?

Functions on indexed columns (e.g., LENGTH(a) = 6) force MySQL to evaluate the expression before using the index, effectively losing the ordered nature of the B+ tree.

Implicit type or character‑set conversions can also break index ordering, especially in join queries where column types match but encodings differ.

1.1.3 Why Not Index Low‑Cardinality Fields?

Indexing a low‑cardinality column like gender often leads to more table lookups than a full scan, so InnoDB may ignore such indexes when the column occupies roughly 30% of rows.

1.1.4 Simple and Effective Indexing Techniques

Index push‑down: use composite indexes for multi‑condition queries.

Covering indexes: keep all needed columns in the index to avoid table lookups.

Prefix indexes: index only the first N characters of a string.

Avoid functions on indexed columns.

1.1.5 When MySQL Chooses the Wrong Index

Sometimes MySQL picks a low‑selectivity index, causing excessive scans. This can be due to inaccurate statistics (run ANALYZE TABLE) or optimizer mis‑prediction (use FORCE INDEX or rewrite the query).

1.2 MDL Locks

Since MySQL 5.5, metadata locks (MDL) are taken for CRUD operations (read lock) and schema changes (write lock). A statement waiting for an MDL write lock blocks MDL read locks; you can see such waits with SHOW PROCESSLIST.

1.3 Flush Waits

Flush commands can be blocked by other statements, causing subsequent SELECTs to wait. SHOW PROCESSLIST shows Waiting for table flush status.

1.4 Row Locks

A transaction holding an uncommitted write lock can block other statements.

1.5 Current Read (Repeatable Read)

InnoDB’s default isolation level is REPEATABLE READ. If transaction B commits before transaction A reads, A must walk the undo log to see the value before B’s commit.

1.6 Large‑Table Scenarios

Tables with billions of rows can hit I/O or CPU bottlenecks even with good indexing. InnoDB stores each B+‑tree node in 16 KB pages, typically three levels deep for ~2 million rows.

When the buffer pool cannot hold all index data, cache hit rates drop, and LRU eviction may push hot data out.

1.6.1 Sharding (Database/Table Partitioning)

Choose vertical sharding (different databases for different business domains) for I/O bottlenecks, or horizontal sharding (splitting rows across many tables) for CPU‑bound query latency. Tools include Sharding‑Sphere, TDDL, Mycat.

1.6.2 Read‑Write Splitting

When read traffic far exceeds write traffic, a master‑slave setup can distribute reads, improve availability, and balance load. Challenges include replication lag (stale reads) and routing logic.

2. How to Evaluate ElasticSearch

ElasticSearch (ES) is a real‑time distributed search engine built on Lucene, used for full‑text search, JSON document storage, log monitoring, and analytics.

2.1 What Can ES Do?

Common use cases are full‑text search and log analysis, often paired with Logstash and Kibana (the ELK stack).

2.2 ES Architecture

Before ES 7.0 the hierarchy was index → type → document; after 7.0, type was removed, making index analogous to a table.

2.3 Why Is ES Fast?

ES uses an inverted index: terms are indexed, and document IDs are stored in posting lists. A memory‑resident term index (FST) quickly locates term dictionary offsets, reducing disk random I/O.

2.3.1 Tokenized Search

Because terms are tokenized, queries like %da% that would require a full table scan in MySQL become fast lookups in ES.

2.3.2 Exact Search

For exact matches, ES may be comparable to MySQL’s covering index, but the advantage of the term index diminishes.

2.4 When to Use ES

Full‑text search where MySQL’s LIKE is inefficient.

Large‑scale log or document search with need for fast tokenized queries.

2.5 Sample ES Query

GET yourIndex/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_phrase": {
      "log": "xxx"
    }
  }
}

2.6 Integration Tips

In Java projects, Spring Boot provides out‑of‑the‑box support for ES CRUD operations once the ES cluster is reachable.

3. HBase Overview

HBase stores data by column families rather than rows, making it suitable for write‑intensive workloads.

3.1 Storage Model

Rows are sorted by row‑key (lexicographically). Each column family can contain dynamic columns; a cell holds the actual value with a timestamp version.

3.2 OLTP vs. OLAP

HBase is not designed for OLAP; it lacks transactions and is optimized for fast writes and point reads based on row‑key.

3.3 RowKey Design

Only three query patterns are supported: single‑row lookup by row‑key, range scans by row‑key, and full table scans. Good row‑key design is crucial.

3.4 Use Cases

HBase excels in write‑heavy scenarios where low‑latency ingestion is required, while reads are typically limited to small ranges or single rows.

4. Conclusion

Software development should be incremental; technology must serve the project, and suitability outweighs novelty. To speed up queries, first locate and fix bugs, then apply the appropriate optimizations discussed above.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Indexing MySQL HBase database optimization

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.