
Why MySQL Queries Go Slow and How to Fix Them with Indexes, ES, and HBase

This article explains why MySQL queries become slow, explores index-related pitfalls and optimization techniques, and then compares ElasticSearch and HBase as complementary solutions for large‑scale data and search scenarios, offering practical tips and code examples.

Efficient Ops

Jun 9, 2021


1. What Does a Slow MySQL Query Feel Like?

Most internet applications are read‑heavy and write‑light, so fast reads are essential. Various factors can cause a query to become painfully slow.

1.1 Index

When the data volume is modest, most slow queries can be fixed with proper indexes; conversely, many slow queries are caused by missing or poorly designed indexes.

MySQL (InnoDB) indexes are implemented as B+ trees. Developers often memorize this for interviews, along with the left-most prefix rule and comparisons between B+ trees and other tree variants.

The left‑most prefix rule actually describes how composite indexes work; using a well‑designed composite index can dramatically speed up queries because of index‑pushdown.

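Index condition pushdown (ICP, available since MySQL 5.6) works like this. Suppose the WHERE clause references several columns of a composite index such as (a, b). The storage engine can check the condition on b against the index entries themselves. It then fetches full rows only for the entries that pass, which reduces the number of row reads.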

If the queried columns are fully contained in the composite index, the index becomes a covering index, eliminating the need for a table lookup.
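The following Python sketch makes the left-most prefix and covering-index ideas concrete. It models a composite index on hypothetical columns (a, b) as a sorted list of (a, b, primary key) entries, mirroring how B+ tree leaves keep their keys ordered. This is a simplified model, not InnoDB's implementation:

- Lookups on a or (a, b) can binary-search a contiguous range.
- A lookup on b alone must examine every entry.
- A query that needs a non-indexed column must fetch the full row by primary key.

```python
import math
from bisect import bisect_left

# Hypothetical table: primary key -> (a, b, c). Only (a, b) is indexed.
ROWS = {
    1: ("x", 3, "payload-1"),
    2: ("y", 1, "payload-2"),
    3: ("x", 1, "payload-3"),
    4: ("z", 2, "payload-4"),
    5: ("x", 2, "payload-5"),
}

# Composite index on (a, b): entries sorted by (a, b), each carrying the
# primary key so the full row can be fetched afterwards (the "table lookup").
INDEX_AB = sorted((a, b, pk) for pk, (a, b, _c) in ROWS.items())


def range_for_prefix(prefix):
    """Binary-search the contiguous run of entries starting with `prefix`.

    Only valid for a left-most prefix: (a,) or (a, b), never (b,) alone.
    """
    lo = bisect_left(INDEX_AB, prefix)
    hi = bisect_left(INDEX_AB, prefix + (math.inf,))
    return INDEX_AB[lo:hi]


def select_a_b_where_a(a):
    # Covering query: every selected column is in the index, no row fetch.
    return [(ea, eb) for ea, eb, _pk in range_for_prefix((a,))]


def select_c_where_a_b(a, b):
    # Non-covering query: the index narrows the candidates, then each match
    # needs a lookup of the full row by primary key to read column c.
    return [ROWS[pk][2] for _a, _b, pk in range_for_prefix((a, b))]


def select_where_b_only(b):
    # Not a left-most prefix: entries with this b are scattered, so every
    # index entry has to be examined.
    return [pk for _a, eb, pk in INDEX_AB if eb == b]


print(select_a_b_where_a("x"))     # [('x', 1), ('x', 2), ('x', 3)]
print(select_c_where_a_b("x", 2))  # ['payload-5']
print(select_where_b_only(1))      # [3, 2]
```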

1.1.1 Why Indexes May Fail

Even with an index, queries can remain slow if the index is not used. Common reasons for index failure (detectable with EXPLAIN) include:

Using !=, <>, OR, or functions on the indexed column in the WHERE clause.

LIKE patterns that start with a wildcard (e.g., LIKE '%foo').

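Comparing a string column with an unquoted numeric literal (e.g., WHERE phone = 13800000000 on a VARCHAR column), which triggers an implicit type conversion.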

Low‑cardinality columns (e.g., gender).

Not matching the left‑most prefix of a composite index.

In these cases the optimizer either cannot use the sorted order of the index or estimates that scanning the table is cheaper, so it skips the index.

1.1.2 Why These Causes Break Indexes

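An index is sorted by the raw column values, not by the result of a function applied to them. Once the indexed column is wrapped in a function, MySQL cannot use the B+ tree order to jump to matching entries. It has to evaluate the expression entry by entry instead.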

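Implicit type or character-set conversions are the same problem in disguise. MySQL effectively wraps the indexed column in a conversion function, which breaks the index ordering.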

For example, a query like WHERE LENGTH(a) = 6 on an indexed string column forces MySQL to compute the length for every row, which defeats the index.
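The sketch below, in Python with made-up values, shows why a function or an implicit conversion defeats the index. The index order is the order of the raw strings, and neither LENGTH() nor string-to-number conversion preserves that order. Python's len() and int() stand in for MySQL's LENGTH() and its string-to-number conversion; MySQL's conversion handles more edge cases than int().

```python
def is_non_decreasing(values):
    """True if values never go down, i.e. a binary search over them is valid."""
    return all(x <= y for x, y in zip(values, values[1:]))


def scan_positions(values, target):
    """Full scan: the only option once the index order no longer applies."""
    return [i for i, v in enumerate(values) if v == target]


# Index order on a hypothetical VARCHAR column `name`: sorted by raw strings.
names = sorted(["ab", "abcdef", "b", "bcdefg", "c", "cdefgh", "zz"])
# WHERE LENGTH(name) = 6 compares a derived value that is not sorted.
lengths = [len(n) for n in names]

# Index order on a hypothetical VARCHAR column `code`.
codes = sorted(["9", "10", "09", "100"])
# WHERE code = 9 (unquoted) compares numbers, so every entry is converted.
as_numbers = [int(c) for c in codes]

print(names)                          # ['ab', 'abcdef', 'b', 'bcdefg', 'c', 'cdefgh', 'zz']
print(lengths)                        # [2, 6, 1, 6, 1, 6, 2]
print(codes, as_numbers)              # ['09', '10', '100', '9'] [9, 10, 100, 9]
print(scan_positions(as_numbers, 9))  # [0, 3]: the matches are not even contiguous
```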

1.1.3 Why Not Index Low‑Cardinality Fields Like Gender

Indexing a column with very few distinct values (e.g., gender) often yields no performance gain. Each value still matches a large share of the rows, and every match requires an extra lookup of the full row. Together these lookups can cost more than a sequential full table scan.

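There is no fixed cutoff, because InnoDB uses a cost-based optimizer. A commonly quoted rule of thumb is that when a condition matches more than roughly 30% of the table, the optimizer may choose a full table scan over the index.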
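A rough way to reason about this is the fraction of rows a predicate matches. The Python helper below is a rule-of-thumb sketch built on the 30% figure, not MySQL's actual cost model. The function name and sample data are hypothetical.

```python
from collections import Counter


def prefer_full_scan(column_values, target, threshold=0.30):
    """Rule-of-thumb check, NOT MySQL's cost model.

    Returns the fraction of rows matching `column = target` and whether it
    exceeds `threshold`. Above the threshold, an index on the column is
    unlikely to beat a full scan, since each match still needs a row lookup.
    """
    matches = Counter(column_values)[target]
    fraction = matches / len(column_values)
    return fraction, fraction > threshold


gender = ["M", "F"] * 500                     # 2 distinct values
user_ids = [f"user{i}" for i in range(1000)]  # all distinct

print(prefer_full_scan(gender, "F"))          # (0.5, True)
print(prefer_full_scan(user_ids, "user42"))   # (0.001, False)
```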

1.1.4 Simple and Effective Indexing Practices

Index pushdown: for multi‑condition queries, create a composite index that includes the selective columns.

Covering index: store all needed columns in the composite index to avoid table lookups.

Prefix index: index only the first N characters of a string column to reduce index size.

Avoid functions on indexed columns.

Consider the maintenance cost of indexes on tables that are write‑heavy.
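For the prefix-index practice, a common way to pick N is to compare the selectivity of each prefix length with that of the full column. In SQL this is COUNT(DISTINCT LEFT(col, N)) / COUNT(*). The Python sketch below does the same over a hypothetical list of email addresses and returns the shortest prefix that loses no selectivity. That N would then be used in an index definition such as email(N).

```python
def prefix_selectivity(values, n):
    """Python equivalent of COUNT(DISTINCT LEFT(col, n)) / COUNT(*)."""
    return len({v[:n] for v in values}) / len(values)


def shortest_lossless_prefix(values):
    """Smallest n whose prefix is as selective as the full column."""
    longest = max(len(v) for v in values)
    full = prefix_selectivity(values, longest)
    for n in range(1, longest + 1):
        if prefix_selectivity(values, n) >= full:
            return n
    return longest


# Hypothetical sample of an indexed email column.
emails = [
    "alice@example.com",
    "alan@example.com",
    "bob@example.com",
    "bobby@example.com",
    "carol@example.com",
]

for n in range(1, 5):
    print(n, prefix_selectivity(emails, n))  # 1 0.6 / 2 0.6 / 3 0.8 / 4 1.0
print(shortest_lossless_prefix(emails))      # 4
```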

1.1.5 How to Diagnose a Wrong Index Choice

Sometimes an index looks correct but MySQL still chooses a sub‑optimal one, leading to excessive scans. Common causes are inaccurate statistics (fix with ANALYZE TABLE) and optimizer mis‑estimation (use FORCE INDEX or rewrite the query).

1.2 MDL Locks

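MySQL 5.5 introduced metadata locks (MDL). DML statements (SELECT, INSERT, UPDATE, DELETE) acquire an MDL read lock on the table, while DDL acquires an MDL write lock. Read and write MDL locks conflict. A DDL must wait for open transactions that hold read locks, and while it waits, new queries on the same table queue behind it. Blocked sessions show the state "Waiting for table metadata lock" in SHOW PROCESSLIST.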
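This queueing behavior is what makes a single DDL so disruptive. The Python sketch below is a deliberately simplified FIFO model of one table's metadata lock, not MySQL's actual lock manager. SHARED stands for the MDL read lock taken by DML, EXCLUSIVE for the MDL write lock taken by DDL, and the session names are hypothetical.

```python
from collections import deque


class MetadataLock:
    """Toy FIFO model of one table's metadata lock (not MySQL's code)."""

    def __init__(self):
        self.granted = {}       # session -> "SHARED" | "EXCLUSIVE"
        self.waiting = deque()  # (session, mode) in arrival order

    def _compatible(self, mode):
        if mode == "EXCLUSIVE":
            return not self.granted  # needs the table to itself
        return all(m == "SHARED" for m in self.granted.values())

    def request(self, session, mode):
        # A new request queues behind any earlier waiter, so a waiting DDL
        # blocks later SELECTs even though they are compatible with the
        # SHARED locks that are already granted.
        if not self.waiting and self._compatible(mode):
            self.granted[session] = mode
            return "granted"
        self.waiting.append((session, mode))
        return "waiting"

    def release(self, session):
        del self.granted[session]
        # Wake waiters in arrival order while they remain compatible.
        while self.waiting and self._compatible(self.waiting[0][1]):
            s, m = self.waiting.popleft()
            self.granted[s] = m


lock = MetadataLock()
print(lock.request("long_trx", "SHARED"))  # granted: an open transaction read the table
print(lock.request("alter", "EXCLUSIVE"))  # waiting: DDL needs the table to itself
print(lock.request("select_1", "SHARED"))  # waiting: queued behind the DDL
lock.release("long_trx")                   # the transaction finally commits
print(lock.granted)                        # {'alter': 'EXCLUSIVE'}
```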

1.3 Flush Waits

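FLUSH TABLES is usually fast, but it must wait for statements that still have the table open. If a long-running query is using the table, the flush waits for it. Later queries on that table then wait behind the flush, showing the state "Waiting for table flush" in SHOW PROCESSLIST.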

1.4 Row Locks

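Suppose a transaction has updated a row, and therefore holds an exclusive row lock on it, but has not committed yet. Other sessions that try to modify that row must wait until the first transaction commits or rolls back. The same applies to sessions that read the row with a locking read (SELECT ... FOR UPDATE or LOCK IN SHARE MODE).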

1.5 Current Read (Repeatable Read)

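InnoDB's default isolation level is REPEATABLE READ. Suppose transaction A starts with a consistent snapshot, and transaction B then updates a row many times and commits. A plain SELECT in A must still return the row as it was when A's snapshot was taken. InnoDB rebuilds that old version by applying undo log records one by one, so with a long undo chain this read can be very slow. A current read, such as SELECT ... LOCK IN SHARE MODE, reads the latest committed version directly and can return much faster.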
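The cost comes from walking the undo chain. The Python sketch below models a row's version chain with a simplified visibility rule: a version is visible if its writer's transaction ID is at most the snapshot's limit. InnoDB's real read views track lists of active transactions, so treat this only as an illustration of why a long chain makes snapshot reads slow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: int
    trx_id: int                # ID of the transaction that wrote this version
    prev: Optional["Version"]  # older version, reachable through the undo log


def snapshot_read(latest, visible_up_to):
    """Consistent (snapshot) read: walk back until a visible version is found.

    Simplified rule: a version is visible if trx_id <= visible_up_to.
    Returns (value, undo_steps).
    """
    steps = 0
    v = latest
    while v is not None and v.trx_id > visible_up_to:
        v = v.prev
        steps += 1
    return (v.value if v is not None else None), steps


def current_read(latest):
    """Current read (e.g. LOCK IN SHARE MODE): take the newest version directly."""
    return latest.value


# Transaction A's snapshot can see changes up to trx_id 10.
row = Version(value=1, trx_id=10, prev=None)
# Session B then updates the row 1,000 times, each in its own committed transaction.
for i in range(1000):
    row = Version(value=row.value + 1, trx_id=11 + i, prev=row)

print(snapshot_read(row, visible_up_to=10))  # (1, 1000)
print(current_read(row))                     # 1001
```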

1.6 Large‑Table Scenarios

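Tables with billions of rows stress I/O and CPU even with good indexing. InnoDB stores each B+ tree node in a 16 KB page (the default page size), and a tree three levels deep can typically hold about 20 million rows. The buffer pool size and its LRU eviction policy determine how many of those pages stay cached, and therefore the cache hit rate.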
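The capacity of a three-level tree follows from back-of-the-envelope arithmetic. The Python below rests on three common assumptions for this estimate, not measured values:

- a BIGINT primary key (8 bytes)
- a 6-byte child-page pointer
- an average row size of 1 KB

It also ignores page headers and fill factor.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size in bytes
KEY_BYTES = 8          # assumed BIGINT primary key
POINTER_BYTES = 6      # assumed size of a child-page pointer
ROW_BYTES = 1024       # assumed average row size (1 KB)

# Non-leaf pages hold (key, pointer) pairs; leaf pages hold full rows.
fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
rows_per_leaf = PAGE_SIZE // ROW_BYTES


def capacity(levels):
    """Rows a clustered B+ tree with `levels` levels can hold (rough estimate)."""
    return fanout ** (levels - 1) * rows_per_leaf


print(fanout, rows_per_leaf)  # 1170 16
print(capacity(3))            # 21902400, i.e. roughly 20 million rows
```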

1.6.1 Sharding (Database & Table)

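When I/O is the bottleneck, split the data across databases, or split wide tables vertically so that frequently accessed columns live in a smaller table. When CPU is the bottleneck, for example because a single table holds too many rows, split tables horizontally, distributing rows across many tables by a sharding key.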

Common tools include ShardingSphere, TDDL, and Mycat. Sharding requires choosing a sharding key, defining routing rules, migrating existing data, and planning for future expansion.
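Routing rules are often simple modulo arithmetic on the sharding key. The Python sketch below assumes a hypothetical layout of 4 databases with 8 order tables each, routed by user_id; the database and table names are made up. The second function shows why planning for expansion matters: doubling the number of slots under plain modulo routing remaps half of the keys.

```python
DB_COUNT = 4       # hypothetical number of database instances
TABLES_PER_DB = 8  # hypothetical number of order tables per database


def route(user_id: int) -> str:
    """Map a user_id to a physical table via modulo over 32 logical slots."""
    slot = user_id % (DB_COUNT * TABLES_PER_DB)
    db_index, table_index = divmod(slot, TABLES_PER_DB)
    return f"order_db_{db_index}.t_order_{table_index}"


def moved_fraction(keys, old_slots: int, new_slots: int) -> float:
    """Share of keys whose slot changes when the slot count changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_slots != k % new_slots)
    return moved / len(keys)


print(route(9))                             # order_db_1.t_order_1
print(route(31))                            # order_db_3.t_order_7
print(moved_fraction(range(6400), 32, 64))  # 0.5
```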

1.6.2 Read/Write Splitting

If QPS is high and reads far exceed writes, use master‑slave replication to offload reads, improving availability and load balancing.

Challenges include replication lag (stale reads) and routing logic (application‑level or middleware).
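When routing lives in the application, it can be as small as the Python sketch below. The class and its rules are hypothetical:

- Writes go to the primary.
- Reads rotate over the replicas.
- A session that has written is pinned to the primary, so it does not read stale data from a lagging replica.

```python
import itertools

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}


class ReadWriteRouter:
    """Hypothetical application-level read/write splitting router."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin over replicas
        self._sessions_with_writes = set()

    def route(self, session_id, sql):
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in WRITE_VERBS:
            self._sessions_with_writes.add(session_id)
            return self.primary
        if session_id in self._sessions_with_writes:
            # Read-your-writes: avoid a replica that may not have caught up yet.
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("s1", "SELECT * FROM t"))        # replica-1
print(router.route("s2", "select * from t"))        # replica-2
print(router.route("s1", "UPDATE t SET x = 1"))     # primary
print(router.route("s1", "SELECT x FROM t"))        # primary (pinned after a write)
```

Pinning for the whole session is the bluntest option. A real router might pin only for a short window, and it should also send locking reads such as SELECT ... FOR UPDATE to the primary.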

1.7 Summary

The above lists common MySQL slow‑query causes and mitigation methods, and introduces sharding and read/write splitting for large‑scale data.

2. How to Evaluate ElasticSearch

2.1 What Can ES Do?

ElasticSearch, built on Lucene, provides near‑real‑time distributed search, full‑text search, NoSQL JSON document storage, log monitoring, and data analytics. It is often paired with Logstash and Kibana (the ELK stack).

2.2 ES Architecture

Before ES 7.0 the hierarchy was Index → Type → Document; after 7.0, Type is removed, so think of Index as a table.

Common DevTools commands:

GET /_cat/health?v&pretty   // cluster health
GET /_cat/shards?v          // shard status
GET yourindex/_mapping      // mapping structure
GET yourindex/_settings     // index settings
GET /_cat/indices?v         // list all indices

The mapping defines the document schema (similar to a MySQL table definition), while the settings control index-level options such as the number of shards and replicas.

2.3 Why ES Queries Are Fast

ES uses an inverted index: each term points to a posting list of the IDs of documents containing it. The Term Dictionary stores all terms in sorted order. The Term Index, an FST (finite state transducer) kept in memory, quickly locates a term's position in the dictionary.

The in‑memory Term Index lets ES locate terms without costly disk random accesses, making tokenized searches very fast.
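The lookup path can be sketched in a few lines of Python. This is a toy model, not Lucene. A lowercase-and-split function stands in for an analyzer, and binary search over an in-memory sorted term list stands in for both the term index (an FST in Lucene) and the term dictionary. The sample documents are made up.

```python
from bisect import bisect_left
from collections import defaultdict

docs = {
    1: "Ada Lovelace wrote the first program",
    2: "Grace Hopper wrote the first compiler",
    3: "Ada is also a programming language",
}


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# Inverted index: term -> posting list (set of document IDs).
postings = defaultdict(set)
for doc_id, text in docs.items():
    for term in analyze(text):
        postings[term].add(doc_id)

# Term dictionary kept in sorted order so a term can be found by binary search.
term_dictionary = sorted(postings)


def search(query_term):
    term = analyze(query_term)[0]
    i = bisect_left(term_dictionary, term)  # O(log n) probes
    if i < len(term_dictionary) and term_dictionary[i] == term:
        return sorted(postings[term])
    return []


def like_scan(substring):
    """What LIKE '%...%' must do without a usable index: inspect every document."""
    needle = substring.lower()
    return [doc_id for doc_id, text in docs.items() if needle in text.lower()]


print(search("Ada"))     # [1, 3]
print(search("first"))   # [1, 2]
print(like_scan("ada"))  # [1, 3], after reading all 3 documents
```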

2.3.1 Tokenized Search

Because terms are indexed, a term such as “Ada” can be looked up directly in the term dictionary. In MySQL, LIKE '%Ada%' starts with a wildcard, so it cannot use a B+ tree index and must scan every row.

2.3.2 Exact Search

For exact matches the advantage narrows; MySQL covering indexes may be comparable.

2.4 When to Use ES

2.4.1 Full‑Text Search

MySQL fuzzy string searches are inefficient; ES handles them easily, e.g., searching chat logs.

Chinese text requires a proper analyzer (e.g., IK) to avoid poor tokenization. You can check how a field's analyzer splits a sample sentence with the _analyze API. The sample text below means roughly "I'm such a clever little rascal":

POST yourindex/_analyze
{
  "field":"yourfield",
  "text":"我可真是个机灵鬼"
}

2.4.2 Combined Queries

For massive datasets, store the searchable fields in ES (together with the record IDs) and keep the full records in MySQL or HBase. This hybrid approach combines ES's fast search with the completeness of the primary store.
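The combined pattern is a two-step query: get matching IDs from the search engine, then fetch complete records by ID from the primary store. In the Python sketch below, two dictionaries are hypothetical stand-ins for an ES index and a MySQL or HBase table. In production, step two would be something like a WHERE id IN (...) query or a batch of HBase Gets.

```python
# Hypothetical stand-in for an ES index: searchable terms -> matching IDs.
search_index = {
    "refund": {101, 103},
    "delayed": {102, 103},
}

# Hypothetical stand-in for MySQL/HBase: complete records keyed by the same ID.
primary_store = {
    101: {"id": 101, "message": "refund requested", "amount": 30},
    102: {"id": 102, "message": "delivery delayed", "amount": 0},
    103: {"id": 103, "message": "refund for delayed order", "amount": 55},
}


def hybrid_query(terms):
    # Step 1: run the text-search part against the search engine (AND all terms).
    ids = set.intersection(*(search_index.get(t, set()) for t in terms))
    # Step 2: fetch the full records by primary key from the system of record.
    return [primary_store[i] for i in sorted(ids)]


print([r["id"] for r in hybrid_query(["refund"])])             # [101, 103]
print([r["id"] for r in hybrid_query(["refund", "delayed"])])  # [103]
```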

2.5 Summary

ES is fast because of its inverted index and in‑memory term index, making it ideal for full‑text and large‑scale search scenarios, while still allowing integration with traditional databases.

3. HBase

3.1 Storage Structure

Unlike row-oriented relational databases, HBase groups columns into column families, and each column family is stored separately on disk.

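Rows are sorted lexicographically by RowKey (the primary key). Each value lives in a cell addressed by four coordinates: RowKey, column family (e.g., info, area), column qualifier, and timestamp version. Column families are declared with the table, while column qualifiers can be added dynamically per row.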
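The data model can be sketched as nested maps. In the Python below, a dict keyed by RowKey bytes holds "family:qualifier" columns, each with timestamped versions. It is an illustration only, not the HBase client API, and the rows and columns are made up. Sorting the byte keys shows the lexicographic ordering that often surprises people: user10 sorts between user1 and user2.

```python
# Toy HBase table: row_key -> {"family:qualifier": {timestamp: value}}
table = {}


def put(row_key: bytes, column: str, value: bytes, ts: int) -> None:
    table.setdefault(row_key, {}).setdefault(column, {})[ts] = value


def get_latest(row_key: bytes, column: str) -> bytes:
    """Return the newest version of one cell."""
    versions = table[row_key][column]
    return versions[max(versions)]


def scan(start: bytes, stop: bytes) -> list:
    """Row keys in [start, stop), in byte order, like an HBase Scan."""
    return [k for k in sorted(table) if start <= k < stop]


put(b"user1", "info:name", b"Alice", ts=1)
put(b"user1", "info:name", b"Alicia", ts=2)  # newer version of the same cell
put(b"user2", "info:name", b"Bob", ts=1)
put(b"user10", "area:city", b"Paris", ts=1)  # a column only this row has

print(sorted(table))                      # [b'user1', b'user10', b'user2']
print(get_latest(b"user1", "info:name"))  # b'Alicia'
print(scan(b"user1", b"user2"))           # [b'user1', b'user10']
```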

3.2 OLTP vs. OLAP

OLTP: traditional relational DB workloads, handling daily transactions.

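OLAP: data-warehouse workloads for complex analytics. HBase is not an OLAP engine, because its storage is column-family oriented rather than truly columnar. It also lacks multi-row transactions, so it does not fit traditional OLTP workloads either.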

3.3 RowKey Design

HBase supports only three query patterns: single‑row lookup by RowKey, range scans by RowKey, and full table scans. Good RowKey design is therefore critical.
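Because every efficient read is a Get or a RowKey range scan, the key usually encodes the main query pattern. The Python sketch below shows two commonly used techniques under hypothetical parameters:

- A hash-based prefix (often called salting) spreads sequential IDs across regions to avoid write hotspots.
- A reversed, zero-padded timestamp puts a user's newest events first in a prefix scan.

The bucket count, field widths, and separator are assumptions for illustration.

```python
import hashlib

SALT_BUCKETS = 16  # hypothetical number of pre-split regions
MAX_TS = 10**13    # hypothetical upper bound for millisecond timestamps


def salt_for(user_id: int) -> int:
    """Stable bucket derived from the user ID, so one user's rows share a prefix."""
    return int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % SALT_BUCKETS


def user_prefix(user_id: int) -> bytes:
    # Zero padding makes byte order match numeric order.
    return f"{salt_for(user_id):02d}|{user_id:010d}|".encode()


def make_row_key(user_id: int, ts_ms: int) -> bytes:
    assert 0 < ts_ms < MAX_TS
    # Reversed timestamp: newer events get smaller values and sort first.
    return user_prefix(user_id) + f"{MAX_TS - ts_ms:013d}".encode()


def scan_user(sorted_keys, user_id):
    """Prefix scan: [prefix, prefix + 0xFF) covers every key of this user."""
    start = user_prefix(user_id)
    stop = start + b"\xff"
    return [k for k in sorted_keys if start <= k < stop]


def ts_from_key(key: bytes) -> int:
    return MAX_TS - int(key.split(b"|")[2])


keys = sorted(
    [make_row_key(42, t) for t in (1000, 3000, 2000)]
    + [make_row_key(7, 5000), make_row_key(43, 1500)]
)
print([ts_from_key(k) for k in scan_user(keys, 42)])  # [3000, 2000, 1000]
```

The trade-off of salting is that a scan across all users must fan out over every salt bucket. A scan for one user stays cheap because the salt is derived from the user ID.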

3.4 Use Cases

HBase excels at write‑intensive scenarios with high throughput. Single‑row or small‑range reads are fine, but complex queries are not supported.

4. Conclusion

Software development should be incremental; technology must serve the project, and suitability outweighs novelty.

To speed up queries, first rule out problems in the queries themselves, such as unused indexes or lock waits. Then apply the appropriate indexing, sharding, or search-engine solutions discussed above.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by Efficient Ops, a public account maintained by Xiaotianguo and friends that regularly publishes widely read original technical articles. It focuses on operations transformation and aims to support readers throughout their operations careers.
