Why Your MySQL Queries Are Slow and How to Fix Them with Indexes, ES & HBase
This article explains why MySQL queries become slow—covering index misuse, MDL locks, flush waits, large‑table bottlenecks, and read/write splitting—then shows how ElasticSearch’s inverted index and HBase’s column‑family design can complement MySQL for faster search and scalable storage.
1. What Is the Experience of a Slow MySQL Query?
Most internet applications are read‑heavy and write‑light, so fast reads are essential. Various factors can cause a query to become painfully slow.
1.1 Indexes
When data volume is modest, most slow queries can be solved with proper indexes, but many slow queries arise from poorly designed indexes.
MySQL indexes are based on B+ trees; interviewers love to ask about left‑most prefix indexes, B+ trees, and related concepts.
Using a composite index (a,b) allows index‑pushdown: after matching
a, MySQL can evaluate
bwithin the index, reducing table lookups.
If a query’s columns are fully covered by a composite index, the index becomes a covering index and no table lookup is needed.
1.1.1 Why Do Indexes Fail?
Even with an index, queries can be slow when the index is not used (index‑skip). Common reasons for index‑skip include:
WHERE uses
!=,
<>, OR, or functions on the indexed column.
LIKE patterns that start with
%.
String literals without quotes.
Low‑cardinality columns (e.g., gender).
Not matching the left‑most prefix of a composite index.
1.1.2 Why Do These Practices Break Indexes?
Functions on indexed columns (e.g.,
LENGTH(a) = 6) force MySQL to evaluate the expression before using the index, effectively losing the ordered nature of the B+ tree.
Implicit type or character‑set conversions can also break index ordering, especially in join queries where column types match but encodings differ.
1.1.3 Why Not Index Low‑Cardinality Fields?
Indexing a low‑cardinality column like gender often leads to more table lookups than a full scan, so InnoDB may ignore such indexes when the column occupies roughly 30% of rows.
1.1.4 Simple and Effective Indexing Techniques
Index push‑down: use composite indexes for multi‑condition queries.
Covering indexes: keep all needed columns in the index to avoid table lookups.
Prefix indexes: index only the first N characters of a string.
Avoid functions on indexed columns.
1.1.5 When MySQL Chooses the Wrong Index
Sometimes MySQL picks a low‑selectivity index, causing excessive scans. This can be due to inaccurate statistics (run
ANALYZE TABLE) or optimizer mis‑prediction (use
FORCE INDEXor rewrite the query).
1.2 MDL Locks
Since MySQL 5.5, metadata locks (MDL) are taken for CRUD operations (read lock) and schema changes (write lock). A statement waiting for an MDL write lock blocks MDL read locks; you can see such waits with
SHOW PROCESSLIST.
1.3 Flush Waits
Flush commands can be blocked by other statements, causing subsequent SELECTs to wait.
SHOW PROCESSLISTshows Waiting for table flush status.
1.4 Row Locks
A transaction holding an uncommitted write lock can block other statements.
1.5 Current Read (Repeatable Read)
InnoDB’s default isolation level is REPEATABLE READ. If transaction B commits before transaction A reads, A must walk the undo log to see the value before B’s commit.
1.6 Large‑Table Scenarios
Tables with billions of rows can hit I/O or CPU bottlenecks even with good indexing. InnoDB stores each B+‑tree node in 16 KB pages, typically three levels deep for ~2 million rows.
When the buffer pool cannot hold all index data, cache hit rates drop, and LRU eviction may push hot data out.
1.6.1 Sharding (Database/Table Partitioning)
Choose vertical sharding (different databases for different business domains) for I/O bottlenecks, or horizontal sharding (splitting rows across many tables) for CPU‑bound query latency. Tools include Sharding‑Sphere, TDDL, Mycat.
1.6.2 Read‑Write Splitting
When read traffic far exceeds write traffic, a master‑slave setup can distribute reads, improve availability, and balance load. Challenges include replication lag (stale reads) and routing logic.
2. How to Evaluate ElasticSearch
ElasticSearch (ES) is a real‑time distributed search engine built on Lucene, used for full‑text search, JSON document storage, log monitoring, and analytics.
2.1 What Can ES Do?
Common use cases are full‑text search and log analysis, often paired with Logstash and Kibana (the ELK stack).
2.2 ES Architecture
Before ES 7.0 the hierarchy was
index → type → document; after 7.0,
typewas removed, making
indexanalogous to a table.
2.3 Why Is ES Fast?
ES uses an inverted index: terms are indexed, and document IDs are stored in posting lists. A memory‑resident term index (FST) quickly locates term dictionary offsets, reducing disk random I/O.
2.3.1 Tokenized Search
Because terms are tokenized, queries like
%da%that would require a full table scan in MySQL become fast lookups in ES.
2.3.2 Exact Search
For exact matches, ES may be comparable to MySQL’s covering index, but the advantage of the term index diminishes.
2.4 When to Use ES
Full‑text search where MySQL’s LIKE is inefficient.
Large‑scale log or document search with need for fast tokenized queries.
2.5 Sample ES Query
<code>GET yourIndex/_search
{
"from": 0,
"size": 10,
"query": {
"match_phrase": {
"log": "xxx"
}
}
}</code>2.6 Integration Tips
In Java projects, Spring Boot provides out‑of‑the‑box support for ES CRUD operations once the ES cluster is reachable.
3. HBase Overview
HBase stores data by column families rather than rows, making it suitable for write‑intensive workloads.
3.1 Storage Model
Rows are sorted by row‑key (lexicographically). Each column family can contain dynamic columns; a cell holds the actual value with a timestamp version.
3.2 OLTP vs. OLAP
HBase is not designed for OLAP; it lacks transactions and is optimized for fast writes and point reads based on row‑key.
3.3 RowKey Design
Only three query patterns are supported: single‑row lookup by row‑key, range scans by row‑key, and full table scans. Good row‑key design is crucial.
3.4 Use Cases
HBase excels in write‑heavy scenarios where low‑latency ingestion is required, while reads are typically limited to small ranges or single rows.
4. Conclusion
Software development should be incremental; technology must serve the project, and suitability outweighs novelty. To speed up queries, first locate and fix bugs, then apply the appropriate optimizations discussed above.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.