Why Your MySQL Queries Are Slow and How to Fix Them with Indexes, ES, and HBase
This article analyzes common causes of slow MySQL queries—especially index misuse—offers practical indexing techniques, explains MDL locks and large‑table bottlenecks, and then compares ElasticSearch and HBase as complementary solutions for high‑performance search and storage.
MySQL Slow Query Causes and Index Optimization
Read-heavy internet applications depend on fast query execution. In practice, many slow queries trace back to a missing index or to a query written so that an existing index cannot be used.
Common Index Failure Reasons
Using !=, <>, or OR in the WHERE clause, or wrapping an indexed column in a function or expression.
LIKE patterns that start with a wildcard (%).
Omitting quotes around string literals, so that a string column is compared with a number and implicitly converted.
Low-selectivity columns (e.g., gender), where each value matches too large a share of the rows.
Not matching the leftmost prefix of a composite index.
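The leftmost-prefix rule can be illustrated with a toy model. The sketch below is not how InnoDB is implemented; it assumes a hypothetical composite index on (city, age), modeled as a sorted list of tuples, and shows that a search on the leading column can seek while a search on the second column alone cannot.

```python
from bisect import bisect_left, bisect_right

# Hypothetical composite index on (city, age): entries sorted by city, then age.
index = sorted([
    ("berlin", 25), ("berlin", 40), ("paris", 19),
    ("paris", 33), ("rome", 25), ("rome", 51),
])

def seek_by_city(city):
    """Leftmost column given: binary search narrows to one contiguous slice."""
    lo = bisect_left(index, (city,))
    hi = bisect_right(index, (city, float("inf")))
    return index[lo:hi]

def scan_by_age(age):
    """Leftmost column missing: matches are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] == age]

print(seek_by_city("paris"))  # [('paris', 19), ('paris', 33)]
print(scan_by_age(25))        # [('berlin', 25), ('rome', 25)]
```

The two rows with age 25 sit at opposite ends of the sorted list, which is why a predicate on age alone gets no help from the ordering.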
Why These Patterns Break Index Usage
An index stores the raw column values in sorted order. Once a function or expression is applied to the column (e.g., WHERE LENGTH(col) = 6), the results no longer follow that order, so MySQL cannot traverse the B+-tree and has to evaluate the expression row by row. Implicit type or charset conversion has the same effect, because it acts as a hidden function on the column. A leading-wildcard pattern (LIKE '%abc%') gives the index no fixed prefix to seek on, so a range scan is impossible and MySQL falls back to scanning the whole index or table.
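As a rough illustration of why sorted order matters, the sketch below models a single-column index as a sorted Python list (the names and values are made up). A prefix search maps to one contiguous slice, while a substring search or a search on the length of the value has to inspect every entry.

```python
from bisect import bisect_left

# Hypothetical single-column index: values kept in sorted order.
names = sorted(["adam", "alice", "bob", "carol", "dave", "eve"])

def prefix_range(prefix):
    """Like LIKE 'prefix%': matches form one contiguous slice found by binary search."""
    lo = bisect_left(names, prefix)
    hi = bisect_left(names, prefix + "\uffff")
    return names[lo:hi]

def contains(fragment):
    """Like LIKE '%fragment%': matches can sit anywhere, so every value is inspected."""
    return [n for n in names if fragment in n]

def by_length(n_chars):
    """Like WHERE LENGTH(col) = n: the list is sorted by value, not by length."""
    return [n for n in names if len(n) == n_chars]

print(prefix_range("a"))  # ['adam', 'alice']
print(contains("o"))      # ['bob', 'carol']
print(by_length(3))       # ['bob', 'eve']
```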
Low‑Selectivity Indexes
Indexes on columns with very few distinct values rarely help: each value still matches a large share of the table, and every match found through a secondary index costs an extra lookup into the clustered index, so a full table scan can be cheaper.
Practical Index Best Practices
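Index condition pushdown (ICP): with a composite index, MySQL 5.6 and later can evaluate predicates on the indexed columns inside the storage engine before reading the full row, which reduces table lookups.
Covering index: include all columns required by the query in the index to avoid a table lookup.
Prefix index: for long VARCHAR columns, index only the first N characters, choosing N large enough that the prefix stays selective.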
Avoid functions on indexed columns.
Consider maintenance cost for write‑heavy tables; each index adds overhead on INSERT/UPDATE/DELETE.
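For prefix indexes, one way to pick N is to compare the selectivity of each prefix length with the selectivity of the full column. The sketch below does this in memory on made-up data; the function names and the 0.95 threshold are assumptions for illustration, and on a real table the same ratio would be computed with a query over the column.

```python
def prefix_selectivity(values, length):
    """Fraction of distinct prefixes of the given length among all values."""
    return len({v[:length] for v in values}) / len(values)

def shortest_good_prefix(values, target=0.95):
    """Smallest prefix length reaching `target` times the full-column selectivity."""
    full = len(set(values)) / len(values)
    max_len = max(len(v) for v in values)
    for length in range(1, max_len + 1):
        if prefix_selectivity(values, length) >= target * full:
            return length
    return max_len

# Made-up sample data.
emails = [
    "alice@example.com", "alina@example.com",
    "bob@example.org", "bobby@example.org",
]
print(prefix_selectivity(emails, 3))  # 0.5
print(shortest_good_prefix(emails))   # 4
```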
Diagnosing Wrong Index Choice
Run EXPLAIN to see which index MySQL selects. If the chosen index is sub‑optimal, you can:
Refresh statistics with ANALYZE TABLE tbl_name.
Force a specific index using FORCE INDEX (idx_name).
Metadata Locks (MDL)
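Since MySQL 5.5, every statement that touches a table acquires a metadata lock on it: queries and DML take a shared lock, while DDL statements need an exclusive lock. An exclusive lock cannot be granted while shared locks are held, and a waiting exclusive request blocks new shared requests in turn. Use SHOW PROCESSLIST and look for the state “Waiting for table metadata lock” to identify the sessions that are stuck.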
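The queueing effect can be sketched with a toy model. This is a deliberate simplification, not MySQL's actual MDL implementation: one lock per table, two modes, and requests granted strictly in arrival order. The session labels are hypothetical.

```python
class MetadataLock:
    """Toy model of one table's metadata lock (not MySQL's real implementation)."""

    def __init__(self):
        self.holders = []  # (session, mode) pairs currently granted
        self.waiters = []  # (session, mode) pairs queued in arrival order

    def request(self, session, mode):
        """Grant only if nobody is queued and the mode is compatible with all holders."""
        all_shared = all(m == "shared" for _, m in self.holders)
        compatible = not self.holders or (mode == "shared" and all_shared)
        if not self.waiters and compatible:
            self.holders.append((session, mode))
            return "granted"
        self.waiters.append((session, mode))
        return "waiting"

lock = MetadataLock()
results = [
    lock.request("s1: SELECT inside an open transaction", "shared"),
    lock.request("s2: ALTER TABLE", "exclusive"),
    lock.request("s3: SELECT", "shared"),
]
print(results)  # ['granted', 'waiting', 'waiting']
```

In this model the third session waits even though it only wants a shared lock, because the DDL request is queued ahead of it. That is why one long-running transaction plus one DDL statement can stall an otherwise healthy table.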
Flush Wait
Flush commands (e.g., FLUSH TABLES) can be blocked by other statements. The waiting state appears as “Waiting for table flush” in SHOW PROCESSLIST.
Row Locks
Uncommitted write transactions hold row locks, causing other sessions to wait until the transaction commits or rolls back.
Repeatable‑Read Isolation (InnoDB Default)
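Each transaction reads from a consistent snapshot. When concurrent transactions commit newer versions of a row, the reading transaction follows the undo log back to the version that was visible when its snapshot was taken. A long-running transaction may therefore have to walk a long undo chain, which slows its reads.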
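The undo-chain walk can be sketched as a linked list of row versions. The model below is simplified and assumes that visibility is decided by a single transaction-id threshold; InnoDB's real read view also tracks the set of transactions active at snapshot time. All names and values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    trx_id: int                           # transaction that wrote this version
    older: Optional["RowVersion"] = None  # previous version, reached via the undo log

def read_visible(latest, max_visible_trx_id):
    """Walk back from the newest version until one is visible to the snapshot."""
    version, hops = latest, 0
    while version is not None and version.trx_id > max_visible_trx_id:
        version, hops = version.older, hops + 1
    return (version.value if version is not None else None), hops

v1 = RowVersion("balance=100", trx_id=10)
v2 = RowVersion("balance=80", trx_id=20, older=v1)
v3 = RowVersion("balance=50", trx_id=30, older=v2)

print(read_visible(v3, 15))  # ('balance=100', 2)
print(read_visible(v3, 30))  # ('balance=50', 0)
```

The second element of the result counts how many older versions had to be skipped; the older the snapshot, the longer the walk.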
Large‑Table Considerations
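In tables with billions of rows, even well-indexed queries may hit I/O or CPU limits. InnoDB stores B+-tree nodes in pages of 16 KB by default; because each internal page can hold hundreds of keys or more, the tree is usually only three or four levels deep. Depth is rarely the problem: under heavy load the buffer pool may evict hot pages, which lowers the cache hit rate and turns page reads into disk I/O.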
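A back-of-the-envelope calculation shows how tree height relates to row count. The sizes below are assumptions for illustration only (an 8-byte key, a 6-byte child pointer, 1 KB rows, no page headers or fill factor), so the results are upper bounds rather than measurements.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size in bytes

def max_rows(height, key_bytes=8, pointer_bytes=6, row_bytes=1024):
    """Rough upper bound on rows in a clustered B+-tree of the given height."""
    fanout = PAGE_SIZE // (key_bytes + pointer_bytes)  # entries per internal page
    rows_per_leaf = PAGE_SIZE // row_bytes             # rows per leaf page
    return fanout ** (height - 1) * rows_per_leaf

print(max_rows(3))  # 21902400
print(max_rows(4))  # 25625808000
```

Under these assumptions a three-level tree tops out at roughly 20 million rows, and a fourth level is needed before the table reaches the billions.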
Two common mitigation strategies:
Sharding (horizontal or vertical): split data across multiple databases or tables based on a shard key. Tools such as Sharding‑Sphere, TDDL, and Mycat assist with rule definition, data migration, and scaling.
Read/Write Splitting: use a master‑slave (primary‑replica) topology to offload read traffic to replicas, improving scalability and availability.
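Hash-based routing on a shard key might look like the sketch below. It does not reflect the API of any tool named above; the `orders_` table prefix, the shard count, and the choice of CRC32 as a stable hash are all assumptions.

```python
import zlib

def shard_for(shard_key, shard_count):
    """Map a shard key to a shard number using a stable hash (CRC32, an assumption)."""
    return zlib.crc32(str(shard_key).encode("utf-8")) % shard_count

def table_for(user_id, shard_count=4):
    """Hypothetical naming scheme: orders_0 .. orders_{shard_count - 1}."""
    return f"orders_{shard_for(user_id, shard_count)}"

for user_id in (1001, 1002, 1003):
    print(user_id, "->", table_for(user_id))
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would route the same key to different shards after a restart. Note that a plain modulo scheme moves most keys when the shard count changes, which is part of what the migration tooling has to handle.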
ElasticSearch Overview
ElasticSearch (ES) is a Lucene‑based near‑real‑time distributed search engine. It excels at full‑text search, log aggregation (ELK stack), and JSON document storage.
Structure Changes
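Before ES 6.0 the hierarchy was index → type → document, and one index could hold several types. ES 6.x limited each index to a single type, 7.0 deprecated types in the APIs, and 8.0 removed them, so an index is now the closest analogue of a table.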
Why ES Queries Are Fast
ES builds an inverted index: each term maps to a posting list of document IDs. A term dictionary (stored on disk) is complemented by an in‑memory Finite State Transducer (FST) term index, allowing rapid location of the dictionary entry without costly random I/O.
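A minimal inverted index can be sketched in a few lines. This is a toy model, not Lucene: the term dictionary is a sorted list searched with binary search, which only stands in for the role the FST term index plays, and the analyzer is a plain lowercase-and-split. The documents are made up.

```python
from bisect import bisect_left
from collections import defaultdict

def build_inverted_index(docs):
    """Return (sorted term dictionary, term -> sorted posting list of doc IDs)."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            postings[term].append(doc_id)
    return sorted(postings), dict(postings)

def lookup(term, term_dictionary, postings):
    """Locate the term by binary search, then return its posting list."""
    i = bisect_left(term_dictionary, term)
    if i < len(term_dictionary) and term_dictionary[i] == term:
        return postings[term]
    return []

docs = {
    1: "disk full on node a",
    2: "node b restarted",
    3: "disk replaced on node b",
}
terms, postings = build_inverted_index(docs)
print(lookup("disk", terms, postings))  # [1, 3]
print(lookup("node", terms, postings))  # [1, 2, 3]
print(lookup("cpu", terms, postings))   # []
```

The lookup cost depends on the number of distinct terms, not on the number or length of the documents, which is the core reason term queries stay fast as the corpus grows.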
Example Search Request
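GET yourindex/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_phrase": {
      "log": "xxx"
    }
  }
}
This request performs a phrase match: it returns documents whose log field contains the query terms in the same order. Index names in ES must be lowercase, so yourindex stands in for the real index name.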
Cluster Inspection Commands
GET /_cat/health?v&pretty – cluster health.
GET /_cat/shards?v – shard allocation.
GET yourindex/_mapping – mapping (schema) definition.
GET yourindex/_settings – index settings (shard count, replicas, etc.).
GET /_cat/indices?v – list all indices in the cluster.
Mapping Example (partial)
"appname": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
Text fields are analyzed (tokenized) for full‑text search, while the keyword sub‑field stores the exact value for term‑level queries.
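The difference between the analyzed text field and its keyword sub-field can be sketched as follows. The `analyze` function is a very rough stand-in for a real analyzer (lowercase and split on whitespace), and the field value is made up.

```python
def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def index_field(value, ignore_above=256):
    """Model what gets indexed for a text field with a keyword sub-field."""
    return {
        "text_terms": analyze(value),
        # Values longer than ignore_above are not indexed in the keyword sub-field.
        "keyword": value if len(value) <= ignore_above else None,
    }

indexed = index_field("Order Service")
print(indexed["text_terms"])  # ['order', 'service']
print(indexed["keyword"])     # Order Service

# A full-text query for "service" matches a term of the text field;
# an exact term-level query must supply the whole original value.
print("service" in indexed["text_terms"])    # True
print(indexed["keyword"] == "service")       # False
print(indexed["keyword"] == "Order Service") # True
```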
When to Use ES
Full‑text search: fuzzy, phrase, or proximity queries on large text fields (e.g., chat message search).
Combined queries: store searchable fields and document IDs in ES, keep the full record in MySQL; query ES first, then fetch details from MySQL.
Hybrid architectures: use ES for search and a write‑optimized store such as HBase for massive ingestion, linking records via a common key.
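The two-step pattern (search first, then fetch by ID) can be sketched with plain dictionaries standing in for both systems. Nothing here calls a real ES or MySQL client; `search_index`, `primary_store`, and the sample records are hypothetical.

```python
def search_then_fetch(term, search_index, primary_store):
    """Step 1: get matching IDs from the search side. Step 2: fetch full records by ID."""
    ids = search_index.get(term, [])
    # An ID may be missing if the record was deleted after it was indexed.
    return [primary_store[i] for i in ids if i in primary_store]

# Stand-in for ES: term -> IDs, already ordered by relevance.
search_index = {"refund": [3, 1], "invoice": [2, 9]}
# Stand-in for MySQL: primary key -> full row.
primary_store = {
    1: {"id": 1, "subject": "Refund request", "status": "open"},
    2: {"id": 2, "subject": "Invoice copy", "status": "closed"},
    3: {"id": 3, "subject": "Refund delayed", "status": "open"},
}

print([r["id"] for r in search_then_fetch("refund", search_index, primary_store)])   # [3, 1]
print([r["id"] for r in search_then_fetch("invoice", search_index, primary_store)])  # [2]
```

The second call shows the consistency gap that this design has to tolerate: the search side still lists ID 9, but the primary store no longer has it.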
HBase Basics
Storage Model
HBase is a column-family NoSQL store. Rows are identified by a row key and stored in lexicographic order of that key. Each column family (e.g., info, area) is declared in the table schema and groups related columns; the columns inside a family can be added dynamically at write time.
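The storage model can be pictured as a sorted map from row key to column families to columns. The class below is an in-memory sketch of that shape only, not the HBase client API; the table contents are made up.

```python
from bisect import bisect_left, insort

class TinyTable:
    """In-memory sketch: row key -> column family -> column -> value."""

    def __init__(self):
        self.row_keys = []  # kept in lexicographic order
        self.rows = {}

    def put(self, row_key, family, column, value):
        if row_key not in self.rows:
            insort(self.row_keys, row_key)
            self.rows[row_key] = {}
        self.rows[row_key].setdefault(family, {})[column] = value

    def get(self, row_key):
        """Single-row lookup by row key."""
        return self.rows.get(row_key)

    def scan(self, start, stop):
        """Range scan: start key inclusive, stop key exclusive."""
        lo = bisect_left(self.row_keys, start)
        hi = bisect_left(self.row_keys, stop)
        return self.row_keys[lo:hi]

table = TinyTable()
table.put("user#002", "info", "name", "Bo")
table.put("user#001", "info", "name", "Ann")
table.put("user#001", "area", "city", "Oslo")
table.put("video#001", "info", "title", "Intro")

print(table.get("user#001"))          # {'info': {'name': 'Ann'}, 'area': {'city': 'Oslo'}}
print(table.scan("user#", "user$"))   # ['user#001', 'user#002']
```

Because the order is lexicographic, numeric parts of a key need zero padding: as strings, "row10" sorts before "row2".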
OLTP vs OLAP
Row-oriented databases excel at OLTP (transactional) workloads, while column-oriented stores are suited for OLAP (analytical) queries. HBase is optimized for write-heavy workloads with simple key-based access, but it guarantees atomicity only within a single row and is not a full-featured OLAP engine.
RowKey Design
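HBase natively supports only three query patterns: single-row lookup by row key, range scans on row keys, and full table scans. A well-designed row key is therefore critical for both performance and data distribution. Common techniques include leading with a hash or salt prefix to spread writes across regions and placing the most frequently filtered field early in the key. A key that begins with a monotonically increasing value such as a raw timestamp tends to concentrate writes on a single region.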
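One possible row key layout is sketched below: a salt bucket derived from the user ID, then the user ID, then a reversed timestamp so that the newest rows of a user sort first. The layout, the bucket count, and the use of MD5 for the salt are assumptions chosen for illustration, not a recommendation from the HBase documentation.

```python
import hashlib

MAX_TS_MS = 10**13 - 1  # largest 13-digit millisecond timestamp

def salted_row_key(user_id, timestamp_ms, buckets=16):
    """Build 'salt|user_id|reversed_timestamp' with fixed-width numeric parts."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    salt = int(digest, 16) % buckets       # stable per user, spreads users over buckets
    reversed_ts = MAX_TS_MS - timestamp_ms  # newer events get smaller values
    return f"{salt:02d}|{user_id}|{reversed_ts:013d}"

older = salted_row_key("user42", 1_700_000_000_000)
newer = salted_row_key("user42", 1_700_000_060_000)

print(newer < older)             # True: the newer row sorts first
print(older[:2] == newer[:2])    # True: same user, same salt bucket
```

Because the salt depends only on the user ID, all rows of one user stay contiguous and can still be read with a single range scan, while different users are spread across buckets.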
Typical Use Cases
HBase shines in write‑intensive applications that require fast ingestion and low‑latency reads for single rows or small ranges. Complex ad‑hoc analytics are better served by dedicated OLAP systems.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn