Master MySQL Slow Queries, ElasticSearch, and HBase: Practical Performance Tips
This article explores why MySQL queries become slow, delves into index pitfalls and optimization techniques, then compares ElasticSearch and HBase architectures, offering practical guidance on when to use each technology and how to combine them for high‑performance data retrieval.
1. Why do MySQL queries become slow?
Most internet applications are read‑heavy and write‑light; fast reads are essential. Various factors can cause a seemingly perfect query to become slow.
1.1 Index
When data volume is modest, many slow queries can be solved with proper indexes, but improper indexes are also a common cause of slowness.
MySQL (InnoDB) indexes are B+ trees, which keep keys in sorted order. That ordering is what the left-most prefix rule and most of the pitfalls below depend on.
Composite indexes follow the left-most prefix rule: an index on (a, b, c) can serve conditions on a, on a and b, or on all three, but not on b or c alone. Used well, composite indexes can dramatically improve query speed through index condition pushdown and covering indexes.
1.1.1 Reasons for index failure
Using !=, <>, OR, or functions on indexed columns in the WHERE clause.
LIKE statements with a leading %.
Comparing a string column with an unquoted literal, which forces an implicit type conversion.
Low‑cardinality columns (e.g., gender).
Not matching the left‑most prefix of a composite index.
1.1.2 Why these cause index failure
MySQL’s B+ tree cannot efficiently use the index when the above patterns break the ordered nature of the index.
Function operations
When a function is applied to an indexed column, e.g., WHERE length(a) = 6, the optimizer cannot seek in the index: entries are ordered by the value of a, not by length(a), so the function would have to be evaluated for every entry.
Implicit conversion
Implicit type or character‑set conversions can also invalidate index usage.
Implicit type conversion is rare when queries are built with a type-safe framework such as jOOQ.
Implicit character‑set conversion may appear in join queries when columns share a type but differ in encoding.
Breaking order
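A leading-% LIKE gives the engine no known prefix to seek to, and an unquoted string literal triggers an implicit conversion on the column. In both cases the sorted order of the index no longer helps, so MySQL may skip the index and scan instead.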
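The effect of ordering can be illustrated with a toy model. The sketch below is not how InnoDB is implemented; it treats an index as a sorted Python list (the names list and the three helper functions are illustrative) and shows that only the prefix search can stop early.

```python
import bisect

# A toy "index": the sorted values of one column, mimicking the ordering
# that the leaf level of a B+ tree maintains.
names = sorted(["adam", "ada", "bob", "brenda", "linda", "zed"])

def prefix_search(index, prefix):
    """LIKE 'prefix%': binary-search the start, then read consecutive entries."""
    start = bisect.bisect_left(index, prefix)
    hits = []
    for value in index[start:]:
        if not value.startswith(prefix):
            break  # sorted order guarantees there is no later match
        hits.append(value)
    return hits

def contains_search(index, fragment):
    """LIKE '%fragment%': ordering does not help, every entry is checked."""
    return [value for value in index if fragment in value]

def length_search(index, n):
    """WHERE length(col) = n: the list is sorted by value, not by length."""
    return [value for value in index if len(value) == n]

print(prefix_search(names, "ad"))    # ['ada', 'adam']
print(contains_search(names, "da"))  # ['ada', 'adam', 'brenda', 'linda']
print(length_search(names, 3))       # ['ada', 'bob', 'zed']
```

Only prefix_search touches a bounded slice of the list; the other two visit every entry, which is the toy equivalent of a full scan.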
1.1.3 Why not index low‑cardinality fields like gender
Low‑cardinality fields provide little selectivity; indexing them often yields no performance gain and can even be slower.
With a non-clustered (secondary) index, every matching entry requires a lookup back into the clustered index to fetch the row. A condition on gender matches roughly half the table, so those lookups can cost more than one full table scan.
1.1.4 Simple and effective indexing methods
Index condition pushdown: with a composite index, the storage engine can filter on additional indexed columns inside the index before fetching rows.
Covering index: store all needed columns in the index to avoid table lookups.
Prefix index: index only the first N characters of a string to reduce index size.
Avoid functions on indexed columns.
Consider maintenance cost for frequently updated columns.
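For prefix indexes, the practical question is how many characters to index. One common approach is to compare the selectivity of each prefix length with that of the full column. The sketch below does this in plain Python; the function names, the sample emails list, and the 0.95 threshold are all illustrative assumptions.

```python
def prefix_selectivity(values, n):
    """Fraction of distinct n-character prefixes among all values (1.0 = unique)."""
    if not values:
        return 0.0
    return len({v[:n] for v in values}) / len(values)

def shortest_useful_prefix(values, target=0.95):
    """Smallest n whose selectivity reaches `target` times the full-column selectivity."""
    if not values:
        return 0
    full = len(set(values)) / len(values)
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if prefix_selectivity(values, n) >= target * full:
            return n
    return longest

# Hypothetical sample data.
emails = [
    "alice@example.com",
    "alina@example.com",
    "bob@example.com",
    "bobby@example.org",
]

print(prefix_selectivity(emails, 3))   # 0.5
print(shortest_useful_prefix(emails))  # 4
```

On a real table the same ratio would be computed on the data itself, and the chosen length is a trade-off between index size and the number of extra rows that must be checked.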
1.1.5 Evaluating a wrong index choice
Sometimes an index looks correct but the optimizer picks a low‑selectivity one, leading to excessive scans.
Inaccurate statistics – run ANALYZE TABLE x to refresh.
Optimizer mis‑prediction – use FORCE INDEX or rewrite the query.
1.2 MDL lock
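MySQL 5.5 introduced metadata locks (MDL). CRUD statements acquire a read MDL and schema changes acquire a write MDL. Read MDLs are compatible with each other, but a write MDL conflicts with both read and write MDLs, so a schema change waiting behind a long transaction also blocks every query queued after it.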
1.3 Flush
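A flush command is normally fast, but it can be blocked by a long-running statement on the same table, and queries that arrive afterwards then wait behind the flush. SHOW PROCESSLIST reports these sessions in the state "Waiting for table flush".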
1.4 Row lock
A transaction that holds a row write lock and has not yet committed blocks other statements that need a lock on the same row.
1.5 Current read
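InnoDB's default isolation level is REPEATABLE READ. A plain SELECT is a consistent (snapshot) read: if other transactions have modified the row many times since the snapshot was taken, InnoDB must walk back through the undo log to rebuild the version the transaction is allowed to see, which can be slow. A current read, such as SELECT ... LOCK IN SHARE MODE, reads the latest committed version directly.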
1.6 Large‑table scenarios
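Tables with billions of rows face I/O and CPU bottlenecks even with good indexing. InnoDB stores each B+ tree node in a 16 KB page (the default size), and the tree is typically only three or four levels deep, so a single lookup touches few pages. The pressure comes from the buffer pool: when the working set no longer fits, hot data is evicted and reads go to disk.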
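A back-of-the-envelope estimate shows why the tree stays so shallow. The sizes below are assumptions chosen for illustration (an 8-byte BIGINT key, a 6-byte child pointer, rows of about 1 KB), and the calculation ignores page headers, fill factor, and variable-length rows.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size in bytes
KEY_BYTES = 8          # assumption: BIGINT primary key
POINTER_BYTES = 6      # assumption: size of a child page pointer
ROW_BYTES = 1024       # assumption: average row size of about 1 KB

def rows_for_height(height):
    """Approximate row capacity of a clustered index with `height` levels."""
    fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)  # entries per internal page
    rows_per_leaf = PAGE_SIZE // ROW_BYTES             # rows per leaf page
    return fanout ** (height - 1) * rows_per_leaf

for h in (2, 3, 4):
    print(h, rows_for_height(h))
# 2 18720
# 3 21902400
# 4 25625808000
```

Under these assumptions three levels hold roughly twenty million rows and a fourth level extends that into the billions, so depth itself is rarely the problem; the volume of pages that must stay cached is.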
1.6.1 Sharding
Solution
Choose the sharding approach based on the bottleneck:
If disk or network I/O is the bottleneck, split by database and apply vertical table sharding.
If CPU or query latency on a single large table is the bottleneck, apply horizontal table sharding.
Horizontal sharding distributes rows across many tables; vertical sharding splits columns into separate tables.
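A minimal sketch of how a row might be routed to a horizontal shard, using either modulo or range placement. The function names, the orders table name, and the shard sizes are illustrative assumptions, not part of any particular middleware.

```python
def modulo_shard(key, shard_count):
    """Spread keys evenly: shard = key mod number of shards."""
    return key % shard_count

def range_shard(key, rows_per_shard):
    """Keep consecutive keys together: shard = key // shard size."""
    return key // rows_per_shard

def table_for(base, shard):
    """Physical table name for a shard, e.g. orders_07."""
    return f"{base}_{shard:02d}"

def moved_by_modulo_growth(keys, old_count, new_count):
    """How many keys change shard when the modulo shard count grows."""
    return sum(
        1 for k in keys
        if modulo_shard(k, old_count) != modulo_shard(k, new_count)
    )

print(table_for("orders", modulo_shard(1234567, 8)))        # orders_07
print(table_for("orders", range_shard(1234567, 1_000_000))) # orders_01
print(moved_by_modulo_growth(range(1000), 4, 8))            # 500
```

Doubling a modulo layout from 4 to 8 shards relocates half of the keys in this example, whereas range placement never moves existing keys because the shard depends only on the key. The cost of range placement is that recent keys concentrate on the newest shard.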
Issues
Unique ID generation, non‑partition‑key queries, and scaling strategies need careful planning.
Various ID strategies: auto‑increment, Snowflake, segment, GUID, etc.
Non‑partition‑key queries can be handled via mapping tables or secondary indexes.
Scaling depends on the sharding algorithm; range‑based sharding eases data migration compared to random modulo.
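Of the ID strategies listed, Snowflake-style IDs are the easiest to sketch. The version below assumes the commonly described layout (millisecond timestamp, 10-bit worker ID, 12-bit sequence); the epoch constant and class name are hypothetical, and the clock is injectable only so the behavior can be tested.

```python
import threading
import time

class SnowflakeGenerator:
    EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                mask = (1 << self.SEQUENCE_BITS) - 1
                self._sequence = (self._sequence + 1) & mask
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp_part = (now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS)
            worker_part = self.worker_id << self.SEQUENCE_BITS
            return timestamp_part | worker_part | self._sequence

gen = SnowflakeGenerator(worker_id=3)
first, second = gen.next_id(), gen.next_id()
print(first < second)  # True
```

IDs from one worker increase over time, which keeps inserts roughly sequential in a clustered index, and the worker bits prevent collisions between nodes as long as each node has a distinct worker ID. A production generator would also need a policy for clock rollback beyond raising an error.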
1.6.2 Read‑write separation
Why read‑write separation
When read traffic far exceeds write traffic, a master‑slave setup can distribute reads, improve availability, and balance load.
Problems
Typical issues include replication lag (stale reads) and routing logic for directing queries to master or slave.
Stale reads caused by master‑slave delay.
Routing decisions can be handled in application code or middleware.
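Both problems can be handled together in the routing layer. The sketch below is one possible application-level approach, not a description of any specific middleware: writes go to the master, and a session that has just written keeps reading from the master for a short window so it does not see stale data. The class name, the one-second window, and the node names are assumptions.

```python
import itertools
import time

class ReadWriteRouter:
    READ_VERBS = {"SELECT", "SHOW", "EXPLAIN"}

    def __init__(self, primary, replicas, sticky_seconds=1.0, clock=time.monotonic):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None
        self.sticky_seconds = sticky_seconds
        self._clock = clock
        self._last_write = {}  # session_id -> time of the last write

    def route(self, session_id, sql):
        """Return the node that should execute `sql` for this session."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb not in self.READ_VERBS:
            # Anything that is not clearly a read goes to the master.
            self._last_write[session_id] = self._clock()
            return self.primary
        wrote_at = self._last_write.get(session_id)
        recently_wrote = (
            wrote_at is not None
            and self._clock() - wrote_at < self.sticky_seconds
        )
        if recently_wrote or self._replicas is None:
            return self.primary  # avoid reading our own write from a lagging slave
        return next(self._replicas)  # round-robin across slaves

router = ReadWriteRouter("master", ["slave-1", "slave-2"])
print(router.route("s1", "SELECT * FROM orders"))          # slave-1
print(router.route("s1", "UPDATE orders SET status = 1"))  # master
print(router.route("s1", "SELECT * FROM orders"))          # master
```

A fixed window is only a heuristic: it assumes replication lag stays below sticky_seconds. This sketch also classifies statements by their first keyword, so a locking read such as SELECT ... FOR UPDATE would need explicit handling.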
1.7 Summary
The above enumerates common MySQL slow‑query causes and remedies, and introduces typical solutions for large‑scale data such as sharding and read‑write separation.
2. How to evaluate ElasticSearch
Beyond MySQL, full‑text search and log analysis often benefit from ElasticSearch (ES).
2.1 What ES can do
ES is a near‑real‑time distributed search engine built on Lucene, suitable for full‑text search, JSON document storage, log monitoring, and data analytics.
2.2 ES structure
Before ES 7.0 the hierarchy was Index → Type → Document (similar to database → table → row). Types were deprecated in 7.0 and removed in 8.0; think of an index as a table.
GET /_cat/health?v&pretty – cluster health.
GET /_cat/shards?v – shard status.
GET yourindex/_mapping – index mapping (schema).
GET yourindex/_settings – index settings (shard count, replicas).
GET /_cat/indices?v – list all indices.
Mapping defines field types; settings control shard and replica numbers.
GET yourIndex/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_phrase": {
      "log": "xxx"
    }
  }
}
The query uses match_phrase to return documents containing the exact phrase.
2.3 Why ES is fast
ES relies on inverted indexes. Terms are stored in a Term Dictionary, and a Term Index (in‑memory FST) quickly locates dictionary entries.
The Term Index reduces disk random access, making term lookup very fast. However, ES excels mainly for tokenized searches; exact match queries may not outperform a well‑indexed MySQL query.
2.3.1 Tokenized search
Because ES indexes tokenized terms, a search for "Ada" is a direct term lookup and needs no full scan, unlike a MySQL LIKE '%da%' pattern, which cannot use a B+ tree index.
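A minimal inverted index makes the difference concrete. This sketch is a simplification: it models only the term-to-documents mapping, not the Term Dictionary or FST-based Term Index, and the tokenizer and sample documents are illustrative.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every term of the query."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)

docs = {
    1: "Ada Lovelace wrote the first program",
    2: "Linda met Ada",
    3: "Amanda reads logs",
}
index = build_inverted_index(docs)

print(sorted(search(index, "Ada")))          # [1, 2]
print(sorted(search(index, "ada program")))  # [1]
```

The lookup cost depends on the number of query terms, not on the number of documents. Note that "Amanda" and "Linda" are not returned for "Ada": a term lookup matches whole tokens, which is also why the choice of tokenizer matters.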
2.3.2 Exact search
For exact matches, the advantage diminishes as the Term Index adds an extra lookup step.
2.4 When to use ES
2.4.1 Full‑text search
Keyword-based fuzzy searches are inefficient in MySQL but trivial in ES; searching chat messages by keyword is a typical example.
Tokenization
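Chinese text needs a dedicated analyzer such as IK. Without one, the default analyzer splits the text into single characters rather than words, so phrase queries may return only exact matches. The _analyze API shows how a field tokenizes a sample text: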
POST yourindex/_analyze
{
  "field": "yourfield",
  "text": "我可真是个机灵鬼"
}
2.4.2 Combined queries
ES + MySQL
Store searchable fields and IDs in ES for fast tokenized search, while keeping full records in MySQL for transactional integrity.
ES + HBase
For massive write‑heavy workloads, use HBase as the primary store and ES as a secondary index layer.
Both patterns separate indexing from data storage, but introduce challenges such as data sync, mapping design, and high availability.
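The read path of both patterns is the same two-step lookup: ask the search layer for matching IDs, then fetch the full records from the primary store by key. The sketch below uses in-memory stand-ins for both systems; every name in it is hypothetical and no real client API is shown.

```python
def search_then_fetch(search_ids, fetch_rows, keyword, limit=10):
    """Query the search layer for IDs, then load full rows by primary key.

    search_ids(keyword) -> list of IDs, ordered by relevance
    fetch_rows(ids)     -> dict of id -> row for the IDs that exist
    """
    ids = search_ids(keyword)[:limit]
    rows = fetch_rows(ids)  # one batched lookup instead of one query per ID
    # Keep relevance order and skip IDs the primary store no longer has,
    # which can happen while the two systems are out of sync.
    return [rows[i] for i in ids if i in rows]

# In-memory stand-ins for ES and for MySQL or HBase.
fake_index = {"refund": [42, 7, 99]}
fake_store = {
    7: {"id": 7, "body": "refund requested"},
    42: {"id": 42, "body": "refund approved"},
}

result = search_then_fetch(
    lambda kw: fake_index.get(kw, []),
    lambda ids: {i: fake_store[i] for i in ids if i in fake_store},
    "refund",
)
print([row["id"] for row in result])  # [42, 7]
```

ID 99 exists in the index but not in the store, which is the data-sync problem in miniature: the reader has to decide whether to drop such hits, as here, or surface them as partial results.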
2.5 Summary
ES achieves speed through inverted indexes and in‑memory term lookup, making it ideal for full‑text and log queries, while still requiring careful integration with relational stores for complete solutions.
3. HBase
HBase is a column‑oriented NoSQL store designed for write‑intensive workloads.
3.1 Storage structure
Unlike row‑oriented MySQL, HBase stores data by column families. Each row is identified by a RowKey (sorted lexicographically) and can have multiple versions (timestamps). Columns belong to families such as info or area, and cells hold the actual values.
3.2 OLTP and OLAP
OLTP: traditional relational databases for routine transactional processing.
OLAP: data‑warehouse systems for complex analytical queries.
Column-oriented stores excel at OLAP, while row-oriented stores suit OLTP. HBase fits neither category cleanly: it is not an OLAP engine, it offers no multi-row transactions, and it is primarily used for write-heavy scenarios.
3.3 RowKey design
Effective HBase schema hinges on a well‑designed RowKey because HBase only supports three query patterns: single‑row lookup, range scan, and full table scan.
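Because rows are sorted by RowKey, the key layout decides which reads are cheap. The sketch below models a table as a sorted key list and uses one possible composite key, a zero-padded user ID followed by a reversed timestamp, so that a range scan returns a user's newest rows first. The key format, the SortedStore class, and the helper names are illustrative assumptions, not the HBase client API.

```python
import bisect

MAX_TS = 10**13 - 1  # assumption: millisecond timestamps fit in 13 digits

def make_row_key(user_id, ts_ms):
    """user ID + reversed timestamp, so newer rows sort first within a user."""
    return f"{user_id:010d}#{MAX_TS - ts_ms:013d}"

def user_range(user_id):
    """Start (inclusive) and stop (exclusive) keys covering one user's rows."""
    prefix = f"{user_id:010d}"
    return prefix + "#", prefix + "$"  # '$' is the character right after '#'

class SortedStore:
    """Toy stand-in for a table whose rows are sorted by key."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        """Single-row lookup."""
        return self._rows.get(key)

    def scan(self, start, stop):
        """Range scan over start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

store = SortedStore()
store.put(make_row_key(42, 1000), "old event")
store.put(make_row_key(42, 2000), "new event")
store.put(make_row_key(43, 1500), "other user")

print([value for _, value in store.scan(*user_range(42))])
# ['new event', 'old event']
```

Any question the key cannot answer, such as "all events at timestamp 1500 across users", falls back to a full scan in this model, which is why the access patterns should be known before the key is designed.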
3.4 Use cases
HBase shines in write‑intensive applications where fast ingestion is critical. Point queries or small scans are acceptable, but complex ad‑hoc queries are not supported.
4. Summary
Software development should be incremental; technology must serve the project, and simplicity often beats novelty.
To achieve fast queries, first eliminate bugs, then explore optimizations. The solutions discussed—MySQL indexing, sharding, read‑write separation, ElasticSearch integration, and HBase usage—each involve detailed trade‑offs that engineers must address in practice.
