Understanding and Optimizing Fast Queries: MySQL Indexes, ElasticSearch, and HBase
This article explains why MySQL queries become slow, how proper index design, MDL locks, sharding, read‑write separation, and the use of ElasticSearch or HBase can improve query performance in large‑scale systems, and provides practical tips and code examples for each technique.
1. MySQL Slow Query Experience
Most internet applications are read‑heavy, so query speed is critical; many slow queries stem from index misuse or missing indexes.
1.1 Index
MySQL indexes are B+ trees; using left‑most prefix and index push‑down can dramatically speed up queries. A covering index avoids the need to read the table rows.
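The covering-index effect can be observed in any B-tree engine. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; the table users and the index idx_name_age are hypothetical names chosen for illustration. In MySQL, the equivalent signal is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT);
    CREATE INDEX idx_name_age ON users (name, age);
""")

def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Every selected column lives in the index, so table rows are never read.
covered = plan("SELECT name, age FROM users WHERE name = 'alice'")
# 'city' is not in the index, so each match needs an extra lookup in the table.
not_covered = plan("SELECT city FROM users WHERE name = 'alice'")

print(covered)
print(not_covered)
```

The first plan should mention a covering index, while the second uses the same index but still has to visit the table for each matching entry.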
1.1.1 Causes of Index Failure
Using !=, <>, OR, functions, or expressions on indexed columns (these often, though not always, keep the optimizer from using the index).
LIKE patterns that start with %.
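Missing quotes around string literals: comparing a string column with a numeric literal triggers an implicit type conversion on the column.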
Low‑cardinality columns (e.g., gender) that provide little selectivity.
Not matching the left‑most prefix of a composite index.
1.1.2 Why These Cause Failure
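Functions and implicit type or character-set conversions transform the column value, so the result no longer follows the order in which the index is stored. The optimizer then cannot seek into the B+ tree and falls back to scanning. The same reasoning explains the left-most prefix rule: a composite index is sorted by its first column, so a condition on a later column alone finds its matches scattered across the whole index.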
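To make the ordering argument concrete, here is a small sketch that models a composite index on (last_name, first_name) as a sorted list of tuples. The data and function names are hypothetical, and a real B+ tree is far more elaborate; the point is only how many entries each kind of lookup has to touch.

```python
from bisect import bisect_left

# A composite index on (last_name, first_name), modeled as a sorted list of tuples.
index = sorted([
    ("Chen", "Wei"), ("Chen", "Yan"), ("Li", "Na"),
    ("Wang", "Fang"), ("Wang", "Lei"), ("Zhang", "Min"),
])

def seek_last_name(last_name):
    """Left-most prefix lookup: binary search, then read the adjacent matches."""
    pos = bisect_left(index, (last_name,))
    matches = []
    while pos < len(index) and index[pos][0] == last_name:
        matches.append(index[pos])
        pos += 1
    return matches, len(matches)          # (matches, entries visited)

def scan_first_name(first_name):
    """No left-most column: matches are scattered, so every entry is checked."""
    matches = [entry for entry in index if entry[1] == first_name]
    return matches, len(index)

def scan_upper_last_name(value):
    """Function on the column: the index is ordered by last_name, not by
    UPPER(last_name), so this sketch has to test every entry."""
    matches = [entry for entry in index if entry[0].upper() == value]
    return matches, len(index)

print(seek_last_name("Wang"))        # visits only the two matching entries
print(scan_first_name("Lei"))        # visits all six entries
print(scan_upper_last_name("WANG"))  # visits all six entries
```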
1.1.3 Low‑Cardinality Columns
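Indexes on columns with low selectivity often cost more than they save. When a condition matches a large share of the rows (a commonly cited rule of thumb is around 30%), the MySQL optimizer may decide that a full table scan is cheaper than going through the index and ignore it.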
1.1.4 Simple Index Strategies
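Index condition pushdown (ICP, available since MySQL 5.6): with a composite index on a multi-condition query, conditions on indexed columns are checked inside the storage engine before the full row is fetched, which reduces lookups back to the table.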
Covering index: keep all needed columns in the index.
Prefix index: index only the first N characters of a string.
Avoid functions on indexed columns.
Consider maintenance cost for frequently updated columns.
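One way to choose the prefix length N, and to spot low-cardinality columns, is to measure selectivity (distinct values divided by total rows). The sketch below does this on hypothetical sample data; against a real table the same figure could be obtained with a query such as SELECT COUNT(DISTINCT LEFT(email, 4)) / COUNT(*) FROM the table in question.

```python
def selectivity(values):
    """Fraction of distinct values: 1.0 means unique, near 0 means low cardinality."""
    return len(set(values)) / len(values)

def prefix_selectivity(values, length):
    """Selectivity if only the first `length` characters were indexed."""
    return selectivity([value[:length] for value in values])

# Hypothetical sample data.
emails = [
    "alice@example.com", "alina@example.com", "bob@example.com",
    "bobby@example.org", "carol@example.com", "carla@example.org",
]
genders = ["F", "F", "M", "M", "F", "F"]

for length in range(2, 6):
    print(length, prefix_selectivity(emails, length))
print("gender", selectivity(genders))
```

In this sample a 3-character prefix distinguishes only half of the rows, while a 4-character prefix is already as selective as the full column, so indexing more characters would add size without adding selectivity.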
1.1.5 When MySQL Chooses the Wrong Index
If index statistics are stale, run ANALYZE TABLE on the affected table to refresh them. If the optimizer still picks a poor index, use FORCE INDEX or rewrite the query.
1.2 MDL Locks
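MySQL 5.5 introduced metadata locks (MDL). Ordinary reads and writes take a shared MDL on the table, while DDL such as ALTER TABLE needs an exclusive one. A pending exclusive request waits for open transactions on the table to finish and, while it waits, blocks every later query on that table. Use SHOW PROCESSLIST to find sessions in the "Waiting for table metadata lock" state.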
1.3 Flush Waits
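FLUSH TABLES has to wait until statements that are using the table finish, and queries issued after it queue up behind the flush. In SHOW PROCESSLIST these sessions appear in the "Waiting for table flush" state.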
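When the process list is long, it helps to filter it for these two wait states. The sketch below assumes the rows of SHOW PROCESSLIST have already been fetched into dictionaries keyed by the real column names (Id, Command, Time, State, Info); the sample rows themselves are made up for illustration.

```python
WAIT_STATES = ("Waiting for table metadata lock", "Waiting for table flush")

def blocked_sessions(processlist):
    """Return sessions stuck on an MDL or flush wait, longest wait first."""
    waiting = [row for row in processlist if row.get("State") in WAIT_STATES]
    return sorted(waiting, key=lambda row: row["Time"], reverse=True)

# Hypothetical rows, shaped like the output of SHOW PROCESSLIST.
sample = [
    {"Id": 11, "Command": "Sleep", "Time": 420, "State": "", "Info": None},
    {"Id": 12, "Command": "Query", "Time": 95,
     "State": "Waiting for table metadata lock",
     "Info": "ALTER TABLE orders ADD COLUMN note VARCHAR(64)"},
    {"Id": 13, "Command": "Query", "Time": 90,
     "State": "Waiting for table metadata lock",
     "Info": "SELECT * FROM orders WHERE id = 1"},
    {"Id": 14, "Command": "Query", "Time": 3,
     "State": "Waiting for table flush", "Info": "SELECT * FROM logs"},
]

for row in blocked_sessions(sample):
    print(row["Id"], row["Time"], row["State"], row["Info"])
```

In a situation like this sample, the idle session 11 would be worth checking: a connection that sleeps with an open transaction is a typical holder of the lock that everyone else is waiting for.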
1.4 Row Locks
A transaction that holds a row write lock and has not yet committed forces every other transaction that needs the same row to wait.
1.5 Current Read (Repeatable Read)
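Under InnoDB's default isolation level, Repeatable Read, an ordinary SELECT is a consistent (snapshot) read: InnoDB applies undo log records to rebuild the row version that the transaction's snapshot is allowed to see. If other transactions have updated the row many times since the snapshot was taken, walking back through the undo log can make even a single-row query slow, whereas a current read (for example SELECT ... LOCK IN SHARE MODE) takes the latest version directly.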
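The cost difference between a snapshot read and a current read can be sketched with a toy version chain. This is a simplified model, not InnoDB's actual implementation: the visibility rule (a version is visible if its transaction id is not newer than the snapshot) is an assumption made to keep the example short, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    value: int
    trx_id: int                         # transaction that wrote this version
    older: Optional["Version"] = None   # previous version, reached via undo

def update(latest, value, trx_id):
    """Write a new version that points back at the one it replaces."""
    return Version(value, trx_id, older=latest)

def current_read(latest):
    """Current read: take the newest version, no undo records applied."""
    return latest.value, 0

def snapshot_read(latest, snapshot_trx_id):
    """Snapshot read: walk back until a version the snapshot may see.
    Simplified rule (an assumption): visible if trx_id <= snapshot_trx_id."""
    steps, version = 0, latest
    while version is not None and version.trx_id > snapshot_trx_id:
        version = version.older
        steps += 1
    return (version.value if version else None), steps

# One row, then 1000 updates by later transactions.
row = Version(value=1, trx_id=100)
for i in range(1, 1001):
    row = update(row, value=1 + i, trx_id=100 + i)

print(current_read(row))          # newest value, zero undo steps
print(snapshot_read(row, 100))    # original value, one step per later update
```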
1.6 Large Table Scenarios
For tables with billions of rows, even with good indexes, aggregation can hit I/O or CPU bottlenecks; consider sharding (horizontal) or vertical partitioning and read‑write separation.
1.6.1 Sharding
Choose database-level sharding when the bottleneck is I/O and table-level sharding when the bottleneck is CPU. Commonly used tools include ShardingSphere, TDDL, and Mycat.
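Whatever tool is used, the core of sharding is a deterministic mapping from a sharding key to a physical database and table. A minimal sketch of hash-based routing follows; the shard counts and the names order_db_N and orders_N are hypothetical, and middleware such as the tools named above provides this routing (plus much more) through configuration rather than hand-written code.

```python
import zlib

DB_COUNT = 4         # hypothetical number of databases
TABLES_PER_DB = 8    # hypothetical number of tables inside each database

def route(user_id):
    """Map a sharding key to (database, table) using a stable hash."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    slot = zlib.crc32(str(user_id).encode("utf-8")) % (DB_COUNT * TABLES_PER_DB)
    db_index, table_index = divmod(slot, TABLES_PER_DB)
    return f"order_db_{db_index}", f"orders_{table_index}"

for user_id in (1, 2, 42):
    print(user_id, route(user_id))
```

A fixed modulus keeps the routing simple, but changing the shard count later moves most keys, so the number of slots is usually chosen generously up front.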
1.6.2 Read‑Write Separation
When reads far exceed writes, replicate the master and direct reads to slaves to balance load.
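The routing decision behind read-write separation can be sketched in a few lines. The class below is illustrative only, with hypothetical names: it sends plain SELECT statements to replicas in round-robin order and everything else to the master. Reads inside a transaction and locking reads stay on the master, because replication is typically asynchronous and a slave may lag behind.

```python
from itertools import cycle

class ReadWriteRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(replicas) if replicas else None

    def pick(self, sql, in_transaction=False):
        """Return the name of the server that should run this statement."""
        statement = sql.lstrip().upper()
        is_plain_read = (
            statement.startswith("SELECT")
            and "FOR UPDATE" not in statement
            and "LOCK IN SHARE MODE" not in statement
        )
        if is_plain_read and not in_transaction and self._replicas:
            return next(self._replicas)   # round-robin across replicas
        return self.primary

router = ReadWriteRouter("master", ["slave-1", "slave-2"])
print(router.pick("SELECT * FROM orders WHERE id = 1"))
print(router.pick("UPDATE orders SET status = 'paid' WHERE id = 1"))
```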
1.7 Summary
The section lists common MySQL slow‑query causes and mitigation techniques, plus strategies for handling massive data volumes.
2. How to Evaluate ElasticSearch
ElasticSearch (ES) is a near real-time distributed search engine built on Lucene, suitable for full-text search, JSON document storage, log monitoring, and analytics.
2.1 What It Can Do
ES excels at full-text search and log analysis, and it can also serve as a NoSQL document store. It is often deployed together with Logstash and Kibana (the ELK stack).
2.2 ES Architecture
Before ES 7.0 the hierarchy was Index → Type → Document. Mapping types were deprecated in 7.0 and removed in 8.0, leaving Index → Document. Useful REST requests (for example from the Kibana console) include GET /_cat/health?v&pretty, GET /_cat/shards?v, GET yourindex/_mapping, and GET yourindex/_settings.
2.3 Why ES Queries Are Fast
ES uses inverted indexes with a Term Dictionary and an in‑memory Term Index (FST) for rapid term lookup, reducing disk random I/O compared to MySQL.
2.3.1 Tokenized Search
After tokenization, ES can locate terms directly without full table scans.
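The idea of an inverted index fits in a few lines. The sketch below is a deliberately simplified model: a Python dictionary stands in for Lucene's term dictionary and term index, the tokenizer just splits on non-alphanumeric characters, and the documents are made up. It shows why a term lookup goes straight to the matching document ids instead of inspecting every document.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND query: intersect the posting sets of all query terms."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical documents.
docs = {
    1: "MySQL slow query log",
    2: "ElasticSearch full-text query",
    3: "HBase RowKey design",
}
index = build_inverted_index(docs)

print(search(index, "query"))       # documents containing "query"
print(search(index, "slow query"))  # documents containing both terms
```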
2.3.2 Exact Search
For exact matches, the advantage diminishes; MySQL covering indexes may be faster.
2.4 When to Use ES
Full‑text search where MySQL pattern matching is inefficient.
Combined queries: store searchable fields in ES (with IDs) and full records in MySQL.
Hybrid architectures with HBase for massive write‑heavy workloads.
2.4.1 Full‑Text Search
ES handles fuzzy and phrase queries efficiently; Chinese requires appropriate analyzers (e.g., IK).
2.4.2 Combined Queries
Use ES for search‑heavy fields and MySQL for transactional data, or pair ES with HBase for massive scale.
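The combined pattern is a two-step lookup: the search engine returns only the ids of matching documents, and the full records are then fetched by primary key from the system of record. The sketch below models both stores as in-memory dictionaries with hypothetical data; in a real deployment step 1 would be an ES query and step 2 a primary-key lookup such as SELECT ... WHERE id IN (...) or an HBase get by RowKey.

```python
# Hypothetical search-side index: term -> ids of matching products.
search_index = {
    "phone": {101, 103},
    "case": {102, 103},
}

# Hypothetical system of record: id -> full row.
primary_store = {
    101: {"id": 101, "title": "Phone X", "price": 699},
    102: {"id": 102, "title": "Leather case", "price": 19},
    103: {"id": 103, "title": "Phone case", "price": 9},
}

def search_then_fetch(term, limit=10):
    # Step 1: the search engine returns only the matching ids.
    ids = sorted(search_index.get(term, set()))[:limit]
    # Step 2: fetch the full records by primary key; ids that were deleted
    # from the primary store but not yet from the search index are skipped.
    return [primary_store[i] for i in ids if i in primary_store]

for record in search_then_fetch("phone"):
    print(record)
```

Keeping only searchable fields and ids in the search index keeps it small, at the price of having to keep the two stores in sync.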
2.5 Summary
ES is fast due to its inverted index and in‑memory term lookup; it is ideal for log analysis and full‑text search but not a universal replacement for relational queries.
3. HBase Overview
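HBase groups data by column family rather than storing whole rows together. Its suitability for write-intensive workloads comes mainly from its LSM-style write path: writes are appended to a write-ahead log and buffered in an in-memory MemStore, which is later flushed to disk sequentially as HFiles.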
3.1 Storage Model
Unlike MySQL's row‑oriented tables, HBase's column‑family model allows sparse data and dynamic columns.
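A rough mental model of an HBase table is a map from row key to a set of cells named family:qualifier. The sketch below uses that model with hypothetical family names and data; it leaves out cell timestamps and versions, which real HBase also keeps. Column families are declared in the table schema, while qualifiers can be introduced freely by each write, which is what makes rows sparse.

```python
from collections import defaultdict

COLUMN_FAMILIES = {"info", "stats"}   # declared in the table schema

table = defaultdict(dict)             # row key -> {"family:qualifier": value}

def put(row_key, family, qualifier, value):
    """Store one cell; the qualifier does not need to exist beforehand."""
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    table[row_key][f"{family}:{qualifier}"] = value

def get(row_key, family=None):
    """Read a whole row, or only the cells of one column family."""
    cells = table.get(row_key, {})
    if family is None:
        return dict(cells)
    return {k: v for k, v in cells.items() if k.startswith(family + ":")}

put("user#1", "info", "name", "Wei")
put("user#1", "stats", "logins", 42)
put("user#2", "info", "name", "Na")
put("user#2", "info", "nickname", "nana")   # a column only this row has

print(get("user#1"))
print(get("user#2", family="info"))
```

Unlike a relational row, user#1 stores nothing at all for nickname; there is no NULL placeholder taking up space.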
3.2 OLTP vs OLAP
Row‑oriented databases excel at OLTP; column‑oriented stores like HBase are better for OLAP‑style analytics, though HBase itself is not an OLAP engine.
3.3 RowKey Design
HBase supports only three query patterns: single row by RowKey, range scans, and full table scans; good RowKey design is critical.
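Because rows are stored in lexicographic RowKey order, the key layout decides which questions a range scan can answer. The sketch below illustrates one common layout under stated assumptions: a zero-padded user id followed by a reversed timestamp, so that the newest rows of a user sort first. A sorted Python list stands in for HBase's ordered storage, and all names and sizes are hypothetical.

```python
from bisect import bisect_left, insort

MAX_TS = 10**13 - 1   # hypothetical upper bound for 13-digit millisecond timestamps

def make_row_key(user_id, ts_millis):
    """User id first (groups a user's rows), reversed timestamp second (newest first)."""
    return f"{user_id:010d}_{MAX_TS - ts_millis:013d}"

sorted_keys = []      # stands in for HBase's lexicographically sorted row keys

def put(user_id, ts_millis):
    insort(sorted_keys, make_row_key(user_id, ts_millis))

def scan_prefix(prefix, limit=None):
    """Range scan: seek to the first key >= prefix, read while the prefix matches."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        if limit is not None and len(result) >= limit:
            break
        result.append(key)
    return result

def latest_for_user(user_id, limit):
    return scan_prefix(f"{user_id:010d}_", limit)

put(7, 1000)
put(7, 3000)
put(7, 2000)
put(8, 5000)

print(latest_for_user(7, 2))   # the two newest rows of user 7
```

With this layout "latest N events of a user" is a short range scan, whereas a question such as "all events in the last hour across users" would need a full scan or a second table keyed differently. Keys that grow monotonically (a raw timestamp as the leading part, for instance) tend to concentrate writes on one region, which is why a hashed or salted prefix is often placed in front.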
3.4 Use Cases
HBase shines in write‑heavy scenarios with high reliability and no single point of failure, but queries are limited to RowKey‑based access.
4. Overall Conclusion
Software development should prioritize appropriate, maintainable solutions over flashy complexity; fixing underlying bugs before adding new features yields the best performance gains.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.