
Understanding and Optimizing Fast Queries: MySQL Indexes, ElasticSearch, and HBase

This article explains why MySQL queries become slow, how proper index design, MDL locks, sharding, read‑write separation, and the use of ElasticSearch or HBase can improve query performance in large‑scale systems, and provides practical tips and code examples for each technique.

Top Architect

1. MySQL Slow Query Experience

Most internet applications are read‑heavy, so query speed is critical; many slow queries stem from index misuse or missing indexes.

1.1 Index

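InnoDB indexes are B+ trees. Queries that match the left‑most prefix of a composite index, and that can take advantage of index condition pushdown, are often dramatically faster. A covering index contains every column the query needs, so MySQL does not have to go back to the table rows.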

1.1.1 Causes of Index Failure

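Using != or <>, OR across different columns, or a function or expression wrapped around an indexed column. These often (though not always) prevent an efficient index lookup; for example, the optimizer can sometimes answer an OR with an index merge.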

LIKE patterns that start with %.

Comparing a string column with an unquoted (numeric) literal, which triggers an implicit type conversion on the column.

Low‑cardinality columns (e.g., gender) that provide little selectivity.

Not matching the left‑most prefix of a composite index.
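
The left‑most prefix rule in the last item can be sketched as a small model. This is a simplification, not MySQL's actual optimizer logic: it assumes a hypothetical composite index on (city, age, name) and only distinguishes equality predicates, range predicates, and missing predicates.

```python
def usable_index_prefix(index_columns, equality_columns, range_columns=()):
    """Return the leading index columns a B+ tree lookup can use.

    Simplified model: walk the index columns left to right. An equality
    predicate lets matching continue, a range predicate is used but ends
    the match, and a column with no predicate ends it immediately.
    """
    usable = []
    for column in index_columns:
        if column in equality_columns:
            usable.append(column)
        elif column in range_columns:
            usable.append(column)
            break
        else:
            break
    return usable


# Hypothetical composite index on (city, age, name).
index = ["city", "age", "name"]
print(usable_index_prefix(index, {"city", "age"}))     # ['city', 'age']
print(usable_index_prefix(index, {"age", "name"}))     # []
print(usable_index_prefix(index, {"city", "name"}))    # ['city']
print(usable_index_prefix(index, {"city"}, {"age"}))   # ['city', 'age']
```

In the third case only city narrows the lookup; a condition on name can still be checked against index entries (index condition pushdown), but it does not shrink the scanned range.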

1.1.2 Why These Cause Failure

Functions and implicit type/character‑set conversions break the ordered nature of the index, causing the optimizer to skip it.
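
A small illustration of why a conversion defeats the index, using plain Python lists as a stand‑in for index order. The column values are made up; the point is only that entries sorted as strings are no longer sorted once each value is converted to a number, so a binary search over the converted values is not possible.

```python
import bisect


def is_sorted(values):
    """True if the sequence is in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Values of a hypothetical VARCHAR column, kept in index (string) order.
index_order = sorted(["10", "9", "100", "2", "25"])
print(index_order)             # ['10', '100', '2', '25', '9']

# A string lookup can binary-search the ordered entries directly.
position = bisect.bisect_left(index_order, "25")
print(position)                # 3

# Comparing the column with a number converts every stored value first.
as_numbers = [int(v) for v in index_order]
print(as_numbers)              # [10, 100, 2, 25, 9]
print(is_sorted(as_numbers))   # False: the index order is useless here
```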

1.1.3 Low‑Cardinality Columns

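Indexes on columns with low selectivity often cost more than they save. The choice is cost‑based, made by the optimizer rather than by InnoDB itself: when a predicate matches a large share of the table (about 30% is a commonly quoted rule of thumb, not a fixed threshold), a full table scan is usually cheaper than many index lookups followed by row fetches.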

1.1.4 Simple Index Strategies

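Index condition pushdown (ICP, available since MySQL 5.6): with a composite index, conditions on indexed columns that cannot narrow the range scan are still checked inside the storage engine, before the full row is read.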

Covering index: keep all needed columns in the index.

Prefix index: index only the first N characters of a string.

Avoid functions on indexed columns.

Consider maintenance cost for frequently updated columns.
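
For the prefix index strategy, one common way to pick N is to compare the selectivity of each prefix length with the selectivity of the full column. The sketch below does this in memory over made‑up sample values; the function names and the 0.95 target are assumptions for illustration, and on a real table the same ratio would normally be computed with a COUNT(DISTINCT ...) query.

```python
def prefix_selectivity(values, prefix_len):
    """Distinct prefixes divided by row count (1.0 means fully unique)."""
    if not values:
        return 0.0
    return len({v[:prefix_len] for v in values}) / len(values)


def shortest_prefix(values, target_ratio=0.95, max_len=64):
    """Smallest prefix length reaching target_ratio of full-column selectivity."""
    if not values:
        return 0
    full = len(set(values)) / len(values)
    for n in range(1, max_len + 1):
        if prefix_selectivity(values, n) >= target_ratio * full:
            return n
    return max_len


# Made-up sample data.
emails = [
    "alice@example.com",
    "alina@example.com",
    "bob@example.com",
    "bobby@example.org",
]
print(prefix_selectivity(emails, 3))  # 0.5
print(shortest_prefix(emails))        # 4
```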

1.1.5 When MySQL Chooses the Wrong Index

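If index statistics are stale, run ANALYZE TABLE on the affected table to refresh them. If the optimizer still picks a poor index, use FORCE INDEX as a targeted override or rewrite the query so the better index becomes the obvious choice.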

1.2 MDL Locks

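MySQL 5.5 introduced metadata locks (MDL). Ordinary queries and DML take a shared MDL on the table, while DDL such as ALTER TABLE needs an exclusive one. A pending exclusive request blocks later shared requests, so one long‑running transaction can stall every subsequent query on that table. Blocked sessions show the state "Waiting for table metadata lock" in SHOW PROCESSLIST.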

1.3 Flush Waits

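FLUSH TABLES must wait until the statements currently using a table finish, and statements that arrive afterwards queue behind the flush. Affected sessions show the state "Waiting for table flush" in SHOW PROCESSLIST.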
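
A minimal sketch of how the two wait states (metadata lock and table flush) could be picked out of SHOW PROCESSLIST output. It assumes the rows have already been fetched as dictionaries keyed by the standard column names (Id, Time, State, Info); the sample rows are invented and no database connection is made.

```python
WAIT_STATES = (
    "Waiting for table metadata lock",
    "Waiting for table flush",
)


def find_waiters(processlist_rows):
    """Return (Id, Time, State, Info) for blocked sessions, longest wait first."""
    waiters = [
        (row["Id"], row["Time"], row["State"], row["Info"])
        for row in processlist_rows
        if row.get("State") in WAIT_STATES
    ]
    return sorted(waiters, key=lambda w: w[1], reverse=True)


# Invented sample rows shaped like SHOW PROCESSLIST output.
rows = [
    {"Id": 11, "Command": "Sleep", "Time": 600, "State": "", "Info": None},
    {"Id": 13, "Command": "Query", "Time": 95,
     "State": "Waiting for table metadata lock",
     "Info": "SELECT * FROM orders WHERE id = 1"},
    {"Id": 12, "Command": "Query", "Time": 120,
     "State": "Waiting for table metadata lock",
     "Info": "ALTER TABLE orders ADD COLUMN note VARCHAR(64)"},
    {"Id": 14, "Command": "Query", "Time": 3,
     "State": "Waiting for table flush",
     "Info": "SELECT COUNT(*) FROM logs"},
]

for session_id, seconds, state, info in find_waiters(rows):
    print(session_id, seconds, state, info)
```

Note that the session actually holding the lock is often not in the waiters list at all: in this sample it could be the idle session 11 with an open transaction.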

1.4 Row Locks

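If a transaction holds a row lock (for example after an UPDATE) and has not yet committed, other transactions that need the same row must wait until it commits or rolls back.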

1.5 Current Read (Repeatable Read)

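Under InnoDB's default REPEATABLE READ isolation level, an ordinary SELECT is a consistent (snapshot) read: InnoDB applies undo log records to rebuild the row version visible to the transaction's snapshot. If the row has been updated many times since the snapshot was taken, that rebuild can be slow. A current read (for example SELECT ... LOCK IN SHARE MODE or SELECT ... FOR UPDATE) instead reads the latest committed version and takes a lock on the row.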
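
The cost difference can be shown with a toy version chain. This is a deliberately simplified model: real InnoDB decides visibility with a read view that tracks active transaction IDs, whereas here a single snapshot ID is used, and the version list stands in for the undo log.

```python
def snapshot_read(versions, snapshot_trx_id):
    """Walk a newest-first version chain; return (value, undo_steps).

    versions is a list of (trx_id, value) pairs, newest first. A version is
    treated as visible if its trx_id is not newer than the snapshot.
    """
    steps = 0
    for trx_id, value in versions:
        if trx_id <= snapshot_trx_id:
            return value, steps
        steps += 1
    return None, steps


def current_read(versions):
    """Return the newest version without walking the chain."""
    return versions[0][1]


# A row created by transaction 100 with value 1, then updated 10,000 times.
versions = [(100 + i, 1 + i) for i in range(10000, -1, -1)]

print(snapshot_read(versions, 100))  # (1, 10000)
print(current_read(versions))        # 10001
```

The snapshot read has to step back through 10,000 versions to reach the value its transaction is allowed to see, while the current read returns immediately.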

1.6 Large Table Scenarios

For tables with billions of rows, even with good indexes, aggregation can hit I/O or CPU bottlenecks; consider sharding (horizontal) or vertical partitioning and read‑write separation.

1.6.1 Sharding

Choose database‑level sharding when the bottleneck is I/O and table‑level sharding when it is CPU. Common middleware includes Apache ShardingSphere, TDDL, and Mycat.
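
What such middleware does at its core is map a sharding key to a physical database and table. The sketch below shows one simple hash‑based scheme; the database and table names, the shard counts, and the choice of CRC32 are all assumptions for illustration, not how any of the tools above is configured.

```python
import zlib


def route(user_id, db_count=4, tables_per_db=8):
    """Map a sharding key to a (database, table) pair with a stable hash."""
    # zlib.crc32 is stable across processes, unlike Python's built-in hash().
    slot = zlib.crc32(str(user_id).encode("utf-8")) % (db_count * tables_per_db)
    db_index, table_index = divmod(slot, tables_per_db)
    return f"order_db_{db_index}", f"t_order_{table_index}"


for uid in (1001, 1002, 1003):
    print(uid, route(uid))
```

A plain modulo scheme like this makes resharding expensive because most keys move when the shard count changes, which is one reason to settle the shard count early.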

1.6.2 Read‑Write Separation

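When reads far exceed writes, replicate the master to one or more slaves and direct reads to the slaves to spread the load. MySQL replication is asynchronous by default, so a read that must see a just‑committed write should still go to the master.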
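
A minimal routing sketch for read‑write separation. The class name and the connection labels are hypothetical; a real deployment would usually rely on a proxy or the data‑access framework, and would also need to account for replication lag.

```python
import itertools


class ReadWriteRouter:
    """Send plain reads to replicas (slaves) round-robin, the rest to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql, in_transaction=False):
        text = sql.strip().upper()
        is_plain_read = (
            text.startswith(("SELECT", "SHOW"))
            and "FOR UPDATE" not in text
            and "LOCK IN SHARE MODE" not in text
        )
        # Reads inside a transaction stay on the master so they see its writes.
        if is_plain_read and not in_transaction and self._replicas is not None:
            return next(self._replicas)
        return self.master


router = ReadWriteRouter("master", ["replica-1", "replica-2"])
print(router.pick("SELECT * FROM orders WHERE id = 1"))       # replica-1
print(router.pick("SELECT * FROM orders WHERE id = 2"))       # replica-2
print(router.pick("UPDATE orders SET status = 1 WHERE id = 1"))  # master
```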

1.7 Summary

This section covered the common causes of slow MySQL queries and their mitigations, along with strategies for handling very large data volumes.

2. How to Evaluate ElasticSearch

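ElasticSearch (ES) is a near‑real‑time distributed search engine built on Lucene, suitable for full‑text search, JSON document storage, log monitoring, and analytics. Newly indexed documents become searchable after the next index refresh, which by default happens about once per second.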

2.1 What It Can Do

ES excels at full‑text search, log analysis, and can serve as a NoSQL document store; often used with Logstash and Kibana (ELK stack).

2.2 ES Architecture

Before ES 7.0 the hierarchy was Index → Type → Document. Mapping types were deprecated in 7.0 and removed in 8.0, leaving Index → Document. Useful REST requests (for example from the Kibana Dev Tools console) include GET /_cat/health?v, GET /_cat/shards?v, GET yourindex/_mapping, and GET yourindex/_settings.

2.3 Why ES Queries Are Fast

ES uses inverted indexes with a Term Dictionary and an in‑memory Term Index (FST) for rapid term lookup, reducing disk random I/O compared to MySQL.

2.3.1 Tokenized Search

After tokenization, ES can locate terms directly without full table scans.
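
The idea can be sketched with a toy inverted index. This is only a model of the concept: Lucene keeps a sorted term dictionary with an FST‑based term index and compressed posting lists, whereas the sketch uses a plain dictionary of sets and a naive regex tokenizer. The sample documents are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


docs = {
    1: "MySQL slow query log",
    2: "Elasticsearch query DSL",
    3: "HBase row key design",
}
index = build_inverted_index(docs)
print(search(index, "query"))       # [1, 2]
print(search(index, "slow query"))  # [1]
print(search(index, "missing"))     # []
```

Each lookup touches only the posting lists of the query terms, never the documents that do not contain them.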

2.3.2 Exact Search

For exact matches, the advantage diminishes; MySQL covering indexes may be faster.

2.4 When to Use ES

Full‑text search where MySQL pattern matching is inefficient.

Combined queries: store searchable fields in ES (with IDs) and full records in MySQL.

Hybrid architectures with HBase for massive write‑heavy workloads.

2.4.1 Full‑Text Search

ES handles fuzzy and phrase queries efficiently; Chinese requires appropriate analyzers (e.g., IK).

2.4.2 Combined Queries

Use ES for search‑heavy fields and MySQL for transactional data, or pair ES with HBase for massive scale.
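
The search‑then‑fetch pattern can be sketched as follows. The two callables stand in for an ES query and a MySQL primary‑key lookup; they are backed by in‑memory dictionaries here, and every name in the sketch is hypothetical.

```python
def search_then_fetch(search_ids, fetch_rows, keyword, limit=10):
    """Ask the search engine for matching IDs, then load full rows by ID.

    search_ids(keyword) -> list of IDs in relevance order (stand-in for ES)
    fetch_rows(ids)     -> list of row dicts with an "id" key (stand-in for MySQL)
    """
    ids = search_ids(keyword)[:limit]
    if not ids:
        return []
    by_id = {row["id"]: row for row in fetch_rows(ids)}
    # Keep the relevance order and skip IDs that no longer exist in the database.
    return [by_id[i] for i in ids if i in by_id]


# In-memory stand-ins for the two systems.
FAKE_ES = {"index": [3, 1, 7]}
FAKE_MYSQL = {
    1: {"id": 1, "title": "Covering index basics"},
    3: {"id": 3, "title": "Index condition pushdown"},
}


def fake_es_search(keyword):
    return list(FAKE_ES.get(keyword, []))


def fake_mysql_fetch(ids):
    return [FAKE_MYSQL[i] for i in ids if i in FAKE_MYSQL]


rows = search_then_fetch(fake_es_search, fake_mysql_fetch, "index")
print([row["id"] for row in rows])  # [3, 1]
```

ID 7 is returned by the search stand‑in but missing from the row store, which mirrors the brief inconsistency that can occur when the two systems are synchronized asynchronously.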

2.5 Summary

ES is fast due to its inverted index and in‑memory term lookup; it is ideal for log analysis and full‑text search but not a universal replacement for relational queries.

3. HBase Overview

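HBase is a distributed wide‑column store: cells are grouped and stored by column family rather than as fixed rows. Its write path is log‑structured (a write goes to the write‑ahead log and an in‑memory MemStore, which is later flushed to immutable HFiles), and this is what makes it well suited to write‑intensive workloads.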

3.1 Storage Model

Unlike MySQL's row‑oriented tables, HBase's column‑family model allows sparse data and dynamic columns.
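
A toy model of the column‑family layout, to make "sparse data and dynamic columns" concrete. It is not the HBase client API: the class and method names are invented, and cell timestamps and versions are left out.

```python
from collections import defaultdict


class TinyTable:
    """Rows hold only the cells that were written; qualifiers are free-form."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # row_key -> {(family, qualifier): value}
        self.rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][(family, qualifier)] = value

    def get(self, row_key, family=None):
        cells = self.rows.get(row_key, {})
        return {k: v for k, v in cells.items() if family is None or k[0] == family}


table = TinyTable(["info", "stats"])
table.put("user1", "info", "name", "Ann")
table.put("user1", "stats", "logins", 42)
table.put("user2", "info", "nickname", "bo")  # a column user1 does not have

print(table.get("user1", "info"))  # {('info', 'name'): 'Ann'}
print(table.get("user2"))          # {('info', 'nickname'): 'bo'}
```

Rows that lack a column store nothing for it, unlike a relational row that carries a NULL for every declared column.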

3.2 OLTP vs OLAP

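Row‑oriented databases excel at OLTP, and column‑oriented stores are generally better for OLAP‑style analytics. HBase, however, groups data by column family rather than storing each column separately, and it has no built‑in SQL layer or join support, so it is not an OLAP engine on its own.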

3.3 RowKey Design

HBase supports only three query patterns: single row by RowKey, range scans, and full table scans; good RowKey design is critical.
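
One common RowKey layout, sketched under stated assumptions: a short hash salt to spread writes across regions, then the user ID, then a reversed timestamp so the newest rows sort first. The separator, bucket count, and MAX_TS bound are illustrative choices, and a sorted Python list stands in for HBase's lexicographically ordered key space.

```python
import bisect
import zlib

MAX_TS = 10**13  # hypothetical upper bound for millisecond timestamps


def salt_for(user_id, buckets=16):
    """Stable two-digit salt derived from the user ID."""
    return f"{zlib.crc32(user_id.encode('utf-8')) % buckets:02d}"


def make_row_key(user_id, ts_millis, buckets=16):
    """salt|user_id|reversed_timestamp, zero-padded so string order is numeric order."""
    return f"{salt_for(user_id, buckets)}|{user_id}|{MAX_TS - ts_millis:013d}"


def user_prefix(user_id, buckets=16):
    """Key prefix shared by all rows of one user."""
    return f"{salt_for(user_id, buckets)}|{user_id}|"


def prefix_scan(sorted_keys, prefix):
    """Range scan: every key that starts with prefix, in key order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    end = bisect.bisect_left(sorted_keys, prefix + "\xff")
    return sorted_keys[start:end]


events = [("u1", 1000), ("u1", 3000), ("u2", 2000), ("u1", 2000)]
keys = sorted(make_row_key(user, ts) for user, ts in events)

hits = prefix_scan(keys, user_prefix("u1"))
print(len(hits))                              # 3
print(hits[0] == make_row_key("u1", 3000))    # True: newest event first
```

Because the salt is derived from the user ID, all rows of one user stay contiguous, so "latest N events for a user" is a short range scan rather than a full table scan.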

3.4 Use Cases

HBase shines in write‑heavy scenarios with high reliability and no single point of failure, but queries are limited to RowKey‑based access.

4. Overall Conclusion

Software development should prioritize appropriate, maintainable solutions over flashy complexity; fixing underlying bugs before adding new features yields the best performance gains.
