Understanding MySQL Slow Queries and When to Use ElasticSearch or HBase

This article explores the common causes of MySQL slow queries, explains index usage and pitfalls, introduces ElasticSearch architecture and its fast search capabilities, and outlines HBase's column‑family storage model and suitable use cases for large‑scale data systems.

Java Interview Crash Guide

May 28, 2021

1. Why Are MySQL Queries Slow?

Most internet applications are read-heavy and write-light, so fast reads are essential. Many factors can make a query that looks perfectly fine run slowly.

1.1 Index

When data volume is modest, most slow queries can be solved with proper indexes; many slow queries arise from poorly designed indexes.

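MySQL (InnoDB) indexes are B+ trees, which is why discussions of indexing keep returning to the leftmost prefix rule and to how B+ trees compare with other tree structures.

The leftmost prefix rule governs composite index usage: an index on (a,b) can serve conditions on a, or on a and b, but generally not on b alone. A well-designed composite index can also speed up queries significantly through index condition pushdown (ICP, available since MySQL 5.6).

With ICP, if the query conditions are fully covered by a composite index (a,b), MySQL evaluates the condition on the second column inside the index before fetching the row, which reduces the number of table lookups.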

When the queried columns are included in the composite index, it becomes a covering index, eliminating the need for a table lookup.
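The leftmost prefix rule is easier to see on a toy model. The sketch below is an illustration, not InnoDB code: it models a composite index on (a, b) as a sorted list of (a, b, primary_key) tuples, and the sample rows and function names are made up for the example.

```python
from bisect import bisect_left, bisect_right

# A composite index on (a, b) modeled as (a, b, primary_key) tuples in sorted order.
rows = [(1, "x", 10), (2, "y", 11), (2, "x", 12), (3, "x", 13), (1, "y", 14)]
index_ab = sorted(rows)

def seek_by_a(a):
    """Leftmost column given: binary search finds one contiguous slice."""
    lo = bisect_left(index_ab, (a,))
    hi = bisect_left(index_ab, (a + 1,))   # assumes integer values for a
    return index_ab[lo:hi]

def seek_by_a_and_b(a, b):
    """Both columns given: binary search on the (a, b) prefix."""
    lo = bisect_left(index_ab, (a, b))
    hi = bisect_right(index_ab, (a, b, float("inf")))
    return index_ab[lo:hi]

def scan_by_b(b):
    """Leftmost column missing: matches are scattered, so every entry is checked."""
    return [entry for entry in index_ab if entry[1] == b]

print(seek_by_a(2))
print(seek_by_a_and_b(2, "x"))
print(scan_by_b("x"))
```

Because each entry already carries a, b, and the primary key, a query that needs only those columns can be answered from the index alone, which is the covering-index case.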

1.1.1 Reasons Indexes Fail

Indexes may be built but remain unused, leading to slow queries. Common reasons for index failure include:

Using != (or its synonym <>), OR, or a function applied to the indexed column in a WHERE clause.

LIKE patterns that start with %.

Comparing a string column with an unquoted (numeric) literal, which forces an implicit conversion.

Low‑cardinality index fields, such as gender.

Not matching the leftmost prefix.

1.1.2 Why These Cause Index Failure

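Most of these failures follow from the way a B+ tree keeps its keys in sorted order.

Function Operations

When a function or expression is applied to the indexed column, e.g., where length(a) = 6, the index cannot be used effectively: the tree is ordered by a, not by length(a), so MySQL has to evaluate the function for every entry.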
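A small sketch of why the ordering is lost, using a sorted Python list as a stand-in for an index on column a (the sample values are made up):

```python
# Values of column a as an index on a would hold them: sorted by the value itself.
index_a = sorted(["kiwi", "apple", "fig", "banana", "cherry", "plum"])

# Applying a function to each value gives a sequence that is no longer ordered,
# so a binary search for length(a) = 6 has nothing to work with.
lengths_in_index_order = [len(v) for v in index_a]
is_sorted_by_length = lengths_in_index_order == sorted(lengths_in_index_order)

# The only option left is to evaluate the function for every entry.
matches = [v for v in index_a if len(v) == 6]

print(lengths_in_index_order, is_sorted_by_length, matches)
```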

Implicit Conversion

Implicit type or character set conversions can also break index usage.

Implicit type conversion rarely occurs with frameworks like JOOQ.

Implicit character set conversion may appear in join queries when column types match but encodings differ, for example utf8 on one side and utf8mb4 on the other.

Disrupting Order

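A LIKE pattern that starts with % has no fixed prefix, so the sorted order of the index cannot narrow the search and MySQL usually skips the index. Unquoted strings fail for a different reason: they trigger the implicit conversion described above.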
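The difference between a prefix pattern and a leading wildcard can be sketched with a sorted list standing in for the index. The names and helper functions here are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left

names = sorted(["adam", "bob", "dale", "dana", "linda", "zoe"])

def like_prefix(prefix):
    """LIKE 'prefix%': matches are contiguous in sorted order, so seek to the range."""
    lo = bisect_left(names, prefix)
    hi = bisect_left(names, prefix + "\U0010ffff")  # just past the last possible match
    return names[lo:hi]

def like_contains(fragment):
    """LIKE '%fragment%': matches can be anywhere, so every value is examined."""
    return [n for n in names if fragment in n]

print(like_prefix("da"))
print(like_contains("da"))
```

The prefix search touches only the matching slice, while the substring search finds hits at both ends of the list and has to visit every entry.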

1.1.3 Why Not Index Low‑Cardinality Fields

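Low-cardinality fields are usually not worth indexing because the overhead outweighs the benefit: each value matches a large share of the rows, so scanning the index may be as costly as a full table scan.

For non-clustered (secondary) indexes it can be worse, because every matching entry requires an extra lookup in the clustered index; a full table scan can be cheaper than using an index on a field like gender.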

1.1.4 Simple Indexing Practices

Index push‑down: use composite indexes for multi‑condition queries.

Covering index: keep all needed columns within the index to avoid table lookups.

Prefix index: index only the first N characters of a string.

Avoid functions on indexed columns.

Consider maintenance cost for frequently updated columns.

1.1.5 Evaluating Wrong Index Choices

Sometimes the right index exists but query performance remains poor because the optimizer picks a different, less selective index.

Two main reasons:

Out‑of‑date statistics – run ANALYZE TABLE x to refresh.

Optimizer mis‑prediction – use FORCE INDEX or adjust the query to guide the optimizer.

1.2 MDL Locks

MySQL 5.5 introduced metadata locks (MDL). CRUD statements acquire a read MDL lock and schema changes (DDL) acquire a write MDL lock. Read locks are compatible with each other; a write lock is exclusive with both read and write locks.

While a statement holds or is queued for a write MDL lock, later requests for read MDL locks on that table are blocked. Use SHOW PROCESSLIST to find statements in the Waiting for table metadata lock state.

1.3 Flush Waits

A flush command can itself be blocked by a long-running statement, and it in turn blocks queries that arrive after it. SHOW PROCESSLIST shows these sessions in the Waiting for table flush state.

1.4 Row Locks

An uncommitted transaction that holds a write lock on a row makes other sessions that need the same row wait.

1.5 Current Read

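InnoDB's default isolation level is REPEATABLE READ. If transaction A opened its snapshot before transaction B committed its changes, A's consistent (snapshot) read must apply undo logs to reconstruct the row as it was before B's commit, and this gets slower as more versions accumulate. A current read (for example SELECT ... LOCK IN SHARE MODE) returns the latest committed version directly.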
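The cost difference can be sketched with a heavily simplified version chain. This is not how InnoDB's read view actually decides visibility (that also tracks which transactions were active); the class, the transaction ids, and the "visible if written at or before the snapshot id" rule are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: int
    trx_id: int                       # transaction that wrote this version
    prev: Optional["RowVersion"]      # older version, reachable through the undo log

def snapshot_read(latest, snapshot_trx_id):
    """Walk back until a version written at or before the snapshot is found.
    Returns (value, number_of_undo_steps)."""
    steps = 0
    version = latest
    while version is not None and version.trx_id > snapshot_trx_id:
        version = version.prev
        steps += 1
    return (version.value if version is not None else None), steps

def current_read(latest):
    """A current read takes the newest committed version directly."""
    return latest.value

# Hypothetical history: trx 100 wrote 1, then trx 101..1100 each incremented the row.
row = RowVersion(value=1, trx_id=100, prev=None)
for trx in range(101, 1101):
    row = RowVersion(value=row.value + 1, trx_id=trx, prev=row)

print(snapshot_read(row, 100))
print(current_read(row))
```

In this model the old snapshot has to step through every later version before it reaches the one it is allowed to see, while the current read does no walking at all.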

1.6 Large‑Table Scenarios

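Tables with billions of rows can still suffer I/O or CPU bottlenecks despite good indexing. InnoDB stores each B+ tree node in a 16 KB page; a three-level tree holds on the order of 20 million rows (assuming BIGINT primary keys and rows of about 1 KB), so larger tables need deeper trees and more page reads per lookup.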
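How many rows a tree of a given depth can hold depends on key and row size. A back-of-envelope estimate, where the key size, pointer size, and average row size are assumptions rather than measured values and page headers are ignored:

```python
PAGE_SIZE = 16 * 1024          # InnoDB default page size in bytes
KEY_BYTES = 8                  # assumed BIGINT primary key
POINTER_BYTES = 6              # assumed size of a child-page pointer
ROW_BYTES = 1024               # assumed average row size

fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)   # entries per internal page
rows_per_leaf = PAGE_SIZE // ROW_BYTES              # rows per leaf page

def max_rows(levels):
    """Rows addressable by a tree with the given number of levels (leaf level included)."""
    return fanout ** (levels - 1) * rows_per_leaf

for levels in (2, 3, 4):
    print(levels, max_rows(levels))
```

Under these assumptions a three-level tree addresses about 21.9 million rows. Wider keys or larger rows reduce that number, so the result is an order of magnitude rather than a limit.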

When the buffer pool cannot hold all index data, cache hit rates drop, and LRU eviction may push hot data out.
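The eviction effect can be reproduced with a plain LRU cache. Note the assumption: this is a textbook LRU, not InnoDB's actual buffer pool algorithm, and the page ids and sizes are made up.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page_id] = True           # pretend to load the page from disk
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)   # evict the least recently used page

pool = LRUCache(capacity=100)
hot_pages = range(10)
for _ in range(5):
    for p in hot_pages:
        pool.read(p)                 # the hot set is fully cached after the first pass
for p in range(1000, 1200):
    pool.read(p)                     # one large scan touches 200 cold pages
hot_still_cached = sum(1 for p in hot_pages if p in pool.pages)

print(pool.hits, pool.misses, hot_still_cached)
```

After the scan none of the hot pages remain cached, so the next round of hot reads all go to disk again.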

Common solutions for large tables are sharding (database/table splitting) and read‑write separation.

1.6.1 Sharding

Approach

Choose vertical or horizontal sharding based on the bottleneck:

IO bottleneck → vertical sharding (different databases or tables per business domain).

CPU bottleneck → horizontal sharding (split rows across tables).

Horizontal sharding distributes rows of the same table across many tables or databases; vertical sharding splits by business domain or by groups of columns.

Tools include ShardingSphere, TDDL, and Mycat. Implementation involves choosing the sharding key, defining routing rules, developing against them, migrating data, and planning for scaling.
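A minimal sketch of a hash-based routing rule, assuming a numeric user id as the sharding key and a hypothetical orders_N table naming scheme. Real middleware offers far more; this only shows the core idea and why changing the shard count moves data.

```python
import zlib

def shard_for(user_id, shard_count):
    """Pick a shard from a stable hash of the sharding key."""
    return zlib.crc32(str(user_id).encode()) % shard_count

def table_name(user_id, shard_count=4):
    """Hypothetical naming scheme: orders_0 .. orders_{n-1}."""
    return f"orders_{shard_for(user_id, shard_count)}"

def moved_fraction(keys, old_count, new_count):
    """Share of keys whose shard changes when the shard count changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count))
    return moved / len(keys)

print(table_name(42))
print(moved_fraction(range(1000), 4, 8))
```

Doubling the shard count is a common choice because each key either stays where it is or moves to exactly one new shard (old shard number plus the old shard count), which keeps the migration plan simple.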

Challenges

Unique ID generation (auto‑increment, Snowflake, segment, GUID, etc.).

Non‑partition‑key queries – often solved with mapping tables and covering indexes.

Scaling – range‑based sharding is simple; hash‑based sharding may require data migration.

Cross‑shard joins and transaction consistency are additional concerns.
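Of the ID schemes listed, Snowflake is the one most often hand-rolled. A sketch of the usual 41/10/12 bit layout follows; the custom epoch and the choice to raise on clock rollback are assumptions, and production implementations differ in both.

```python
import threading
import time

class SnowflakeGenerator:
    """41-bit millisecond timestamp | 10-bit worker id | 12-bit sequence."""

    EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01 00:00:00 UTC

    def __init__(self, worker_id):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:            # 4096 ids already used this millisecond
                    while now <= self.last_ms:    # spin until the next millisecond
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeGenerator(worker_id=1)
print(gen.next_id())
```

IDs from one worker are strictly increasing, and IDs from different workers cannot collide as long as worker ids are unique, which is what makes the scheme usable across shards.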

1.6.2 Read‑Write Separation

Why Separate Reads and Writes

When read traffic far exceeds write traffic, a master‑slave setup can distribute read load, improve availability, and achieve load balancing.

Issues

Stale reads due to replication lag.

Routing – deciding whether a query goes to master or slave, often handled in code or middleware.
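Both issues can be handled in a small routing layer. The sketch below is an assumption about how such code might look, not any particular middleware: writes go to the master, reads rotate across replicas (slaves), and a session that has just written keeps reading from the master for a short window to hide replication lag.

```python
import itertools
import time

class ReadWriteRouter:
    def __init__(self, master, replicas, sticky_seconds=1.0, clock=time.monotonic):
        self.master = master
        self._replicas = itertools.cycle(replicas)   # simple round-robin
        self.sticky_seconds = sticky_seconds
        self.clock = clock
        self._last_write = {}                        # session id -> time of last write

    def route(self, session_id, sql):
        statement = sql.lstrip().upper()
        # Locking reads must see the latest data, so treat them like writes.
        is_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        if not is_read:
            self._last_write[session_id] = self.clock()
            return self.master
        wrote_at = self._last_write.get(session_id)
        if wrote_at is not None and self.clock() - wrote_at < self.sticky_seconds:
            return self.master                       # recent writer: avoid a stale read
        return next(self._replicas)

router = ReadWriteRouter("master", ["replica-1", "replica-2"])
print(router.route("s1", "SELECT * FROM orders WHERE id = 1"))
print(router.route("s1", "UPDATE orders SET status = 'paid' WHERE id = 1"))
print(router.route("s1", "SELECT * FROM orders WHERE id = 1"))
```

The sticky window is a trade-off: a longer window sends more reads to the master, a shorter one risks showing a user data that predates their own write.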

1.7 Summary

This section listed common causes of slow MySQL queries and how to mitigate them, and introduced typical solutions for large-scale data scenarios.

2. How to Evaluate ElasticSearch

ES is a common answer for keyword searches that MySQL handles poorly; this section looks at ES itself.

2.1 What ES Can Do

ElasticSearch, built on Lucene, provides near‑real‑time distributed search, full‑text search, NoSQL JSON document storage, log monitoring, and data analytics.

Typical use cases include full‑text retrieval and log analysis, often paired with Logstash and Kibana (the ELK stack).

A phrase search in Kibana Discover corresponds to a Dev Tools request like this:

GET yourIndex/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_phrase": {
      "log": "xxx"
    }
  }
}

The match_phrase query returns documents in which the analyzed terms of the phrase appear adjacent and in the same order.

2.2 ES Architecture

Before ES 7.0 the hierarchy was Index → Type → Document (similar to database → table → row). Types were deprecated in 7.0 and removed in 8.0, so today an index is best thought of as a table.

Key commands to inspect a cluster include:

GET /_cat/health?v&pretty – cluster health.

GET /_cat/shards?v – shard status.

GET yourindex/_mapping – mapping definition.

GET yourindex/_settings – index settings.

GET /_cat/indices?v – list of indices.

Mapping defines field types; settings control shard and replica counts.

Example mapping snippet from a pre-7.0 cluster (doc is the type name). With dynamic mapping, strings become text with a keyword sub-field:

"******": {
  "mappings": {
    "doc": {
      "properties": {
        "appname": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

2.3 Why ES Is Fast

ES uses inverted indexes. Instead of indexing by document ID, it indexes terms and stores posting lists of document IDs.

Lucene adds a Term Index (held in memory as an FST, a finite state transducer) on top of the Term Dictionary, enabling rapid term lookup.

Thus, term‑based searches are much faster than MySQL full‑table scans.
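A toy inverted index makes the idea concrete. The sketch assumes whitespace tokenization and uses a plain dictionary as the term dictionary; Lucene's analyzers, FST term index, and compressed posting lists are not modeled, and the sample documents are made up.

```python
from collections import defaultdict

docs = {
    1: "mysql slow query log",
    2: "elasticsearch query dsl",
    3: "slow disk on elasticsearch node",
}

# Term dictionary -> posting list of document ids, in ascending id order.
inverted = defaultdict(list)
for doc_id, text in sorted(docs.items()):
    for term in sorted(set(text.split())):
        inverted[term].append(doc_id)

def search(*terms):
    """Documents containing all terms: intersect the posting lists."""
    postings = [set(inverted.get(t, [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(search("slow"))
print(search("slow", "elasticsearch"))
```

A lookup touches only the posting lists of the searched terms, regardless of how many documents exist, which is where the advantage over scanning every row comes from.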

2.3.1 Tokenized Search

Because ES stores tokenized terms, a search that needs LIKE '%da%' in MySQL becomes a fast term lookup in ES, provided the analyzer emits the searched word as a term.

2.3.2 Exact Search

For exact matches, the advantage diminishes; MySQL covering indexes may be comparable.

2.4 When to Use ES

2.4.1 Full‑Text Search

MySQL LIKE queries are inefficient for large text fields; ES handles them easily, e.g., searching chat messages.

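For Chinese, a suitable analyzer (e.g., IK) is required; the default standard analyzer splits Chinese text into single characters, which gives poor results.

Analyzing Tokens

The _analyze API shows how the analyzer configured for a field tokenizes a given text: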

POST yourindex/_analyze
{
  "field": "yourfield",
  "text": "我可真是个机灵鬼"
}

2.4.2 Hybrid Queries

When data volume is huge, indexing every field in ES is impractical. A common pattern is:

Store searchable fields and document ID in ES (with tokenization).

Keep full records in MySQL and retrieve them by ID.

Alternatively, combine ES with HBase for massive write‑heavy workloads.

These patterns illustrate the classic "index‑data separation" architecture.
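The two-step lookup can be sketched with dictionaries standing in for both systems. All names and records here are hypothetical; a real implementation would issue an ES query and then a batched primary-key read against MySQL or HBase.

```python
# Stand-in for ES: term -> ids of matching records (only searchable fields are indexed).
search_index = {
    "refund": [1001, 1003],
    "invoice": [1002],
}

# Stand-in for MySQL or HBase: primary key -> full record.
primary_store = {
    1001: {"id": 1001, "title": "refund request", "amount": 30},
    1002: {"id": 1002, "title": "invoice copy", "amount": 12},
    1003: {"id": 1003, "title": "refund status", "amount": 75},
}

def search_records(term):
    """Step 1: resolve ids in the search index. Step 2: fetch full rows by id."""
    ids = search_index.get(term, [])
    # Skip ids that are missing from the store, e.g. deleted after indexing.
    return [primary_store[i] for i in ids if i in primary_store]

print(search_records("refund"))
```

The index stays small because it holds only what is searched, and the store is only ever hit by primary key, which is the access pattern it handles best.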

2.5 Summary

ES achieves speed through inverted indexes and in‑memory term lookup, making it ideal for full‑text and large‑scale search scenarios.

Integration with Spring Boot is straightforward: adding the Spring Data Elasticsearch starter provides repository-style CRUD support.

3. HBase

HBase stores data by column families rather than rows.

3.1 Storage Model

Relational databases like MySQL are row‑oriented; HBase is column‑oriented (column families).

RowKey is the primary key, and rows are sorted by it lexicographically; Timestamp serves as the version number of each cell. Column families (e.g., info, area) are declared with the table and group columns, and columns within a family can be added dynamically.
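One way to picture this model is as a nested map: RowKey, then (family, column), then timestamped versions. The sketch below is an in-memory illustration only, with made-up rows, and says nothing about how HBase lays data out on disk.

```python
# table[row_key][(family, qualifier)] -> list of (timestamp, value), newest first.
table = {}

def put(row_key, family, qualifier, value, ts):
    cell = table.setdefault(row_key, {}).setdefault((family, qualifier), [])
    cell.append((ts, value))
    cell.sort(reverse=True)           # keep the newest version first

def get(row_key, family, qualifier, versions=1):
    cell = table.get(row_key, {}).get((family, qualifier), [])
    return cell[:versions]

def scan_keys():
    """Rows come back in lexicographic RowKey order."""
    return sorted(table)

put("row10", "info", "name", "Ann", ts=1)
put("row2", "info", "name", "Bob", ts=1)
put("row10", "info", "name", "Anna", ts=2)     # new version; the old one is kept
put("row10", "area", "city", "Oslo", ts=3)     # new column added on the fly

print(get("row10", "info", "name", versions=2))
print(scan_keys())
```

Note that row10 sorts before row2: keys are compared as strings, not numbers, which is why numeric parts of a RowKey are usually zero-padded.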

3.2 OLTP vs OLAP

OLTP (online transaction processing) suits row-oriented relational databases; OLAP (online analytical processing) suits column-oriented systems. HBase does not fit either category neatly: it is organized by column families rather than being a true columnar analytical engine, and it lacks OLTP features such as multi-row transactions.

3.3 RowKey Design

HBase supports only three query patterns: single‑row lookup by RowKey, range scans by RowKey, and full table scans. Hence, good RowKey design is critical.
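As an illustration of designing the RowKey around the query, suppose the main access pattern is "latest events for a user". One common layout, assumed here rather than taken from the article, puts the user id first and a reversed, zero-padded timestamp second, so a prefix scan returns the newest rows first. The sorted list stands in for HBase's ordered key space.

```python
from bisect import bisect_left

MAX_TS = 10**13   # assumed upper bound for millisecond timestamps

def make_row_key(user_id, ts_ms):
    """User id first (scan prefix), then reversed zero-padded timestamp (newest first)."""
    return f"{user_id}#{MAX_TS - ts_ms:013d}"

def prefix_scan(sorted_keys, prefix, limit=None):
    """Range scan: seek to the prefix and read forward while it still matches."""
    start = bisect_left(sorted_keys, prefix)
    out = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        out.append(key)
        if limit is not None and len(out) >= limit:
            break
    return out

keys = sorted(
    make_row_key(user, ts)
    for user, ts in [("u1", 1000), ("u2", 1500), ("u1", 3000), ("u1", 2000)]
)

print(prefix_scan(keys, "u1#", limit=1))
```

A query on any other attribute would need a full scan or a secondary index kept elsewhere, which is the reason RowKey design has to start from the access patterns.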

3.4 Use Cases

HBase excels in write‑heavy scenarios with fast ingestion and acceptable point‑lookup performance. It offers high reliability and no single point of failure.

4. Conclusion

Software development should progress step by step; technology must serve the project, and suitability outweighs novelty.

To achieve fast queries, first locate and fix the immediate problem in the query itself, then address higher-level architectural concerns.

Each solution, whether MySQL sharding, ES integration, or HBase adoption, brings its own complexities that engineers must work through in practice.

