
Master MySQL Slow Queries, Elasticsearch, and HBase: Practical Performance Tips

This article explains why MySQL queries become slow, covers index pitfalls and optimization techniques, then looks at Elasticsearch and HBase, with practical guidance on when to use each technology and how to combine them for high-performance data retrieval.

ITFLY8 Architecture Home

Feb 3, 2021

1. Why do MySQL queries become slow?

Most internet applications are read‑heavy and write‑light; fast reads are essential. Various factors can cause a seemingly perfect query to become slow.

1.1 Index

When data volume is modest, many slow queries can be solved with proper indexes, but improper indexes are also a common cause of slowness.

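MySQL (InnoDB) indexes are B+ trees. Because the keys are stored in sorted order, an index can serve a query only when the predicate lets the engine seek to a position in that order, which is where the left-most prefix rule comes from.

Composite indexes follow the left-most prefix rule: an index on (a, b, c) can serve predicates on a, on a and b, or on a, b, and c, but generally not on b or c alone. Used well, they can dramatically improve query speed through index condition pushdown and covering indexes.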

1.1.1 Reasons for index failure

Using !=, <>, OR, or functions on indexed columns in the WHERE clause (the optimizer often falls back to a full scan).

LIKE statements with a leading %.

Comparing a string column with an unquoted (numeric) literal, which forces an implicit type conversion.

Low‑cardinality columns (e.g., gender).

Not matching the left‑most prefix of a composite index.

1.1.2 Why these cause index failure

MySQL’s B+ tree cannot efficiently use the index when the above patterns break the ordered nature of the index.

Function operations

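When a function is applied to an indexed column, e.g., WHERE length(a) = 6, the optimizer cannot seek in the index: the index is ordered by the values of a, not by length(a), so every row has to be evaluated. (MySQL 8.0.13 and later can index an expression directly with a functional index.)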

Implicit conversion

Implicit type or character‑set conversions can also invalidate index usage.

Implicit type conversion is rare for frameworks like JOOQ.

Implicit character‑set conversion may appear in join queries when columns share a type but differ in encoding.

Breaking order

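A LIKE pattern with a leading % has no fixed prefix to seek on, and an unquoted literal compared with a string column makes MySQL convert the column value for every row. In both cases the sorted order of the index cannot be used, so MySQL skips the index or scans it in full.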
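
The function and left-most prefix cases above can be observed with a query planner. The sketch below uses SQLite (bundled with Python) purely as a stand-in, since it also stores indexes as ordered trees; MySQL's EXPLAIN output looks different, and the table t and index idx_a_b are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Equality on the leading index column: the planner can seek in the index.
indexed = plan("SELECT * FROM t WHERE a = 'abcdef'")
# Function wrapped around the indexed column: the planner scans instead.
wrapped = plan("SELECT * FROM t WHERE length(a) = 6")
# Predicate on the second index column only: left-most prefix not matched.
skipped = plan("SELECT * FROM t WHERE b = 1")

print(indexed)
print(wrapped)
print(skipped)
```

The first plan reports a SEARCH that uses idx_a_b, while the other two report a SCAN. In MySQL the equivalent check is to run EXPLAIN on the statement and look at the type and key columns.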

1.1.3 Why not index low‑cardinality fields like gender

Low‑cardinality fields provide little selectivity; indexing them often yields no performance gain and can even be slower.

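With a non-clustered (secondary) index, each matching entry requires a lookup back into the clustered index to fetch the full row. A predicate on gender matches roughly half the table, so the index lookup plus all those row lookups can cost more than a full table scan.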

1.1.4 Simple and effective indexing methods

Index condition pushdown (ICP, available since MySQL 5.6): with a composite index, the storage engine evaluates additional conditions on indexed columns inside the index, before fetching the full row.

Covering index: store all needed columns in the index to avoid table lookups.

Prefix index: index only the first N characters of a string to reduce index size.

Avoid functions on indexed columns.

Consider maintenance cost for frequently updated columns.
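
The covering-index item in the list above can be illustrated with a planner as well. This is a sketch using SQLite as a stand-in for MySQL; the users table and idx_name_age index are hypothetical. In MySQL, a covering index shows up as "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT)"
)
conn.execute("CREATE INDEX idx_name_age ON users (name, age)")


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every selected column lives in the index: no lookup into the table is needed.
covered = plan("SELECT name, age FROM users WHERE name = 'Ada'")
# city is not in the index: each match needs an extra lookup into the table.
lookup = plan("SELECT name, age, city FROM users WHERE name = 'Ada'")

print(covered)
print(lookup)
```

Only the first plan mentions a covering index; the second uses the same index but still has to visit the table for city.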

1.1.5 Evaluating a wrong index choice

Sometimes an index looks correct but the optimizer picks a low‑selectivity one, leading to excessive scans.

Inaccurate statistics – run ANALYZE TABLE x to refresh.

Optimizer mis‑prediction – use FORCE INDEX or rewrite the query.

1.2 MDL lock

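MySQL 5.5 introduced metadata locks (MDL). CRUD statements acquire a read MDL and schema changes acquire a write MDL. Read MDLs are compatible with each other, but a write MDL is exclusive: a schema change waits for every open transaction on the table, and queries that arrive after it queue behind it, so one long-running transaction can make even simple queries appear stuck.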

1.3 Flush

FLUSH TABLES can be blocked by a long-running statement on the table, and every query issued afterwards then waits behind the flush. SHOW PROCESSLIST shows the blocked sessions in the "Waiting for table flush" state.

1.4 Row lock

A transaction holding a write lock without committing can block other operations.

1.5 Current read

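InnoDB's default isolation level is REPEATABLE READ. A plain SELECT is a snapshot (consistent) read: if other transactions have updated the row many times since the snapshot was taken, InnoDB must walk the undo log back to the version visible to the transaction, which can be slow. A current read, such as SELECT ... LOCK IN SHARE MODE, reads the latest committed version directly.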

1.6 Large‑table scenarios

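Tables with hundreds of millions or billions of rows hit I/O and CPU bottlenecks even with good indexes. InnoDB stores each B+ tree node in a page (16 KB by default), and a typical index is three to four levels deep, so a lookup touches only a few pages; the cost comes when the working set no longer fits in the buffer pool and hot pages are evicted.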
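
A rough back-of-the-envelope calculation shows why a B+ tree stays shallow. The key size, pointer size, and row size below are assumptions chosen for illustration, and page headers and fill factor are ignored, so the numbers are upper bounds rather than exact figures.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size in bytes
KEY_BYTES = 8           # assumed BIGINT primary key
POINTER_BYTES = 6       # assumed size of a child-page pointer
ROW_BYTES = 1024        # assumed average row size


def max_rows(levels: int) -> int:
    """Estimate how many rows a clustered index of the given depth can hold."""
    # Entries per internal page: each entry is one key plus one child pointer.
    fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
    # Rows per leaf page: leaves of the clustered index hold whole rows.
    rows_per_leaf = PAGE_SIZE // ROW_BYTES
    # levels - 1 layers of internal pages sit above the leaf layer.
    return fanout ** (levels - 1) * rows_per_leaf


for levels in (2, 3, 4):
    print(levels, max_rows(levels))
```

Under these assumptions three levels hold about 21.9 million rows and four levels about 25.6 billion, which is why adding rows rarely adds more than one page read per lookup.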

1.6.1 Sharding

Solution

Choose the sharding approach based on the bottleneck:

If disk or network I/O is the bottleneck, apply database sharding and vertical table sharding.

If CPU or per-query latency on a single large table is the bottleneck, apply horizontal table sharding.

Horizontal sharding distributes rows across many tables; vertical sharding splits columns into separate tables.
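
Horizontal sharding needs a routing function that maps a sharding key to a physical table. A minimal sketch, assuming a hash-modulo scheme; the shard counts and the db_N.orders_M naming are hypothetical.

```python
DB_COUNT = 2        # hypothetical number of databases
TABLES_PER_DB = 4   # hypothetical number of tables in each database


def route(user_id: int) -> str:
    """Map a sharding key to a physical table name such as 'db_1.orders_3'."""
    # One global slot number per physical table.
    slot = user_id % (DB_COUNT * TABLES_PER_DB)
    # Split the slot into a database index and a table index.
    db_index, table_index = divmod(slot, TABLES_PER_DB)
    return f"db_{db_index}.orders_{table_index}"


print(route(10))
print(route(15))
```

Every query that carries the sharding key can be sent to exactly one table; queries without it have to be fanned out to all shards or resolved through a mapping table.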

Issues

Unique ID generation, non‑partition‑key queries, and scaling strategies need careful planning.

Various ID strategies: auto‑increment, Snowflake, segment, GUID, etc.

Non‑partition‑key queries can be handled via mapping tables or secondary indexes.

Scaling depends on the sharding algorithm: range-based sharding lets new shards be added without moving existing rows, whereas hash-modulo sharding forces rows to be redistributed when the shard count changes.
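
Of the ID strategies listed above, Snowflake is the easiest to sketch. The version below assumes the commonly used layout of a 41-bit millisecond timestamp, a 10-bit worker id, and a 12-bit sequence; the custom epoch and class name are illustrative choices, not a reference implementation.

```python
import threading
import time


class SnowflakeGenerator:
    """ID layout: millisecond timestamp | 10-bit worker id | 12-bit sequence."""

    EPOCH_MS = 1_577_836_800_000   # assumed custom epoch: 2020-01-01 UTC
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = time.time_ns() // 1_000_000
            if now < self.last_ms:
                # A real deployment needs a policy for clock rollback.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                mask = (1 << self.SEQUENCE_BITS) - 1
                self.sequence = (self.sequence + 1) & mask
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = time.time_ns() // 1_000_000
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = now - self.EPOCH_MS
            shift = self.WORKER_BITS + self.SEQUENCE_BITS
            return (
                (timestamp << shift)
                | (self.worker_id << self.SEQUENCE_BITS)
                | self.sequence
            )


gen = SnowflakeGenerator(worker_id=3)
ids = [gen.next_id() for _ in range(5000)]
print(ids[0], ids[-1])
```

IDs from one generator are unique and increase over time, which keeps inserts append-friendly for a B+ tree primary key. Uniqueness across machines depends on each worker having a distinct worker id.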

1.6.2 Read‑write separation

Why read‑write separation

When read traffic far exceeds write traffic, a master‑slave setup can distribute reads, improve availability, and balance load.

Problems

Typical issues include replication lag (stale reads) and routing logic for directing queries to master or slave.

Stale reads caused by master‑slave delay.

Routing decisions can be handled in application code or middleware.
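
Routing in application code, together with one common mitigation for stale reads, can be sketched as follows. The class, the session-pinning window, and the server names are assumptions for illustration; the statement classification is deliberately naive.

```python
import itertools
import time


class ReadWriteRouter:
    """Send writes to the master and reads to replicas in round-robin order."""

    def __init__(self, master, replicas, pin_seconds=1.0, clock=time.monotonic):
        self.master = master
        self.replicas = itertools.cycle(replicas)
        self.pin_seconds = pin_seconds   # assumed upper bound on replication lag
        self.clock = clock
        self.last_write = {}             # session id -> time of its last write

    def route(self, session_id, sql):
        lowered = sql.lstrip().lower()
        # Naive classification: plain SELECTs are reads, everything else a write.
        is_read = lowered.startswith("select") and "for update" not in lowered
        if not is_read:
            self.last_write[session_id] = self.clock()
            return self.master
        wrote_at = self.last_write.get(session_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            # The session wrote recently: read from the master to avoid stale data.
            return self.master
        return next(self.replicas)


now = [0.0]   # fake clock so the example is deterministic
router = ReadWriteRouter("master", ["replica-1", "replica-2"], clock=lambda: now[0])
first_read = router.route("s1", "SELECT * FROM orders")
write = router.route("s1", "UPDATE orders SET paid = 1")
read_after_write = router.route("s1", "SELECT * FROM orders")
other_session = router.route("s2", "SELECT * FROM orders")
now[0] = 2.0
read_later = router.route("s1", "SELECT * FROM orders")
print(first_read, write, read_after_write, other_session, read_later)
```

Pinning a session to the master for a short window trades some read offloading for read-your-writes consistency; middleware products apply similar ideas outside the application.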

1.7 Summary

The above enumerates common MySQL slow‑query causes and remedies, and introduces typical solutions for large‑scale data such as sharding and read‑write separation.

2. How to evaluate Elasticsearch

Beyond MySQL, full-text search and log analysis often benefit from Elasticsearch (ES).

2.1 What ES can do

ES is a near‑real‑time distributed search engine built on Lucene, suitable for full‑text search, JSON document storage, log monitoring, and data analytics.

2.2 ES structure

Before ES 7.0 the hierarchy was Index → Type → Document (loosely comparable to database → table → row). Mapping types were deprecated in 7.0 and removed in 8.0, so today an index is best thought of as a table.

GET /_cat/health?v&pretty – cluster health.

GET /_cat/shards?v – shard status.

GET yourindex/_mapping – index mapping (schema).

GET yourindex/_settings – index settings (shard count, replicas).

GET /_cat/indices?v – list all indices.

Mapping defines field types; settings control shard and replica numbers.

GET yourindex/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_phrase": {
      "log": "xxx"
    }
  }
}

The query uses match_phrase, which returns documents whose log field contains the analyzed terms of the phrase adjacent to each other and in the same order.

2.3 Why ES is fast

ES relies on inverted indexes. Terms are stored in a Term Dictionary, and a Term Index (in‑memory FST) quickly locates dictionary entries.

The Term Index reduces disk random access, making term lookup very fast. However, ES excels mainly for tokenized searches; exact match queries may not outperform a well‑indexed MySQL query.
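
The core idea of an inverted index fits in a few lines. This sketch uses a plain dictionary from term to document ids; it does not model Lucene's Term Dictionary or FST-based Term Index, and the tokenizer and sample documents are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Ada Lovelace wrote the first program",
    2: "Grace Hopper wrote a compiler",
    3: "Ada is also a programming language",
}
index = build_inverted_index(docs)
print(sorted(search(index, "ada")))        # [1, 3]
print(sorted(search(index, "wrote ada")))  # [1]
```

A lookup costs one dictionary access per term plus a set intersection, independent of how many documents do not match, which is the property that a row-by-row pattern match lacks.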

2.3.1 Tokenized search

Because ES indexes tokenized terms, a search for "Ada" is resolved by looking up the term in the inverted index, whereas MySQL has to scan every row to evaluate LIKE '%Ada%'.

2.3.2 Exact search

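For exact matches the advantage shrinks: ES still goes through the Term Index and Term Dictionary before reaching the posting list, while MySQL walks a B+ tree directly, so a well-indexed MySQL lookup can be just as fast.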

2.4 When to use ES

2.4.1 Full‑text search

Keyword-based fuzzy searches, such as searching chat messages, are inefficient in MySQL but straightforward in ES.

Tokenization

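Chinese text needs a dedicated analyzer such as IK. The default standard analyzer splits Chinese into single characters, so queries may match only when the characters appear exactly as typed. The _analyze API shows how a field tokenizes a given text: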

POST yourindex/_analyze
{
  "field": "yourfield",
  "text": "我可真是个机灵鬼"
}

2.4.2 Combined queries

ES + MySQL

Store searchable fields and IDs in ES for fast tokenized search, while keeping full records in MySQL for transactional integrity.

ES + HBase

For massive write‑heavy workloads, use HBase as the primary store and ES as a secondary index layer.

Both patterns separate indexing from data storage, but introduce challenges such as data sync, mapping design, and high availability.
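
Both patterns share the same two-step read path: search the index layer for ids, then fetch full records from the primary store. The sketch below uses in-memory stand-ins for both layers; SearchLayer, primary_store, and the sample records are hypothetical and no real ES or database client is involved.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchLayer:
    """Stand-in for ES: keeps only searchable terms and record ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, record_id, text):
        for term in tokenize(text):
            self.postings[term].add(record_id)

    def search(self, keyword):
        return sorted(self.postings.get(keyword.lower(), set()))


def search_then_fetch(search_layer, store, keyword):
    """Step 1: ids from the search layer. Step 2: full rows from the store."""
    ids = search_layer.search(keyword)
    # An id can be missing from the store when the two layers are out of sync.
    return [store[record_id] for record_id in ids if record_id in store]


# Stand-in for MySQL or HBase: the source of truth, keyed by record id.
primary_store = {
    1: {"id": 1, "sender": "alice", "body": "deploy finished"},
    2: {"id": 2, "sender": "bob", "body": "lunch at noon"},
}
search_layer = SearchLayer()
for record in primary_store.values():
    search_layer.index(record["id"], record["body"])
# Simulate sync lag: the index still references a record deleted from the store.
search_layer.index(3, "deploy rolled back")

results = search_then_fetch(search_layer, primary_store, "deploy")
print(results)
```

The stale id 3 is dropped in the fetch step, which illustrates why the primary store should remain the source of truth and why synchronization between the two layers needs its own design.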

2.5 Summary

ES achieves speed through inverted indexes and in‑memory term lookup, making it ideal for full‑text and log queries, while still requiring careful integration with relational stores for complete solutions.

3. HBase

HBase is a column-family-oriented NoSQL store designed for write-intensive workloads.

3.1 Storage structure

Unlike row‑oriented MySQL, HBase stores data by column families. Each row is identified by a RowKey (sorted lexicographically) and can have multiple versions (timestamps). Columns belong to families such as info or area, and cells hold the actual values.
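
The logical model described above can be pictured as a nested map: row key, then column (family:qualifier), then timestamp. The following sketch only illustrates that shape; TinyTable is a made-up class and says nothing about how HBase lays data out on disk.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a table: row key -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        # A put never overwrites in place; it adds a new timestamped version.
        self.rows[row_key][column][timestamp] = value

    def get(self, row_key, column, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        versions = self.rows.get(row_key, {}).get(column, {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]


table = TinyTable()
table.put("user#001", "info:name", "Ada", timestamp=1)
table.put("user#001", "info:name", "Ada L.", timestamp=2)
table.put("user#001", "area:city", "London", timestamp=1)

print(table.get("user#001", "info:name"))
print(table.get("user#001", "info:name", max_versions=2))
```

Rows can have different sets of columns within a family, and reading an older version is just a matter of asking for more than one timestamp.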

3.2 OLTP and OLAP

OLTP: traditional relational databases for routine transactional processing.

OLAP: data‑warehouse systems for complex analytical queries.

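Column-oriented stores excel at OLAP, while row-oriented stores suit OLTP. HBase, however, is not an OLAP engine, and it offers only row-level atomicity rather than multi-row transactions, so it is not an OLTP database either; it is used primarily for write-heavy scenarios.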

3.3 RowKey design

Effective HBase schema hinges on a well‑designed RowKey because HBase only supports three query patterns: single‑row lookup, range scan, and full table scan.
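
Because rows are sorted by RowKey, a key should be designed so that the most common query becomes a short range scan. A sketch under assumed requirements (latest events per user first); the key layout, the MAX_TS bound, and the helper names are hypothetical, and a sorted Python list stands in for the table.

```python
import bisect

MAX_TS = 10**13   # assumed upper bound for millisecond timestamps


def make_row_key(user_id: int, ts_ms: int) -> str:
    """Build 'user id # reversed timestamp' so newer events sort first."""
    # Zero-padding keeps lexicographic order equal to numeric order.
    return f"{user_id:010d}#{MAX_TS - ts_ms:013d}"


def prefix_scan(sorted_keys, prefix, limit):
    """Range scan: seek to the first key with the prefix, then read forward."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix) or len(result) == limit:
            break
        result.append(key)
    return result


keys = sorted(
    [
        make_row_key(42, 1000),
        make_row_key(42, 2000),
        make_row_key(42, 3000),
        make_row_key(7, 1500),
    ]
)
latest_two = prefix_scan(keys, f"{42:010d}#", limit=2)
print(latest_two)
```

The scan returns the events at timestamps 3000 and 2000 without touching other users' rows. A key that starts with a monotonically increasing value, by contrast, tends to concentrate writes on one region, so write distribution is the other factor to weigh.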

3.4 Use cases

HBase shines in write‑intensive applications where fast ingestion is critical. Point queries or small scans are acceptable, but complex ad‑hoc queries are not supported.

4. Summary

Software development should be incremental; technology must serve the project, and simplicity often beats novelty.

To achieve fast queries, first eliminate bugs, then explore optimizations. The solutions discussed here (MySQL indexing, sharding, read-write separation, Elasticsearch integration, and HBase usage) each involve detailed trade-offs that engineers must address in practice.
