Databases 12 min read

How to Query Phone Number Suffix in Milliseconds on 50 Million Records

When faced with 50 million user rows, using LIKE '%1234' triggers a full table scan, but adding a suffix column, reversing the phone number, leveraging function indexes, or integrating Elasticsearch can reduce query time from minutes to a few milliseconds, each with its own trade‑offs.

Java Companion
Java Companion
Java Companion
How to Query Phone Number Suffix in Milliseconds on 50 Million Records

Why LIKE '%1234' Fails

The naïve query forces a full table scan because the leading wildcard prevents the B+‑tree index from being used. An EXPLAIN shows type: ALL and an estimated rows: 50000000, leading to 30 seconds‑plus execution time and 100 % CPU usage.

Solution 1 – Redundant Suffix Column

Add a phone_suffix column that stores the last four digits, populate it, and create an index on it.

-- 1. Add column
ALTER TABLE users ADD COLUMN phone_suffix CHAR(4) NOT NULL DEFAULT '';
-- 2. Populate data
UPDATE users SET phone_suffix = RIGHT(phone_number, 4);
-- 3. Create index
CREATE INDEX idx_phone_suffix ON users(phone_suffix);

Query becomes an equality lookup:

SELECT * FROM users WHERE phone_suffix = '1234';
-- EXPLAIN shows type: ref, rows: ~5000

Performance drops from 30 seconds to about 5 ms (≈6000× speed‑up). The trade‑off is an extra 4 bytes per row (~200 MB for 50 M rows) and the need to keep the suffix column in sync with the phone number.

@Transactional
public void updatePhone(Long userId, String newPhone) {
    String suffix = newPhone.substring(newPhone.length() - 4);
    userMapper.updatePhoneAndSuffix(userId, newPhone, suffix);
}

Solution 2 – Store Reversed Phone Number

Insert a reversed_phone column containing the phone number reversed, index it, and query the reversed suffix as a prefix.

-- Add reversed column
ALTER TABLE users ADD COLUMN reversed_phone CHAR(11) NOT NULL DEFAULT '';
-- Populate
UPDATE users SET reversed_phone = REVERSE(phone_number);
-- Index
CREATE INDEX idx_reversed_phone ON users(reversed_phone);

Original suffix query LIKE '%5678' becomes:

SELECT * FROM users WHERE reversed_phone LIKE '8765%';

This supports variable‑length suffixes (e.g., 4‑, 5‑, 6‑digit searches) with a modest storage increase (11 bytes per row) and typical query times under 10 ms.

Solution 3 – Function Index (MySQL 8.0+ / PostgreSQL)

Create an index on an expression, e.g., REVERSE(phone_number), without adding a new column.

CREATE INDEX idx_phone_reverse_func ON users ((REVERSE(phone_number)));

Query using the same expression:

SELECT * FROM users WHERE REVERSE(phone_number) LIKE '8765%';

Works only on MySQL 8.0+ (or PostgreSQL) and requires the SQL to match the indexed expression exactly; otherwise the index is ignored.

Solution 4 – Introduce Elasticsearch

For complex, multi‑field fuzzy searches, offload the query to Elasticsearch, which uses inverted indexes.

Application → MySQL (writes) → Canal → Elasticsearch (reads)

Example ES mapping includes phone_suffix as a keyword field. Queries can combine terms, e.g., name and phone suffix. The downside is added infrastructure, data‑sync latency (1‑2 s), and maintenance overhead, so it’s recommended only for large‑scale or highly complex search scenarios.

Solution 5 – Sharding & Specialized Structures

When data exceeds 50 M rows, combine sharding (hash‑by‑suffix) with the redundant suffix column. For ultra‑low latency, use Redis bitmaps or Bloom filters to pre‑filter candidates.

Choosing the Right Approach

Based on data volume and query requirements:

< 1 M rows : Simple LIKE with pagination is acceptable.

1 M – 5 M rows, fixed‑length suffix: Redundant suffix column (Solution 1) gives the best performance.

Flexible suffix length: Reversed field method (Solution 2).

MySQL 8.0+ and desire for schema cleanliness: Function index (Solution 3).

> 50 M rows, single‑suffix queries: Sharding + suffix column.

Complex multi‑field fuzzy search: MySQL + Elasticsearch (Solution 4).

Real‑time ultra‑low latency: Redis bitmap / Bloom filter.

Monitoring

After any optimization, monitor slow‑query logs, index hit rates, and P95/P99 response times. Edge cases can still cause high tail latency if not accounted for.

Conclusion

The interview question tests understanding of index mechanics, space‑time trade‑offs, and architectural choices. Demonstrating the step‑by‑step reasoning—from why LIKE '%...' is bad, through multiple viable solutions, to selecting the most appropriate one—shows deep competence in database design and performance engineering.

indexingElasticsearchshardingQuery OptimizationMySQLdatabase design
Java Companion
Written by

Java Companion

A highly professional Java public account

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.