
Why Storing Phone Numbers as VARCHAR Beats BIGINT in Billion‑Row Databases

This article explains why phone numbers should be stored as VARCHAR(15~20) rather than INT/BIGINT, covering underlying principles, performance tests, internationalization, security, large‑scale optimization techniques, and a concrete best‑practice checklist for enterprise systems.

Ray's Galactic Tech

Background and Core Conclusion

When designing a user table, developers often debate whether to store phone numbers as BIGINT for space savings or as VARCHAR for flexibility. The article argues that, especially at scales of tens of millions to billions of rows, the correct choice is VARCHAR(15~20), not an integer type.

Why a Phone Number Is Not a Numeric Value

A numeric column implies that its values take part in arithmetic, that their ordering is mathematically meaningful, and that compact storage matters. Phone numbers, however, are identifiers with format constraints (leading zeros, country codes) and never participate in calculations. Storing them as numbers leads to:

Loss of leading zeros (e.g., 013811122233 becomes 13811122233).

Inability to store international formats containing ‘+’ or variable length.

Awkward prefix queries: LIKE '138%' on a numeric column forces an implicit cast to string, which generally prevents index use.

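Implicit conversion on write: when a formatted string is inserted into a numeric column, MySQL either rejects it (strict SQL mode) or coerces it to a plain number with a warning (non-strict mode), so the original formatting is lost.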
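The first two failure modes can be reproduced without a database. The following is a minimal Java sketch, assuming the application parses the number into a 64-bit integer before writing it to a BIGINT column; the class and method names are illustrative and do not come from the original article.

```java
final class LeadingZeroDemo {
    // Round-trips a phone number through a long, as a BIGINT column would.
    // Long.parseLong throws NumberFormatException for inputs that contain
    // spaces, hyphens, or parentheses.
    static String roundTripThroughLong(String phone) {
        long asNumber = Long.parseLong(phone);
        return Long.toString(asNumber);
    }

    public static void main(String[] args) {
        // The leading zero is dropped.
        System.out.println(roundTripThroughLong("013811122233"));
        // The '+' is parsed as a sign and does not survive the round trip.
        System.out.println(roundTripThroughLong("+8613800138000"));
    }
}
```

Keeping the value as a String end to end avoids both losses, which is the behavior a VARCHAR column gives at the storage layer.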

Why VARCHAR(15~20) Is the Optimal Solution

Preserves the exact format, including leading zeros and optional "+86" international prefix.

Accommodates the E.164 international standard (at most 15 digits including the country code, conventionally written with a leading "+").

Supports flexible queries: exact match, prefix LIKE '138%', and fuzzy matching.

Storage overhead is negligible at massive scale: BIGINT uses 8 bytes per row (≈16 GB for 2 billion rows), while a VARCHAR value holding 11 to 15 digits uses 12‑16 bytes (the digits plus a 1‑byte length prefix, ≈24‑32 GB). That difference is trivial in TB‑level databases.
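The storage figures above follow from simple arithmetic. The sketch below reproduces them in Java under stated assumptions: a single-byte character set, a 1-byte VARCHAR length prefix, decimal gigabytes, and raw column data only (row headers and index pages are ignored). The class and method names are hypothetical.

```java
final class StorageEstimate {
    static final long BIGINT_BYTES = 8;
    // Assumption: values shorter than 256 bytes need a 1-byte length prefix.
    static final long VARCHAR_LENGTH_PREFIX_BYTES = 1;

    // Bytes used by one VARCHAR value holding the given number of digits.
    static long varcharBytes(int digits) {
        return VARCHAR_LENGTH_PREFIX_BYTES + digits;
    }

    // Total column size in decimal gigabytes (1 GB = 1e9 bytes).
    static double totalGb(long rows, long bytesPerRow) {
        return rows * bytesPerRow / 1e9;
    }

    public static void main(String[] args) {
        long rows = 2_000_000_000L;
        System.out.println("BIGINT:      " + totalGb(rows, BIGINT_BYTES) + " GB");
        System.out.println("VARCHAR(11): " + totalGb(rows, varcharBytes(11)) + " GB");
        System.out.println("VARCHAR(15): " + totalGb(rows, varcharBytes(15)) + " GB");
    }
}
```

Real tables carry per-row and per-index overhead on top of this, so the figures are best read as a lower bound on the gap rather than a measurement.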

Performance Analysis

The article's benchmarks put the query‑speed difference between VARCHAR(11) and BIGINT at less than 1%. The dominant factors for performance are index design, sharding strategy, table schema bloat, and query patterns, not the column type itself.

Optimizations for 20‑Billion‑Row Datasets

Sharding / Partitioning

Two common strategies:

Prefix‑based sharding (e.g., 130‑139 → user_1, 150‑159 → user_2, 180‑189 → user_3).

Hash‑based sharding:

table_index = hash(mobile) % 64;
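The hash-based routing rule can be sketched in Java as follows. This is an illustration only: the original does not say which hash function is used, so CRC32 is an assumption chosen because it yields a non-negative value that is easy to reproduce in other languages, and the user_N table naming is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ShardRouter {
    static final int SHARD_COUNT = 64;

    // Maps a normalized phone number to a shard index in [0, SHARD_COUNT).
    static int shardIndex(String mobile) {
        CRC32 crc = new CRC32();
        crc.update(mobile.getBytes(StandardCharsets.UTF_8));
        // getValue() returns an unsigned 32-bit value in a long, so the
        // remainder is never negative.
        return (int) (crc.getValue() % SHARD_COUNT);
    }

    // Hypothetical naming scheme: user_0 .. user_63.
    static String tableName(String mobile) {
        return "user_" + shardIndex(mobile);
    }

    public static void main(String[] args) {
        System.out.println(tableName("13800138000"));
    }
}
```

The number must be normalized before routing; otherwise "+86 138..." and "138..." would hash to different shards for the same user.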

Index Enhancements

Single‑column index:

CREATE INDEX idx_mobile ON users(mobile);

High‑selectivity prefix index (first 7 digits):

CREATE INDEX idx_mobile_prefix ON users(mobile(7));

Hash Field for Massive Concurrency

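The article proposes adding a mobile_hash CHAR(32) column (e.g., the MD5 of the number) so that lookups hit a fixed‑width key. Note that a 32‑character hex digest is longer than the 11 to 15 digits it replaces, so the index does not actually get shorter, and any QPS gain should be benchmarked rather than assumed: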

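ALTER TABLE users ADD mobile_hash CHAR(32);
-- Backfill existing rows, then index the new column (index name is illustrative)
UPDATE users SET mobile_hash = MD5(mobile);
CREATE INDEX idx_mobile_hash ON users(mobile_hash);
SELECT * FROM users WHERE mobile_hash = MD5('13800138000');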
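If the hash is computed in the application rather than in SQL, it has to match what MySQL's MD5() returns, which is a 32-character lowercase hexadecimal string. A minimal Java sketch, assuming the number has already been normalized; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class MobileHash {
    // Returns the MD5 digest of the input as 32 lowercase hex characters.
    static String md5Hex(String mobile) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(mobile.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("13800138000"));
    }
}
```

MD5 serves here only as a lookup key. Because the space of valid phone numbers is small enough to enumerate, the hash does not hide the number and is no substitute for encryption.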

Pre‑Insert Normalization Workflow

Remove noise characters:

mobile = mobile.replaceAll("[\\s\\-()]", "");

Strip country code to a unified 11‑digit domestic format:

if (mobile.startsWith("+86")) mobile = mobile.substring(3);
if (mobile.startsWith("86") && mobile.length() > 11) mobile = mobile.substring(2);

Validate pattern:

boolean valid = mobile.matches("^1[3-9]\\d{9}$");
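The three steps above can be combined into one helper. The sketch below is an assumption-laden illustration, not the article's implementation: it targets mainland China numbers only, and the class name, method name, and the choice to return an empty Optional for invalid input are all hypothetical.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class MobileNormalizer {
    // Whitespace, hyphens, and parentheses are treated as formatting noise.
    private static final Pattern NOISE = Pattern.compile("[\\s\\-()]");
    // 11 digits, starting with 1 followed by a digit from 3 to 9.
    private static final Pattern MAINLAND_MOBILE = Pattern.compile("^1[3-9]\\d{9}$");

    // Returns the 11-digit domestic form, or empty if the input is not valid.
    static Optional<String> normalize(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String mobile = NOISE.matcher(raw).replaceAll("");
        if (mobile.startsWith("+86")) {
            mobile = mobile.substring(3);
        }
        if (mobile.startsWith("86") && mobile.length() > 11) {
            mobile = mobile.substring(2);
        }
        return MAINLAND_MOBILE.matcher(mobile).matches()
                ? Optional.of(mobile)
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(normalize("+86 138-0013-8000"));
        System.out.println(normalize("12345"));
    }
}
```

Running the same helper on both the write path and the query path keeps stored values and search keys in one format.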

Security and Compliance Requirements

Masking for display:

SELECT CONCAT(LEFT(mobile,3),'****',RIGHT(mobile,4)) AS mobile_mask FROM users;

Avoid logging plain numbers.

Optionally encrypt the column with AES when business demands.
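Masking is often needed in application code as well, for example before a number reaches a log line. A minimal Java sketch that mirrors the SQL expression above; the class name is hypothetical, and fully masking inputs shorter than seven characters is an assumption made here so that short values never leak in full.

```java
final class MobileMasker {
    // Keeps the first 3 and last 4 characters and hides the rest.
    static String mask(String mobile) {
        if (mobile == null || mobile.length() < 7) {
            // Too short to show both ends without overlap: hide everything.
            return "****";
        }
        return mobile.substring(0, 3) + "****" + mobile.substring(mobile.length() - 4);
    }

    public static void main(String[] args) {
        System.out.println(mask("13800138000"));
    }
}
```

Calling mask at the logging boundary, rather than at each call site, makes it harder for a plain number to slip through.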

Uniqueness Considerations

Whether the mobile column should be unique depends on the business scenario (login accounts usually require uniqueness, while shared or family accounts may not). The DDL for a unique constraint is:

ALTER TABLE users ADD UNIQUE KEY idx_mobile_unique (mobile);

Common Pitfalls (Checklist)

Using BIGINT leads to loss of leading zeros, no international support, and cumbersome queries.

Relying on a leading‑wildcard pattern such as LIKE '%xxx' prevents the query from using the index.

Inconsistent formatting before storage causes query mismatches.

Treating the phone field as an integer creates type mismatches across the stack.

Logging plain numbers can violate GDPR and local data‑protection laws.

Final Best‑Practice Specification

Column type: VARCHAR(20) (recommended).

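Store pure digits without spaces, signs, or country code when all users share one country; if numbers from several countries must coexist, keep the country code so values stay unambiguous.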

Create a single‑column index idx_mobile.

For >20 billion rows, add a mobile_hash field to accelerate lookups.

Normalize and validate format before insertion.

Uniqueness is business‑driven.

Prefer exact match queries; avoid fuzzy patterns.

Always mask numbers in UI and logs.

Adopting this guideline ensures semantic correctness, international compatibility, security compliance, query flexibility, and scalability for systems ranging from millions to tens of billions of rows.

