Databases 8 min read

Choosing the Right Data Type for Storing Billions of Phone Numbers: INT vs VARCHAR

This article examines why storing 2 billion phone numbers using INT or BIGINT is problematic and argues for using VARCHAR(20) to preserve data integrity, support international formats, simplify queries, and avoid common pitfalls in database design and development.

IT Services Circle
IT Services Circle
IT Services Circle
Choosing the Right Data Type for Storing Billions of Phone Numbers: INT vs VARCHAR

1. Problems with INT/BIGINT for Phone Numbers

Phone numbers are 11‑digit strings (e.g., 13728199213 ). In Java, int is a 32‑bit signed integer with a maximum value of 2,147,483,647, which cannot hold an 11‑digit phone number, so a 64‑bit Long (or database BIGINT ) is required.

However, using Long loses leading zeros and cannot store international prefixes (+86) or symbols (e.g., 137-2819-9213 ), breaking data integrity.

Long phoneNumber =01324567890L; //编译报错,Java不允许前导0的Long整数

Querying by prefix (e.g., numbers starting with 137 ) becomes inefficient because a BIGINT must be cast to a string for fuzzy matching.

2. Advantages of Using VARCHAR

Fidelity : Stores digits, symbols, and leading zeros exactly as entered.

Flexibility : Supports fuzzy queries, international numbers, and future extensions without schema changes.

Ease of Use : No overflow or format‑conversion concerns.

CREATE TABLE user_tab (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '用户ID',
  phone_number VARCHAR(20) NOT NULL COMMENT '手机号',
  PRIMARY KEY (id),
  UNIQUE KEY idx_phone (phone_number)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='用户表';

The interviewer's hidden focus is on business scalability, data fault‑tolerance, and comprehensive problem‑solving. Explaining why VARCHAR(20) is preferred over VARCHAR(11) demonstrates awareness of international formats, extensions (e.g., landlines, virtual numbers), and the need to reserve field length for future changes.

2.1 Why Not VARCHAR(11)

Scenarios such as international numbers (+8613822223333, 14 digits), numbers with country codes (008613822223333, 15 digits), or extensions (13822223333#123) exceed 11 characters and would cause errors.

2.2 Extreme Cases

If a phone number is purely numeric and does not start with 0, BIGINT can be used, but INT should never be used.

3. Common Development Pitfalls

Designing field length too short (e.g., VARCHAR(11) ) leads to truncation of international or extended numbers.

Using an inadequate character set; utf8mb4 with utf8mb4_unicode_ci supports all Unicode symbols.

Missing unique index on phone numbers causes duplicate records. ALTER TABLE user ADD UNIQUE INDEX idx_phone (phone);

Skipping data cleaning and validation results in inconsistent formats (e.g., spaces, hyphens).

Storing phone numbers in plain text exposes user privacy; encryption or masking is recommended.

// Strict validation (11‑digit pure number, no country code)
String regex = "^1(3[0-9]|4[579]|5[0-35-9]|6[2567]|7[0-8]|8[0-9]|9[0-35-9])\\d{8}$";
// Loose validation (allows optional international code)
String looseRegex = "^(\\+\\d{1,3})?1(3\\d|4[579]|5[0-35-9]|6[2567]|7[0-8]|8\\d|9[0-35-9])\\d{8}$";

Overall, choosing VARCHAR(20) balances storage cost (≈18 GB for 2 billion rows) with flexibility, ensuring data integrity, scalability, and ease of maintenance.

SQLphone number storageData Integritydatabase schemaVARCHAR vs BIGINT
IT Services Circle
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.