Databases 8 min read

Why Storing 2 Billion Phone Numbers Requires VARCHAR(20) Over INT: Interview Insights

The article examines an interview question about storing 2 billion phone numbers, explains why int cannot hold 11‑digit numbers, advocates using VARCHAR(20) for flexibility and data integrity, and outlines common pitfalls and best practices for database design.

macrozheng
macrozheng
macrozheng
Why Storing 2 Billion Phone Numbers Requires VARCHAR(20) Over INT: Interview Insights

Introduction

A candidate was asked in an interview how to store 2 billion phone numbers and whether to use int or string, varchar or char, and why.

1. Problems with using int or BigInt

1.1 Can int hold an 11‑digit phone number?

In Java, int is 32‑bit with a maximum value of 2,147,483,647, which cannot store an 11‑digit phone number such as 13728199213. A 64‑bit long (BIGINT in MySQL) is required.

1.2 Data integrity

Storing a phone number as a long drops leading zeros (e.g., 01324567890 becomes 1324567890), breaking data integrity. Phone numbers may also contain country codes (+86) or hyphens, which cannot be represented by a numeric type.

1.3 Query difficulty

Searching for numbers that start with a certain prefix (e.g., 137) is inefficient with BIGINT because it requires conversion to a string before pattern matching.

2. Advantages of using VARCHAR

Preservation : Stores digits, symbols, and leading zeros exactly as entered.

Flexibility : Supports fuzzy queries, international numbers, and future extensions.

Convenience : Avoids overflow and format‑conversion issues.

<code>CREATE TABLE user_tab (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'User ID',
  phone_number VARCHAR(20) NOT NULL COMMENT 'Phone number',
  PRIMARY KEY (id),
  UNIQUE KEY idx_phone (phone_number)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='User table';
</code>

2.1 Why VARCHAR(20) instead of VARCHAR(11)

International numbers (+8613822223333), numbers with country code (008613822223333), and extensions (13822223333#123) exceed 11 characters, so VARCHAR(11) would cause errors. VARCHAR(20) leaves room for business expansion and data tolerance.

Reasonable length = basic requirement + international extension + tolerance buffer.

2.2 Extreme cases

If a phone number is pure digits and does not start with 0, BIGINT can be used, but INT should never be used.

3. Common pitfalls in daily development

3.1 Field length too short

Using VARCHAR(11) truncates numbers like +8613822223333.

Use VARCHAR(20) to accommodate international and extension numbers.

3.2 Character set and collation

utf8 cannot store emojis or special symbols; use utf8mb4 with utf8mb4_unicode_ci to support all Unicode characters.

3.3 Index design

Missing a unique index on the phone number leads to duplicate data.

<code>ALTER TABLE user ADD UNIQUE INDEX idx_phone (phone);
</code>

3.4 Data cleaning and validation

Input may contain spaces or hyphens; clean data before insertion and apply regex validation.

<code>// Strict validation (11 digits, no country code)
String regex = "^1(3[0-9]|4[579]|5[0-35-9]|6[2567]|7[0-8]|8[0-9]|9[0-35-9])\\d{8}$";
// Loose validation (allows optional country code)
String looseRegex = "^(\\+\\d{1,3})?1(3\\d|4[579]|5[0-35-9]|6[2567]|7[0-8]|8\\d|9[0-35-9])\\d{8}$";
</code>

3.5 Privacy and security

Store phone numbers encrypted or mask them in query results to protect user privacy.

Encrypt with AES or use built‑in DB encryption; display as 138****3333.
MySQLBIGINTDatabase Designphone number storageVARCHARinterview question
macrozheng
Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.