Designing Scalable Login Schemas for Billions of Users
This article explains how to design a flexible, extensible database schema and login flow for a system with a billion users, covering multi‑credential handling, sharding strategies, hash‑based routing, and practical considerations such as password updates and caching.
In an interview scenario, a candidate is asked to design a table structure and login process for a system with a billion users who can log in via phone number, email, nickname, or third‑party accounts.
Key assessment points are the candidate's experience with similar business logic, solid database design skills, and an understanding of sharding across multiple databases and tables under high concurrency.
Table Structure Design
When multiple login methods exist, a user may have several associated accounts. A naïve design adds a column for each credential (e.g., phone, email, nick_name), resulting in a table like:
id | name | phone | email | nick_name | desc
This approach has two major drawbacks:
Login logic must check which column to query based on the credential type, making code complex.
Adding a new login method requires altering the table schema and updating code, increasing the risk of bugs.
A more flexible solution is to store login credentials in a separate authorization table, where each credential is a row rather than a column. The user table then holds only non‑login related information.
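Example schema: the authorization table holds one row per credential, with the columns id, user_name (the credential value), type (the credential kind), passwd, and user_id (a reference to the user table).
The user table stores basic profile data: id, nick_name, logo_url, user_number, and user_names (the user's credentials as a comma-separated list).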
Sample data for the authorization table (Author):
id | user_name | type | passwd | user_id
10001 | siyuanwai | 01 | xxxxx | 1
10002 | [email protected] | 02 | xxxxx | 1
Sample data for the user table (User):
id | nick_name | logo_url | user_number | user_names
1 | 四猿外 | /pic/xyz.png | xxxxx | siyuanwai,[email protected]
Note: Because each authorization row stores its own copy of the password, changing a password requires updating every row that belongs to the user (both rows in this example), which must be handled carefully.
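The two-table layout can be tried out with a minimal sketch, here using Python's built-in sqlite3 module as a stand-in for MySQL. The column types, the UNIQUE constraint on user_name, the helper names, and the address user@example.com are assumptions made for illustration; only the column names come from the sample data above.

```python
import sqlite3

def build_demo_db():
    """Create the user and authorization tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE user (
        id          INTEGER PRIMARY KEY,
        nick_name   TEXT,
        logo_url    TEXT,
        user_number TEXT,
        user_names  TEXT
    );
    CREATE TABLE author (
        id        INTEGER PRIMARY KEY,
        user_name TEXT NOT NULL UNIQUE,  -- one row per credential (assumed unique)
        type      TEXT NOT NULL,         -- credential kind, e.g. '01' or '02'
        passwd    TEXT NOT NULL,
        user_id   INTEGER NOT NULL REFERENCES user(id)
    );
    """)
    conn.execute(
        "INSERT INTO user VALUES (1, '四猿外', '/pic/xyz.png', 'xxxxx', ?)",
        ("siyuanwai,user@example.com",),  # placeholder e-mail address
    )
    conn.executemany(
        "INSERT INTO author VALUES (?, ?, ?, ?, ?)",
        [(10001, "siyuanwai", "01", "xxxxx", 1),
         (10002, "user@example.com", "02", "xxxxx", 1)],
    )
    conn.commit()
    return conn

def find_user_id(conn, credential):
    """One query serves every login method: the credential is a row, not a column."""
    row = conn.execute(
        "SELECT user_id FROM author WHERE user_name = ?", (credential,)
    ).fetchone()
    return row[0] if row else None

def change_password(conn, user_id, new_passwd):
    """Update all credential rows of one user in a single transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE author SET passwd = ? WHERE user_id = ?", (new_passwd, user_id)
        )
    return cur.rowcount
```

Adding a new login method in this layout means inserting rows with a new type value; neither the table definition nor the lookup query changes.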
Login Process for a Billion‑User System
With a billion users, high concurrency and large data volume demand sharding. Assuming a single MySQL instance can reliably handle about 1,000 TPS and a single table performs well up to about 10 million rows, we aim to keep each table at or below ten million rows. One billion rows divided by ten million rows per table gives roughly 100 tables.
Distribute these tables across 10 databases (10 tables per DB) to stay well within capacity.
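The sizing arithmetic above can be written out as a small sketch. The function name plan_shards and its parameters are hypothetical; the input figures are the ones assumed in the text.

```python
def plan_shards(total_rows, max_rows_per_table, database_count):
    """Return (table_count, tables_per_database), rounding both up."""
    table_count = -(-total_rows // max_rows_per_table)  # integer ceiling division
    tables_per_db = -(-table_count // database_count)
    return table_count, tables_per_db

# One billion users, ten million rows per table, ten databases.
print(plan_shards(1_000_000_000, 10_000_000, 10))  # (100, 10)
```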
Sharding strategy:
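Use user_id modulo operations to determine database and table. For example, user_id % 10 selects the database, and user_id % 100 selects the table. With 100 globally numbered tables, this places table t in database t % 10, so each database holds exactly 10 tables.
If every database instead holds the same set of tables (for example, tables 0 to 99 in each database), make the number of databases and the number of tables per database co‑prime (e.g., 11 databases and 100 tables). Otherwise the two remainders are correlated: with 10 databases and 100 tables, user_id % 10 is fixed by user_id % 100, so each database would only ever use 10 of its 100 tables.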
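The effect of co‑prime moduli can be checked with a short sketch that counts how many distinct (database, table) slots a run of sequential ids actually reaches. The function name used_slots is hypothetical, and the sketch assumes every database holds the same set of table indexes.

```python
def used_slots(db_count, tables_per_db, sample_ids):
    """Collect the distinct (db_index, table_index) pairs the ids map to."""
    return {(uid % db_count, uid % tables_per_db) for uid in sample_ids}

ids = range(100_000)

# 10 databases x 100 tables: the moduli share factors, so most slots stay empty.
print(len(used_slots(10, 100, ids)), "of", 10 * 100)   # 100 of 1000

# 11 databases x 100 tables: co-prime moduli reach every slot.
print(len(used_slots(11, 100, ids)), "of", 11 * 100)   # 1100 of 1100
```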
For the authorization table, which stores login credentials, sharding is based on a hash of user_name:
db_index = hash(user_name) % 11
table_index = hash(user_name) % 100
Login flow:
User enters credential (e.g., email) and password.
System determines the credential type.
The credential, its type, and the password are sent to the server.
A routing layer computes hash(credential) % 11 and % 100 to locate the correct database and table.
The server queries the authorization table to verify the password.
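The login flow above can be sketched as follows, with nested dictionaries standing in for the sharded authorization tables. Everything not named in the text is an assumption: the helper names, the type-detection heuristic, and the choice of SHA-256 as the routing hash. A stable hash is used because Python's built-in hash() is randomized per process for strings by default, which would route the same credential to different shards across restarts. The plain password comparison is for illustration only; a production system would store and compare salted password hashes.

```python
import hashlib
import hmac

DB_COUNT = 11        # database count from the text
TABLES_PER_DB = 100  # tables per database from the text

def stable_hash(credential):
    """Hash that yields the same value in every process and on every host."""
    digest = hashlib.sha256(credential.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def route(credential):
    """Return (db_index, table_index) for a credential."""
    h = stable_hash(credential)
    return h % DB_COUNT, h % TABLES_PER_DB

def detect_type(credential):
    # Hypothetical heuristic; real systems validate formats more strictly.
    if "@" in credential:
        return "email"
    if credential.isdigit():
        return "phone"
    return "nickname"

def register(shards, credential, passwd, user_id):
    """Store one authorization row in the shard chosen by the routing hash."""
    shards.setdefault(route(credential), {})[credential] = {
        "type": detect_type(credential), "passwd": passwd, "user_id": user_id,
    }

def login(shards, credential, password):
    """Return the user_id on success, or None if the login fails."""
    row = shards.get(route(credential), {}).get(credential)
    if row is None:
        return None
    ok = hmac.compare_digest(row["passwd"].encode("utf-8"),
                             password.encode("utf-8"))
    return row["user_id"] if ok else None
```

Because routing depends only on the credential string, the server can locate the row without knowing the user_id in advance, which is why the authorization table is sharded by user_name rather than by user_id.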
In real‑world scenarios, additional layers such as external caches (Redis) and search indexes (Elasticsearch) are often added to store frequently accessed user metadata and routing information, reducing load on the sharded databases.
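A cache-aside lookup is one common way to place such a cache in front of the shards. The sketch below uses an in-memory dictionary as a stand-in for Redis; the class name, the loader callback, and the db_hits counter are hypothetical and exist only to make the behavior visible.

```python
class CachedLookup:
    """Cache-aside wrapper: read the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # callable: credential -> value or None
        self._cache = {}           # stand-in for an external cache such as Redis
        self.db_hits = 0           # counts how often the database was queried

    def get(self, credential):
        if credential in self._cache:
            return self._cache[credential]
        self.db_hits += 1
        value = self._load(credential)
        if value is not None:      # do not cache misses in this simple sketch
            self._cache[credential] = value
        return value

    def invalidate(self, credential):
        """Drop a cached entry, e.g. after a password or profile update."""
        self._cache.pop(credential, None)
```

Invalidating the entry whenever the underlying row changes keeps the cache from serving stale data after a password update.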
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
