Can Machine Learning Predict China’s Car License Lottery? Secrets in 13‑Digit IDs

This article investigates whether the 13‑digit user IDs used in Chinese car‑license lotteries are truly random. It examines how the IDs are generated, how seed‑based selection works, and how hidden patterns, especially the influential seventh digit, affect outcomes. A simple linear model reaches an AUC of around 0.8 in predicting winners. The article also discusses the system’s opacity across major cities.

Alibaba Cloud Developer

Is the 13‑digit user code really random?

What patterns and regularities exist in the lottery pool? Can we predict lottery results with machine learning?

Eligibility and winning multiplier

Simply being eligible for the lottery is already fairly lucky. Qualifying conditions include:

Beijing household registration

Work residence permit

Five years of social security contributions

Active military or armed police personnel in Beijing

Social security eligibility differs from home‑buying rules: the social security record may have month‑level gaps but not year‑level gaps, making the lottery threshold lower.

Every six lottery attempts double the winning probability. However, if you fail to re‑apply on the official website after three attempts, the system stops entering you into the draw automatically, and forgetting to re‑apply is common. The accumulated multiplier is retained in that case, but if you win and do not purchase a car, the multiplier resets to zero.

How the lottery pool is generated

After registration, the system assigns a unique 13‑digit ID linked to the identity card; the ID remains the same even after cancellation or re‑registration. Once approved, you enter the lottery pool.

The pool is sorted in ascending order by the numeric value of the ID. When users have unequal winning probabilities, the ordering becomes slightly more complex.

Thus, once the IDs are fixed, the lottery pool is uniquely determined.
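The pool construction described above can be sketched as follows. This is a minimal illustration rather than the official implementation: the function name `build_pool` and the sample IDs are hypothetical, and giving an applicant one slot per multiplier unit is only an assumed way of handling unequal winning probabilities, since the exact ordering rule is not spelled out here.

```python
def build_pool(applicants):
    """Build a deterministic pool from a {id_string: multiplier} mapping.

    Assumption: an applicant with multiplier m occupies m adjacent slots.
    """
    pool = []
    for user_id in sorted(applicants, key=int):  # ascending numeric order
        pool.extend([user_id] * applicants[user_id])
    return pool


# Hypothetical IDs, made up for illustration only.
applicants = {
    "5821100000042": 1,
    "0317100000007": 2,
    "9004100000019": 1,
}
pool = build_pool(applicants)
print(pool)
```

Because the result depends only on the IDs and multipliers, running the function twice on the same input yields the same pool.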

How random numbers are generated

The method for generating random numbers was explained in a previous article and is illustrated below:

The algorithm that seeds the random number generator is not publicly disclosed, so we cannot verify whether the numbers are truly random or pre‑defined.

Using a seed to run the lottery

Computer‑generated random numbers appear random but are deterministic when a seed is fixed. If no seed is specified, the current OS time is used, producing different sequences each run. When a specific seed is set, the sequence becomes deterministic.
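The effect of a fixed seed is easy to see in a few lines of Python. The sketch below uses Python's own `random.Random` as a stand-in; the generator actually used by the lottery system is not disclosed, and the helper name `draw_sequence` is hypothetical.

```python
import random


def draw_sequence(seed, count, pool_size):
    """Return `count` pseudo-random positions in [0, pool_size) for a seed."""
    rng = random.Random(seed)  # same seed, same internal state
    return [rng.randrange(pool_size) for _ in range(count)]


first = draw_sequence(954732, 5, 1000)
second = draw_sequence(954732, 5, 1000)
print(first == second)  # the two runs produce identical sequences

# Without an explicit seed, Python initialises the generator from an
# operating-system source, so each run produces a different sequence.
unseeded = random.Random()
print(unseeded.randrange(1000))
```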

With a known sequence we can pick winners from the pool in order (e.g., 1st, 8th, 32nd, …). The appendix provides pseudo‑code for the lottery program.

Can machine learning predict winners?

Because manual pattern discovery is difficult, we tried a simple linear model to see whether any features of the 13‑digit ID correlate with winning probability. The system diagram is shown below:

Each decimal digit can be expanded into a binary vector (e.g., 7 → 0,0,0,0,0,0,0,1,0,0). Thus the 13‑digit ID becomes a 130‑dimensional Boolean feature vector, with the win flag as the label.
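A minimal sketch of this encoding, assuming IDs are handled as 13-character strings so that leading zeros survive (the function name `one_hot_id` and the sample ID are hypothetical):

```python
def one_hot_id(user_id):
    """Expand a 13-digit ID string into a 130-element 0/1 feature vector."""
    if len(user_id) != 13 or any(ch not in "0123456789" for ch in user_id):
        raise ValueError("expected a string of exactly 13 decimal digits")
    features = []
    for ch in user_id:
        digit_vector = [0] * 10
        digit_vector[int(ch)] = 1  # e.g. 7 -> 0,0,0,0,0,0,0,1,0,0
        features.extend(digit_vector)
    return features


vector = one_hot_id("0317100000007")  # made-up ID for illustration
print(len(vector), sum(vector))
```

Each ID sets exactly 13 of the 130 entries, one per digit position.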

From 4.72 million users we sampled 200 k for training and 200 k for testing, using logistic regression. The resulting ROC curve is shown below:

The curve indicates good predictive performance, with an AUC close to 0.8 (0.5 would be random guessing): across the full data set, the model ranks winning numbers above non‑winning ones fairly reliably.
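The article does not say which library or hyperparameters were used, so the following is only a sketch of the general technique: logistic regression trained by batch gradient descent in plain Python. The toy data is invented so that IDs whose seventh digit is 0 always win; it is not the real lottery data, and the names `active_features`, `predict` and `train` are hypothetical.

```python
import math


def active_features(user_id):
    """Indices of the 13 active entries in the 130-dim one-hot vector."""
    return [pos * 10 + int(ch) for pos, ch in enumerate(user_id)]


def predict(weights, bias, user_id):
    """Predicted win probability for one ID."""
    z = bias + sum(weights[j] for j in active_features(user_id))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, lr=0.1):
    """Batch gradient descent on the log loss; samples are (id, label)."""
    weights, bias = [0.0] * 130, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * 130, 0.0
        for user_id, label in samples:
            error = predict(weights, bias, user_id) - label
            grad_b += error
            for j in active_features(user_id):
                grad_w[j] += error
        bias -= lr * grad_b / len(samples)
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
    return weights, bias


# Invented toy data: prefix "1234", marker "10", then a 7-digit serial.
samples = []
for lead in range(6):          # seventh digit runs from 0 to 5
    for k in range(20):
        serial = lead * 1_000_000 + k
        samples.append((f"123410{serial:07d}", 1 if lead == 0 else 0))

weights, bias = train(samples)
print(weights[60], weights[65])  # weights for seventh digit = 0 and = 5
```

With one-hot features, the weight at index `pos * 10 + digit` can be read directly as the influence of that digit at that position, which is what makes the weight visualization possible.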

We visualized the weight distribution of the 130‑dimensional linear model, revealing a striking pattern: the seventh digit of the ID heavily influences the outcome.

When the seventh digit is 0, the winning probability is about 53 %, higher than the non‑winning probability. As the seventh digit increases, the chance drops dramatically; a digit of 5 corresponds to a win rate of roughly 1 in 2,000.
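A per-digit win rate of this kind can be computed by simple counting. The sketch below assumes the data is available as (ID, won) pairs; the records are invented for illustration and the function name `win_rate_by_digit` is hypothetical.

```python
from collections import defaultdict


def win_rate_by_digit(records, position=7):
    """Win rate grouped by the digit at a 1-based position of the ID."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for user_id, won in records:
        digit = int(user_id[position - 1])
        totals[digit] += 1
        wins[digit] += int(won)
    return {d: wins[d] / totals[d] for d in sorted(totals)}


# Invented records, not real lottery data.
records = [
    ("1234100000001", True),
    ("5678100000002", False),
    ("9012101000003", False),
    ("3456101000004", False),
]
rates = win_rate_by_digit(records)
print(rates)
```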

However, this global pattern does not translate to predictive power for a single lottery round; the linear model fails to forecast individual draws.

How the ID code is determined

The 13‑digit code is not random. Analysis shows the first four digits appear random, while digits 5 and 6 are always “10”. Digits 7‑13 form a 7‑digit auto‑increment primary key in the database, representing the registration order.
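Under the structure described above, an ID can be split into its three parts. This is a sketch of the layout as inferred in the analysis, not a documented format; the names `ParsedId` and `parse_id` and the sample ID are hypothetical.

```python
from typing import NamedTuple


class ParsedId(NamedTuple):
    prefix: str  # digits 1-4, apparently random
    marker: str  # digits 5-6, observed to be "10"
    serial: int  # digits 7-13, auto-increment registration number


def parse_id(user_id):
    """Split a 13-digit ID string into prefix, marker and serial."""
    if len(user_id) != 13 or any(ch not in "0123456789" for ch in user_id):
        raise ValueError("expected a string of exactly 13 decimal digits")
    return ParsedId(user_id[:4], user_id[4:6], int(user_id[6:]))


parsed = parse_id("0317100000007")  # made-up ID for illustration
print(parsed)
```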

By the third issue of 2016, 5.72 million users had registered, but only 4.67 million participated in the lottery; about 1.05 million registered users never entered, likely due to failing eligibility checks.

How to make a specific number win (thought experiment)

Given the above mechanics, one could theoretically manipulate the seed or insert dummy entries to force certain positions to be selected. Three methods are described:

Method 1: Find a seed that generates a sequence covering the desired positions.

Method 2: Insert two invalid numbers before a target to shift it into a winning slot.

Method 3: Remove some numbers to change the ordering, noting that users who miss re‑application are automatically paused after three missed rounds.
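To make the thought experiment concrete, Methods 1 and 2 can be simulated on a toy pool. Everything here is hypothetical: the pool values are invented, Python's `random.Random` merely stands in for the undisclosed generator, and the function names are illustrative.

```python
import random


def winning_positions(seed, pool_size, quota):
    """Distinct positions drawn by a seeded generator."""
    rng = random.Random(seed)
    positions = []
    while len(positions) < quota:
        index = rng.randrange(pool_size)
        if index not in positions:
            positions.append(index)
    return positions


def find_seed(target, pool_size, quota, max_seed=1_000_000):
    """Method 1: search six-digit seeds for one whose draw hits `target`."""
    for seed in range(max_seed):
        if target in winning_positions(seed, pool_size, quota):
            return seed
    return None


pool = [100, 200, 300, 400, 500, 600, 700, 800, 900]  # toy sorted pool
target_id = 500
before = pool.index(target_id)
seed = find_seed(before, len(pool), quota=3)
print(seed, winning_positions(seed, len(pool), 3))

# Method 2: two dummy entries sorted below the target shift it by two slots.
shifted = sorted(pool + [410, 420])
after = shifted.index(target_id)
print(before, after)
```

The point of the simulation is that both the seed and the pool composition determine the outcome, which is why disclosure of both matters.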

Lottery practices in China’s five major cities

Beijing, Hangzhou, Tianjin, Shenzhen, and Guangzhou all use lottery systems; Shanghai and Hangzhou use bidding. All cities’ lottery systems are developed by the same company, sharing the 13‑digit user code format and similar UI designs.

Information disclosure varies: Beijing historically published winner names, but stopped after privacy concerns; Guangzhou only publishes winners without revealing participants or the random seed.

Interesting patterns discovered in the data

Pattern 1

IDs increase with registration time but are not continuous; gaps of 20 or more indicate many registrations that failed verification.
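Gaps of this kind can be found by sorting the serial numbers and comparing neighbours. A minimal sketch, using invented serials and a hypothetical helper `find_gaps`:

```python
def find_gaps(serials, min_gap=20):
    """Return (previous, next, gap) for neighbours at least min_gap apart."""
    ordered = sorted(serials)
    return [
        (a, b, b - a)
        for a, b in zip(ordered, ordered[1:])
        if b - a >= min_gap
    ]


serials = [1001, 1002, 1003, 1030, 1031, 1100]  # invented for illustration
gaps = find_gaps(serials)
print(gaps)
```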

Pattern 2

Each year, thousands of “old” numbers participate for the first time, suggesting many users finally meet the five‑year social security requirement.

Pattern 3

About 2.1 million users have interrupted their lottery participation at some point, accounting for nearly half of all registrations.

Other observations

Controlling the first four random digits can place a newly generated ID at any desired position in the pool.
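Since the pool is sorted by the full numeric ID, the leading four digits dominate the position. The sketch below shows this with a binary search over a toy pool; the pool values and the helper names `id_with_prefix` and `position_for` are hypothetical.

```python
import bisect


def id_with_prefix(prefix, serial):
    """Compose a numeric ID from a 4-digit prefix, "10", and a serial."""
    return int(f"{prefix}10{serial:07d}")


def position_for(pool, new_id):
    """Index at which new_id would be inserted into the sorted pool."""
    return bisect.bisect_left(pool, new_id)


# Toy pool of made-up IDs, sorted by numeric value.
pool = sorted([317100000007, 5821100000042, 9004100000019])
positions = [
    position_for(pool, id_with_prefix(prefix, 50))
    for prefix in ("0001", "6000", "9999")
]
print(positions)
```

The same serial number lands at the front, middle, or end of the pool depending only on the chosen prefix.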

Calls for greater transparency

More openness would be beneficial. Two disclosures in particular would help:

The algorithm that generates the six‑digit random seed

The names of all participants, both winners and non‑winners

Privacy concerns are often cited, yet many cities already disclose winner names publicly.

Miscellaneous reflections

The license‑plate quota system, while intended to limit traffic, restricts resource flow and forces many to resort to underground methods or expensive rentals. Electric‑vehicle lotteries are less competitive, but quality vehicles remain scarce despite subsidies.

Appendix

1. Lottery program pseudo‑code (Python, illustrative)

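import random

slot_size = 9   # size of the lottery pool
seed = 954732   # six-digit seed
quota = 3       # number of winners to draw
# Illustrative pool of slot_size placeholder entries: 'A', 'B', 'C', ...
array = [chr(ord('A') + i) for i in range(slot_size)]
rng = random.Random(seed)  # seeded generator: same seed, same sequence
selected = []
while len(selected) < quota:
    index = rng.randrange(slot_size)  # random position in [0, slot_size)
    select = array[index]
    if select in selected:
        # already selected, skip
        continue
    selected.append(select)  # record the new winner
print(selected)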

2. ROC curve explanation

The ROC (Receiver Operating Characteristic) curve plots the true‑positive rate (vertical axis) against the false‑positive rate (horizontal axis) as the decision threshold varies. The area under the curve (AUC) quantifies model performance; an AUC near 1 indicates excellent discrimination, while 0.5 corresponds to random guessing.
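AUC also has a direct interpretation: it is the probability that a randomly chosen winner receives a higher score than a randomly chosen non-winner. The sketch below computes it by comparing every such pair, which is fine for small examples but too slow for millions of records; the labels and scores are invented, and the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


labels = [1, 0, 1, 0]          # invented: 1 = won, 0 = did not win
scores = [0.9, 0.8, 0.4, 0.1]  # invented model scores
print(auc(labels, scores))
```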

3. The mysterious seventh digit

The seventh digit (counting from one) is the first digit of the seven‑digit auto‑increment primary key, so a smaller value generally means an earlier registration. Earlier registrants have had more lottery attempts and thus a higher chance of having won.

