Why LLMs Pick 7200–7500: The Hidden Bias Behind Their ‘Random’ Numbers
A Reddit experiment showed that major LLMs consistently choose numbers between 7,200 and 7,500 when asked to pick a random value from 1 to 10,000, revealing deterministic patterns rooted in training-data bias and human heuristics, with serious security implications.
Experiment Overview
A Reddit user bet their house that any mainstream LLM asked to pick a random number between 1 and 10,000 would almost always return a value in the 7,200–7,500 range. Hundreds of participants confirmed the prediction, with results like 7428, 7284, 7342, and so on.
Observed Patterns
When the range is limited to 1–10, the models overwhelmingly output 7 (≈90%+).
When no range is given, the most frequent digit is 7.
Many of the chosen numbers are permutations of the digits [7, 2, 4, 8].
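The permutation family is strikingly small. A quick toy check (the digit set and range are taken from the experiment above) shows that only four permutations of 7, 2, 4, 8 even fall inside the observed cluster, and two of them match results reported in the thread:

```python
from itertools import permutations

# All 4-digit numbers formed by rearranging the digits 7, 2, 4, 8
candidates = {int("".join(p)) for p in permutations("7248")}

# Keep only those inside the observed 7,200-7,500 cluster
in_cluster = sorted(n for n in candidates if 7200 <= n <= 7500)
print(in_cluster)  # → [7248, 7284, 7428, 7482]
```

Note that not every reported answer fits this pattern (7342 contains a 3), so the permutation observation describes a tendency, not a rule.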
Model Benchmarks
Independent testing on sanand0.github.io measured several models:
GPT‑4o : 7 appears in 92% of 1–10 trials; 37 appears in the 1–100 range.
Claude 3.5 Sonnet : 7 appears in 90% of 1–10 trials; 37 in 1–100.
Gemini 2.0 Flash : 7 appears in 100% of 1–10 trials; 47 in 1–100.
GPT‑3.5 Turbo : 7 dominates 1–10; 47 in 1–100.
Claude 3 Haiku : No fixed 1–10 digit; 42 appears in 1–100 (cultural meme).
Gemini’s 100% rate for 7 in the 1–10 range demonstrates deterministic clustering rather than true randomness.
Human Randomness Bias
Humans are notoriously poor random number generators. Large‑scale surveys (e.g., Veritasium’s 200,000‑person study) show a strong preference for the number 37, which is a prime, not a multiple of 5 or 10, and feels “interesting.” Similar heuristics extrapolated to the 1–10,000 range produce the 7,200–7,500 cluster:
Thousands digit = 7 (perceived randomness).
Value is above the median, feeling “larger” and thus “more random.”
Avoids the extreme 10,000, preventing an obvious pattern.
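The heuristics above can be made concrete with a toy filter. The specific rules below are an illustrative interpretation, not a model of any actual LLM, but they show how a few "feels random" constraints collapse 10,000 options down to roughly 500:

```python
def feels_random(n: int) -> bool:
    """Toy encoding of the human heuristics listed above (illustrative only)."""
    s = str(n)
    return (
        n // 1000 == 7          # thousands digit 7 ("random-feeling" digit)
        and n > 5000            # above the median, so it feels "larger"
        and n % 1000 != 0       # not a round thousand
        and n % 100 != 0        # not a round hundred either
        and len(set(s)) == 4    # all digits distinct, no obvious repetition
    )

survivors = [n for n in range(1, 10_001) if feels_random(n)]
print(len(survivors))  # → 504
```

Even this crude filter eliminates about 95% of the range; the models' learned preferences evidently narrow things further, down to the 7,200–7,500 band.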
These human tendencies are encoded in LLM training data, causing the models to replicate them.
Why LLMs Are Not Truly Random
LLMs are fundamentally deterministic functions (with optional stochastic sampling). Given a prompt, they compute token probabilities based solely on patterns learned during training; they contain no internal entropy source such as os.urandom(). When asked to “pick a random number,” they generate the most statistically likely continuation based on millions of human examples.
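A minimal sketch makes the determinism concrete. The logits below are invented for illustration, standing in for a learned next-token distribution where "7" is over-represented in human text; greedy decoding is then a pure function of those logits, so the same prompt always yields the same answer:

```python
import math

# Toy next-token logits for the prompt "pick a number from 1 to 10"
# (invented values; "7" is heavily over-represented, as in human text).
logits = {"1": 0.1, "2": 0.3, "3": 0.8, "4": 0.5, "5": 0.6,
          "6": 0.4, "7": 3.0, "8": 0.7, "9": 0.2, "10": 0.1}

def softmax(scores):
    z = max(scores.values())                      # subtract max for stability
    exps = {t: math.exp(s - z) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

probs = softmax(logits)
# Greedy decoding: no entropy source anywhere, so the answer never varies.
print(max(probs, key=probs.get))  # → 7
```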
"The LLM is merely reproducing human cognitive biases." – arXiv:2502.19965 (2025)
Recent work (arXiv:2406.00092, 2024) quantifies that LLMs are twice as far from true randomness as humans are.
Increasing the temperature flattens the probability distribution but does not eliminate the learned bias toward the 7,200–7,500 region.
Security Implications
When LLMs are used to generate passwords, the same bias leads to dangerously predictable results. Empirical data (Bruce Schneier, 2026) shows:
Claude Opus 4.6 produced only 30 unique passwords in 50 attempts, with one password appearing 18 times (36% hit rate).
GPT‑5.2 generated passwords almost always starting with the letter “v.”
Gemini 3 Flash consistently began passwords with “K” or “k.”
Such patterns shrink the effective search space for attackers. Moreover, code‑assistant tools (Claude Code, GitHub Copilot, Gemini CLI) have been observed to emit passwords without explicit requests, leaking them to public repositories.
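The severity is easy to quantify with min-entropy, the standard metric for guessability. Using the reported Claude Opus figures (one password appearing 18 times in 50 attempts), the attacker's best first guess succeeds with probability 0.36, which works out to under 1.5 bits of min-entropy; a uniformly random 12-character password over the 94 printable ASCII characters would provide nearly 79 bits:

```python
import math

# Reported figures: 50 password requests, most common password seen 18 times.
trials, most_common = 50, 18
p_max = most_common / trials           # 0.36 chance the best single guess wins

# Min-entropy: -log2 of the most probable outcome's probability.
min_entropy_bits = -math.log2(p_max)
print(f"{min_entropy_bits:.2f} bits")  # ≈ 1.47 bits

# Compare: a uniform 12-char password over 94 printable ASCII characters.
uniform_bits = 12 * math.log2(94)
print(f"{uniform_bits:.1f} bits")      # ≈ 78.7 bits
```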
Medical decision‑making systems that rely on LLM‑generated randomness also inherit these biases, posing risks beyond simple password generation.
Practical Takeaways
Do not rely on LLMs for true randomness. Use cryptographically secure RNGs such as crypto/rand (Go), the secrets module or os.urandom() (Python), or SecureRandom (Java) for passwords, nonces, and sampling.
Model outputs reflect training‑data distributions. Unexpectedly specific answers often indicate over‑represented patterns in the corpus.
Bias propagates. The “7 effect” and the 7,200–7,500 clustering are benign examples of a broader phenomenon affecting demographics, sentiment analysis, and factual statements.
Tool calls are the only reliable fix. Route randomness‑requiring steps to external functions (e.g., via function calling APIs) or invoke a separate RNG process.
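The tool-call approach can be sketched in a few lines. The registry and dispatcher names below are hypothetical (a real system would sit behind a function-calling API), but the key point is that the number comes from the OS entropy pool via Python's secrets module, not from the model:

```python
import secrets

def random_int(low: int, high: int) -> int:
    """Uniform integer in [low, high], drawn from the OS entropy pool."""
    return low + secrets.randbelow(high - low + 1)

# Hypothetical tool registry: the model emits a tool call instead of a
# number, and the host application routes it to a real CSPRNG.
TOOLS = {"random_int": random_int}

def handle_tool_call(name: str, args: dict):
    return TOOLS[name](**args)

n = handle_tool_call("random_int", {"low": 1, "high": 10_000})
print(n)  # uniform over 1-10,000 -- no 7,200-7,500 clustering
```

Unlike the model's own output, repeated calls here produce a genuinely uniform spread across the whole range.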
Conclusion
The Reddit number‑picking experiment succinctly demonstrates that LLMs are deterministic pattern‑matchers, reproducing human biases embedded in their training data. This leads to predictable numeric outputs, security‑critical weaknesses in password generation, and broader implications for any application that assumes LLMs can produce genuine randomness.
