How the Central Limit Theorem Solves Real-World Probability Problems
This article explains the Central Limit Theorem, shows how a sum of independent identically distributed variables approaches a normal distribution, and demonstrates its practical use through six detailed examples ranging from power supply planning to medical donor matching.
Central Limit Theorem
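Let \(X_1, X_2, \dots, X_n\) be a sequence of independent and identically distributed random variables with mean \(\mu\) and finite variance \(\sigma^2 > 0\). Then for any real number \(z\),
\[
\lim_{n\to\infty} P\left(\frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \le z\right) = \Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-t^2/2}\,dt.
\]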
In practice, the theorem says that for i.i.d. variables and a sufficiently large sample size \(n\) (a common rule of thumb is at least 30, with larger \(n\) giving a better approximation), the standardized sum \(\frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}}\) approximately follows a standard normal distribution. Equivalently, the sum \(\sum_{i=1}^n X_i\) approximately follows a normal distribution with mean \(n\mu\) and variance \(n\sigma^2\).
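A quick simulation can make the statement concrete. The sketch below is only an illustration, not part of the original article: it assumes Uniform(0, 1) summands, \(n = 30\), 20 000 trials, and a fixed seed, all chosen arbitrarily, and compares the empirical frequency of the standardized sum falling below \(z = 1\) with \(\Phi(1)\).

```python
import math
import random
from statistics import NormalDist

def standardized_sum(n, rng):
    """Standardized sum of n Uniform(0, 1) draws (mean 1/2, variance 1/12)."""
    mu, sigma = 0.5, math.sqrt(1 / 12)
    total = sum(rng.random() for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(0)            # fixed seed so the run is repeatable
n, trials, z = 30, 20_000, 1.0    # illustrative choices, not from the article

# Fraction of simulated standardized sums that fall at or below z
hits = sum(standardized_sum(n, rng) <= z for _ in range(trials))
empirical = hits / trials

# Value predicted by the Central Limit Theorem
theoretical = NormalDist().cdf(z)

print(f"empirical {empirical:.4f} vs Phi(z) {theoretical:.4f}")
```

The two printed numbers should agree to roughly two decimal places; swapping in a more skewed summand distribution typically requires a larger \(n\) for the same agreement.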
Example 1
A workshop has 200 identical lathes, each drawing 10 kW when running. Each lathe runs independently with probability \(p\). How much total power must be supplied so that the probability that the supply covers the demand is at least a given level?
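Solution: Let \(X_i\) denote the power consumption of the \(i\)‑th lathe (10 kW with probability \(p\) if it is running, 0 otherwise). The total power demand \(S=\sum_{i=1}^{200} X_i\) is a sum of i.i.d. variables with mean \(2000p\) and variance \(20\,000\,p(1-p)\). By the Central Limit Theorem, \(S\) is approximately normal, so a supply of about \(2000p + z_{\alpha}\sqrt{20\,000\,p(1-p)}\) kW, where \(z_{\alpha}\) is the upper \(\alpha\) quantile of the standard normal distribution, covers the demand with probability \(1-\alpha\). The result depends on \(p\); for example, it can come to about 1 800 kW at 95% reliability, saving about 200 kW compared with the worst‑case 2 000 kW.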
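The quantile calculation can be sketched in a few lines. The article does not fix \(p\), so the value \(p = 0.85\) below is purely an assumption for illustration, as are the function name `required_power` and the choice to round up to a whole number of running lathes.

```python
import math
from statistics import NormalDist

def required_power(n_lathes, kw_each, p, reliability):
    """Power (kW) covering demand with the given probability, via the normal approximation."""
    mean = n_lathes * p                          # expected number of running lathes
    sd = math.sqrt(n_lathes * p * (1 - p))       # binomial standard deviation
    z = NormalDist().inv_cdf(reliability)        # standard normal quantile
    running = math.ceil(mean + z * sd)           # round up to whole lathes
    return min(running, n_lathes) * kw_each      # never more than the worst case

# p = 0.85 is a hypothetical value; the article leaves p unspecified.
power = required_power(n_lathes=200, kw_each=10, p=0.85, reliability=0.95)
print(power, "kW")
```

Under these assumed inputs the sketch asks for enough power to run 179 lathes, which is in line with the roughly 1 800 kW figure quoted in the example.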
Example 2
In a city of one million people, the probability that a person needs an ambulance for an acute illness is 1/20 000. How many ambulances must the emergency center have to ensure a given probability that a call can be answered promptly?
Solution: Define \(X_i\) as the indicator that the \(i\)‑th person needs an ambulance. The total number of calls \(S=\sum_{i=1}^{10^6} X_i\) has mean \(10^6/20\,000 = 50\) and variance close to 50, and by the CLT it is approximately normal. The required number of ambulances is the smallest \(c\) for which \(\Phi\left((c-50)/\sqrt{50}\right)\) reaches the desired service level. The quoted answer of 67 corresponds to a service level of about 99%.
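As a rough check of this calculation, the sketch below solves for the fleet size directly. The 99% service level is an assumption (the article only says "a given probability"), and `ambulances_needed` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def ambulances_needed(population, p_call, service_level):
    """Smallest c with P(S <= c) >= service_level under the normal approximation."""
    mean = population * p_call
    sd = math.sqrt(population * p_call * (1 - p_call))
    z = NormalDist().inv_cdf(service_level)
    return math.ceil(mean + z * sd)

# Service level 0.99 is an assumed value, not stated in the article.
ambulances = ambulances_needed(1_000_000, 1 / 20_000, 0.99)
print(ambulances)
```

With the assumed 99% level the sketch reproduces the figure of 67 ambulances; a stricter level such as 99.9% would push the number higher.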
Example 3
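A supermarket runs a prize draw: for every ¥50 spent, a customer receives a ticket. Among 10 000 tickets there are 1 first prize, 10 second prizes, 100 third prizes, and 1 000 encouragement prizes (a customer may win more than once). Questions: (1) If a customer spends ¥3 000, what is the probability of winning at least three prizes? (2) How much must a customer spend to have at least a given probability of winning a second prize or better?
Solution: Spending ¥3 000 yields 60 tickets. Treating the tickets as independent, each one wins some prize with probability 1 111/10 000 and a second prize or better with probability 11/10 000. The number of winning tickets is then a sum of independent Bernoulli variables, and the CLT approximates its distribution. For question (1), the count of winning tickets has mean about 6.67 and variance about 5.93, from which the normal approximation gives the probability of at least three wins. For question (2), solving the corresponding normal‑approximation inequality for the number of tickets yields the required spending amount, which turns out to be a very large figure.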
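The two questions can be sketched numerically as follows. This is an illustration under stated assumptions rather than the article's own working: tickets are treated as independent, the 95% target in question (2) is hypothetical, and question (2) is solved with the exact complement formula \(1-(1-q)^n\) instead of the normal approximation, because the event is rare. An exact binomial tail is included for comparison with the CLT value.

```python
import math
from statistics import NormalDist

TICKET_PRICE = 50
P_ANY_PRIZE = 1111 / 10_000          # 1 + 10 + 100 + 1 000 winning tickets
P_SECOND_OR_BETTER = 11 / 10_000     # 1 first prize + 10 second prizes

def p_at_least_normal(k, n, p):
    """Normal approximation to P(S >= k) for S ~ Binomial(n, p), with continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return 1 - NormalDist(mean, sd).cdf(k - 0.5)

def p_at_least_exact(k, n, p):
    """Exact binomial tail P(S >= k), for comparison."""
    below = sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))
    return 1 - below

# Question (1): 3 000 yuan buys 60 tickets
tickets = 3000 // TICKET_PRICE
p_normal = p_at_least_normal(3, tickets, P_ANY_PRIZE)
p_exact = p_at_least_exact(3, tickets, P_ANY_PRIZE)

# Question (2): smallest n with 1 - (1 - q)^n >= target (target is an assumed value)
target = 0.95
tickets_needed = math.ceil(math.log(1 - target) / math.log1p(-P_SECOND_OR_BETTER))
spend_needed = tickets_needed * TICKET_PRICE

print(f"P(at least 3 prizes): normal {p_normal:.3f}, exact {p_exact:.3f}")
print(f"tickets needed: {tickets_needed}, spending: {spend_needed} yuan")
```

With only 60 tickets the normal approximation is noticeably rougher than in the earlier examples, which is why the exact tail is worth computing alongside it.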
Example 4
When a computer adds numbers after rounding each to the nearest integer, the rounding error for each number follows a uniform distribution on \([-0.5,0.5]\). Assuming the errors are independent, what is the probability that the absolute total error of adding 1 500 such numbers is less than 15?
Solution: Let \(E_i\) be the rounding error of the \(i\)‑th number. Each \(E_i\) is uniform on \([-0.5,0.5]\) with mean 0 and variance \(1/12\). The total error \(E=\sum_{i=1}^{1500} E_i\) is approximately normal with mean 0 and variance \(1500/12 = 125\), that is, standard deviation about 11.18. Using the normal CDF, \(P(|E|<15) \approx 2\Phi(15/11.18) - 1 = 2\Phi(1.34) - 1 \approx 0.82\).
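The normal-CDF step can be checked with a short sketch that uses only the quantities given in the example (1 500 independent errors, each uniform on \([-0.5, 0.5]\)); the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 1500
var_single = 1 / 12                      # variance of Uniform(-0.5, 0.5)
sigma_total = math.sqrt(n * var_single)  # sqrt(125), about 11.18

# CLT approximation: total error ~ Normal(0, sigma_total)
total_error = NormalDist(0, sigma_total)
prob = total_error.cdf(15) - total_error.cdf(-15)

print(f"P(|E| < 15) is approximately {prob:.4f}")
```

Under these assumptions the probability comes out near 0.82, so the accumulated rounding error stays below 15 in roughly four runs out of five.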
Example 5
Between unrelated individuals, the probability of a bone‑marrow match is 1/100 000. A leukemia patient needs a match. (1) If a marrow bank contains 200 000 donor records, what is the probability of finding a match? (2) How many records are needed to achieve a desired match probability \(p\)?
Solution: Let \(X_i\) be the indicator that the \(i\)‑th donor matches the patient. The total number of matches \(S=\sum_{i=1}^N X_i\) is approximated by a normal distribution. For \(N=200 000\), the expected number of matches is 2 and the variance is close to 2, so the normal approximation with continuity correction gives \(P(S \ge 1) \approx 1 - \Phi\left((0.5-2)/\sqrt{2}\right) \approx 0.86\), close to the exact value \(1-(1-10^{-5})^{200\,000} \approx 0.865\). Solving the normal‑approximation equation for a target probability yields the required bank size, which turns out to be very large.
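A short sketch can compare the normal approximation with the exact calculation for question (1) and estimate a bank size for question (2). The 95% target is an assumption (the article leaves the target probability symbolic), and the bank size is computed from the exact complement formula rather than the normal approximation, since a single rare match is a case where the normal curve fits poorly.

```python
import math
from statistics import NormalDist

P_MATCH = 1 / 100_000

def match_prob_exact(n_records, p=P_MATCH):
    """Exact probability of at least one match among n_records independent donors."""
    return 1 - math.exp(n_records * math.log1p(-p))

def match_prob_normal(n_records, p=P_MATCH):
    """Normal approximation to P(S >= 1), with continuity correction."""
    mean = n_records * p
    sd = math.sqrt(n_records * p * (1 - p))
    return 1 - NormalDist(mean, sd).cdf(0.5)

def records_needed(target, p=P_MATCH):
    """Smallest bank size whose exact match probability reaches the target."""
    return math.ceil(math.log(1 - target) / math.log1p(-p))

p_exact = match_prob_exact(200_000)
p_normal = match_prob_normal(200_000)
records = records_needed(0.95)   # 0.95 is an assumed target, not from the article

print(f"exact {p_exact:.4f}, normal approximation {p_normal:.4f}")
print(f"records needed for a 95% match probability: {records}")
```

The two probabilities for 200 000 records agree to within about one percentage point, and the required bank size for the assumed 95% target is close to 300 000 records.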
Example 6
From a large batch of seeds, 600 are randomly selected and 93 are found to be high‑quality. Construct a confidence interval for the true proportion of high‑quality seeds with confidence level \(1-\alpha\).
Solution: Let \(X_i\) be the indicator that the \(i\)‑th sampled seed is high‑quality. The sample proportion \(\hat p = 93/600 = 0.155\) estimates the true proportion \(p\). By the CLT, \(\hat p\) is approximately normal with mean \(p\) and variance \(p(1-p)/600\). Replacing \(p\) by \(\hat p\) in the variance and using the standard normal quantile \(z_{\alpha/2}\) for the chosen confidence level gives the interval \(\hat p \pm z_{\alpha/2}\sqrt{\hat p(1-\hat p)/600}\). Here the estimated standard error is \(\sqrt{0.155 \times 0.845/600} \approx 0.0148\), so the interval is \(0.155 \pm 0.0148\,z_{\alpha/2}\).
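The interval formula translates directly into code. The sketch below assumes a 95% confidence level for the sake of a concrete number (the article keeps the level as \(1-\alpha\)); `wald_interval` is a hypothetical helper name for this normal-approximation interval.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)            # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # upper alpha/2 quantile
    return p_hat - z * se, p_hat + z * se

# Confidence level 0.95 is an assumed value for illustration.
low, high = wald_interval(successes=93, n=600, confidence=0.95)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

For the assumed 95% level this gives an interval of roughly 12.6% to 18.4% for the share of high-quality seeds.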
These examples illustrate that whenever a random variable can be expressed as the sum of many independent and identically distributed components, the Central Limit Theorem allows us to approximate probabilities that would otherwise be difficult to compute. Moreover, for large samples the theorem underpins interval estimation and hypothesis testing for non‑normal populations, acting as a bridge between probability theory and mathematical statistics.
Reference:
Huatiantui. “On the Central Limit Theorem in Mathematical Modeling.” Journal of Suzhou Vocational University, 2002(03):22‑24. DOI:10.16219/j.cnki.szxbzk.2002.03.007.
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".