The Story of Machine Learning: Why Machines Can Learn and How Statistical Learning Makes It Possible
This article explains why machine learning must rest on statistical learning over large datasets. It illustrates human learning through induction and deduction, presents case studies that expose the limits of anecdotal reasoning, and introduces the law of large numbers and probabilistic trust as foundations for reliable AI models.
Course: Machine Learning Storytelling
Instructor: Bi Ran, Chief Architect at Baidu
Editor: Hoh Xil
Source: Machine Learning Training Camp
Platform: Baidu Tech Academy, PaddlePaddle, DataFun
Reading Guide
Chapter 1, "Machine Learning and Big Data," explains why we should hop on the big‑data carriage and is divided into four lessons:
1) Is machine learning possible? Why can machines learn?
2) How does machine learning work? What steps does a machine take to acquire knowledge from the real world?
3) The value of big data: not just hype, but what big data truly means for machine learning and AI from a practitioner's perspective.
4) Riding the big-data carriage: why businesses across industries want to adopt big data, what benefits it brings, and how to transform business with those benefits.
First, let’s ask: can machines learn?
Before answering, we look at how humans learn. Humans often draw inspiration from nature, e.g., inventing airplanes after observing birds. A primitive person who repeatedly sees dark clouds gather and rain follow learns to seek shelter when clouds mass overhead. This learning consists of two steps: induction (generalizing that dark clouds bring rain) and deduction (seeing clouds gather, predicting rain and taking shelter). The same process underlies modern learning, such as abstracting the rule of addition from many concrete examples.
Human learning also relies on statistical observation. Consider a friend who argues that smoking does not affect lifespan because some famous smokers live long lives. Large-scale statistics tell a different story: in a sample of 3,000 people split equally between smokers and non-smokers, individual cases can still mislead, but the average lifespan of smokers is about five years shorter than that of non-smokers.
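The contrast between anecdote and average can be simulated in a few lines. This is a minimal sketch with invented lifespan distributions (the means and spread below are illustrative assumptions, not real epidemiological data); it shows how a striking individual case can coexist with a clear gap between group averages:

```python
import random

random.seed(1)

# Hypothetical lifespan distributions (years); the parameters are
# illustrative assumptions only, not real medical data.
smokers = [random.gauss(72, 10) for _ in range(1500)]
non_smokers = [random.gauss(77, 10) for _ in range(1500)]

avg_smokers = sum(smokers) / len(smokers)
avg_non_smokers = sum(non_smokers) / len(non_smokers)

# A few smokers outlive the average non-smoker, which fuels the anecdote...
print("longest-lived smoker beats non-smoker average:",
      max(smokers) > avg_non_smokers)

# ...but the group averages tell the real story: roughly a five-year gap.
print("average gap in years:", round(avg_non_smokers - avg_smokers, 1))
```

The point of the sketch is the second print: outliers exist in both groups, yet with 1,500 samples per group the difference in means is stable and hard to explain away.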
Other anecdotes illustrate how people form conclusions from limited cases: a diligent graduate who feels unlucky because peers achieve great success, or entrepreneurs who attribute success to effort while blaming failure on missed opportunities. These examples show that learning from a few cases can lead to biased worldviews.
The key message is that learning must be based on large numbers of observations – statistical learning – rather than isolated anecdotes.
Can we trust statistics?
Three illustrative questions are presented:
Ball jar: drawing 10 balls (7 green, 3 yellow) – can we infer the jar’s true proportion?
Pattern classification: given two groups of images (A and B), which class does a new image belong to?
Relationship between X and Y: given five points, what is the underlying function?
The answers demonstrate that small samples cannot reliably reveal the true distribution; only with enough observations does the statistical estimate converge to the real value.
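The ball-jar question can be made concrete with a short simulation. This is a sketch under an assumed ground truth (a jar that is 70% green, matching the 7-of-10 draw in the example); it shows how the sampled green fraction wanders for small draws but converges as the number of draws grows:

```python
import random

random.seed(0)

TRUE_GREEN = 0.7  # assumed true proportion of green balls in the jar


def sample_green_fraction(n_draws):
    """Draw n_draws balls with replacement; return the observed green fraction."""
    greens = sum(1 for _ in range(n_draws) if random.random() < TRUE_GREEN)
    return greens / n_draws


for n in (10, 100, 10_000):
    print(n, "draws ->", round(sample_green_fraction(n), 3))
```

With 10 draws the estimate can easily miss the true 0.7 by a wide margin; with 10,000 it reliably lands within a percent or two, which is exactly the convergence the three questions are driving at.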
Statistical learning rests on the Law of Large Numbers. As the number of samples N grows, the probability that the empirical value v deviates from the true value μ by more than a tolerance ε approaches zero. Hoeffding's inequality makes this quantitative: P(|v − μ| > ε) ≤ 2e^(−2ε²N).
This inequality shows that with enough data, the statistical estimate becomes highly trustworthy. The concept of "probabilistic trust" (or PAC – Probably Approximately Correct) follows: we do not claim absolute certainty, but we express confidence as a probability interval around the estimate.
As more data are gathered, the confidence in statistical conclusions increases, which directly links to the value of big data discussed later. Revisiting the earlier X‑Y example, with many samples the relationship is far more likely to be linear than a coincidental curve.
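The X-Y example can be sketched numerically. Assuming a synthetic linear ground truth with noise (the function y = 2x + 1 and the noise level are invented for illustration), a degree-4 polynomial passes exactly through 5 points, yet once hundreds of samples arrive, an ordinary linear fit recovers the true coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed ground truth for illustration: y = 2x + 1 plus Gaussian noise.
def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = 2 * x + 1 + rng.normal(0, 0.2, n)
    return x, y

# With only 5 points, a degree-4 polynomial interpolates them exactly,
# so the "coincidental curve" fits as well as the true line does.
x5, y5 = make_data(5)
quartic = np.polyfit(x5, y5, 4)

# With many points, the simple linear model stands out: its fitted
# coefficients land close to the true slope 2 and intercept 1.
x_big, y_big = make_data(500)
linear = np.polyfit(x_big, y_big, 1)
print("linear fit (slope, intercept):", np.round(linear, 2))
```

Nothing rules out the quartic on 5 points; only the volume of data makes the linear explanation overwhelmingly more plausible, which is the point of revisiting the example.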
In summary, machine learning mirrors human learning (induction + deduction) but must rely on statistical learning rather than isolated cases. The foundation of statistical learning is the Law of Large Numbers and probabilistic trust, which enable machines to acquire reliable knowledge from massive data.
Next, we will explore step‑by‑step how machines actually perform learning.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.