An Introduction to Machine Learning: Concepts, Learning Path, and Knowledge System
This article gives a comprehensive overview of machine learning. It explains core AI terminology, distinguishes statistics, statistical learning, and machine learning, outlines a three-part learning roadmap (mathematical foundations, algorithms, and Python programming practice), and offers curated resources for building a solid knowledge system.
Written by Wang Maolin of Huazhong University of Science and Technology and originally published by Datawhale, this structured guide is meant to help beginners enter the field of machine learning.
Part 1: Machine Learning Related Concepts
The field of artificial intelligence (AI) involves many related terms, such as machine learning, statistical learning, data science, data analysis, data mining, and deep learning. The article first clarifies the difference between statistics and statistical learning, noting that statistical learning draws heavily on statistical foundations.
Machine learning is defined as a set of established algorithms or methods that enable computers to learn rules or mappings from data. It originated in statistical learning, so the two concepts are largely equivalent; the main exception is deep learning, which is counted as part of machine learning but lies outside classical statistical learning.
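To make "learning a mapping from data" concrete, here is a minimal sketch, assuming NumPy is installed. The toy data and the target rule y = 2x + 1 are hypothetical; the method is ordinary least squares, one of the simplest statistical-learning techniques, used here only to recover a rule from examples.

```python
import numpy as np

# Hypothetical toy data: the "rule" we hope to recover is y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the model can also learn an intercept
X = np.column_stack([x, np.ones_like(x)])

# Least squares: find (w, b) minimizing ||X @ [w, b] - y||^2
(w, b), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"learned mapping: y = {w:.2f} * x + {b:.2f}")
# prints: learned mapping: y = 2.00 * x + 1.00

# The learned mapping can now be applied to an unseen input
print(w * 10.0 + b)  # approximately 21.0
```

The computer is never told the rule; it infers w and b from the example pairs, which is the essence of the definition above.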
Data science and data analysis are briefly described, and the relationship between statistical learning, machine learning, and deep learning is illustrated.
Part 2: The Overall Process of Learning Machine Learning
The learning process is divided into three interrelated blocks: mathematical foundations, machine learning algorithms, and programming practice. The article emphasizes that these blocks are not strictly sequential; learners can start with algorithms or programming and supplement mathematics as needed.
Mathematical Foundations
Key topics include derivatives from calculus, matrix operations from linear algebra, probability theory (the law of total probability, conditional probability), information theory (entropy), and optimization methods such as gradient descent and the Karush-Kuhn-Tucker (KKT) conditions. Advanced topics for research include learnability, complexity, generalization, and variational methods.
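Two of these topics are easy to try directly in code. The minimal sketch below, in plain Python, runs gradient descent using a hand-computed derivative and computes Shannon entropy in bits. The function names, the example objective f(x) = (x - 3)^2, the learning rate, and the step count are hypothetical choices for illustration.

```python
import math

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def entropy(probs):
    """Shannon entropy in bits: H(p) = -sum p_i * log2(p_i), skipping p_i = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Minimize f(x) = (x - 3)^2, whose derivative is f'(x) = 2 * (x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)

print(round(x_min, 4))         # → 3.0 (the minimizer of f)
print(entropy([0.5, 0.5]))     # → 1.0 (a fair coin carries one bit)
print(entropy([0.25] * 4))     # → 2.0 (four equally likely outcomes)
```

Each gradient step here shrinks the distance to the minimum by a factor of 0.8, so 100 steps are more than enough; a learning rate that is too large would make the iterates overshoot and diverge instead.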
Machine Learning Algorithms
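Algorithms fall into two groups: traditional machine learning (e.g., linear regression, logistic regression, decision trees) and deep learning (artificial neural networks, convolutional neural networks, and recurrent neural networks, i.e. ANN, CNN, RNN). Each model can be studied independently of the others.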
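As one concrete traditional model, here is a minimal NumPy sketch of logistic regression trained by batch gradient descent on the average cross-entropy loss. The helper names, learning rate, epoch count, and toy data are all hypothetical; in practice a library implementation such as scikit-learn's would normally be used.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by gradient descent on the average cross-entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)   # predicted probability of class 1
        error = p - y            # derivative of the loss with respect to the logits
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

def predict(X, w, b, threshold=0.5):
    """Return class 1 where the predicted probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Hypothetical toy data: one feature, class 1 when x is positive
X = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = train_logistic_regression(X, y)
print(predict(X, w, b))  # → [0 0 0 1 1 1]
```

The toy feature is centered at zero, which keeps this example well conditioned; on real data, standardizing features usually helps gradient descent converge for the same reason.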
Programming Practice
Python is recommended as the most accessible language for machine learning. Learners should start with basic syntax, then move on to NumPy and Pandas, followed by the scikit-learn framework for traditional algorithms and Keras for deep learning, practicing each tool alongside the corresponding theory.
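The NumPy, Pandas, and scikit-learn part of this progression fits in a few lines. The minimal sketch below assumes all three libraries are installed; the synthetic dataset, its column names, and the underlying rule y = 3*x1 - 2*x2 are hypothetical, and the Keras step is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# NumPy: generate a hypothetical synthetic dataset, y = 3 * x1 - 2 * x2 + noise
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "x1": rng.uniform(0, 10, size=200),
    "x2": rng.uniform(0, 10, size=200),
})
df["y"] = 3 * df["x1"] - 2 * df["x2"] + rng.normal(0, 0.5, size=200)

# Pandas: inspect summary statistics and select columns by name
print(df.describe().round(2))

# scikit-learn: hold out a test set, then fit and evaluate a model
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["y"], test_size=0.25, random_state=0
)

model = LinearRegression()  # scikit-learn estimators share the fit / predict interface
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("coefficients:", model.coef_.round(2))
print("test MSE:", round(mean_squared_error(y_test, y_pred), 3))
```

Because every scikit-learn estimator exposes the same `fit` and `predict` methods, swapping `LinearRegression` for another model (for example a decision tree) changes only the line that constructs it.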
Part 3: Machine Learning Knowledge System
The article proposes a three-part knowledge system (machine learning theory, algorithms, and practice) and lists recommended resources and Datawhale projects for each area:
Key Book (theory supplement): https://github.com/datawhalechina/key-book
Pumpkin Book (detailed derivations accompanying Zhou Zhihua's Machine Learning textbook): https://github.com/datawhalechina/pumpkin-book
Easy-RL (deep reinforcement learning tutorial): https://github.com/datawhalechina/easy-rl
LeeML-Notes (notes on Li Hongyi's machine learning course): https://github.com/datawhalechina/leeml-notes
The article concludes by thanking readers and encouraging them to like, share, and join the DataFunTalk machine learning community.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.