Introduction to Machine Learning Concepts: Data, Features, Labels, Training, and Common Algorithms

This article provides a beginner-friendly overview of machine learning fundamentals, covering the definition of data, the distinction between features and labels, types of features, dimensionality, training and test datasets, normalization, supervised and unsupervised learning methods, algorithm selection, development workflow, and recommended Python libraries such as NumPy.

Python Programming Learning Circle

Mar 6, 2020

Machine learning often feels lofty because of its terminology; this article explains the core concepts in plain language.

1. Data

In programming we frequently work with databases. A row in a database is a record, and each record contains a number of attributes. These attributes correspond directly to features in machine learning.

2. Features

Each attribute of a record is called a feature in machine learning. Features are divided into discrete and continuous types.

2.1 Discrete Features

The terms discrete and continuous carry over from high‑school math. As a running example, take a Bird class with the attributes weight, length, fins, color, and type. Every attribute other than type is a feature:

Color is an enumerated feature: it takes one value from a fixed set, so it is discrete.

Weight and length are numerical features.

Fins is a boolean feature (also called a binary feature), which is also discrete.

Type is the target variable.

2.2 Continuous Features

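Continuous features can take any value within an interval rather than one of a fixed set of values. For example, an air temperature may fall anywhere in the range 19~28°C, and water is frozen at any temperature from 0°C downward.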

3. Dimensionality

The number of features equals the dimensionality. For the Bird class with four features, each data vector has four dimensions.
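As a minimal sketch of this idea, one Bird record can be stored as a NumPy vector whose length is the dimensionality. The values, the units, and the integer encoding of color are made up for illustration and do not come from a real dataset.

```python
import numpy

# Hypothetical encoding of color (an enumerated feature) as integers.
COLOR_CODES = {"brown": 0, "gray": 1, "black": 2}

# One Bird record: weight, length, fins (boolean), color (enumerated).
# The target variable "type" is stored separately and is not a feature.
bird_features = numpy.array([1000.1, 125.0, float(False), COLOR_CODES["brown"]])
bird_type = "hawk"  # hypothetical label

# The number of features is the dimensionality of the vector.
dimensionality = bird_features.shape[0]
print(dimensionality)  # 4
```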

4. Target Variable (Label)

The target variable is the value the algorithm predicts; in y = x + 1, for example, x is the feature and y is the target. In classification the target is usually categorical, while in regression it is continuous. The training set must include the target so that the algorithm can learn how the features relate to the label.

5. Training

Training (algorithm training) feeds a large set of labeled data to an algorithm, which produces a model. The data used for training is called the training set.

Example: if feeding data to an algorithm yields f(x) = x/2, that function is the model.
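To make the f(x) = x/2 example concrete, here is a small sketch that learns the factor 1/2 from labeled data. It assumes a model of the form f(x) = w * x and uses closed-form least squares; the data points and the names x_train, y_train, and model are invented for illustration.

```python
import numpy

# Made-up labeled training data that follows y = x / 2 exactly.
x_train = numpy.array([2.0, 4.0, 6.0, 8.0])
y_train = numpy.array([1.0, 2.0, 3.0, 4.0])

# Assume a model of the form f(x) = w * x and choose the w that minimizes
# the squared error on the training set (closed-form least squares).
w = numpy.sum(x_train * y_train) / numpy.sum(x_train * x_train)

def model(x):
    """The trained model: applies the learned weight to a new input."""
    return w * x

print(w)          # approximately 0.5
print(model(10))  # approximately 5.0
```

Real algorithms search over far richer families of functions, but the pattern is the same: labeled data goes in, and a function that can be applied to new inputs comes out.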

6. Datasets

6.1 Training Dataset

The training dataset provides input data for model training.

6.2 Test Dataset

The test dataset is used after training to evaluate the model’s performance; it must contain the target variable to measure accuracy and must not be used during training.
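One possible way to separate the two datasets with NumPy is sketched below. The data is made up, and the 80/20 ratio and the fixed seed are assumptions chosen only to keep the example small and repeatable.

```python
import numpy

# Made-up dataset: 10 records with 2 features each, plus one label per record.
features = numpy.arange(20, dtype=float).reshape(10, 2)
labels = numpy.arange(10)

# Shuffle the row indices so the split does not depend on record order.
rng = numpy.random.default_rng(seed=0)  # fixed seed for repeatability
shuffled = rng.permutation(len(features))

# Hypothetical 80/20 split; the ratio is a choice, not a rule.
n_train = int(0.8 * len(features))
train_idx, test_idx = shuffled[:n_train], shuffled[n_train:]

x_train, y_train = features[train_idx], labels[train_idx]
x_test, y_test = features[test_idx], labels[test_idx]

print(x_train.shape, x_test.shape)  # (8, 2) (2, 2)
```

Because the index sets do not overlap, no test record is seen during training.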

7. Normalization

Normalization (numeric scaling) rescales features to the 0~1 range to avoid dominance of large‑scale attributes in distance calculations.

Simple formula: new_value = (old_value - min) / (max - min) where min and max are the feature’s minimum and maximum in the dataset.
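The formula can be applied to a whole dataset at once. The sketch below normalizes each column separately; the function name min_max_normalize and the sample values are made up for illustration.

```python
import numpy

def min_max_normalize(data):
    """Rescale each column (feature) of a 2D array to the 0~1 range."""
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # Note: a column whose values are all equal would divide by zero here.
    return (data - col_min) / (col_max - col_min)

# Made-up records with two features on very different scales.
raw = numpy.array([[1000.0, 100.0],
                   [3000.0, 150.0],
                   [5000.0, 200.0]])

scaled = min_max_normalize(raw)
print(scaled)  # rows become [0, 0], [0.5, 0.5], [1, 1]
```

After scaling, both features contribute on the same 0~1 scale, so the first column no longer dominates a distance calculation.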

8. Supervised Learning

Supervised learning uses labeled data. It requires knowledge of both features and target variable.

8.1 Classification

Classification assigns data to predefined categories; it is a primary task of machine learning.

Banana   Orange   Fork      Cola
  |        |        |         |
Fruit    Fruit    Utensil   Drink
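As a rough sketch of classification, the code below labels a new record with the category of its closest training record (a 1-nearest-neighbor rule). This algorithm is picked here only for illustration, and the two numeric features (weight in grams, length in centimeters) and all values are invented.

```python
import numpy

# Made-up training records: (weight in grams, length in centimeters).
train_features = numpy.array([[120.0, 18.0],   # banana
                              [150.0, 8.0],    # orange
                              [40.0, 19.0],    # fork
                              [330.0, 12.0]])  # cola
train_labels = ["Fruit", "Fruit", "Utensil", "Drink"]

def classify(record):
    """Return the label of the closest training record (1-nearest neighbor)."""
    distances = numpy.linalg.norm(train_features - record, axis=1)
    return train_labels[int(numpy.argmin(distances))]

print(classify(numpy.array([130.0, 9.0])))   # Fruit
print(classify(numpy.array([38.0, 18.0])))   # Utensil
print(classify(numpy.array([300.0, 12.0])))  # Drink
```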

8.2 Regression

Regression predicts continuous values, e.g., fitting a curve to data points.
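A short sketch of curve fitting with numpy.polyfit follows. The data points are made up to lie exactly on y = x**2 + 1, and the choice of a degree-2 polynomial is an assumption; with real data the right degree is not known in advance.

```python
import numpy

# Made-up data points that lie exactly on y = x**2 + 1.
x = numpy.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2 + 1

# Fit a degree-2 polynomial; coefficients come back highest power first.
coefficients = numpy.polyfit(x, y, deg=2)
curve = numpy.poly1d(coefficients)

# Predict a continuous value for an input that was not in the data.
print(curve(3.0))  # approximately 10.0
```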

8.3 Common Supervised Algorithms

(Illustration omitted.)

9. Unsupervised Learning

Unsupervised learning works with only features (no target). It discovers structure such as clusters.

9.1 Clustering

Clustering groups similar data points into clusters.

Apple  Orange     Spoon  Fork     Cola  Sprite
   \    /            \    /          \    /
   Fruit            Utensil          Drink
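The sketch below groups unlabeled points with a bare-bones k-means loop. K-means is used here only as one well-known clustering method; the points and the fixed starting centroids are made up so the result is repeatable.

```python
import numpy

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = centroids.copy()
    for _ in range(iterations):
        # Distance from every point to every centroid, shape (n_points, k).
        differences = points[:, None, :] - centroids[None, :, :]
        distances = numpy.linalg.norm(differences, axis=2)
        assignment = numpy.argmin(distances, axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = points[assignment == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return assignment, centroids

# Made-up 2D points forming two obvious groups; note that there are no labels.
points = numpy.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                      [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])
# Hypothetical starting centroids; real code usually picks these at random.
start = numpy.array([[0.0, 0.0], [10.0, 10.0]])

assignment, centroids = k_means(points, start)
print(assignment)  # [0 0 0 1 1 1]
```

The algorithm only reports that the points fall into group 0 and group 1. Giving those groups names such as Fruit or Utensil is left to a person.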

9.2 Density Estimation

Estimating the statistical distribution of data.
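One simple form of density estimation is to assume the data is Gaussian and estimate its mean and standard deviation. The sketch below does that on synthetic samples; the Gaussian assumption, the sample, and the helper name density are all choices made for this illustration.

```python
import numpy

# Made-up sample: 10,000 draws from a normal distribution with known parameters.
rng = numpy.random.default_rng(seed=0)
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Assume the data is Gaussian and estimate its two parameters from the sample.
estimated_mean = samples.mean()
estimated_std = samples.std(ddof=1)

def density(x):
    """Estimated probability density at x under the fitted Gaussian."""
    z = (x - estimated_mean) / estimated_std
    return numpy.exp(-0.5 * z ** 2) / (estimated_std * numpy.sqrt(2 * numpy.pi))

print(estimated_mean, estimated_std)  # close to 5.0 and 2.0
```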

9.3 Common Unsupervised Algorithms

(Illustration omitted.)

10. Choosing the Right Algorithm

Select an algorithm based on the task and data type: use supervised learning for predicting a target; if the target is discrete, choose a classification algorithm; if continuous, choose regression. If no target is needed, consider unsupervised methods such as clustering or density estimation.
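The decision rule above can be written down as a tiny function. This is only a restatement of the text in code; the function name choose_method and its parameters are hypothetical.

```python
def choose_method(has_target, target_is_discrete=False):
    """Map the two questions from the text to a family of methods."""
    if not has_target:
        return "unsupervised: clustering or density estimation"
    if target_is_discrete:
        return "supervised: classification"
    return "supervised: regression"

print(choose_method(has_target=True, target_is_discrete=True))   # supervised: classification
print(choose_method(has_target=True, target_is_discrete=False))  # supervised: regression
print(choose_method(has_target=False))  # unsupervised: clustering or density estimation
```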

11. Machine Learning Development Steps

Collect data (web crawling, sensors, public datasets).

Prepare input data (formatting, discretization).

Analyze data (missing values, outliers, visualizations).

Train algorithm (choose an appropriate model; unsupervised methods have no target variable, so this step does not apply to them in the same way).

Test algorithm (evaluate model performance).

Deploy algorithm (engineer the model for production).

12. Recommended Language and Libraries

Python is recommended for its powerful libraries and simplicity.

Common libraries: NumPy for vector/matrix operations and Matplotlib for 2D/3D plotting.

13. Common NumPy Operations

13.1 Create Random Data

import numpy
# rand(4, 4) returns a 4x4 array of uniform random floats in [0, 1)
numpy.random.rand(4, 4)

13.2 Convert Array to Matrix

import numpy
random_array = numpy.random.rand(4, 4)
# asmatrix views the array as a numpy.matrix (the numpy.mat alias was removed in NumPy 2.0)
random_matrix = numpy.asmatrix(random_array)

13.3 Matrix Inverse

import numpy
random_matrix = numpy.asmatrix(numpy.random.rand(4, 4))
# .I is the inverse of a numpy.matrix
random_matrix.I

13.4 Matrix Multiplication

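import numpy
random_matrix = numpy.asmatrix(numpy.random.rand(4, 4))
# For numpy.matrix, * is matrix multiplication; a matrix times its inverse is close to the identity
result = random_matrix * random_matrix.I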

13.5 Create Identity Matrix

import numpy
matrix = numpy.eye(4)
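The last two operations can be combined into a quick sanity check: a matrix multiplied by its inverse should come out close to the identity matrix. The sketch below uses plain arrays with numpy.linalg.inv and the @ operator rather than the matrix type, which is a choice made here and not the article's method. The fixed seed is only there to make the run repeatable.

```python
import numpy

numpy.random.seed(0)  # fixed seed so the example is repeatable
random_array = numpy.random.rand(4, 4)
inverse = numpy.linalg.inv(random_array)

# Floating-point rounding leaves tiny errors, so compare with a tolerance
# instead of testing for exact equality.
product = random_array @ inverse
print(numpy.allclose(product, numpy.eye(4)))  # True for a well-conditioned matrix
```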
