From Linear Regression to Transformers: Mastering Machine Learning Foundations

This guide traces the evolution of machine learning from basic linear models and feature engineering, through logistic regression and decision trees, to deep-learning architectures such as MLPs, CNNs, RNNs, and transformers. It includes practical implementations with code examples and evaluation metrics.

Alibaba Cloud Developer

Mar 6, 2025

Overview

The article presents a step‑by‑step exploration of machine‑learning concepts, from traditional algorithms to modern deep‑learning models, and shows how to build, train, evaluate, and deploy them in real‑world scenarios such as recommendation systems.

From Simple Linear Models to Algorithms

The article begins by formulating a linear scoring rule for mini-program quality, introduces feature engineering (e.g., error-rate and best-practice bins), and shows how to compute a total score as a weighted sum of the per-feature scores.
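
A minimal sketch of such a scoring rule is shown here. The bin boundaries, point values, and weights are hypothetical and chosen only for illustration; the source does not give the actual numbers.

```python
# Hypothetical bins and weights, chosen only for illustration.
ERROR_RATE_BINS = [(0.01, 100), (0.05, 60), (1.00, 20)]  # (upper bound, points)
WEIGHTS = {"error_rate": 0.6, "best_practice": 0.4}


def error_rate_points(error_rate):
    """Map a raw error rate in [0, 1] to points via the first matching bin."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    for upper_bound, points in ERROR_RATE_BINS:
        if error_rate <= upper_bound:
            return points


def best_practice_points(passed, total):
    """Score the share of best-practice checks passed on a 0-100 scale."""
    return 100.0 * passed / total


def total_score(error_rate, passed, total):
    """Weighted sum of the per-feature points."""
    return (WEIGHTS["error_rate"] * error_rate_points(error_rate)
            + WEIGHTS["best_practice"] * best_practice_points(passed, total))


# 3% errors falls in the 60-point bin; 8 of 10 checks passed gives 80 points.
score = total_score(error_rate=0.03, passed=8, total=10)
print(score)
```

Hand-set weights like these are exactly what a trained model later replaces: the weighted sum stays, but the weights are learned from labeled data.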

Feature Engineering

Feature engineering is described as the bridge that converts raw business attributes into numeric vectors, including one‑hot, multi‑hot, embedding, normalization, and feature crossing techniques.
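
The encodings named above can be sketched in a few lines of plain Python. The vocabularies, the `load_time` attribute, and its value range are assumptions made up for this example; learned embeddings are left out because they require training.

```python
CATEGORIES = ["game", "shopping", "tools"]       # hypothetical vocabulary
TAGS = ["payment", "login", "map", "video"]      # hypothetical vocabulary


def one_hot(value, vocab):
    """Exactly one slot is 1: the slot of the single category value."""
    return [1.0 if v == value else 0.0 for v in vocab]


def multi_hot(values, vocab):
    """One slot per vocabulary entry; several slots may be 1."""
    present = set(values)
    return [1.0 if v in present else 0.0 for v in vocab]


def min_max(x, lo, hi):
    """Rescale a numeric attribute from [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)


def cross(a, b):
    """Feature crossing: one slot per (a_i, b_j) pair (flattened outer product)."""
    return [ai * bj for ai in a for bj in b]


category_vec = one_hot("shopping", CATEGORIES)
tag_vec = multi_hot(["login", "payment"], TAGS)
load_time = min_max(1.5, lo=0.0, hi=6.0)
crossed = cross(category_vec, tag_vec)

# Concatenate everything into the numeric vector a model consumes.
features = category_vec + tag_vec + [load_time] + crossed
print(len(features), features)
```

The crossed block is sparse by construction: only the slots for (shopping, payment) and (shopping, login) are non-zero, which is what lets a linear model memorize specific co-occurrences.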

Model Selection and Training

Several supervised learning models are compared: logistic regression, decision trees, ensemble methods, and other non-linear models. The chosen baseline is a logistic-regression model trained with mini-batch stochastic gradient descent.

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load two numeric features and the binary quality label.
data = pd.read_csv('mini_program_data.csv')
X = data[['completeness', 'error_rate']].values
y = data['label'].values

# Split before scaling and fit the scaler on the training split only,
# so validation statistics do not leak into preprocessing.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)

# Labels are reshaped to (N, 1) to match the model output.
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).to(device)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).view(-1, 1).to(device)
X_val_tensor = torch.tensor(X_val, dtype=torch.float32).to(device)
y_val_tensor = torch.tensor(y_val, dtype=torch.float32).view(-1, 1).to(device)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

class LogisticRegressionModel(nn.Module):
    """A single linear layer followed by a sigmoid."""
    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

model = LogisticRegressionModel(X_train.shape[1]).to(device)
criterion = nn.BCELoss()  # expects probabilities, which forward() returns
optimizer = optim.SGD(model.parameters(), lr=0.01)

num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    epoch_loss = 0.0
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * batch_X.size(0)
    if (epoch + 1) % 10 == 0:
        # Report the mean loss over the epoch rather than the last batch only.
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss / len(train_dataset):.4f}')

# Evaluate on the held-out split with a 0.5 decision threshold.
model.eval()
with torch.no_grad():
    val_outputs = model(X_val_tensor)
    predicted = (val_outputs > 0.5).float()
    accuracy = (predicted == y_val_tensor).float().mean().item()
    print(f'Validation Accuracy: {accuracy:.2f}')

Evaluation Metrics

Standard classification metrics such as accuracy, precision, recall, F1-score, the Matthews correlation coefficient (MCC), and the area under the ROC curve (AUC) are explained, together with how they relate to quality-control test cases.
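
As a rough illustration, these metrics can be computed directly from the confusion-matrix counts. The labels and scores below are made up, and the pairwise AUC is a simple quadratic-time formulation meant for reading, not for large datasets.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 1]                       # made-up labels
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]       # made-up model scores
y_pred = [1 if s > 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Accuracy, precision, recall, F1, and MCC all depend on the 0.5 threshold, while AUC is computed from the raw scores and therefore measures ranking quality independent of any threshold.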

Deep Learning Foundations

The article introduces the core deep‑learning paradigm “embedding + MLP”. Embeddings compress high‑dimensional raw data (text, images, graphs) into dense vectors; MLPs (multi‑layer perceptrons) then learn non‑linear mappings.
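
The "embedding + MLP" pattern can be sketched as a forward pass in plain Python. The vocabulary, the dimensions, and the mean-pooling step are assumptions for this example, and the weights are random rather than trained, so the output value itself carries no meaning.

```python
import random

random.seed(0)

VOCAB = {"<unk>": 0, "pay": 1, "scan": 2, "ride": 3, "order": 4}  # hypothetical
EMBED_DIM, HIDDEN_DIM = 4, 3


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


embedding_table = rand_matrix(len(VOCAB), EMBED_DIM)   # one dense row per token
w1, b1 = rand_matrix(HIDDEN_DIM, EMBED_DIM), [0.0] * HIDDEN_DIM
w2, b2 = rand_matrix(1, HIDDEN_DIM), [0.0]


def embed(tokens):
    """Look up each token's row and mean-pool the rows into one dense vector."""
    rows = [embedding_table[VOCAB.get(t, 0)] for t in tokens]
    return [sum(col) / len(rows) for col in zip(*rows)]


def linear(x, w, b):
    """Affine map: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def mlp(x):
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]   # ReLU non-linearity
    return linear(hidden, w2, b2)[0]                    # raw score (logit)


dense = embed(["pay", "scan", "unknown-token"])   # unknown tokens map to <unk>
logit = mlp(dense)
print(dense, logit)
```

In a trained model the embedding table and the MLP weights are learned together, so the table ends up placing tokens that behave similarly close to each other.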

Multi‑Layer Perceptron (MLP)

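An MLP consists of stacked linear layers separated by non-linear activation functions (ReLU, sigmoid, etc.). By the universal approximation theorem, an MLP with enough hidden units can approximate any continuous function on a compact input domain to arbitrary accuracy, which is why it serves as the backbone for many downstream tasks.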
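
XOR is a classic case that no single linear layer can represent, because its two classes are not linearly separable, yet one hidden layer with two ReLU units is enough. The weights in this sketch are set by hand to keep it readable; a real MLP would learn them by gradient descent.

```python
def relu(z):
    return max(0.0, z)


def xor_mlp(x1, x2):
    """One hidden layer with two ReLU units; weights are hand-set, not learned."""
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)   # counts active inputs
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)   # fires only when both inputs are 1
    return 1.0 * h1 - 2.0 * h2              # linear output layer


outputs = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

Removing the ReLU calls collapses the network into a single linear function of `x1 + x2`, which can no longer return 0 for both (0, 0) and (1, 1) while returning 1 in between.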

Convolutional Neural Networks (CNN)

CNNs extract hierarchical visual features via convolution and pooling layers, enabling image classification, object detection, and visual embedding generation.
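
The two building blocks, convolution and pooling, can be sketched on a tiny image. The kernel below is a hand-picked vertical-edge detector used as an assumption for illustration; a trained CNN learns its kernels from data.

```python
def conv2d(image, kernel):
    """Valid cross-correlation, which deep-learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows and columns that do not fit are dropped."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# 5x5 image: dark columns (0) on the left, bright columns (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Responds where brightness increases from left to right.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)   # 4x4, non-zero only at the edge column
pooled = max_pool2d(feature_map)      # 2x2 summary that keeps the edge response
print(feature_map)
print(pooled)
```

The same kernel slides over every position, so the edge is detected wherever it appears; pooling then shrinks the map while keeping the strongest response in each region.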

Recurrent Networks and Attention

RNNs, LSTMs, and GRUs handle sequential data by maintaining a hidden state that is updated at every step. Attention mechanisms (self-attention, multi-head attention) let each token weigh all other tokens directly, which eases the long-range dependency limits of recurrent models.
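
A single attention head can be sketched as scaled dot-product self-attention without masking. The token vectors are made up, and identity projection matrices are an assumption used here so the attention weights are easy to check by hand.

```python
import math


def softmax(row):
    m = max(row)                                # subtract max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention for one head, no masking."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


# Three tokens with 2-dimensional features.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
print(weights)
```

Every token attends to every other token in one step regardless of distance, whereas a recurrent model has to carry that information through each intermediate hidden state. Multi-head attention runs several such heads with different projections and concatenates their outputs.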

Transformer and Variants

The transformer replaces recurrence with stacked self‑attention blocks, adding positional encodings, layer normalization, and residual connections. Variants such as sparse attention, native sparse attention, and efficient MHA improve scalability.
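
Three of these ingredients can be sketched together. The sinusoidal encoding and the post-norm wiring `LayerNorm(x + sublayer(x))` are assumptions taken from the original transformer design; the source does not say which variants the article uses. The sublayer is a stand-in function, and the layer norm omits the learned scale and shift.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def layer_norm(vec, eps=1e-5):
    """Normalize one token vector to zero mean and roughly unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def residual_block(x, sublayer):
    """Post-norm residual wiring, applied token by token."""
    return [layer_norm([a + b for a, b in zip(tok, sub)])
            for tok, sub in zip(x, sublayer(x))]


tokens = [[0.5, -0.2, 0.1, 0.9], [0.3, 0.8, -0.5, 0.0]]   # made-up embeddings
pe = positional_encoding(seq_len=2, d_model=4)
x = [[t + p for t, p in zip(tok, enc)] for tok, enc in zip(tokens, pe)]

# Stand-in sublayer; a real block would use self-attention or a feed-forward net.
out = residual_block(x, sublayer=lambda seq: [[2.0 * v for v in tok] for tok in seq])
print(out)
```

Because self-attention treats its input as an unordered set, the positional encoding is the only place where token order enters the model.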

Recommendation Models

Classic Wide&Deep, DeepFM, DCN, DCN‑V2, DIN, and recent sequence‑based recommenders (GRU4Rec, BERT4Rec, BST) are described, showing how to combine memorization (wide part) and generalization (deep part) for click‑through‑rate prediction.
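
The wide-plus-deep combination can be sketched as two logits that are summed before a sigmoid. All weights, features, and embeddings below are made-up numbers; a real Wide&Deep model learns both parts jointly from click logs, and the single hidden layer is a simplification.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def wide_logit(cross_features, wide_weights):
    """Memorization: a linear model over sparse cross features."""
    return dot(cross_features, wide_weights)


def deep_logit(embeddings, w_hidden, w_out):
    """Generalization: concatenate dense embeddings, then one ReLU layer."""
    x = [v for emb in embeddings for v in emb]
    hidden = [max(0.0, dot(row, x)) for row in w_hidden]
    return dot(hidden, w_out)


def predict_ctr(cross_features, embeddings, params):
    """Sum both logits and a bias, then squash to a click probability."""
    logit = (wide_logit(cross_features, params["wide"])
             + deep_logit(embeddings, params["hidden"], params["out"])
             + params["bias"])
    return sigmoid(logit)


params = {
    "wide": [1.5, -0.5],                                   # one weight per cross feature
    "hidden": [[0.5, -0.5, 0.5, 0.5], [-0.5, 0.5, 0.5, -0.5]],
    "out": [1.0, -1.0],
    "bias": -1.0,
}
cross_features = [1.0, 0.0]            # first cross feature fired for this impression
embeddings = [[0.2, 0.4], [0.6, 0.0]]  # hypothetical user and item embeddings

ctr = predict_ctr(cross_features, embeddings, params)
print(ctr)
```

The wide part can only score feature combinations it has seen fire in training, while the deep part still produces a score for unseen user and item pairs through their embeddings.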

Knowledge Graphs and Graph Neural Networks

Knowledge graphs capture relational data. Graph embedding methods such as DeepWalk (based on random walks) and graph neural networks such as GCN and GAT learn node embeddings that can be fed into recommendation models for richer context.
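
One GCN propagation step can be sketched on a toy graph using the common symmetric normalization with self-loops. The graph, the node features, and the identity weight matrix are assumptions chosen so the neighbor aggregation is visible in the output.

```python
import math


def gcn_layer(adj, features, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features with the normalized adjacency.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(len(features[0]))] for i in range(n)]
    # Apply the shared linear map and ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]


# Toy undirected path graph: node 0 - node 1 - node 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # made-up node features
weight = [[1.0, 0.0], [0.0, 1.0]]                  # identity, not learned

embeddings = gcn_layer(adj, features, weight)
print(embeddings)
```

After one layer each node's embedding mixes in its direct neighbors; stacking k layers lets information travel k hops through the graph.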

Practical Pipeline

The end-to-end workflow includes data collection, feature engineering, model selection, training with appropriate loss functions (e.g., GHM-C, the gradient harmonizing mechanism classification loss, for imbalanced data), evaluation, deployment, and continuous online feedback loops.

