Artificial Intelligence 12 min read

How to Teach a Machine to Recognize Cats: From Feature Extraction to Gradient Descent

This article explains the fundamentals of machine learning by illustrating how humans recognize cats, extracting features, training models with gradient descent, implementing linear regression, and providing practical JavaScript and SQL code examples for building a simple classifier.

Alibaba Cloud Developer

May 26, 2020

How to Teach a Machine to Recognize Cats: From Feature Extraction to Gradient Descent

What Is Machine Learning and Why Do We Need It

Machine learning is the process of teaching computers to recognize patterns and make decisions based on data, especially when the number of features becomes too large for simple if‑else rules.

How Humans Learn the Concept of a Cat – An Analogy

Imagine a baby who has never seen a cat. By repeatedly showing the baby furry animals and labeling them as “cat,” the baby extracts features such as meowing, two ears, four legs, a tail, and whiskers. This pattern‑recognition process mirrors how machines learn from labeled examples.

Feature Extraction for Cats and Test Modules

Cat features: can meow, two ears, four legs, one tail, whiskers.

Test‑module features: the strings test, demo, 测试 appear in the module name or keywords.

By formalizing these features, both humans and machines can correctly identify a cat or a test module.

Training a Model – Gradient Descent

We aim to find parameters a and b for the linear model y = a·x + b that minimize the cost (mean squared error). The cost function is:

cost = ((y'1‑y1)^2 + (y'2‑y2)^2 + (y'3‑y3)^2 + (y'4‑y4)^2 + (y'5‑y5)^2 + (y'6‑y6)^2) / 6

Gradient descent iteratively updates a and b using their partial derivatives until the cost reaches a minimum (the parabola’s vertex where the slope is zero).

const costDaoA = (((a*x1+b)-y1)*2*x1 + ((a*x2+b)-y2)*2*x1 + ((a*x3+b)-y3)*2*x1 + ((a*x4+b)-y4)*2*x1 + ((a*x5+b)-y5)*2*x1 + ((a*x6+b)-y6)*2*x1) / 6;
const costDaoB = (((a*x1+b)-y1)*2 + ((a*x2+b)-y2)*2 + ((a*x3+b)-y3)*2 + ((a*x4+b)-y4)*2 + ((a*x5+b)-y5)*2 + ((a*x6+b)-y6)*2) / 6;

These derivatives tell us how to adjust a and b to move toward the cost minimum.

JavaScript Implementation of Gradient Descent

function cost(a, b) {
    let sum = data.reduce((pre, cur) => {
        return pre + ((a + cur[0] * b) - cur[1]) ** 2;
    }, 0);
    return sum / (2 * data.length);
}
function gradientA(a, b) {
    let sum = data.reduce((pre, cur) => {
        return pre + ((a + cur[0] * b) - cur[1]) * cur[0];
    }, 0);
    return sum / data.length;
}
function gradientB(a, b) {
    let sum = data.reduce((pre, cur) => {
        return pre + ((a + cur[0] * b) - cur[1]);
    }, 0);
    return sum / data.length;
}
let batch = 200;
let alpha = 0.001;
let args = [0, 0]; // initial a, b
function step() {
    const costNumber = cost(args[0], args[1]);
    console.log('cost', costNumber);
    chartLoss.series[0].addPoint(costNumber, true, false, false);
    args[0] -= alpha * gradientA(args[0], args[1]);
    args[1] -= alpha * gradientB(args[0], args[1]);
    if (--batch > 0) {
        window.requestAnimationFrame(step);
    } else {
        drawLine(args[0], args[1]);
    }
}
step();

The script runs 200 iterations, visualizes the loss curve, and finally draws the fitted line.

SQL Example for Filtering Test Modules

SELECT * FROM tianma.module_xx
WHERE pt = TO_CHAR(DATEADD(GETDATE(), -1, 'dd'), 'yyyymmdd')
  AND name NOT LIKE '%test%'
  AND name NOT LIKE '%demo%'
  AND name NOT LIKE '%测试%'
  AND keywords NOT LIKE '%test%'
  AND keywords NOT LIKE '%测试%'
  AND keywords NOT LIKE '%demo%';

This query demonstrates how feature‑based rules can be applied directly in a database to exclude test modules.

Linear Regression Demo

Using a simple dataset of temperature vs. cricket chirping frequency, we fit a straight line via gradient descent. The scatter plot and fitted line are shown below.

The blue line represents the model y = a·x + b after training.

Gradient Descent Visualization

The parabola illustrates the cost surface; the algorithm seeks the lowest point where the slope (derivative) is zero.

Conclusion

By extracting meaningful features, defining a loss function, and applying gradient descent, we can train simple models such as linear regression to make predictions. The same principles extend to more complex tasks with many features.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

JavaScript SQL feature extraction gradient descent linear regression

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.