
Understanding Gradient Descent for Linear Regression with a Python Implementation

This article explains the concept of loss functions and gradient descent, illustrates how to find the global optimum for linear regression, discusses the role of learning rate, and provides a complete Python example that generates data, applies gradient descent, and visualizes the results.


> Gradient Descent

Linear regression is familiar to most people: given data such as house price and area, software like Excel or SPSS quickly produces a fitted function. However, the underlying method is often overlooked.

When the number of dimensions is small, the normal equation can be used, but for high‑dimensional problems such as image recognition or natural language processing, the normal equation becomes impractical and we need gradient descent.
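For reference, the normal equation mentioned above solves linear regression in closed form. A minimal sketch on synthetic data (variable names and the seed are illustrative, not from the article):

```python
import numpy as np

# Closed-form solution: theta = (X^T X)^{-1} X^T y
rng = np.random.default_rng(0)
X1 = rng.random((100, 1))
X = np.hstack((np.ones((100, 1)), X1))     # prepend a bias column
y = 2 * X1 + 1 + rng.uniform(-0.2, 0.2, (100, 1))

# Solve the linear system instead of forming an explicit inverse
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta.ravel())  # close to [1, 2]
```

This works well here, but forming and solving the system scales poorly as the number of features grows, which is exactly why gradient descent is preferred in high dimensions.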

A loss function measures the discrepancy between predicted values and actual values. The goal is to find the set of parameters that minimizes this loss; that minimizing point is called the global optimum.

In linear regression we typically use the mean squared error as the loss function. The loss surface is a convex “bowl”, and the optimal point lies at the bottom of the bowl.

To reach the bottom we must decide two things: the direction to move and how far to move. The direction is given by the gradient (derivative) of the loss function; the step size is the learning rate. A learning rate that is too small leads to slow convergence, while a learning rate that is too large can cause divergence.

The update rule, obtained by differentiating the loss with respect to each parameter, subtracts the learning rate times the gradient from the current parameters. It is applied iteratively until the loss stops decreasing.
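One iteration of that update can be sketched as follows (assuming the MSE loss above; `alpha` is the learning rate):

```python
import numpy as np

def gd_step(theta, X, y, alpha):
    """One step: theta <- theta - alpha * (1/m) * X^T (X @ theta - y)."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m
    return theta - alpha * grad

# With a suitable learning rate, each step reduces the loss
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])
theta = np.zeros((2, 1))
for _ in range(5):
    theta = gd_step(theta, X, y, alpha=0.1)
```

Raising `alpha` too far makes the same loop overshoot the minimum and diverge, which is the failure mode described above.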

> Python Implementation

First we generate synthetic data (y = 2x + 1 with some noise) and plot it.

<code># Generate a scatter plot based on y = 2x + 1 (with noise)
import numpy as np
import matplotlib.pyplot as plt

X0 = np.ones((100, 1))                     # bias column
X1 = np.random.random(100).reshape(100, 1)
X = np.hstack((X0, X1))
y = X1 * 2 + 1 + np.random.uniform(-0.2, 0.2, (100, 1))  # noisy targets

plt.figure(figsize=(8, 6))
plt.scatter(X1, y, color='g')
plt.plot(X1, X1 * 2 + 1, color='r', linewidth=2.5, linestyle='-')
plt.show()
</code>

Next we implement gradient descent to find the optimal parameters.

<code># Gradient descent to find the optimal parameters

def gradientDescent(X, Y, times=1000, alpha=0.01):
    """
    alpha: learning rate (default 0.01)
    times: number of iterations (default 1000)
    """
    m = len(Y)
    theta = np.ones((2, 1))                  # initial guess
    loss = {}
    for i in range(times):
        diff = np.dot(X, theta) - Y          # prediction error
        cost = (diff ** 2).sum() / (2.0 * m) # mean squared error
        loss[i] = cost
        theta = theta - alpha * (np.dot(X.T, diff) / m)  # gradient step
    plt.figure(figsize=(8, 6))
    plt.scatter(list(loss.keys()), list(loss.values()), color='r')
    plt.show()
    return theta

theta = gradientDescent(X, y)
</code>

Running the code with 1000 iterations and a learning rate of 0.01 yields the final parameters (approximately 1.0323, 1.9516) and shows the loss decreasing over iterations.
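The fitted parameters can then be used for prediction. A short sketch using the values reported above (the new input value is illustrative):

```python
import numpy as np

# Parameters found by gradient descent in the article
theta = np.array([[1.0323], [1.9516]])

x_new = np.array([[1.0, 0.5]])   # bias term plus a new x value
y_pred = x_new @ theta
print(y_pred[0, 0])              # close to 2 * 0.5 + 1 = 2
```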

- END -

Tags: Optimization, machine learning, gradient descent, linear regression
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
