Linear Regression Algorithm: Definition, Structure, Implementation, Cost Function, Gradient Descent, and Regularization
This article provides a comprehensive overview of linear regression, covering its definition, purpose, algorithmic steps, data preparation, feature scaling, parameter initialization, cost function computation, gradient descent optimization, visualization, normal equation solution, and regularization, accompanied by detailed Python code examples.
The linear regression algorithm aims to build a linear model that describes the relationship between input features (independent variables) and the output variable (dependent variable), enabling prediction of the output based on given inputs. It is widely used in economics, statistics, and machine learning for modeling and prediction.
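Concretely, for a feature vector x the model predicts ŷ = θ₀ + θ₁x₁ + ⋯ + θₙxₙ. A minimal sketch of this hypothesis (the `predict` helper and the numbers are illustrative, not from the article's code):

```python
import numpy as np

def predict(theta, x):
    # Linear hypothesis: y_hat = theta_0 + theta_1*x_1 + ... + theta_n*x_n
    return theta[0] + np.dot(theta[1:], x)

theta = np.array([1.0, 2.0, 3.0])  # intercept plus two feature weights
x = np.array([4.0, 5.0])           # one example with two features
print(predict(theta, x))           # 1 + 2*4 + 3*5 = 24.0
```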
The basic workflow consists of data preparation, model hypothesis definition, model building via least‑squares minimization, model evaluation (e.g., MSE, R²), and finally using the trained model for prediction on new data.
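The evaluation step can be sketched with the two metrics mentioned above, MSE and R² (the function names and data here are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared residual
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 7.5])
print(round(mse(y_true, y_pred), 4))  # 0.1667
print(r2(y_true, y_pred))             # 0.9375
```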
Data preparation example (reading a CSV file and inspecting the first rows):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

path = 'ex1data2.txt'  # path to the data file
data = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Price'])  # load data into a DataFrame
data.head()
```

Feature scaling normalizes each column to zero mean and unit variance, which speeds up gradient descent and keeps features on comparable scales:

```python
data = (data - data.mean()) / data.std()
data.head()
```

Adding a column of ones to account for the intercept term:

```python
data.insert(0, 'Ones', 1)
```

Assigning matrices and initializing parameters:
```python
cols = data.shape[1]
X = data.iloc[:, :cols-1]       # all columns except the last are features
Y = data.iloc[:, cols-1:cols]   # the last column is the target
X = np.matrix(X.values)
Y = np.matrix(Y.values)
theta = np.matrix(np.array([0, 0, 0]))  # one parameter per column of X, initialized to zero
alpha = 0.01  # learning rate
iters = 1000  # number of iterations
```

Cost function implementation:
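The function below computes the mean squared error cost

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2, \qquad h_\theta(x) = \theta^{T}x,$$

where $m$ is the number of training examples.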
```python
def computeCost(X, Y, theta):
    # Squared residuals, averaged over 2m examples
    inner = np.power((X * theta.T) - Y, 2)
    return np.sum(inner) / (2 * len(X))
```

Gradient descent implementation for optimizing theta:
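Gradient descent minimizes $J(\theta)$ by repeatedly applying the simultaneous update

$$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)x_j^{(i)}$$

for every parameter $j$, where $\alpha$ is the learning rate.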
```python
def gradientDescent(X, Y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))  # buffer so all parameters update simultaneously
    parameters = int(theta.shape[1])
    cost = np.zeros(iters)
    for i in range(iters):
        error = (X * theta.T) - Y
        for j in range(parameters):
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - (alpha / len(X)) * np.sum(term)
        theta = temp.copy()
        cost[i] = computeCost(X, Y, theta)
    return theta, cost
```

Visualization of the cost over iterations using Matplotlib:
```python
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(np.arange(iters), cost, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs Training Epoch')
plt.show()
```

Closed‑form solution (normal equation) to compute theta directly:
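Setting the gradient of $J(\theta)$ to zero and solving for $\theta$ gives the closed form

$$\theta = (X^{T}X)^{-1}X^{T}y,$$

which holds whenever $X^{T}X$ is invertible.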
```python
def normalEqn(X, Y):
    # theta = (X^T X)^(-1) X^T Y
    theta = np.linalg.inv(X.T @ X) @ X.T @ Y
    return theta

theta = normalEqn(X, Y)
```

Regularized cost function to prevent overfitting:
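With an L2 penalty of strength $\lambda$ the cost becomes

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2,$$

where the intercept $\theta_0$ is conventionally excluded from the penalty term.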
```python
def regularized_cost(X, Y, theta, l=1):
    theta_1n = theta[:, 1:]  # exclude the intercept parameter from the penalty
    regularized_term = (l / (2 * len(X))) * np.power(theta_1n, 2).sum()
    return computeCost(X, Y, theta) + regularized_term
```

The normal equation gives an analytical solution with no iterations or learning-rate tuning, but inverting X^T X becomes expensive as the number of features grows, so it suits problems with few features; gradient descent scales to large datasets and offers flexibility, but requires tuning the learning rate and may converge slowly.
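The two approaches can be compared on a small synthetic problem; on a well-conditioned dataset, a vectorized batch gradient descent recovers the same parameters as the normal equation (the data below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = np.c_[np.ones(m), rng.normal(size=m)]  # design matrix with an intercept column
y = X @ np.array([2.0, 3.0])               # noiseless targets from known parameters

# Normal equation: one-shot analytical solution
theta_ne = np.linalg.inv(X.T @ X) @ X.T @ y

# Vectorized batch gradient descent: iterative, needs a learning rate
theta_gd = np.zeros(2)
alpha, iters = 0.1, 2000
for _ in range(iters):
    theta_gd -= (alpha / m) * X.T @ (X @ theta_gd - y)

print(theta_ne)  # close to [2. 3.]
print(theta_gd)  # converges to the same values
```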
In summary, the article walks through the complete linear regression pipeline—from data loading and preprocessing, through model training with both gradient descent and normal equation, to evaluation, visualization, and regularization—illustrated with clear Python code snippets.
Rare Earth Juejin Tech Community