10 Common Loss Functions and Their Python Implementations
This article explains ten widely used loss functions for regression and classification tasks, describes their definitions and typical use cases, and provides a Python implementation of each, helping readers select and implement appropriate loss functions in machine‑learning models.
What is a loss function?
A loss function measures the discrepancy between a model's predictions and the true values; lower values indicate better predictions. The average of individual losses across all samples is called the cost function.
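To make the loss/cost distinction concrete, here is a small illustrative sketch using squared error as the per-sample loss (the arrays and values are made up for the example):

```python
import numpy as np

# True values and model predictions for four samples
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Per-sample loss: squared error for each prediction
per_sample_loss = (y_pred - y_true) ** 2

# Cost function: the average of the individual losses
cost = per_sample_loss.mean()
print(per_sample_loss)  # [0.25 0.25 0.   1.  ]
print(cost)             # 0.375
```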
Loss functions vs. evaluation metrics
While some loss functions can serve as evaluation metrics, loss functions are primarily used during model training to guide optimization, whereas metrics assess final model performance.
Why use loss functions?
Loss functions quantify prediction errors, enabling gradient‑based optimization to adjust model parameters toward better performance.
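As a minimal sketch of how a loss guides optimization, the snippet below runs gradient descent on the MSE of a one-parameter linear model; the data, learning rate, and variable names are illustrative, not part of the article:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # underlying relationship: y = 2x

w = 0.0    # model parameter, deliberately poor starting point
lr = 0.1   # learning rate

for _ in range(100):
    y_pred = w * x
    # Gradient of MSE = mean((w*x - y)^2) with respect to w
    grad = 2 * np.mean((y_pred - y) * x)
    w -= lr * grad  # step opposite the gradient to reduce the loss

print(w)  # converges toward 2.0
```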
Regression loss functions
1. Mean Squared Error (MSE)
Computes the average of squared differences between predicted and true values.
import numpy as np

def MSE(y, y_predicted):
    sq_error = (y_predicted - y) ** 2
    sum_sq_error = np.sum(sq_error)
    mse = sum_sq_error / y.size
    return mse
2. Mean Absolute Error (MAE)
Calculates the average absolute difference, which is more robust to outliers.
def MAE(y, y_predicted):
    error = y_predicted - y
    absolute_error = np.absolute(error)
    total_absolute_error = np.sum(absolute_error)
    mae = total_absolute_error / y.size
    return mae
3. Root Mean Squared Error (RMSE)
The square root of MSE, useful when the error scale should match the original units.
def RMSE(y, y_predicted):
    sq_error = (y_predicted - y) ** 2
    total_sq_error = np.sum(sq_error)
    mse = total_sq_error / y.size
    rmse = np.sqrt(mse)
    return rmse
4. Mean Bias Error (MBE)
Similar to MAE but retains the sign of the error, indicating systematic over‑ or under‑prediction.
def MBE(y, y_predicted):
    error = y_predicted - y
    total_error = np.sum(error)
    mbe = total_error / y.size
    return mbe
5. Huber Loss
Combines MAE and MSE, using a quadratic loss for small errors and linear loss for large errors.
def Huber(y, y_predicted, delta):
    total_error = 0
    for i in range(y.size):
        error = np.absolute(y_predicted[i] - y[i])
        if error < delta:
            # Quadratic region for small errors
            huber_error = (error ** 2) / 2
        else:
            # Linear region for large errors
            huber_error = delta * (error - 0.5 * delta)
        total_error += huber_error
    return total_error / y.size
Binary classification loss functions
6. Likelihood Loss (LHL)
Averages the predicted probability assigned to the true class across samples; the negated average is minimized during training.
def LHL(y, y_predicted):
    likelihood = (y * y_predicted) + ((1 - y) * (1 - y_predicted))
    total_likelihood = np.sum(likelihood)
    lhl = - total_likelihood / y.size
    return lhl
7. Binary Cross‑Entropy (BCE)
Penalizes confident but wrong predictions by applying the logarithm to predicted probabilities.
def BCE(y, y_predicted):
    # Clip probabilities to avoid log(0)
    eps = 1e-12
    y_predicted = np.clip(y_predicted, eps, 1 - eps)
    ce_loss = y * np.log(y_predicted) + (1 - y) * np.log(1 - y_predicted)
    total_ce = np.sum(ce_loss)
    bce = - total_ce / y.size
    return bce
8. Hinge Loss and Squared Hinge Loss
Used for support‑vector machines; penalizes predictions that are on the wrong side of the margin.
# Hinge Loss (labels y are expected in {-1, +1})
def Hinge(y, y_predicted):
    hinge_loss = np.sum(np.maximum(0, 1 - (y_predicted * y)))
    return hinge_loss

# Squared Hinge Loss
def SqHinge(y, y_predicted):
    sq_hinge_loss = np.maximum(0, 1 - (y_predicted * y)) ** 2
    total_sq_hinge_loss = np.sum(sq_hinge_loss)
    return total_sq_hinge_loss
Multiclass classification loss functions
9. Categorical Cross‑Entropy (CCE)
Generalizes binary cross‑entropy to multiple classes.
def CCE(y, y_predicted):
    # y is one-hot encoded; average the loss over samples (rows)
    cce_class = y * np.log(y_predicted)
    sum_totalpair_cce = np.sum(cce_class)
    cce = - sum_totalpair_cce / y.shape[0]
    return cce
10. Kullback‑Leibler Divergence (KLD)
Measures how one probability distribution diverges from a reference distribution, useful for imbalanced classes.
def KL(y, y_predicted):
    kl = y * np.log(y / y_predicted)
    total_kl = np.sum(kl)
    return total_kl
These ten loss functions cover the most common scenarios in regression and classification, providing both theoretical insight and ready‑to‑use Python code.
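As a quick sanity check, the regression losses above can be reproduced with one-line NumPy equivalents; the arrays here are illustrative:

```python
import numpy as np

y      = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 1.5, 3.0, 5.0])

mse  = np.mean((y_pred - y) ** 2)    # average squared error -> 0.375
mae  = np.mean(np.abs(y_pred - y))   # average absolute error -> 0.5
rmse = np.sqrt(mse)                  # error in original units
mbe  = np.mean(y_pred - y)           # signed bias -> 0.25 (over-prediction)

print(mse, mae, rmse, mbe)
```

Comparing mse and mae on the same data illustrates why MAE is more robust: the single large error (1.0) dominates the squared average far more than the absolute one.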