Logistic Regression: Definition, Purpose, Structure, Implementation, and Regularization
This article explains logistic regression as a classification algorithm. It covers the method's definition and purpose, its structure, data preparation, the core functions (sigmoid, cost, gradient, prediction), model evaluation, decision-boundary visualization, feature mapping, and regularization, all illustrated with Python code examples.
1. Definition, Purpose, and Role
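Logistic regression is a machine-learning algorithm for binary classification. Despite its name, it is a classification method rather than a regression method. It estimates the probability, between 0 and 1, that a sample belongs to the positive class, and then assigns a label by thresholding that probability.
The model passes a linear combination of the features through the sigmoid (logistic) function. Its parameters are learned by maximizing the likelihood of the training labels, which is equivalent to minimizing the cross-entropy (log) loss. Typical applications include spam detection and disease diagnosis.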
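Written out, the model and loss described above take the standard form below. Here m is the number of training samples, each x⁽ⁱ⁾ includes a leading 1 for the intercept, and y⁽ⁱ⁾ ∈ {0, 1}. The notation is chosen to match the code later in the article; it is not copied from the source.

```latex
h_\theta(x) = \sigma(\theta^\top x) = \frac{1}{1 + e^{-\theta^\top x}}

\ell(\theta) = \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]

J(\theta) = -\frac{1}{m}\,\ell(\theta)

\nabla_\theta J(\theta) = \frac{1}{m} X^\top \big(\sigma(X\theta) - y\big)
```

J is the log-likelihood ℓ scaled by −1/m. Maximizing ℓ and minimizing J therefore give the same θ. The last line is the gradient that gradient descent follows.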
2. Algorithm Structure
The logistic-regression pipeline consists of the following components:
Input layer: receives the feature vector of each sample.
Weights: one parameter θⱼ per feature, plus an intercept θ₀.
Linear combination: computes z = X·θ.
Activation function: applies the sigmoid to map z to a probability.
Output layer: predicts class 1 when the probability is at least 0.5, otherwise class 0.
Loss function: cross-entropy (log loss) measures the prediction error.
Optimization algorithm: gradient descent (or a more advanced optimizer) updates θ to minimize the loss.
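The components listed above can be sketched in a few lines of NumPy, which shows how they chain together. This is a minimal illustration, not the article's own code. The toy data, the parameter values, and the helper names forward, log_loss, classify and gradient_step are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Activation: map any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(theta, X):
    """Linear combination X·θ followed by the sigmoid: P(y = 1 | x)."""
    return sigmoid(X @ theta)

def log_loss(theta, X, y):
    """Loss: mean cross-entropy between labels y and predicted probabilities."""
    p = forward(theta, X)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def classify(theta, X, threshold=0.5):
    """Output: label 1 where the probability reaches the threshold, else 0."""
    return (forward(theta, X) >= threshold).astype(int)

def gradient_step(theta, X, y, alpha=0.1):
    """Optimization: one batch gradient-descent update theta - alpha * grad."""
    grad = X.T @ (forward(theta, X) - y) / len(y)
    return theta - alpha * grad

# Hypothetical toy input: a column of ones (intercept) plus one feature
X = np.array([[1.0, -2.0],
              [1.0,  0.0],
              [1.0,  3.0]])
y = np.array([0, 1, 1])
theta = np.array([0.0, 1.0])  # hypothetical weights [theta0, theta1]

print(forward(theta, X))   # probabilities sigmoid([-2, 0, 3])
print(classify(theta, X))  # [0 1 1]
theta_next = gradient_step(theta, X, y)
print(log_loss(theta, X, y), log_loss(theta_next, X, y))  # the loss decreases
```

The 0.5 threshold makes sigmoid(0) = 0.5 the dividing point, so the middle sample lands in class 1.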
3. Data Preparation
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">import numpy as np
import pandas as pd
path = 'ex2data1.txt'  # file path: two exam scores and a 0/1 admission label per row
data = pd.read_csv(path, header=None, names=['Exam1', 'Exam2', 'Admitted'])
data.head()  # preview the first five rows
</code>Visualising the two exam scores:
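<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">import matplotlib.pyplot as plt
positive = data[data['Admitted'] == 1]  # admitted students
negative = data[data['Admitted'] == 0]  # rejected students
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam1'], negative['Exam2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam1 Score')
ax.set_ylabel('Exam2 Score')
plt.show()
# Design matrix for the functions below: a column of ones (intercept θ0) plus the two scores
X = np.column_stack([np.ones(len(data)), data[['Exam1', 'Exam2']].values])  # shape (m, 3)
Y = data['Admitted'].values    # 1-D label vector, shape (m,)
theta = np.zeros(X.shape[1])   # initial parameters, shape (3,)
</code>4. Sigmoid Function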
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">def sigmoid(z):
    # Logistic function: maps any real z (scalar or array) into (0, 1)
    return 1 / (1 + np.exp(-z))
</code>The sigmoid maps any real‑valued input to the interval (0,1), providing the probability estimate for the positive class.
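A few properties of the sigmoid can be checked numerically. Its symmetry, σ(−z) = 1 − σ(z), places σ(0) = 0.5 exactly on the classification threshold. Its derivative, σ(z)(1 − σ(z)), is what makes the cross-entropy gradient simplify to Xᵀ(σ(Xθ) − y)/m. This is a sketch: the name sigmoid_derivative is hypothetical, and scipy.special.expit is used only as an independent reference implementation.

```python
import numpy as np
from scipy.special import expit  # SciPy's vectorized logistic function

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    """d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1 - s)

z = np.linspace(-6, 6, 13)  # -6, -5, ..., 6
p = sigmoid(z)
print(np.round(p, 3))

# Symmetry: sigmoid(-z) = 1 - sigmoid(z)
print(np.allclose(sigmoid(-z), 1 - p))  # True
# Agrees with scipy.special.expit
print(np.allclose(p, expit(z)))  # True
# Central finite-difference check of the derivative formula
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(np.allclose(numeric, sigmoid_derivative(z)))  # True
```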
5. Cost Function
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">def cost(theta, X, Y):
    # Mean cross-entropy; theta is 1-D with shape (n,), Y is 1-D with shape (m,)
    h = sigmoid(X @ theta)  # predicted probabilities
    first = Y * np.log(h)
    second = (1 - Y) * np.log(1 - h)
    return -np.mean(first + second)
</code>This is the cross-entropy (log) loss averaged over the m training samples. The closer the predicted probabilities are to the true labels, the lower the cost.
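Two sanity checks help confirm a cost implementation. With θ = 0 every predicted probability is 0.5, so the cost equals ln 2 ≈ 0.693 whatever the labels are. A confident wrong prediction also costs far more than a confident correct one. The snippet redefines sigmoid and cost so it runs on its own. The two-sample data set is hypothetical, and θ is assumed to be a 1-D array.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    h = sigmoid(X @ theta)
    return -np.mean(Y * np.log(h) + (1 - Y) * np.log(1 - h))

# Hypothetical toy data: intercept column plus one feature
X = np.array([[1.0, -2.0],
              [1.0,  2.0]])
Y = np.array([0, 1])

zero = cost(np.zeros(2), X, Y)           # every probability is 0.5
good = cost(np.array([0.0, 1.0]), X, Y)  # confident and correct
bad = cost(np.array([0.0, -1.0]), X, Y)  # confident and wrong

print(zero, np.log(2))  # both ≈ 0.6931
print(good, bad)        # good is small, bad is large
```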
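6. Gradient Descent
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">def gradient(theta, X, Y):
    # Gradient of the mean cross-entropy: (1/m) * X^T (sigmoid(X·θ) - Y), shape (n,)
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta) - Y)
</code>Three optimization approaches can be used with these functions:
Using scipy.optimize.fmin_tnc, passing cost as the objective and gradient as fprime; it returns a tuple whose first element is the fitted θ.
Manual (batch) gradient descent, repeating the update θ := θ − α·gradient(θ) in an explicit loop, where α is the learning rate.
Using scipy.optimize.minimize with method='Newton-CG' and jac=gradient; the fitted θ is stored in the result's x attribute.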
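The three approaches above can be compared side by side on synthetic data. This sketch is not the article's code. The data are hypothetical (two standardized features, with labels sampled from a known logistic model so the optimum is finite), and gradient_descent, alpha and iters are illustrative names. Standardized features let plain gradient descent use a large fixed learning rate. On raw exam scores, a much smaller α or feature scaling would be needed.

```python
import numpy as np
from scipy.optimize import fmin_tnc, minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    h = sigmoid(X @ theta)
    return -np.mean(Y * np.log(h) + (1 - Y) * np.log(1 - h))

def gradient(theta, X, Y):
    return X.T @ (sigmoid(X @ theta) - Y) / len(X)

def gradient_descent(theta, X, Y, alpha=1.0, iters=2000):
    """Hypothetical helper: plain batch gradient descent with a fixed learning rate."""
    for _ in range(iters):
        theta = theta - alpha * gradient(theta, X, Y)
    return theta

# Hypothetical data: two standardized features, labels sampled from a known logistic model
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 2))
X = np.column_stack([np.ones(200), features])
true_theta = np.array([-0.5, 2.0, -1.0])
Y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)
theta0 = np.zeros(3)

# 1) Legacy truncated-Newton interface: returns (theta, number_of_evaluations, return_code)
result = fmin_tnc(func=cost, x0=theta0, fprime=gradient, args=(X, Y), disp=0)

# 2) Hand-written gradient descent
theta_gd = gradient_descent(theta0, X, Y)

# 3) minimize with Newton-CG, which requires the gradient via jac
res = minimize(cost, theta0, args=(X, Y), method='Newton-CG', jac=gradient)

for name, th in [('fmin_tnc', result[0]), ('gradient descent', theta_gd), ('Newton-CG', res.x)]:
    print(f'{name:>16}: theta = {np.round(th, 3)}, cost = {cost(th, X, Y):.5f}')
```

All three should reach essentially the same minimum cost. They differ mainly in how many cost and gradient evaluations they need.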
7. Prediction
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">def predict(theta, X):
    # Label 1 when the predicted probability is at least 0.5, else 0
    probability = sigmoid(X @ theta)
    return (probability >= 0.5).astype(int)
</code>8. Model Accuracy
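<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">theta_min = np.asarray(result[0])  # result: tuple returned by fmin_tnc; element 0 is θ
predictions = predict(theta_min, X)
correct = [1 if a == b else 0 for (a, b) in zip(predictions, Y)]
accuracy = sum(correct) / len(correct)
print('accuracy = {0:.0f}%'.format(accuracy * 100))
</code>The accuracy is the fraction of training samples whose predicted label matches the true label, printed as a percentage. Because it is measured on the same data used for fitting, it is an optimistic estimate of performance on new data.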
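The same computation can be written without a list comprehension by comparing arrays directly. The sketch below uses hypothetical toy data. Its threshold parameter and the helper name compute_accuracy are additions not present in the article's code. They are included to show that 0.5 is a choice, not a fixed part of the model.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    """Label 1 where the predicted probability reaches the threshold, else 0."""
    return (sigmoid(X @ theta) >= threshold).astype(int)

def compute_accuracy(theta, X, Y, threshold=0.5):
    """Fraction of samples whose predicted label equals the true label."""
    return np.mean(predict(theta, X, threshold) == Y)

# Hypothetical toy data: intercept column plus one feature
X = np.array([[1.0, -3.0],
              [1.0, -1.0],
              [1.0,  0.5],
              [1.0,  2.0]])
Y = np.array([0, 1, 1, 1])
theta = np.array([0.0, 1.0])

print(predict(theta, X))  # [0 0 1 1]
print('accuracy = {0:.0f}%'.format(compute_accuracy(theta, X, Y) * 100))  # accuracy = 75%
print(predict(theta, X, threshold=0.25))  # [0 1 1 1]
```

Lowering the threshold labels more samples as positive. This trades precision for recall, which is why the metrics in the next section are useful alongside accuracy.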
9. Extensions
Additional evaluation metrics (precision, recall, F1‑score) can be obtained via sklearn.metrics.classification_report:
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">from sklearn.metrics import classification_report
print(classification_report(Y, predictions))
</code>Decision boundary visualisation:
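<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">coef = -res.x / res.x[2]  # res: result of scipy.optimize.minimize; res.x = [θ0, θ1, θ2]
x = np.arange(30, 100, 0.5)   # Exam1 scores
y = coef[0] + coef[1] * x     # boundary θ0 + θ1·x + θ2·y = 0 solved for y (Exam2)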
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam1'], negative['Exam2'], s=50, c='r', marker='x', label='Not Admitted')
ax.plot(x, y, label='Decision Boundary', c='grey')
ax.legend()
ax.set_xlabel('Exam1 Score')
ax.set_ylabel('Exam2 Score')
plt.show()
</code>Feature mapping to higher‑order polynomial features:
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">def feature_mapping(x, y, power, as_ndarray=False):
    # Build every term x^(i-p) * y^p with total degree i <= power; column 'f00' is the constant 1
    data = {'f{0}{1}'.format(i - p, p): np.power(x, i - p) * np.power(y, p)
            for i in range(0, power + 1)
            for p in range(0, i + 1)}
    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)
</code>Regularization to prevent over‑fitting:
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menon, monospace; font-size: 12px">def regularized_cost(theta, X, Y, lam=1):
    # L2 penalty (lam = λ) on theta[1:]; the intercept theta[0] is not regularized
    theta_1n = theta[1:]
    regularized_term = lam / (2 * len(X)) * np.power(theta_1n, 2).sum()
    return cost(theta, X, Y) + regularized_term

def regularized_gradient(theta, X, Y, lam=1):
    theta_1n = theta[1:]
    regularized_theta = lam / len(X) * theta_1n
    # Prepend 0 so the intercept's gradient is left unchanged
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, Y) + regularized_term
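</code>The regularization parameter λ sets the strength of the penalty. If λ is too small, the many high-order polynomial terms can over-fit the training data. If λ is too large, θ is shrunk so far that the model under-fits. An appropriate λ balances bias and variance, mitigating both problems.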
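The effect of λ can be seen by fitting the regularized model at several values and watching the penalized coefficients shrink. This is a sketch under stated assumptions. The circular-boundary data set with flipped labels is hypothetical, and scipy.optimize.minimize with BFGS is a swapped-in optimizer. The cost is written in the algebraically equivalent form log(1 + e^z) − y·z using np.logaddexp, which avoids log(0) when probabilities saturate. feature_mapping here returns an ndarray with the same terms and column order as the article's version.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    # Same mean cross-entropy as the article's cost(), rewritten as log(1 + e^z) - y*z
    z = X @ theta
    return np.mean(np.logaddexp(0, z) - Y * z)

def gradient(theta, X, Y):
    return X.T @ (sigmoid(X @ theta) - Y) / len(X)

def regularized_cost(theta, X, Y, lam=1.0):
    # L2 penalty on theta[1:] only; the intercept theta[0] is not penalized
    return cost(theta, X, Y) + lam / (2 * len(X)) * np.sum(theta[1:] ** 2)

def regularized_gradient(theta, X, Y, lam=1.0):
    penalty = np.concatenate([[0.0], lam / len(X) * theta[1:]])
    return gradient(theta, X, Y) + penalty

def feature_mapping(x, y, power):
    # All terms x^(i-p) * y^p for i <= power, in the article's order, as an ndarray
    return np.column_stack([x ** (i - p) * y ** p
                            for i in range(power + 1) for p in range(i + 1)])

# Hypothetical data: label 1 inside a circle, with about 10% of labels flipped as noise
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1, 1, size=(2, 200))
Y = (x1 ** 2 + x2 ** 2 < 0.5).astype(float)
flip = rng.random(200) < 0.1
Y[flip] = 1 - Y[flip]
X = feature_mapping(x1, x2, power=6)  # shape (200, 28); the first column is all ones

norms = []
for lam in [0.1, 1.0, 100.0]:
    res = minimize(regularized_cost, np.zeros(X.shape[1]), args=(X, Y, lam),
                   jac=regularized_gradient, method='BFGS')
    norms.append(np.linalg.norm(res.x[1:]))
    accuracy = np.mean((sigmoid(X @ res.x) >= 0.5) == Y)
    print(f'lambda = {lam:6.1f}: |theta[1:]| = {norms[-1]:8.3f}, training accuracy = {accuracy:.2f}')
```

As λ grows, the norm of the penalized coefficients falls. Large values flatten the decision boundary toward under-fitting; small values let it bend around the noisy points.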
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Rare Earth Juejin Tech Community
