Logistic Regression from Scratch
Goal: Implement binary logistic regression with NumPy — sigmoid, cross-entropy loss, gradient descent. Then extend to multi-class.
Prerequisites: Logistic Regression, Loss Functions, Probability Basics, 01 - Linear Regression from Scratch
Core Idea
Linear regression outputs any real number. For classification, we need probabilities in $(0, 1)$. The sigmoid function squashes any real input into that range:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The model: $p = \sigma(w^\top x + b)$, interpreted as $P(y = 1 \mid x)$; predict class 1 when $p \ge 0.5$.
Sigmoid and Its Gradient
```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))  # clip for numerical stability

# Plot
z = np.linspace(-6, 6, 200)
plt.plot(z, sigmoid(z))
plt.axhline(0.5, color="gray", linestyle="--", alpha=0.5)
plt.axvline(0, color="gray", linestyle="--", alpha=0.5)
plt.xlabel("z"); plt.ylabel("σ(z)")
plt.title("Sigmoid function")
plt.show()
```

Key property: $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ — the gradient depends only on the output itself.
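The identity is easy to verify numerically. A quick finite-difference check (a self-contained sketch that re-defines `sigmoid` so it runs on its own):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

z = np.linspace(-4, 4, 9)
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
analytic = sigmoid(z) * (1 - sigmoid(z))               # σ(z)(1 − σ(z))
assert np.allclose(numeric, analytic, atol=1e-8)
```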
Binary Cross-Entropy Loss
For true label $y \in \{0, 1\}$ and predicted probability $p$:

$$\mathcal{L}(y, p) = -\big[\, y \log p + (1 - y)\log(1 - p) \,\big]$$

When $y = 1$: loss $= -\log p$, which penalizes low confidence in the positive class. When $y = 0$: loss $= -\log(1 - p)$, which penalizes high confidence in the positive class.
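To make the two penalty cases concrete, here is a small sanity check (a sketch that defines the loss inline so it runs standalone):

```python
import numpy as np

def binary_cross_entropy(y, p):
    eps = 1e-15  # avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0])
confident_right = np.array([0.99, 0.01])  # high p for y=1, low p for y=0
confident_wrong = np.array([0.01, 0.99])  # the reverse

print(binary_cross_entropy(y, confident_right))  # ≈ 0.01
print(binary_cross_entropy(y, confident_wrong))  # ≈ 4.61
```

Being confidently wrong costs far more than being confidently right: the negative log blows up as $p$ approaches the wrong extreme.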
```python
def binary_cross_entropy(y, p):
    eps = 1e-15  # avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Generate Data
```python
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42, n_clusters_per_class=1)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap="bwr", s=15, alpha=0.7)
plt.xlabel("x1"); plt.ylabel("x2")
plt.title("Binary classification data")
plt.show()
```

Full Implementation
```python
class LogisticRegressionScratch:
    def __init__(self, lr=0.1):
        self.lr = lr

    def fit(self, X, y, n_epochs=1000):
        n, d = X.shape
        self.w = np.zeros(d)
        self.b = 0.0
        self.losses = []

        for epoch in range(n_epochs):
            # Forward pass
            z = X @ self.w + self.b
            p = sigmoid(z)

            # Loss
            loss = binary_cross_entropy(y, p)
            self.losses.append(loss)

            # Gradients (derive these — it's a clean result)
            error = p - y                 # shape (n,)
            dw = (1 / n) * X.T @ error    # shape (d,)
            db = (1 / n) * np.sum(error)  # scalar

            # Update
            self.w -= self.lr * dw
            self.b -= self.lr * db
        return self

    def predict_proba(self, X):
        return sigmoid(X @ self.w + self.b)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)
```

Why p - y is the gradient
The gradient of BCE with respect to $w$ simplifies beautifully:

$$\frac{\partial \mathcal{L}}{\partial z} = p - y, \qquad \frac{\partial \mathcal{L}}{\partial w} = \frac{1}{n} X^\top (p - y), \qquad \frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n}\sum_i (p_i - y_i)$$

This is the same form as linear regression's gradient — the chain rule through the sigmoid cancels the denominators produced by differentiating the logs.
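One way to convince yourself the simplification is right: compare the analytic gradient $\frac{1}{n}X^\top(p - y)$ against a finite-difference gradient of the loss (a self-contained check on hypothetical random data, not part of the original code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

def bce_loss(w, b, X, y):
    p = np.clip(sigmoid(X @ w + b), 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = (rng.random(20) > 0.5).astype(float)
w, b = rng.normal(size=2), 0.3

# Analytic gradient from the derivation: (1/n) X^T (p - y)
p = sigmoid(X @ w + b)
dw = X.T @ (p - y) / len(y)

# Central finite differences, one coordinate at a time
h = 1e-6
for j in range(len(w)):
    e = np.zeros_like(w)
    e[j] = h
    num = (bce_loss(w + e, b, X, y) - bce_loss(w - e, b, X, y)) / (2 * h)
    assert np.isclose(num, dw[j], atol=1e-6)
```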
Train and Evaluate
model = LogisticRegressionScratch(lr=0.1)
model.fit(X, y, n_epochs=500)
# Accuracy
preds = model.predict(X)
acc = np.mean(preds == y)
print(f"Training accuracy: {acc:.4f}")
# Loss curve
plt.plot(model.losses)
plt.xlabel("Epoch"); plt.ylabel("BCE Loss")
plt.title("Training convergence")
plt.show()Visualize Decision Boundary
```python
def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = model.predict_proba(grid).reshape(xx.shape)
    plt.contourf(xx, yy, probs, levels=50, cmap="RdBu", alpha=0.6)
    plt.colorbar(label="P(y=1)")
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap="bwr", s=15, edgecolors="k", linewidth=0.3)
    plt.title("Decision boundary")
    plt.show()

plot_decision_boundary(model, X, y)
```

The decision boundary is a straight line where $p = 0.5$, i.e., $w^\top x + b = 0$.
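The boundary line can also be computed directly from the parameters by solving $w_1 x_1 + w_2 x_2 + b = 0$ for $x_2$. A sketch with hypothetical weights standing in for the trained ones (assumes $w_2 \neq 0$):

```python
import numpy as np

# Hypothetical parameters standing in for the trained model.w / model.b
w = np.array([1.5, -2.0])
b = 0.5

x1 = np.linspace(-3, 3, 50)
x2 = -(w[0] * x1 + b) / w[1]  # solve w1*x1 + w2*x2 + b = 0 for x2

# Every point on this line has z = 0, hence sigmoid(z) = 0.5
z = w[0] * x1 + w[1] * x2 + b
assert np.allclose(z, 0.0)
```

Overlaying `plt.plot(x1, x2)` on the contour plot should trace the 0.5 contour exactly.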
Regularization (L2)
Add $\frac{\lambda}{2n}\lVert w \rVert^2$ to the loss. The gradient for $w$ gets an extra term $\frac{\lambda}{n} w$ (the bias is conventionally left unregularized):
```python
class LogisticRegressionL2(LogisticRegressionScratch):
    def __init__(self, lr=0.1, lam=0.01):
        super().__init__(lr)
        self.lam = lam

    def fit(self, X, y, n_epochs=1000):
        n, d = X.shape
        self.w = np.zeros(d)
        self.b = 0.0
        self.losses = []

        for epoch in range(n_epochs):
            z = X @ self.w + self.b
            p = sigmoid(z)
            loss = binary_cross_entropy(y, p) + (self.lam / (2 * n)) * np.sum(self.w ** 2)
            self.losses.append(loss)

            error = p - y
            dw = (1 / n) * X.T @ error + (self.lam / n) * self.w
            db = (1 / n) * np.sum(error)

            self.w -= self.lr * dw
            self.b -= self.lr * db
        return self
```

Verify Against sklearn
```python
from sklearn.linear_model import LogisticRegression

sk_model = LogisticRegression(C=1e10)  # C=1e10 ≈ no regularization
sk_model.fit(X, y)

print(f"Ours: w={model.w}, b={model.b:.4f}")
print(f"sklearn: w={sk_model.coef_[0]}, b={sk_model.intercept_[0]:.4f}")
print(f"sklearn accuracy: {sk_model.score(X, y):.4f}")
```

Exercises
- Multi-class: Implement one-vs-rest classification. Train one binary classifier per class, then predict the class with the highest probability.
- Softmax regression: Replace sigmoid with softmax for multi-class: $\mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$. Use categorical cross-entropy loss.
- Precision and recall: Vary the threshold from 0 to 1 and plot precision vs. recall. At what threshold is F1 maximized?
- Polynomial features: The boundary is linear. Generate data with `make_moons(noise=0.2)` and show logistic regression fails. Then add polynomial features ($x_1^2$, $x_2^2$, $x_1 x_2$) and show it works.
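For the softmax exercise, a possible starting point (a sketch; the row-wise max subtraction is the standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(Z):
    # Shift by the row max so np.exp never overflows; softmax is shift-invariant
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

Z = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1001.0, 1002.0]])  # would overflow without the shift
P = softmax(Z)
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution
assert np.allclose(P[0], P[1])          # shift invariance: same relative scores
```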
Next: 03 - Decision Tree from Scratch — a non-linear classifier that doesn’t need feature engineering.