Chapter 28: Cost Functions

📖 A Short Story

A teacher is grading an exam paper: how many marks should each mistake cost? A strict grader gives 0 for any wrong answer; a lenient one gives partial credit. The "teacher" of an ML model is its cost function, which measures how wrong the model is and decides how heavily each error is penalized.

Regression Losses

MSE (Mean Squared Error)

L = (1/n) Σ (yᵢ − ŷᵢ)²

Penalizes large errors more heavily because the error is squared. Minimizing it gives the maximum likelihood estimate (MLE) under Gaussian noise. Sensitive to outliers.
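
The link between MSE and Gaussian noise can be made explicit with a short derivation. This is a sketch under the assumption that each target equals the prediction plus independent noise with a fixed variance σ² (the symbol σ is introduced here for illustration and does not appear elsewhere in the chapter).

```latex
\begin{aligned}
p(y_i \mid \hat{y}_i) &= \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}\right) \\
-\sum_{i=1}^{n} \log p(y_i \mid \hat{y}_i) &=
  \frac{n}{2}\log(2\pi\sigma^2)
  + \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\end{aligned}
```

With σ held fixed, the first term is a constant and the second is a positive multiple of the MSE, so minimizing the negative log-likelihood and minimizing the MSE pick the same predictions.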

MAE (Mean Absolute Error)

L = (1/n) Σ |yᵢ − ŷᵢ|

Robust to outliers. Its minimizer is the median of the targets. Not differentiable at zero error.
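
The claim that MSE pulls toward the mean while MAE pulls toward the median can be checked numerically. The sketch below uses made-up data and a brute-force grid search over constant predictions; the data values and grid resolution are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical targets: four typical values and one outlier.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Candidate constant predictions c on a fine grid (step 0.01).
candidates = np.linspace(0.0, 100.0, 10001)

# errors[j, i] = y[i] - candidates[j]
errors = y[None, :] - candidates[:, None]
mse_per_c = np.mean(errors**2, axis=1)
mae_per_c = np.mean(np.abs(errors), axis=1)

# The constant that minimizes each loss.
best_mse_c = candidates[np.argmin(mse_per_c)]
best_mae_c = candidates[np.argmin(mae_per_c)]

print(f"MSE minimizer = {best_mse_c:.2f}, mean   = {np.mean(y):.2f}")
print(f"MAE minimizer = {best_mae_c:.2f}, median = {np.median(y):.2f}")
```

The outlier drags the MSE minimizer far from the bulk of the data, while the MAE minimizer stays at the middle value.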

Huber Loss

L_δ(e) = ½e² if |e| ≤ δ, else δ(|e| − ½δ), where e = y − ŷ

Behaves like MSE for small errors and like MAE for large errors, combining the strengths of both.
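
One way to see the hybrid behavior is through the derivative with respect to the error e: it equals e inside the δ band and ±δ outside it, which is the same as clipping e to [−δ, δ]. The helper name `huber_grad` and the sample errors below are illustrative assumptions.

```python
import numpy as np

def huber_grad(e, delta=1.0):
    """Derivative of the Huber loss with respect to the error e."""
    # Inside the band the slope is e (quadratic part);
    # outside it the slope saturates at +/- delta (linear part).
    return np.clip(e, -delta, delta)

e = np.array([-96.0, -0.5, 0.0, 0.2, 96.0])

print("d(0.5*e^2)/de (squared error):", e)
print("d|e|/de (absolute error):     ", np.sign(e))
print("Huber derivative (clipped):   ", huber_grad(e))
```

Because the derivative is bounded by δ, a single extreme error cannot dominate a gradient step the way it does with squared error.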

Classification Losses

Binary Cross-Entropy (Log Loss)

L = −[y log p + (1−y) log(1−p)]

This is the negative log-likelihood (NLL) of a Bernoulli distribution. When y = 1 and p → 0, the penalty grows without bound.
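
Because the penalty blows up as p approaches 0 or 1, implementations often compute the loss from the logit z (with p = sigmoid(z)) instead of from p directly. The sketch below uses one common algebraic rearrangement; the function name `bce_from_logits` and the sample values are assumptions for illustration, not part of the chapter's code.

```python
import numpy as np

def bce_from_logits(y, z):
    """Per-example binary cross-entropy from logits z, where p = sigmoid(z).

    Uses max(z, 0) - z*y + log(1 + exp(-|z|)), which equals
    -[y log p + (1-y) log(1-p)] but never evaluates log(0).
    """
    return np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))

y = np.array([1.0, 1.0, 0.0])
z = np.array([2.0, -1000.0, -3.0])   # second example is confidently wrong

losses = bce_from_logits(y, z)
print(losses)
```

The confidently wrong example gets a large but finite loss, roughly equal to |z|, instead of an overflow or infinity.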

Categorical Cross-Entropy

L = −Σ_c y_c log ŷ_c

Used together with a softmax output layer for multi-class problems.
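
Part of why this pairing is convenient: the gradient of the cross-entropy with respect to the logits simplifies to ŷ − y. The sketch below checks that identity against central finite differences; the logits and the helper names are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # shift logits for numerical stability
    expz = np.exp(z)
    return expz / np.sum(expz)

def cross_entropy(z, y_onehot):
    return -np.sum(y_onehot * np.log(softmax(z)))

z = np.array([1.0, 2.0, 0.5])          # hypothetical logits
y_onehot = np.array([0.0, 1.0, 0.0])   # true class is index 1

# Closed-form gradient with respect to the logits.
analytic_grad = softmax(z) - y_onehot

# Central finite differences as an independent check.
h = 1e-6
numeric_grad = np.zeros_like(z)
for i in range(len(z)):
    step = np.zeros_like(z)
    step[i] = h
    numeric_grad[i] = (
        cross_entropy(z + step, y_onehot) - cross_entropy(z - step, y_onehot)
    ) / (2 * h)

print("analytic:", analytic_grad)
print("numeric: ", numeric_grad)
```

The two gradients should agree to several decimal places, and the components sum to zero because the softmax probabilities sum to one.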

Hinge Loss (SVM)

L = max(0, 1 − y · ŷ), y ∈ {−1, +1}

Margin-based: the loss is 0 when the example is classified correctly with a margin of at least 1 (y · ŷ ≥ 1).
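
A minimal NumPy sketch of the formula above, with made-up labels and raw scores chosen to show each case (confident and correct, correct but inside the margin, exactly on the margin, and wrong):

```python
import numpy as np

def hinge(y, score):
    """Mean hinge loss for labels y in {-1, +1} and raw (unsquashed) scores."""
    return np.mean(np.maximum(0.0, 1.0 - y * score))

y = np.array([+1.0, +1.0, -1.0, -1.0])
score = np.array([2.5, 0.3, -1.0, 0.4])

# Per-example losses before averaging.
per_example = np.maximum(0.0, 1.0 - y * score)
print("per-example:", per_example)
print("mean hinge: ", hinge(y, score))
```

Note that the second example is classified correctly (the score has the right sign) yet still incurs a loss, because it falls inside the margin.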

Focal Loss

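L = −(1 − p)^γ log p

Here p is the predicted probability of the true class and γ ≥ 0 is a focusing parameter. The factor (1 − p)^γ down-weights the contribution of easy examples, which helps under class imbalance.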
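
The down-weighting effect is easiest to see by comparing focal loss with plain cross-entropy on an easy, a medium, and a hard example. This is a sketch of the formula exactly as written above (no class-weight term); γ = 2 and the probabilities are assumed values for illustration.

```python
import numpy as np

def focal_loss(p_true, gamma=2.0, eps=1e-12):
    """Per-example focal loss; p_true is the predicted probability of the true class."""
    p_true = np.clip(p_true, eps, 1.0)   # avoid log(0)
    return -((1.0 - p_true) ** gamma) * np.log(p_true)

p_true = np.array([0.95, 0.6, 0.1])      # easy, medium, hard
ce = -np.log(p_true)                     # plain cross-entropy for comparison
fl = focal_loss(p_true)

for p, c, f in zip(p_true, ce, fl):
    print(f"p={p:.2f}  CE={c:.4f}  focal={f:.4f}  ratio={f / c:.4f}")
```

The ratio focal/CE equals (1 − p)^γ, so the easy example keeps only a tiny fraction of its cross-entropy loss while the hard example keeps most of it. Setting γ = 0 recovers plain cross-entropy.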

Ranking & Contrastive Losses

Pairwise ranking: max(0, margin − s⁺ + s⁻). The positive item should score higher than the negative one by at least the margin.
Triplet loss: uses an anchor, a positive, and a negative example; used in face recognition.
InfoNCE / contrastive: used by CLIP and SimCLR; treats similarity scores as the logits of a softmax.
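
The triplet and InfoNCE ideas above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: Euclidean distance for the triplet loss (some formulations use squared distance), cosine similarity with a temperature of 0.1 for InfoNCE, and tiny hand-made 2-D embeddings. The function names are hypothetical and are not taken from CLIP or SimCLR code.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on distances: the positive should be closer than the negative by `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def info_nce(query, keys, positive_index, temperature=0.1):
    """Cross-entropy over similarities: the positive key is the 'correct class'."""
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = k @ q / temperature            # cosine similarities as logits
    logits = logits - np.max(logits)        # shift for numerical stability
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[positive_index]

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
far_negative = np.array([-1.0, 0.0])
near_negative = np.array([0.8, 0.2])

print("triplet, far negative: ", triplet_loss(anchor, positive, far_negative))
print("triplet, near negative:", triplet_loss(anchor, positive, near_negative))

keys = np.stack([positive, far_negative, np.array([0.0, 1.0])])
print("InfoNCE, positive at index 0:", info_nce(anchor, keys, 0))
```

A far-away negative already satisfies the margin and contributes no loss, while a nearby negative does. The InfoNCE value is small here because the positive key is by far the most similar to the query.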

Python Implementation

Python · NumPy

import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 100.0])    # last is outlier
y_pred = np.array([1.5, 2.5, 2.8, 4.0])

mse = np.mean((y_true - y_pred)**2)
mae = np.mean(np.abs(y_true - y_pred))

def huber(y, yhat, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond it
    e = np.abs(y - yhat)
    return np.mean(np.where(e <= delta, 0.5*e**2, delta*(e - 0.5*delta)))

print(f"MSE   = {mse:.2f}   (dominated by outlier)")
print(f"MAE   = {mae:.2f}")
print(f"Huber = {huber(y_true, y_pred):.2f}  (robust)")

# Binary cross-entropy; p is clipped away from 0 and 1 so log(0) never occurs
def bce(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y*np.log(p) + (1-y)*np.log(1-p))

print(f"\nBCE(y=1, p=0.9)  = {bce(np.array([1.0]), np.array([0.9])):.4f}")
print(f"BCE(y=1, p=0.1)  = {bce(np.array([1.0]), np.array([0.1])):.4f}  (huge penalty)")

# Categorical cross-entropy for one-hot labels; eps guards against log(0)
def ce(y_onehot, probs, eps=1e-12):
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

y = np.array([[0,1,0]])
probs = np.array([[0.2, 0.7, 0.1]])
print(f"Cat CE = {ce(y, probs):.4f}")

AI/ML Connections

The choice of loss defines model behavior: MSE drives predictions toward the mean, MAE toward the median.
Cross-entropy with softmax is the de facto standard for classification.
Focal loss is used in object detection (RetinaNet).
Contrastive losses are used in representation learning (CLIP, SimCLR).
Multi-task learning typically optimizes a weighted sum of multiple losses.

Common Mistakes

Looking at accuracy instead of the loss on imbalanced classification, which is misleading.
Using MSE with a softmax output in place of cross-entropy, which leads to slow convergence.
Getting NaN from log 0 because the probabilities were not clipped inside the log.
Being confused by a large MSE in regression because the labels were not scaled.
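
The log 0 mistake is easy to reproduce. In the sketch below, a perfectly confident and correct prediction makes the unclipped formula evaluate 0 · log(0), which NumPy returns as NaN. The function names and the eps value are assumptions for illustration.

```python
import numpy as np

def bce_unclipped(y, p):
    # Silence NumPy's warnings so the NaN result is shown without noise.
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_clipped(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)   # keep p strictly inside (0, 1)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0])
p = np.array([1.0, 0.0])           # perfectly confident and correct

print("unclipped:", bce_unclipped(y, p))
print("clipped:  ", bce_clipped(y, p))
```

The NaN appears even though the prediction is right, which is why it tends to surface late in training once the model becomes very confident.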

Practice Tasks

Compare MSE, MAE, and Huber on the same set of predictions, once on data with outliers and once without.
Compare cross-entropy and MSE on top of a softmax output, and look at the gradient magnitudes.
On data with a 95-5 class imbalance, compare accuracy and recall when using BCE versus focal loss.

Assignment

Implement 6 loss functions (MSE, MAE, Huber, BCE, Categorical CE, Hinge) from scratch in NumPy and check that the outputs match PyTorch. Derive the gradient of each one analytically and implement it in code.

Interview Questions

MSE vs MAE: when would you use each?
Why is cross-entropy the natural choice with softmax?
How does focal loss handle class imbalance?
Hinge loss vs logistic loss: what is the difference?

Mini Project

"Loss Function Sandbox": the user supplies predictions and ground truth, and the tool computes 7 losses and their gradients in real time, with an outlier-effect slider for interactive comparison.

Summary

The loss measures how wrong the model is, and the choice of loss defines the problem being solved.
Regression: MSE (Gaussian), MAE (Laplace), Huber (hybrid).
Classification: BCE/CE (probabilistic), Hinge (margin), Focal (imbalance).
Many losses are negative log-likelihoods (NLL), so minimizing them is a form of MLE; MSE, MAE, and cross-entropy all fit this view.

✨ Next Steps

Chapter 29 covers Gradient-Based Optimization: how we minimize the cost.
