Chapter 51: Bayesian Deep Learning

📖 A Short Story

Suppose I train 100 models: whose weights are the most reliable? And how can I trust a prediction when the model is overconfident? Bayesian Deep Learning treats the weights not as a fixed point but as a probability distribution, which gives a principled route to uncertainty estimation, more robust prediction, and out-of-distribution detection.

The Bayesian Perspective

Traditional ML: θ* = argmin_θ L(θ), a single weight vector.

Bayesian ML: we compute the posterior over the weights, p(θ | D), which is a probability distribution over all possible models.

p(\theta | \mathcal{D}) = \frac{p(\mathcal{D} | \theta) p(\theta)}{p(\mathcal{D})} = \frac{p(\mathcal{D} | \theta) p(\theta)}{\int p(\mathcal{D} | \theta) p(\theta) d\theta}

Prediction: the posterior predictive averages the predictions of all models, each weighted by its posterior probability:

p(y | x, \mathcal{D}) = \int p(y | x, \theta) p(\theta | \mathcal{D}) d\theta
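As a rough illustration of the two formulas above, the sketch below uses a one-parameter model where both integrals can be approximated on a grid: a coin with unknown bias θ, a uniform prior, and a made-up dataset of 7 heads in 10 flips. The data and the names (`theta_grid`, `heads`, `flips`) are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical data: 7 heads out of 10 flips (made up for illustration).
heads, flips = 7, 10

# Discretize theta so that the integrals become sums (midpoint rule, 1000 bins).
theta_grid = np.linspace(0.0005, 0.9995, 1000)
d_theta = theta_grid[1] - theta_grid[0]

prior = np.ones_like(theta_grid)                                    # uniform p(theta)
likelihood = theta_grid**heads * (1 - theta_grid)**(flips - heads)  # p(D | theta)

evidence = np.sum(likelihood * prior) * d_theta   # p(D), the denominator in Bayes' rule
posterior = likelihood * prior / evidence         # p(theta | D)

# Posterior predictive for the next flip:
# integral of p(y = head | theta) * p(theta | D) over theta.
p_next_head = np.sum(theta_grid * posterior) * d_theta

# MAP point estimate for comparison (a single "best" theta).
theta_map = theta_grid[np.argmax(posterior)]

print(f"posterior predictive P(head) = {p_next_head:.3f}")
print(f"MAP estimate of theta        = {theta_map:.3f}")
```

For this toy model the posterior is Beta(8, 4), so the posterior predictive is 8/12 ≈ 0.667 while the MAP estimate sits near 0.7. Averaging over all θ gives a less extreme prediction than plugging in the single best θ. A neural network has far too many weights for a grid, which is why the approximations later in the chapter are needed.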

Aleatoric vs Epistemic Uncertainty

A Bayesian model separates two kinds of uncertainty:

Aleatoric uncertainty: intrinsic noise in the data (irreducible). Example: a blurry image or an ambiguous sentence.
Epistemic uncertainty: gaps in the model's knowledge (reducible with more data). Example: an unseen class or an out-of-distribution input.

By the law of total variance, the total predictive variance is the sum of the aleatoric and epistemic parts:

\text{Var}(y|x, \mathcal{D}) = \underbrace{\mathbb{E}_{p(\theta|\mathcal{D})}[\text{Var}(y|x,\theta)]}_{\text{aleatoric}} + \underbrace{\text{Var}_{p(\theta|\mathcal{D})}[\mathbb{E}[y|x,\theta]]}_{\text{epistemic}}
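The decomposition above can be estimated from a handful of sampled models. The sketch below assumes a regression setting in which each sampled model θ_t returns both a predictive mean and a predictive variance for the same input x. The function name `decompose_variance` and the numbers are hypothetical.

```python
import numpy as np

def decompose_variance(means, variances):
    """Split predictive variance using T sampled models theta_1..theta_T.

    means[t]     = E[y | x, theta_t]
    variances[t] = Var(y | x, theta_t)
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean()   # average of the per-model noise estimates
    epistemic = means.var()        # disagreement between the per-model means
    return aleatoric, epistemic, aleatoric + epistemic

# Hypothetical outputs of T = 4 sampled models for one input x.
means = [2.0, 2.2, 1.8, 2.0]
variances = [0.5, 0.4, 0.6, 0.5]

aleatoric, epistemic, total = decompose_variance(means, variances)
print(f"aleatoric={aleatoric:.3f} epistemic={epistemic:.3f} total={total:.3f}")
```

Here the models mostly agree on the mean, so almost all of the variance is aleatoric. If the sampled means were spread far apart, the epistemic term would dominate instead.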

Approximate Bayesian Inference

The exact posterior is intractable for neural networks, so we rely on approximation methods:

MC Dropout: keep the dropout used in training switched on at test time; multiple stochastic forward passes act as approximate posterior samples.
Variational Inference: approximate the posterior with a tractable distribution q(θ) (Bayes by Backprop).
Laplace Approximation: approximate the posterior with a Gaussian centered at its mode.
Ensembles: the variance across the predictions of several independently trained models serves as a proxy for epistemic uncertainty.
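To make one item of the list above concrete, here is a minimal sketch of the Laplace approximation on a one-parameter toy problem (a coin bias with a uniform prior and 7 heads, 3 tails). The data, the grid search, and the finite-difference curvature are assumptions for illustration. For a real network the curvature would be a (usually approximated) Hessian over all weights.

```python
import numpy as np

def log_posterior(theta, heads=7, tails=3):
    """Unnormalized log posterior of a coin bias under a uniform prior."""
    return heads * np.log(theta) + tails * np.log(1.0 - theta)

# Step 1: find the posterior mode (MAP) with a simple grid search.
grid = np.linspace(0.001, 0.999, 999)
theta_map = grid[np.argmax(log_posterior(grid))]

# Step 2: curvature at the mode via a central finite difference.
h = 1e-4
second_deriv = (log_posterior(theta_map + h)
                - 2.0 * log_posterior(theta_map)
                + log_posterior(theta_map - h)) / h**2

# Step 3: the Gaussian approximation is N(theta_map, -1 / second_deriv).
laplace_var = -1.0 / second_deriv

print(f"mode = {theta_map:.3f}, Laplace std = {np.sqrt(laplace_var):.3f}")
```

The approximation is only as good as the posterior is Gaussian-shaped near its mode. Here the true posterior, Beta(8, 4), is skewed, so the Gaussian is a rough fit.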

MC Dropout: Practical Bayesian NN

The simplest approach: apply training-time dropout at test time as well. For contrast, standard dropout is active only during training:


# Standard dropout (training only)
model.train()   # dropout ON
# ... training loop ...
model.eval()    # dropout OFF (inference)

MC Dropout keeps dropout ON at test time as well:


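import torch

# `model` (a PyTorch module with dropout) and `x` (an input batch) are assumed to exist.
T = 100                          # number of stochastic forward passes
model.train()                    # keep dropout ON during inference!
                                 # (note: train() also affects layers such as BatchNorm)
with torch.no_grad():            # gradients are not needed at inference
    predictions = torch.stack([model(x) for _ in range(T)])

mean_pred = predictions.mean(dim=0)    # prediction
epistemic = predictions.var(dim=0)     # uncertainty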

Gal & Ghahramani (2016) showed that MC Dropout can be interpreted as approximate variational inference under a specific prior.

Bayes by Backprop

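The approximate posterior over the weights is q_φ(w) = N(μ, σ²) with φ = (μ, σ): every weight has its own mean and variance. Training maximizes the evidence lower bound (ELBO):

\mathcal{L}(\phi) = \mathbb{E}_{q_\phi(\mathbf{w})}\left[\sum_{i=1}^N \log p(y_i | x_i, \mathbf{w})\right] - \mathrm{KL}(q_\phi(\mathbf{w}) \| p(\mathbf{w}))

Gradients are computed with respect to both μ and σ, so the network learns its weights together with their uncertainty.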
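The two terms of the objective can be sketched in a few lines of NumPy. This is not a full training loop: it only evaluates a one-sample Monte Carlo estimate of the objective for a toy linear model. The zero-mean Gaussian prior, the Gaussian likelihood with `noise_std`, the softplus parameterization of σ, and all data are assumptions for illustration. In practice an autodiff framework would supply the gradients with respect to μ and ρ.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(rho):
    """Map an unconstrained parameter rho to a positive std sigma."""
    return np.log1p(np.exp(rho))

def kl_gaussian(mu, sigma, prior_sigma=1.0):
    """KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over all weights."""
    return np.sum(np.log(prior_sigma / sigma)
                  + (sigma**2 + mu**2) / (2.0 * prior_sigma**2) - 0.5)

def elbo_estimate(mu, rho, x, y, noise_std=0.1):
    """One-sample Monte Carlo estimate of the objective for y = x @ w + noise."""
    sigma = softplus(rho)
    eps = rng.standard_normal(mu.shape)
    w = mu + sigma * eps                       # reparameterized weight sample
    resid = y - x @ w
    log_lik = np.sum(-0.5 * (resid / noise_std)**2
                     - np.log(noise_std * np.sqrt(2.0 * np.pi)))
    return log_lik - kl_gaussian(mu, sigma)

# Hypothetical toy data: 20 examples with 3 features.
x = rng.standard_normal((20, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w + 0.1 * rng.standard_normal(20)

mu = np.zeros(3)          # variational means (untrained)
rho = np.full(3, -3.0)    # sigma = softplus(-3), a small initial std
print("objective at mu = 0:     ", elbo_estimate(mu, rho, x, y))
print("objective at mu = true_w:", elbo_estimate(true_w, rho, x, y))
```

Writing the sample as w = μ + σ·ε moves the randomness into ε, which is what lets gradients flow to μ and σ. The KL term pulls q toward the prior, and the likelihood term pulls it toward weights that explain the data.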

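💡 Insight

Local reparameterization: if w \sim \mathcal{N}(\mu, \sigma^2), then y = x \cdot w \sim \mathcal{N}(x\mu, x^2\sigma^2), so the activation y can be sampled directly instead of sampling the weight. This lowers the variance of the gradient estimate and makes training more stable.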
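A quick numerical check of this identity, extended to a vector input with independent Gaussian weights, where the activation variance becomes Σ_j x_j² σ_j². All values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.0, 2.0])       # one input vector (hypothetical values)
mu = np.array([0.2, 0.1, -0.3])      # per-weight posterior means
sigma = np.array([0.1, 0.3, 0.2])    # per-weight posterior stds
n = 200_000                          # number of Monte Carlo draws

# Option A: sample the weights, then compute the activation.
w = mu + sigma * rng.standard_normal((n, 3))
y_weight_space = w @ x

# Option B (local reparameterization): sample the activation directly.
y_mean = x @ mu
y_std = np.sqrt((x**2) @ (sigma**2))
y_local = y_mean + y_std * rng.standard_normal(n)

print("means:", y_weight_space.mean(), y_local.mean(), y_mean)
print("stds: ", y_weight_space.std(), y_local.std(), y_std)
```

Both options produce the same distribution for y. The benefit during training comes from drawing independent noise per example at the activation level rather than sharing one weight sample across a whole minibatch.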

Applications of Bayesian DL

Out-of-Distribution Detection: when epistemic uncertainty is high, the model can say "I don't know".
Active Learning: select the most uncertain data points for labeling to reduce labeling cost.
Medical AI: predictions for diagnosis come with confidence intervals (life-critical).
Safe RL / Robotics: take conservative actions in uncertain states.
Model Selection: compare models using the Bayesian model evidence (marginal likelihood).
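The first two applications in the list above both reduce to scoring inputs by uncertainty. The sketch below computes predictive entropy and the mutual information between prediction and weights (the score used in BALD-style active learning) from stochastic forward passes. The function name, the toy probabilities, and the threshold are assumptions for illustration.

```python
import numpy as np

def uncertainty_scores(probs, eps=1e-12):
    """probs has shape (T, N, C): T stochastic passes, N inputs, C classes."""
    mean_p = probs.mean(axis=0)                                          # (N, C)
    predictive_entropy = -np.sum(mean_p * np.log(mean_p + eps), axis=1)  # total
    expected_entropy = -np.sum(probs * np.log(probs + eps), axis=2).mean(axis=0)
    mutual_info = predictive_entropy - expected_entropy                  # epistemic part
    return predictive_entropy, mutual_info

# Hypothetical softmax outputs: T = 2 passes, N = 3 inputs, C = 2 classes.
probs = np.array([
    [[0.99, 0.01], [0.50, 0.50], [0.99, 0.01]],   # pass 1
    [[0.99, 0.01], [0.50, 0.50], [0.01, 0.99]],   # pass 2
])
entropy, mutual_info = uncertainty_scores(probs)

# Active learning: query the k inputs where the sampled models disagree most.
k = 1
query_idx = np.argsort(-mutual_info)[:k]

# OOD-style rejection: abstain when mutual information exceeds a threshold.
threshold = 0.3   # assumed value; in practice tuned on held-out data
abstain = mutual_info > threshold

print("entropy:    ", np.round(entropy, 3))
print("mutual info:", np.round(mutual_info, 3))
print("query:", query_idx, "abstain:", abstain)
```

The second and third inputs have the same predictive entropy, but only the third has high mutual information: the passes agree that the second input is ambiguous (aleatoric), while they contradict each other on the third (epistemic).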

Python: MC Dropout Uncertainty


import torch
import torch.nn as nn

class MCDropoutNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)          # active only while the module is in train() mode
        return self.fc2(x)

# Inference with uncertainty
model = MCDropoutNet()   # untrained here; train it first in real use
model.train()  # CRITICAL: keep dropout ON

x_test = torch.randn(1, 784)
T = 100
with torch.no_grad():  # gradients are not needed for inference
    preds = torch.stack([torch.softmax(model(x_test), dim=1) for _ in range(T)])

mean_pred = preds.mean(dim=0)           # average prediction
epistemic = preds.var(dim=0).sum()      # total epistemic uncertainty
entropy = -(mean_pred * torch.log(mean_pred + 1e-10)).sum()  # predictive entropy

print(f"Predicted class: {mean_pred.argmax().item()}")
print(f"Confidence: {mean_pred.max().item():.4f}")
print(f"Epistemic uncertainty: {epistemic.item():.4f}")

Challenges & Future Directions

Scalability: Bayesian inference in billion-parameter models is still an open problem.
Prior choice: what should p(θ) be? The choice influences the posterior significantly.
Deep ensembles: often the strongest practical method, but K models cost K× the compute.
Subnetwork inference: loosely, a Bayesian version of the lottery ticket hypothesis, with a sparse posterior over a subset of weights.
Function-space inference: place a distribution directly over functions instead of over weights; more natural but harder.

Practice Tasks

In MC Dropout, compare T = 1 with T = 100: how does the uncertainty estimate change?
Posterior predictive p(y|x, D) vs MAP prediction p(y|x, θ*): what is the difference?
How would you use an uncertainty estimate to detect OOD samples?
Ensemble (5 models) vs MC Dropout (100 samples): analyze the compute vs accuracy trade-off.

Interview Questions

Explain the difference between aleatoric and epistemic uncertainty, with examples.
Why is MC Dropout Bayesian? Give an intuitive explanation.
Why is Bayesian inference challenging in large language models?
Why is uncertainty quantification critical in medical AI?

Summary

Bayesian DL = treating the weights as a distribution rather than a single point estimate.
Posterior predictive = the posterior-weighted average of all models' predictions.
Aleatoric (data noise) vs epistemic (knowledge gap) uncertainty: being able to separate them is crucial.
MC Dropout = the simplest practical approach; dropout at test time yields approximate posterior samples.
Bayes by Backprop = optimizing the weight posterior directly with VI, so each weight carries its own uncertainty.
