Suppose you train 100 models: which one's weights are the most reliable? And how can you trust a prediction that may simply be overconfident? Bayesian deep learning treats the weights not as fixed points but as probability distributions, which gives a principled basis for uncertainty estimation, more robust predictions, and out-of-distribution detection.
The Bayesian Perspective
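Traditional ML: θ* = argmin_θ L(θ), a single weight vector.
Bayesian ML: we compute the posterior over the weights, p(θ | D), which is a probability distribution over all possible models.
Prediction: the posterior predictive averages the predictions of all models, each weighted by its posterior probability:
\[ p(y \mid x, D) = \int p(y \mid x, \theta)\, p(\theta \mid D)\, d\theta \]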
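To make the difference between a single point estimate and the posterior predictive concrete, here is a minimal sketch on a toy problem rather than a neural network: a coin with unknown bias θ, a uniform prior, and three observed heads. The grid approximation of the integral and the variable names are illustrative assumptions, not part of any library.

```python
# Toy posterior predictive: coin with unknown bias theta, uniform prior.
# Data D: 3 flips, all heads. The integral over theta is approximated on a grid.

heads, tails = 3, 0
n_grid = 10001
grid = [i / (n_grid - 1) for i in range(n_grid)]  # candidate theta values in [0, 1]

# Unnormalized posterior: likelihood * prior (the uniform prior is a constant).
unnorm = [t**heads * (1 - t)**tails for t in grid]
z = sum(unnorm)
posterior = [u / z for u in unnorm]  # p(theta | D) on the grid

# Point estimate: theta* maximizes the likelihood (and the posterior, under this prior).
theta_star = grid[max(range(n_grid), key=lambda i: unnorm[i])]
p_head_point = theta_star  # p(y = head | theta*)

# Posterior predictive: average p(y = head | theta) = theta under p(theta | D).
p_head_bayes = sum(t * p for t, p in zip(grid, posterior))

print(f"point estimate:       {p_head_point:.3f}")
print(f"posterior predictive: {p_head_bayes:.3f}")
```

The point estimate predicts heads with probability 1, while the posterior predictive stays close to the exact value (3 + 1) / (3 + 2) = 0.8, because it also accounts for less extreme values of θ that remain plausible after only three flips.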
Aleatoric vs Epistemic Uncertainty
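A Bayesian model separates two kinds of uncertainty:
- Aleatoric uncertainty: noise intrinsic to the data (irreducible). Examples: a blurry image, an ambiguous sentence.
- Epistemic uncertainty: gaps in the model's knowledge (reducible with more data). Examples: an unseen class, an out-of-distribution input.
The total predictive variance is the sum of the aleatoric and epistemic parts (law of total variance):
\[ \underbrace{\mathrm{Var}[y \mid x, D]}_{\text{total}} = \underbrace{\mathbb{E}_{p(\theta \mid D)}\big[\mathrm{Var}[y \mid x, \theta]\big]}_{\text{aleatoric}} + \underbrace{\mathrm{Var}_{p(\theta \mid D)}\big[\mathbb{E}[y \mid x, \theta]\big]}_{\text{epistemic}} \]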
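The decomposition can be sketched numerically. The example below assumes a regression setting in which each of five ensemble members (or posterior samples) outputs a predicted mean and a predicted noise variance for the same input; all numbers are made up for illustration.

```python
import statistics

# Hypothetical outputs of 5 ensemble members (or posterior samples) for one input x.
# Each member predicts a mean and a noise variance for y.
member_means = [2.1, 1.9, 2.4, 2.0, 1.6]
member_noise_vars = [0.30, 0.25, 0.35, 0.28, 0.32]

# Aleatoric part: average predicted noise variance, E_theta[Var(y | x, theta)].
aleatoric = statistics.fmean(member_noise_vars)

# Epistemic part: spread of the predicted means, Var_theta(E[y | x, theta]).
epistemic = statistics.pvariance(member_means)

total = aleatoric + epistemic
print(f"aleatoric={aleatoric:.3f}  epistemic={epistemic:.3f}  total={total:.3f}")
```

If the members agreed perfectly on the mean, the epistemic term would be zero and only the data noise would remain.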
Approximate Bayesian Inference
The exact posterior is intractable, so we rely on approximation methods:
- MC Dropout: keep the dropout used in training switched on at test time as well; multiple forward passes then act as approximate posterior samples.
- Variational Inference: approximate the posterior with a tractable distribution q(θ) (Bayes by Backprop).
- Laplace Approximation: approximate the posterior with a Gaussian centered at its mode.
- Ensembles: the variance of predictions across multiple independently trained models serves as a proxy for epistemic uncertainty.
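Of the methods above, the Laplace approximation is the easiest to show in a few lines. The sketch below assumes a one-dimensional toy posterior (a Beta(8, 4) density over a coin bias, chosen only for illustration) so that the Gaussian fit can be compared with the exact standard deviation; in a neural network the same three steps apply, but the curvature is a Hessian matrix.

```python
import math

# Toy posterior: coin bias theta with a Beta(a, b) density (values chosen for illustration).
a, b = 8.0, 4.0

def neg_log_post(theta):
    """Negative log posterior, up to an additive constant."""
    return -((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

# Step 1: find the mode (MAP). For a Beta density it is available in closed form.
theta_map = (a - 1) / (a + b - 2)

# Step 2: curvature of the negative log posterior at the mode (central finite difference).
h = 1e-4
curvature = (neg_log_post(theta_map + h) - 2 * neg_log_post(theta_map)
             + neg_log_post(theta_map - h)) / h**2

# Step 3: the Laplace approximation is the Gaussian N(theta_map, 1 / curvature).
laplace_std = math.sqrt(1.0 / curvature)

# Exact standard deviation of Beta(a, b), for comparison.
exact_std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(f"mode={theta_map:.3f}  laplace_std={laplace_std:.4f}  exact_std={exact_std:.4f}")
```

The Gaussian matches the mode exactly but only approximates the spread, since a skewed posterior is not symmetric around its mode.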
MC Dropout: Practical Bayesian NN
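The simplest approach: keep the dropout used in training active at test time as well.
# Standard dropout (training only)
model.train()  # dropout ON
# ... training loop ...
model.eval()   # dropout OFF (inference)
MC Dropout keeps dropout ON at test time too:
import torch

# Assumes `model` (containing dropout layers), an input batch `x`, and a sample count `T` are already defined.
model.train()  # keep dropout ON during inference (this also puts BatchNorm layers in training mode)
predictions = []
with torch.no_grad():      # no gradients are needed for inference
    for _ in range(T):     # T stochastic forward passes
        predictions.append(model(x))
predictions = torch.stack(predictions)  # shape: (T, ...)
mean_pred = predictions.mean(dim=0)     # prediction
epistemic = predictions.var(dim=0)      # uncertainty
Gal & Ghahramani (2016) showed that MC Dropout can be interpreted as approximate variational inference under a specific prior.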
Bayes by Backprop
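The posterior over each weight is approximated by q(w) = N(μ, σ²), so every weight has its own mean and variance. A weight sample is drawn with the reparameterization trick:
\[ w = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \]
Gradients are computed with respect to both μ and σ, so the network learns its weights together with their uncertainty.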
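The idea can be sketched on the smallest possible model: a single weight w in y = w·x, with a Gaussian prior and known noise. This is a toy under stated assumptions, not the original algorithm's reference implementation: the data, learning rate, step count, and the softplus parameterization σ = log(1 + exp(ρ)) are choices made for this illustration, and the gradients are written by hand instead of using autograd. Because the toy is linear-Gaussian, the exact posterior is known and serves as a check.

```python
import math
import random

random.seed(0)

# Toy data, roughly y = 2x (hypothetical values).
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
noise_std = 0.5   # assumed known observation noise
prior_std = 1.0   # prior p(w) = N(0, prior_std^2)

sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

def softplus(r):
    return math.log1p(math.exp(r))

def sigmoid(r):
    return 1.0 / (1.0 + math.exp(-r))

mu, rho = 0.0, -3.0                    # variational parameters of q(w) = N(mu, sigma^2)
lr, steps, n_samples = 0.005, 4000, 8  # illustrative hyperparameters

for step in range(steps):
    sigma = softplus(rho)
    g_mu = g_sigma = 0.0
    for _ in range(n_samples):
        eps = random.gauss(0.0, 1.0)
        w = mu + sigma * eps                      # reparameterization trick
        dnll_dw = (w * sxx - sxy) / noise_std**2  # gradient of -log p(D | w) w.r.t. w
        g_mu += dnll_dw / n_samples               # dw/dmu = 1
        g_sigma += dnll_dw * eps / n_samples      # dw/dsigma = eps
    # Closed-form gradients of KL(q || prior) between two Gaussians.
    g_mu += mu / prior_std**2
    g_sigma += sigma / prior_std**2 - 1.0 / sigma
    mu -= lr * g_mu
    rho -= lr * g_sigma * sigmoid(rho)            # chain rule: dsigma/drho = sigmoid(rho)

sigma = softplus(rho)

# Exact conjugate posterior for this linear-Gaussian toy, for comparison.
post_prec = 1.0 / prior_std**2 + sxx / noise_std**2
exact_mean = (sxy / noise_std**2) / post_prec
exact_std = math.sqrt(1.0 / post_prec)

print(f"learned: mu={mu:.3f} sigma={sigma:.3f}")
print(f"exact:   mean={exact_mean:.3f} std={exact_std:.3f}")
```

The learned μ and σ should land near the exact posterior mean and standard deviation, up to the noise of the stochastic gradient estimates. In a real network the same update is applied to every weight, with autograd supplying the likelihood gradient.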
Applications of Bayesian DL
- Out-of-Distribution Detection: when epistemic uncertainty is high, the model can say "I don't know".
- Active Learning: select the most uncertain data points for labeling, which reduces labeling cost.
- Medical AI: predictions with confidence intervals for diagnosis (life-critical).
- Safe RL / Robotics: take conservative actions in uncertain states.
- Model Selection: compare models using the Bayesian model evidence (marginal likelihood).
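The first two applications reduce to scoring inputs by uncertainty. The sketch below assumes classification with T stochastic forward passes per input and uses the mutual information between prediction and weights (predictive entropy minus average per-sample entropy) as the epistemic score. The sample probabilities, the input names, and the threshold value are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def uncertainty_scores(samples):
    """samples: list of T class-probability vectors from T stochastic passes."""
    n_classes = len(samples[0])
    mean_p = [sum(s[c] for s in samples) / len(samples) for c in range(n_classes)]
    total = entropy(mean_p)                                      # predictive entropy
    expected = sum(entropy(s) for s in samples) / len(samples)   # average per-sample entropy
    return total, total - expected                               # (total, epistemic part)

# Hypothetical MC samples (T = 3 passes, 3 classes) for two inputs.
pool = {
    "in_dist": [[0.90, 0.05, 0.05], [0.88, 0.07, 0.05], [0.92, 0.04, 0.04]],
    "ood":     [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]],
}

scores = {name: uncertainty_scores(s) for name, s in pool.items()}

# OOD detection: flag inputs whose epistemic score exceeds a cut-off.
THRESHOLD = 0.2  # hypothetical value; in practice tuned on validation data
flagged = [name for name, (_, mi) in scores.items() if mi > THRESHOLD]

# Active learning: request a label for the input with the highest epistemic score.
query = max(scores, key=lambda name: scores[name][1])

print("flagged as OOD:", flagged)
print("next label query:", query)
```

Both inputs have confident individual passes, but only for the second one do the passes disagree with each other, and that disagreement is what the epistemic score picks up.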
Python: MC Dropout Uncertainty
import torch
import torch.nn as nn


class MCDropoutNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # active only in train mode, hence model.train() below
        return self.fc2(x)


# Inference with uncertainty (untrained model and random input, for illustration only)
model = MCDropoutNet()
model.train()  # CRITICAL: keep dropout ON
x_test = torch.randn(1, 784)
T = 100
with torch.no_grad():  # no gradients are needed for inference
    preds = torch.stack([torch.softmax(model(x_test), dim=1) for _ in range(T)])
mean_pred = preds.mean(dim=0)        # average prediction over T passes
epistemic = preds.var(dim=0).sum()   # per-class variance summed: epistemic proxy
entropy = -(mean_pred * torch.log(mean_pred + 1e-10)).sum()  # predictive entropy
print(f"Predicted class: {mean_pred.argmax().item()}")
print(f"Confidence: {mean_pred.max().item():.4f}")
print(f"Epistemic uncertainty: {epistemic.item():.4f}")
print(f"Predictive entropy: {entropy.item():.4f}")
Challenges & Future Directions
- Scalability: Bayesian inference for billion-parameter models is still an open problem.
- Prior choice: what should p(θ) be? The choice influences the posterior significantly.
- Deep ensembles: often the strongest practical method, but at K× the compute cost for K members.
- Subnetwork inference: Bayesian inference over only a small subset of the weights, in the spirit of the lottery ticket hypothesis, which gives a sparse posterior.
- Function-space inference: place the distribution directly over functions instead of over weights; more natural but harder.
Practice Tasks
- In MC Dropout, how does the uncertainty estimate change between T = 1 and T = 100?
- What is the difference between the posterior predictive p(y|x, D) and the MAP prediction p(y|x, θ*)?
- How would you use uncertainty estimates to detect OOD samples?
- Ensemble (5 models) vs MC Dropout (100 samples): analyze the trade-off between compute and accuracy.
Interview Questions
- Explain the difference between aleatoric and epistemic uncertainty, with examples.
- Why is MC Dropout Bayesian? Give an intuitive explanation.
- Why is Bayesian inference challenging for large language models?
- Why is uncertainty quantification critical in medical AI?
Summary
- Bayesian DL treats weights as distributions, not as a single point estimate.
- The posterior predictive is the weighted average of the predictions of all models.
- Aleatoric (data noise) vs epistemic (knowledge gap) uncertainty: being able to separate the two is crucial.
- MC Dropout is one of the most practical approaches: dropout at test time yields approximate posterior samples.
- Bayes by Backprop uses variational inference to optimize the weight posterior directly, so each weight carries its own uncertainty.