Chapter 12

📖 A Short Story

The number on your car's speedometer is a derivative: it tells you how fast position is changing with respect to time. In AI, the loss plays the role of "position" and a parameter plays the role of "time". The derivative tells you how much the loss changes when a parameter is nudged slightly.

This is the mechanical secret behind the "learning" in machine learning.

Definition

The derivative of a function f(x) at a point x:

f'(x) = lim_{h→0} [f(x + h) − f(x)] / h

Geometric meaning: the slope of the tangent line to the graph at that point.

Notation: f'(x), df/dx, and d/dx f(x) all mean the same thing.
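The limit in the definition can be watched numerically. The sketch below is only an illustration: the helper name `difference_quotient` and the test function `square` are assumptions chosen for this example, not part of the chapter. It evaluates the quotient [f(x + h) − f(x)] / h for f(x) = x² at x = 3 with shrinking h, and the values should drift toward the exact slope 2 · 3 = 6.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    """Illustrative test function f(x) = x^2, whose exact derivative is 2x."""
    return x * x


x0 = 3.0
for h in (1.0, 0.1, 0.01, 0.001):
    # As h shrinks, the secant slope approaches the tangent slope at x0.
    print(f"h = {h:<6} quotient = {difference_quotient(square, x0, h):.4f}")
```

For this particular function the quotient equals 6 + h algebraically, so the gap to the true derivative is exactly the step size.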

Rules of Differentiation

Basic rules

d/dx (c) = 0

d/dx (xⁿ) = n · xⁿ⁻¹

d/dx (e^x) = e^x

d/dx (ln x) = 1/x

d/dx (sin x) = cos x, d/dx (cos x) = −sin x

Combination rules

(f + g)' = f' + g'

(c · f)' = c · f'

(f · g)' = f' g + f g' (product rule)

(f / g)' = (f' g − f g') / g² (quotient rule)

(f(g(x)))' = f'(g(x)) · g'(x) (chain rule)

💡 Insight

The chain rule is the heart of backpropagation. Chapters 15 and 18 cover it in detail.
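One way to build trust in the chain rule is to apply it by hand and compare against a direct symbolic derivative. The following is a minimal sketch using SymPy; the composed function sin(x²) and the variable names `outer` and `inner` are illustrative choices, not taken from the chapter.

```python
import sympy as sp

x, u = sp.symbols('x u')

outer = sp.sin(u)   # f(u)
inner = x**2        # g(x)

# Chain rule applied step by step: f'(g(x)) * g'(x)
by_chain_rule = sp.diff(outer, u).subs(u, inner) * sp.diff(inner, x)

# Let SymPy differentiate the composed function f(g(x)) directly
composed = outer.subs(u, inner)
direct = sp.diff(composed, x)

print(by_chain_rule)
# The two results agree if their difference simplifies to zero
print(sp.simplify(by_chain_rule - direct) == 0)
```

Forgetting the `sp.diff(inner, x)` factor is the code-level version of dropping the inner derivative.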

Higher-Order Derivatives

f''(x) = d²f/dx²

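f'(x) > 0 → f is increasing; f'(x) < 0 → f is decreasing.
f''(x) > 0 → the curve bends upward (convex).
f''(x) < 0 → the curve bends downward (concave).
Critical point: f'(x) = 0. The second derivative classifies it: f''(x) > 0 means a local minimum, f''(x) < 0 means a local maximum, and f''(x) = 0 is inconclusive.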
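The second-derivative test above can be sketched in a few lines of SymPy. The example function x³ − 3x and the helper `classify` are assumptions made for illustration only.

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x          # illustrative function, not from the chapter

f1 = sp.diff(f, x)      # first derivative
f2 = sp.diff(f, x, 2)   # second derivative


def classify(point):
    """Apply the second-derivative test at a critical point."""
    curvature = f2.subs(x, point)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    return "inconclusive"


# Critical points are the solutions of f'(x) = 0
critical_points = sp.solve(f1, x)
for p in critical_points:
    print(p, classify(p))
```

When the test is inconclusive, the sign of f'(x) on either side of the point has to be checked instead.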

Derivatives Used in ML

d/dx (sigmoid(x)) = σ(x)(1 − σ(x))

d/dx (tanh(x)) = 1 − tanh²(x)

d/dx (ReLU(x)) = 1 if x > 0 else 0

d/dx (x²) = 2x (MSE loss)

d/dp (−log(p)) = −1/p (cross-entropy)
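Formulas like these are easy to mistype, so it can help to compare them against a numerical estimate. The sketch below checks the tanh formula and the cross-entropy term −log(p), differentiated with respect to p, using a central difference. The helper names (`central_diff`, `d_tanh`, `neg_log`, `d_neg_log`) and the sample points are assumptions for this example.

```python
import numpy as np


def central_diff(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def d_tanh(x):
    """Closed form: d/dx tanh(x) = 1 - tanh(x)^2."""
    return 1 - np.tanh(x) ** 2


def neg_log(p):
    """Cross-entropy term for the true class: -log(p)."""
    return -np.log(p)


def d_neg_log(p):
    """Closed form: d/dp (-log p) = -1/p."""
    return -1 / p


xs = np.array([-2.0, 0.0, 1.5])   # inputs for tanh
ps = np.array([0.1, 0.5, 0.9])    # probabilities, kept strictly inside (0, 1)

print(np.allclose(d_tanh(xs), central_diff(np.tanh, xs)))
print(np.allclose(d_neg_log(ps), central_diff(neg_log, ps)))
```

Note how −1/p grows in magnitude as p approaches 0: a confident wrong prediction produces a large gradient.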

Numerical vs Analytical Derivative

Numerical approximation (used in gradient checking):

f'(x) ≈ [f(x + h) − f(x − h)] / (2h) (central difference)

⚠️ Warning

Numerical derivatives are slow and noisy, so production code uses analytical derivatives (autograd). For debugging a custom layer, however, a numerical check is indispensable.
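To see why gradient checking usually prefers the central difference, it can be compared with the simpler one-sided (forward) difference. The forward difference is not covered in the text above and is included here only as a contrast; the function names and the test point are illustrative assumptions.

```python
import math


def forward_diff(f, x, h):
    """One-sided estimate: error shrinks proportionally to h."""
    return (f(x + h) - f(x)) / h


def central_diff(f, x, h):
    """Two-sided estimate: error shrinks proportionally to h^2."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 1.0
h = 1e-3
exact = math.cos(x0)   # exact derivative of sin at x0

forward_error = abs(forward_diff(math.sin, x0, h) - exact)
central_error = abs(central_diff(math.sin, x0, h) - exact)

print(f"forward error: {forward_error:.2e}")
print(f"central error: {central_error:.2e}")
```

With the same step size, the central estimate should be several orders of magnitude closer, at the cost of one extra function evaluation.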

Python Implementation

Python · NumPy

import numpy as np
import sympy as sp

# 1) Symbolic derivative (SymPy)
x = sp.symbols('x')
f = x**3 - 2*x**2 + 5*x - 1
print(sp.diff(f, x))                  # 3*x**2 - 4*x + 5
print(sp.diff(sp.sin(x)*sp.exp(x), x))

# 2) Numerical derivative
def num_deriv(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

print(num_deriv(np.sin, 0.0))         # ~1.0 (= cos 0)
print(num_deriv(np.exp, 1.0))         # ~e

# 3) Common ML derivatives
def sigmoid(x): return 1 / (1 + np.exp(-x))
def d_sigmoid(x): s = sigmoid(x); return s * (1 - s)

def relu(x): return np.maximum(0, x)
def d_relu(x): return (x > 0).astype(float)

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid'(x):", d_sigmoid(xs))
print("relu'(x):   ", d_relu(xs))

# 4) Gradient checking (numerical vs analytical)
def f(x):  return x**2
def df(x): return 2*x
x0 = 3.0
print("analytical:", df(x0), "  numerical:", num_deriv(f, x0))

# 5) PyTorch autograd (the real deal)
import torch
w = torch.tensor(2.0, requires_grad=True)
loss = (w - 5)**2
loss.backward()
print("dL/dw =", w.grad.item())       # 2*(2-5) = -6

AI/ML Connections

Gradient descent: θ ← θ − lr · dL/dθ.
Backprop: the chain rule carries derivatives back through every layer.
Activation choice: pick activations whose derivatives neither vanish nor explode.
Loss curve diagnostics: a small derivative signals a plateau; a large one signals instability.
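The gradient descent update θ ← θ − lr · dL/dθ can be sketched on a one-parameter toy problem. Everything specific here is an assumption for illustration: the quadratic loss θ² − 4θ + 3, the learning rate 0.1, the starting point 0, and the 50 steps.

```python
def loss(theta):
    """Toy loss with its minimum at theta = 2."""
    return theta**2 - 4*theta + 3


def d_loss(theta):
    """Derivative of the toy loss, worked out by hand: 2*theta - 4."""
    return 2*theta - 4


def gradient_descent(theta, lr=0.1, steps=50):
    """Repeatedly step against the derivative to reduce the loss."""
    for _ in range(steps):
        theta = theta - lr * d_loss(theta)
    return theta


theta_final = gradient_descent(theta=0.0)
print(f"theta = {theta_final:.4f}, loss = {loss(theta_final):.4f}")
```

Near the minimum the derivative shrinks toward zero, so the steps get smaller on their own. With a learning rate above 1.0 on this loss, each step would overshoot by more than it corrects and the iterates would diverge.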

Common Mistakes

Forgetting the inner derivative when applying the chain rule.
Getting stuck because ReLU'(0) is mathematically undefined; in practice it is simply set to 0 or 1.
Choosing h too small in a numerical derivative, which lets floating-point round-off error dominate.
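The step-size pitfall in the last item can be demonstrated directly. This sketch measures the central-difference error for exp at x = 1 with three step sizes; the specific values of h are assumptions picked to show the effect, not recommended settings.

```python
import math


def central_diff(f, x, h):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 1.0
exact = math.exp(x0)   # exp is its own derivative

step_sizes = [1e-1, 1e-5, 1e-13]
errors = []
for h in step_sizes:
    err = abs(central_diff(math.exp, x0, h) - exact)
    errors.append(err)
    print(f"h = {h:.0e}  error = {err:.2e}")
```

A large h suffers from truncation error, while a tiny h makes f(x + h) and f(x − h) nearly equal, so their difference loses most of its significant digits. The middle value should give the smallest error of the three.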

Practice Tasks

Find the derivative of f(x) = (3x + 2)⁵.
Find the derivative of f(x) = x · e^x.
Verify your answers in SymPy.
What is the maximum value of the sigmoid derivative, and where does it occur?

Assignment

For the MSE loss L(w) = (wx − y)², derive dL/dw by hand. Then run gradient descent in Python with x = 2, y = 6 and show how w is learned (target w = 3).

Interview Questions

What is the geometric meaning of a derivative?
What is the chain rule, and why does it matter in ML?
Why does the sigmoid derivative cause vanishing gradients?
Numerical vs analytical derivatives: when would you use each?

Mini Project

"Tangent Line Visualizer": plot a curve in matplotlib (for example x² − 4x + 3) and draw the tangent line at a user-supplied x. Also print the derivative value at that point.

Summary

Derivative = instantaneous rate of change = tangent slope.
5 basic rules + 4 combination rules + the chain rule = the foundation of everything.
In ML, every function involved (activations, losses) needs a derivative.
PyTorch/TF autograd computes these derivatives automatically, but the intuition is still essential.

✨ Next Steps

Chapter 13 covers derivatives of multi-variable functions: partial derivatives, which are indispensable for neural networks with thousands of parameters.