Backpropagation requires ∂L/∂W, but W is a matrix. What shape should a matrix derivative take: ∂vec(W) or ∂W? Matrix calculus resolves this confusion, and it is the key to reading the equations in AI papers.
Scalar by Scalar
The simplest case: both f and x are scalars, so the derivative \frac{df}{dx} is a scalar as well.
Chain rule: \frac{\partial f}{\partial x} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x}
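The scalar chain rule can be sanity-checked numerically. The sketch below is only an illustration: the functions g(x) = x² and f(g) = sin(g) are assumptions chosen for the example, not taken from the text, and the check uses a central finite difference.

```python
import math

def g(x):
    # Inner function (hypothetical choice): g(x) = x^2
    return x ** 2

def f_of_g(u):
    # Outer function (hypothetical choice): f(g) = sin(g)
    return math.sin(u)

def analytic_derivative(x):
    # Chain rule: df/dx = (df/dg) * (dg/dx) = cos(x^2) * 2x
    return math.cos(g(x)) * 2 * x

def numeric_derivative(x, h=1e-6):
    # Central finite-difference approximation of df/dx
    return (f_of_g(g(x + h)) - f_of_g(g(x - h))) / (2 * h)

x0 = 1.3
print("analytic:", analytic_derivative(x0))
print("numeric: ", numeric_derivative(x0))
```

The two printed values should agree to several decimal places; a mismatch usually means one factor of the chain was dropped.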
Vector by Scalar
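For f(x) with x ∈ ℝ and f ∈ ℝⁿ:
\frac{\partial f}{\partial x} = \left[ \frac{\partial f_1}{\partial x}, \ldots, \frac{\partial f_n}{\partial x} \right]^T
The result is a column vector (a single column of a Jacobian).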
Scalar by Vector (Gradient)
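For a loss L(θ) with θ ∈ ℝᵈ:
\nabla_\theta L = \left[ \frac{\partial L}{\partial \theta_1}, \ldots, \frac{\partial L}{\partial \theta_d} \right]^T
The result is a column vector (the gradient vector). It points in the direction in which the loss increases fastest.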
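A small numerical illustration of the claim that the gradient points toward increasing loss. The loss function here is a hypothetical one made up for the example, and θ is stored as a 1-D NumPy array rather than an explicit column vector.

```python
import numpy as np

def loss(theta):
    # Hypothetical loss for illustration: sum(theta_i^2) + theta_0 * theta_1
    return float(np.sum(theta ** 2) + theta[0] * theta[1])

def grad(theta):
    # Hand-derived gradient of the loss above; same shape as theta
    g = 2 * theta
    g[0] += theta[1]
    g[1] += theta[0]
    return g

theta = np.array([1.0, -2.0, 0.5])
g = grad(theta)
step = 1e-2

loss_here = loss(theta)
loss_up = loss(theta + step * g)    # small step along the gradient
loss_down = loss(theta - step * g)  # small step against the gradient

print("gradient shape:", g.shape, "theta shape:", theta.shape)
print("loss after +step:", loss_up, " at theta:", loss_here, " after -step:", loss_down)
```

Stepping along the gradient raises the loss and stepping against it lowers the loss, which is why gradient descent subtracts the gradient.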
Vector by Vector (Jacobian)
For f: ℝⁿ → ℝᵐ:
J_{ij} = \frac{\partial f_i}{\partial x_j}, \quad J \in \mathbb{R}^{m \times n}
Backprop chains gradients through each layer's Jacobian: \delta^{(\ell)} = J^T \delta^{(\ell+1)}.
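The rule \delta^{(\ell)} = J^T \delta^{(\ell+1)} can be tried on a concrete layer. The sketch below assumes a tanh layer f(x) = tanh(Wx) with random weights (an illustrative choice, not from the text). It builds the Jacobian explicitly, checks it against finite differences, and then shows that the same backward step can be computed without ever forming J.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))        # layer maps R^4 -> R^3, so m = 3, n = 4
x = rng.normal(size=4)
delta_next = rng.normal(size=3)    # upstream gradient dL/df

def layer(v):
    return np.tanh(W @ v)

# Explicit Jacobian: J[i, j] = d f_i / d x_j = (1 - tanh(z_i)^2) * W[i, j]
z = W @ x
dtanh = 1 - np.tanh(z) ** 2
J = dtanh[:, None] * W             # shape (m, n) = (3, 4)

# Finite-difference Jacobian, one column per input coordinate
h = 1e-6
J_num = np.stack(
    [(layer(x + h * e) - layer(x - h * e)) / (2 * h) for e in np.eye(4)],
    axis=1,
)

# Backward step with the explicit Jacobian
delta_explicit = J.T @ delta_next  # shape (4,), matches x

# Same result as a vector-Jacobian product, without building J
delta_vjp = W.T @ (dtanh * delta_next)

print("J shape:", J.shape)
print("max |J - J_num|:", np.max(np.abs(J - J_num)))
print("VJP matches explicit:", np.allclose(delta_explicit, delta_vjp))
```

The second form is what backprop implementations typically rely on: the cost is a couple of matrix-vector products instead of materializing an m×n Jacobian for every layer.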
Matrix Derivatives
সবচেয়ে Common Case: ∂L/∂W
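For z = Wx + b and L = L(z), write δ = ∂L/∂z, with W ∈ ℝ^(m×d) and x ∈ ℝᵈ:
\frac{\partial L}{\partial W} = \delta x^T
Shape check: δ ∈ ℝ^(m×1), xᵀ ∈ ℝ^(1×d), so the outer product is m×d, the shape of W.
\frac{\partial L}{\partial x} = W^T \delta
Shape check: Wᵀ ∈ ℝ^(d×m), δ ∈ ℝᵐ, so the result is a d-dimensional vector (matches x).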
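The outer-product rule ∂L/∂W = δxᵀ and the input rule ∂L/∂x = Wᵀδ can be verified entry by entry with finite differences. This is a minimal sketch under assumptions: the loss L = ½‖z − t‖² (so that δ = z − t), the sizes m = 3 and d = 2, and the random values are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 3, 2
W = rng.normal(size=(m, d))
x = rng.normal(size=(d, 1))
b = rng.normal(size=(m, 1))
t = rng.normal(size=(m, 1))

def loss(W_):
    # Assumed loss for the check: L = 0.5 * ||W x + b - t||^2
    z = W_ @ x + b
    return 0.5 * float(np.sum((z - t) ** 2))

# Analytic gradients from the matrix calculus rules
delta = (W @ x + b) - t        # dL/dz, shape (m, 1)
dL_dW = delta @ x.T            # outer product, shape (m, d)
dL_dx = W.T @ delta            # shape (d, 1)

# Finite-difference gradient with respect to each entry of W
h = 1e-6
dL_dW_num = np.zeros_like(W)
for i in range(m):
    for j in range(d):
        E = np.zeros_like(W)
        E[i, j] = h
        dL_dW_num[i, j] = (loss(W + E) - loss(W - E)) / (2 * h)

print("dL/dW shape:", dL_dW.shape, "W shape:", W.shape)
print("max abs error:", np.max(np.abs(dL_dW - dL_dW_num)))
```

A check like this is cheap to run on small shapes and tends to catch transposition mistakes (xδᵀ instead of δxᵀ) immediately, because the shapes or the values stop matching.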
Chain Rule for Matrices
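The matrix version of the chain rule is easiest to state with differentials:
dL = \text{tr}\left( \left( \frac{\partial L}{\partial W} \right)^T dW \right)
Key insight: with the trace trick every expression you manipulate is a scalar, so the shape ambiguity disappears.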
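As a worked example of the trace trick, the derivation below recovers ∂L/∂W for the linear layer z = Wx + b. It is a sketch that assumes δ denotes ∂L/∂z and that x and b are held fixed; it uses only the cyclic property of the trace.

```latex
\begin{aligned}
dL &= \delta^T\, dz = \operatorname{tr}\!\left(\delta^T\, dz\right),
      \qquad \delta = \frac{\partial L}{\partial z} \\
dz &= dW\, x \qquad (x,\ b \text{ held fixed}) \\
dL &= \operatorname{tr}\!\left(\delta^T\, dW\, x\right)
    = \operatorname{tr}\!\left(x\, \delta^T\, dW\right)
    = \operatorname{tr}\!\left((\delta x^T)^T\, dW\right) \\
\Rightarrow\quad \frac{\partial L}{\partial W} &= \delta x^T
\end{aligned}
```

The last step reads the gradient off by matching against dL = tr((∂L/∂W)ᵀ dW), so the result has the shape of W without any separate bookkeeping.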
Python: Manual Backprop with Matrix Calculus
import numpy as np
# Forward: z = W @ x + b, y = relu(z), L = ||y - t||^2
W = np.array([[1.0, 2.0], [3.0, 4.0]]) # 2x2
x = np.array([[1.0], [2.0]]) # 2x1
b = np.array([[0.5], [0.5]]) # 2x1
t = np.array([[3.0], [7.0]]) # target
# Forward
z = W @ x + b
y = np.maximum(0, z) # ReLU
L = np.sum((y - t)**2)
# Backward using matrix calculus rules
dL_dy = 2 * (y - t) # scalar by vector
dy_dz = np.diag((z > 0).astype(float).flatten()) # ReLU Jacobian (diagonal, so J.T == J)
dL_dz = dy_dz @ dL_dy # chain rule
dL_dW = dL_dz @ x.T # outer product
dL_db = dL_dz # same shape as b
dL_dx = W.T @ dL_dz # chain rule for input
print("dL/dW shape:", dL_dW.shape) # (2, 2) = W shape
print("dL/db shape:", dL_db.shape) # (2, 1) = b shape
print("dL/dx shape:", dL_dx.shape) # (2, 1) = x shape
Common Identities
- \partial(Ax)/\partial x = A (linear).
- \partial(x^T A x)/\partial x = (A + A^T)x (quadratic).
- \partial \text{tr}(AB)/\partial A = B^T (trace).
- \partial \det(A)/\partial A = \det(A) \cdot (A^{-1})^T.
- \partial \log \det(A)/\partial A = A^{-T} (used in covariance matrix optimization).
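The identities above can be spot-checked numerically. In the sketch below, `num_grad` is a hypothetical helper written for this example (central finite differences over every entry), and the matrices are random. The determinant identities are checked on a symmetric positive-definite matrix S so that the determinant is safely positive.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, n))
x = rng.normal(size=n)
S = A @ A.T + n * np.eye(n)   # symmetric positive definite, so det(S) > 0

def num_grad(f, X, h=1e-6):
    # Hypothetical helper: central-difference derivative of scalar f w.r.t. each entry of X
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

# Quadratic form: d(x^T A x)/dx = (A + A^T) x
quad_ok = np.allclose(num_grad(lambda v: v @ A @ v, x), (A + A.T) @ x, atol=1e-5)

# Trace: d tr(AB)/dA = B^T
trace_ok = np.allclose(num_grad(lambda M: np.trace(M @ B), A), B.T, atol=1e-5)

# Determinant: d det(S)/dS = det(S) * (S^{-1})^T
det_ok = np.allclose(
    num_grad(np.linalg.det, S), np.linalg.det(S) * np.linalg.inv(S).T, atol=1e-5
)

# Log-determinant: d log det(S)/dS = S^{-T}
logdet_ok = np.allclose(
    num_grad(lambda M: np.linalg.slogdet(M)[1], S), np.linalg.inv(S).T, atol=1e-5
)

print("quadratic:", quad_ok, "trace:", trace_ok, "det:", det_ok, "logdet:", logdet_ok)
```

Note that the check treats every entry of S as an independent variable. If the symmetry constraint is built into the parameterization, the gradient with respect to the free parameters takes a different form.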
Practice Tasks
- Find the gradient of f(x) = xᵀAx + bᵀx + c.
- For L = ||Wx − t||², derive ∂L/∂W.
- What is the shape of the Jacobian of a sigmoid layer σ(Wx)?
- In batch matrix multiplication, what shape does ∂L/∂W have?
Interview Questions
- Why is ∂L/∂W an outer product in backprop?
- What is the trace trick, and why is it useful?
- What is the shape of the Hessian matrix? What do its eigenvalues tell you?
- What is a vector-Jacobian product (VJP), and why is it efficient in automatic differentiation?
Summary
- Matrix calculus = the derivative rules for every shape: scalar, vector, and matrix.
- The gradient ∇_θ L is a column vector; the Jacobian is a matrix of partial derivatives.
- ∂L/∂W = δxᵀ (outer product) is the foundation of backprop.
- The trace trick removes shape ambiguity, which helps when reading research papers.
- Do not memorize the common identities (linear, quadratic, trace, determinant); learn the derivation pattern.