Chapter 6: Matrices and Operations

📖 A Short Story

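What happens inside a neural network layer that takes a 512-dimensional input and produces a 1024-dimensional output? A matrix multiplication: y = Wx + b, where W has shape 1024 × 512. In deep learning, most of a GPU's compute time goes to exactly this kind of operation. You cannot understand modern AI without understanding matrices.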

What Is a Matrix?

A matrix is a rectangular table of numbers. Notation: A ∈ ℝ^{m×n} means A has m rows and n columns.

Example: A = [[1, 2, 3], [4, 5, 6]], so A ∈ ℝ²ˣ³ (2 rows, 3 columns).

A_ij is the element in row i, column j.
A row vector is a (1, n) matrix; a column vector is an (n, 1) matrix.
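A minimal NumPy sketch of the notation above. Note one convention mismatch: the math notation A_ij counts rows and columns from 1, while NumPy indexes from 0, so A_23 corresponds to `A[1, 2]`.

```python
import numpy as np

# The 2x3 example matrix from the text
A = np.array([[1, 2, 3],
              [4, 5, 6]])
m, n = A.shape          # m rows, n columns
print(m, n)             # 2 3

# A_23 (row 2, column 3 in 1-based math notation) is A[1, 2] in NumPy
a_23 = A[1, 2]
print(a_23)             # 6

# Row vector: shape (1, n); column vector: shape (n, 1)
row = np.array([[1, 2, 3]])
col = np.array([[1], [2], [3]])
print(row.shape, col.shape)   # (1, 3) (3, 1)

# A plain 1-D NumPy array has shape (n,), which is neither of the two
vec = np.array([1, 2, 3])
print(vec.shape)              # (3,)
```

The distinction between `(n,)`, `(1, n)` and `(n, 1)` matters later, because broadcasting and matmul treat them differently.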

Addition, Scalar Mul, Transpose

Addition (same shape)

(A + B)_ij = A_ij + B_ij

Scalar multiplication

(cA)_ij = c · A_ij

Transpose

(A^T)_ij = A_ji; shape (m,n) → (n,m)
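The three definitions above are entry-by-entry rules. This sketch writes them as explicit loops over (i, j) and checks the results against NumPy's built-in `+`, scalar `*` and `.T`; the matrices and the scalar c = 2 are arbitrary choices for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])
c = 2
m, n = A.shape

# Addition and scalar multiplication, entry by entry
S = np.zeros((m, n), dtype=A.dtype)
C = np.zeros((m, n), dtype=A.dtype)
for i in range(m):
    for j in range(n):
        S[i, j] = A[i, j] + B[i, j]   # (A + B)_ij = A_ij + B_ij
        C[i, j] = c * A[i, j]         # (cA)_ij = c * A_ij

# Transpose: (A^T)_ij = A_ji, so the result has shape (n, m)
T = np.zeros((n, m), dtype=A.dtype)
for i in range(n):
    for j in range(m):
        T[i, j] = A[j, i]

# The loops agree with NumPy's operators
print(np.array_equal(S, A + B))          # True
print(np.array_equal(C, c * A))          # True
print(np.array_equal(T, A.T), T.shape)   # True (3, 2)
```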

Matrix Multiplication: The Core

If A ∈ ℝ^{m×k} and B ∈ ℝ^{k×n}, then AB ∈ ℝ^{m×n}. The "inner" dimensions (the k of A and the k of B) must match.

(AB)_ij = Σ_{p=1}^{k} A_ip · B_pj

Each element is the dot product of row i of A with column j of B.
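The summation formula translates directly into three nested loops. The sketch below is for understanding only (the name `matmul_naive` is illustrative, not a library function); real code should use `@`, which calls optimized BLAS routines.

```python
import numpy as np

def matmul_naive(A, B):
    """Triple-loop matrix product: (AB)_ij = sum over p of A_ip * B_pj."""
    m, k = A.shape
    k2, n = B.shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {A.shape} vs {B.shape}")
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            # dot product of row i of A with column j of B
            C[i, j] = sum(A[i, p] * B[p, j] for p in range(k))
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = matmul_naive(A, B)
print(C.shape)                 # (2, 4)
print(np.allclose(C, A @ B))   # True
```

The loops perform m·n·k multiply-adds, which is why large matmuls dominate the compute budget of neural networks.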

⚠️ Caution

In general AB ≠ BA: matrix multiplication is not commutative. Swapping the order usually gives a completely different result, or the product may not even be defined because the shapes no longer match.
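A short demonstration of both failure modes, using small matrices chosen for illustration: two square matrices whose products differ, and non-square matrices where swapping the order changes the output shape or makes the product invalid.

```python
import numpy as np

# Square case: both products exist, but they differ
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
print(A @ B)   # [[2, 1], [4, 3]]: AB swaps the columns of A
print(B @ A)   # [[3, 4], [1, 2]]: BA swaps the rows of A

# Some pairs do commute, e.g. any A with the identity
I = np.eye(2)
print(np.allclose(A @ I, I @ A))   # True

# Non-square case: the shape of the result depends on the order
P = np.ones((2, 3))
Q = np.ones((3, 2))
print((P @ Q).shape)   # (2, 2)
print((Q @ P).shape)   # (3, 3)

# P @ R is valid (2x3 times 3x4), but R @ P is not (inner dims 4 and 2)
R = np.ones((3, 4))
try:
    R @ P
except ValueError as e:
    print("R @ P failed:", e)
```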

Example

[[1, 2], [3, 4]] · [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
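To see where each number in the example comes from, this sketch prints every entry as the row-times-column sum from the formula (1-based labels in the output, 0-based indices in the code).

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Each entry is (row i of A) · (column j of B)
for i in range(2):
    for j in range(2):
        terms = " + ".join(f"{A[i, p]}*{B[p, j]}" for p in range(2))
        print(f"(AB)_{i + 1}{j + 1} = {terms} = {A[i, :] @ B[:, j]}")
# (AB)_11 = 1*5 + 2*7 = 19
# (AB)_12 = 1*6 + 2*8 = 22
# (AB)_21 = 3*5 + 4*7 = 43
# (AB)_22 = 3*6 + 4*8 = 50

print(A @ B)   # [[19, 22], [43, 50]]
```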

Special Matrices

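Identity (I): 1 on the diagonal, 0 elsewhere. AI = IA = A.
Diagonal: non-zero entries only on the diagonal. Multiplying by one just scales rows (or columns), so it is cheap to compute.
Symmetric: A = A^T. Example: a covariance matrix.
Orthogonal: A^TA = I, so A⁻¹ = A^T. Example: a rotation matrix.
Sparse: most elements are 0. A memory saver in large-scale ML, since only the non-zero entries need to be stored.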
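A NumPy sketch that builds one example of each special matrix and checks its defining property. The specific values (seed, angle, sizes) are arbitrary. For the sparse case it only shows the waste of dense storage; dedicated sparse formats such as those in SciPy's `scipy.sparse` module, which store only the non-zeros, are not used here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# Identity: AI = IA = A
I = np.eye(3)
print(np.allclose(A @ I, A), np.allclose(I @ A, A))  # True True

# Diagonal: D @ A scales row i of A by d[i], so broadcasting
# gives the same result without a full matrix product
d = np.array([1.0, 2.0, 3.0])
D = np.diag(d)
print(np.allclose(D @ A, d[:, None] * A))  # True

# Symmetric: a covariance matrix equals its own transpose
X = rng.standard_normal((100, 3))   # 100 samples, 3 features
cov = np.cov(X, rowvar=False)       # shape (3, 3)
print(np.allclose(cov, cov.T))      # True

# Orthogonal: a 2D rotation matrix satisfies R^T R = I
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(R.T @ R, np.eye(2)))  # True

# Sparse: only a handful of the 1,000,000 entries are non-zero,
# yet dense storage still allocates memory for every one of them
S = np.zeros((1000, 1000))
S[rng.integers(0, 1000, 50), rng.integers(0, 1000, 50)] = 1.0
print(np.count_nonzero(S), S.nbytes)  # at most 50 non-zeros, 8,000,000 bytes
```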

Inverse & Determinant

The "reverse" of a square matrix: A · A⁻¹ = A⁻¹ · A = I. It does not always exist.

det(A) = 0 means there is no inverse (A is singular).

For a 2x2 matrix: det([[a,b],[c,d]]) = ad − bc
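For the 2x2 case the inverse also has a closed form, the standard adjugate formula A⁻¹ = (1 / (ad − bc)) · [[d, −b], [−c, a]], which makes it obvious why det = 0 rules out an inverse. The helpers `det2` and `inv2` below are illustrative names, checked against `np.linalg`.

```python
import numpy as np

def det2(M):
    """Determinant of a 2x2 matrix: ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula; fails if singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (det = 0), no inverse")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1, 2], [3, 4]])
print(det2(A))                                 # -2
print(inv2(A))                                 # [[-2, 1], [1.5, -0.5]]
print(np.allclose(A @ inv2(A), np.eye(2)))     # True
print(np.isclose(det2(A), np.linalg.det(A)))   # True

# A singular matrix: the second row is 2x the first, so det = 0
S = np.array([[1, 2], [2, 4]])
print(det2(S))                                 # 0
```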

✨ Tip

In ML an explicit inverse is rarely needed, and computing one can be numerically unstable. Instead, we use the pseudo-inverse, SVD, or a linear solver (direct or iterative).
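A sketch of what the tip means in practice, using synthetic data: for a square system Ax = b, `np.linalg.solve` finds x without forming A⁻¹; for a non-square (overdetermined) system, where no inverse exists, `np.linalg.pinv` (computed via SVD) and `np.linalg.lstsq` give the least-squares solution. `np.linalg.solve` is a direct solver; iterative solvers (for example `scipy.sparse.linalg.cg` for large sparse systems) are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Square system A x = b with a known answer
A = rng.standard_normal((4, 4))
x_true = rng.standard_normal(4)
b = A @ x_true

x_solve = np.linalg.solve(A, b)      # preferred: no explicit inverse
x_inv = np.linalg.inv(A) @ b         # works, but more work and less stable
print(np.allclose(x_solve, x_true), np.allclose(x_inv, x_true))

# Overdetermined system: 50 equations, 3 unknowns, slightly noisy
X = rng.standard_normal((50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(50)

w_pinv = np.linalg.pinv(X) @ y                    # pseudo-inverse (SVD-based)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solver
print(np.allclose(w_pinv, w_lstsq))   # True: same least-squares answer
print(w_pinv)                         # close to [1, -2, 0.5]
```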

Broadcasting (NumPy/PyTorch)

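Automatic alignment of tensors with different shapes. NumPy compares shapes starting from the last dimension, and two dimensions are compatible when they are equal or one of them is 1. Essential for reading ML code.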


import numpy as np
A = np.ones((3, 4))         # (3, 4)
b = np.array([1, 2, 3, 4])  # (4,)
print((A + b).shape)         # (3, 4) — b broadcasts to each row
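The row-wise case above is only one instance of the rule. This sketch, with arbitrary shapes, shows a column-plus-row "outer" broadcast, a typical ML use (centering features by subtracting the per-column mean), and a pair of shapes that cannot broadcast.

```python
import numpy as np

# (3, 1) + (4,): the (4,) array is treated as (1, 4), result is (3, 4)
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col + row
print(grid.shape)   # (3, 4)
print(grid)         # grid[i, j] = i + j

# Typical ML use: subtract the per-feature mean from every sample
X = np.random.default_rng(0).standard_normal((5, 4))   # 5 samples, 4 features
X_centered = X - X.mean(axis=0)                         # (5, 4) - (4,) -> (5, 4)
print(np.allclose(X_centered.mean(axis=0), 0))         # True

# Incompatible: trailing dims 4 and 3 differ and neither is 1
try:
    np.ones((3, 4)) + np.ones(3)
except ValueError as e:
    print("broadcast error:", e)
```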

Python (NumPy) Implementation


import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)             # element-wise
print(A * B)             # element-wise (Hadamard), NOT matmul
print(A @ B)             # matrix multiplication
print(np.matmul(A, B))   # same
print(A.T)               # transpose

# Identity & inverse
I = np.eye(2)
print(A @ I)             # == A
print(np.linalg.inv(A))  # inverse
print(np.linalg.det(A))  # determinant

# A tiny linear layer: y = W x + b
np.random.seed(0)
x = np.random.randn(4)           # input dim 4
W = np.random.randn(3, 4)        # weight: out=3, in=4
b = np.random.randn(3)           # bias

y = W @ x + b
print("y =", y, "shape=", y.shape)   # (3,)

# Batched version (a whole batch of inputs at once)
X = np.random.randn(8, 4)        # batch of 8
Y = X @ W.T + b                  # shape (8, 3)
print(Y.shape)

AI/ML Connections

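Dense / Linear layer: Y = XW^T + b.
Convolution: often implemented internally as a matrix multiplication (im2col).
Attention: softmax(QK^T/√d_k) V. Two matmuls (QK^T, then the attention weights times V), on top of the matmuls that produce Q, K, and V from the input.
Embedding lookup: equivalent to multiplying a one-hot vector by the embedding matrix (in practice implemented as a direct row lookup).
Batch processing: all samples at once, as one big matmul.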
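A minimal sketch of two items from the list, under simplifying assumptions: single-head attention with no masking and no batch dimension, and random weights. The projection matrices `W_q`, `W_k`, `W_v` are hypothetical names stored as (in, out), so no transpose is needed, unlike the (out, in) convention of Y = XW^T + b. The second part checks that a one-hot matmul picks out the same row as direct indexing.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Single-head scaled dot-product attention (no mask, no batch)
T, d_model, d_k = 5, 8, 4                    # sequence length, model dim, head dim
X = rng.standard_normal((T, d_model))
W_q = rng.standard_normal((d_model, d_k))    # hypothetical projection weights
W_k = rng.standard_normal((d_model, d_k))
W_v = rng.standard_normal((d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v          # projection matmuls
scores = Q @ K.T / np.sqrt(d_k)              # (T, d_k) @ (d_k, T) -> (T, T)
weights = softmax(scores, axis=-1)           # each row sums to 1
out = weights @ V                            # (T, T) @ (T, d_k) -> (T, d_k)
print(out.shape)                             # (5, 4)

# Embedding lookup as a matmul
vocab, dim = 10, 3
E = rng.standard_normal((vocab, dim))        # embedding matrix
token_id = 7
one_hot = np.zeros(vocab)
one_hot[token_id] = 1.0
print(np.allclose(one_hot @ E, E[token_id]))   # True: same row either way
```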

Common Mistakes

Confusing A * B (element-wise) with A @ B (matmul).
Not checking whether the inner dimensions match: always check (m,k)(k,n).
Forgetting that AB ≠ BA.
Writing X @ W without the transpose when W is stored as (out, in): shape mismatch error.
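A quick reproduction of the first and last mistakes, using the (out, in) weight layout from the linear-layer example in this chapter; sizes and seeds are arbitrary.

```python
import numpy as np

X = np.random.default_rng(0).standard_normal((8, 4))   # batch of 8, in=4
W = np.random.default_rng(1).standard_normal((3, 4))   # (out=3, in=4)
b = np.zeros(3)

# Wrong: (8, 4) @ (3, 4) -> inner dims 4 and 3 do not match
try:
    X @ W
except ValueError as e:
    print("shape error:", e)

# Right: (8, 4) @ (4, 3) -> (8, 3)
Y = X @ W.T + b
print(Y.shape)   # (8, 3)

# * versus @: same shapes, completely different results
A = np.array([[1, 2], [3, 4]])
print(A * A)     # element-wise squares: [[1, 4], [9, 16]]
print(A @ A)     # matrix product:       [[7, 10], [15, 22]]
```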

Practice Tasks

Calculate by hand: [[2,0],[1,3]] · [[1,4],[2,1]].
A ∈ ℝ³ˣ⁴, B ∈ ℝ⁴ˣ²: what is the shape of AB?
In NumPy, compute A @ A⁻¹ and check whether the result is close to the identity.
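For the third task, a small helper can measure how far A @ A⁻¹ is from I. `check_inverse` and its tolerance of 1e-10 are assumptions for illustration, not a standard API. The Hilbert matrix is included as a well-known ill-conditioned example; expect its deviation to be noticeably larger, which connects back to the tip about avoiding explicit inverses.

```python
import numpy as np

def check_inverse(A, tol=1e-10):
    """Return the largest deviation of A @ inv(A) from I, and whether it is below tol."""
    A = np.asarray(A, dtype=float)
    A_inv = np.linalg.inv(A)
    residual = np.abs(A @ A_inv - np.eye(A.shape[0])).max()
    return residual, residual < tol

# A random, well-conditioned matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
err, ok = check_inverse(A)
print(f"random 4x4: max |A @ inv(A) - I| = {err:.2e}, close to I: {ok}")

# An ill-conditioned matrix: the 10x10 Hilbert matrix, H_ij = 1 / (i + j + 1)
n = 10
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)
err_h, ok_h = check_inverse(H)
print(f"Hilbert {n}x{n}: max deviation = {err_h:.2e}, close to I: {ok_h}")
```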

Assignment

Using NumPy, write the forward pass of a 2-layer fully connected network from scratch: input 4 → hidden 8 (ReLU) → output 3 (softmax). Use random weights. Run it with batch size 16 and track the shape at every step.

Interview Questions

What is the difference between A * B and A @ B?
Why is matrix multiplication not commutative?
What is an orthogonal matrix, and where is it useful in ML?
Why do we use the pseudo-inverse or SVD instead of computing the inverse directly?

Mini Project

"Image Transformer": take a grayscale image (as a NumPy array), apply a 2D rotation matrix to its pixel coordinates, and plot the rotated image. (This shows how a linear transformation changes geometry, the foundation for Chapter 8.)

Summary

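A matrix is a 2-D grid of numbers; in ML, data and weights are represented as matrices (or higher-dimensional tensors).
Matrix multiplication is typically the most compute-heavy operation in ML.
The inner dimensions must match; in general AB ≠ BA.
Linear layers, attention, embeddings: all are built on matmul.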

✨ Next Steps

In Chapter 7 we look at a "special side" of matrices: eigenvalues and eigenvectors, the basis of PCA, PageRank, and stability analysis.
