অধ্যায় 52 — Math Behind Recommendation Systems

📖 একটি ছোট গল্প

Netflix-এর homepage আপনার জন্য অন্যরকম, আমার জন্য অন্যরকম। কীভাবে বোঝে কোন movie আমার ভালো লাগবে? পেছনে কোনো magic নেই — matrix factorization, embeddings, এবং similarity। Recommendation system-এর গণিতই হলো আধুনিক digital economy-এর চালিকা শক্তি।

Intuitive Explanation

Recommendation = "আমি কী পছন্দ করি" + "আমার মতো লোকেরা কী পছন্দ করে"। দুটি approach আছে:

Collaborative Filtering — user behavior থেকে শেখা (আমার মতো লোকেরা কী দেখেছে)।
Content-Based — item feature থেকে শেখা (এই movie-এর genre আমার পছন্দ)।

Modern system দুটোই combine করে — hybrid approach (YouTube, Netflix, Spotify)।

Matrix Factorization

User-item rating matrix R ∈ ℝ^(U×I) — অনেক entry missing (user সব দেখেনি)।

Factorize into low-rank matrices:

R \approx U V^T, \quad U \in \mathbb{R}^{U \times k}, \; V \in \mathbb{R}^{I \times k}

Each user = k-dimensional latent vector, each item = k-dimensional latent vector।

Prediction: \hat{r}_{ui} = u_u^T v_i = \sum_{f=1}^k u_{uf} \cdot v_{if}

💡 ইনসাইট

Latent factor = interpretable হতে পারে (e.g., genre, mood, actor preference) কিন্তু অবশ্যই হওয়া দরকার না। Matrix factorization自动 শিখে নেয়।

SVD & Recommendation

Singular Value Decomposition (CH 9):

R = Q \Sigma P^T

Top-k singular values রাখলে low-rank approximation — Latent Semantic Analysis এর মূল idea।

Problem: SVD missing values handle করে না। Solution: Alternating Least Squares (ALS) — U fix করে V optimize, তারপর V fix করে U optimize, repeat।

Loss Functions for RecSys

Mean Squared Error (explicit ratings)

L = \sum_{(u,i) \in \Omega} (r_{ui} - u_u^T v_i)^2

Ω = observed ratings set।

Binary Cross Entropy (implicit feedback)

Click/view = positive, no interaction = negative (sampling)।

L = -\sum_{(u,i) \in \Omega^+} \log \sigma(u_u^T v_i) - \sum_{(u,j) \in \Omega^-} \log(1 - \sigma(u_u^T v_j))

Negative Sampling

Implicit feedback-এ "negative" examples অনেক বেশি — সব দেখে train করা slow।

Negative sampling: প্রতিটি positive-এর জন্য Kটি random negative sample নেওয়া:

L = -\log \sigma(u_u^T v_i) - \sum_{j=1}^K \mathbb{E}_{j \sim P_n} [\log(1 - \sigma(u_u^T v_j))]

Popular item-এর probability বেশি sample হতে — P_n(j) ∝ f_j^α (word2vec-এর মতো)।

Deep Learning for RecSys

Neural Collaborative Filtering (NCF) — GMF (dot product) + MLP (nonlinear interaction) combine।
Wide & Deep — wide part memorization (cross-product features), deep part generalization (embeddings)।
DeepFM — Factorization Machine + Deep Network, automatic feature interaction capture করে।
Two-Tower Model — user tower ও item tower আলাদা, candidate generation-এ fast ANN search।
Sequential Recommendation — RNN/Transformer দিয়ে user interaction sequence model (SASRec, BERT4Rec)।

Python: Matrix Factorization

pythonPython · NumPy

import numpy as np

# Tiny MF: factorize a 4x5 rating matrix
R = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 0],
    [1, 1, 0, 5, 0],
    [0, 0, 5, 4, 1],
], dtype=float)

n_users, n_items = R.shape
k = 2                          # latent factors
U = np.random.randn(n_users, k) * 0.01
V = np.random.randn(n_items, k) * 0.01

# ALS training
for epoch in range(1000):
    # Update U (fix V)
    for u in range(n_users):
        rated = R[u] > 0
        if rated.sum() == 0: continue
        A = V[rated].T @ V[rated] + 0.1 * np.eye(k)  # regularized
        b = V[rated].T @ R[u, rated]
        U[u] = np.linalg.solve(A, b)
    
    # Update V (fix U)
    for i in range(n_items):
        rated = R[:, i] > 0
        if rated.sum() == 0: continue
        A = U[rated].T @ U[rated] + 0.1 * np.eye(k)
        b = U[rated].T @ R[rated, i]
        V[i] = np.linalg.solve(A, b)

# Predict missing entry
pred = U @ V.T
print("Predicted R[0,2]:", pred[0, 2])  # was 0, now predicted

Evaluation Metrics

RMSE — rating prediction accuracy (explicit)।
Precision@K / Recall@K — top-K recommendation quality।
NDCG@K — Normalized Discounted Cumulative Gain, rank-aware measure।
MAP (Mean Average Precision) — overall ranking quality।
Hit Rate — test item কতবার top-K-এ আসে।

Practice Tasks

উপরের ALS code-এ k = 3 দিয়ে চালান — prediction কতটা পাল্টায়?
Regularization (0.1) সরিয়ে দিন — overfitting হয় কি?
Negative sampling-এ K = 1 vs K = 5 — convergence speed তুলনা করুন।
MovieLens dataset-এ SVD vs MF vs NCF compare করুন (surprise library)।

Interview Questions

Collaborative filtering-এ cold start problem কী? সমাধান?
Matrix factorization-এ k (latent dimension) কীভাবে choose করবেন?
Implicit vs explicit feedback — loss function পার্থক্য?
Two-tower model-এ ANN search কেন দরকার? ANN =?
Sequential recommendation-এ Transformer কেন RNN-এর চেয়ে ভালো?

Summary · সারসংক্ষেপ

Recommendation = collaborative + content-based + hybrid approach।
Matrix factorization R ≈ UVᵀ = user ও item latent vectors-এর dot product।
ALS = alternating optimize U and V, SVD = closed-form but missing values handle করে না।
Implicit feedback-এ negative sampling + BCE loss, explicit-এ MSE/RMSE।
Deep learning: NCF, Wide&Deep, DeepFM, Two-Tower, Sequential — modern systems combine multiple approaches।

পূর্ববর্তী · CH 51

Bayesian Deep Learning

পরবর্তী · CH 53

Math Behind Computer Vision