AY 2025–26
Instructor: Debasis Sengupta
Office / Department: ASU
Email: sdebasis@isical.ac.in
Marking Scheme:
Assignments: 20% | Midterm Test: 30% | End Semester: 50%
EM alternates an expectation step (probabilistic imputation of the missing data) and a maximization step (optimization of a tractable surrogate). The ascent property follows from decomposing the observed log-likelihood into an expected complete-data term minus a penalty term; the M-step raises the expected term, while a Jensen/KL argument ensures the penalty cannot overturn the gain. EM is robust and stable, but it can be slow and sensitive to starting values, so use sensible initialization and convergence diagnostics.
We start with the observed data counts. The EM trick is to imagine hidden complete data under which the optimization becomes much simpler.
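The ascent argument can be written out in one line (standard EM notation; the functions Q and H are assumptions of this sketch, not defined in the notes themselves):

```latex
\ell(\theta) \;=\; \log f(y \mid \theta)
\;=\; \underbrace{E\!\left[\log f(x \mid \theta) \,\middle|\, y, \theta^{(n)}\right]}_{Q(\theta \mid \theta^{(n)})}
\;-\; \underbrace{E\!\left[\log f(x \mid y, \theta) \,\middle|\, y, \theta^{(n)}\right]}_{H(\theta \mid \theta^{(n)})}
```

The M-step chooses θ(n+1) to maximize Q, while Jensen's inequality gives H(θ | θ(n)) ≤ H(θ(n) | θ(n)) for every θ; hence ℓ(θ(n+1)) ≥ ℓ(θ(n)).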
We have 197 animals split into 4 categories:
| Group | Count (yi) | Probability |
|---|---|---|
| y1 | 125 | 1/2 + π/4 |
| y2 | 18 | (1-π)/4 |
| y3 | 20 | (1-π)/4 |
| y4 | 34 | π/4 |
So the multinomial probabilities depend on a single parameter π. But the probability of the first category, 1/2 + π/4, is a sum of two simpler terms: 1/2 and π/4. That is the clue for EM.
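For comparison, the observed-data log-likelihood can also be maximized directly by a crude grid search. This is a quick Python check of the answer EM should reach, not part of the original notes:

```python
import math

y = [125, 18, 20, 34]  # observed counts

def obs_loglik(p):
    """Observed-data log-likelihood (multinomial coefficient dropped)."""
    probs = [1/2 + p/4, (1 - p)/4, (1 - p)/4, p/4]
    return sum(c * math.log(q) for c, q in zip(y, probs))

# crude grid search over the open interval (0, 1)
p_hat = max((i / 10000 for i in range(1, 10000)), key=obs_loglik)
print(round(p_hat, 4))  # close to 0.6268
```

The exact maximizer solves the quadratic 197π² − 15π − 68 = 0, giving π ≈ 0.6268; the grid search lands on the nearest grid point.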
| Complete Data | Probabilities |
|---|---|
| x1 | 1/2 |
| x2 | π/4 |
| x3 | (1-π)/4 |
| x4 | (1-π)/4 |
| x5 | π/4 |
The complete-data log-likelihood (dropping the multinomial coefficient) is
L(π) = x1 ln(1/2) + x2 ln(π/4) + (x3+x4) ln((1-π)/4) + x5 ln(π/4)
This would be simple to maximize if we knew x2 (and hence x1 = y1 − x2). But we don't observe how y1 splits; that is where the E-step comes in.
E-step: given the current iterate π(n), replace the hidden count x2 by its conditional expectation given y1:
x2(n) = y1 · (π(n)/4) / (1/2 + π(n)/4)
M-step: maximize the completed log-likelihood, which has the closed form
π(n+1) = (x2(n) + x5) / (x2(n) + x3 + x4 + x5)
Since x3 = y2, x4 = y3, and x5 = y4 are all observed, substitute the observed counts:
π(n+1) = (x2(n) + y4) / (x2(n) + y2 + y3 + y4)
And iterate until convergence.
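The full iteration fits in a few lines of Python. This is a minimal sketch using the counts and update rules above; the starting value 0.5 and the tolerance are arbitrary choices:

```python
# EM for the genetic linkage example: y = (125, 18, 20, 34)
y1, y2, y3, y4 = 125, 18, 20, 34

pi = 0.5  # arbitrary starting value in (0, 1)
for _ in range(100):
    # E-step: expected count of the hidden pi/4 part of y1
    x2 = y1 * (pi / 4) / (1 / 2 + pi / 4)
    # M-step: closed-form maximizer of the completed log-likelihood
    pi_new = (x2 + y4) / (x2 + y2 + y3 + y4)
    if abs(pi_new - pi) < 1e-10:
        pi = pi_new
        break
    pi = pi_new

print(pi)  # converges to approximately 0.6268
```

Each iteration imputes the hidden split of y1 under the current π, then re-estimates π from the completed counts; the sequence of observed log-likelihood values is non-decreasing.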
If you lump two multinomial cells and then condition on their sum, the conditional distribution of one component is Binomial. This is why x2 | (x1 + x2 = y1) is Binomial(y1, (π/4)/(1/2 + π/4)), whose mean is exactly the E-step formula above.
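This lumping property is easy to verify numerically. The sketch below checks, for a small count, that the conditional distribution of x2 given x1 + x2 = m matches Binomial(m, p2/(p1 + p2)); the cell probabilities and m are illustrative choices, not values from the notes:

```python
from math import comb

p1, p2 = 1/2, 0.6/4   # the two lumped cell probabilities (illustrative pi = 0.6)
m = 5                  # conditioned-on sum x1 + x2

# joint probability of (x1 = m - k, x2 = k), up to the common factor
# contributed by the remaining multinomial cells
joint = [comb(m, k) * p1**(m - k) * p2**k for k in range(m + 1)]
cond = [j / sum(joint) for j in joint]

# Binomial(m, p2 / (p1 + p2)) pmf for comparison
q = p2 / (p1 + p2)
binom = [comb(m, k) * q**k * (1 - q)**(m - k) for k in range(m + 1)]

assert all(abs(a - b) < 1e-12 for a, b in zip(cond, binom))
```

The normalizing constant of the joint pmf restricted to the pair is (p1 + p2)^m, which is where the success probability p2/(p1 + p2) comes from.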
If Yi | xi has the Laplace density f(yi | xi) = (λ/2) exp(−λ |yi − (α + βxi)|), then maximum likelihood estimation of (α, β) leads to Least Absolute Deviations (LAD) regression.
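To spell out the connection (a standard derivation under the Laplace model above): the log-likelihood for n independent observations is

```latex
\ell(\alpha, \beta, \lambda)
= n \log\frac{\lambda}{2} \;-\; \lambda \sum_{i=1}^{n} \bigl| y_i - (\alpha + \beta x_i) \bigr|
```

so for any fixed λ > 0, maximizing over (α, β) is the same as minimizing Σ |yi − (α + βxi)|, which is exactly the LAD criterion.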
This genetics example is a template: split complicated observed categories into hidden, simpler ones; use conditional distributions to impute the missing parts; optimize on the completed dataset; and repeat. EM alternates expectation and maximization, guaranteeing a monotone ascent of the likelihood.
These notes are based on the report by Nan M. Laird. The full document can be accessed here: