AY 2025–26
Instructor: Debasis Sengupta
Office / Department: ASU
Email: sdebasis@isical.ac.in
Marking Scheme:
Assignments: 20% | Midterm Test: 30% | End Semester: 50%
We have counts \(Y_1,\dots,Y_4\) (multinomial given total \(n=\sum_i Y_i\)) with cell probabilities
\[ \pi_1=\tfrac12+\tfrac{\pi}{4},\qquad \pi_2=\pi_3=\tfrac{1-\pi}{4},\qquad \pi_4=\tfrac{\pi}{4}, \] where \(\pi\in(0,1)\) is unknown. Reparametrise \(\pi=(1-\theta)^2\) and derive the Fisher–scoring iterative update for the MLE of \(\pi\) (i.e. the update for \(\theta\)).
Put \(\pi=(1-\theta)^2\) and define
\(a:=\dfrac{1-\theta}{2}\quad(\Rightarrow a^2=\dfrac{(1-\theta)^2}{4}).\)
Then the cell probabilities as functions of \(\theta\) become (useful algebraic forms)
\[ p_1(\theta)=\tfrac12+\tfrac{(1-\theta)^2}{4}=\tfrac12+a^2,\qquad p_2(\theta)=p_3(\theta)=\tfrac{1-(1-\theta)^2}{4}=\tfrac14-a^2,\qquad p_4(\theta)=\tfrac{(1-\theta)^2}{4}=a^2. \]
For total \(n=\sum_i Y_i\), the (conditional) multinomial log-likelihood is
\(\ell(\theta)=\sum_{i=1}^4 Y_i\log p_i(\theta)+\text{const}.\)
Differentiate \(p_i\) w.r.t. \(\theta\). Using \(a=(1-\theta)/2\) and \(da/d\theta=-1/2\),
\[ p_1'(\theta)=-a,\qquad p_2'(\theta)=p_3'(\theta)=a,\qquad p_4'(\theta)=-a. \]
Hence the score \(S(\theta)=\dfrac{d\ell}{d\theta}\) is
\[ \begin{aligned} S(\theta) &=\sum_{i=1}^4 Y_i\frac{p_i'(\theta)}{p_i(\theta)} = -a\frac{Y_1}{p_1}+a\frac{Y_2}{p_2}+a\frac{Y_3}{p_3}-a\frac{Y_4}{p_4}\\ &= a\Big(\frac{Y_2+Y_3}{p_2}-\frac{Y_1}{p_1}-\frac{Y_4}{p_4}\Big). \end{aligned} \]
For a multinomial with probabilities \(p_i(\theta)\),
\(I(\theta)=n\sum_{i=1}^4\frac{(p_i'(\theta))^2}{p_i(\theta)}.\)
Because \((p_i')^2=a^2\) for every \(i\),
\(I(\theta)=n\,a^2\Big(\frac{1}{p_1}+\frac{2}{p_2}+\frac{1}{p_4}\Big).\)
Fisher scoring updates \(\theta\) by
\(\displaystyle \theta^{(t+1)}=\theta^{(t)}+\frac{S(\theta^{(t)})}{I(\theta^{(t)})}.\)
Substitute \(S\) and \(I\). Cancelling common factors gives a convenient expression in terms of proportions \(r_i=Y_i/n\):
\[ \boxed{\,\theta^{(t+1)}=\theta^{(t)}+\frac{1}{a}\cdot \frac{\dfrac{r_2+r_3}{p_2}-\dfrac{r_1}{p_1}-\dfrac{r_4}{p_4}} {\dfrac{1}{p_1}+\dfrac{2}{p_2}+\dfrac{1}{p_4}}\,,} \]
where all \(p_j\) and \(a\) are evaluated at \(\theta^{(t)}\).
This boxed formula is the Fisher–scoring iterative step for \(\theta\). After convergence \(\hat\theta\) gives \(\hat\pi=(1-\hat\theta)^2\).
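As a numerical illustration, here is a minimal Python sketch of this iteration (computing \(S(\theta)\) and \(I(\theta)\) directly; the counts vector, function name, starting value and tolerance are illustrative choices, not part of the assignment):

```python
import numpy as np

def fisher_scoring_theta(y, theta0=0.5, tol=1e-10, max_iter=100):
    """Fisher scoring for theta, where pi = (1 - theta)^2 (illustrative sketch)."""
    y = np.asarray(y, dtype=float)   # observed counts (Y1, Y2, Y3, Y4)
    n = y.sum()
    theta = theta0
    for _ in range(max_iter):
        a = (1.0 - theta) / 2.0
        pi = (1.0 - theta) ** 2
        p = np.array([0.5 + pi / 4, (1 - pi) / 4, (1 - pi) / 4, pi / 4])  # p_i(theta)
        dp = np.array([-a, a, a, -a])                                     # p_i'(theta)
        score = np.sum(y * dp / p)                                        # S(theta)
        info = n * np.sum(dp ** 2 / p)                                    # I(theta)
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    return theta, (1.0 - theta) ** 2  # (theta_hat, pi_hat)
```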
A complete-data formulation for the problem in Q1 was given in class so that the EM algorithm can be used. Starting from that, obtain a single iterative step (merging the E-step and the M-step) which updates the estimate of \(\pi\).
Write the cell probabilities as linear functions of \(\pi\):
\(p_i(\pi)=a_i+b_i\pi,\qquad i=1,\dots,4,\)
with
\(a_1=\tfrac12,\ b_1=\tfrac14;\qquad a_2=a_3=\tfrac14,\ b_2=b_3=-\tfrac14;\qquad a_4=0,\ b_4=\tfrac14.\)
Let \(\widetilde Y_i^{(t)}\) denote the expected complete counts computed at the E-step under \(\pi^{(t)}\). Write \(\tilde r_i^{(t)}=\widetilde Y_i^{(t)}/n\) if you prefer proportions.
The EM M-step maximises
\(Q(\pi\mid\pi^{(t)})=\sum_{i=1}^4 \widetilde Y_i^{(t)}\log\big(a_i+b_i\pi\big) + \text{const (w.r.t. }\pi).\)
Differentiate to get
\(\dfrac{\partial Q}{\partial\pi}(\pi\mid\pi^{(t)})=\sum_{i=1}^4 \widetilde Y_i^{(t)}\dfrac{b_i}{a_i+b_i\pi}.\)
Solving \(\partial Q/\partial\pi=0\) exactly for \(\pi\) gives the M-step, but the two steps can be merged into a single one-line iteration by taking one Newton step for this equation at \(\pi^{(t)}\), using the current expected counts \(\widetilde Y_i^{(t)}\).
The second derivative is
\(\dfrac{\partial^2 Q}{\partial\pi^2}(\pi\mid\pi^{(t)})=-\sum_{i=1}^4 \widetilde Y_i^{(t)}\dfrac{b_i^2}{(a_i+b_i\pi)^2}.\)
Therefore a Newton update applied to \(Q\) gives the single-step merged iteration:
\[ \boxed{\; \pi^{(t+1)} \;=\; \pi^{(t)} \;+\; \frac{\displaystyle \sum_{i=1}^4 \widetilde Y_i^{(t)}\dfrac{b_i}{a_i+b_i\pi^{(t)}}} {\displaystyle \sum_{i=1}^4 \widetilde Y_i^{(t)}\dfrac{b_i^2}{(a_i+b_i\pi^{(t)})^2}} \; }. \]
Plugging in the \(a_i,b_i\) values (so \(b_i=\pm\tfrac14\) and \(b_i^2=\tfrac1{16}\)) and factoring out these constants yields the equivalent explicit form
\[ \boxed{\; \pi^{(t+1)}=\pi^{(t)} + 4\,\dfrac{\displaystyle \frac{\widetilde Y_1^{(t)}}{p_1(\pi^{(t)})}-\frac{\widetilde Y_2^{(t)}}{p_2(\pi^{(t)})}-\frac{\widetilde Y_3^{(t)}}{p_3(\pi^{(t)})}+\frac{\widetilde Y_4^{(t)}}{p_4(\pi^{(t)})} } {\displaystyle \frac{\widetilde Y_1^{(t)}}{p_1(\pi^{(t)})^2}+\frac{\widetilde Y_2^{(t)}}{p_2(\pi^{(t)})^2} +\frac{\widetilde Y_3^{(t)}}{p_3(\pi^{(t)})^2}+\frac{\widetilde Y_4^{(t)}}{p_4(\pi^{(t)})^2} }\; }, \]
where \(p_i(\pi^{(t)})=a_i+b_i\pi^{(t)}\). If data are complete then \(\widetilde Y_i^{(t)}=Y_i\).
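A minimal Python sketch of this merged update, assuming the linear parametrisation \(a_i,b_i\) above and that the E-step supplies the expected counts \(\widetilde Y^{(t)}\) (names are illustrative):

```python
import numpy as np

# Cell probabilities p_i(pi) = a_i + b_i * pi for the four observed categories.
A = np.array([0.5, 0.25, 0.25, 0.0])
B = np.array([0.25, -0.25, -0.25, 0.25])

def merged_em_step(y_tilde, pi_t):
    """One merged (Newton-on-Q) update for pi, given expected counts y_tilde at pi_t."""
    y_tilde = np.asarray(y_tilde, dtype=float)
    p = A + B * pi_t                          # current cell probabilities
    num = np.sum(y_tilde * B / p)             # dQ/dpi at pi_t
    den = np.sum(y_tilde * B ** 2 / p ** 2)   # -d^2Q/dpi^2 at pi_t
    return pi_t + num / den
```

With complete data (\(\widetilde Y_i^{(t)}=Y_i\)) the same function can simply be iterated to convergence.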
Let \(Y_1,Y_2,Y_3,Y_4\) be counts in four genetic categories. Conditionally on the total \(N=Y_1+Y_2+Y_3+Y_4\) the vector has a multinomial distribution with cell probabilities
\[ p_1(\pi)=\tfrac12+\tfrac{\pi}{4},\qquad p_2(\pi)=p_3(\pi)=\tfrac{1-\pi}{4},\qquad p_4(\pi)=\tfrac{\pi}{4}, \]
where the parameter \(\pi\in(0,1)\) is unknown. Viewing this as an incomplete-data / latent-allocation problem, the EM algorithm applies. The question is to obtain a closed-form expression for the value of \(\pi\) attained at convergence of the EM algorithm.
The observed-data log-likelihood (up to an additive constant) is
\[ \ell(\pi)=y_1\log\!\Big(\tfrac12+\tfrac{\pi}{4}\Big)+(y_2+y_3)\log\!\Big(\tfrac{1-\pi}{4}\Big)+y_4\log\!\Big(\tfrac{\pi}{4}\Big). \]
Writing \(n_1=y_1\), \(n_2=y_2+y_3\), \(n_4=y_4\) (so that \(N=n_1+n_2+n_4\)) and setting \(d\ell/d\pi=0\) gives the stationarity condition
\[ \frac{n_1}{2+\pi}-\frac{n_2}{1-\pi}+\frac{n_4}{\pi}=0. \]
Multiplying through by \(\pi(1-\pi)(2+\pi)\),
\[ n_1\pi(1-\pi)+n_4(1-\pi)(2+\pi)-n_2\pi(2+\pi)=0. \]
Collecting terms in \(\pi\),
\[ - N \pi^2 + (n_1 - n_4 - 2 n_2)\pi + 2 n_4 = 0, \]
and multiplying by \(-1\) yields the standard form\[ N\pi^2 + (-n_1 + n_4 + 2 n_2)\pi - 2 n_4 = 0. \]
\[ \boxed{ \displaystyle \pi \;=\; \frac{\,n_1 - n_4 - 2 n_2 \;\pm\; \sqrt{( -n_1 + n_4 + 2 n_2)^2 + 8 N n_4}\,}{2N} }. \]
(The discriminant simplifies as \(( -n_1 + n_4 + 2 n_2)^2 - 4\cdot N\cdot(-2 n_4) = ( -n_1 + n_4 + 2 n_2)^2 + 8 N n_4\).)
Conclusion / boxed result (explicit):
\[ \pi_{\star} \;=\; \frac{n_1 - n_4 - 2 n_2 \;\pm\; \sqrt{( -n_1 + n_4 + 2 n_2)^2 + 8 N n_4}}{2N}, \]
with the \(+\) sign, which (when \(n_4>0\)) is the only root in \((0,1)\): the product of the two roots is \(-2n_4/N<0\), and the quadratic equals \(3n_2\ge0\) at \(\pi=1\). Any EM fixed point is a stationary point of the observed-data log-likelihood, so this closed form gives the value to which the EM iteration converges.
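For a quick numerical check, this root can be computed directly from the observed counts; a short sketch (the helper name is illustrative), whose output should agree with the limits of the iterations in Q1 and Q2:

```python
import numpy as np

def pi_closed_form(y):
    """Root in (0, 1) of N*pi^2 + (2*n2 + n4 - n1)*pi - 2*n4 = 0, for y = (y1, y2, y3, y4)."""
    y1, y2, y3, y4 = map(float, y)
    n1, n2, n4 = y1, y2 + y3, y4
    N = n1 + n2 + n4
    b = -n1 + n4 + 2.0 * n2
    disc = b ** 2 + 8.0 * N * n4              # discriminant, always nonnegative
    return (-b + np.sqrt(disc)) / (2.0 * N)   # the '+' root
```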
Consider paired data \((X_i,Y_i),\ i=1,\dots,n\), where the pairs are independent, the marginal distribution of the \(X_i\) is unspecified, and the conditional distribution of \(Y_i\) given \(X_i=x_i\) is
\[ Y_i\mid X_i=x_i \sim N(\alpha+\beta x_i,\ \sigma^2). \]
Show that the maximum likelihood estimators (MLEs) of \(\alpha\) and \(\beta\) coincide with the ordinary least squares estimators for the linear regression model \(Y=\alpha+\beta X+\varepsilon\).
Since the marginal distribution of the \(X_i\) does not involve \((\alpha,\beta,\sigma^2)\), it suffices to maximize the conditional likelihood
\[ L(\alpha,\beta,\sigma^2)=\prod_{i=1}^n f_{Y\mid X}(y_i\mid x_i) =\prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\Big(-\frac{(y_i-(\alpha+\beta x_i))^2}{2\sigma^2}\Big). \]
Maximizing this over \(\alpha,\beta\) (for fixed \(\sigma^2\)) is equivalent to minimizing the sum of squared residuals
\[ Q(\alpha,\beta)=\sum_{i=1}^n\big(y_i-(\alpha+\beta x_i)\big)^2. \]
Hence the MLEs of \(\alpha,\beta\) are the least-squares estimators.
Indeed, the log-likelihood is
\[ \ell(\alpha,\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n\big(y_i-(\alpha+\beta x_i)\big)^2, \]
which, for any fixed \(\sigma^2\), is largest when \(Q(\alpha,\beta)\) is smallest.
Setting the partial derivatives of \(Q\) to zero,
\[ \frac{\partial Q}{\partial \alpha} = -2\sum_{i=1}^n\big(y_i-(\alpha+\beta x_i)\big)=0, \]
\[ \frac{\partial Q}{\partial \beta} = -2\sum_{i=1}^n x_i\big(y_i-(\alpha+\beta x_i)\big)=0. \]
equivalently, the normal equations
\[ n\alpha + \beta\sum_{i=1}^n x_i = \sum_{i=1}^n y_i, \]
\[ \alpha\sum_{i=1}^n x_i + \beta\sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i. \]
Writing
\[ S_{xx}=\sum_{i=1}^n (x_i-\bar x)^2=\sum_i x_i^2 - n\bar x^2,\qquad S_{xy}=\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)=\sum_i x_i y_i - n\bar x\bar y, \]
the solution of the normal equations is
\[ \boxed{\displaystyle \hat\beta \;=\; \frac{S_{xy}}{S_{xx}}, \qquad \hat\alpha \;=\; \bar y - \hat\beta\,\bar x.} \]
Maximizing \(\ell\) over \(\sigma^2\) at \((\hat\alpha,\hat\beta)\) then gives
\[ \boxed{\displaystyle \hat\sigma^2_{\text{MLE}} \;=\; \frac{1}{n}\sum_{i=1}^n \big(y_i - \hat\alpha - \hat\beta x_i\big)^2.} \]
Note: this ML estimator uses division by \(n\); the unbiased estimator uses division by \(n-2\).
Thus the MLEs of \(\alpha\) and \(\beta\) coincide exactly with the least-squares estimators.
For reference, the sampling variances of the estimators are
\[ \operatorname{Var}(\hat\beta)=\frac{\sigma^2}{S_{xx}},\qquad \operatorname{Var}(\hat\alpha)=\sigma^2\Big(\frac{1}{n}+\frac{\bar x^2}{S_{xx}}\Big). \]
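For completeness, a minimal sketch transcribing the boxed formulas into code (function name illustrative):

```python
import numpy as np

def ols_mle(x, y):
    """Closed-form MLE / least-squares fit for y = alpha + beta * x + noise."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)             # S_xx
    sxy = np.sum((x - xbar) * (y - ybar))     # S_xy
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar
    resid = y - alpha_hat - beta_hat * x
    sigma2_mle = np.mean(resid ** 2)          # ML estimate: divides by n, not n - 2
    return alpha_hat, beta_hat, sigma2_mle
```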
In the regression model
\[ Y_i \mid X_i = x_i \;\sim\; N(\alpha + \beta x_i,\ \sigma^2), \quad i=1,\dots,n, \]
we already showed that the MLEs of \(\alpha\) and \(\beta\) are the least-squares estimators. Now: What is the MLE of \(\sigma^2\)? Is this MLE unbiased?
As before, the conditional likelihood is
\[ L(\alpha,\beta,\sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i-(\alpha+\beta x_i))^2}{2\sigma^2}\right), \]
with log-likelihood
\[ \ell(\alpha,\beta,\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2. \]
Plugging in the MLEs \(\hat\alpha,\hat\beta\) and writing \(\mathrm{RSS}=\sum_{i=1}^n (y_i-\hat\alpha-\hat\beta x_i)^2\), differentiation with respect to \(\sigma^2\) gives
\[ \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\mathrm{RSS}}{2\sigma^4}. \]
\[ -\frac{n}{2\sigma^2} + \frac{\mathrm{RSS}}{2\sigma^4} = 0 \ \Longrightarrow\ -n\sigma^2 + \mathrm{RSS}=0, \]
hence
\[ \boxed{\displaystyle \hat\sigma^2_{\mathrm{MLE}} = \frac{\mathrm{RSS}}{n}.} \]
Since \(\mathrm{RSS}/\sigma^2\sim\chi^2_{n-2}\) (two regression parameters are estimated), \(E[\mathrm{RSS}]=(n-2)\sigma^2\) and
\[ E[\hat\sigma^2_{\mathrm{MLE}}] = E\!\left[\frac{\mathrm{RSS}}{n}\right] = \frac{n-2}{n}\,\sigma^2. \]
So the MLE is biased (it underestimates \(\sigma^2\)). The unbiased estimator is
\[ \boxed{\displaystyle \tilde\sigma^2 = \frac{\mathrm{RSS}}{n-2},} \]
which satisfies \(E[\tilde\sigma^2]=\sigma^2\).
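As an informal check of the \((n-2)/n\) bias factor, a small Monte Carlo sketch (sample size, parameter values and replication count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def rss(x, y):
    """Residual sum of squares from the fitted least-squares line."""
    xbar, ybar = x.mean(), y.mean()
    beta = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    alpha = ybar - beta * xbar
    return np.sum((y - alpha - beta * x) ** 2)

# Compare the Monte Carlo mean of RSS/n with (n - 2)/n * sigma^2.
n, alpha, beta, sigma2 = 10, 1.0, 2.0, 4.0
x = np.linspace(0.0, 1.0, n)
sims = [rss(x, alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2), n)) / n
        for _ in range(20000)]
print(np.mean(sims), (n - 2) / n * sigma2)  # both should be close to 3.2
```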
Consider paired data \((X_i,Y_i),\ i=1,\dots,n\), where the pairs are independent, the marginal distribution of the \(X_i\) is unspecified, and the conditional density of \(Y_i\) given \(X_i\) is
\[
f(y\mid X_i,\lambda,\alpha,\beta)=\frac{\lambda}{2}\exp\!\big(-\lambda|y-\alpha-\beta X_i|\big),\qquad -\infty<y<\infty
\]
(i.e. a Laplace / double-exponential conditional distribution with scale parameter \(\lambda^{-1}\)). Show that the maximum likelihood estimators (MLEs) of \(\alpha\) and \(\beta\) minimize
\[
\sum_{i=1}^n |Y_i-\alpha-\beta X_i|.
\]
The conditional likelihood for \((\lambda,\alpha,\beta)\) is
\[
L(\lambda,\alpha,\beta)
=\prod_{i=1}^n \frac{\lambda}{2}\exp\!\big(-\lambda|y_i-\alpha-\beta x_i|\big)
=\Big(\frac{\lambda}{2}\Big)^n \exp\!\Big(-\lambda\sum_{i=1}^n|y_i-\alpha-\beta x_i|\Big).
\]
The log-likelihood is
\[
\ell(\lambda,\alpha,\beta)
= n\log\lambda - n\log 2 \;-\; \lambda\sum_{i=1}^n|y_i-\alpha-\beta x_i|.
\]
For fixed \((\alpha,\beta)\), as a function of \(\lambda\) alone this is
\[
\ell(\lambda\mid\alpha,\beta)=n\log\lambda - \lambda S(\alpha,\beta) + \text{constant},
\qquad S(\alpha,\beta)=\sum_{i=1}^n|y_i-\alpha-\beta x_i|.
\]
Setting \(\partial\ell/\partial\lambda=0\),
\[
\frac{n}{\hat\lambda}-S(\alpha,\beta)=0\quad\Rightarrow\quad \hat\lambda(\alpha,\beta)=\frac{n}{S(\alpha,\beta)}.
\]
Substituting \(\hat\lambda(\alpha,\beta)\) back gives the profile log-likelihood
\[
\ell_{\text{prof}}(\alpha,\beta) = n\log\!\Big(\frac{n}{S(\alpha,\beta)}\Big) - n\log 2 - n
= -\,n\log S(\alpha,\beta) + \text{constant},
\]
which is maximized exactly when
\[
S(\alpha,\beta)=\sum_{i=1}^n |y_i-\alpha-\beta x_i|
\]
is minimized.
Therefore the MLEs \((\hat\alpha,\hat\beta)\) are exactly the minimizers of the sum of absolute residuals:
\[
(\hat\alpha,\hat\beta)
= \arg\min_{(\alpha,\beta)}\sum_{i=1}^n |Y_i-\alpha-\beta X_i|.
\]
These are the least absolute deviations (LAD), or \(L_1\), regression estimators (equivalently, median regression for the conditional median).
Let \(r_i(\alpha,\beta)=y_i-\alpha-\beta x_i\) and let \(\operatorname{sgn}(u)\) denote the sign function, with \(\operatorname{sgn}(0)\) allowed to take any value in \([-1,1]\) (the subdifferential of \(|\cdot|\) at \(0\)). A minimizer \((\hat\alpha,\hat\beta)\) satisfies the subgradient equations
\[
\sum_{i=1}^n \operatorname{sgn}\big(r_i(\hat\alpha,\hat\beta)\big)=0,\qquad
\sum_{i=1}^n x_i\,\operatorname{sgn}\big(r_i(\hat\alpha,\hat\beta)\big)=0,
\]
where equalities are interpreted in the subgradient sense if some residuals equal zero. These are the LAD analogues of the normal equations.
Finally, the MLE of \(\lambda\) is obtained by plugging \((\hat\alpha,\hat\beta)\) into \(\hat\lambda(\alpha,\beta)\):
\[
\hat\lambda=\frac{n}{\sum_{i=1}^n|y_i-\hat\alpha-\hat\beta x_i|}.
\]
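A minimal numerical sketch of the LAD fit (this is not the only way to compute it: here an iteratively reweighted least-squares approximation to the \(L_1\) minimizer is used, and all names and tolerances are illustrative; an exact minimizer can also be obtained by linear programming):

```python
import numpy as np

def lad_fit(x, y, n_iter=200, eps=1e-8):
    """LAD (L1) regression of y on x via iteratively reweighted least squares."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
    coef = np.linalg.lstsq(X, y, rcond=None)[0]      # least-squares starting values
    for _ in range(n_iter):
        r = y - X @ coef
        w = 1.0 / np.maximum(np.abs(r), eps)         # weights 1/|residual| for the L1 loss
        sw = np.sqrt(w)
        coef = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    alpha_hat, beta_hat = coef
    lam_hat = len(y) / np.sum(np.abs(y - X @ coef))  # hat(lambda) = n / S(alpha_hat, beta_hat)
    return alpha_hat, beta_hat, lam_hat
```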