AY 2025–26
Instructor: Debasis Sengupta
Office / Department: ASU
Email: sdebasis@isical.ac.in
Marking Scheme:
Assignments: 20% | Midterm Test: 30% | End Semester: 50%
Let \( X_1, X_2, \ldots, X_n \) be i.i.d. random variables with pdf \[ f(x \mid \theta) = \frac{x^{1/\theta - 1}}{\theta}, \quad 0 \le x \le 1,\ \theta > 0. \] Is the maximum likelihood estimator (MLE) of \( \theta \) unbiased?
The likelihood is: \[ L(\theta) = \frac{1}{\theta^n} \left( \prod_{i=1}^n X_i \right)^{1/\theta - 1} \quad \Rightarrow \quad \ell(\theta) = -n \log \theta + \left( \frac{1}{\theta} - 1 \right) \sum \log X_i. \] Writing \( S = \sum \log X_i \) and differentiating: \[ \frac{d\ell}{d\theta} = -\frac{n}{\theta} - \frac{S}{\theta^2} = 0 \quad \Rightarrow \quad \hat{\theta}_{\text{MLE}} = -\frac{S}{n} = -\frac{1}{n} \sum \log X_i. \] To check unbiasedness: \[ \mathbb{E}[\hat{\theta}_{\text{MLE}}] = -\mathbb{E}[\log X_1] = - \frac{1}{\theta} \int_0^1 x^{1/\theta - 1} \log x\, dx = - \frac{1}{\theta} \cdot (-\theta^2) = \theta. \] ✔️ The MLE is unbiased.
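A quick Monte Carlo sanity check of this unbiasedness claim: the sketch below (a NumPy illustration, with θ, n, and the replication count chosen arbitrarily) samples from \( f(x \mid \theta) \) by inverse transform, since the CDF is \( F(x) = x^{1/\theta} \), so \( X = U^\theta \) for \( U \sim \text{Uniform}(0,1) \).

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 20, 50_000   # illustrative values

# Inverse-transform sampling: F(x) = x^(1/theta) on [0, 1], so X = U^theta.
U = rng.uniform(size=(reps, n))
X = U ** theta

# MLE for each replicate: theta_hat = -(1/n) * sum(log X_i)
theta_hat = -np.log(X).mean(axis=1)

print("true theta :", theta)
print("mean of MLE:", theta_hat.mean())   # close to theta, consistent with unbiasedness
```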
In the previous problem, we found that the MLE of \( \theta \) is:
\[ \hat{\theta}_{\text{MLE}} = -\frac{1}{n} \sum_{i=1}^{n} \log X_i. \]
What is the variance of \( \hat{\theta}_{\text{MLE}} \)?
Does the variance go to zero as \( n \to \infty \)?
Step 1: Recall the MLE expression
\[ \hat{\theta}_{\text{MLE}} = -\frac{1}{n} \sum_{i=1}^{n} \log X_i. \] Define \( Y_i = -\log X_i \). Then: \[ \hat{\theta}_{\text{MLE}} = \frac{1}{n} \sum_{i=1}^n Y_i. \] So, \[ \text{Var}(\hat{\theta}_{\text{MLE}}) = \frac{1}{n^2} \sum_{i=1}^n \text{Var}(Y_i) = \frac{1}{n} \text{Var}(Y_1). \]
Step 2: Distribution of \( Y = -\log X \)
Given: \( X \sim f(x|\theta) = \frac{x^{1/\theta - 1}}{\theta},\ 0 \le x \le 1 \)
Let \( x = e^{-y} \); then \( \frac{dx}{dy} = -e^{-y} \)
\[
f_Y(y) = f_X(x(y)) \cdot \left| \frac{dx}{dy} \right| = \frac{e^{-(1/\theta - 1)y}}{\theta} \cdot e^{-y} = \frac{e^{-y/\theta}}{\theta}, \quad y \ge 0.
\]
Thus, \( Y = -\log X \) is exponential with rate \( \frac{1}{\theta} \) (mean \( \theta \)): \( Y \sim \text{Exponential}\left( \frac{1}{\theta} \right) \)
Step 3: Mean and Variance of Exponential Distribution
If \( Y \sim \text{Exp}(\lambda) \) (rate \( \lambda \)), then \( \mathbb{E}[Y] = \frac{1}{\lambda} \) and \( \text{Var}(Y) = \frac{1}{\lambda^2} \). Here \( \lambda = \frac{1}{\theta} \), so \( \mathbb{E}[Y_1] = \theta \) and \( \text{Var}(Y_1) = \theta^2 \), hence \( \text{Var}(\hat{\theta}_{\text{MLE}}) = \frac{\text{Var}(Y_1)}{n} = \frac{\theta^2}{n} \).
Final Answer:
\[ \boxed{ \text{Var}(\hat{\theta}_{\text{MLE}}) = \frac{\theta^2}{n} } \]
Does variance → 0 as \( n \to \infty \)?
Yes — clearly: \[ \lim_{n \to \infty} \frac{\theta^2}{n} = 0 \] So, the MLE is consistent.
As \( n \) increases, the sample mean of i.i.d. exponential variables concentrates around the true mean \( \theta \). The variance shrinks as \( 1/n \), leading to a tighter distribution around \( \theta \). Imagine a histogram of \( \hat{\theta}_{\text{MLE}} \) values from repeated sampling — it gets narrower around \( \theta \) as \( n \) grows.
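As a rough check of the \( \theta^2/n \) rate, one can repeat the simulation above for several sample sizes (again an illustrative NumPy sketch, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, reps = 2.0, 50_000

for n in (10, 40, 160):
    X = rng.uniform(size=(reps, n)) ** theta     # X = U^theta, as before
    theta_hat = -np.log(X).mean(axis=1)          # MLE in each replicate
    # empirical variance should track theta^2 / n
    print(f"n={n:4d}  empirical Var={theta_hat.var():.4f}  theory={theta**2 / n:.4f}")
```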
Let \( X_1, X_2, \ldots, X_n \) be i.i.d. exponential random variables with mean \( \theta \), i.e.,
\[ f(x \mid \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad x \ge 0. \]
Find an unbiased estimator of \( \theta \) that depends only on \( \min\{X_1, X_2, \ldots, X_n\} \).
Step 1: Let’s define the statistic
Let: \[ Y = \min\{X_1, X_2, \ldots, X_n\}. \] We aim to find a constant \( c_n \) such that \( \mathbb{E}[c_n Y] = \theta \); then \( c_n Y \) is unbiased, with \( c_n = \theta / \mathbb{E}[Y] \).
Step 2: Distribution of the Minimum
Let \( Y = X_{(1)} = \min\{X_1, \ldots, X_n\} \).
Then, since the minimum of \( n \) i.i.d. exponentials with mean \( \theta \) is again exponential, with rate \( n/\theta \) (mean \( \theta/n \)):
\[
Y \sim \text{Exponential}\left(\text{rate } \frac{n}{\theta}\right)
\]
That is:
\[
f_Y(y) = \frac{n}{\theta} e^{-ny/\theta}, \quad y \ge 0.
\]
So:
\[
\mathbb{E}[Y] = \frac{\theta}{n}
\]
Step 3: Construct Unbiased Estimator
We want: \[ \mathbb{E}[c_n Y] = \theta \Rightarrow c_n \cdot \frac{\theta}{n} = \theta \Rightarrow c_n = n \]
Final Answer:
\[ \boxed{ \hat{\theta} = n \cdot \min\{X_1, \ldots, X_n\} } \] This is an unbiased estimator of \( \theta \) using only the minimum.
Imagine sampling 100 exponential lifetimes and recording only the first failure time. Since failures occur at random exponential times, the first failure tends to come much sooner than the average lifetime.
Hence the minimum is biased low, but scaling it up by \( n \) exactly compensates for this, making \( nY \) an unbiased estimate.
Think of \( \min(X_1, \dots, X_n) \approx \frac{\theta}{n} \), so multiplying by \( n \) gets you back to \( \theta \).
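This bias-and-rescale story can be checked numerically; a small NumPy sketch (θ, n, and the number of replications are arbitrary choices) compares the raw minimum with \( n \cdot \min \):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 3.0, 25, 100_000

X = rng.exponential(scale=theta, size=(reps, n))   # scale = mean = theta
mins = X.min(axis=1)

print("mean of min    :", mins.mean())        # ~ theta / n  (biased low)
print("mean of n * min:", (n * mins).mean())  # ~ theta      (unbiased)
```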
Let \(X_1, \dots, X_n \overset{iid}{\sim} \text{Exp}(\theta)\). You’re given:
\[ \hat{\theta}_1 = n \cdot \min\{X_1, \dots, X_n\}, \] which is unbiased. Find a better estimator for \(\theta\) and prove that it is better.
Known facts:
\( \hat{\theta}_1 = n X_{(1)} \) is unbiased, and \( X_{(1)} \) is exponential with mean \( \theta/n \) and variance \( \theta^2/n^2 \), so \( \text{Var}(\hat{\theta}_1) = n^2 \cdot \frac{\theta^2}{n^2} = \theta^2 \).
The sample mean \( \hat{\theta}_2 = \bar{X} \) is also unbiased, with \( \text{Var}(\hat{\theta}_2) = \frac{\theta^2}{n} \).
Variance comparison:
\[ \text{Var}(\hat{\theta}_1) = \theta^2 \quad \text{vs.} \quad \text{Var}(\hat{\theta}_2) = \frac{\theta^2}{n}. \]
Conclusion:
Since \(n > 1\), \(\text{Var}(\hat{\theta}_2) < \text{Var}(\hat{\theta}_1)\).
Both are unbiased ⟹ lower variance ⇒ lower MSE.
Final Answer:
\[ \boxed{ \hat{\theta}_2 = \bar{X} \text{ is better than } \hat{\theta}_1 = n X_{(1)} } \]
1. Sufficient & Complete Statistic:
\( \sum_{i=1}^n X_i \) (equivalently \( \bar{X} \)) is a complete sufficient statistic for \( \theta \) in this exponential family, so by the Lehmann–Scheffé theorem \( \bar{X} \) is the unique minimum-variance unbiased estimator.
2. CRLB:
\[ I(\theta) = \frac{n}{\theta^2} \Rightarrow \text{CRLB} = \frac{\theta^2}{n} \] \[ \text{Var}(\bar{X}) = \frac{\theta^2}{n} = \text{CRLB} \Rightarrow \bar{X} \text{ is efficient} \]
3. Visual Intuition:
Imagine observing only the first failure among \(n\) bulbs vs observing all failures and averaging them. Clearly, averaging gives a better estimate.
4. Practical insight:
In a life-testing experiment, stopping at the first failure throws away the information in the remaining \( n - 1 \) lifetimes; using every observation, as \( \bar{X} \) does, is what drives the variance down from \( \theta^2 \) to \( \theta^2/n \).
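A simulation along these lines (an illustrative NumPy sketch with arbitrary θ and n) makes the variance gap between \( n X_{(1)} \) and \( \bar{X} \) concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 3.0, 25, 100_000

X = rng.exponential(scale=theta, size=(reps, n))
est1 = n * X.min(axis=1)     # theta_hat_1 = n * X_(1)
est2 = X.mean(axis=1)        # theta_hat_2 = sample mean

print("Var(est1):", est1.var(), " theory theta^2     =", theta**2)
print("Var(est2):", est2.var(), " theory theta^2 / n =", theta**2 / n)
```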
Let \( X_1, X_2, \ldots, X_n \) be i.i.d. with distribution \( \mathcal{N}(\theta, \theta^2) \), where \( \theta > 0 \).
Define:
\[ S = \sqrt{ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 }, \]
the sample standard deviation.
For which value of \( c \) is \( cS \) an unbiased estimator of \( \theta \)?
We are to find \( c \) such that:
\[ \mathbb{E}[cS] = \theta. \]
So, we need:
\[ c = \frac{\theta}{\mathbb{E}[S]} \]
We know:
The sample variance:
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 \]
is an unbiased estimator of \( \theta^2 \):
\[ \Rightarrow \mathbb{E}[S^2] = \theta^2 \]
But \( S = \sqrt{S^2} \), and since the square root is strictly concave, Jensen's inequality gives
\[ \mathbb{E}[S] < \sqrt{\mathbb{E}[S^2]} = \theta \]
So we must explicitly calculate \( \mathbb{E}[S] \) when \( X_i \sim \mathcal{N}(\theta, \theta^2) \).
We know that for \( X_i \sim \mathcal{N}(\mu, \sigma^2) \), the statistic:
\[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \]
This holds for any value of \( \mu \); here \( \mu = \theta \).
Here, \( \sigma^2 = \theta^2 \), so:
\[ \frac{(n-1)S^2}{\theta^2} \sim \chi^2_{n-1} \]
Let \( Z = \frac{(n-1)S^2}{\theta^2} \sim \chi^2_{n-1} \); then
\[ S = \theta \cdot \sqrt{\frac{Z}{n-1}} \]
So:
\[ \mathbb{E}[S] = \theta \cdot \mathbb{E}\left[\sqrt{\frac{Z}{n-1}}\right] = \theta \cdot \frac{1}{\sqrt{n-1}} \cdot \mathbb{E}\left[\sqrt{Z}\right] \]
We now need the expected value of \( \sqrt{Z} \), where \( Z \sim \chi^2_{n-1} \).
For \( Z \sim \chi^2_k \), there’s a known result:
\[ \mathbb{E}[\sqrt{Z}] = \sqrt{2} \cdot \frac{\Gamma\left(\frac{k+1}{2}\right)}{\Gamma\left(\frac{k}{2}\right)} \]
Apply this to \( Z \sim \chi^2_{n-1} \):
\[ \mathbb{E}[S] = \theta \cdot \frac{1}{\sqrt{n-1}} \cdot \mathbb{E}[\sqrt{Z}] = \theta \cdot \frac{\sqrt{2}}{\sqrt{n-1}} \cdot \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} \]
We want:
\[ \mathbb{E}[cS] = \theta \Rightarrow c \cdot \mathbb{E}[S] = \theta \Rightarrow c = \frac{\theta}{\mathbb{E}[S]} \]
Substitute:
\[ \mathbb{E}[S] = \theta \cdot \frac{\sqrt{2}}{\sqrt{n-1}} \cdot \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} \]
Cancelling \( \theta \), we get:
\[ \boxed{ c = \frac{\sqrt{n - 1}}{\sqrt{2}} \cdot \frac{\Gamma\left( \frac{n - 1}{2} \right)}{\Gamma\left( \frac{n}{2} \right)} } \quad \text{which makes } cS \text{ an unbiased estimator of } \theta. \]
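The gamma-function constant is easy to evaluate and check by simulation; the sketch below (a NumPy/SciPy illustration with arbitrary θ and n) computes \( c \) on the log scale and verifies that \( \mathbb{E}[cS] \approx \theta \) for \( X_i \sim \mathcal{N}(\theta, \theta^2) \).

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 10, 100_000

# c = sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2), evaluated via log-gamma
c = np.sqrt((n - 1) / 2) * np.exp(gammaln((n - 1) / 2) - gammaln(n / 2))

X = rng.normal(loc=theta, scale=theta, size=(reps, n))
S = X.std(axis=1, ddof=1)    # sample standard deviation (divisor n - 1)

print("E[S]  ≈", S.mean(), " (falls below theta =", theta, ")")
print("E[cS] ≈", (c * S).mean(), " (close to theta)")
```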
Suppose, in the above problem, that \( c \) is chosen as the value that makes \( cS \) an unbiased estimator of \( \theta \). Then
\[ \hat{\theta}_\lambda = \lambda \bar{X} + (1 - \lambda)\, cS \]
is also an unbiased estimator of \(\theta\) for every value of \(\lambda \in [0, 1]\).
For which value of \(\lambda\) in this interval does the unbiased estimator have the minimum variance?
We are given \( X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\theta, \theta^2) \) and the family \( \hat{\theta}_\lambda = \lambda \bar{X} + (1 - \lambda)\, cS \), \( \lambda \in [0, 1] \).
Both \(\bar{X}\) and \(cS\) are unbiased estimators of \(\theta\). So:
\[ \mathbb{E}[\hat{\theta}_\lambda] = \lambda \theta + (1 - \lambda)\theta = \theta \quad \text{for every } \lambda. \]
Since \(\bar{X}\) and \(S\) are independent (for normal samples \( \bar{X} \) and \( S^2 \) are independent):
\[ \text{Var}(\hat{\theta}_\lambda) = \lambda^2 \, \text{Var}(\bar{X}) + (1 - \lambda)^2 \, \text{Var}(cS). \]
We know:
\[ \text{Var}(\bar{X}) = \frac{\theta^2}{n}, \qquad \text{Var}(cS) = c^2 \, \text{Var}(S). \]
So, writing \( b_n = \mathbb{E}[S]/\theta = 1/c \) and using the standard approximation \( \text{Var}(S) \approx \frac{\theta^2}{2(n-1)} \):
\[ \text{Var}(cS) \approx \frac{\theta^2}{b_n^2 \cdot 2(n-1)}. \]
Putting it all together:
\[ \text{Var}(\hat{\theta}_\lambda) \approx \theta^2 \left[ A \lambda^2 + B (1 - \lambda)^2 \right] \]
where: \(A = \frac{1}{n}, \quad B = \frac{1}{b_n^2 \cdot 2(n - 1)}\)
Then:
\[ f(\lambda) = A \lambda^2 + B (1 - \lambda)^2. \]
Minimize by setting the derivative to 0:
\[ f'(\lambda) = 2A\lambda - 2B(1 - \lambda) = 0 \quad \Rightarrow \quad \boxed{ \lambda^* = \frac{B}{A + B} } \]
Picture a slider between \(\bar{X}\) and \(cS\). Moving it shifts weight from one to another. The optimal point balances their variance contributions.
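Rather than trusting the closed-form \( A \) and \( B \) (which use an approximation for \( \text{Var}(S) \)), one can estimate both variances by simulation and locate the optimal weight directly; a NumPy/SciPy sketch with arbitrary θ and n:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(5)
theta, n, reps = 2.0, 10, 200_000

c = np.sqrt((n - 1) / 2) * np.exp(gammaln((n - 1) / 2) - gammaln(n / 2))

X = rng.normal(loc=theta, scale=theta, size=(reps, n))
xbar = X.mean(axis=1)
cS = c * X.std(axis=1, ddof=1)

A, B = xbar.var(), cS.var()          # Monte Carlo Var(Xbar) and Var(cS)
lam = B / (A + B)                    # minimizer of lam^2 * A + (1 - lam)^2 * B
combined = lam * xbar + (1 - lam) * cS

print("lambda* ≈", lam)
print("Var(Xbar), Var(cS), Var(combined):", A, B, combined.var())
```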
Let \( X_1, X_2, \ldots, X_n \) be a random sample from a population with mean \( \mu \) and variance \( \sigma^2 \). (The distribution is not specified.) Show that the statistic
\[ \sum_{i=1}^{n} a_i X_i \]
is an unbiased estimator of \( \mu \) if and only if
\[ \sum_{i=1}^{n} a_i = 1. \]
Let’s denote the statistic:
\[ T = \sum_{i=1}^{n} a_i X_i \]
We want to find the condition under which \( T \) is unbiased for \( \mu \). That is:
\[ \mathbb{E}[T] = \mu \]
\[ \mathbb{E}[T] = \mathbb{E}\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i \mathbb{E}[X_i] \]
Since \( X_i \) are i.i.d. with mean \( \mu \),
\[ \mathbb{E}[X_i] = \mu \quad \text{for all } i \]
So:
\[ \mathbb{E}[T] = \sum_{i=1}^n a_i \mu = \mu \sum_{i=1}^n a_i \]
We want \( \mathbb{E}[T] = \mu \) to hold for every value of \( \mu \), so:
\[ \mu \sum_{i=1}^n a_i = \mu \quad \text{for all } \mu \]
Taking any \( \mu \ne 0 \) and dividing both sides by \( \mu \):
\[ \sum_{i=1}^n a_i = 1 \]
Conversely, if \( \sum_{i=1}^n a_i = 1 \), then \( \mathbb{E}[T] = \mu \) for every \( \mu \). Hence, the condition is necessary and sufficient.
Think of each \( X_i \) as a “vote” on the value of \( \mu \), and \( a_i \) as the “weight” of that vote. To get an unbiased estimate, the total weight has to be 1 — like averaging with possibly unequal weights. If you give too little or too much total weight, the final value will be systematically too low or too high, respectively.
\[ \mathrm{Var}(T) = \sigma^2 \sum a_i^2 \]
So you can minimize this (i.e., make the estimator most efficient) subject to \( \sum a_i = 1 \).
In the above problem, consider all unbiased estimators of \( \mu \) which are of the form
\[ \sum_{i=1}^n a_i X_i. \]
Which one of these estimators has the minimum variance? What is the minimum variance?
We are given \( X_1, \ldots, X_n \) i.i.d. with mean \( \mu \) and variance \( \sigma^2 \), and the class of estimators \( \hat{\mu}_a = \sum_{i=1}^n a_i X_i \) with \( \sum_{i=1}^n a_i = 1 \) (so that each \( \hat{\mu}_a \) is unbiased).
We are to minimize the variance of such an unbiased estimator.
Since the \( X_i \)'s are uncorrelated (i.i.d.),
\[ \operatorname{Var}(\hat{\mu}_a) = \operatorname{Var}\left(\sum_{i=1}^n a_i X_i \right) = \sum_{i=1}^n a_i^2 \operatorname{Var}(X_i) = \sigma^2 \sum_{i=1}^n a_i^2 \]
So, minimizing the variance of the estimator \( \hat{\mu}_a \) reduces to minimizing
\[ \sum_{i=1}^n a_i^2 \]
subject to the constraint \( \sum_{i=1}^n a_i = 1 \).
Let us minimize \( L = \sum_{i=1}^n a_i^2 - \lambda \left( \sum_{i=1}^n a_i - 1 \right) \)
Taking derivative w.r.t. \( a_i \):
\[ \frac{\partial L}{\partial a_i} = 2a_i - \lambda = 0 \Rightarrow a_i = \frac{\lambda}{2}, \quad \text{for all } i \]
Apply the constraint:
\[ \sum_{i=1}^n a_i = n \cdot \frac{\lambda}{2} = 1 \Rightarrow \lambda = \frac{2}{n} \Rightarrow a_i = \frac{1}{n} \quad \text{for all } i \]
Hence the minimum-variance estimator in this class is the sample mean:
\[ \boxed{\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i} \]
with variance
\[ \operatorname{Var}(\bar{X}) = \sigma^2 \sum_{i=1}^n a_i^2 = \sigma^2 \cdot \sum_{i=1}^n \left(\frac{1}{n}\right)^2 = \sigma^2 \cdot \frac{n}{n^2} = \frac{\sigma^2}{n} \]
Think of all unbiased estimators as lying on a plane defined by \( \sum a_i = 1 \). Among them, the one that "spreads" weight equally (i.e., \( a_i = 1/n \)) is the most "stable" because it averages out randomness optimally. Any deviation from equal weights increases sensitivity and thus variance.
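To see the effect of unequal weights, one can compare the equal-weight estimator with any other weight vector summing to 1; a NumPy sketch with arbitrary μ, σ, and weights:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n, reps = 1.0, 2.0, 5, 200_000

X = rng.normal(loc=mu, scale=sigma, size=(reps, n))

equal = np.full(n, 1 / n)                        # a_i = 1/n
unequal = np.array([0.4, 0.3, 0.15, 0.1, 0.05])  # also sums to 1

for a in (equal, unequal):
    est = X @ a
    # both are unbiased (mean near mu); equal weights give the smaller variance
    print(f"sum(a)={a.sum():.2f}  mean≈{est.mean():.3f}  "
          f"var≈{est.var():.4f}  theory={sigma**2 * (a**2).sum():.4f}")
```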
Let \( X_1, X_2, \ldots, X_n \) be i.i.d. Bernoulli(\( p \)). Show that the variance of the sample mean \( \bar{X} \) attains the Cramér–Rao Lower Bound (CRLB), and hence is the best unbiased estimator of \( p \).
We are asked to compute \( \operatorname{Var}(\bar{X}) \), compute the CRLB for unbiased estimators of \( p \), and verify that the two coincide. Since \( \mathbb{E}[\bar{X}] = p \), the sample mean is unbiased, and
\[ \operatorname{Var}(\bar{X}) = \operatorname{Var}\left( \frac{1}{n} \sum_{i=1}^n X_i \right) = \frac{1}{n^2} \sum_{i=1}^n \operatorname{Var}(X_i) = \frac{1}{n^2} \cdot n \cdot p(1-p) = \frac{p(1-p)}{n} \]
We use the CRLB formula for i.i.d. samples:
\[ \text{CRLB} = \frac{1}{n \cdot I(p)} \]
where \( I(p) \) is the Fisher Information for a single observation \( X_i \).
For a Bernoulli distribution:
\[ f(x|p) = p^x(1-p)^{1-x}, \quad x \in \{0, 1\} \]
Compute the log-likelihood:
\[ \log f(x|p) = x \log p + (1-x) \log(1-p) \]
Now compute the score:
\[ \frac{d}{dp} \log f(x|p) = \frac{x}{p} - \frac{1-x}{1-p} \]
Then the Fisher Information is:
\[ I(p) = \mathbb{E} \left[ \left( \frac{d}{dp} \log f(x|p) \right)^2 \right] = \mathbb{E} \left[ \left( \frac{x}{p} - \frac{1 - x}{1 - p} \right)^2 \right] \]
Evaluate this expectation: the score equals \( \frac{1}{p} \) when \( x = 1 \) (probability \( p \)) and \( -\frac{1}{1-p} \) when \( x = 0 \) (probability \( 1-p \)), so
\[ = \left( \frac{1}{p} \right)^2 \cdot p + \left( \frac{1}{1-p} \right)^2 \cdot (1 - p) = \frac{1}{p} + \frac{1}{1-p} \]
So:
\[ I(p) = \frac{1}{p(1 - p)} \Rightarrow \text{CRLB} = \frac{1}{n \cdot \frac{1}{p(1 - p)}} = \frac{p(1 - p)}{n} \]
We found earlier:
\[ \operatorname{Var}(\bar{X}) = \frac{p(1-p)}{n} = \text{CRLB} \]
✔️ So the sample mean achieves the Cramér–Rao Lower Bound.
Since \( \bar{X} \) is unbiased for \( p \) and achieves the CRLB, it is the Minimum Variance Unbiased Estimator (MVUE) for \( p \). That is, it is the best unbiased estimator of \( p \).
Imagine you’re trying to estimate the true probability \( p \) of getting heads in a biased coin using sample proportions. As you increase the number of tosses \( n \), the sample mean \( \bar{X} \) (i.e., proportion of heads) becomes more accurate.
The CRLB says: “No matter how clever your estimator is, if it’s unbiased, its variance can’t be better than this limit.” And the sample mean exactly hits that limit here.
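A short NumPy sketch (p, n, and the replication count are arbitrary) confirms that the Monte Carlo variance of the sample proportion sits at the CRLB:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 50, 200_000

X = rng.binomial(1, p, size=(reps, n))   # Bernoulli(p) samples
xbar = X.mean(axis=1)                    # sample proportion per replicate

print("Var(Xbar) ≈", xbar.var())
print("CRLB      =", p * (1 - p) / n)
```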
We are given \( X_1, X_2, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2) \) and the estimator
\[ T^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2. \]
We are asked to compute \( \text{Var}(T^2) \), and compare it with the Cramér–Rao Lower Bound (CRLB) for an unbiased estimator of \( \sigma^2 \).
Recall that:
\[ \mathbb{E}[T^2] = \frac{n-1}{n} \sigma^2 \]
So, \( T^2 \) is a biased estimator of \( \sigma^2 \).
Let’s define:
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 \]
Then \( \mathbb{E}[S^2] = \sigma^2 \), so \( S^2 \) is unbiased for \( \sigma^2 \). And clearly:
\[ T^2 = \frac{n-1}{n} S^2 \]
Let us find \( \text{Var}(T^2) \) using the fact that \( S^2 \) is a scaled chi-squared variable:
\[ (n-1)\frac{S^2}{\sigma^2} \sim \chi^2_{n-1} \Rightarrow \text{Var}(S^2) = \frac{2\sigma^4}{n-1} \]
Now use \( T^2 = \frac{n-1}{n} S^2 \):
\[ \text{Var}(T^2) = \left(\frac{n-1}{n}\right)^2 \text{Var}(S^2) = \left(\frac{n-1}{n}\right)^2 \cdot \frac{2\sigma^4}{n-1} = \frac{(n-1) \cdot 2\sigma^4}{n^2} = \frac{2(n-1)\sigma^4}{n^2} \]
For \( X_1, \ldots, X_n \sim \mathcal{N}(\mu, \sigma^2) \), the CRLB for an unbiased estimator of \( \sigma^2 \) is:
\[ \text{Var}(\hat{\sigma}^2) \geq \frac{2\sigma^4}{n} \]
This is the minimum possible variance for any unbiased estimator of \( \sigma^2 \).
We found:
\[ \text{Var}(T^2) = \frac{2(n-1)\sigma^4}{n^2} \quad \text{and} \quad \text{CRLB} = \frac{2\sigma^4}{n} \]
Now compare:
\[ \frac{2(n-1)\sigma^4}{n^2} < \frac{2\sigma^4}{n} \quad \text{since } \frac{n-1}{n} < 1 \]
BUT! This comparison is misleading. CRLB applies only to unbiased estimators, and \( T^2 \) is biased. So even though \( \text{Var}(T^2) < \text{CRLB} \), this does not contradict the CRLB, because the bound doesn’t apply to biased estimators.
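The three quantities can be compared numerically; in the NumPy sketch below (μ, σ, and n are arbitrary), `ddof=0` gives \( T^2 \) and `ddof=1` gives \( S^2 \):

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sigma, n, reps = 0.0, 2.0, 10, 200_000

X = rng.normal(mu, sigma, size=(reps, n))
T2 = X.var(axis=1, ddof=0)   # (1/n)     * sum (X_i - Xbar)^2  -- biased
S2 = X.var(axis=1, ddof=1)   # (1/(n-1)) * sum (X_i - Xbar)^2  -- unbiased

print("Var(T2) ≈", T2.var(), "  theory:", 2 * (n - 1) * sigma**4 / n**2)
print("Var(S2) ≈", S2.var(), "  theory:", 2 * sigma**4 / (n - 1))
print("CRLB    =", 2 * sigma**4 / n)
```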
We are given the same setup as before: \( X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2) \) and \( T^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 \). This time we compute the mean squared error of \( T^2 \) and compare it with the CRLB.
We know that:
\[ E[T^2] = \left( \frac{n-1}{n} \right) \sigma^2 \]
Hence, \( T^2 \) is biased for \( \sigma^2 \). Its bias is:
\[ \text{Bias}(T^2) = E[T^2] - \sigma^2 = \left( \frac{n-1}{n} - 1 \right) \sigma^2 = -\frac{\sigma^2}{n} \]
Now compute the Mean Squared Error (MSE):
\[ \text{MSE}(T^2) = \text{Var}(T^2) + \left( \text{Bias}(T^2) \right)^2 \]
We also know that:
\[ \text{Var}(T^2) = \frac{2\sigma^4(n - 1)}{n^2} \quad \text{(from standard results on sample variance)} \]
So,
\[ \text{Bias}(T^2)^2 = \left( \frac{\sigma^2}{n} \right)^2 = \frac{\sigma^4}{n^2} \]
Putting it together:
\[ \text{MSE}(T^2) = \frac{2\sigma^4(n - 1)}{n^2} + \frac{\sigma^4}{n^2} = \frac{\sigma^4}{n^2} (2(n - 1) + 1) = \frac{\sigma^4}{n^2} (2n - 1) \]
The Fisher Information for \( \sigma^2 \) in the normal case is:
\[ I(\sigma^2) = \frac{n}{2\sigma^4} \]
Therefore, the CRLB for any unbiased estimator of \( \sigma^2 \) is:
\[ \text{Var}(\hat{\sigma}^2) \geq \frac{1}{I(\sigma^2)} = \frac{2\sigma^4}{n} \]
We already found:
\[ \text{MSE}(T^2) = \frac{\sigma^4}{n^2} (2n - 1) \]
Let's compare this to the CRLB:
\[ \frac{\sigma^4}{n^2} (2n - 1) \quad \text{vs.} \quad \frac{2\sigma^4}{n} \]
Rewrite the CRLB over the common denominator \( n^2 \):
\[ \frac{2\sigma^4}{n} = \frac{2\sigma^4 n}{n^2} \]
Now we compare:
\[ \frac{\sigma^4 (2n - 1)}{n^2} \quad \text{vs.} \quad \frac{2\sigma^4 n}{n^2} \]
Clearly:
\[ 2n - 1 < 2n \quad \Rightarrow \quad \text{MSE}(T^2) < \text{CRLB} \]
🚫 But this comparison is invalid because the CRLB is only for unbiased estimators, and \( T^2 \) is biased. So the CRLB doesn't directly bound the MSE of \( T^2 \). Still, it's useful to check for insight.
✅ Conclusion:
The MSE of \( T^2 \) is less than the Cramér–Rao lower bound for the variance of any unbiased estimator of \( \sigma^2 \), but this is allowed, because \( T^2 \) is not unbiased.
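The same simulation idea, now tracking mean squared error rather than variance (a NumPy sketch with arbitrary μ, σ, and n), shows the MSE of \( T^2 \) dipping below the CRLB while the unbiased \( S^2 \) stays above it:

```python
import numpy as np

rng = np.random.default_rng(9)
mu, sigma, n, reps = 0.0, 2.0, 10, 200_000

X = rng.normal(mu, sigma, size=(reps, n))
T2 = X.var(axis=1, ddof=0)   # biased estimator (divisor n)
S2 = X.var(axis=1, ddof=1)   # unbiased estimator (divisor n - 1)

def mse(est):
    return np.mean((est - sigma**2) ** 2)

print("MSE(T2) ≈", mse(T2), "  theory:", (2 * n - 1) * sigma**4 / n**2)
print("MSE(S2) ≈", mse(S2), "  theory:", 2 * sigma**4 / (n - 1))
print("CRLB    =", 2 * sigma**4 / n)
```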