AY 2025–26
Instructor: Debasis Sengupta
Office / Department: ASU
Email: sdebasis@isical.ac.in
Marking Scheme:
Assignments: 20% | Midterm Test: 30% | End Semester: 50%
Let \( X_1, X_2, \ldots, X_n \) be i.i.d. random variables with pdf \[ f(x \mid \theta) = \frac{x^{1/\theta - 1}}{\theta}, \quad 0 \le x \le 1,\ \theta > 0. \] Is the maximum likelihood estimator (MLE) of \( \theta \) unbiased?
The likelihood is: \[ L(\theta) = \frac{1}{\theta^n} \left( \prod_{i=1}^n X_i \right)^{1/\theta - 1} \quad \Rightarrow \quad \ell(\theta) = -n \log \theta + \left( \frac{1}{\theta} - 1 \right) \sum \log X_i. \] Writing \( S = \sum \log X_i \) and differentiating: \[ \frac{d\ell}{d\theta} = -\frac{n}{\theta} - \frac{S}{\theta^2} = 0 \quad \Rightarrow \quad \hat{\theta}_{\text{MLE}} = -\frac{S}{n} = -\frac{1}{n} \sum \log X_i. \] To check unbiasedness: \[ \mathbb{E}[\hat{\theta}_{\text{MLE}}] = -\mathbb{E}[\log X_1] = - \frac{1}{\theta} \int_0^1 x^{1/\theta - 1} \log x\, dx = - \frac{1}{\theta} \cdot (-\theta^2) = \theta. \] ✔️ The MLE is unbiased.
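A quick Monte Carlo sanity check of this unbiasedness claim: the sketch below (a NumPy illustration, with θ, n, and the replication count chosen arbitrarily) samples from \( f(x \mid \theta) \) by inverse transform, since the CDF is \( F(x) = x^{1/\theta} \), so \( X = U^\theta \) for \( U \sim \text{Uniform}(0,1) \).

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 20, 50_000   # illustrative values

# Inverse-transform sampling: F(x) = x^(1/theta) on [0, 1], so X = U^theta.
U = rng.uniform(size=(reps, n))
X = U ** theta

# MLE for each replicate: theta_hat = -(1/n) * sum(log X_i)
theta_hat = -np.log(X).mean(axis=1)

print("true theta :", theta)
print("mean of MLE:", theta_hat.mean())   # close to theta, consistent with unbiasedness
```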
In the previous problem, we found that the MLE of \( \theta \) is:
\[ \hat{\theta}_{\text{MLE}} = -\frac{1}{n} \sum_{i=1}^{n} \log X_i. \]
What is the variance of \( \hat{\theta}_{\text{MLE}} \)?
Does the variance go to zero as \( n \to \infty \)?
Step 1: Recall the MLE expression
\[ \hat{\theta}_{\text{MLE}} = -\frac{1}{n} \sum_{i=1}^{n} \log X_i. \] Define \( Y_i = -\log X_i \). Then: \[ \hat{\theta}_{\text{MLE}} = \frac{1}{n} \sum_{i=1}^n Y_i. \] So, \[ \text{Var}(\hat{\theta}_{\text{MLE}}) = \frac{1}{n^2} \sum_{i=1}^n \text{Var}(Y_i) = \frac{1}{n} \text{Var}(Y_1). \]
Step 2: Distribution of \( Y = -\log X \)
Given: \( X \sim f(x|\theta) = \frac{x^{1/\theta - 1}}{\theta},\ 0 \le x \le 1 \)
Let \( x = e^{-y} \); then \( \frac{dx}{dy} = -e^{-y} \)
\[
f_Y(y) = f_X(x(y)) \cdot \left| \frac{dx}{dy} \right| = \frac{e^{-(1/\theta - 1)y}}{\theta} \cdot e^{-y} = \frac{e^{-y/\theta}}{\theta}, \quad y \ge 0.
\]
Thus, \( Y = -\log X \) is exponential with rate \( \frac{1}{\theta} \) (mean \( \theta \)): \( Y \sim \text{Exponential}\left( \frac{1}{\theta} \right) \)
Step 3: Mean and Variance of Exponential Distribution
If \( Y \sim \text{Exp}(\lambda) \) (rate \( \lambda \)), then \( \mathbb{E}[Y] = \frac{1}{\lambda} \) and \( \text{Var}(Y) = \frac{1}{\lambda^2} \). Here \( \lambda = \frac{1}{\theta} \), so \( \mathbb{E}[Y_1] = \theta \) and \( \text{Var}(Y_1) = \theta^2 \), hence \( \text{Var}(\hat{\theta}_{\text{MLE}}) = \frac{\text{Var}(Y_1)}{n} = \frac{\theta^2}{n} \).
Final Answer:
\[ \boxed{ \text{Var}(\hat{\theta}_{\text{MLE}}) = \frac{\theta^2}{n} } \]
Does variance → 0 as \( n \to \infty \)?
Yes — clearly: \[ \lim_{n \to \infty} \frac{\theta^2}{n} = 0 \] So, the MLE is consistent.
As \( n \) increases, the sample mean of i.i.d. exponential variables concentrates around the true mean \( \theta \). The variance shrinks as \( 1/n \), leading to a tighter distribution around \( \theta \). Imagine a histogram of \( \hat{\theta}_{\text{MLE}} \) values from repeated sampling — it gets narrower around \( \theta \) as \( n \) grows.
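As a rough check of the \( \theta^2/n \) rate, one can repeat the simulation above for several sample sizes (again an illustrative NumPy sketch, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, reps = 2.0, 50_000

for n in (10, 40, 160):
    X = rng.uniform(size=(reps, n)) ** theta     # X = U^theta, as before
    theta_hat = -np.log(X).mean(axis=1)          # MLE in each replicate
    # empirical variance should track theta^2 / n
    print(f"n={n:4d}  empirical Var={theta_hat.var():.4f}  theory={theta**2 / n:.4f}")
```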
Let \( X_1, X_2, \ldots, X_n \) be i.i.d. exponential random variables with mean \( \theta \), i.e.,
\[ f(x \mid \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad x \ge 0. \]
Find an unbiased estimator of \( \theta \) that depends only on \( \min\{X_1, X_2, \ldots, X_n\} \).
Step 1: Let’s define the statistic
Let: \[ Y = \min\{X_1, X_2, \ldots, X_n\}. \] We aim to find a constant \( c_n \) such that \( \mathbb{E}[c_n Y] = \theta \); then \( c_n Y \) is unbiased, with \( c_n = \theta / \mathbb{E}[Y] \).
Step 2: Distribution of the Minimum
Let \( Y = X_{(1)} = \min\{X_1, \ldots, X_n\} \).
Then, since the minimum of \( n \) i.i.d. exponentials with mean \( \theta \) is again exponential, with rate \( n/\theta \) (mean \( \theta/n \)):
\[
Y \sim \text{Exponential}\left(\text{rate } \frac{n}{\theta}\right)
\]
That is:
\[
f_Y(y) = \frac{n}{\theta} e^{-ny/\theta}, \quad y \ge 0.
\]
So:
\[
\mathbb{E}[Y] = \frac{\theta}{n}
\]
Step 3: Construct Unbiased Estimator
We want: \[ \mathbb{E}[c_n Y] = \theta \Rightarrow c_n \cdot \frac{\theta}{n} = \theta \Rightarrow c_n = n \]
Final Answer:
\[ \boxed{ \hat{\theta} = n \cdot \min\{X_1, \ldots, X_n\} } \] This is an unbiased estimator of \( \theta \) using only the minimum.
Imagine sampling 100 exponential lifetimes and recording only the first failure time. Since failures occur at random exponential times, the first failure tends to come much sooner than the average lifetime.
Hence the minimum is biased low, but scaling it up by \( n \) exactly compensates for this, making \( nY \) an unbiased estimate.
Think of \( \min(X_1, \dots, X_n) \approx \frac{\theta}{n} \), so multiplying by \( n \) gets you back to \( \theta \).
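This bias-and-rescale story can be checked numerically; a small NumPy sketch (θ, n, and the number of replications are arbitrary choices) compares the raw minimum with \( n \cdot \min \):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 3.0, 25, 100_000

X = rng.exponential(scale=theta, size=(reps, n))   # scale = mean = theta
mins = X.min(axis=1)

print("mean of min    :", mins.mean())        # ~ theta / n  (biased low)
print("mean of n * min:", (n * mins).mean())  # ~ theta      (unbiased)
```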
Let \(X_1, \dots, X_n \overset{iid}{\sim} \text{Exp}(\theta)\). You’re given:
\[ \hat{\theta}_1 = n \cdot \min\{X_1, \dots, X_n\}, \] which is unbiased. Find a better estimator for \(\theta\) and prove that it is better.
Known facts:
\( \hat{\theta}_1 = n X_{(1)} \) is unbiased, and \( X_{(1)} \) is exponential with mean \( \theta/n \) and variance \( \theta^2/n^2 \), so \( \text{Var}(\hat{\theta}_1) = n^2 \cdot \frac{\theta^2}{n^2} = \theta^2 \).
The sample mean \( \hat{\theta}_2 = \bar{X} \) is also unbiased, with \( \text{Var}(\hat{\theta}_2) = \frac{\theta^2}{n} \).
Variance comparison:
\[ \text{Var}(\hat{\theta}_1) = \theta^2 \quad \text{vs.} \quad \text{Var}(\hat{\theta}_2) = \frac{\theta^2}{n}. \]
Conclusion:
Since \(n > 1\), \(\text{Var}(\hat{\theta}_2) < \text{Var}(\hat{\theta}_1)\).
Both are unbiased ⟹ lower variance ⇒ lower MSE.
Final Answer:
\[ \boxed{ \hat{\theta}_2 = \bar{X} \text{ is better than } \hat{\theta}_1 = n X_{(1)} } \]
1. Sufficient & Complete Statistic:
\( \sum_{i=1}^n X_i \) (equivalently \( \bar{X} \)) is a complete sufficient statistic for \( \theta \) in this exponential family, so by the Lehmann–Scheffé theorem \( \bar{X} \) is the unique minimum-variance unbiased estimator.
2. CRLB:
\[ I(\theta) = \frac{n}{\theta^2} \Rightarrow \text{CRLB} = \frac{\theta^2}{n} \] \[ \text{Var}(\bar{X}) = \frac{\theta^2}{n} = \text{CRLB} \Rightarrow \bar{X} \text{ is efficient} \]
3. Visual Intuition:
Imagine observing only the first failure among \(n\) bulbs vs observing all failures and averaging them. Clearly, averaging gives a better estimate.
4. Practical insight:
In a life-testing experiment, stopping at the first failure throws away the information in the remaining \( n - 1 \) lifetimes; using every observation, as \( \bar{X} \) does, is what drives the variance down from \( \theta^2 \) to \( \theta^2/n \).
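A simulation along these lines (an illustrative NumPy sketch with arbitrary θ and n) makes the variance gap between \( n X_{(1)} \) and \( \bar{X} \) concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 3.0, 25, 100_000

X = rng.exponential(scale=theta, size=(reps, n))
est1 = n * X.min(axis=1)     # theta_hat_1 = n * X_(1)
est2 = X.mean(axis=1)        # theta_hat_2 = sample mean

print("Var(est1):", est1.var(), " theory theta^2     =", theta**2)
print("Var(est2):", est2.var(), " theory theta^2 / n =", theta**2 / n)
```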
Let \( X_1, X_2, \ldots, X_n \) be i.i.d. with distribution \( \mathcal{N}(\theta, \theta^2) \), where \( \theta > 0 \).
Define:
\[ S = \sqrt{ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 }, \]
the sample standard deviation.
For which value of \( c \) is \( cS \) an unbiased estimator of \( \theta \)?
We are to find \( c \) such that:
\[ \mathbb{E}[cS] = \theta. \]
So, we need:
\[ c = \frac{\theta}{\mathbb{E}[S]} \]
We know:
The sample variance:
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 \]
is an unbiased estimator of \( \theta^2 \):
\[ \Rightarrow \mathbb{E}[S^2] = \theta^2 \]
But \( S = \sqrt{S^2} \), and since the square root is strictly concave, Jensen's inequality gives
\[ \mathbb{E}[S] < \sqrt{\mathbb{E}[S^2]} = \theta \]
So we must explicitly calculate \( \mathbb{E}[S] \) when \( X_i \sim \mathcal{N}(\theta, \theta^2) \).
We know that for \( X_i \sim \mathcal{N}(\mu, \sigma^2) \), the statistic:
\[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \]
This holds for any value of \( \mu \); here \( \mu = \theta \).
Here, \( \sigma^2 = \theta^2 \), so:
\[ \frac{(n-1)S^2}{\theta^2} \sim \chi^2_{n-1} \]
Let \( Z = \frac{(n-1)S^2}{\theta^2} \sim \chi^2_{n-1} \); then
\[ S = \theta \cdot \sqrt{\frac{Z}{n-1}} \]
So:
\[ \mathbb{E}[S] = \theta \cdot \mathbb{E}\left[\sqrt{\frac{Z}{n-1}}\right] = \theta \cdot \frac{1}{\sqrt{n-1}} \cdot \mathbb{E}\left[\sqrt{Z}\right] \]
We now need the expected value of \( \sqrt{Z} \), where \( Z \sim \chi^2_{n-1} \).
For \( Z \sim \chi^2_k \), there’s a known result:
\[ \mathbb{E}[\sqrt{Z}] = \sqrt{2} \cdot \frac{\Gamma\left(\frac{k+1}{2}\right)}{\Gamma\left(\frac{k}{2}\right)} \]
Apply this to \( Z \sim \chi^2_{n-1} \):
\[ \mathbb{E}[S] = \theta \cdot \frac{1}{\sqrt{n-1}} \cdot \mathbb{E}[\sqrt{Z}] = \theta \cdot \frac{\sqrt{2}}{\sqrt{n-1}} \cdot \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} \]
We want:
\[ \mathbb{E}[cS] = \theta \Rightarrow c \cdot \mathbb{E}[S] = \theta \Rightarrow c = \frac{\theta}{\mathbb{E}[S]} \]
Substitute:
\[ \mathbb{E}[S] = \theta \cdot \frac{\sqrt{2}}{\sqrt{n-1}} \cdot \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} \]
Cancelling \( \theta \), we get:
\[ \boxed{ c = \frac{\sqrt{n - 1}}{\sqrt{2}} \cdot \frac{\Gamma\left( \frac{n - 1}{2} \right)}{\Gamma\left( \frac{n}{2} \right)} } \quad \text{which makes } cS \text{ an unbiased estimator of } \theta. \]
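The gamma-function constant is easy to evaluate and check by simulation; the sketch below (a NumPy/SciPy illustration with arbitrary θ and n) computes \( c \) on the log scale and verifies that \( \mathbb{E}[cS] \approx \theta \) for \( X_i \sim \mathcal{N}(\theta, \theta^2) \).

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 10, 100_000

# c = sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2), evaluated via log-gamma
c = np.sqrt((n - 1) / 2) * np.exp(gammaln((n - 1) / 2) - gammaln(n / 2))

X = rng.normal(loc=theta, scale=theta, size=(reps, n))
S = X.std(axis=1, ddof=1)    # sample standard deviation (divisor n - 1)

print("E[S]  ≈", S.mean(), " (falls below theta =", theta, ")")
print("E[cS] ≈", (c * S).mean(), " (close to theta)")
```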
Suppose, in the above problem, that \( c \) is chosen as the value that makes \( cS \) an unbiased estimator of \( \theta \). Then
\[ \hat{\theta}_\lambda = \lambda \bar{X} + (1 - \lambda)\, cS \]
is also an unbiased estimator of \(\theta\) for every value of \(\lambda \in [0, 1]\).
For which value of \(\lambda\) in this interval does the unbiased estimator have the minimum variance?
We are given \( X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\theta, \theta^2) \) and the family \( \hat{\theta}_\lambda = \lambda \bar{X} + (1 - \lambda)\, cS \), \( \lambda \in [0, 1] \).
Both \(\bar{X}\) and \(cS\) are unbiased estimators of \(\theta\). So:
\[ \mathbb{E}[\hat{\theta}_\lambda] = \lambda \theta + (1 - \lambda)\theta = \theta \quad \text{for every } \lambda. \]
Since \(\bar{X}\) and \(S\) are independent (for normal samples \( \bar{X} \) and \( S^2 \) are independent):
\[ \text{Var}(\hat{\theta}_\lambda) = \lambda^2 \, \text{Var}(\bar{X}) + (1 - \lambda)^2 \, \text{Var}(cS). \]
We know:
\[ \text{Var}(\bar{X}) = \frac{\theta^2}{n}, \qquad \text{Var}(cS) = c^2 \, \text{Var}(S). \]
So, writing \( b_n = \mathbb{E}[S]/\theta = 1/c \) and using the standard approximation \( \text{Var}(S) \approx \frac{\theta^2}{2(n-1)} \):
\[ \text{Var}(cS) \approx \frac{\theta^2}{b_n^2 \cdot 2(n-1)}. \]
Putting it all together:
\[ \text{Var}(\hat{\theta}_\lambda) \approx \theta^2 \left[ A \lambda^2 + B (1 - \lambda)^2 \right] \]
where: \(A = \frac{1}{n}, \quad B = \frac{1}{b_n^2 \cdot 2(n - 1)}\)
Then:
\[ f(\lambda) = A \lambda^2 + B (1 - \lambda)^2. \]
Minimize by setting the derivative to 0:
\[ f'(\lambda) = 2A\lambda - 2B(1 - \lambda) = 0 \quad \Rightarrow \quad \boxed{ \lambda^* = \frac{B}{A + B} } \]
Picture a slider between \(\bar{X}\) and \(cS\). Moving it shifts weight from one to another. The optimal point balances their variance contributions.
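Rather than trusting the closed-form \( A \) and \( B \) (which use an approximation for \( \text{Var}(S) \)), one can estimate both variances by simulation and locate the optimal weight directly; a NumPy/SciPy sketch with arbitrary θ and n:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(5)
theta, n, reps = 2.0, 10, 200_000

c = np.sqrt((n - 1) / 2) * np.exp(gammaln((n - 1) / 2) - gammaln(n / 2))

X = rng.normal(loc=theta, scale=theta, size=(reps, n))
xbar = X.mean(axis=1)
cS = c * X.std(axis=1, ddof=1)

A, B = xbar.var(), cS.var()          # Monte Carlo Var(Xbar) and Var(cS)
lam = B / (A + B)                    # minimizer of lam^2 * A + (1 - lam)^2 * B
combined = lam * xbar + (1 - lam) * cS

print("lambda* ≈", lam)
print("Var(Xbar), Var(cS), Var(combined):", A, B, combined.var())
```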
Let \( X_1, X_2, \ldots, X_n \) be a random sample from a population with mean \( \mu \) and variance \( \sigma^2 \). (The distribution is not specified.) Show that the statistic
\[ \sum_{i=1}^{n} a_i X_i \]
is an unbiased estimator of \( \mu \) if and only if
\[ \sum_{i=1}^{n} a_i = 1. \]
Let’s denote the statistic:
\[ T = \sum_{i=1}^{n} a_i X_i \]
We want to find the condition under which \( T \) is unbiased for \( \mu \). That is:
\[ \mathbb{E}[T] = \mu \]
\[ \mathbb{E}[T] = \mathbb{E}\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i \mathbb{E}[X_i] \]
Since \( X_i \) are i.i.d. with mean \( \mu \),
\[ \mathbb{E}[X_i] = \mu \quad \text{for all } i \]
So:
\[ \mathbb{E}[T] = \sum_{i=1}^n a_i \mu = \mu \sum_{i=1}^n a_i \]
We want \( \mathbb{E}[T] = \mu \) to hold for every value of \( \mu \), so:
\[ \mu \sum_{i=1}^n a_i = \mu \quad \text{for all } \mu \]
Taking any \( \mu \ne 0 \) and dividing both sides by \( \mu \):
\[ \sum_{i=1}^n a_i = 1 \]
Conversely, if \( \sum_{i=1}^n a_i = 1 \), then \( \mathbb{E}[T] = \mu \) for every \( \mu \). Hence, the condition is necessary and sufficient.
Think of each \( X_i \) as a “vote” on the value of \( \mu \), and \( a_i \) as the “weight” of that vote. To get an unbiased estimate, the total weight has to be 1 — like averaging with possibly unequal weights. If you give too little or too much total weight, the final value will be systematically too low or too high, respectively.
\[ \mathrm{Var}(T) = \sigma^2 \sum a_i^2 \]
So you can minimize this (i.e., make the estimator most efficient) subject to \( \sum a_i = 1 \).
In the above problem, consider all unbiased estimators of \( \mu \) which are of the form
\[ \sum_{i=1}^n a_i X_i. \]
Which one of these estimators has the minimum variance? What is the minimum variance?
We are given \( X_1, \ldots, X_n \) i.i.d. with mean \( \mu \) and variance \( \sigma^2 \), and the class of estimators \( \hat{\mu}_a = \sum_{i=1}^n a_i X_i \) with \( \sum_{i=1}^n a_i = 1 \) (so that each \( \hat{\mu}_a \) is unbiased).
We are to minimize the variance of such an unbiased estimator.
Since the \( X_i \)'s are uncorrelated (i.i.d.),
\[ \operatorname{Var}(\hat{\mu}_a) = \operatorname{Var}\left(\sum_{i=1}^n a_i X_i \right) = \sum_{i=1}^n a_i^2 \operatorname{Var}(X_i) = \sigma^2 \sum_{i=1}^n a_i^2 \]
So, minimizing the variance of the estimator \( \hat{\mu}_a \) reduces to minimizing
\[ \sum_{i=1}^n a_i^2 \]
subject to the constraint \( \sum_{i=1}^n a_i = 1 \).
Let us minimize \( L = \sum_{i=1}^n a_i^2 - \lambda \left( \sum_{i=1}^n a_i - 1 \right) \)
Taking derivative w.r.t. \( a_i \):
\[ \frac{\partial L}{\partial a_i} = 2a_i - \lambda = 0 \Rightarrow a_i = \frac{\lambda}{2}, \quad \text{for all } i \]
Apply the constraint:
\[ \sum_{i=1}^n a_i = n \cdot \frac{\lambda}{2} = 1 \Rightarrow \lambda = \frac{2}{n} \Rightarrow a_i = \frac{1}{n} \quad \text{for all } i \]
Hence the minimum-variance estimator in this class is the sample mean:
\[ \boxed{\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i} \]
with variance
\[ \operatorname{Var}(\bar{X}) = \sigma^2 \sum_{i=1}^n a_i^2 = \sigma^2 \cdot \sum_{i=1}^n \left(\frac{1}{n}\right)^2 = \sigma^2 \cdot \frac{n}{n^2} = \frac{\sigma^2}{n} \]
Think of all unbiased estimators as lying on a plane defined by \( \sum a_i = 1 \). Among them, the one that "spreads" weight equally (i.e., \( a_i = 1/n \)) is the most "stable" because it averages out randomness optimally. Any deviation from equal weights increases sensitivity and thus variance.
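To see the effect of unequal weights, one can compare the equal-weight estimator with any other weight vector summing to 1; a NumPy sketch with arbitrary μ, σ, and weights:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n, reps = 1.0, 2.0, 5, 200_000

X = rng.normal(loc=mu, scale=sigma, size=(reps, n))

equal = np.full(n, 1 / n)                        # a_i = 1/n
unequal = np.array([0.4, 0.3, 0.15, 0.1, 0.05])  # also sums to 1

for a in (equal, unequal):
    est = X @ a
    # both are unbiased (mean near mu); equal weights give the smaller variance
    print(f"sum(a)={a.sum():.2f}  mean≈{est.mean():.3f}  "
          f"var≈{est.var():.4f}  theory={sigma**2 * (a**2).sum():.4f}")
```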
Let \( X_1, X_2, \ldots, X_n \) be i.i.d. Bernoulli(\( p \)). Show that the variance of the sample mean \( \bar{X} \) attains the Cramér–Rao Lower Bound (CRLB), and hence is the best unbiased estimator of \( p \).
We are asked to compute \( \operatorname{Var}(\bar{X}) \), compute the CRLB for unbiased estimators of \( p \), and verify that the two coincide. Since \( \mathbb{E}[\bar{X}] = p \), the sample mean is unbiased, and
\[ \operatorname{Var}(\bar{X}) = \operatorname{Var}\left( \frac{1}{n} \sum_{i=1}^n X_i \right) = \frac{1}{n^2} \sum_{i=1}^n \operatorname{Var}(X_i) = \frac{1}{n^2} \cdot n \cdot p(1-p) = \frac{p(1-p)}{n} \]
We use the CRLB formula for i.i.d. samples:
\[ \text{CRLB} = \frac{1}{n \cdot I(p)} \]
where \( I(p) \) is the Fisher Information for a single observation \( X_i \).
For a Bernoulli distribution:
\[ f(x|p) = p^x(1-p)^{1-x}, \quad x \in \{0, 1\} \]
Compute the log-likelihood:
\[ \log f(x|p) = x \log p + (1-x) \log(1-p) \]
Now compute the score:
\[ \frac{d}{dp} \log f(x|p) = \frac{x}{p} - \frac{1-x}{1-p} \]
Then the Fisher Information is:
\[ I(p) = \mathbb{E} \left[ \left( \frac{d}{dp} \log f(x|p) \right)^2 \right] = \mathbb{E} \left[ \left( \frac{x}{p} - \frac{1 - x}{1 - p} \right)^2 \right] \]
Evaluate this expectation: the score equals \( \frac{1}{p} \) when \( x = 1 \) (probability \( p \)) and \( -\frac{1}{1-p} \) when \( x = 0 \) (probability \( 1-p \)), so
\[ = \left( \frac{1}{p} \right)^2 \cdot p + \left( \frac{1}{1-p} \right)^2 \cdot (1 - p) = \frac{1}{p} + \frac{1}{1-p} \]
So:
\[ I(p) = \frac{1}{p(1 - p)} \Rightarrow \text{CRLB} = \frac{1}{n \cdot \frac{1}{p(1 - p)}} = \frac{p(1 - p)}{n} \]
We found earlier:
\[ \operatorname{Var}(\bar{X}) = \frac{p(1-p)}{n} = \text{CRLB} \]
✔️ So the sample mean achieves the Cramér–Rao Lower Bound.
Since \( \bar{X} \) is unbiased for \( p \) and achieves the CRLB, it is the Minimum Variance Unbiased Estimator (MVUE) for \( p \). That is, it is the best unbiased estimator of \( p \).
Imagine you’re trying to estimate the true probability \( p \) of getting heads in a biased coin using sample proportions. As you increase the number of tosses \( n \), the sample mean \( \bar{X} \) (i.e., proportion of heads) becomes more accurate.
The CRLB says: “No matter how clever your estimator is, if it’s unbiased, its variance can’t be better than this limit.” And the sample mean exactly hits that limit here.
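A short NumPy sketch (p, n, and the replication count are arbitrary) confirms that the Monte Carlo variance of the sample proportion sits at the CRLB:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 50, 200_000

X = rng.binomial(1, p, size=(reps, n))   # Bernoulli(p) samples
xbar = X.mean(axis=1)                    # sample proportion per replicate

print("Var(Xbar) ≈", xbar.var())
print("CRLB      =", p * (1 - p) / n)
```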
We are given \( X_1, X_2, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2) \) and the estimator
\[ T^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2. \]
We are asked to compute \( \text{Var}(T^2) \), and compare it with the Cramér–Rao Lower Bound (CRLB) for an unbiased estimator of \( \sigma^2 \).
Recall that:
\[ \mathbb{E}[T^2] = \frac{n-1}{n} \sigma^2 \]
So, \( T^2 \) is a biased estimator of \( \sigma^2 \).
Let’s define:
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 \]
Then \( \mathbb{E}[S^2] = \sigma^2 \), so \( S^2 \) is unbiased for \( \sigma^2 \). And clearly:
\[ T^2 = \frac{n-1}{n} S^2 \]
Let us find \( \text{Var}(T^2) \) using the fact that \( S^2 \) is a scaled chi-squared variable:
\[ (n-1)\frac{S^2}{\sigma^2} \sim \chi^2_{n-1} \Rightarrow \text{Var}(S^2) = \frac{2\sigma^4}{n-1} \]
Now use \( T^2 = \frac{n-1}{n} S^2 \):
\[ \text{Var}(T^2) = \left(\frac{n-1}{n}\right)^2 \text{Var}(S^2) = \left(\frac{n-1}{n}\right)^2 \cdot \frac{2\sigma^4}{n-1} = \frac{(n-1) \cdot 2\sigma^4}{n^2} = \frac{2(n-1)\sigma^4}{n^2} \]
For \( X_1, \ldots, X_n \sim \mathcal{N}(\mu, \sigma^2) \), the CRLB for an unbiased estimator of \( \sigma^2 \) is:
\[ \text{Var}(\hat{\sigma}^2) \geq \frac{2\sigma^4}{n} \]
This is the minimum possible variance for any unbiased estimator of \( \sigma^2 \).
We found:
\[ \text{Var}(T^2) = \frac{2(n-1)\sigma^4}{n^2} \quad \text{and} \quad \text{CRLB} = \frac{2\sigma^4}{n} \]
Now compare:
\[ \frac{2(n-1)\sigma^4}{n^2} < \frac{2\sigma^4}{n} \quad \text{since } \frac{n-1}{n} < 1 \]
BUT! This comparison is misleading. CRLB applies only to unbiased estimators, and \( T^2 \) is biased. So even though \( \text{Var}(T^2) < \text{CRLB} \), this does not contradict the CRLB, because the bound doesn’t apply to biased estimators.
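The three quantities can be compared numerically; in the NumPy sketch below (μ, σ, and n are arbitrary), `ddof=0` gives \( T^2 \) and `ddof=1` gives \( S^2 \):

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sigma, n, reps = 0.0, 2.0, 10, 200_000

X = rng.normal(mu, sigma, size=(reps, n))
T2 = X.var(axis=1, ddof=0)   # (1/n)     * sum (X_i - Xbar)^2  -- biased
S2 = X.var(axis=1, ddof=1)   # (1/(n-1)) * sum (X_i - Xbar)^2  -- unbiased

print("Var(T2) ≈", T2.var(), "  theory:", 2 * (n - 1) * sigma**4 / n**2)
print("Var(S2) ≈", S2.var(), "  theory:", 2 * sigma**4 / (n - 1))
print("CRLB    =", 2 * sigma**4 / n)
```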
We are given the same setup as before: \( X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2) \) and \( T^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 \). This time we compute the mean squared error of \( T^2 \) and compare it with the CRLB.
We know that:
\[ E[T^2] = \left( \frac{n-1}{n} \right) \sigma^2 \]
Hence, \( T^2 \) is biased for \( \sigma^2 \). Its bias is:
\[ \text{Bias}(T^2) = E[T^2] - \sigma^2 = \left( \frac{n-1}{n} - 1 \right) \sigma^2 = -\frac{\sigma^2}{n} \]
Now compute the Mean Squared Error (MSE):
\[ \text{MSE}(T^2) = \text{Var}(T^2) + \left( \text{Bias}(T^2) \right)^2 \]
We also know that:
\[ \text{Var}(T^2) = \frac{2\sigma^4(n - 1)}{n^2} \quad \text{(from standard results on sample variance)} \]
So,
\[ \text{Bias}(T^2)^2 = \left( \frac{\sigma^2}{n} \right)^2 = \frac{\sigma^4}{n^2} \]
Putting it together:
\[ \text{MSE}(T^2) = \frac{2\sigma^4(n - 1)}{n^2} + \frac{\sigma^4}{n^2} = \frac{\sigma^4}{n^2} (2(n - 1) + 1) = \frac{\sigma^4}{n^2} (2n - 1) \]
The Fisher Information for \( \sigma^2 \) in the normal case is:
\[ I(\sigma^2) = \frac{n}{2\sigma^4} \]
Therefore, the CRLB for any unbiased estimator of \( \sigma^2 \) is:
\[ \text{Var}(\hat{\sigma}^2) \geq \frac{1}{I(\sigma^2)} = \frac{2\sigma^4}{n} \]
We already found:
\[ \text{MSE}(T^2) = \frac{\sigma^4}{n^2} (2n - 1) \]
Let's compare this to the CRLB:
\[ \frac{\sigma^4}{n^2} (2n - 1) \quad \text{vs.} \quad \frac{2\sigma^4}{n} \]
Rewrite the CRLB over the common denominator \( n^2 \):
\[ \frac{2\sigma^4}{n} = \frac{2\sigma^4 n}{n^2} \]
Now we compare:
\[ \frac{\sigma^4 (2n - 1)}{n^2} \quad \text{vs.} \quad \frac{2\sigma^4 n}{n^2} \]
Clearly:
\[ 2n - 1 < 2n \quad \Rightarrow \quad \text{MSE}(T^2) < \text{CRLB} \]
🚫 But this comparison is invalid because the CRLB is only for unbiased estimators, and \( T^2 \) is biased. So the CRLB doesn't directly bound the MSE of \( T^2 \). Still, it's useful to check for insight.
✅ Conclusion:
The MSE of \( T^2 \) is less than the Cramér–Rao lower bound for the variance of any unbiased estimator of \( \sigma^2 \), but this is allowed, because \( T^2 \) is not unbiased.
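The same simulation idea, now tracking mean squared error rather than variance (a NumPy sketch with arbitrary μ, σ, and n), shows the MSE of \( T^2 \) dipping below the CRLB while the unbiased \( S^2 \) stays above it:

```python
import numpy as np

rng = np.random.default_rng(9)
mu, sigma, n, reps = 0.0, 2.0, 10, 200_000

X = rng.normal(mu, sigma, size=(reps, n))
T2 = X.var(axis=1, ddof=0)   # biased estimator (divisor n)
S2 = X.var(axis=1, ddof=1)   # unbiased estimator (divisor n - 1)

def mse(est):
    return np.mean((est - sigma**2) ** 2)

print("MSE(T2) ≈", mse(T2), "  theory:", (2 * n - 1) * sigma**4 / n**2)
print("MSE(S2) ≈", mse(S2), "  theory:", 2 * sigma**4 / (n - 1))
print("CRLB    =", 2 * sigma**4 / n)
```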