Mathsy Coding

Unbiased Estimators for Mean and Variance

Unbiased estimators for mean and variance, along with proofs
Edits:
  • : Changing image and file paths

When we are presented with data, we often want to try to get some sort of grasp on how it is shaped. To this end, we would like to be able to estimate the mean and variance (or standard deviation). Let’s dive in to these estimators, and look at proofs that they are unbiased!

Estimators

When we are examining a population, we are often interested in parameters - that is, things that are true about the entire population. For instance, we might care about the average length of a population of snakes or the variance in the lifespan of a population of mice. At least from a frequentist perspectives, both of these are actual quantities which we could, with unlimited time and energy, determine. However, instead of the labourious process of finding population parameters, we use an estimator to get close. An estimator is something which we can calculate as a proxy for the actual parameter of interest.

Of course, we would like our estimators to eventually get close to the actual parameter. Ideally, we would like an unbiased estimator. Like the name suggests, an estimator θ^\hat{\theta} of the parameter θ\theta is unbiased if E[θ^]=θE[\hat{\theta}] = \theta . Remember also that the expected value of a random variable XX is E[X]=i=1nxipiE[X] = \sum_{i=1}^n x_i p_i, where xix_i are the outcomes (values which XX could take on) and pip_i is the probability that the outcome xix_i occurs. Of course, if the distribution from which XX is drawn is continuous, then it would be an integral instead, but for the purpose of today’s proofs we’ll focus on discrete distributions.

An Estimator for the Mean

Assume that you have drawn a sample X1X_1, X2X_2, \dots XnX_n, all independent, from a distribution with mean μ\mu and variance σ2\sigma^2. Then an unbiased estimator of the mean is the sample mean X=1ni=1nXi\overline{X} = \frac{1}{n} \sum_{i=1}^n X_i.

Proof:

E[X]=E[i=1nXin]=1ni=1nE[Xi]=1ni=1nμ=1nnμ=μ\begin{align*} E[\overline{X}] &= E\left[ \frac{ \sum_{i=1}^n X_i }{n} \right] \\ &= \frac{1}{n} \sum_{i=1}^n E[X_i] \\ &= \frac{1}{n}\sum_{i=1}^n \mu \\ &= \frac{1}{n} \ast n\mu \\ &= \mu \end{align*}

And so we have it.

An Estimator for the Variance

Finding an unbiased estimator for the variance is substantially more complicated than for the mean. First, a reminder: the variance of a random variable XX is Var(X)=1ni=1npi(xiμ)2=E[(xiμ)2]\text{Var}(X) = \frac{1}{n} \sum_{i=1}^n p_i (x_i - \mu)^2 = E[(x_i - \mu)^2], where again the xix_i are the possible values which XX can take on. Naively, we would imagine that with our same sample X1, X2, , XnX_1,\ X_2,\ \dots,\ X_n drawn from a distribution with mean μ\mu and variance σ2\sigma^2, an unbiased estimator would simply be 1ni=1n(XiX)2\frac{1}{n} \sum_{i=1}^n (X_i - \overline{X})^2. However, it turns out that is not unbiased! The correct (or at least a correct) unbiased estimator is called the sample variance, s2=1n1i=1n(XiX)2s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X})^2. The proof that this is unbiased relies on a few other facts - let’s derive these!

Var(X)=E[X2]E2[X]\begin{equation} \text{Var}(X) = E[X^2] - E^2[X] \tag{1} \end{equation}

Proof:

Var(X)=E[(Xμ)2]=E[X22Xμ+μ2]=E[X2]2μE[X]+E[μ2]=E[X2]2μ2+μ2=E[X2]μ2=E[X2]E2[X]\begin{align*} \text{Var}(X) &= E[(X - \mu)^2] \\ &= E[X^2 - 2X\mu + \mu^2] \\ &= E[X^2] - 2\mu E[X] + E[\mu^2] \\ &= E[X^2] - 2\mu^2 + \mu^2 \\ &= E[X^2] - \mu^2 \\ &= E[X^2] - E^2 [X] \\ \end{align*}

We will also need the following:

E[X2]=1nE[X2]+n1nE2[X]\begin{equation} E\left[\overline{X}^2\right] = \frac{1}{n}E[X^2] + \frac{n-1}{n} E^2[X] \tag{2} \end{equation}

Proof

E[X2]=E[(i=1nXin)2]=1n2E[(i=1nXi)2]=1n2E[i=1nj=1nXiXj]=1n2E[i=1nXi2+ijXiXj]=1n2[i=1nE[Xi2]+ijE[Xi]E[Xj]]=1n2[i=1nE[X2]+ijE[X]E[X]]Because they are all identically distributed and independent=1n2[nE[X2]+(n2n)E2[X]]=1nE[X2]+n1nE2[X]\begin{align*} E[\overline{X}^2] &= E\left[ \left( \frac{\sum_{i=1}^n X_i}{n} \right)^2 \right] \\ &= \frac{1}{n^2} E\left[ \left( \sum_{i=1}^n X_i \right)^2 \right] \\ &= \frac{1}{n^2} E\left[ \sum_{i=1}^n \sum_{j=1}^n X_i X_j \right] \\ &= \frac{1}{n^2} E\left[ \sum_{i = 1}^n X_i ^2 + \sum_{i \neq j} X_i X_j \right] \\ &= \frac{1}{n^2} \left[ \sum_{i=1}^n E[X_i ^2] + \sum_{i \neq j} E[X_i] E[X_j] \right] \\ &= \frac{1}{n^2} \left[ \sum_{i=1}^n E[X^2] + \sum_{i \neq j} E[X] E[X] \right] & \text{Because they are all identically distributed and independent}\\ &= \frac{1}{n^2} \left[ nE[X^2] + (n^2 - n)E^2[X] \right] \\ &= \frac{1}{n} E[X^2] + \frac{n-1}{n} E^2[X] \end{align*}

Now that we have those two fact all settled, we can prove that the sample variance is an unbiased estimator of the variance! That means that the expected value of the estimator s2s^2 should be the target parameter, σ2\sigma^2.

So, assume that you have some distribution with mean μ\mu and variance σ2\sigma^2, and that we draw nn observations X1, X2,, XnX_1,\ X_2, \dots,\ X_n, all independent, from this distribution, and from it we calculate the sample variance s2=1n1i=1n(XiX)2s^2 = \frac{1}{n-1}\sum_{i=1}^n \left( X_i - \overline{X}\right)^2. Then:

E[s2]=E[1n1i=1n(XiX)2]=1n1E[i=1n(Xi22XXi+X2)]=1n1E[i=1nXi22Xi=1nXi+i=1nX2)]=1n1E[i=1nXi22XnX+nX2]=1n1E[i=1nXi2nX2]=1n1(E[i=1nXi2]nE[X2])=1n1(i=1nE[Xi2]nE[X2])=1n1(nE[X2]nE[X2])Since the expectation is the same for all Xi=1n1(nE[X2]n(1nE[X2]+n1nE2[X]))By (2)=1n1(nE[X2]E[X2](n1)E2[X])=1n1((n1)E[X2](n1)E2[X])=E[X2]E2[X]=Var(X)By (1)=σ2\begin{align*} E[s^2] &= E\left[ \frac{1}{n-1} \sum_{i=1}^n \left(X_i - \overline{X} \right)^2 \right] \\ &= \frac{1}{n-1} E\left[ \sum_{i=1}^n (X_i^2 -2\overline{X} X_i + \overline{X}^2) \right] \\ &= \frac{1}{n-1} E\left[ \sum_{i=1}^n X_i^2 -2\overline{X} \sum_{i=1}^n X_i + \sum_{i=1}^n \overline{X}^2) \right] \\ &= \frac{1}{n-1} E\left[ \sum_{i=1}^n X_i^2 -2\overline{X} * n\overline{X} + n \overline{X}^2 \right] \\ &= \frac{1}{n-1} E\left[ \sum_{i=1}^n X_i^2 - n\overline{X}^2 \right] \\ &= \frac{1}{n-1} \left( E\left[ \sum_{i=1}^n X_i^2 \right] - n E \left[\overline{X}^2 \right] \right) \\ &= \frac{1}{n-1} \left( \sum_{i=1}^n E[X_i^2] - n E \left[\overline{X}^2 \right] \right) \\ &= \frac{1}{n-1} \left( n E[X^2] - n E \left[\overline{X}^2 \right] \right) && \text{Since the expectation is the same for all $X_i$}\\ &= \frac{1}{n-1} \left( n E[X^2] - n\left( \frac{1}{n}E[X^2] + \frac{n-1}{n} E^2[X] \right) \right) && \text{By (2)}\\ &= \frac{1}{n-1} \left( n E[X^2] - E[X^2] - ( n-1 ) E^2[X] \right) \\ &= \frac{1}{n-1} \left( ( n - 1 ) E[X^2] - ( n-1 ) E^2[X] \right) \\ &= E[X^2] - E^2[X] \\ &= \text{Var}(X) && \text{By (1)} \\ &= \sigma^2 \end{align*}

And there we have it! The sample variance is an unbiased estimator for the variance.

Implementations for the Sample Variance: JavaScript, Python, R

Now that we’ve looked at the mathematics behind this, let’s look at how we can write a function for the sample variance in JavaScript, Python, and R:

In JavaScript

function calculateSampleVariance(numbers) {
	const n = numbers.length;
	const xBar = numbers.reduce((sum, current) => sum + current, 0) / n;
	const sampleVariance =
		numbers.reduce((sum, current) => sum + (current - xBar) ** 2, 0) /
		(n - 1);
	return sampleVariance;
}

console.log(
	`Sample variance of [1, 1, 1, 1, 1]: ${calculateSampleVariance([
		1, 1, 1, 1, 1,
	])}`
);
console.log(
	`Sample variance of [1, 2, 3, 4, 5]: ${calculateSampleVariance([
		1, 2, 3, 4, 5,
	])}`
);

Output:

Sample variance of [1, 1, 1, 1, 1]: 0
Sample variance of [1, 2, 3, 4, 5]: 2.5

In Python

def calculate_sample_variance(numbers: list) -> float:
    """ Calculate the sample variance """
    n = len(numbers)
    x_bar = sum(numbers) / n
    sample_variance = sum(map(lambda x: (x - x_bar) ** 2, numbers)) / (n - 1)
    return sample_variance


print(
    f"Sample variance of the set [1, 1, 1, 1, 1]: {calculate_sample_variance([1, 1, 1, 1, 1])}"
)
print(
    f"Sample variance of the set [1, 2, 3, 4, 5]: {calculate_sample_variance([1, 2, 3, 4, 5])}"
)

Output:

Sample variance of the set [1, 1, 1, 1, 1]: 0.0
Sample variance of the set [1, 2, 3, 4, 5]: 2.5

In R

calculateSampleVariance <- function(numbers) {
	n <- length(numbers)
	xBar <- sum(numbers) / n
	sampleVariance <- sum( ( numbers - xBar ) ** 2 ) / (n - 1)
	return(sampleVariance)
}

sprintf("Sample variance of [1, 1, 1, 1, 1]: %.1f", calculateSampleVariance(c(1, 1, 1, 1, 1)))
sprintf("Sample variance of [1, 2, 3, 4, 5]: %.1f", calculateSampleVariance(c(1, 2, 3, 4, 5)))

Output:

[1] "Sample variance of [1, 1, 1, 1, 1]: 0.0"
[1] "Sample variance of [1, 2, 3, 4, 5]: 2.5"

Conclusions

Above, I’ve presented unbiased estimators for the mean μ\mu and variance σ2\sigma^2 of a distribution. With these, given independent samples from a distribution, we can provide point estimates for these parameters. However, there are some questions left to ask and things to note. For instance, we might assume that since the sample variance is an unbiased estimator of the variance, s=i=1n(XiX)n1s = \sqrt{\frac{\sum_{i=1}^n (X_i - \overline{X}) }{n-1}} would be an unbiased estimator of the standard deviation. However, this is not the case.

A more pressing issue is how much we should believe the values that we get. That is, imagine that we take a sample of size 10 and get a sample mean of m=4m=4 and sample variance s2=3s^2 = 3. Then we take a much larger sample of size 100 and get the exact same results! Obviously we expect that the larger sample will be more likely to be correct, but the question is - how much more should we believe the larger sample?

Perhaps these will be ideas for later. For now, have a great day!

Sources