The most important idea of this unit is that of the **sampling distribution of the sample mean**. To help understand this, think about the following example:

**Example**

Suppose I want to estimate the average height of all eight-year-old girls. I can proceed by randomly selecting 100 eight-year-old girls, computing the sample mean *x̄*_{1} of the 100 heights, and then using *x̄*_{1} as my estimate of the average height of all eight-year-old girls. This is an example of **using the sample mean to estimate the population mean**.

Now suppose *you* conduct your own random sample of 100 eight-year-old girls and compute the mean *x̄*_{2} of your sample. You would use *your* average value *x̄*_{2} as the estimate of the average height of all eight-year-old girls. Almost surely my sample mean *x̄*_{1} and your sample mean *x̄*_{2} will be different values since *data varies*. My random sample will consist of different girls than your sample, which will lead to different estimates of the true (yet still unknown) average height.

Since our two sample means are different, we decide to work together to conduct one last random sample of 100 eight-year-old girls. We compute the mean *x̄*_{3} of the sample. What do you think will happen? Most likely *x̄*_{3} will be different from *x̄*_{1} and *x̄*_{2}! In other words, we now have *yet another* estimate of the average height of all eight-year-old girls. In fact, each time we sample a different group of 100 girls, we will likely get a different result. This means that the sample mean *x̄* is a **random variable!**

There is only one true average height of all eight-year-old girls… but we’d need to get this info from the entire population of eight-year-old girls. Here’s the real question: How is it that we can use different values of the sample average as estimates for the one, true average height of all eight-year-old girls? The answer to this question reveals the beauty of the **sampling distribution of the sample mean**. Because the sample mean *x̄* is a random variable, the sample mean itself has a mean and a standard deviation, and thus has a probability distribution.

If we took many random samples of 100 eight-year-old girls, computed the average for each, and then created a distribution of our average values, we would see a stunning picture… our distribution of sample average values will be a normal distribution!
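This claim is easy to explore by simulation. The sketch below draws many samples of 100 values, computes each sample's mean, and inspects the resulting collection of means; the population mean of 128 cm and standard deviation of 8 cm are hypothetical values assumed only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (assumed for illustration only):
# mean height 128 cm, standard deviation 8 cm.
mu, sigma, n = 128.0, 8.0, 100

# Draw 5,000 independent samples of n = 100 heights each,
# then record the mean of every sample.
sample_means = rng.normal(mu, sigma, size=(5000, n)).mean(axis=1)

print(round(sample_means.mean(), 1))  # centered near the population mean
print(round(sample_means.std(), 2))   # far smaller spread than sigma = 8
```

A histogram of `sample_means` (for example via `matplotlib`) would show the bell shape described above, even though no single sample tells us the true mean exactly.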

### The Mean and Standard Deviation of the Sampling Distribution of the Sample Mean

Suppose the random variable *X* has a normal distribution *N*(*μ*, *σ*). We need some new notation for the mean and standard deviation of the distribution of sample means, simply to differentiate them from the mean and standard deviation of the distribution of individual values. Denote the mean of the distribution of sample means by *μ _{x̄}* and denote the standard deviation of the distribution of sample means by *σ _{x̄}*.

The following figure illustrates a surprising result about sampling distributions of sample means: it really doesn't matter what the distribution of individual values looks like. If we make a histogram of many *sample means*, the distribution will almost always be approximately normally distributed! The value *n* represents the size of each sample from the original population (for example, we used *n* = 100 in the above example of eight-year-old girls' heights).

*Image from: Nature Methods 10, 809–810 (2013); doi:10.1038/nmeth.2613. Published online 29 August 2013*

If you pay attention to the shapes of the distributions of sample means in the figure, you should notice two things:

- The distribution of sample means seems to have the same center as the distribution of individual values, and
- The variation in the distribution of sample means is *much* less than in the original distribution. (In other words, the distribution of sample means is much skinnier than the distribution of individual values.)

The first point tells us that the mean of the distribution of sample means, denoted *μ _{x̄}*, is the same as the mean of the individual values, *μ*! Therefore,

*μ _{x̄}* = *μ*.

What about the second point? The process of averaging takes many *individual* values that are spread out and reduces them to one value, namely the sample mean *x̄*. All that variation in the individual values is significantly reduced! Therefore, it makes sense that *σ _{x̄}* < *σ*. In fact, due to calculations and theory that go beyond this course, it turns out that:

*σ _{x̄}* = *σ*/√*n*,

where *n* is the size of the random sample.
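This relationship between the sample size and the spread of the sample means can be checked numerically. The sketch below compares the theoretical value *σ*/√*n* against a simulation estimate; the population values (mean 128 cm, standard deviation 8 cm) are again hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma, n = 128.0, 8.0, 100     # assumed population parameters
theory = sigma / np.sqrt(n)         # predicted std. dev. of x-bar: 8/10 = 0.8

# Estimate the same quantity by simulation:
# the standard deviation of 10,000 sample means.
means = rng.normal(mu, sigma, size=(10000, n)).mean(axis=1)

print(round(theory, 2), round(means.std(), 2))  # the two values nearly agree
```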

**Important Aside Note:** You can see how the law of large numbers operates now… as the size *n* of your sample increases, the standard deviation of the distribution of sample means decreases. (A fraction with a fixed numerator and an increasingly large denominator becomes a very small fraction.) In other words, as *n* increases, the value *σ*/√*n* gets smaller, and thus *x̄* is algebraically "forced" to fall closer and closer to the true population mean *μ*.
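The shrinking effect is easy to see by repeating the simulation at several sample sizes. This sketch (with the same hypothetical population, mean 128 cm and standard deviation 8 cm) measures the spread of the sample means as *n* grows:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 128.0, 8.0  # assumed population parameters (illustrative only)

# For each sample size, measure how spread out the sample means are.
spreads = {}
for n in (25, 100, 400):
    means = rng.normal(mu, sigma, size=(4000, n)).mean(axis=1)
    spreads[n] = means.std()
    print(n, round(spreads[n], 2))  # spread roughly halves when n quadruples
```

Since *σ*/√*n* = 8/5, 8/10, and 8/20 for these three sample sizes, the printed spreads fall near 1.6, 0.8, and 0.4.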

Putting the above two points together tells us that the distribution of sample means, *x̄*, follows the normal distribution:

*N*(*μ*, *σ*/√*n*).