## 3. The Central Limit Theorem (CLT)

We saw above that if the individual values in a data set follow a normal distribution N(μ, σ), then the distribution of sample means also has a normal distribution, but with a smaller standard deviation. What if the individual data values do not come from a normal distribution? Another important idea taken from the picture above is the Central Limit Theorem (CLT), which states that as the sample size n increases, the sampling distribution of the sample mean x̄ becomes approximately normal. Therefore, even if the individual data values come from a continuous distribution that is skewed, by averaging enough values from a sample, the distribution of sample means will become approximately normal.
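The CLT is easy to see in a quick simulation. The sketch below (not part of the original example; the exponential distribution is just a convenient skewed choice) draws many samples of size n = 50 from a skewed distribution and averages each one: the sample means cluster tightly around the population mean, with spread close to σ/√n.

```python
import random
import statistics

random.seed(0)

n = 50              # size of each sample
num_samples = 10_000  # how many samples we draw

# The exponential distribution with rate 1 is strongly right-skewed,
# with population mean 1 and population standard deviation 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# The sample means center on the population mean (1.0)...
print(round(statistics.fmean(sample_means), 2))
# ...with standard deviation near sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.141,
# and a histogram of them would look approximately normal.
print(round(statistics.stdev(sample_means), 3))
```

Even though each individual value comes from a skewed distribution, averaging 50 of them produces a nearly bell-shaped sampling distribution.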

Example (from Fundamentals of Statistics, by Sullivan)

This problem deals with the non-preventable contamination of food with certain particles. The Food and Drug Administration (FDA) sets acceptable levels of foreign substances that end up in our food and drink. For example, the acceptable level for insect fragments in peanut butter is 3 fragments per 10 grams. Suppose a random sample of n = 50 ten-gram portions of peanut butter is collected and it is found that x̄ = 3.6 for the 50 samples.

1. We can be sure that the sampling distribution of the sample mean x̄ will be approximately normal because the sample size n = 50 is rather large. See the CLT for more information.
2. If we know that the mean and standard deviation of the individual data values are μ = 3 and σ = √3, then the mean and standard deviation of the sample mean x̄ are μ = 3 and σ/√n = √3/√50 ≈ 0.245. Thus, the sampling distribution of sample means is approximately normally distributed according to N(3, 0.245).
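The standard error arithmetic in part (2) can be checked in a couple of lines; the values below are the ones given in the example (μ = 3, σ = √3, n = 50).

```python
import math

mu = 3.0             # population mean (fragments per 10 grams)
sigma = math.sqrt(3)  # population standard deviation
n = 50               # sample size

# Standard deviation of the sample mean (standard error): sigma / sqrt(n)
se = sigma / math.sqrt(n)
print(round(se, 3))  # → 0.245
```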
3. Since our sample of size n = 50 results in a sample mean 3.6, we want to know the probability of seeing such a sample mean value or larger; in symbols:

P(x̄ ≥ 3.6)

First we need to sketch a normal curve with the mean and standard deviation values from part (2). Notice how tiny the area in the upper (right) tail is. To compute this probability, we first need to convert the value 3.6 into a standard normal value using the Z-transformation:

z = (x̄ − μ)/(σ/√n) = (3.6 − 3)/0.245 ≈ 2.45

Remember, this z-value tells us that our sample mean result from the 50 samples is a value that falls 2.45 standard deviations ABOVE the intended target mean of μ = 3.

Using a table of standard normal values, we find that z = 2.45 corresponds to an area (cumulative probability) of 0.9929. Does this seem correct? No! Since the area we want is to the right of the z-score, we need to subtract 0.9929 from 1. Therefore:

P(x̄ ≥ 3.6) = 1 − 0.9929 = 0.0071 = 0.71%
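The same tail probability can be reproduced without a Z-table. This sketch uses the standard normal CDF written via `math.erfc` (the identity P(Z ≥ z) = ½·erfc(z/√2)), which is not how the text does it but gives the same answer up to rounding.

```python
import math

mu, n = 3.0, 50
sigma = math.sqrt(3)
x_bar = 3.6

se = sigma / math.sqrt(n)    # ≈ 0.245
z = (x_bar - mu) / se        # ≈ 2.45

# Upper-tail probability P(Z >= z) from the standard normal distribution
p_upper = 0.5 * math.erfc(z / math.sqrt(2))
print(round(z, 2), round(p_upper, 4))  # z ≈ 2.45, p ≈ 0.007
```

The table-based answer (0.0071) differs only because the table rounds z to two decimal places.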

In other words, only 71 times out of 10,000 would we expect a sample of n = 50 ten-gram portions of peanut butter to result in a sample average of more than 3.6 insect fragments. This is very unusual. In fact, it is so unusual that we would never expect it to occur by chance. In other words, something could be very wrong at the plant responsible for packaging the peanut butter!

Notice how the work in part (3) differs from the work we did in sub-competency 3. At that point we wanted to know the probability of finding one individual data value that satisfied some condition. In part (3) we are asking about the probability of taking a random sample of size n = 50 and having the average of the sample be larger than a specified value. Which do you think is harder to find: one value that satisfies a condition, or an average of a bunch of values that satisfies the condition?