3. Measures of Spread

The purpose of identifying a “central” value from a data set was to describe a typical value in the data set. Once we know this typical value, we can measure the amount of dispersion, or spread, of the data values around it. In other words, we’re going to calculate how “spread out” our data is. The three main measures of dispersion for a data set are the range, the variance, and the standard deviation.

The Range

The range of a variable is simply the “distance” between the largest data value and the smallest data value. In math symbols:

Range = largest data value – smallest data value

Consider the first exam scores for a class of 11 students, listed in the table later in this section. The largest score is 95 and the smallest is 64, so the range for this data set is:

Range = 95 – 64 = 31
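As a minimal sketch, assuming the 11 scores are the ones listed in the table of squared deviations in this section, the range can be computed in Python with the built-in `max` and `min`. The function name `data_range` is hypothetical, chosen here for illustration.

```python
# Exam scores for the 11 students (taken from the table of scores in this section)
scores = [94, 87, 95, 68, 72, 75, 88, 89, 94, 76, 64]

def data_range(values):
    """Return the range: largest data value minus smallest data value."""
    return max(values) - min(values)

print(data_range(scores))  # 95 - 64 = 31
```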

Calculating the range uses only two values: the smallest and largest data values. No other data values affect the range, but if either of these two values changes, so does the range. Therefore, the range clearly is not resistant to extreme values in the data set.
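A small Python sketch of this behavior, using the same 11 scores: changing an interior score leaves the range alone, while changing the smallest score moves it. The altered scores (80 and 30) are hypothetical values chosen only for illustration.

```python
scores = [94, 87, 95, 68, 72, 75, 88, 89, 94, 76, 64]

def data_range(values):
    """Largest data value minus smallest data value."""
    return max(values) - min(values)

# Hypothetical change to an interior value (76 -> 80): the range does not move.
interior_changed = [80 if s == 76 else s for s in scores]

# Hypothetical change to the smallest value (64 -> 30): the range jumps.
extreme_changed = [30 if s == 64 else s for s in scores]

print(data_range(scores))            # 31
print(data_range(interior_changed))  # 31
print(data_range(extreme_changed))   # 65
```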

Sample Variance

The variance of a data set is a numerical summary that indicates, roughly, the average squared deviation of each data value from the mean of the data set. Calculating the variance requires us to compare each data value in our raw list, $\{x_1, x_2, x_3, \ldots, x_{n-1}, x_n\}$, to the mean $\bar{x}$. A deviation is just a difference, computed by subtraction. In symbols, the deviation about the mean for the $i$th data value, $x_i$, is $(x_i - \bar{x})$.

Because of how the mean is defined, if you add up the deviations from the mean for all the data values, you will always get zero. In symbols, $\sum (x_i - \bar{x}) = 0$ (since $\bar{x} = \frac{1}{n}\sum x_i$, we have $\sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = 0$). This is a bit technical, but it means we cannot simply average the deviations: we’d always get zero!
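A quick numerical check of this claim, as a minimal Python sketch using the 11 exam scores from this section’s table: the mean is 82, and the deviations cancel exactly. With other data, floating-point rounding may leave a tiny nonzero remainder rather than exactly 0.

```python
scores = [94, 87, 95, 68, 72, 75, 88, 89, 94, 76, 64]

mean = sum(scores) / len(scores)          # 902 / 11 = 82.0
deviations = [x - mean for x in scores]   # x_i - x̄ for each score

print(mean)             # 82.0
print(deviations)
print(sum(deviations))  # 0.0
```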

To get around this, we need a way to make all deviations from the mean positive, regardless of whether a data value is below or above the mean. For example, if you live two miles north of a city and I live two miles south of the same city, it would be ridiculous to say, “I live negative 2 miles from the city.” We both live two miles away.

Mathematically, one way to make all deviations positive is to take the absolute value. Another way, which we’ll use for calculating both the variance and the standard deviation of a data set, is to square each deviation. For the city example, your squared deviation would be $2^2 = 4$ and mine would be $(-2)^2 = 4$. Our squared deviations are the same, regardless of whether the original deviation was positive or negative! So, to treat positive and negative differences the same way, we square the deviations: $(x_i - \bar{x})^2$.

Finally, since the variance measures the average squared deviation of the data values from the mean of the entire data set, we add up the squared deviations for all the data points and divide by the value (n – 1), one fewer than the number of data values. This is another technical “difficulty” that we’ll deal with later. The value (n – 1) is called the degrees of freedom of a data set.

The reason for this will become clearer throughout the class, but imagine the following simple scenario: You and four friends go to a Chinese restaurant, and at the end of the meal, your server brings your group 5 fortune cookies, setting them in a pile in the middle of the table. How many of your party of 5 actually get to choose their fortune? Only 4. The reason is obvious: after 4 people have chosen their fortune cookies, only one remains, so the fifth person has no choice of fortune. The degrees of freedom for this “problem” is thus 5 – 1 = 4.

Here’s another example, this time from a math standpoint: if someone tells you that they are thinking of 3 numbers whose average is 5, how many of the three numbers do you need to know before you know all 3? After a little thought, you’ll realize the answer is 2. If you are told that two of the numbers are 2 and 10, a little algebra shows that the last number, $x$, must be 3:

$$\frac{2 + 10 + x}{3} = 5 \quad\Longrightarrow\quad 2 + 10 + x = 15 \quad\Longrightarrow\quad x = 3$$
Again, the degrees of freedom for this problem is 3 – 1 = 2.
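The same reasoning can be sketched in Python: because n numbers with mean m must add up to n·m, knowing any n – 1 of them pins down the last one. The helper `last_value` is a hypothetical name used only for this illustration.

```python
def last_value(mean, known):
    """Given the mean of n numbers and n - 1 of them, return the remaining number.

    The n numbers must sum to n * mean, so the last one is fully determined.
    """
    n = len(known) + 1
    return n * mean - sum(known)

print(last_value(5, [2, 10]))  # 3
```

The same idea applies to the exam scores: knowing the mean (82) and any 10 of the 11 scores determines the 11th.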

In all its glory, the math formula for calculating the sample variance is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $n$ is the size of the sample and $\bar{x}$ is the sample mean.
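A minimal Python translation of this formula, offered as a sketch rather than a reference implementation. The function name `sample_variance` is our own; Python’s standard library function `statistics.variance` also divides by n – 1 and is used here only as a cross-check. The data 2, 10, and 3 are the three numbers with mean 5 from the degrees-of-freedom example.

```python
import statistics

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two data values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

data = [2, 10, 3]  # mean 5; deviations -3, 5, -2
print(sample_variance(data))      # (9 + 25 + 4) / 2 = 19.0
print(statistics.variance(data))  # standard-library cross-check
```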

Example: Calculations for a Sample Variance

Returning to the exam scores for the class of 11 students, the table below illustrates the (sometimes tedious!) calculations for the variance. The mean of these data is $\bar{x} = 82$.

The sum of the squared deviations is 1272. Therefore, applying the sample variance formula, the variance for the data set is:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1272}{11 - 1} = 127.2$$

Score   Deviation From Mean (Score - 82)   Squared Deviation
 94          12                                144
 87           5                                 25
 95          13                                169
 68         -14                                196
 72         -10                                100
 75          -7                                 49
 88           6                                 36
 89           7                                 49
 94          12                                144
 76          -6                                 36
 64         -18                                324
SUM                                           1272
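The table’s arithmetic can be reproduced with a short Python sketch, assuming the 11 scores listed in the table and applying the sample variance formula (dividing the sum of squared deviations by n – 1 = 10). The column widths in the printed output are arbitrary formatting choices.

```python
scores = [94, 87, 95, 68, 72, 75, 88, 89, 94, 76, 64]

n = len(scores)
mean = sum(scores) / n  # 82.0

print(f"{'Score':>5}  {'Deviation':>9}  {'Squared':>7}")
total = 0.0
for x in scores:
    dev = x - mean   # deviation from the mean
    sq = dev ** 2    # squared deviation
    total += sq
    print(f"{x:>5}  {dev:>9.0f}  {sq:>7.0f}")

variance = total / (n - 1)  # divide by the degrees of freedom, n - 1 = 10
print(f"SUM of squared deviations = {total:.0f}")  # 1272
print(f"sample variance = {variance}")             # 127.2
```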