## 1. Chi-Squared Test

In this sub-competency, we will build on the notions introduced in sub-competency 11, where we looked at comparing two groups or populations. We will also use what you learned about two-way tables in sub-competency 6. You may wish to review these sections before beginning here.

In sub-competency 11, we compared two groups of numerical data, such as two populations or two treatment groups. That approach, however, only compares quantitative variables. What if we instead wish to compare two categorical variables? To do this we need a new test and a new distribution: the chi-squared test (written χ²-test) and the chi-squared distribution.

## 2. Two-Way Tables

Recall that as we saw in sub-competency 6 we can organize two categorical variables in a two-way table of counts. Here is an example to start us down the path to analyzing this type of data.

Example: Asthma and Smoking

The table below describes the smoking habits of a group of asthma sufferers in comparison to their continent of residence.

| Location | Nonsmoker | Occasional Smoker | Regular Smoker | Heavy Smoker | Total |
|---|---|---|---|---|---|
| North America | 339 | 33 | 61 | 34 | 467 |
| South America | 377 | 132 | 184 | 136 | 829 |
| Total | 716 | 165 | 245 | 170 | 1296 |

Since there are 2 continents to consider and 4 possible smoking habits, there are 8 counts that occupy the cells of the table. Comparing these counts directly is hard, because many more people responded from South America than from North America. One way to adjust for this is to rewrite the table in terms of percentages, using the row totals.

| Location | Nonsmoker | Occasional Smoker | Regular Smoker | Heavy Smoker | Total |
|---|---|---|---|---|---|
| North America | 72.59% | 7.07% | 13.06% | 7.28% | 100% |
| South America | 45.47% | 15.92% | 22.20% | 16.41% | 100% |
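The conversion from counts to row percentages can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper name `row_percentages` are assumptions made for this sketch, not part of the lesson.

```python
# Observed counts from the asthma/smoking table (one list per continent).
observed = {
    "North America": [339, 33, 61, 34],
    "South America": [377, 132, 184, 136],
}
habits = ["Nonsmoker", "Occasional Smoker", "Regular Smoker", "Heavy Smoker"]


def row_percentages(counts):
    """Return each count as a percentage of its row total."""
    total = sum(counts)
    return [100 * c / total for c in counts]


# Conditional distribution of smoking habit, given continent of residence.
conditional = {place: row_percentages(counts) for place, counts in observed.items()}

for place, percents in conditional.items():
    cells = ", ".join(f"{h}: {p:.2f}%" for h, p in zip(habits, percents))
    print(f"{place} -> {cells}")
```

Each row is divided by its own total, so every row sums to 100% regardless of how many people responded from that continent.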

Just as a reminder, this is the conditional distribution of smoking habits, given the continent of residence. If we organize this into a graph, we can more easily compare the categorical data.

To compare the two continents, we begin with the assumption that there is no difference between the distributions of smoking habits in North and South America. This forms our null hypothesis. Testing it creates a problem of multiple comparisons, because there are four categories to compare at once. It is a mistake to look at the nonsmoker category only and declare that North America is greater just because that category contains the largest difference; in all of the other smoking categories South America is greater. Looking at the entire graph, you may be tempted to say that there is a difference, since South America is higher in three of the four categories, but you should ask whether that difference is significant. You should wonder even more because the direction of the difference is not consistent across the categories.

### Expected Counts of Two-Way Tables

Our null hypothesis is that there is no relationship between the two categorical variables in our two-way table. Acting under this assumption, we can ask what value we would expect to see in each cell if the null hypothesis were true. These values are called the expected counts, and each one is found by multiplying the cell's row total by its column total and then dividing by the table's grand total.

Expected Counts Table

| Location | Nonsmoker | Occasional Smoker | Regular Smoker | Heavy Smoker | Total |
|---|---|---|---|---|---|
| North America | 716 ⋅ 467 / 1296 = 258.00 | 165 ⋅ 467 / 1296 = 59.46 | 245 ⋅ 467 / 1296 = 88.28 | 170 ⋅ 467 / 1296 = 61.26 | 467 |
| South America | 716 ⋅ 829 / 1296 = 458.00 | 165 ⋅ 829 / 1296 = 105.54 | 245 ⋅ 829 / 1296 = 156.72 | 170 ⋅ 829 / 1296 = 108.74 | 829 |
| Total | 716 | 165 | 245 | 170 | 1296 |

The expected-count calculation works because, under the null hypothesis, each row should split across the columns in the same proportions as the table as a whole. Equivalently, you can view the column total divided by the grand total as the probability of falling in that column, and then multiply by the row total to get the expected count for that row category. This is similar to using np to find the expected value of a binomial variable.
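As a minimal sketch, the rule "row total times column total, divided by the grand total" can be written in plain Python. The function name `expected_counts` and the list-of-lists layout are assumptions made for this illustration.

```python
# Observed counts from the asthma/smoking table.
observed = [
    [339, 33, 61, 34],     # North America
    [377, 132, 184, 136],  # South America
]


def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Because the expected counts are built from the margins, each row and column of the expected table adds up to the same totals as the observed table, which is a quick way to check the arithmetic.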

## 3. The chi-Squared Test

If one thinks back to the problems that considered the difference of two proportions, the method considered a binomial variable with probability of success p, and compared it with another proportion in order to find a statistically significant difference. The χ²-test is an extension of this concept to multinomial trials, where there are k outcomes with probabilities p1, p2, … , pk. The χ²-test compares the observed outcomes with the outcomes expected when the null hypothesis H0 is assumed to be true. The use of the chi-squared test statistic is only appropriate when all of the expected counts are greater than or equal to 5!

The test statistic for the chi-squared test is a measure of how far apart the observed values are from the expected values in all cells of the two-way table. For each cell we square the difference between the observed count O and the expected count E, divide by E, and then add the results over all cells:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Returning to our asthma example, summing over the eight cells gives a test statistic of approximately 90.30. Large values of the test statistic indicate that the observed and expected values are far apart, or rather that the distributions are different. This gives us evidence to suggest that our null hypothesis is not true.

Be careful: as we will see later, the χ² distribution is not symmetric. The alternative hypothesis is many-sided, since the distributions can differ in any cell and in either direction, but the χ²-test itself is one-sided. Any violation of the null hypothesis will produce a large test statistic, while small values of χ² are not evidence against the null hypothesis.
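The test statistic described above, the sum of (observed − expected)² / expected over every cell, can be sketched as follows. The helper names are illustrative assumptions; the expected counts are rebuilt from the row and column totals so that the block runs on its own, and a small check for the "all expected counts at least 5" condition is included.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def expected_counts_large_enough(table, minimum=5):
    """Check the usual condition that every expected count is at least 5."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


asthma = [
    [339, 33, 61, 34],     # North America
    [377, 132, 184, 136],  # South America
]

print(expected_counts_large_enough(asthma))
print(round(chi_squared_statistic(asthma), 2))
```

Every term in the sum is non-negative, which is why departures from the null hypothesis in any direction can only push the statistic upward.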

### The chi-Squared Distribution

The chi-squared distribution changes shape depending on the degrees of freedom present in our data, just as was true for the Student’s t-distribution. For this reason, calculating p-values on this distribution is best done using technology or tables. Since there are two categorical variables to consider, our degrees of freedom is k = (r – 1)(c – 1), where r and c are the number of rows and columns, respectively. The graph below shows how the chi-squared distribution looks for three different values of k.

Example 2

The article “Determination of Carboxyhemoglobin Levels and Health Effect on Officers Working at the Istanbul Bosphorus Bridge” (G. Kocasoy and H. Yalin, Journal of Environmental Science and Health, 2004: 1129-1139) presents assessments of health outcomes of people working in an environment with high levels of carbon monoxide (CO). Following are the numbers of workers reporting various symptoms, categorized by work shift. Can you conclude that the proportions of workers with the various symptoms differ among shifts?

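| Ailment | Morning Shift | Evening Shift | Night Shift | Total |
|---|---|---|---|---|
| Influenza | 16 | 13 | 18 | 47 |
| Headache | 24 | 33 | 6 | 63 |
| Weakness | 11 | 16 | 5 | 32 |
| Shortness of Breath | 7 | 9 | 9 | 25 |
| Total | 58 | 71 | 38 | 167 |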

Solution: First we state the null hypothesis, “H0: there is no difference in the proportions of workers with the various symptoms between the shifts.” This is used to generate the following table of expected counts.

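Expected counts

| Ailment | Morning Shift | Evening Shift | Night Shift | Total |
|---|---|---|---|---|
| Influenza | 16.323 | 19.982 | 10.695 | 47 |
| Headache | 21.880 | 26.784 | 14.335 | 63 |
| Weakness | 11.114 | 13.605 | 7.281 | 32 |
| Shortness of Breath | 8.683 | 10.629 | 5.689 | 25 |
| Total | 58 | 71 | 38 | 167 |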

Next we calculate the χ² test statistic by summing (observed − expected)² / expected over all twelve cells, which gives 17.570. If you are familiar with Excel, you can create a table of the actual counts and a table of the expected counts to do this. Note that the command CHISQ.TEST(actual_range,expected_range) returns the p-value of the test directly rather than the test statistic. Now we calculate the degrees of freedom k = (4 – 1)(3 – 1) = 6. Then we can use the command CHIDIST(x,degrees_freedom) to calculate the p-value in Excel. For this example, CHIDIST(17.570, 6) = 0.007402. This p-value is less than 5%, so there is enough evidence to reject the null hypothesis. We conclude that there is evidence to suggest that the proportions of workers with the various symptoms differ among the shifts.

Alternatively, we could calculate a critical value from the χ² distribution with 6 degrees of freedom and compare it with our test statistic 17.570. This can be done from a table or in Excel using the CHIINV(probability,degrees_freedom) command. Please note that this gives the critical value for the upper one-tailed test. In our example, the critical value is CHIINV(0.05,6) = 12.592. The critical value is less than our test statistic, as 12.592 < 17.570. Hence our test statistic falls in the rejection region, which agrees with our earlier conclusion.
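For readers without Excel, the p-value and critical value for this example can be reproduced with a short Python sketch. It relies on an assumption worth stating loudly: the closed-form tail formula used here holds only when the degrees of freedom are an even positive integer, which is the case for k = 6 in this example. The function names are hypothetical and do not correspond to any library.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even df.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!,
    which is valid only when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even, positive df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


def chi2_critical_even_df(alpha, df, tol=1e-9):
    """Find x with P(X > x) = alpha by bisection on the decreasing tail function."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:  # widen until the tail drops below alpha
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p_value = chi2_sf_even_df(17.570, 6)        # plays the role of CHIDIST(17.570, 6)
critical = chi2_critical_even_df(0.05, 6)   # plays the role of CHIINV(0.05, 6)
print(f"p-value = {p_value:.6f}, critical value = {critical:.3f}")
```

The two approaches always agree: the test statistic exceeds the critical value exactly when the p-value falls below the chosen significance level.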

### Additional Uses of the chi-Squared Test

The chi-squared distribution can also be used in tests of significance about the variance or the standard deviation of a normal distribution. It can also be used to test the goodness of fit of a theoretical model against sample data.

### Chi-Squared Testing with the TI-83/84

All of these tests can be found by hitting the [STAT] button and arrowing over to the TESTS menu.

Calculator Example: The Chi-Squared Goodness of fit test.

If births were uniformly distributed across the week, we would expect about 1/7 of all births to occur on each day of the week. How closely do the observed numbers of births fit this expected distribution? The chi-square goodness-of-fit test is used to determine whether an observed frequency distribution is significantly different from the expected distribution, that is, how well the two distributions fit each other. If we were only interested in one day of the week, we could conduct a 1-proportion z test. However, because we have seven hypothesized proportions, we need a test that considers all of them together and gives an overall indication of whether the observed distribution differs from the expected one. The chi-square goodness-of-fit test is just what we need. Let’s consider the frequency distribution of all 2008 Wisconsin births by day of the week.

NOTE: This is not an option on all calculators; yours must have the GOF test on it.

Solution for the TI-84:

1. Enter the observed data into L1.

2. Here we are hypothesizing that births occur in equal proportions on every day of the week. Compute the expected frequencies as Expected = n/k, where n is the total number of trials (births) and k is the number of categories (days of the week). For this example E = 116823/7 = 16689 for every day, since the expected counts are all the same. Enter 16689 into every row of L2.

3. Now hit [STAT], arrow over to the TESTS menu, arrow down to D: χ² GOF-Test, and hit ENTER. Then enter the following on the screen:

4. The degrees of freedom is k – 1. Highlight Calculate and hit ENTER to get:

5. The calculator gives you both the value of the χ² test statistic and its associated P-value. CNTRB provides a list of the CoNTRiButions of each category to the overall χ² value. Use the arrow keys to scroll through these numbers. Round chi-square values to 3 decimal places and P-values to 3 significant figures. You could report these results as P(χ² > 3679.867) ≈ 0.

What does this mean?

If births were in fact distributed uniformly across the seven days of the week, a χ² value of 3679.867 or larger would occur essentially 0% of the time. This result is certainly unusual, so we reject H0 and conclude that the sample data are consistent with births being non-uniformly distributed across the seven days of the week.
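The goodness-of-fit calculation the calculator performs can be sketched in Python as follows. The function name `gof_uniform` is an assumption for this illustration, and the list `toy_counts` is made up purely to exercise the function; it is not the Wisconsin birth data, which is not reproduced here. The tail probability again uses the closed form that is valid only for even degrees of freedom (here k − 1 = 6).

```python
import math


def gof_uniform(observed):
    """Chi-squared goodness-of-fit statistic against equal expected counts.

    Returns (statistic, degrees_of_freedom, contributions); the list of
    contributions plays the role of the calculator's CNTRB output.
    """
    n = sum(observed)
    k = len(observed)
    expected = n / k  # E = n / k for every category
    contributions = [(o - expected) ** 2 / expected for o in observed]
    return sum(contributions), k - 1, contributions


# Expected count quoted in the text: 116823 births spread evenly over 7 days.
expected_per_day = 116823 / 7
print(expected_per_day)

# HYPOTHETICAL counts for illustration only; these are NOT the Wisconsin data.
toy_counts = [20, 25, 30, 25, 30, 25, 20]
statistic, df, cntrb = gof_uniform(toy_counts)
print(statistic, df, cntrb)

# Upper tail for the statistic reported in the text, with df = 6.
# Closed form for df = 6: exp(-x/2) * (1 + x/2 + (x/2)^2 / 2).
half = 3679.867 / 2
p_value = math.exp(-half) * (1 + half + half ** 2 / 2)
print(p_value)
```

With a statistic as large as 3679.867, the exponential factor underflows to zero in floating-point arithmetic, which matches the calculator's report of a P-value of approximately 0.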