4. A Hypothesis Test Regarding Two Population Proportions

Skipping most of the details, the null hypothesis is the assumed condition that the proportions from both populations are equal, H0: p1 = p2, and the alternative hypothesis is one of three conditions of non-equality: p1 ≠ p2, p1 < p2, or p1 > p2.

When calculating the test statistic z0 (notice we use the standard normal distribution), we assume that the two population proportions are the same, p1 = p2 = p. If Population 1 and Population 2 have the same proportion of the characteristic of interest, they can be treated as the “same” population, so the two samples can be combined into one larger sample to estimate p. (Think about this a bit.) We define p̂ to be the pooled proportion:

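$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

Here x1 and x2 are the numbers of individuals in each sample who have the characteristic of interest, and n1 and n2 are the sample sizes.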

Substituting p̂ for both p1 and p2 in the expression for the standard deviation of the difference between the sample proportions gives:

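$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}} = \sqrt{\hat{p}(1-\hat{p})}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$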

The formula for the test statistic z0 becomes:

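$$z_0 = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$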

The term p1 − p2 in the numerator disappears because we are assuming that p1 = p2, so p1 − p2 = 0.

All other steps for the hypothesis test remain the same as discussed in sub-competency 9.
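The pooled test statistic described above can be sketched in a few lines of Python. This is only an illustration: the function name `pooled_z_statistic` and the counts passed to it are hypothetical and do not come from the text.

```python
from math import sqrt


def pooled_z_statistic(x1, n1, x2, n2):
    """Return (pooled proportion, standard error, z0) under H0: p1 = p2.

    x1, x2 are the counts with the characteristic; n1, n2 are the sample sizes.
    """
    p1_hat = x1 / n1                      # sample proportion, sample 1
    p2_hat = x2 / n2                      # sample proportion, sample 2
    p_pooled = (x1 + x2) / (n1 + n2)      # pooled proportion
    # Standard error of (p1_hat - p2_hat), assuming p1 = p2
    se = sqrt(p_pooled * (1 - p_pooled)) * sqrt(1 / n1 + 1 / n2)
    z0 = (p1_hat - p2_hat) / se           # the (p1 - p2) term is 0 under H0
    return p_pooled, se, z0


# Hypothetical counts, chosen only for illustration
p_pooled, se, z0 = pooled_z_statistic(45, 100, 30, 100)
print(p_pooled, round(se, 4), round(z0, 4))
```

Note that the pooled proportion appears only in the standard error; the numerator still uses the two separate sample proportions.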

Example

A nutritionist claims that the proportion of individuals who have at most an eighth-grade education and consume more than the USDA’s recommended daily allowance of 300 mg of cholesterol is higher than the proportion of individuals who have at least some college and consume too much cholesterol. In interviews with 320 individuals who have at most an eighth-grade education, she determined that 114 of them consumed too much cholesterol. In interviews with 350 individuals with at least some college, she determined that 112 of them consumed too much cholesterol per day.

The most challenging part of using inference tools on problems dealing with two populations is keeping the information straight. Before starting any of the inference methods, take a few moments to label each population. That way you won’t get confused about which information pertains to which population. For example, Population 1 is the group of people who have at most an eighth-grade education, and p1 is the proportion of that group who consume more than the USDA’s recommended daily allowance of 300 mg of cholesterol. Population 2 is then the group of people who have at least some college education, and p2 is the proportion of that group who consume too much cholesterol.

First, let’s perform a hypothesis test on the difference in the two population proportions using a level of significance α = 0.05 (i.e., 5%). Keeping the information straight, we find:

Population 1                  Population 2
n1 = 320                      n2 = 350
p̂1 = 114/320 = 0.35625       p̂2 = 112/350 = 0.32

The null hypothesis, which is stated in Step 1, is the assumption that the two populations do not differ in terms of the characteristic of interest. We therefore need to determine the pooled proportion, p̂:

$$\hat{p} = \frac{114 + 112}{320 + 350} = \frac{226}{670} \approx 0.3373$$

Step 1: State the null and alternative hypotheses.

  • H0: p1 = p2
  • H1: p1 > p2

Notice that this is a one-tailed (right-tailed) test, since the nutritionist claims that p1 “… is higher than …” p2.

Step 2: Determine the test statistic z0.

Using the values in the table above together with the pooled proportion, we see:

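$$z_0 = \frac{0.35625 - 0.32}{\sqrt{0.3373(1 - 0.3373)}\,\sqrt{\frac{1}{320} + \frac{1}{350}}} \approx \frac{0.03625}{0.03657} \approx 0.99$$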

In other words, the two sample proportions differ by only about 1 standard deviation of their difference.

Step 3: Determine the P-value and identify the level of significance.

Using the test statistic z0 ≈ 0.99, a table of standard normal values indicates that the P-value is 0.1611. Using technology, we get the results z0 ≈ 0.9913 and P-value = 0.1608.

The level of significance was given to us as α = 0.05.
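As a check on Steps 2 and 3, here is a minimal Python sketch that recomputes z0 and the right-tailed P-value from the counts in the example. It uses `statistics.NormalDist` from the Python standard library in place of a table; the variable names are illustrative and not part of the original text.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the example
x1, n1 = 114, 320   # at most an eighth-grade education
x2, n2 = 112, 350   # at least some college
alpha = 0.05        # level of significance

p_pooled = (x1 + x2) / (n1 + n2)
se = sqrt(p_pooled * (1 - p_pooled)) * sqrt(1 / n1 + 1 / n2)
z0 = (x1 / n1 - x2 / n2) / se

# Right-tailed test (H1: p1 > p2): area under the standard normal curve to the right of z0
p_value = 1 - NormalDist().cdf(z0)

# Reject H0 only when the P-value is below the level of significance
reject_h0 = p_value < alpha

print(round(z0, 4), round(p_value, 4), reject_h0)
```

The values should agree with the technology results quoted above (z0 ≈ 0.9913 and P-value ≈ 0.1608); the table value 0.1611 differs slightly because z0 is rounded to 0.99 before the lookup.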

Step 4: Make appropriate conclusions.

Because our P-value is far greater than the level of significance, α = 0.05, the conclusion of our hypothesis test is: Do Not Reject the Null Hypothesis H0: p1 = p2. In other words, the data do not provide significant evidence that the proportion of people who consume too much cholesterol is higher among those with at most an eighth-grade education than among those with at least some college.

A Confidence Interval Approach

For fun, let’s continue with this example but use a 95% confidence interval about the difference between the two population proportions, p1 − p2. Using the information in the table above, we compute:

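$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 0.03625 \pm 1.96\sqrt{\frac{0.35625(0.64375)}{320} + \frac{0.32(0.68)}{350}} \approx 0.03625 \pm 0.0717$$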

In interval notation, our 95% confidence interval is:

(−0.0355, 0.1080)

Since the value 0 is contained within this interval, the data are consistent with there being no difference between the two population proportions. This agrees with the conclusion of our hypothesis test.
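The interval can be reproduced with a short Python sketch. It assumes the usual large-sample interval for p1 − p2, in which the standard error uses the two sample proportions separately rather than the pooled proportion; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the example
x1, n1 = 114, 320
x2, n2 = 112, 350
confidence = 0.95

p1_hat = x1 / n1
p2_hat = x2 / n2

# Critical value z_(alpha/2); about 1.96 for 95% confidence
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Unpooled standard error: no assumption that p1 = p2 is made here
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

margin = z_crit * se
lower = (p1_hat - p2_hat) - margin
upper = (p1_hat - p2_hat) + margin

print(round(lower, 4), round(upper, 4))
print("0 inside interval:", lower < 0 < upper)
```

Unlike the hypothesis test, the confidence interval does not pool the two samples, because it does not start from the assumption that the population proportions are equal.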