Let’s quickly investigate the four main parts of any hypothesis test:

### 1. The Null and Alternative Hypotheses

In statistics, a hypothesis is a statement, or assumption, about the characteristics of one or more variables in one or more populations. Since a statement can be either true or false, there are two hypotheses to identify.

- The **null hypothesis**, denoted by *H*_{0}, is the statement about a value of a population parameter that we intend to test. Since *H*_{0} is the statement we (or someone else) believe to be true, *H*_{0} is the statement of “no difference,” and thus *always* contains the condition of equality. The conclusions of our hypothesis test will be either “reject *H*_{0}” or “do not reject *H*_{0}.” Keep in mind that we always assume the null hypothesis to be true until we get evidence from sample data that suggests otherwise.
- The **alternative hypothesis**, denoted by *H*_{α} (or *H*_{1} in other resources), must be what is true if the null hypothesis is false. There are three ways to differ from the null hypothesis: larger than, smaller than, or simply different (not equal). Therefore, the notation for *H*_{α} will always contain a condition of inequality. If *H*_{α} contains either of the inequalities > or <, we call the hypothesis test a *one-tail test*, since there is only one way to be greater than or less than the value in *H*_{0}. If *H*_{α} contains the inequality ≠, we call the hypothesis test a *two-tail test*, since being not equal to something means you could be less than or greater than the given value.

To see the difference between a one-tail test and a two-tail test, imagine that I ask you to guess a number between 1 and 10, and in addition I tell you that the number is greater than 5. What numbers will you guess? You will choose numbers in only **one direction** from 5…those greater than 5. You won’t bother guessing numbers below 5, since I specified the direction where you should start guessing numbers. Now suppose I ask you to guess a number between 1 and 10, and all I tell you is that the number is different from (i.e., not equal to) 5. Which numbers will you guess this time? You now have to choose numbers in **two directions** from 5…those less than 5 and those greater than 5.

### 2. The Test Statistic

Once we have a statement of hypothesis, the next thing that happens is we analyze sample data (collected appropriately…see sampling in subcompetency 5) using both graphical and numerical summaries (subcompetencies 1 and 2, respectively). We specifically want to use numerical summaries to compare with what is being claimed in the null hypothesis. The types of hypothesis tests you will run in this course will focus on either population means (*μ* vs. *x̄*) or population proportions (*p* vs. *p̂*).

The nice thing about computing a test statistic is that it is a computation you are already familiar with. In subcompetency 3 you were introduced to the *z*-score:

*z* = (*x* − *μ*) / *σ*

which in English terms is:

*z* = (observed value − mean) / (standard deviation)

For hypothesis testing, we’ll use the same form but slightly modify the terms: the “observed value” becomes the sample statistic (*x̄* or *p̂*), the “mean” becomes the value claimed in the null hypothesis, and the “standard deviation” becomes the standard deviation of the sample statistic. For a test about a population mean, this gives:

*z* = (*x̄* − *μ*) / (*σ* / √*n*)
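As a small sketch of this modified computation (the function name and the sample numbers below are mine, assuming a test about a population mean with known population standard deviation):

```python
from math import sqrt

def z_test_statistic(x_bar, mu_0, sigma, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (x_bar - mu_0) / (sigma / sqrt(n))

# Hypothetical numbers for illustration: sample mean 52, claimed mean 50,
# population standard deviation 8, sample size 64.
z = z_test_statistic(52, 50, 8, 64)
print(round(z, 2))  # 2.0
```

The same shape works for proportions; only the “standard deviation” piece in the denominator changes.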

### 3. Probability Values and Statistical Significance

The key to making an appropriate conclusion to a hypothesis test is to identify results that are **statistically significant**. When results observed from the sample are **unlikely** under the assumption that the null hypothesis is true, we say the result is statistically significant. If our sample results are unlikely, then we reject the null hypothesis *H*_{0}. Repeat this to yourself over and over: if we have statistically significant results, we reject *H*_{0}. In other words, if the difference between our sample result and the null hypothesis claim is large (which results in a really small probability value), then we reject *H*_{0}.

There are actually TWO ways to proceed with hypothesis testing, the “classical” method and the *P*-value (probability value) approach, with both giving the same conclusions to the test. For this class we will use the *P*-value approach in our hypothesis testing, as it is far more prevalent in scientific research. Our goal is to calculate the probability of observing a sample statistic as extreme, or more extreme, than the one observed from our sample *under the assumption that the null hypothesis is true*. Since AREA = PROBABILITY, a *P*-value represents the total area that lies outside of the region(s) defined by our test statistic(s). The main question we are trying to answer is: Could random variation alone account for the difference between the null hypothesis and our observations from a random sample? A small *P*-value implies that random variation through the sampling process alone is **not** likely to account for the observed difference. Therefore, with a small *P*-value, we reject *H*_{0} and are led to believe that the true value of the population is significantly different from what was stated in *H*_{0}. Small *P*-values are strong evidence *against* *H*_{0}.
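To make the AREA = PROBABILITY idea concrete, here is a sketch (names mine) of turning a *z* test statistic into a *P*-value; it assumes the test statistic follows the standard normal distribution under *H*_{0}, and uses the standard-library `math.erfc` to get the tail area:

```python
from math import erfc, sqrt

def p_value(z, tails=2):
    """Normal-distribution P-value for a z test statistic.
    tails=1: area beyond |z| in one direction; tails=2: both tails."""
    upper_tail = 0.5 * erfc(abs(z) / sqrt(2))  # P(Z > |z|)
    return tails * upper_tail

print(round(p_value(1.96, tails=2), 4))  # ≈ 0.05
print(round(p_value(2.0, tails=1), 4))   # ≈ 0.0228
```

Notice the two-tail *P*-value is simply double the one-tail area, matching the “two directions” idea from the guessing game above.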

Once we know the value of our test statistic and whether our hypothesis test is a one-tail or two-tail test, we determine the *P*-value. Keeping in mind the idea of statistical significance, our hypothesis test boils down to two situations:

- A small *P*-value implies that the probability of seeing our sample mean is very unlikely if the null hypothesis is true. This is considered significant evidence that the null hypothesis is *not* true. Therefore, a small *P*-value means we need to reject the null hypothesis *H*_{0}.
- A large *P*-value implies that it is not unlikely to see our sample results given the null hypothesis is true. This is considered evidence that the null hypothesis could be true, and we do not reject the null hypothesis *H*_{0}. Notice, **we never conclude that the null hypothesis is indeed true!**

We call the **level of significance** of our hypothesis test the value of alpha, *α*. Typical values for *α* are: *α* = 0.10, *α* = 0.05, or *α* = 0.01. The value of *α* chosen for a hypothesis test must be reported using language such as, “Our hypothesis test has a level of significance *α* = 0.01.”

### 4. The Conclusions of Hypothesis Testing

It is important to keep in mind that the statistical results from a hypothesis test only deal with the null hypothesis *H*_{0}. There are only two statistical conclusions to make at the end of a hypothesis test: “we reject *H*_{0}” or “we fail to reject *H*_{0}.” The statistical conclusions **never** deal with the alternative hypothesis *H*_{α}. Moreover, we never say, “we accept *H*_{0}.” It is either “we reject *H*_{0}” or “we fail to reject *H*_{0}.”

After stating the statistical conclusions, it is important to write a sentence or two about what our conclusions are in a way that a non-statistics person can understand. Usually this sentence starts out with, “Our data provides sufficient evidence that…” or, “Our data does not provide sufficient evidence that…” It will take lots of practice to become comfortable making the proper conclusions from a hypothesis test.

#### Concluding Example

Returning to the example at the beginning of our discussion, a hypothesis test would look like the following:

**Statement of Null and Alternative Hypotheses**

*H*_{0}: *μ* = 98.6°

*H*_{α}: *μ* ≠ 98.6°

**Calculate the Test Statistic**

**Compute the Probability Value (Two-Tail Test)**

Using a table of probability values, all we’d be able to say is that the probability of seeing such a test statistic, **assuming that 98.6°F is indeed the average human temperature**, is much less than 0.0005 (or 0.05%). Relying on technology to calculate the probability, we find a *P*-value of 0.00000000142, which is as close to 0 as we’d ever care to see.

**Make Conclusions**

As our *P*-value is lower than any common significance level (such as *α* = 0.05), we would state:

- “We reject the null hypothesis *H*_{0}: *μ* = 98.6°. Our sample data provides sufficient evidence that the average human body temperature is significantly different from 98.6°.”
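Putting the four parts together, here is a sketch of a two-tail test like the one above. The sample summary numbers below are hypothetical placeholders (the original sample data is not reproduced in this text), so the resulting *P*-value will not match the one quoted above:

```python
from math import erfc, sqrt

def two_tail_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    z = (x_bar - mu_0) / (sigma / sqrt(n))   # test statistic
    p = erfc(abs(z) / sqrt(2))               # two-tail normal P-value
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision

# Hypothetical sample summary: mean 98.25°F, sigma 0.73°F, n = 130.
z, p, decision = two_tail_z_test(98.25, 98.6, 0.73, 130)
print(round(z, 2), decision)  # -5.47 reject H0
```

With numbers like these, the *P*-value is tiny, so the statistical conclusion is “reject *H*_{0},” followed by a plain-language sentence as described in part 4.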


That’s it… the entire process of carrying out a hypothesis test. To conclude this subcompetency, it must be acknowledged that even with careful data collection and precise test statistic and *P*-value computations, mistakes (errors) can be made! Remember, we are relying on **sample data** on which to base our conclusions. Even using proper experimental design with randomization throughout the process, sometimes **by chance** our sample data may not adequately represent the overall population. There is nothing we can do about that! This means there is a possibility that we could draw an inaccurate conclusion from the hypothesis test!