## 1. Probability and Two-Way Tables Intro

For this sub-competency you will be introduced to the basics of probability.

#### Basic Probability Rules

Probability will play a huge role later in this course when we start investigating the probability of obtaining certain results from a sample. An unusual event is one that has a low probability of occurring. This is not a precise definition, because how low is "low"? Typically, probabilities of 5% or less are considered low. Recall that 5% means 5 per 100, or 5 times out of 100. Therefore, an event E with a 5% chance of occurring is one that, in repeated trials, we would expect to see in only about 5 trials out of every 100. Thus, events with probabilities of 5% or lower are considered unusual. However, this cutoff point can (and will) vary with the context of the problem.

Probability is basically the science of chance behavior. Chance behavior is unpredictable in the short run but has a regular and predictable pattern in the long run. This is why we will use probability to draw useful conclusions from random samples and randomized comparative experiments. Although we don't know exactly what we'll see from any one sample or experiment, if we repeat the process over and over, we gain confidence about the pattern of outcomes we'll see.

Here are some definitions you want to be familiar with. An experiment is a repeatable process where the results are uncertain. An outcome is one specific possible result from the experiment. The set of all possible outcomes is the sample space.

##### Example

A basketball player shoots three free throws. What are the possible sequences of hits (H) and misses (M)? The experiment in this case is a basketball player shooting 3 free throws. A possible outcome of this experiment is the sequence HHM (hit, hit, miss). The sample space of this experiment is:

S  =  {HHH, HHM, HMH, HMM, MHH, MHM, MMH, MMM}
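If you'd like to check a sample space by computer, the following minimal Python sketch (our own illustration, not part of the course materials) lists every hit/miss sequence for three free throws using the standard library's `itertools.product`:

```python
from itertools import product

# Each free throw is either a hit (H) or a miss (M); three throws give 2**3 sequences.
sample_space = ["".join(shots) for shots in product("HM", repeat=3)]

print(sample_space)       # ['HHH', 'HHM', 'HMH', 'HMM', 'MHH', 'MHM', 'MMH', 'MMM']
print(len(sample_space))  # 8
```

`product` generates the sequences in the same order as the listing of S above.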

Note that there are 8 outcomes in this sample space, as each free throw has 2 possibilities (hit or miss). So 2 ⋅ 2 ⋅ 2 = 2³ = 8. You can often create a sample space using a graphical approach, as shown:

## 1. Intro

In this unit you will learn about measures of position, or location, within a data set. One important measure of position, which will be used extensively later in the course, tells us how far a data value lies from the mean, measured in standard deviations. Other measures tell us location in terms of groups (or percents) of the data set.

Once you have created a distribution of your data, you can use its shape, center, and spread to tell the story of your underlying data.

The most important idea that you need to take from this unit is that of a probability density curve, the graphical representation of the distribution of a continuous random variable. When you look at a histogram of continuous data, you can almost imagine a smooth curve tracing the same shape as the histogram's bars. For example, think back to the example from Unit 1 concerning a state's residents living in poverty, where we produced the following histogram:

A smooth curve that has (roughly) the same shape as this histogram would be something like:

The smooth curve that represents our histogram is called a density curve, and it has some cool properties. First, it is always on or above the horizontal axis: since the vertical axis represents the count or percentage of data falling in a particular class, there can't be a negative amount of data in a class. Second, the total area under an entire density curve is 1 (or 100%). Since a density curve represents our data, ALL of our data (100% of it) must be included in the distribution. We're going to routinely use the result that an area (or region) under a density curve represents the probability of obtaining results falling in that region. So remember, AREA = PERCENTAGE OR PROBABILITY. Keep reminding yourself: AREA = PERCENTAGE OR PROBABILITY.

Again, the main purpose of a density function is to be a smooth and continuous representation of our actual data. Because the density function is a "model" of our data, we will use Greek letters such as μ and σ to represent the mean and standard deviation of the density curve. Statistics is full of symbols; it is most important to remember that x̄ and s represent the mean and standard deviation, respectively, of a SAMPLE, while μ and σ represent the mean and standard deviation, respectively, of a POPULATION. The density curve is a stand-in for our population.

To begin, let's investigate the distributions of two continuous random variables: the uniform distribution and the normal distribution. The normal distribution will be the focus of our statistical studies from here on out.

## 2. Creating a Probability Model

If the proportion of occurrences of an outcome settles down to one value over the long run, that one value is then defined to be the probability of that outcome. Probabilities can be expressed as fractions (5/8), decimals (0.625), or percents (62.5%). There are two main rules that probabilities must satisfy for a given experiment:

1. The probability of any event must be greater than or equal to 0 and less than or equal to 1. In symbols: 0 ≤ P ≤ 1. For example, it does not make sense to say that there is a “–30%” chance of rain, nor does it make sense to say that there is a “140%” chance of rain.
2. The sum of the probabilities of all possible outcomes must equal 1. In other words, if we examine all possible outcomes from an experiment, one of them must occur! It does not make sense to say that there are two possible outcomes, one occurring with probability 20% and the other with probability 50%. What happens the other 30% of the time?

If an event is impossible, then its probability must be equal to 0 (i.e. it can never happen). If an event is a certainty, then its probability must be equal to 1 (i.e. it always happens).

A probability model is a mathematical description of long-run regularity consisting of a sample space S and a way of assigning probabilities to events. Probability models must satisfy both of the above rules. There are two main ways to assign probabilities to outcomes from a sample space:

• The empirical method, in which an experiment is repeated over and over until you have an idea what the probabilities are for each outcome.
• The classical method, which relies on counting techniques to determine the probability of an event.
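The two methods can be compared with a short Python sketch. It assumes, purely for illustration, a shooter who hits each free throw with probability 0.5 and looks at the event "exactly two hits" in three shots. The classical answer counts the equally likely outcomes, while the empirical answer simulates many repetitions with Python's `random` module (the seed and number of trials are arbitrary choices).

```python
import random
from itertools import product

# Illustrative assumption: each free throw is a hit with probability 0.5.
SAMPLE_SPACE = ["".join(s) for s in product("HM", repeat=3)]


def exactly_two_hits(outcome):
    """Event chosen for illustration: exactly two hits in three shots."""
    return outcome.count("H") == 2


def classical_probability(event, sample_space):
    """Classical method: count outcomes in the event (assumes equally likely outcomes)."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return favorable / len(sample_space)


def empirical_probability(event, trials=100_000, seed=1):
    """Empirical method: simulate the experiment many times and report the observed proportion."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        outcome = "".join(rng.choice("HM") for _ in range(3))  # three simulated shots
        if event(outcome):
            successes += 1
    return successes / trials


classical = classical_probability(exactly_two_hits, SAMPLE_SPACE)
empirical = empirical_probability(exactly_two_hits)
print(classical)  # 0.375
print(empirical)  # close to 0.375; the exact value depends on the seed
```

The empirical estimate wanders a little from run to run (with different seeds), but with many trials it settles near the classical value, which is the long-run regularity described above.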
##### Example

A basketball player shoots three free throws. We are interested in creating a probability model for the number of free throws that a basketball player makes when shooting three in a row. Recall from above that the sample space for this experiment is:

S  =  {HHH, HHM, HMH, HMM, MHH, MHM, MMH, MMM}

If we count the numbers of hits (H) for each possible outcome, we would get:

Number of hits = {3, 2, 2, 1, 2, 1, 1, 0}

The probability model for the number of free throws made, assuming this player has an equal chance of making (hitting) or missing each free throw so that all 8 outcomes in S are equally likely, is:

| Hits | Probability (Fraction) | Probability (Decimal) | Probability (Percent) |
|---|---|---|---|
| 0 | 1 out of 8 = 1/8 | 0.125 | 12.5% |
| 1 | 3 out of 8 = 3/8 | 0.375 | 37.5% |
| 2 | 3 out of 8 = 3/8 | 0.375 | 37.5% |
| 3 | 1 out of 8 = 1/8 | 0.125 | 12.5% |
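The table can be reproduced by counting. Here is a minimal Python sketch, under the same equal-chance assumption, that tallies the number of hits in each outcome and stores exact probabilities with the standard library's `fractions.Fraction`:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sample_space = ["".join(s) for s in product("HM", repeat=3)]

# Number of hits for each outcome, e.g. "HHM" -> 2.
hits = [outcome.count("H") for outcome in sample_space]

# Equally likely outcomes: P(k hits) = (outcomes with k hits) / (total outcomes).
counts = Counter(hits)
model = {k: Fraction(counts[k], len(sample_space)) for k in sorted(counts)}

for k, p in model.items():
    print(k, p, float(p), f"{float(p):.1%}")  # e.g. "0 1/8 0.125 12.5%"

# Rule 2: the probabilities of all outcomes sum to exactly 1.
assert sum(model.values()) == 1
```

Exact fractions make it easy to confirm both probability rules without rounding error.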

## 3. Combining Probabilities

In this section we learn about adding probabilities of events that are disjoint, i.e., events that have no outcomes in common. Two events are disjoint if it is impossible for both to happen at the same time. Another name for disjoint events is mutually exclusive. This section is relatively straightforward, so these notes will be rather short.

In the following discussion, the capital letters E and F represent events (sets of possible outcomes) from an experiment, and P(E) represents the probability that event E occurs.

For disjoint events, the outcomes of E or F can be listed as the outcomes of E followed by the outcomes of F. The Addition Rule for the probability of disjoint events is:

P (E or F) = P (E) + P (F)

Thus we can find P (E or F) if we know both P (E) and P (F). This is also true for more than two disjoint events. If E, F, G, … are all disjoint (no two of them have any outcomes in common), then:

P (E or F or G or …) = P (E) + P (F) + P (G) + ⋯

The Addition Rule only applies to events that are disjoint. If two (or more) events are not disjoint, then this rule must be modified because some outcomes may be counted more than once. In the formula P (E or F) = P (E) + P (F), all the outcomes that are in both E and F are counted twice. Thus, to compute P (E or F), these double-counted outcomes must be subtracted once, so that each outcome is counted only once:

P (E or F) = P (E) + P (F) – P (E and F),

where P (E and F) is the probability that both E and F occur, i.e., the probability of the outcomes that are in both E and F. This General Addition Rule is true both for disjoint events and for non-disjoint events: if two events are indeed disjoint, then P (E and F) = 0, and the General Addition Rule simply reduces to the Addition Rule for disjoint events.

##### Example

When choosing a card at random out of a deck of 52 cards, what is the probability of choosing a queen or a heart? Define:

E = “choosing a queen”
F = “choosing a heart”

E and F are not disjoint because there is one card that is both a queen AND a heart, so we must use the General Addition Rule. We know the following probabilities using the classical (counting, equally-likely outcomes) method:

P (E) = P (queen) = 4/52
P (F) = P (heart) = 13/52
P (E and F) = P (queen of hearts) = 1/52

Therefore,

P (queen or heart) = P (E) + P (F) – P (E and F) = 4/52 + 13/52 – 1/52 = 16/52 = 4/13 ≈ 0.3077
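The classical calculation can be checked by brute force. This Python sketch is an illustration that uses a deck we build ourselves from 13 ranks and 4 suits. It counts the cards for "queen or heart" directly and compares that count with the General Addition Rule:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Classical method: fraction of the 52 equally likely cards that are in the event."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_queen(card):
    return card[0] == "Q"


def is_heart(card):
    return card[1] == "hearts"


# Count "queen or heart" directly; each card is counted once.
direct = prob(lambda card: is_queen(card) or is_heart(card))

# General Addition Rule: P(E or F) = P(E) + P(F) - P(E and F).
by_rule = prob(is_queen) + prob(is_heart) - prob(lambda card: is_queen(card) and is_heart(card))

print(direct, by_rule)  # 4/13 4/13
```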

Finally, it is often easier to calculate the probability that something will not happen than to determine the probability that it will happen. The complement of the event E is the "opposite" of E. We write the complement of event E as Eᶜ. The complement Eᶜ consists of all the outcomes in the sample space that are not in event E.

For example, when rolling one die, if event E = {even number}, then Eᶜ = {odd number}. If event F = {1, 2}, then Fᶜ = {3, 4, 5, 6}.

It should make sense that the probability of the complement Eᶜ occurring is just 1 minus the probability that event E occurs. In formula form:

P(Eᶜ) = 1 – P(E)

## 4. Probability of Independent Events

Two events E and F are independent if the occurrence of E in a probability experiment does not affect the probability of event F occurring. In other words, knowing that E occurred does not give any additional information about whether F will or will not occur, and knowing that F occurred does not give any additional information about whether E occurred. In this sense, independent events are totally unrelated. For example, if you are flipping a fair coin (meaning the probability of getting heads is 50% and the probability of getting tails is 50%), does knowing that you just flipped tails tell us anything about what will happen the next time we flip the coin? No! The coin has no "memory" to speak of. Even if you flipped 10 heads in a row, the probability of flipping heads on the 11th toss is still 50%.

If two events are not independent, then they are said to be dependent. If two events are dependent, it does not mean that they completely rely on each other; it just means that they are not independent of each other. In other words, there is some kind of relationship between E and F, even if it is a very weak one. For example, suppose you are asked to pull one card from a standard deck of 52 cards. Let E = {red card} and F = {black card}. Suppose you pull a red card from the deck. Does knowing this provide any information about whether F occurred? Yes! If we pulled a red card, then we know we didn't pull a black card, so F could not have occurred!

Let's run a different experiment by pulling two cards from a standard deck without replacement. If the first card pulled is a red card, does that change the probability that we will pull a black card for the second card? Most definitely, because now there is one fewer red card in the deck, which increases the probability that the second card is black (from 26/52 = 0.5 to 26/51 ≈ 0.510, a small change).

#### The Multiplication Rule for Independent Events

The Multiplication Rule for independent events states:

P (E and F) = P (E) ⋅ P (F)

Thus we can find P (E and F) if we know P (E) and P (F). This is also true for more than two independent events. So if E, F, G, … are all independent of each other, then:

P (E and F and G and ⋯) = P (E) ⋅ P (F) ⋅ P (G) ⋯

##### Example

The ELISA is a test to determine whether the HIV antibody is present in a patient's blood. The test is 99.5% effective, meaning that when the HIV antibody is not present, the test correctly comes back negative 99.5% of the time. The probability of a test coming back positive when the antibody is not present (known as a false positive) is therefore 100% – 99.5% = 0.5% = 0.005. Suppose the ELISA is given to 5 randomly selected people who do not have the HIV antibody.

(a)

What is the probability that the ELISA comes back negative for all five people? First, the test results for different individuals are independent events, because knowing the result of the test for one person gives us no information about what the result will be for the next person. Therefore:

P (all 5 tests are negative) = (0.995) ⋅ (0.995) ⋅ (0.995) ⋅ (0.995) ⋅ (0.995)
= (0.995)⁵
≈ 0.9752

Therefore, there is a 97.52% chance that all 5 individuals will test negative for the HIV antibody when all 5 patients are indeed HIV-negative.

(b)

What is the probability that the ELISA comes back positive for at least one of the five people? First of all, "at least one" means that 1 or 2 or 3 or 4 or 5 of the people receive a positive test. Another way to say "all 5 tests are negative" is "none of the 5 tests is positive." In symbols, if E = {all 5 have a negative ELISA}, then we could just as well say E = {none of the 5 have a positive ELISA}. Therefore, the complement of E is Eᶜ = {at least one of the 5 has a positive ELISA}. Using the fact that P(Eᶜ) = 1 – P(E), we see:

P (at least one of the 5 tests positive) = 1 – P (all 5 have negative tests)
= 1 – (0.995)⁵
≈ 1 – 0.9752
≈ 0.0248

There is a 2.48% chance of at least one of the 5 individuals getting a false positive reading. This is an unusual event (its probability is well below 5%), which is as it should be, because false positive results can cause an individual undue emotional stress and result in additional (often extremely expensive) testing.
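The two ELISA calculations take only a few lines in Python. This sketch simply encodes the assumptions stated above: each HIV-negative person gets a correct negative result with probability 0.995, and results are independent across people.

```python
# Assumptions from the example: P(correct negative) = 0.995 per person, independent people.
p_negative = 0.995
n_people = 5

p_all_negative = p_negative ** n_people       # Multiplication Rule for independent events
p_at_least_one_positive = 1 - p_all_negative  # Complement Rule: P(Ec) = 1 - P(E)

print(round(p_all_negative, 4))           # 0.9752
print(round(p_at_least_one_positive, 4))  # 0.0248
```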

## 5. Conditional Probability

In this section we learn about events that are not independent of one another. When this happens, knowing additional information can actually change the probability of a future event happening. How can this occur? Aren’t probabilities supposed to be fixed?

The easiest example deals with dice. Suppose you close your eyes and roll a die. Without opening your eyes, what is the probability that you rolled the number 5? That's easy:

P(roll a 5) = 1/6 ≈ 16.67%

Let's change up the experiment a bit. You close your eyes and, after rolling your die, a friend in the room tells you that you rolled an odd number. Now, what is the probability that you rolled the number 5? Since there are only three odd numbers on a die, {1, 3, 5}, you now have a 1 in 3 chance of having rolled a 5. In symbols:

P(roll a 5 | result is an odd number) = 1/3 ≈ 33.33%

If your friend tells you that an even number showed up, what is the probability that you rolled a 5? The probability is 0: it can't happen, since 5 is an odd number.

So what is happening in these cases? You are learning additional information that leads you to change the probability of an event occurring. In effect, knowing additional information changes the sample space we use to compute the probabilities (for example, from {1, 2, 3, 4, 5, 6} to {1, 3, 5}). Therefore, the probability of our event occurring can change.

The notation P(F | E) means "the probability of F occurring given that (or knowing that) event E already occurred." For the above dice example, F = {roll a 5} and E = {result is an odd number}, and we found that P(F | E) ≈ 33.33%.

Conditional probabilities are useful when data are presented in tables, where different categories of data (say, Male and Female) are broken down into additional sub-categories (say, marital status).

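To compute the probabilities of dependent events, we use the Conditional Probability Rule. In symbols:

P(F | E) = P(E and F) / P(E) = N(E and F) / N(E)

where P(E) is the probability of event E occurring, N(E) is the number of ways that event E can occur, and N(E and F) is the number of ways that both E and F can occur. (The second form, using counts, applies when all outcomes are equally likely.)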
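Both forms of the Conditional Probability Rule can be checked on the dice example with Python sets. This is an illustrative sketch, and the set names mirror E and F from the text:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
E = {1, 3, 5}                      # "result is an odd number"
F = {5}                            # "roll a 5"


def P(event):
    """Classical method: equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


via_probabilities = P(E & F) / P(E)        # P(E and F) / P(E)
via_counts = Fraction(len(E & F), len(E))  # N(E and F) / N(E)

print(via_probabilities, via_counts)  # 1/3 1/3
```

Here `E & F` is Python's set intersection, which plays the role of "E and F".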

##### Example

Consider studying the possible genders of the children in a 2-child family. The sample space for all possible outcomes is S = {GG, GB, BG, BB}, where birth order is important: there is a first (older) child and then a second (younger) child. Assume that each child is equally likely to be male or female. Each of the items in our sample space can be thought of as the outcome of a chance experiment that selects at random a family with two children. Think about the following questions:

1. What is the probability of seeing a family with two girls, given that the family has at least one girl?
2. What is the probability of seeing a family with two girls, given that the older sibling is a girl?

To most people, these questions seem to be the same. However, if we fill in the probabilities you’ll see they are different!

For Question 1, we want to compute:

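P (family has two girls | family has at least one girl)

Using the Conditional Probability Rule we see this probability is equal to the following. Knowing there is at least one girl reduces the sample space to {GG, GB, BG}:

P (family has two girls | family has at least one girl) = N({GG}) / N({GG, GB, BG}) = 1/3 ≈ 33.33%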

For Question 2, we want to compute:

P (family has two girls | family has an older girl)

Using the Conditional Probability Rule we see this probability is equal to:

P (family has two girls | family has an older girl) = N({GG}) / N({GG, GB}) = 1/2 = 50%

Did you notice how the sample space for Question 2 changed? Compared with Question 1, since we knew the older child was a girl, we also had to eliminate the outcome BG, because that family had a boy first.
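The two family questions can be answered by shrinking the sample space in code. In this Python sketch (our own illustration), the first letter of each outcome is the older child. The helper `conditional` is a hypothetical name that applies N(E and F) / N(E) to equally likely outcomes:

```python
from fractions import Fraction

# First letter = older child, second letter = younger child; all four equally likely.
families = ["GG", "GB", "BG", "BB"]


def conditional(event, given):
    """P(event | given) = N(given and event) / N(given) for equally likely outcomes."""
    reduced = [f for f in families if given(f)]  # the reduced sample space
    return Fraction(sum(1 for f in reduced if event(f)), len(reduced))


def two_girls(f):
    return f == "GG"


def at_least_one_girl(f):
    return "G" in f


def older_is_girl(f):
    return f[0] == "G"


print(conditional(two_girls, at_least_one_girl))  # 1/3 (reduced space: GG, GB, BG)
print(conditional(two_girls, older_is_girl))      # 1/2 (reduced space: GG, GB)
```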

#### Computing Probabilities Using the General Multiplication Rule

Earlier you saw the multiplication rule for independent events, which is:

P (E and F) = P (E) ⋅ P (F)

Is there such a rule if events E and F are dependent? With a slight modification, we get the General Multiplication Rule:

P (E and F) = P (E) ⋅ P (F | E)
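To illustrate the General Multiplication Rule, here is a Python sketch for the two-card experiment described earlier (drawing without replacement), with E = {first card is red} and F = {second card is black}. It compares the rule's answer with a count of every ordered pair of distinct cards. Only card colors are modeled, and this is an illustration rather than part of the original notes.

```python
from fractions import Fraction
from itertools import permutations

# 26 red and 26 black cards; for this question only the colors matter.
deck = ["red"] * 26 + ["black"] * 26

# General Multiplication Rule: P(E and F) = P(E) * P(F | E)
p_first_red = Fraction(26, 52)
p_second_black_given_first_red = Fraction(26, 51)  # one red card has been removed
by_rule = p_first_red * p_second_black_given_first_red

# Check by counting every ordered pair of two different cards (by position in the deck).
pairs = list(permutations(range(52), 2))
favorable = sum(1 for i, j in pairs if deck[i] == "red" and deck[j] == "black")
by_counting = Fraction(favorable, len(pairs))

print(by_rule, by_counting)  # 13/51 13/51
```

Because the draws are dependent, multiplying 26/52 by 26/52 (the independent-events rule) would give 1/4, which is slightly too small.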

## 2. Uniform Distribution

One simple, basic example of a continuous random variable is one where the random variable X can take any value in a given interval, with all values in that interval equally likely. The distribution of such a random variable is the uniform distribution.

Imagine you show up for work one morning and are told there will be a fire alarm drill sometime during the eight-hour day. Fire drills don't make sense if everyone knows when the drill will take place, so all you know is that sometime during the day, a drill will take place. This means that every moment of the day is equally likely to be the time of the drill. Together with the information that the drill will definitely happen, i.e., there is a 100% = 1 probability that it will occur, we get the following distribution:

Why is the height of this distribution fixed at 1/8? Use the facts that (1) there are 8 hours during which the drill can take place, and (2) there is a 100% probability of the drill occurring, so the total area must be 1. Since the uniform distribution is a rectangle, and the area of any rectangle is A = length × width (here, 8 hours × the height of the rectangle), we get:

1 = 8 × height

and solving for height gives us:

height = 1/8

Now you can determine the probability of the drill taking place during any time interval you choose. For example, the probability that the drill will occur during your lunch hour (from 12:00 p.m. to 1:00 p.m.) is simply the area of the region shown in red:

P(drill during lunch hour) = 1 × 1/8 = 1/8 = 0.125 = 12.5%

Once again, remember that AREA = PERCENTAGE OR PROBABILITY. The discussion of probability density curves always starts with the uniform distribution, because everyone knows how to calculate areas of rectangles. And it’s easy to see how the concepts of area and probability are linked.
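The rectangle-area reasoning translates directly into code. This Python sketch assumes, for illustration only, that the eight-hour workday runs from 8:00 a.m. to 4:00 p.m. (hours 8 to 16 on a 24-hour clock). The text does not say when the day starts, and any other 8-hour window gives the same lunch-hour answer.

```python
def uniform_probability(a, b, low=8.0, high=16.0):
    """P(a <= X <= b) for X uniform on [low, high], computed as a rectangle's area.

    Defaults assume a workday from 8:00 (8.0) to 16:00 (16.0) on a 24-hour clock.
    """
    height = 1 / (high - low)         # total area under the density must equal 1
    a, b = max(a, low), min(b, high)  # only the part of [a, b] inside the workday counts
    width = max(0.0, b - a)
    return width * height


lunch = uniform_probability(12, 13)     # 12:00 p.m. to 1:00 p.m.
whole_day = uniform_probability(8, 16)  # the entire workday
print(lunch, whole_day)  # 0.125 1.0
```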

## 3. The Normal Distribution

For the majority of the remainder of this class, we’ll be focusing on variables that have a (roughly) normal distribution. For example, data sets consisting of physical measurements (heights, weights, lengths of bones, and so on) for adults of the same species and sex often follow a similar pattern: most individuals are clumped around the average or mean of the population, with numbers decreasing the farther values are from the average in either direction.

Any normal curve is a single-peaked, symmetric, bell-shaped distribution. A normally distributed random variable, or a variable with a normal probability distribution, is a continuous random variable whose relative frequency histogram has the shape of a normal curve. This curve is also called the normal density curve. The actual function that produces the normal curve is quite complex:

f(x) = [1 / (σ√(2π))] ⋅ e^(−(x − μ)² / (2σ²))

where μ and σ are the mean and standard deviation of the population of data.

What this formula tells us is that any mean μ and standard deviation σ completely define a unique normal curve. Recall that μ tells us where the peak is centered, while σ describes the overall "fatness" (spread) of the distribution. A small σ value produces a tall, skinny curve, while a larger value of σ results in a shorter, more spread-out curve. Each normal distribution is denoted N(μ, σ). For example, the normal distribution N(0, 1) is called the standard normal distribution, and it has a mean of 0 and a standard deviation of 1.
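To see that the formula really does describe a curve with total area 1, here is a Python sketch. It codes f(x) directly, compares the result with the standard library's `statistics.NormalDist.pdf` (Python 3.8+), and approximates the area with a midpoint Riemann sum. The choice of N(17, 3.4), the ±10σ range, and the step count are arbitrary illustration choices.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x, straight from the formula."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


def area_under(mu, sigma, steps=100_000):
    """Midpoint Riemann sum of the density over mu +/- 10 sigma (the tails beyond are negligible)."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    dx = (hi - lo) / steps
    return sum(normal_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx


print(normal_pdf(0, 0, 1), NormalDist(0, 1).pdf(0))  # the two values agree
print(area_under(17, 3.4))                           # very close to 1
```

Changing μ slides the curve left or right and changing σ stretches or squeezes it, but the computed area stays at 1.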

Properties of a Normal Distribution

1. A normal distribution is bell-shaped and symmetric about its mean.
2. A normal distribution is completely defined by its mean, µ, and standard deviation, σ.
3. The total area under a normal distribution curve equals 1.
4. The x-axis is a horizontal asymptote for a normal distribution curve.

A graphical representation of the normal distribution curve is shown below:

Because there are an infinite number of possibilities for µ and σ, there are an infinite number of normal curves. In order to determine probabilities for each normally distributed random variable, we would have to perform separate probability calculations for each normal distribution.

One amazing fact about any normal distribution is called the 68-95-99.7 Rule, or more concisely, the empirical rule. This rule states that:

• Roughly 68% of all data observations fall within one standard deviation on either side of the mean. Thus, there is a 68% chance of a variable having a value within one standard deviation of the mean.
• Roughly 95% of all data observations fall within two standard deviations on either side of the mean. Thus, there is a 95% chance of a variable having a value within two standard deviations of the mean.
• Roughly 99.7% of all data observations fall within three standard deviations on either side of the mean. Thus, there is a 99.7% chance of a variable having a value within three standard deviations of the mean.

A graphical representation of the empirical rule is shown in the following figure:

##### Example:

Suppose a normally distributed variable has mean μ = 17 and standard deviation σ = 3.4. Then, according to the empirical rule:

• Approximately 68% of individual data values will lie between 17 – 3.4 = 13.6 and 17 + 3.4 = 20.4. In interval notation we write: (13.6, 20.4).
• Approximately 95% of individual data values will lie between 17 – 2⋅3.4 = 10.2 and 17 + 2⋅3.4 = 23.8. In interval notation we write: (10.2, 23.8).
• Approximately 99.7% of individual data values will lie between 17 – 3⋅3.4 = 6.8 and 17 + 3⋅3.4 = 27.2. In interval notation we write: (6.8, 27.2).

The results from the third bullet point illustrate how a data value of, say, 2.1 (which is less than 6.8) or a data value of, say, 33.2 (a value greater than 27.2) would both be very unusual, since almost all data values should lie between 6.8 and 27.2.
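The three intervals can be computed and compared with exact normal areas in a short Python sketch using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

mu, sigma = 17, 3.4
X = NormalDist(mu, sigma)

results = {}
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma  # mean +/- k standard deviations
    area = X.cdf(high) - X.cdf(low)             # exact area under the curve between them
    results[k] = (round(low, 1), round(high, 1), round(area, 4))
    print(f"within {k} SD: ({low:.1f}, {high:.1f}), area = {area:.4f}")
```

The exact areas come out to about 0.6827, 0.9545, and 0.9973, which the empirical rule rounds to 68%, 95%, and 99.7%.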

#### Back to the Standard Normal Curve

All normal distributions, regardless of their mean and standard deviation, share the Empirical Rule. With some very simple mathematics, we can “transform” any normal distribution into the standard normal distribution. This is called a z-transform.

Using the z-transformation, any data set that is normally distributed can be converted to the same standard normal distribution by the conversion:

Z = (X – μ) / σ

where X is the normally distributed random variable, and Z is a random variable following the standard normal distribution.

Notice that when X = μ, Z = (μ – μ)/σ = 0, which explains how the z-transform moves our mean to 0.

Properties of the Standard Normal Distribution

1. The standard normal distribution is bell-shaped and symmetric about its mean.
2. The standard normal distribution is completely defined by its mean, µ = 0, and standard deviation,  σ = 1.
3. The total area under the standard normal distribution curve equals 1.
4. The x-axis is a horizontal asymptote for the standard normal distribution curve.

## 4. The z-Score

Given any data value, we can identify how far that data value is from the mean simply by computing the difference x – μ. This value will be positive if your data value lies above (to the right of) the mean, and negative if it lies below (to the left of) the mean. But what we'd really like to know is: relative to the spread of our data set, how far is x from μ? Remember that the standard deviation σ gives us a measure of how spread out our entire set of individual data values is.

The z-score for any single data value can be found by the formula (in English):

z-score = (data value – mean) / standard deviation

or with symbols (as seen before!):

z = (x – μ) / σ

Obviously, a z-score will be positive if the data value lies above (to the right of) the mean, and negative if the data value lies below (to the left of) the mean.

Example 6.1: Calculating and Graphing z-Values

Given a normal distribution with μ = 48 and σ = 5, convert an x-value of 45 to a z-value and indicate where this z-value would be on the standard normal distribution.

Solution

Begin by finding the z-score for x = 45 as follows.

z = (x – μ) / σ = (45 – 48) / 5 = –3/5 = –0.60

Now draw each of the distributions, marking a standard score of z = −0.60 on the standard normal distribution.

The distribution on the left is a normal distribution with a mean of 48 and a standard deviation of 5. The distribution on the right is a standard normal distribution with a standard score of z = −0.60 indicated.
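The z-score formula is a one-liner in code. This Python sketch reproduces Example 6.1. It then uses `statistics.NormalDist` to check that the area to the left of x = 45 under N(48, 5) matches the area to the left of z = −0.60 under N(0, 1), which is why lining up the two curves works:

```python
import math
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


z = z_score(45, 48, 5)
print(z)  # -0.6

# Standardizing does not change areas: P(X < 45) under N(48, 5) equals P(Z < -0.6) under N(0, 1).
left_area_x = NormalDist(48, 5).cdf(45)
left_area_z = NormalDist(0, 1).cdf(z)
print(math.isclose(left_area_x, left_area_z))  # True
```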

Z-scores measure the distance of any data point from the mean in units of standard deviations, and they are useful because they allow us to compare the relative positions of data values in different samples. In other words, the z-score allows us to standardize two or more normal distributions, or more appropriately, to put them on the same scale. Therefore, we'll be able to compare relative positions of data values within their own distributions to determine which data values are closer to or farther from the mean. A prime example of this is comparing the test scores of two students, one who scored a 28 on the ACT (scores range from 1 to 36) and another who scored a 1280 on the SAT (scores range from 400 to 1600). Who, relative to their associated exam, scored better?

##### Example

Your statistics exam score was 0.67 standard deviations better than the class average; your biology score was 0.7 standard deviations better than the class average; your kayaking score was only 0.5 standard deviations better than the class average. Even though your actual score on the biology exam was the lowest of the three exam scores, relative to the distribution of all class exam scores your biology score was your highest relative grade, because its z-score (0.7) is the largest of the three.

#### Finding an Area (Proportion) Given a Specific Z-Value

Determining the area under the N(0, 1) curve for a data value that does not fall exactly 1, 2, or 3 standard deviations above or below the mean actually requires some calculus. Luckily for us, areas under the N(0, 1) curve can be obtained in numerous other ways, including technology (TI-83/84, Excel) and a table of values. Search the Internet for "standard normal table" and you'll find hundreds of tables listing z-scores and their associated areas. Most of these methods report the area to the left of the specified z-score z, no matter where it lies. This comes from the calculus operation of integration, which finds the area from the start of a distribution (i.e., the far left tail) up to the z-score. Two images are provided.

There are three types of area calculations that you will be performing, each requiring slightly different work:

• For areas to the left of z: simply use the area provided by a table or technology.
• For areas to the right of z: because the total area under a density curve is 1 (100%), simply calculate 1 − (area to the left of z).
• For areas between two z-values, say z₀ and z₁ (where z₀ < z₁): find the area to the left of z₁ and subtract from it the area to the left of z₀.

#### Finding a Z-Value Given an Area

This is a slightly more challenging task than calculating an area, because you basically work "backwards" from an algebraic standpoint. It's important to realize that a Standard Normal Table has two parts: (1) the left and top margins, which together give a z-score to two decimal places, and (2) the body of the table, which contains the area (probability) values. Also, remember that the Standard Normal Table only provides the area (probability) to the left of a z-score. A small excerpt of Table B from Appendix A is shown below.

Notice that the z-values given in the table are rounded to two decimal places. Each z-value through its first decimal place (for example, 1.3) is listed in the left column, and the second decimal place (for example, 0.07) is listed in the top row. Where the appropriate row and column intersect, we find the area under the standard normal curve to the left of that particular z-value.

Example: Finding Area to the Left of a Positive z-Value Using a Cumulative Normal Table

Find the area under the standard normal curve to the left of z = 1.37.

Solution

To read the table, we must break the given z-value (1.37) into two parts: one containing the first decimal place (1.3) and the other containing the second decimal place (0.07). So, in Table B from Appendix A, look across the row labeled 1.3 and down the column labeled 0.07. The row and column intersect at 0.9147. Thus, the area under the standard normal curve to the left of z = 1.37 is 0.9147.

Using a TI-83/84 Plus calculator, we can find a value of the area to the left of a z-score. To obtain the solution using a TI-83/84 Plus calculator, perform the following steps.

• Press 2nd and then Vars to access the DISTR menu.
• Choose option 2:normalcdf( .
• Enter lower bound, upper bound, µ, σ. Note: if you want to find area under the standard normal curve, as in this example, then you do not need to enter µ or σ.
• Since we are asked to find the area to the left of z, the lower bound is −∞. We cannot enter −∞ into the calculator. However, from the empirical rule we know that by about 3 standard deviations away from the mean we have accounted for almost all of the data, so for the lower endpoint we simply enter a very negative number, such as −10⁹⁹. This number appears as -1E99 when entered correctly into the calculator. To enter -1E99, press (-) 1 [2nd][ , ] 99. The complete command appears on the screen as normalcdf(-1E99,1.37,0,1).
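The same areas can also be computed in Python. The helper below has a hypothetical name, and its argument order mimics the calculator's normalcdf(lower, upper, µ, σ). It is built on the standard library's `statistics.NormalDist.cdf` and covers the three kinds of area described earlier: left of z, right of z, and between two z-values.

```python
import math
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper.

    Hypothetical helper: argument order mimics the calculator's
    normalcdf(lower, upper, mu, sigma); mu and sigma default to N(0, 1).
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Left of z: use a huge negative lower bound, just as on the calculator.
left_of_137 = normalcdf(-1e99, 1.37)
# Right of z: equivalent to 1 minus the left area (here via an infinite upper bound).
right_of_137 = normalcdf(1.37, math.inf)
# Between two z-values: left area of the larger minus left area of the smaller.
between = normalcdf(-1.0, 1.37)

print(round(left_of_137, 4))   # 0.9147
print(round(right_of_137, 4))  # 0.0853
print(round(between, 4))
```

The first result matches the table lookup for z = 1.37.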

If we are given an area (or probability) value, we need to first locate it in the body of a table, then track our way up and to the left in order to piece together the z-score that relates to the specified area. Keep in mind that you may not find the exact area value in the body of the table, so just use the closest value you can find, and then identify the corresponding z-score.

One calculation that will be used frequently in the coming chapters is to identify the two z-scores that separate a specific area in the middle of the standard normal distribution.

##### Example

Suppose we want to know which two z-scores separate out the middle 95% of the data. From the empirical rule, we already know the z-scores that do this are ±2 (2 standard deviations on either side of the mean). In reality, it’s not exactly ±2, but close enough for rough calculations.

To find the exact two z-scores, we use the following logic: If the middle portion is 95% = 0.95, then how much area lies outside of the middle (to the left and right)? A simple subtraction solves this! 1 – 0.95 = 0.05. The “outside” area, 0.05, must be split equally between the two tails (because of symmetry!). Therefore, dividing 0.05 by two gives us an area of 0.025 in each tail.

Using a standard normal table “backwards,” we first look through the body of the table to find the area closest to 0.025. The z-score corresponding to a left-tail area of 0.025 is z = −1.96. By the symmetry of the standard normal distribution, the upper z-score is therefore z = 1.96. You could also find the upper z-score by looking up the area 0.025 + 0.95 = 0.975 in the body of the table and reading off the associated z-value. By the end of the class, you will be very familiar with the z-scores that define a central 90% (z = ±1.645), 95% (z = ±1.96), and 99% (z = ±2.576).
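The tail-splitting logic above can be written as a short function. This is a minimal sketch using Python's `statistics.NormalDist` (our choice of tool). `central_z` is a hypothetical name. It subtracts the central area from 1, splits what remains equally between the two tails, and then looks up the z-score with area 1 − tail to its left, reproducing the three familiar cutoffs.

```python
from statistics import NormalDist

def central_z(central_area):
    """Positive z such that the middle `central_area` of the standard
    normal curve lies between -z and +z."""
    tail = (1 - central_area) / 2          # area in EACH tail, by symmetry
    return NormalDist().inv_cdf(1 - tail)  # z with (1 - tail) to its left

for c in (0.90, 0.95, 0.99):
    print(f"central {c:.0%}: z = ±{central_z(c):.3f}")
```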

##### Example: Find and interpret the probability of a random Normal variable

Suppose you just purchased a 2005 Honda Insight with automatic transmission. Using www.fueleconomy.gov, you determine that 2005 Honda Insights with automatic transmission have a mean highway gas mileage of 56 miles per gallon with a standard deviation of 3.2 miles per gallon. The distribution of this data is bell-shaped and normal. You want to know the following:

a) How likely is it that your Honda Insight with automatic transmission will get better than 60 miles per gallon on the highway?

b) How likely is it that your Honda Insight with automatic transmission will get less than 50 miles per gallon on the highway?

c) How likely is it that your Honda Insight with automatic transmission will get between 58 and 62 miles per gallon on the highway?

Solution

This problem deals with data that is normally distributed with mean 56 and standard deviation 3.2, i.e., N(56, 3.2).

(a)

In symbols, we are asked to calculate P(X > 60). Sketching a normal curve and shading the area corresponding to values greater than 60 gives us the graph shown. To calculate the appropriate area in the upper (right) tail, we must first convert our data value to the standard normal distribution. The z-score for x = 60 is:

$$z = \frac{x - \mu}{\sigma} = \frac{60 - 56}{3.2} = 1.25$$

This means that 60 is 1.25 standard deviations above the mean. Notice how lining the two normal curves up as shown illustrates how the two areas are the same: P(X > 60) = P(Z > 1.25).

Using z = 1.25, we go to Table IV and find that the area to the left of z = 1.25 is 0.8944. Since we need the area to the right, we simply take 1 – 0.8944 = 0.1056. (On a TI-83/84, normalcdf(1.25,1E99,0,1) returns this right-tail area directly.)

Therefore, P(X > 60) = 0.1056 = 10.56%. There are a couple of ways to interpret this answer:

• Of all the model year 2005 Honda Insight cars produced with an automatic transmission, 10.56% will get over 60 miles per gallon on the highway.
• If you went to a car lot and purchased a new model year 2005 Honda Insight with an automatic transmission, there is a 10.56% chance that your car will get over 60 miles per gallon on the highway.
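Part (a) can be checked with a short sketch using Python's `statistics.NormalDist` as a stand-in for Table IV. It applies the complement rule, one minus the left-tail area, in two ways: once on the standard normal curve at z = 1.25, and once directly on N(56, 3.2) at x = 60. Because z = 1.25 is exact here, the two results match.

```python
from statistics import NormalDist

mileage = NormalDist(56, 3.2)          # highway mpg, N(56, 3.2)

p_table = 1 - NormalDist().cdf(1.25)   # complement rule on the left-tail area
p_direct = 1 - mileage.cdf(60)         # same area, working with x directly

print(round(p_table, 4), round(p_direct, 4))
```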

(b)

In symbols, we are asked to calculate P(X < 50). Sketching a normal curve N(56, 3.2) and shading the area corresponding to values less than 50 gives us the graph shown to the right.

To calculate the appropriate area in the lower (left) tail, we must first convert our data value to the standard normal distribution. The z-score for x = 50 is:

$$z = \frac{x - \mu}{\sigma} = \frac{50 - 56}{3.2} = -1.875 \approx -1.88$$

Thus, the value 50 MPG is 1.88 standard deviations below the mean. In symbols we see: P(X < 50) = P(Z < −1.88).

Using z = −1.88, we go to Table IV (or use normalcdf(-1E99,-1.88,0,1)) and find that the area to the left of z = −1.88 is 0.0301. Therefore, P(X < 50) = 0.0301 = 3.01%. There are a couple of ways to interpret this answer:

• Of all the model year 2005 Honda Insight cars produced with an automatic transmission, 3.01% will get less than 50 miles per gallon on the highway.
• If you went to a car lot and purchased a new model year 2005 Honda Insight with an automatic transmission, there is a 3.01% chance that your car will get less than 50 miles per gallon on the highway.
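For part (b), the sketch below (Python's `statistics.NormalDist`, our substitution for the table) computes the left-tail area two ways. The first uses the rounded z-score −1.88, as the table does. The second works with x = 50 on N(56, 3.2), which amounts to using the unrounded z = −1.875. The second answer comes out slightly larger, about 0.0304. That small difference comes only from rounding z to two decimals.

```python
from statistics import NormalDist

p_table = NormalDist().cdf(-1.88)     # like normalcdf(-1E99,-1.88,0,1)
p_raw = NormalDist(56, 3.2).cdf(50)   # unrounded z = -1.875

print(round(p_table, 4), round(p_raw, 4))
```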

(c)

In symbols, we are asked to calculate P(58 < X < 62). Sketching a normal curve N(56, 3.2) and shading the area corresponding to values greater than 58 but less than 62 gives us the graph shown. To calculate the appropriate area, we must first convert both data values to the standard normal distribution.

The z-score for x = 58 is:

$$z = \frac{58 - 56}{3.2} = 0.625 \approx 0.63$$

and the z-score for x = 62 is:

$$z = \frac{62 - 56}{3.2} = 1.875 \approx 1.88$$

In terms of probability, we can now say: P(58 < X < 62) = P(0.63 < Z < 1.88).

Using z = 1.88, we go to Table IV (or use technology) to find the area to the left of z = 1.88 is 0.9699. Now, we need to remove (subtract) the area left of z = 0.63, which is 0.7357. Therefore, P(58 < X < 62) = 0.9699 – 0.7357 = 0.2342, or 23.42%. There are a couple ways to interpret this answer:

• Of all the model year 2005 Honda Insight cars produced with an automatic transmission, 23.42% will get between 58 and 62 miles per gallon on the highway.
• If you went to a car lot and purchased a new model year 2005 Honda Insight with an automatic transmission, there is a 23.42% chance that your car will get between 58 and 62 miles per gallon on the highway.

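This calculation can also be done with normalcdf(0.63,1.88,0,1) or with normalcdf(58,62,56,3.2). The two answers are very close but not identical, because the second one works with the unrounded z-scores 0.625 and 1.875.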
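Here is the same comparison as a sketch in Python, using `statistics.NormalDist` in place of the calculator. The between-area is the left area at the upper value minus the left area at the lower value. With the rounded z-scores, the result agrees with the table answer 0.2342 up to rounding of the four-decimal table entries. With the raw mileages, it comes out a little higher, about 0.236.

```python
from statistics import NormalDist

std = NormalDist()               # standard normal, mu = 0, sigma = 1
mileage = NormalDist(56, 3.2)    # N(56, 3.2) from the example

# Like normalcdf(0.63,1.88,0,1): z-scores rounded to two decimals
p_rounded = std.cdf(1.88) - std.cdf(0.63)

# Like normalcdf(58,62,56,3.2): no rounding of the z-scores
p_exact = mileage.cdf(62) - mileage.cdf(58)

print(round(p_rounded, 4), round(p_exact, 4))
```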

#### Find the Value of a Random Variable Knowing a Probability Value

In these types of problems, we need to work “backwards”: starting with a specified probability, we find the corresponding z-score and then work our way back to the value of the random variable. The table of standard normal values is not a “one-way” tool! What do we mean by that? So far you’ve started with a value of a random variable (like a gas mileage value in the previous problem), turned it into a z-score, and then looked up the probability associated with that z-score. We can also use the table backwards. Start with a known probability in the body of the table, identify the z-score for that area by moving your finger to the associated row and column, and then reverse the algebraic transformation that turned the random variable into a z-score.

If this sounds confusing, think back to the steps we took in the preceding example: we went from a data value X, to a z-score, to an area (probability).

If, however, we are given an area/probability, then to work our way back to the original data value, we must first identify the appropriate z-score, and then “un-standardize” it to arrive (finally!) back at the data value. How do we algebraically “undo” the z-score? Easy…just solve the z-score formula for the data value X:

$$\frac{X - \mu}{\sigma} = Z$$

Multiply both sides by σ to remove it from the denominator on the left side:

X – μ = Z⋅σ

Finally, add the value of μ to both sides to isolate the value of the random variable X:

X = Z⋅σ + μ
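The two directions of the transformation can be sketched as a pair of small functions. The names `to_z` and `from_z` are hypothetical. `to_z` standardizes with z = (x − μ)/σ, and `from_z` applies the rearranged formula X = Z⋅σ + μ. Feeding the result of one into the other returns the original value, which is a quick way to check your algebra.

```python
def to_z(x, mu, sigma):
    """Standardize: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Un-standardize: X = Z*sigma + mu."""
    return z * sigma + mu

z = to_z(60, 56, 3.2)        # gas mileage example, part (a)
x = from_z(z, 56, 3.2)       # back to the original data value
print(z, x)
```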

##### Example: Finding the value of a normal random variable

Now suppose you instead want to know the gas mileage that corresponds to a particular probability. What gas mileage would your 2005 Honda Insight need in order to get better gas mileage than 97% of all other 2005 Honda Insights with automatic transmission?

Solution

This problem again deals with data that is normally distributed with mean 56 and standard deviation 3.2, i.e., N(56, 3.2).

To find the 97th percentile gas mileage, we need to find the specific miles per gallon X that separates the bottom 97% of all gas mileages from the top 3%. So for this problem we are given a percentage/area. Sketching the normal curve gives the graph shown.

Using Table IV, we look for 0.97 in the body of the table. The exact area 0.97 is not in the table, but the closest area, 0.9699, corresponds to a z-score of 1.88. Now we un-standardize the z-score of 1.88. In plain English, this means we need the specific gas mileage that is 1.88 standard deviations above the mean of 56. Solving for X in the z-transformation gives:

$$X = Z \cdot \sigma + \mu = 1.88(3.2) + 56 = 6.016 + 56 = 62.016 \approx 62$$

Therefore, if your 2005 Honda Insight with an automatic transmission gets about 62 mpg, it gets better gas mileage than 97% of all 2005 Honda Insight cars with an automatic transmission.
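The backwards calculation can be sketched in Python as well. This uses `statistics.NormalDist` (our substitution), whose `inv_cdf` plays the role of the calculator's invNorm(0.97,56,3.2). The table route un-standardizes the rounded z = 1.88 and gives 62.016. The direct route uses the unrounded z-score and lands very slightly higher, about 62.02. Both answers round to 62 mpg.

```python
from statistics import NormalDist

mu, sigma = 56, 3.2
z_table = 1.88                                  # closest table entry to an area of 0.97
x_table = z_table * sigma + mu                  # un-standardize: X = Z*sigma + mu
x_exact = NormalDist(mu, sigma).inv_cdf(0.97)   # like invNorm(0.97,56,3.2)

print(round(x_table, 3), round(x_exact, 2))
```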

## 5. Percentiles

Remember that the median of a data set divides the lower 50% of the data from the upper 50%. We say that the median is the 50th percentile of the data set. If a number divides the lower 34% of the data from the upper 66%, that number is the 34th percentile. In general, the kth percentile of a data set is a value that divides the data set into the lower k% and the upper (100 − k)%.

The computation of the kth percentile is similar to the one for the median. First arrange the n data values in ascending order, and then compute the location (or index, i) of the kth percentile using the formula:

$$i = \frac{k}{100}(n + 1)$$

If i is an integer, the ith data value is the kth  percentile. If i is not an integer, take the mean of the two values on either side of i to give the kth percentile.
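The location rule can be sketched as a function. This sketch assumes the location formula i = k(n + 1)/100, which reproduces the familiar median rule (the middle value when n is odd, the average of the two middle values when n is even). The function name `percentile` and the clamping of locations outside 1 to n are our own assumptions. Other textbooks and software packages use slightly different percentile conventions, so their answers can differ a little.

```python
def percentile(data, k):
    """kth percentile using the location rule i = k(n + 1)/100 (assumed formula).

    If i is a whole number, return the ith value (1-indexed) of the sorted data;
    otherwise return the mean of the two values on either side of i.
    """
    values = sorted(data)
    n = len(values)
    i = k * (n + 1) / 100
    # Assumption: locations outside 1..n are clamped to the min or max value
    if i <= 1:
        return values[0]
    if i >= n:
        return values[-1]
    if i.is_integer():
        return values[int(i) - 1]        # convert 1-indexed location to 0-indexed
    below = int(i)                       # whole-number location just below i
    return (values[below - 1] + values[below]) / 2

odd = [9, 2, 4, 10, 5, 4, 7]      # hypothetical data, n = 7
even = [11, 3, 7, 1, 9, 5]        # hypothetical data, n = 6
print(percentile(odd, 50), percentile(even, 50), percentile(even, 25))
```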

You’ve already seen the usefulness of the quartiles of a data set (recall the 5-Number Summary and boxplots from Unit 2?). The quartiles are the 25th, 50th, and 75th percentiles:

• Q1 = 25th percentile
• Q2 = 50th percentile = median
• Q3 = 75th percentile

You can find the quartiles using the same index formula above.

##### Example: Finding the z-Value That Represents a Given Percentile

What z-value represents the 90th percentile?

Solution

The 90th percentile is the z-value for which 90% of the area under the standard normal curve is to the left of z. So, we need to find the value of z that has an area of 0.9000 to its left.

Looking for 0.9000 (or an area extremely close to it) in the interior of the cumulative normal tables, we find 0.8997, which corresponds to a z-value of 1.28. Thus z ≈ 1.28 represents the 90th percentile.

Using a TI-83/84 Plus calculator, we can find a value of z with a given area to its left. As noted previously, the 90th percentile is the z-value that has an area of 0.9000 to its left. Enter invNorm(0.9000), as shown in the screenshot, and press ENTER. The answer is z ≈ 1.28.
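On a computer, Python's `statistics.NormalDist().inv_cdf` does the same job as invNorm: it returns the z-value with the given area to its left. This is a substitution of ours, not part of the course tools. Rounded to two decimals, it matches the table answer for the 90th percentile.

```python
from statistics import NormalDist

z90 = NormalDist().inv_cdf(0.90)   # analogue of invNorm(0.9000)
print(round(z90, 2))
```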