3. The Normal Distribution

For most of the rest of this class, we’ll focus on variables that have a (roughly) normal distribution. For example, data sets of physical measurements (heights, weights, lengths of bones, and so on) for adults of the same species and sex often follow a similar pattern: most individuals are clustered around the average, or mean, of the population, and the number of individuals decreases the farther a value lies from the mean in either direction.

[Figure: normal distribution chart]

Every normal curve is single-peaked, symmetric, and bell-shaped. A normally distributed random variable, or a variable with a normal probability distribution, is a continuous random variable whose relative frequency histogram has the shape of a normal curve. This curve is also called the normal density curve. The formula for the normal density curve is fairly involved:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where μ and σ are the mean and standard deviation of the population of data.

This formula tells us that each pair of values for the mean μ and standard deviation σ defines exactly one normal curve. Recall that μ locates the center of the peak, while σ describes how spread out (how “fat”) the curve is. A small σ gives a tall, skinny curve, while a larger σ gives a shorter, more spread-out curve. Each normal distribution is denoted N(μ, σ). For example, N(0, 1) is called the standard normal distribution; it has a mean of 0 and a standard deviation of 1.
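To make the roles of μ and σ concrete, here is a minimal Python sketch that evaluates the normal density from its standard formula, f(x) = 1/(σ√(2π)) · exp(−(x − μ)²/(2σ²)). The function name `normal_pdf` and the sample values of σ are illustrative choices, not part of the course material.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the N(mu, sigma) density curve at x (sigma is the standard deviation)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# The peak sits at x = mu and has height 1 / (sigma * sqrt(2 * pi)),
# so a smaller sigma produces a taller, narrower curve.
for sigma in (0.5, 1.0, 2.0):
    peak = normal_pdf(0.0, mu=0.0, sigma=sigma)
    print(f"sigma = {sigma}: peak height = {peak:.4f}")
```

Running this shows the peak height shrinking as σ grows (about 0.3989 for the standard normal curve), which matches the "tall and skinny" versus "short and spread out" description.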

Properties of a Normal Distribution

  1. A normal distribution is bell-shaped and symmetric about its mean.
  2. A normal distribution is completely defined by its mean, µ, and standard deviation, σ.
  3. The total area under a normal distribution curve equals 1.
  4. The x-axis is a horizontal asymptote for a normal distribution curve.
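Properties 3 and 4 can be checked numerically. The sketch below approximates the area under a few normal curves with the trapezoid rule and evaluates the curve far out in the tail. The helper names, the integration range of 8 standard deviations on each side, and the step count are assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(mu, sigma, width=8.0, steps=20_000):
    """Trapezoid-rule area under N(mu, sigma) from mu - width*sigma to mu + width*sigma."""
    a = mu - width * sigma
    b = mu + width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

for mu, sigma in [(0.0, 1.0), (17.0, 3.4), (-5.0, 0.25)]:
    area = area_under_curve(mu, sigma)
    tail_height = normal_pdf(mu + 8.0 * sigma, mu, sigma)
    # The area is very close to 1; the tail height is tiny but still positive.
    print(f"N({mu}, {sigma}): area = {area:.6f}, height at mu + 8*sigma = {tail_height:.2e}")
```

The area comes out essentially equal to 1 for every choice of μ and σ, while the curve's height in the tail gets very small without reaching zero, which is what it means for the x-axis to be a horizontal asymptote.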

A graphical representation of the normal distribution curve is shown below:

[Figure: normal distribution curve]

Because there are an infinite number of possibilities for µ and σ, there are an infinite number of normal curves. In order to determine probabilities for each normally distributed random variable, we would have to perform separate probability calculations for each normal distribution.

[Figure: normal distribution]

A remarkable property shared by every normal distribution is the 68-95-99.7 rule, also known as the empirical rule. This rule states that:

  • Roughly 68% of all data observations fall within one standard deviation on either side of the mean. Thus, there is roughly a 68% chance that the variable takes a value within one standard deviation of the mean.
  • Roughly 95% of all data observations fall within two standard deviations on either side of the mean. Thus, there is roughly a 95% chance that the variable takes a value within two standard deviations of the mean.
  • Roughly 99.7% of all data observations fall within three standard deviations on either side of the mean. Thus, there is roughly a 99.7% chance that the variable takes a value within three standard deviations of the mean.
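The three percentages in the rule above are rounded values. As a hedged illustration, the sketch below computes them more precisely using the identity P(μ − kσ < X < μ + kσ) = erf(k/√2), which holds for any normal variable X. The function name `prob_within` is a hypothetical helper, and `math.erf` is the error function from Python's standard library.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean.

    The result depends only on k, not on the particular mean or standard deviation.
    """
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The printed values are approximately 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% in the rule.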

A graphical representation of the empirical rule is shown in the following figure:

[Figure: the empirical rule (68-95-99.7) on a normal curve]

Image from: http://2.bp.blogspot.com/-J2YOCi9-1Tg/U95XGRQBS-I/AAAAAAAABKQ/y5vD4qMSJb4/s1600/stdeviation.png

Example:

Suppose a normally distributed variable has mean μ = 17 and standard deviation σ = 3.4. Then, according to the empirical rule:

  • Approximately 68% of individual data values will lie between 17 – 3.4 = 13.6 and 17 + 3.4 = 20.4. In interval notation we write: (13.6, 20.4).
  • Approximately 95% of individual data values will lie between 17 – 2⋅3.4 = 10.2 and 17 + 2⋅3.4 = 23.8. In interval notation we write: (10.2, 23.8).
  • Approximately 99.7% of individual data values will lie between 17 – 3⋅3.4 = 6.8 and 17 + 3⋅3.4 = 27.2. In interval notation we write: (6.8, 27.2).

The third bullet point shows why a data value such as 2.1 (less than 6.8) or 33.2 (greater than 27.2) would be very unusual: almost all data values should lie between 6.8 and 27.2.
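The arithmetic in this example can be reproduced with a short Python sketch. The helper names `empirical_intervals` and `is_unusual` are hypothetical, and treating "more than three standard deviations from the mean" as the cutoff for "unusual" is an assumption based on the example above.

```python
def empirical_intervals(mu, sigma):
    """Intervals (mu - k*sigma, mu + k*sigma) for k = 1, 2, 3."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

def is_unusual(x, mu, sigma, k=3):
    """True if x lies more than k standard deviations away from the mean."""
    return abs(x - mu) > k * sigma

mu, sigma = 17.0, 3.4
for k, (low, high) in empirical_intervals(mu, sigma).items():
    print(f"k = {k}: ({low:.1f}, {high:.1f})")

# 2.1 and 33.2 fall outside (6.8, 27.2); 15.0 does not.
for x in (2.1, 33.2, 15.0):
    print(f"x = {x}: unusual = {is_unusual(x, mu, sigma)}")
```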

Back to the Standard Normal Curve

All normal distributions, regardless of their mean and standard deviation, obey the empirical rule. With some very simple mathematics, we can “transform” any normal distribution into the standard normal distribution. This is called a z-transformation.

$$z = \frac{x - \mu}{\sigma}$$

[Figure: standard normal distribution]

Using the z-transformation, any data set that is normally distributed can be converted to the same standard normal distribution by the conversion:

$$Z = \frac{X - \mu}{\sigma}$$

where X is the normally distributed random variable, and Z is a random variable following the standard normal distribution.

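Notice that when X = μ, Z = (μ – μ)/σ = 0, which is how the transformation moves the mean to 0. Likewise, when X = μ + σ, Z = (μ + σ – μ)/σ = 1, so one standard deviation in X becomes one unit in Z.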
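As a small illustration of the conversion Z = (X – μ)/σ, the sketch below standardizes a few values from the earlier example with μ = 17 and σ = 3.4. The function name `z_score` is an illustrative choice.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

mu, sigma = 17.0, 3.4
for x in (17.0, 20.4, 10.2, 27.2):
    print(f"x = {x}: z = {z_score(x, mu, sigma):.2f}")
```

The mean maps to z = 0, and the endpoints 20.4, 10.2, and 27.2 map to z values of about 1, −2, and 3, which are exactly the multiples of σ used in the empirical rule.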

Properties of the Standard Normal Distribution

  1. The standard normal distribution is bell-shaped and symmetric about its mean.
  2. The standard normal distribution is completely defined by its mean, µ = 0, and standard deviation, σ = 1.
  3. The total area under the standard normal distribution curve equals 1.
  4. The x-axis is a horizontal asymptote for the standard normal distribution curve.
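One practical consequence of standardizing is that a single calculation for the standard normal curve can answer probability questions about any normal distribution. The sketch below assumes the well-known identity Φ(z) = ½(1 + erf(z/√2)) for the area under the standard normal curve to the left of z; this identity and the helper names are not taken from the text above.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob_between(a, b, mu, sigma):
    """P(a < X < b) for X following N(mu, sigma), found by converting a and b to z-values."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)

# Within one standard deviation of the mean for N(17, 3.4) and for N(0, 1):
print(round(normal_prob_between(13.6, 20.4, 17.0, 3.4), 4))
print(round(normal_prob_between(-1.0, 1.0, 0.0, 1.0), 4))
```

Both calls give about 0.6827, since the interval (13.6, 20.4) for N(17, 3.4) corresponds to (−1, 1) on the standard normal curve.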

[Figure: standard normal distribution]