## A Confidence Interval for a Population Proportion

The most obvious point estimate for the true (yet unknown) population proportion *p* is the sample proportion *p̂* = *x/n*, where *x* is the number of individuals in the sample *with* the characteristic and *n* is the sample size. This means we are hoping that *p̂* ≈ *p*. But how far off are we? Is our point estimate too high? Is it too low? Because *p̂* came from one sample, we have no clue! Does this mean we should give up hope? NO!

We will rely on the sampling distribution of *p̂*, which was presented previously, to quantify the accuracy and precision of our point estimate *p̂*. Recall that two requirements help to ensure that the values in the sample are independent and that the sampling distribution is approximately normal: (1) our sample size is no more than 5% of the population size (*n* ≤ 0.05 ⋅ *N*) and (2) *np*(1 – *p*) ≥ 10.
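To make the two requirements concrete, here is a minimal Python sketch that computes *p̂* = *x/n* and checks both conditions. The function names and the numbers in the example are hypothetical. Because the true *p* is unknown in practice, the sketch substitutes *p̂* for *p* in requirement (2); that substitution is an assumption of this example.

```python
def sample_proportion(x: int, n: int) -> float:
    """Point estimate p-hat = x / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    return x / n


def check_requirements(x: int, n: int, population_size: int) -> dict:
    """Check the two sample-size requirements.

    Assumption: p is unknown, so p-hat stands in for p in n*p*(1 - p) >= 10.
    """
    p_hat = sample_proportion(x, n)
    return {
        "p_hat": p_hat,
        # (1) sample is no more than 5% of the population
        "independence_ok": n <= 0.05 * population_size,
        # (2) n * p * (1 - p) >= 10, with p-hat substituted for p
        "normality_ok": n * p_hat * (1 - p_hat) >= 10,
    }


# Hypothetical data: 120 of 400 sampled individuals have the characteristic,
# drawn from a population of 50,000.
result = check_requirements(x=120, n=400, population_size=50_000)
print(result)
```

With these made-up numbers, both requirements are satisfied, so the normal model for *p̂* would be reasonable.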

We will never actually do this, but **if** we repeatedly obtained random samples to get other values of *p̂*, sometimes our point estimate would be too high, and other times it would be too low. But we know something about how the values of *p̂* are distributed around the true population proportion *p*: the majority of the time, *p̂* will fall close to *p*, with lower and lower probabilities of *p̂* falling farther away from *p*. In fact, when the two requirements above are met, the central limit theorem guarantees that the sampling distribution of *p̂* is approximately normal, centered at *p* with standard deviation √(*p*(1 – *p*)/*n*).
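Although we would never repeat a real study thousands of times, a computer can do it for us. The following sketch simulates the thought experiment of repeated sampling, using only the Python standard library. The value of *p*, the sample size, and the number of repetitions are hypothetical choices. The simulated values of *p̂* are compared with the standard result that *p̂* has mean *p* and standard deviation √(*p*(1 – *p*)/*n*).

```python
import math
import random
import statistics


def simulate_p_hats(p: float, n: int, num_samples: int, seed: int = 0) -> list:
    """Draw num_samples random samples of size n and return each p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_hats = []
    for _ in range(num_samples):
        # Count individuals with the characteristic in one simulated sample.
        x = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(x / n)
    return p_hats


# Hypothetical "true" proportion and sample size, chosen for illustration.
p_true, n = 0.3, 200
p_hats = simulate_p_hats(p_true, n, num_samples=5_000)

mean_p_hat = statistics.mean(p_hats)
sd_p_hat = statistics.pstdev(p_hats)
theoretical_sd = math.sqrt(p_true * (1 - p_true) / n)

print(f"mean of p-hats:      {mean_p_hat:.4f} (true p = {p_true})")
print(f"std. dev. of p-hats: {sd_p_hat:.4f} (theory: {theoretical_sd:.4f})")
```

The individual values of *p̂* land above and below *p*, but their average should sit very close to *p*, and their spread should be close to the theoretical standard deviation.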

A **level C confidence interval** for an unknown parameter has two parts:

- An **interval of numbers** calculated from the data, consisting of our point estimate with some error allowed on either side of the point estimate.
- A **level of confidence** *C* that represents the probability that the interval will capture the true population value *in repeated samples*. In other words, our confidence level indicates the expected proportion of intervals that will contain the parameter if a large number of different samples are obtained. The level of confidence is denoted by *C* = (1 – *α*) ⋅ 100%.
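The "in repeated samples" reading of the confidence level can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: it uses the common large-sample interval *p̂* ± *z*\* ⋅ √(*p̂*(1 – *p̂*)/*n*), which has not been derived in the text above, and the true proportion, sample size, and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def large_sample_interval(x: int, n: int, confidence: float = 0.95) -> tuple:
    """Assumed interval form: p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = x / n
    alpha = 1 - confidence
    # Critical value z* leaving alpha/2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(p: float, n: int, confidence: float,
             num_samples: int, seed: int = 0) -> float:
    """Proportion of simulated intervals that capture the true p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(num_samples):
        x = sum(1 for _ in range(n) if rng.random() < p)
        low, high = large_sample_interval(x, n, confidence)
        if low <= p <= high:
            hits += 1
    return hits / num_samples


# Hypothetical setting: true p = 0.3, samples of size 200, C = 95%.
observed = coverage(p=0.3, n=200, confidence=0.95, num_samples=5_000)
print(f"proportion of intervals containing p: {observed:.3f}")
```

Any single interval either contains *p* or it does not. The confidence level describes the long-run proportion of intervals that succeed, which in this simulation should come out close to 0.95.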