This competency set addresses data collection. You’ll learn how to collect meaningful data, through either sampling or experiments, from which conclusions can be drawn. Anyone can collect data. However, if the data is not collected in a way that eliminates (or at least reduces) bias, or if it does not accurately represent the population of interest, any results and conclusions drawn from it will be practically meaningless.

As a word of warning, there are a tremendous number of terms that form the “language of statistics.” It may be very beneficial to create your own statistics dictionary, consisting of terms, definitions, and examples. As this language will be used throughout the course, having a good understanding of terms from the start will help you succeed.

### Parameters vs. Statistics

Remember from the first unit that a **population** is the *entire* group being studied, while a **sample** is a *subset* of the population, ideally chosen so that it is representative. In practice, a sample is smaller than the population.

If you are able to collect data from the entire population (for example, exam scores for *all* students in a statistics course), then the descriptive measures of the population are called **parameters**. Parameters are often written using Greek letters like *μ* (pronounced “mew”) or *σ* (pronounced “sigma”). If you only have data from a sample, the descriptive measures of the sample are called **statistics**. Statistics are written using Roman letters like *x̄* and *s*, as you saw in Unit 2. An easy way to remember the distinction is to match the first letters:

**p**arameter ⇔ **p**opulation

**s**tatistic ⇔ **s**ample

The main reason for keeping the two sets of terms and symbols separate is so you know whether someone is reporting a descriptive measure from the entire population or from just a sample. This subtle, yet extremely important, difference forms the basis for the process of *statistical inference*.
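
To make the distinction concrete, here is a minimal Python sketch. The exam scores are made up for illustration, and the code assumes that *μ* and *σ* stand for the population mean and standard deviation and that *x̄* and *s* stand for their sample counterparts. The variable names, the sample size of 4, and the fixed random seed are likewise assumptions, not part of the course material.

```python
import random
import statistics

# Hypothetical exam scores for ALL 10 students in a course (the population).
population = [72, 85, 90, 66, 78, 94, 81, 59, 88, 77]

# Parameters: descriptive measures computed from the entire population.
mu = statistics.mean(population)       # population mean (mu)
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

# Draw a simple random sample of 4 students.
# The seed is fixed only so the example is repeatable.
rng = random.Random(0)
sample = rng.sample(population, 4)

# Statistics: descriptive measures computed from the sample only.
x_bar = statistics.mean(sample)  # sample mean (x-bar)
s = statistics.stdev(sample)     # sample standard deviation (divides by n - 1)

print(f"Parameters: mu = {mu}, sigma = {sigma:.2f}")
print(f"Statistics: x_bar = {x_bar}, s = {s:.2f}")
```

Changing the seed produces a different sample, so *x̄* and *s* vary from sample to sample, while *μ* and *σ* stay fixed for a given population.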