# Sampling

Who is being studied can, obviously, have an important influence on what relationships are found. The history of political polling shows what can happen when mistakes are made with sampling. A good sample is one that allows the researcher to generalize to the relevant population.

But why sample at all? Why not just study the population directly?

A study of the whole population would avoid the problems of sampling altogether. A direct study of the population is called a **CENSUS**. The biggest impediment to conducting a census is, as you might guess, the expense. The costs in time and money required for a census of even a relatively limited study population are usually prohibitive. And even with an unlimited budget and all the time in the world, it is often impossible to include every member of a population in a study.

Given these limitations, researchers know that they are stuck with the business of sampling.

The purpose of sampling is to allow the researcher to collect data from a subset of the population (the sample) and draw conclusions about the population as a whole. In quantitative terms, this means that researchers use statistics, which are values computed from the sample, to estimate parameters, which are the corresponding values in the population. For example, if we collect household income data from a sample of residents of Brooklyn and discover that the mean income is $18,456, we can construct an estimate of the population mean income based on the sample mean. Our estimate of the true population mean income might be something like $18,100 to $18,812. An interval estimate of this kind is called a confidence interval.
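To make the idea of an interval estimate concrete, here is a minimal sketch of how a confidence interval for a mean might be computed. It assumes a large sample and uses the normal approximation with z = 1.96 for roughly 95 percent confidence. The function name `mean_confidence_interval` and the simulated incomes are illustrative assumptions, not real Brooklyn data.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (lower, upper) bounds of an approximate 95% interval for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    margin = z * std_err
    return mean - margin, mean + margin

# Hypothetical data: 1,000 simulated household incomes (not real survey data)
rng = random.Random(42)
incomes = [rng.lognormvariate(9.7, 0.5) for _ in range(1000)]

lower, upper = mean_confidence_interval(incomes)
print(f"sample mean: {statistics.fmean(incomes):,.0f}")
print(f"approximate 95% interval: {lower:,.0f} to {upper:,.0f}")
```

The interval is centered on the sample mean, and it narrows as the sample grows because the standard error shrinks with the square root of the sample size.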

## The Logic of Sampling

There are two general kinds of sampling techniques: random and nonrandom. In a random sample, each element in the population has an equal, non-zero chance of being selected into the sample. There are a variety of random sampling strategies, ranging from the simple to the complex. (Indeed, one of the most lucrative jobs in quantitative research is that of the survey sampling statistician, who figures out how to employ complex sampling strategies in particular studies.) Nonrandom sampling techniques include any method of drawing a sample where the population elements do not have an equal chance of being selected.

Random sampling techniques are the best method of drawing samples representative of their populations. It may seem counter-intuitive that randomization is the best way to achieve a representative sample, but it is so. When a researcher tries to design a **PURPOSIVE SAMPLE**, one with a certain set of attributes, the complexity of the social world works against the researcher. It is very difficult to take into account all the relevant characteristics that make a sample representative of its population. As a result, the purposive sample is often systematically distorted, or **BIASED**.

The secret of random sampling is that by letting each population element have an equal chance of being included in the sample, the complexity of the social world takes care of itself. Elements with certain characteristics are no more or less likely than other elements to be included in the sample. For sufficiently large samples, the laws of probability tend to produce samples representative of their populations.
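A small simulation can illustrate how larger random samples tend to resemble their population. This is only a sketch under stated assumptions: the population of 100,000 people, the 30 percent attribute share, and the helper `largest_deviation` are all hypothetical.

```python
import random

rng = random.Random(0)

# Hypothetical population: 100,000 people, 30% of whom have some attribute
population = [1] * 30_000 + [0] * 70_000
true_share = sum(population) / len(population)

def largest_deviation(n, trials=100):
    """Largest gap between sample share and true share over repeated samples."""
    worst = 0.0
    for _ in range(trials):
        # rng.sample draws without replacement; every element is equally likely
        sample = rng.sample(population, n)
        worst = max(worst, abs(sum(sample) / n - true_share))
    return worst

for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6}: largest deviation in 100 samples = {largest_deviation(n):.3f}")
```

The worst-case gap between the sample share and the population share should shrink as the sample size grows, although any single small sample can still miss badly.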

There is no guarantee that a sample will be representative of its population. Sometimes random samples are not, and sometimes nonrandom samples are. But because there is no way to be certain in advance, random sampling techniques remain the most reliable way to produce representative samples.

Statistical testing involves making a judgment about the likelihood that a result is due to sampling error rather than a real social pattern. Most of the basic statistical tools used by sociologists assume that the data come from a random sample.

## Problems with Sampling Techniques

When researchers talk about biased samples, they do not mean samples of individuals with prejudiced attitudes. Bias, in methodological terms, refers to any systematic difference between the sample and its population. Sampling strategies sometimes introduce bias into the sample. The result is a serious problem for the interpretation of data. Remember, sampling is used to estimate parameters from sample data. If the sample is systematically different from its population, any result may be due to bias rather than a real pattern in the population.

In our example of the survey of residents of Brooklyn, if we decided on a telephone survey, our sample would be biased in terms of income, because households without telephones, which tend to have lower incomes, could never be selected. If we were studying attitudes about social policy, for example, this might yield results that were not truly representative of the population.
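The effect of a telephone-only design can be sketched with a simulation. Everything here is assumed for illustration: the simulated incomes, the helper `has_phone`, and especially the phone-ownership rates of 70 percent below the median income and 95 percent above it.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of household incomes (simulated, not real data)
population = [rng.lognormvariate(9.8, 0.6) for _ in range(50_000)]
median_income = statistics.median(population)

def has_phone(income):
    # Assumption for illustration only: households below the median income
    # have a phone 70% of the time, the others 95% of the time.
    p = 0.70 if income < median_income else 0.95
    return rng.random() < p

# A telephone survey can only reach households that have a phone
phone_frame = [inc for inc in population if has_phone(inc)]

random_sample = rng.sample(population, 1_000)
phone_sample = rng.sample(phone_frame, 1_000)

print(f"population mean:  {statistics.fmean(population):,.0f}")
print(f"random sample:    {statistics.fmean(random_sample):,.0f}")
print(f"telephone sample: {statistics.fmean(phone_sample):,.0f}")
```

Under these assumptions the telephone sample is drawn from a list that is already tilted toward higher incomes, so drawing it at random does not remove the bias.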

If a researcher discovers a bias in the sample, the data must be interpreted with caution. The researcher must point out the source of bias in the presentation of the data, so that others may evaluate the project in light of the problems with generalizing the results. Such work is flawed, to be sure, but can still contribute to the scientific literature.

There are several specific kinds of bias with which we should be familiar. One of the most common sources is **NONRESPONSE** bias. All survey research must deal with the problem of refusal to participate. Since our professional ethics mandate that respondents cannot be coerced into participation, researchers must make it clear to potential respondents that they are free to decline the invitation to participate. Even the best-designed samples are subject to nonresponse bias. With mailed questionnaires, for example, nonresponse rates can be quite high. Major national sample research projects, like the GSS and NES, typically get response rates of 80 percent or higher.

The problem with nonresponse emerges when the refusals are not random. If one or more groups of respondents tend to refuse at higher rates, those groups will be under-represented in the sample. This is a common problem when surveys are conducted in English with populations including non-English speakers.
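The arithmetic behind this under-representation can be sketched as follows. The function `respondent_share` and the numbers in the example are hypothetical, chosen only to show the mechanism.

```python
def respondent_share(pop_share, rate_group, rate_others):
    """Expected share of a group among those who actually respond.

    pop_share:   the group's share of the population (0 to 1)
    rate_group:  response rate within the group
    rate_others: response rate among everyone else
    """
    group = pop_share * rate_group
    others = (1 - pop_share) * rate_others
    return group / (group + others)

# Hypothetical numbers: a group makes up 20% of the population but responds
# at a rate of 40%, while everyone else responds at a rate of 80%.
share = respondent_share(0.20, 0.40, 0.80)
print(f"expected share among respondents: {share:.3f}")  # about 0.111
```

When both groups respond at the same rate, the function returns the population share unchanged, which is why nonresponse that is random with respect to group membership does not distort the sample's composition.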

Another source of bias is **SELECTIVE AVAILABILITY**. This occurs when data collection tends to exclude certain groups that are hard to reach. If, for example, we were conducting an interview study of residents of Brooklyn, and we did all our interviews during the day, we would likely miss people who work in Manhattan or one of the other boroughs. Our sample would be biased as a result. Similarly, if we conducted our interviews during the summer, we would be more likely to exclude vacationers. The result would probably be a bias with respect to social class.

When cluster sampling is used, **AREAL** bias is a potential problem. Areal bias occurs when certain areas are excluded from the sampling procedure. If we use neighborhoods as clusters in a multi-stage sample of residents of Brooklyn, and certain communities are not selected, we may be under-representing certain ethnic groups, since we know that neighborhoods are often ethnically homogeneous.
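A rough simulation shows how often a two-stage cluster sample can miss a group entirely when neighborhoods are homogeneous. The setup is entirely hypothetical: 20 neighborhoods of 500 residents, with group "A" living in 4 of them, and a helper named `cluster_sample_share_a`.

```python
import random

rng = random.Random(7)

# Hypothetical city: 20 neighborhoods of 500 residents each. Neighborhoods are
# homogeneous: group "A" lives in 4 of them, group "B" in the other 16.
neighborhoods = [["A"] * 500 for _ in range(4)] + [["B"] * 500 for _ in range(16)]
true_share = 4 / 20

def cluster_sample_share_a(n_clusters, per_cluster):
    """Share of group A in a two-stage sample: pick clusters, then residents."""
    chosen = rng.sample(neighborhoods, n_clusters)
    residents = [p for hood in chosen for p in rng.sample(hood, per_cluster)]
    return residents.count("A") / len(residents)

# Draw 1,000 samples of 5 neighborhoods with 40 residents from each
shares = [cluster_sample_share_a(n_clusters=5, per_cluster=40) for _ in range(1_000)]
missed = sum(1 for s in shares if s == 0) / len(shares)

print(f"true share of group A: {true_share:.2f}")
print(f"samples containing no one from group A: {missed:.0%}")
```

Each sample here contains 200 people, yet a sizable fraction of the samples include no one from group A, because whether the group appears depends on which few neighborhoods happen to be chosen.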

Finally, researchers must worry about **SELF-SELECTION** bias. Some people may choose not to participate in a study because they don't know about or don't care about the topic of the research. Surveys of political issues or politicians can be subject to this problem. Many nonrandom techniques are hampered by this source of bias.