Inferring a population mean

With all this variation in sample means, why do researchers use random selection of samples? Doesn't this lead to confusion? No; on the contrary, the only way we can use a statistic to estimate a parameter is by relying on the known, predictable patterns of variation that result from random selection of samples. An illustration should help you understand how this is done.

Imagine that we selected a large number of successive samples of 100 each from a population of 5,000 households. In addition, let's say that we obtained the number of persons in each household and calculated a mean for each sample. With these data, we could construct a frequency distribution of means, just as we did earlier with scores, except that now each value is a sample mean. This distribution is known as the sampling distribution of the mean, or simply the sampling distribution. It tells us how many times each value of the mean occurred.

If we actually did this, we would find that most means clustered near the center of the distribution, with fewer and fewer toward the low or high end. In fact, the sampling distribution would approximate the form of a normal distribution, like the one described at the end of Chapter 17. All the properties described earlier for a normal distribution apply to sampling distributions as well.

Let's go over that point again, because it is so important for understanding the process of making a statistical inference. We are referring to a distribution of means. The distribution of means from our many samples would take the form of a normal distribution. Further, the mean of that distribution (the mean of the means) would be as close as we could get to the value of the population mean without conducting an enumeration of the entire population. Selecting all the necessary samples to prove this point, however, is obviously impossible. Fortunately for us as researchers, it is also unnecessary.
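The thought experiment above can be sketched as a small simulation. This is a minimal Python sketch under stated assumptions, not data from the text: the 5,000 household sizes are invented (drawn uniformly from 1 to 12 persons), 2,000 samples stand in for "a large number," and the names `population`, `sampling_distribution`, and `means` are hypothetical. Even though this invented population is not itself normal, the sample means should pile up near the population mean, with roughly 68% and 95% of them within one and two standard deviations of the mean of means.

```python
import random
import statistics

# Hypothetical population: 5,000 household sizes of 1 to 12 persons.
# These values are invented for illustration; they are not data from the text.
random.seed(42)
population = [random.randint(1, 12) for _ in range(5000)]
population_mean = statistics.fmean(population)


def sampling_distribution(pop, sample_size=100, num_samples=2000):
    """Draw many simple random samples (without replacement) and return their means."""
    return [statistics.fmean(random.sample(pop, sample_size)) for _ in range(num_samples)]


means = sampling_distribution(population)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)  # spread of the sampling distribution

# Share of sample means within ±1 and ±2 standard deviations of the mean of means
within_1 = sum(abs(m - mean_of_means) <= sd_of_means for m in means) / len(means)
within_2 = sum(abs(m - mean_of_means) <= 2 * sd_of_means for m in means) / len(means)

print(f"population mean: {population_mean:.3f}")
print(f"mean of {len(means)} sample means: {mean_of_means:.3f}")
print(f"within ±1 SD: {within_1:.1%}, within ±2 SD: {within_2:.1%}")
```

Changing the seed or the invented population changes the exact figures, but the mean of the means should remain very close to the population mean.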
We can rely on probability theory and statistical formulas to estimate a population mean from just one sample. We can do this because the sampling distribution, like any distribution, has a standard deviation. Like any standard deviation, the standard deviation of the sampling distribution indicates how the means from the many samples are distributed around the mean of the distribution. The standard deviation of the sampling distribution is called the standard error of the mean, or simply the standard error, abbreviated S.E. Incidentally, standard error has nothing to do with errors made in measuring variables. It refers solely to variation among sample means arising from the chance variation that always occurs in random selection of samples.

Now here comes a very important point: because the sampling distribution is normal, the standard error behaves like the standard deviation of any normal distribution. Approximately 68% of the means in a sampling distribution will lie within ±1 standard error of the mean of the sampling distribution, about 95% within ±2 standard errors, and over 99% within ±3 standard errors.

You might say that this is nice to know, but how does it help us when we have only one mean and one standard deviation from our single random sample? Here is the beauty of the underlying mathematics: provided the sample was selected at random, we can use statistics from just that one sample to estimate the value of the mean in the population from which it was drawn. An illustration should help make this point clear. Let's say that analysis of household sizes in our random sample of 100 households produced a mean of 6.30 persons. Also, imagine that in calculating the variance for the sample we found the following:
To complete the calculation of the variance of the sample, we would divide the numerator, called the "sum of squares," by N. To get the standard deviation for the sample, we take the square root of the variance.

As a step toward calculating the standard error (S.E.), we need an estimate of the standard deviation of the variable in the population from which the sample was selected. This estimate is represented by the symbol ŝ, called "s hat." The formula for s hat is the same as the one for the sample standard deviation, except that the denominator is N - 1 instead of N:

$$\hat{s} = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$

With these data, we can estimate the population mean. We know that each sample mean will differ from the population mean by some amount. In estimating a population mean, however, we have only one sample mean. Because of the variation that occurs from one random sample to the next, we know that this single mean is only one estimate of the population mean. Absent any other information, however, this single mean is our best estimate of the mean household size in the population. Fortunately, the extent of variation among sample means can be estimated using the standard deviation of the variable we are analyzing. In sampling, this random variation among sample statistics is referred to as sampling error. When it is used to estimate a parameter, it becomes standard error.

This discussion of the basis for inferring a population mean draws on the excellent presentation by Dr. Trochim: Trochim, W. (2005). Selecting Statistics. Retrieved June 7, 2005.
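The single-sample calculation described in this section can be sketched in Python. This is an illustration only: the ten household sizes are made up (chosen so the mean is 6.30, like the text's example, but with N = 10 rather than 100, and with an invented spread), and the formula S.E. = ŝ / √N is the usual textbook formula for the standard error of the mean, assumed here because this section does not show it. The last step applies the ±2 standard error rule to turn the single mean into a rough 95% range for the population mean.

```python
import math
import statistics

# Hypothetical household sizes for one random sample (invented values, N = 10).
sample = [4, 7, 6, 9, 5, 8, 6, 7, 5, 6]
n = len(sample)

mean = sum(sample) / n
sum_of_squares = sum((x - mean) ** 2 for x in sample)

variance = sum_of_squares / n                # sample variance: divide by N
sd = math.sqrt(variance)                     # sample standard deviation
s_hat = math.sqrt(sum_of_squares / (n - 1))  # ŝ: estimated population SD, divide by N - 1
se = s_hat / math.sqrt(n)                    # standard error of the mean (assumed formula)

# statistics.stdev also divides by N - 1, so it should agree with ŝ
assert math.isclose(s_hat, statistics.stdev(sample))

# About 95% of sample means fall within ±2 S.E., so use mean ± 2 S.E. as a rough range
low, high = mean - 2 * se, mean + 2 * se
print(f"mean={mean:.2f} sd={sd:.3f} s_hat={s_hat:.3f} se={se:.3f}")
print(f"rough 95% range for the population mean: {low:.2f} to {high:.2f}")
```

With a sample as small as ten, a range based on the t distribution would be somewhat wider than ±2 S.E.; with the text's sample of 100, the ±2 rule is a close approximation.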