Sampling Variability of a Statistic
The statistic of a sampling distribution was discussed in Descriptive Statistics: Measures of the Center of the Data. How much the statistic varies from one sample to another is known as the sampling variability of a statistic. You typically measure the sampling variability of a statistic by its standard error. The standard error of the mean is an example of a standard error. The standard error is the standard deviation of the sampling distribution. In other words, it describes how much the statistic typically varies from sample to sample under repeated sampling. You will cover the standard error of the mean in the chapter The Central Limit Theorem (not now). The notation for the standard error of the mean is $\frac{\sigma}{\sqrt{n}}$, where σ is the standard deviation of the population and n is the size of the sample.
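As a quick illustration of the formula, the following minimal Python sketch computes the standard error of the mean. The function name `standard_error_of_mean` and the values σ = 10 with n = 25 and n = 100 are hypothetical, chosen only to show that a larger sample gives a smaller standard error.

```python
import math


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Return sigma / sqrt(n), the standard error of the mean."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


# Hypothetical values, not taken from the text.
se_small_sample = standard_error_of_mean(sigma=10, n=25)    # 10 / 5
se_large_sample = standard_error_of_mean(sigma=10, n=100)   # 10 / 10
print(se_small_sample, se_large_sample)
```

Quadrupling the sample size halves the standard error, because n enters the formula through its square root.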
NOTE
In practice, USE A CALCULATOR OR COMPUTER SOFTWARE TO CALCULATE THE STANDARD DEVIATION. If you are using a TI-83, 83+, or 84+ calculator, you need to select the appropriate standard deviation σx or sx from the summary statistics. We will concentrate on using and interpreting the information that the standard deviation gives us. However, you should study the following step-by-step example to help you understand how the standard deviation measures variation from the mean. The calculator instructions appear at the end of this example.
Example 2.33
In a fifth-grade class, the teacher was interested in the average age and the sample standard deviation of the ages of her students. The following data are the ages for a SAMPLE of n = 20 fifth-grade students; the ages are rounded to the nearest half year:
9, 9.5, 9.5, 10, 10, 10, 10, 10.5, 10.5, 10.5, 10.5, 11, 11, 11, 11, 11, 11, 11.5, 11.5, 11.5
The average age is 210.5 ÷ 20 = 10.525 years, or 10.53 years rounded to two places.
The variance may be calculated by using a table. Then the standard deviation is calculated by taking the square root of the variance. We will explain the parts of the table after calculating s.
| Data | Frequency | Deviations | Deviations² | (Frequency)(Deviations²) |
|---|---|---|---|---|
| x | f | (x – x̄) | (x – x̄)² | (f)(x – x̄)² |
| 9 | 1 | 9 – 10.525 = –1.525 | (–1.525)² = 2.325625 | 1 × 2.325625 = 2.325625 |
| 9.5 | 2 | 9.5 – 10.525 = –1.025 | (–1.025)² = 1.050625 | 2 × 1.050625 = 2.101250 |
| 10 | 4 | 10 – 10.525 = –0.525 | (–0.525)² = 0.275625 | 4 × 0.275625 = 1.1025 |
| 10.5 | 4 | 10.5 – 10.525 = –0.025 | (–0.025)² = 0.000625 | 4 × 0.000625 = 0.0025 |
| 11 | 6 | 11 – 10.525 = 0.475 | (0.475)² = 0.225625 | 6 × 0.225625 = 1.35375 |
| 11.5 | 3 | 11.5 – 10.525 = 0.975 | (0.975)² = 0.950625 | 3 × 0.950625 = 2.851875 |
| | | | | The total is 9.7375. |
Table 2.32
The last column simply multiplies each squared deviation by the frequency for the corresponding data value.
The sample variance, s², is equal to the sum of the last column (9.7375) divided by the total number of data values minus one (20 – 1):
$$s^2 = \frac{9.7375}{20 - 1} = 0.5125$$
The sample standard deviation s is equal to the square root of the sample variance:
$$s = \sqrt{0.5125} = 0.715891,$$
which is rounded to two decimal places, s = 0.72.
Typically, you do the calculation for the standard deviation on your calculator or computer. The intermediate results are not rounded. This is done for accuracy.
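The table calculation can also be reproduced in software. The sketch below is one possible way to do it in Python with only the standard library. The variable names are illustrative, and the values and frequencies are those of Example 2.33.

```python
import math

# Data values and frequencies from Example 2.33.
values = [9, 9.5, 10, 10.5, 11, 11.5]
freqs = [1, 2, 4, 4, 6, 3]

n = sum(freqs)                                          # 20 students
mean = sum(f * x for x, f in zip(values, freqs)) / n    # 10.525

# Last column of the table: frequency times squared deviation.
weighted_sq_devs = [f * (x - mean) ** 2 for x, f in zip(values, freqs)]

sample_variance = sum(weighted_sq_devs) / (n - 1)       # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

print(round(mean, 3), round(sample_variance, 4), round(sample_sd, 2))
```

Rounding happens only when printing, so the intermediate results keep full precision, as the text recommends.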
For the following problems, recall that value = mean + (#ofSTDEVs)(standard deviation).
Note that these formulas are derived by algebraically manipulating the z-score formulas, given either parameters or statistics.
- For a sample: x = x̄ + (#ofSTDEVs)(s)
- For a population: x = μ + (#ofSTDEVs)(σ)
- For this example, use x = x̄ + (#ofSTDEVs)(s) because the data are from a sample.
- Verify the mean and standard deviation on your calculator or computer.
- Find the value that is one standard deviation above the mean. Find (x̄ + 1s).
- Find the value that is two standard deviations below the mean. Find (x̄ – 2s).
- Find the values that are 1.5 standard deviations from (below and above) the mean.
Solution 2.33
Using the TI-83, 83+, 84, 84+ Calculator
- Clear lists L1 and L2. Press STAT 4:ClrList. Enter 2nd 1 for L1, the comma (,), and 2nd 2 for L2.
- Enter data into the list editor. Press STAT 1:EDIT. If necessary, clear the lists by arrowing up into the name. Press CLEAR and arrow down.
- Put the data values (9, 9.5, 10, 10.5, 11, 11.5) into list L1 and the frequencies (1, 2, 4, 4, 6, 3) into list L2. Use the arrow keys to move around.
- Press STAT and arrow to CALC. Press 1:1-VarStats and enter L1 (2nd 1), L2 (2nd 2). Do not forget the comma. Press ENTER.
- x̄ = 10.525.
- Use Sx because this is sample data (not a population): Sx = 0.715891.
- (x̄ + 1s) = 10.53 + (1)(0.72) = 11.25
- (x̄ – 2s) = 10.53 – (2)(0.72) = 9.09
- (x̄ – 1.5s) = 10.53 – (1.5)(0.72) = 9.45
- (x̄ + 1.5s) = 10.53 + (1.5)(0.72) = 11.61
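These answers all follow the same pattern, which a short function can capture. This is a sketch only. `value_at` is a hypothetical helper name, and it uses the rounded mean (10.53) and standard deviation (0.72) exactly as the solution does.

```python
def value_at(mean: float, sd: float, num_sds: float) -> float:
    """Return mean + (num_sds)(sd); num_sds is negative below the mean."""
    return mean + num_sds * sd


mean, s = 10.53, 0.72  # rounded values used in the solution

one_above = value_at(mean, s, 1)
two_below = value_at(mean, s, -2)
band = (value_at(mean, s, -1.5), value_at(mean, s, 1.5))

print(round(one_above, 2), round(two_below, 2), [round(v, 2) for v in band])
```

Passing the unrounded mean and standard deviation instead would follow the earlier advice about not rounding intermediate results, and could change the last digit of an answer.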
Try It 2.33
On a baseball team, the ages of each of the players are as follows:
21, 21, 22, 23, 24, 24, 25, 25, 28, 29, 29, 31, 32, 33, 33, 34, 35, 36, 36, 36, 36, 38, 38, 38, 40
Use your calculator or computer to find the mean and standard deviation. Then find the value that is two standard deviations above the mean.
Explanation of the standard deviation calculation shown in the table
The deviations show how spread out the data are about the mean. The data value 11.5 is farther from the mean than is the data value 11, which is indicated by the deviations 0.975 and 0.475. A positive deviation occurs when the data value is greater than the mean, whereas a negative deviation occurs when the data value is less than the mean. The deviation is –1.525 for the data value nine. If you add the deviations, the sum is always zero. For Example 2.33, there are n = 20 deviations, and summing the products of the frequencies and deviations shows this: 1(–1.525) + 2(–1.025) + 4(–0.525) + 4(–0.025) + 6(0.475) + 3(0.975) = 0. So you cannot simply add the deviations to get the spread of the data. By squaring the deviations, you make them nonnegative, and their sum is positive unless every data value equals the mean. The variance, then, is the average squared deviation.
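To see numerically why the raw deviations cannot measure spread, the following sketch (one possible check, using the Example 2.33 data) sums the frequency-weighted deviations and then the frequency-weighted squared deviations. It uses `fractions.Fraction` so that the first sum is exactly zero rather than a tiny floating-point remainder. That choice is an implementation detail of this sketch, not part of the text.

```python
from fractions import Fraction

# Example 2.33 data values written as exact fractions.
values = [Fraction(9), Fraction(19, 2), Fraction(10),
          Fraction(21, 2), Fraction(11), Fraction(23, 2)]
freqs = [1, 2, 4, 4, 6, 3]

n = sum(freqs)
mean = sum(f * x for x, f in zip(values, freqs)) / n   # 421/40 = 10.525

sum_of_deviations = sum(f * (x - mean) for x, f in zip(values, freqs))
sum_of_squared_deviations = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))

print(sum_of_deviations)                 # 0
print(float(sum_of_squared_deviations))  # 9.7375
```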
The variance is a squared measure and does not have the same units as the data. Taking the square root solves the problem. The standard deviation measures the spread in the same units as the data.
Notice that instead of dividing by n = 20, the calculation divided by n – 1 = 20 – 1 = 19 because the data are from a sample. For the sample variance, we divide by the sample size minus one (n – 1). Why not divide by n? The answer has to do with the population variance. The sample variance is an estimate of the population variance. Based on the theoretical mathematics that lies behind these calculations, dividing by (n – 1) gives a better estimate of the population variance.
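The difference between the two divisors can be seen directly in software. As a sketch, Python's standard `statistics` module provides `pvariance` (divides by n) and `variance` (divides by n – 1). Applying both to the 20 ages from Example 2.33 shows how the choice changes the result.

```python
import statistics

# The 20 ages from Example 2.33.
ages = [9, 9.5, 9.5, 10, 10, 10, 10, 10.5, 10.5, 10.5, 10.5,
        11, 11, 11, 11, 11, 11, 11.5, 11.5, 11.5]

divide_by_n = statistics.pvariance(ages)          # sum of squared deviations / n
divide_by_n_minus_1 = statistics.variance(ages)   # sum of squared deviations / (n - 1)

print(round(divide_by_n, 6), round(divide_by_n_minus_1, 6))
```

Dividing by n – 1 always gives the larger of the two values, and the gap shrinks as the sample size grows.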
NOTE
Your concentration should be on what the standard deviation tells us about the data. The standard deviation is a number that measures how far the data are spread from the mean. Let a calculator or computer do the arithmetic.
The standard deviation, s or σ, is either zero or larger than zero. The spread of the data is called its variability. The variability in data depends on the method by which the outcomes are obtained, for example, by measuring or by random sampling. When the standard deviation is zero, there is no spread; that is, all the data values are equal to each other. The standard deviation is small when all the data are concentrated close to the mean and larger when the data values show more variation from the mean. When the standard deviation is a lot larger than zero, the data values are very spread out about the mean; outliers can make s or σ very large.
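A small experiment can make these statements concrete. The three data sets below are hypothetical, invented only for illustration. The sketch compares their sample standard deviations using Python's `statistics.stdev`.

```python
import statistics

# Hypothetical data sets, invented for illustration only.
all_equal = [5, 5, 5, 5]        # no spread at all
close_to_mean = [4, 5, 5, 6]    # values concentrated near the mean
with_outlier = [4, 5, 5, 26]    # one value far from the rest

sd_equal = statistics.stdev(all_equal)
sd_close = statistics.stdev(close_to_mean)
sd_outlier = statistics.stdev(with_outlier)

print(sd_equal, round(sd_close, 2), round(sd_outlier, 2))
```

Changing a single value from 6 to 26 is enough to make the standard deviation many times larger, which is the sensitivity to outliers described above.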
The standard deviation, when first presented, can seem unclear. By graphing your data, you can get a better feel for the deviations and the standard deviation. You will find that in symmetrical distributions, the standard deviation can be very helpful, but in skewed distributions, the standard deviation may not be much help. The reason is that the two sides of a skewed distribution have different spreads. In a skewed distribution, it is better to look at the first quartile, the median, the third quartile, the smallest value, and the largest value. Because numbers can be confusing, always graph your data. Display your data in a histogram or a box plot.
Example 2.34
Use the following data (first exam scores) from Susan Dean's spring precalculus class.
33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100
- Create a chart containing the data, frequencies, relative frequencies, and cumulative relative frequencies to three decimal places.
- Calculate the following to one decimal place using a TI-83+ or TI-84 calculator:
  - The sample mean
  - The sample standard deviation
  - The median
  - The first quartile
  - The third quartile
  - IQR
- Construct a box plot and a histogram on the same set of axes. Make comments about the box plot, the histogram, and the chart.
Solution 2.34
- See Table 2.33.
- Entering the data values into a list in your graphing calculator and then selecting Stat, Calc, and 1-Var Stats will produce the one-variable statistics you need.
- The x-axis goes from 32.5 to 100.5; the y-axis goes from –2.4 to 15 for the histogram. The number of intervals is 5, so the width of an interval is (100.5 – 32.5) divided by 5, which equals 13.6. Endpoints of the intervals are as follows:
  - The starting point is 32.5; then 32.5 + 13.6 = 46.1, 46.1 + 13.6 = 59.7, 59.7 + 13.6 = 73.3, 73.3 + 13.6 = 86.9, and 86.9 + 13.6 = 100.5, the ending value.
  - No data values fall on an interval boundary.
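The interval arithmetic above can be checked with a short script. The following is a sketch under the same choices as the solution (start 32.5, end 100.5, five intervals); the variable names are illustrative. It also counts how many scores land in each interval, which gives the heights of the histogram bars.

```python
# First exam scores from Example 2.34.
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

low, high, num_intervals = 32.5, 100.5, 5
width = (high - low) / num_intervals                  # 13.6

# Interval endpoints, rounded to one decimal place for display.
edges = [round(low + i * width, 1) for i in range(num_intervals + 1)]

# Count the scores in each interval.
counts = [0] * num_intervals
for x in scores:
    counts[int((x - low) // width)] += 1

# Scores that sit exactly on an endpoint (expected to be none).
on_boundary = [x for x in scores if x in edges]

print(edges)
print(counts)
print(on_boundary)
```

Because every score is a whole number and every endpoint ends in a nonzero decimal, no score can fall on a boundary, so each one belongs to exactly one interval.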
The long left whisker in the box plot is reflected in the left side of the histogram. The spread of the exam scores in the lower 50 percent is greater (73 – 33 = 40) than the spread in the upper 50 percent (100 – 73 = 27). The histogram, box plot, and chart all reflect this. There are a substantial number of A and B grades (80s, 90s, and 100). The histogram clearly shows this. The box plot shows us that the middle 50 percent of the exam scores (IQR = 29) are Ds, Cs, and Bs. The box plot also shows us that the lower 25 percent of the exam scores are Ds and Fs.
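The median of 73 and the IQR of 29 cited above can be checked in software as well as on a calculator. The sketch below uses Python's `statistics` module. One caveat is that quartile conventions differ between tools: `statistics.quantiles` defaults to its "exclusive" method, which is not guaranteed to match a TI calculator's quartiles for every data set, although for these 31 scores it reproduces the median and IQR given in the text.

```python
import statistics

# First exam scores from Example 2.34.
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)                        # sample standard deviation (n - 1)
q1, median, q3 = statistics.quantiles(scores, n=4)   # default method="exclusive"
iqr = q3 - q1

print(round(mean, 1), round(sd, 1), median, q1, q3, iqr)
```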
| Data | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 33 | 1 | 0.032 | 0.032 |
| 42 | 1 | 0.032 | 0.064 |
| 49 | 2 | 0.065 | 0.129 |
| 53 | 1 | 0.032 | 0.161 |
| 55 | 2 | 0.065 | 0.226 |
| 61 | 1 | 0.032 | 0.258 |
| 63 | 1 | 0.032 | 0.290 |
| 67 | 1 | 0.032 | 0.322 |
| 68 | 2 | 0.065 | 0.387 |
| 69 | 2 | 0.065 | 0.452 |
| 72 | 1 | 0.032 | 0.484 |
| 73 | 1 | 0.032 | 0.516 |
| 74 | 1 | 0.032 | 0.548 |
| 78 | 1 | 0.032 | 0.580 |
| 80 | 1 | 0.032 | 0.612 |
| 83 | 1 | 0.032 | 0.644 |
| 88 | 3 | 0.097 | 0.741 |
| 90 | 1 | 0.032 | 0.773 |
| 92 | 1 | 0.032 | 0.805 |
| 94 | 4 | 0.129 | 0.934 |
| 96 | 1 | 0.032 | 0.966 |
| 100 | 1 | 0.032 | 0.998 (Why isn't this value 1?) |
Table 2.33
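A chart like Table 2.33 can be generated from the raw scores. The following sketch is one possible way to do it, with illustrative variable names. It rounds each relative frequency to three decimal places before accumulating, which is how the table's cumulative column was built. Comparing the last cumulative value with `exact_total` is one way to explore the question posed in the table's final row.

```python
from collections import Counter

# First exam scores from Example 2.34.
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

n = len(scores)
freq = Counter(scores)

rows = []
cumulative = 0.0
for value in sorted(freq):
    rel = round(freq[value] / n, 3)            # rounded first, as in the table
    cumulative = round(cumulative + rel, 3)    # running total of the rounded values
    rows.append((value, freq[value], rel, cumulative))

for row in rows:
    print(row)

# Without rounding, the relative frequencies sum to exactly 1.
exact_total = sum(freq.values()) / n
print(exact_total)
```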
Try It 2.34
The following data show the number of different types of pet food that stores in the area carry:
6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 9, 9, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12
Calculate the sample mean and the sample standard deviation to one decimal place using a TI-83+ or TI-84 calculator.