Notes to Help Your Interpretation of the Standard Deviation
________________________________________________________________
1. σ Is Not the Average Distance Between Points in a Data Set
It is the "average" distance between the points in the data set and the mean, µ. To average the distances between the points themselves, we would first find all the differences between pairs of points and then average those. A simple example follows:

X       X-µ     (X-µ)²
1       -3        9
3       -1        1
6        2        4
6        2        4
----------------------
16               18

µ = 16/4 = 4
σ = √(18/4) = √4.5 ≈ 2.1213

Differences between every pair of X values (row value minus column value):

X Values      1            3            6            6
1          1-1 = 0     1-3 = -2     1-6 = -5     1-6 = -5
3          3-1 = 2     3-3 = 0      3-6 = -3     3-6 = -3
6          6-1 = 5     6-3 = 3      6-6 = 0      6-6 = 0
6          6-1 = 5     6-3 = 3      6-6 = 0      6-6 = 0

Notice that if we average these differences, we get zero, because the positive and negative differences cancel. If we consider only distances (i.e., we use the absolute values of the differences), we get 36/16 = 2.25, which is not the same as σ = 2.1213. If we square each difference, average the squared differences, and then take the square root, we obtain √(144/16) = 3. Again, this differs from σ.
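
The comparison above can be checked with a short Python sketch. It assumes the population formula for the standard deviation (dividing by N, as in the example table) and counts all 16 ordered pairs of points, including each point paired with itself. The variable names are illustrative only.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

data = [1, 3, 6, 6]

# All 16 ordered pairwise differences (first value minus second value),
# including each point paired with itself, as in the table above.
diffs = [a - b for a, b in product(data, repeat=2)]

sigma = pstdev(data)                          # population standard deviation, sqrt(18/4)
mean_signed = mean(diffs)                     # signed differences cancel out
mean_abs = mean(abs(d) for d in diffs)        # 36/16
rms_diff = sqrt(mean(d * d for d in diffs))   # sqrt(144/16)

print(round(sigma, 4), mean_signed, mean_abs, rms_diff)
```

None of the three pairwise averages equals the standard deviation, which is the point of this note.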
________________________________________________________________

2. σ Is Not a Limit or Maximum Deviation
It is an "average" deviation based on the deviations of all the data points. Generally, some values are less than σ from µ and some are farther than σ from µ.
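
As a quick illustration, the sketch below reuses the data set {1, 3, 6, 6} from note 1 and sorts its points by whether they fall closer to or farther from the mean than one standard deviation. It assumes the population formula; the names `closer` and `farther` are illustrative only.

```python
from statistics import mean, pstdev

data = [1, 3, 6, 6]
mu = mean(data)        # 4
sigma = pstdev(data)   # about 2.1213

# Points whose distance from the mean is less than, or more than, sigma.
closer = [x for x in data if abs(x - mu) < sigma]
farther = [x for x in data if abs(x - mu) > sigma]

print("closer than sigma:", closer)
print("farther than sigma:", farther)
```

Here the point 1 deviates from the mean by 3, which exceeds the standard deviation, while 3, 6, and 6 deviate by less.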
________________________________________________________________

3. Check for Magnitude of σ

Recall that σ is the "average" distance of a point in the data set from the mean of the data set, µ. Because it only considers distance, the direction of deviation (+ or -) is inconsequential.

Let σC = a computed value for σ.

1. If σC > range of the data set, then σC cannot be correct.

2. If σC > maximum possible deviation
         = the larger of |smallest X value - µ| and |largest X value - µ|,

           then σC cannot be correct.

3. If σC < minimum possible deviation
         = |closest X value to µ - µ|,

           then σC cannot be correct.
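
The three checks can be sketched as a small Python function. The function name `sigma_is_plausible` and its parameter `sigma_c` are hypothetical, introduced only for this illustration. The checks assume the population formula (dividing by N); a sample standard deviation computed with N - 1 can exceed the maximum deviation in very small data sets. Passing the checks does not prove a value is correct; it only rules out some impossible values.

```python
from statistics import mean, pstdev

def sigma_is_plausible(data, sigma_c):
    """Return True if sigma_c passes the three magnitude checks for data."""
    mu = mean(data)
    deviations = [abs(x - mu) for x in data]
    data_range = max(data) - min(data)
    if sigma_c > data_range:         # check 1: larger than the range
        return False
    if sigma_c > max(deviations):    # check 2: larger than the maximum deviation
        return False
    if sigma_c < min(deviations):    # check 3: smaller than the minimum deviation
        return False
    return True

data = [1, 3, 6, 6]
print(sigma_is_plausible(data, pstdev(data)))  # the correctly computed value
print(sigma_is_plausible(data, 4.5))           # variance reported by mistake
print(sigma_is_plausible(data, 0.5))           # smaller than every deviation
```

For this data set the deviations from the mean are 3, 1, 2, and 2, so any plausible value must lie between 1 and 3.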
________________________________________________________________

4. The Interval µ ± σ

The two points µ - σ and µ + σ are the endpoints of an interval. Although σ is the "average" distance between points in the data set and the mean µ, the points µ - σ and µ + σ do not represent average values of the original data set itself.

However, sometimes we want to know about the values in this interval. If the data set is normal (bell-shaped), then about 68% of all the values in the data set lie within this interval, so about 32% must lie outside it. Hence, µ ± σ cannot be a limit on the range of the data set. If the population is not normal, then we cannot generalize about the percent of the values in the interval. In fact, it is possible that more values lie outside this interval than inside it. For example, the standard deviation, σ, of the data set {1, 4, 7} is about 2.45 and µ = 4, so the interval runs from µ - σ = 4 - 2.45 = 1.55 to µ + σ = 4 + 2.45 = 6.45. Because 1 and 7 lie outside the interval (1.55, 6.45), 2/3 or 66.7% of the data set values lie outside the interval.
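
Both claims can be checked with a short Python sketch. It assumes the population formula for the standard deviation (about 2.45 for {1, 4, 7}) and uses the standard library's `NormalDist` for the bell-shaped case. The helper name `fraction_within_one_sigma` is hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def fraction_within_one_sigma(data):
    """Fraction of the values lying strictly inside (mu - sigma, mu + sigma)."""
    mu = mean(data)
    sigma = pstdev(data)
    inside = [x for x in data if mu - sigma < x < mu + sigma]
    return len(inside) / len(data)

# Small non-normal data set: only the value 4 lies inside the interval.
frac_small = fraction_within_one_sigma([1, 4, 7])

# Normal distribution: probability of falling within one sigma of the mean.
z = NormalDist()
frac_normal = z.cdf(1) - z.cdf(-1)

print(round(frac_small, 3), round(frac_normal, 4))
```

For {1, 4, 7} only one third of the values fall inside the interval, compared with roughly 68% for a normal distribution.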
 
 
