| X | X - µ | (X - µ)² |
|---|---|---|
| 1 | -3 | 9 |
| 3 | -1 | 1 |
| 6 | 2 | 4 |
| 6 | 2 | 4 |
| Sum = 16 | | Sum = 18 |
µ = 16/4 = 4
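The table can be rebuilt in a few lines of code, which makes the two column sums easy to verify. This is a minimal sketch in plain Python. The data set {1, 3, 6, 6} is read off the table's X column, and the variable names are illustrative only.

```python
# Rebuild the deviation table for the data set read from the X column.
data = [1, 3, 6, 6]

mu = sum(data) / len(data)  # 16 / 4 = 4.0

rows = []
for x in data:
    dev = x - mu                      # X - µ
    rows.append((x, dev, dev ** 2))   # (X, X - µ, (X - µ)²)

total_x = sum(r[0] for r in rows)     # 16
total_sq = sum(r[2] for r in rows)    # 18.0

for x, dev, sq in rows:
    print(f"{x:>3} {dev:>6.1f} {sq:>8.1f}")
print(f"sum X = {total_x}, sum (X - mu)^2 = {total_sq}, mu = {mu}")
```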
[Figure: plot of the X values]
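Notice that if we try to average these distances, we get zero: (-3 - 1 + 2 + 2)/4 = 0/4 = 0. If we consider only the size of each distance (i.e., we use the absolute values of the distances), we get (3 + 1 + 2 + 2)/4 = 8/4 = 2, which is not the same as σ = 2.1213. If we square each distance, average the squared distances, and then take the square root, we obtain √(18/4) = √4.5 ≈ 2.1213, which is exactly σ. So σ is an "average" distance from µ only in this root-mean-square sense, not the ordinary average of the distances.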
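The three ways of "averaging" the distances can be compared directly. This sketch assumes σ here is the population standard deviation (dividing by n = 4), which is what the value 2.1213 = √(18/4) indicates. It uses `statistics.pstdev` from Python's standard library as an independent cross-check; the other names are illustrative.

```python
import math
import statistics

data = [1, 3, 6, 6]
mu = sum(data) / len(data)                 # 4.0
devs = [x - mu for x in data]              # [-3.0, -1.0, 2.0, 2.0]

mean_dev = sum(devs) / len(devs)                            # signed deviations cancel: 0.0
mean_abs_dev = sum(abs(d) for d in devs) / len(devs)        # 8 / 4 = 2.0
rms_dev = math.sqrt(sum(d * d for d in devs) / len(devs))   # sqrt(18 / 4), about 2.1213

sigma = statistics.pstdev(data)            # population standard deviation (divides by n)

print(mean_dev, mean_abs_dev, round(rms_dev, 4), round(sigma, 4))
```

Only the root-mean-square value agrees with the population standard deviation; the plain mean of the signed deviations is always zero.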
________________________________________________________________
2. σ is not a limit or maximum DEVIATION! It's an average deviation based on the deviations of all the data points. Generally, some values are less than σ from µ and some are farther than σ from µ.
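For the data set in the table, splitting the values by their distance from µ shows that σ is not a maximum. This is a small sketch assuming the population standard deviation (`statistics.pstdev`) is the σ meant here; the list names are illustrative.

```python
import statistics

data = [1, 3, 6, 6]
mu = statistics.mean(data)        # 4
sigma = statistics.pstdev(data)   # population standard deviation, about 2.1213

closer = [x for x in data if abs(x - mu) < sigma]
farther = [x for x in data if abs(x - mu) > sigma]

print("within sigma of mu:", closer)   # [3, 6, 6]
print("beyond sigma of mu:", farther)  # [1]
```

The value 1 lies 3 units from µ, farther than σ, while 3, 6 and 6 lie closer than σ.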
________________________________________________________________
3. Check the magnitude of σ
Recall that σ is the "average" distance of a point in the data set from the mean of the data set, µ. Because it only considers distance, the direction of deviation (+ or -) is inconsequential.
Let σC be a computed value for σ.
1. If σC > range of the data set, then σC cannot be correct.
2. If σC > maximum possible deviation = the larger of |smallest X value - µ| and |largest X value - µ|, then σC cannot be correct.
3. If σC < minimum possible deviation = |closest value to µ - µ|, then σC cannot be correct.
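The three checks can be bundled into one sanity test for a hand-computed σ. This is a sketch only: `plausible_sigma` is a hypothetical helper name, it compares values exactly (no rounding tolerance), and passing the checks does not prove σC is right; failing any one of them proves it is wrong.

```python
import statistics

def plausible_sigma(data, sigma_c):
    """Hypothetical helper: apply the three magnitude checks to sigma_c.

    Returns False if any check fails. Returning True only means sigma_c
    is not ruled out; it does not prove sigma_c is correct.
    """
    mu = statistics.mean(data)
    distances = [abs(x - mu) for x in data]
    data_range = max(data) - min(data)
    max_dev = max(distances)   # larger of |smallest - mu| and |largest - mu|
    min_dev = min(distances)   # |closest value to mu - mu|

    if sigma_c > data_range:   # check 1: larger than the range
        return False
    if sigma_c > max_dev:      # check 2: larger than the maximum deviation
        return False
    if sigma_c < min_dev:      # check 3: smaller than the minimum deviation
        return False
    return True

data = [1, 3, 6, 6]
print(plausible_sigma(data, 2.1213))  # True
print(plausible_sigma(data, 3.5))     # False: larger than the maximum deviation, 3
print(plausible_sigma(data, 0.5))     # False: smaller than the minimum deviation, 1
```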
________________________________________________________________
4. The Interval µ ± σ
The two points µ - σ and µ + σ are the endpoints of an interval. Although σ is the "average" distance between points in the data set and the mean µ, the points µ - σ and µ + σ do not represent average values of the original data set itself.
However, sometimes we want to know about the values in this interval. If the data set is normal (bell-shaped), then about 68% of all the values in the data set lie within this interval, so about 32% must lie outside it. Hence, µ ± σ cannot be a limit on the range of the data set. If the population is not normal, then we cannot generalize about the percentage of values in the interval. In fact, it is possible that more values lie outside this interval than inside it. For example, the standard deviation σ of the data set {1, 4, 7} is √6 ≈ 2.45 and µ = 4, so the interval runs from µ - σ = 4 - 2.45 = 1.55 to µ + σ = 4 + 2.45 = 6.45. Because 1 and 7 lie outside the interval (1.55, 6.45), 2/3, or 66.7%, of the data set values lie outside the interval.
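The contrast between a normal population and the small data set {1, 4, 7} can be checked numerically. This sketch assumes σ is the population standard deviation (`statistics.pstdev`) and counts values in the closed interval [µ - σ, µ + σ]; `fraction_within_one_sigma` is a hypothetical helper name. The normal-curve figure comes from `statistics.NormalDist` (Python 3.8+).

```python
import statistics

def fraction_within_one_sigma(data):
    """Hypothetical helper: share of values in [mu - sigma, mu + sigma]."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)            # population standard deviation
    inside = [x for x in data if mu - sigma <= x <= mu + sigma]
    return len(inside) / len(data)

data = [1, 4, 7]
print(statistics.pstdev(data))                 # sqrt(6), about 2.449
print(fraction_within_one_sigma(data))         # only 4 lies inside: 1/3

# For an exactly normal population, the share within one sigma is about 68%.
std_normal = statistics.NormalDist(mu=0.0, sigma=1.0)
print(round(std_normal.cdf(1) - std_normal.cdf(-1), 4))   # 0.6827
```

Here only one of the three values falls inside µ ± σ, far below the roughly 68% that holds for a normal population.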