Intuition and Confidence Intervals

Further Expositions on Intuition and Confidence Intervals

Here are some of my ideas about the intuition questions. I thought they might be helpful to you.

First, we must understand that a confidence interval is a range of values that we hope includes the true population value; for now, that value is the population mean. We might think of it as an interval of guesses for the population mean. The interval is based on the information in the sample we randomly select from the population. The width of this interval depends on three things: the confidence we want to have that the interval includes the population mean, the size of the sample we select (n), and the standard deviation, or dispersion, of the population values (we usually estimate this with S, the standard deviation of the sample of values we have collected).

In the exercises we hold two of these three variables constant, to see how the width of the confidence interval changes.

If the confidence level changes (while S and n do not change), we are changing the likelihood or probability that our estimation process produces an interval that contains the true mean. If you want to be more confident that an interval of values includes the true value (population mean), then make the interval wider, so that it contains more possible values as your guesses. For instance, if I guess your age is between 20 and 22 versus guessing it is between 15 and 35, in which case do you think I am more likely to include your age?
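Here is a small Python sketch of that idea, in case seeing numbers helps. It assumes the large-sample normal (z) approximation, where the width is 2 * z * S / sqrt(n); your course may use a t multiplier instead, which gives somewhat wider intervals but the same pattern. The function name ci_width and the values S = 10 and n = 25 are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(confidence, s, n):
    """Width of a two-sided interval for the mean (normal approximation)."""
    # z cuts off (1 - confidence) / 2 in each tail of the standard normal.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * s / sqrt(n)


# Hypothetical values: S and n are held fixed, only the confidence level moves.
s, n = 10.0, 25
widths = {level: ci_width(level, s, n) for level in (0.90, 0.95, 0.99)}

for level, width in widths.items():
    print(f"{level:.0%} confidence: width = {width:.2f}")
```

The printed widths grow as the confidence level goes up, just like the wider age guess.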

Basically, I emphasize that changing n changes the amount of information we have about the population. More information should improve the chances for a good estimate. Also, a larger n means our sample should be more representative of the population, again improving our chances for a good estimate. Others look at a larger sample as decreasing the impact of outliers that might appear in the sample.

As n increases, a 95% (or any other fixed level) confidence interval reflects this added accuracy by shrinking: we need fewer values as guesses to be 95% confident we’ve got the population mean inside our interval. The confidence level doesn’t change, but we can be just as confident with a narrower interval. Also, S is not changing here, so the dispersion of the population, which would make it more or less difficult to get a good estimate, plays no part in the change.
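The same kind of sketch works for sample size. Again this assumes the normal approximation (width = 2 * z * S / sqrt(n)), and the name width_95, the value S = 10, and the sample sizes are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

# z multiplier for a 95% two-sided interval (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)

# Hypothetical value: S and the confidence level are held fixed, only n moves.
S = 10.0


def width_95(n):
    """Width of a 95% interval for the mean at sample size n."""
    return 2 * Z_95 * S / sqrt(n)


widths_by_n = {n: width_95(n) for n in (25, 100, 400)}

for n, width in widths_by_n.items():
    print(f"n = {n:3d}: width = {width:.2f}")
```

Because n sits under a square root, it takes four times as much data to cut the width in half.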

The larger the standard deviation, the more dispersed and unalike (nonhomogeneous) the population values are, so it will be more difficult to obtain good estimates. When we select values for our sample from the population with a larger standard deviation, we are more likely to get values that are further from the population mean, so it will be more difficult to come up with a good estimate of the population mean with such values.

Thus, as S increases when we work with the same amount of population information (n) and the same level of confidence, the intervals must widen for us to be just as sure that the interval contains the true value.
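To see the effect of S with actual sample values, here is a sketch that builds the interval from two made-up samples with the same mean (50) and the same n (6) but different spread. It assumes the normal approximation again; with samples this small a course would normally use a t multiplier, which widens both intervals but does not change the comparison. The function name mean_interval and both samples are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_interval(sample, confidence=0.95):
    """Two-sided interval for the mean, using the sample standard deviation S."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - half_width, center + half_width


# Hypothetical samples: same mean and same n, very different dispersion.
tight = [48, 49, 50, 50, 51, 52]
spread = [30, 40, 50, 50, 60, 70]

tight_low, tight_high = mean_interval(tight)
spread_low, spread_high = mean_interval(spread)

print(f"small S: ({tight_low:.1f}, {tight_high:.1f})")
print(f"large S: ({spread_low:.1f}, {spread_high:.1f})")
```

The second sample has ten times the standard deviation of the first, so its interval comes out ten times as wide.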

I know it is tempting to postpone writing your own versions, but it is easier to make mistakes now, get some feedback, and correct them than to make them all on the next test.