Taking Statistics Tests & Writing

It is not enough to know statistics in your head.  You must be able to communicate the information you glean from numbers and situations in a clear, concise, and precise manner to other people.  Many decisions that we make and that businesses make are based on numbers, so a clear understanding of what those numbers tell us is critical.  In addition, business people generally have little time to decipher long detailed explanations and certainly no time to wade through murky, ambiguous, and redundant writing.  They need a clear picture after one read—possibly a scan.  To help you write better for yourself and your reader, I have a few comments and examples of good answers to share with you about the first test.

1. Question 1 had several parts that related to a histogram or frequency distribution.  Parts a and b required you to visualize the actual data that is summarized.  First, you write down three potential values in the data set along with words (“units”) that give context and meaning to each number.  If you had trouble doing this, then turn to some data sets in chapter 2 and get some experience with actual data.  After you read the context for the problem, describe the units that go with each value.  For instance, the values we used in the first class included magnitudes such as 20, 100, and 200.  Alone, these numbers do not tell us anything.  Suppose we supply the word “miles” with each value and add that they are distances from campus to your hometowns.  Now they become reasonable and understandable values.  Units also allow us to check for unreasonable and unusual values.  If every value were 20, it would mean everyone lived 20 miles from campus.  We might be suspicious if we wanted to use this data set to make a decision related to distances students travel to get home.

To obtain some understanding of the distances everyone travels to get home, we can construct a frequency distribution and histogram.  Instead of going through and studying each value individually, we try to organize and summarize them, so we quickly get an image of what they are like.  Is there a pattern?  To construct a frequency distribution/histogram, we make up some bins and find where the values in the data set belong in these bins.  To practice, construct a few bins for the data set you chose.  Then think about how you would label the horizontal axis with the same units that you gave the values in Part a.  For instance, when we use the hometown distances, we want to know how many of the distances were 50 miles or less.  How many were more than 50 miles but less than or equal to 100 miles?

Finally, realize that the frequency reflects the number of items that you accumulate in each bin.  In our example, each student submitted a distance, so we are counting up how many students travel more than 50 miles and at most 100 miles.  If you have daily output values for 1,000 days, then the frequency for a bin tells you the number of days on which the output value fell into that bin.  Practice with a few data sets should help you visualize this process and make frequency distributions and histograms less obscure and abstract.
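If it helps to see the counting done step by step, here is a minimal sketch in Python.  The distances are made up for illustration (they are not from the test or from our class), and the function name frequency_distribution and the bin limits are my own choices.  Each bin holds distances that are more than its lower limit and at most its upper limit.

```python
# Hypothetical hometown distances in miles (made up for illustration).
distances = [20, 100, 200, 45, 50, 75, 130, 10, 95, 160]

# Assumed bin upper limits in miles: (0, 50], (50, 100], (100, 150], (150, 200].
upper_limits = [50, 100, 150, 200]


def frequency_distribution(values, upper_limits):
    """Count how many values fall in each bin (lower, upper]."""
    counts = [0] * len(upper_limits)
    for value in values:
        lower = 0
        for i, upper in enumerate(upper_limits):
            if lower < value <= upper:
                counts[i] += 1
                break
            lower = upper
    return counts


counts = frequency_distribution(distances, upper_limits)

# Print a text histogram; each * stands for one student.
lower = 0
for upper, count in zip(upper_limits, counts):
    label = f"{lower} < miles <= {upper}"
    print(f"{label:<20} {'*' * count}  ({count} students)")
    lower = upper
```

Notice that the labels carry the units (miles) and that each frequency counts students, not miles.  With these made-up values, four students fall in the first bin, three in the second, one in the third, and two in the fourth.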

2. Part c of Question 1 asked about the mean and median as indicators of the central tendency of the data set shown in the frequency distribution or histogram.  To answer this question properly, you needed to investigate the given information about the data set and compare these two values in this situation without calculating them.  From your investigation, you were to choose the mean or median (or both) and justify your choice.

From our work together and your practice on your own or with other students, I expected you to know that frequency distributions and histograms suggest patterns of the data.  In a data set that is basically symmetric, these two measures of central tendency will usually be about the same.  They begin to differ when the data becomes irregular.  If the data is skewed or there are outliers, then we can predict that the mean will move away from the mass of the values in the data set and in the direction of the outliers or the skew to accommodate such values.  It becomes a poorer measure of the center of such a data set.  Think of your grades in different classes and how different scoring behavior (bell-shaped grades, skewed grades, grades with an outlier) affects the different measures of the central tendency of the grades.

I give you the opportunity to place the general information (italicized in the last paragraph) on your 3x5 index card.  Consequently, I give little or no credit for a response that merely restates it.  I need evidence of your mind working and applying this general information.  Otherwise, you could take down my words from class, copy them on your 3x5 card, and simply transcribe them to your test paper.  This is mimicry, and I need evidence of your understanding and competency.  For the test question, you needed to tell me something about the pattern of the data from the given information.  Then you needed to relate the consequences of such a pattern on the mean and median as measures of central tendency for this data set.

Also, you needed to lead your reader through to your conclusion.  Readers need to follow the logic and understand how you reached your conclusion.  I always face a dilemma in grading when you give me some pieces of the reasoning and a correct conclusion.  If you don’t complete the argument, then the reader must, or it remains incomplete and unconvincing.  Do I assume that you know the unstated pieces?  I have to watch that it’s not me supplying the missing statements and giving you credit when there is really no evidence that you recognize the omissions.  I try to read carefully what you say, follow your argument, and determine whether the argument is complete and leads to a correct assessment of the situation.  When it is not clear, you establish very little credibility and earn a poor evaluation.
 
Below are some responses that received full credit for this question.  Notice how they meet the above criteria.

11:00 class

The median, because the histogram is asymetrically [sic] right skewed which means that the mean exceeds the median.  This would not give you an accurate measure because the movies for the most part are all under 140 but there are some longer.

The median would be a better measure of central tendency.  This is because the data set is skewed right and the outliers would pull the mean down the tail of the curve.

Median, because the mean is affected by the outliers and the median isn’t.  The graph shows that the data is skewed to the right meaning that there are outliers on the right.

2:00 class

If made into a graph, the data would show a bell-shaped curve with minimal skewity [sic]; therefore the median and mean will have a similar value.

There are no outliers which would pull the data set and skew it.  It would basically be bell shaped and this would allow the mean or median to forecast data the same, b/c in bell shaped data the mean and median are the same.

The data seems symmetrical and bell shaped.  In bell shaped graphs, the mean and median are the same number.  In this case, since there doesn’t seem to be any outliers to inflate or deflate the mean, either mean or median should equally representative of the data.
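To check the pattern behind these answers for yourself, try a few small data sets.  The sketch below is only an illustration in Python: the grades are made up (they are not the test data), and the three lists follow the grade patterns suggested in item 2: bell-shaped grades, grades with an outlier, and skewed grades.

```python
from statistics import mean, median

# Hypothetical grades, made up for illustration only.
bell_shaped = [60, 70, 75, 80, 80, 80, 85, 90, 100]
with_low_outlier = [10, 70, 75, 80, 80, 80, 85, 90, 100]
right_skewed = [60, 62, 65, 65, 68, 70, 75, 85, 98]

examples = [
    ("bell shaped", bell_shaped),
    ("one low outlier", with_low_outlier),
    ("right skewed", right_skewed),
]

# Compare the two measures of central tendency for each pattern.
for name, grades in examples:
    print(f"{name:<16} mean = {mean(grades):.1f}  median = {median(grades)}")
```

With these numbers, the bell-shaped grades have a mean and a median of 80.  Replacing the lowest grade with a 10 drops the mean to about 74.4 while the median stays at 80.  In the right-skewed list, the mean (72) sits above the median (68), pulled toward the long tail of high grades.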