LECTURE NOTES:  WEEK 3

Topics:
      Resources--Links and Sites:  NIST and Others
      Review of Problem Set 2
      Basis of Inferential Statistics Continued
      Risk, Type I and Type II errors
      Traditional Approach to Hypothesis Testing
      Probability Distributions--Why C.L.T. is a big deal.
         When & Why to Use T vs Normal
      Statistical Tools & Procedures

I.  Basis of Inferential Statistics

STEPS IN THE EXPERIMENTAL DESIGN PROCESS

1.   Initial Observation (Identify the Problem)
2.   Information Gathering (Study the process and gather historical data)
3.   Clearly identify the purpose of the research (experiment)
4.   Make a hypothesis
5.   Design an experimental procedure to test the hypothesis
6.   Obtain materials, equipment, and authorization to conduct the experiment
7.   Conduct the experiment and record the data
8.   Analyze the data
9.   Interpret the results
10. Draw a conclusion

               The grouping of data is defined by a boundary function curve f(x).

           For any given group, a unique f(x) or DISTRIBUTION exists; however,
           traditional statistical theory and approaches have generalized groupings into
           some of the following distributions:

                    NORMAL
                            STUDENT'S T
                            F
                            POISSON
                             BINOMIAL
                            LOGNORMAL
                            CHI SQUARE

              PROBABILITY PROVIDES THE BASIS FOR DECISION MAKING.
              THEREFORE, UNDERLYING PROBABILITY DISTRIBUTIONS ARE EMPLOYED.

              WHAT IS AN UNDERLYING PROBABILITY DISTRIBUTION?
 
 

                    P = FAVORABLE OUTCOMES / TOTAL POSSIBILITIES
                        (EXAMPLE: THE SUM OF ROLLING TWO DICE)

                    P(roll = 2) = 1/36 = .028
                    P(roll = 3) = 2/36 = .056
                    P(roll = 4) = 3/36 = .083
                    P(roll = 5) = 4/36 = .111
                    P(roll = 6) = 5/36 = .139
                    P(roll = 7) = 6/36 = .167
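The dice-sum probabilities above can be reproduced by brute-force enumeration; a quick sketch using only the Python standard library:

```python
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of rolling two dice and
# count how many produce each sum: P(sum) = favorable / 36.
counts = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        s = d1 + d2
        counts[s] = counts.get(s, 0) + 1

probs = {s: Fraction(c, 36) for s, c in counts.items()}
print(probs[2])  # 1/36
print(probs[7])  # 1/6  (6/36 reduced)
```

Note that the probabilities rise to a peak at a sum of 7 and fall symmetrically, which is exactly the "boundary function" idea above: the grouping of outcomes defines the distribution.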

                  THEORETICAL SAMPLING DISTRIBUTIONS:

                    If all parameters are known, there is no reason to use inferential statistics.  However, if a
                    sample is taken and inference to the population is made, an underlying sampling
                    distribution is used.

                    CENTRAL LIMIT THEOREM:

                         1. Distribution of sample means approaches a normal distribution
                             (even if the population itself is NOT normal).

                          2.  The Mean of Sample Means = mu (the population mean)

                        3.  The Standard Deviation of the Distribution of Sample Means
                             (the standard error of the mean) is equal to:  sigma / sqrt(n)

        IMPLICATIONS:

                1.  Larger sample sizes reflect "true" parameters more accurately.

                2.  If the population is normally distributed, then the theorem holds true
                     even for small samples.

                3.  If the population is NOT normally distributed, then large sample
                     sizes are necessary to justify use of the Central Limit Theorem.
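These implications can be seen in a minimal simulation sketch, assuming nothing beyond the Python standard library: repeated samples from a decidedly non-normal population (a fair die) produce sample means that cluster around the population mean of 3.5 with spread sigma / sqrt(n):

```python
import random
import statistics

# Central Limit Theorem sketch: draw many samples of size n from a
# uniform population (a fair die) and examine the sample means.
random.seed(42)
n = 30            # sample size
num_samples = 2000

sample_means = [
    statistics.mean(random.randint(1, 6) for _ in range(n))
    for _ in range(num_samples)
]

# Population mean of a die is 3.5; population sigma is sqrt(35/12) ≈ 1.71,
# so the standard error should be about 1.71 / sqrt(30) ≈ 0.31.
print(round(statistics.mean(sample_means), 2))   # close to 3.5
print(round(statistics.stdev(sample_means), 2))  # close to 0.31
```

Re-running with a larger n tightens the spread of the sample means, which is implication 1 above.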

Inferential Statistics and Hypotheses Involving Means

A statistic is a measure computed from a sample, used to reduce
large amounts of data to simpler, more manageable forms for
description.

Inferences, on the other hand, help researchers make reliable decisions
about a population through statistical experiments on sample data.
This normally involves tests of statistical hypotheses … with the goal
being to “establish chance expectation as our hypothesis and try to fit
the sample data to the chance model.”
Inferences can be made from a sample to a population and between
samples.

Under assumed population conditions, a statistician can use sample
data descriptive measures – means, medians, variances, standard
deviations, etc. – and inferential statistics to assess the “probability of
obtaining significant differences” in the measures.

Classical hypothesis testing involves a formal multi-step procedure that
leads from the statement of hypotheses to a conclusive statement
(decision) regarding the hypotheses:
 

Step 1 Statement of Null and Alternate Hypotheses
Step 2 Select Appropriate Statistical Tests
Step 3 Select Level of Significance
Step 4 Construct a Decision Rule or Select Rejection Criteria
Step 5 Calculate the Test Statistic
Step 6 Make the Decision regarding the Hypotheses
Step 1. Statement of Hypotheses:
Null Hypothesis – Ho – sometimes called the “chance hypothesis”. Simply
a statement of “no effect or difference” or no significant relationship
between the statistics being tested.
Takes on two main forms:

     Ho: mu = mu0   (a sample mean versus a hypothesized population mean)
     Ho: mu1 = mu2  (two sample means)

This represents a succinct way to express the testing of sample data
against chance expectation from the population characteristics.
Some fluctuation around Ho is expected by chance, but the amount differs
with sample size. The Central Limit Theorem also allows us to state this
expected variability of sample means through the standard error of the mean.

Similar to the standard deviation, the standard error of the
mean indicates the typical or standard amount that a sample mean will
differ from the true population mean.

Simply speaking, the standard error is a basic measure of the amount of
sampling error in a problem.
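As a quick sketch (the data values here are invented for illustration), the standard error can be computed directly from a sample:

```python
import math
import statistics

# Standard error of the mean: SE = s / sqrt(n), using the sample
# standard deviation s in place of sigma when sigma is unknown.
data = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]  # invented values
n = len(data)
s = statistics.stdev(data)   # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)
print(round(se, 3))          # ≈ 0.094
```

The standard error shrinks as n grows, which is why larger samples give tighter estimates of the population mean.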

Alternative Hypothesis – Ha – or the research hypothesis, this is what
you expect or what causal theory tells you might be the case.
Is either:

     Non-directional:  Ha: mu ≠ mu0
     Directional:      Ha: mu > mu0  or  Ha: mu < mu0

Ho and Ha are also mutually exclusive
Ho is the hypothesis being tested directly; Ha is supported when the null
hypothesis is rejected as being unlikely.

Because the decision to reject or accept is usually based upon a single
sample, there is a measurable chance or probability of making an
incorrect decision or reaching a wrong conclusion.
Error comes from two possible sources:

     Rejecting a null hypothesis that is actually true
     Failing to reject a null hypothesis that is actually false

These errors are symbolized by alpha and beta:

Type I Error – alpha – normally given as a certain level (5% or 0.05 is
most common). This states that Ho will be erroneously rejected
approximately 5% of the time. The alpha level is important as it gives a level
by which you accept or reject null hypotheses (too low … indefensible
results; too high … some important relationships not reported).
Type II Error – beta – much harder to specify than a Type I error. As a
researcher lowers alpha, the probability of beta goes up.
Analogy – the decision between limiting a Type I or Type II error is similar
to our judicial system … convicting a person who is truly not guilty
(Type I) is considered a more serious error than freeing a guilty person
(Type II). If reasonable doubt exists, a jury reaches a verdict of not
guilty … which does not mean that the defendant is innocent.

Step 2.
Select Appropriate Statistical Test – this decision is somewhat
complicated, but ultimately is based upon a test distribution. Several
test distributions exist that are useful in geographical research – the

Poisson, Z, F, t, and chi-square – which one is used depends upon:
· The type of data
· The sample size
· The characteristics of the population that the sample came from
                                      Situation
     Conclusion              Ho is True           Ho is False
     Ho is not Rejected      Correct Decision     Type II Error
     Ho is Rejected          Type I Error         Correct Decision
These “tests of significance” are divided into two major groups:
Parametric – tests in which the hypotheses pertain to population
parameters based on a few assumptions:
· Population from which sample is drawn is normally distributed
· Homogeneity of variance from sample to sample
· Data must be measured on continuous metric scales … ratio or
interval
· Data are independent
Non-Parametric – tests of significance not based on normal distribution
assumptions … not inferior, just less assuming.

Two most common parametric tests of significance are:
Difference of Means Z test – used when the sample size is large and/or
the population standard deviation (sigma) is known.
t test – used when the sample size is small and/or sigma is unknown.

Step 3.

Select Level of Significance – the main goal is to place a probability
on the likelihood of a sampling error. Stated another way, one expects
a sample statistic to deviate significantly only so many times out of
100 (the alpha level).

“The alpha level establishes the size of the rejection region of whatever test
distribution is being used.”

If the significance level is stated as alpha = 0.05 and a null hypothesis is
rejected, it is said “the statistical test is significant at the .05 level.”
This would mean that the chance that a Type I error has occurred is
only 5%, and it is only 5% likely that the null hypothesis has been
improperly rejected because of a random sampling error.

Step 4.
Construct a Decision Rule or Select Rejection Criteria – once
an alpha level has been selected, the value is used to create regions of
rejection and non-rejection of the null hypothesis.
The total area in which Ho is rejected, as represented by the significance
level, encompasses 5% of the area under the curve (for alpha = 0.05).

The cut-off point is called the critical value, with the rejection region
falling in the tails of the distribution and the acceptance region falling
in the body of the distribution (the confidence limits).
If the alternative hypothesis is non-directional, then the rejection region
is of a two-tailed test. If it is directional (as are most studies), then the
rejection region is of a one-tailed test.
Be careful … if the test is two-tailed, then the alpha level is divided into
both tails and becomes alpha/2.
Sample decision rule for a non-directional alternative hypothesis with
an alpha level = 0.05: reject Ho if Z < -1.96 or Z > +1.96; otherwise,
do not reject Ho.
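A decision rule like this can be sketched as a small helper; 1.96 is the standard normal critical value for alpha/2 = 0.025 in each tail:

```python
# Decision rule for a two-tailed z test at alpha = 0.05: the rejection
# region splits into two tails of alpha/2 = 0.025 each, giving critical
# values of +/-1.96 on the standard normal curve.
def decide(z_statistic, critical_value=1.96):
    """Return the decision for a non-directional test."""
    if abs(z_statistic) > critical_value:
        return "reject Ho"
    return "do not reject Ho"

print(decide(2.31))   # reject Ho
print(decide(-1.20))  # do not reject Ho
```

For a directional (one-tailed) test, the entire alpha would sit in one tail, and the critical value at alpha = 0.05 would drop to 1.645.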

Step 5.
Calculate the Test Statistic – at this step, sample data are
evaluated using the test statistic. The general form of a parametric
test statistic (decision maker) is:

     (sample statistic – hypothesized parameter) / standard error of the statistic

Step 6.
Make the Decision regarding the Hypotheses – once all of the
steps have been completed, a decision can be made.
Decision Rule:
If the test statistic falls between the critical values … do not reject Ho
If the test statistic falls outside the critical values … reject Ho
Two important concepts related to Steps 1 through 6:
P-value: the probability, computed assuming Ho is true, that the
test statistic will take a value at least as extreme as that actually
observed is called the P-value of the test. The smaller the P-value
is, the stronger is the evidence against Ho provided by the data.
Statistical Significance: if the P-value is as small or smaller than
alpha, we say that the data are statistically significant at the level alpha.
“Significant” in the statistical sense does not mean “important”, but
means that the evidence against the null hypothesis reached the
standard set by alpha.
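The P-value for a z statistic can be computed with only the standard library, since the normal CDF can be built from math.erf (a general sketch, not tied to any particular dataset):

```python
import math

# Standard normal cumulative distribution function via the error function.
def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Two-tailed P-value: probability of a value at least as extreme as |z|
# in either tail, assuming Ho is true.
def two_tailed_p(z):
    return 2.0 * (1.0 - normal_cdf(abs(z)))

p = two_tailed_p(1.96)
print(round(p, 3))   # ≈ 0.05
print(p <= 0.05)     # statistically significant at the 0.05 level
```

This makes the connection between Steps 3 and 6 concrete: comparing the test statistic to the critical value is equivalent to comparing its P-value to alpha.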

Confidence Limits – a statement of statistical confidence! Given that
most data represent a sample of some longer time period, the record
may contain some sampling error.
The question is, do the points representing the observed data depart
from the theoretical normal curve in excess of sampling error?
To answer this question a researcher must compute and understand
confidence intervals (CI).

This comes from the question regarding how far from the population mean
would the sample mean have to deviate to cause one not to accept Ho?
In repeated sampling, the sample mean (x-bar) follows the normal
distribution centered at the unknown population mean (mu) and having
standard deviation:  sigma / sqrt(n)

Then, by the Empirical Rule, we know that the probability is about 0.95
that x-bar will be within two standard deviations (2 sigma / sqrt(n)) of the
population mean.

Therefore, 95% of all samples from the population will capture the true
mu in the interval from x-bar - 2 sigma/sqrt(n) to x-bar + 2 sigma/sqrt(n).

The language of statistical inference uses this fact about what would
happen in the long run to express our confidence in the results of any
one sample.

Confidence limits take on many forms that are dependent upon the test
statistic being used.
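As an illustrative sketch (both the data and the assumed-known sigma are invented), a 95% confidence interval for a mean takes the z form x-bar ± 1.96 · sigma / sqrt(n):

```python
import math
import statistics

# 95% confidence interval for a population mean with sigma assumed known.
data = [98.2, 99.1, 97.8, 98.6, 98.9, 98.4, 98.7, 98.1, 98.5, 98.8]  # invented
sigma = 0.5                      # assumed-known population standard deviation
n = len(data)
xbar = statistics.mean(data)
half_width = 1.96 * sigma / math.sqrt(n)   # critical value * standard error
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 2), round(upper, 2))    # roughly 98.2 to 98.82
```

The exact form of the interval changes with the test statistic: when sigma is unknown and n is small, the 1.96 would be replaced by a t critical value with n - 1 degrees of freedom.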

Determination of Sample Size for Estimating Means - the use of
confidence intervals and sample error leads one to a method for
calculating the sample size needed to estimate a population mean.
The width of a CI is determined by the test statistic score and the
standard error. If the test statistic value is fixed, the interval width is
controlled by reducing the standard error, that is, by increasing the
sample size.

If sigma is not known, as is normally the case, a small sample must be
drawn from the population (a pre-test sample) and its standard deviation
used in its place.
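The sample-size calculation can be sketched as follows; the formula n = (z · sigma / E)², rounded up, follows from setting the margin of error E equal to z times the standard error:

```python
import math

# Sample size needed to estimate a population mean to within a chosen
# margin of error E at a given confidence level (z = 1.96 for 95%).
# Solving E = z * sigma / sqrt(n) for n gives n = (z * sigma / E)**2.
def sample_size(sigma, margin_of_error, z=1.96):
    return math.ceil((z * sigma / margin_of_error) ** 2)   # always round UP

print(sample_size(sigma=15, margin_of_error=3))   # 97
```

The sigma supplied here would come from the pre-test sample described above when the population value is unknown.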
T-test – results from what is called the “Student t Distribution” and is
the most commonly used test statistic in studies involving tests of:

· The difference between sample and population means
· The difference between independent sample means
· Paired comparisons tests of the difference between samples
The value, t, is the deviation of a sample mean from its population
mean measured in units of standard error. The t distribution is tabled
and represents a family of curves for differing sample sizes.
As sample size gets larger, the t distribution approaches that of a
normal distribution.
The distribution of t required for a particular experiment must take into
account something called the degrees of freedom (df) available. This
term refers to the latitude of variance that a statistical problem has. It is
usually n-1.
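A one-sample t statistic can be computed directly from this definition (the sample values are invented for illustration):

```python
import math
import statistics

# One-sample t statistic: deviation of the sample mean from the
# hypothesized population mean, in units of standard error, with
# df = n - 1 degrees of freedom.
data = [5.2, 4.8, 5.5, 5.1, 4.9, 5.3]   # invented sample
mu0 = 5.0                               # hypothesized population mean
n = len(data)
t = (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
df = n - 1
print(round(t, 2), "with", df, "degrees of freedom")   # ≈ 1.26 with 5 df
```

This t value would then be compared against the tabled critical value for df = 5 at the chosen alpha level to make the decision about Ho.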

By examining the t distribution from a reference table, the following can be observed:

· As df shrinks (sample size gets small), the critical t value gets larger,
   allowing for a broader acceptance region of the null hypothesis.
· As df increases (sample size gets larger), the critical t value gets smaller,
   approaching the z value.


Which test to use?

NOTE:  WHEN THE VARIANCE IS UNKNOWN IN THE POPULATION OR
             A SMALL SAMPLE SIZE IS USED, A "t" STATISTIC IS CALCULATED,
             AND THE T-DISTRIBUTION MUST BE USED.
             IF THE VARIANCE IS KNOWN IN THE POPULATION, A "Z" STATISTIC IS CALCULATED,
             AND THE NORMAL DISTRIBUTION IS USED.
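The note above can be captured in a small helper function; the n < 30 cut-off for "small sample" is a common rule of thumb, not something fixed by these notes:

```python
# Choose between the z and t statistics per the note above: use t when
# the population variance is unknown OR the sample is small (n < 30 is
# a common rule of thumb); use z when sigma is known and n is large.
def choose_statistic(n, sigma_known):
    if sigma_known and n >= 30:
        return "z"
    return "t"

print(choose_statistic(n=50, sigma_known=True))   # z
print(choose_statistic(n=12, sigma_known=False))  # t
```

Since the t distribution approaches the normal as df grows, choosing t in borderline cases is the conservative option.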