Logic of Hypothesis Testing

Introduction to Research Design and Statistics

Hypotheses

We will discuss two kinds of hypotheses: research hypotheses and statistical hypotheses. Research hypotheses state, in relatively plain English, what you think the outcome of the research will be.

Examples:

Consumption of sugar makes children more active or hyperactive.
Phonics is better for teaching reading than is whole language.

These are statements concerning expected outcomes.

Statistical hypotheses are probabilistic mathematical statements concerning population values, stated in terms of the parameters used in the research.

Statistical Hypotheses

There are two types of statistical hypotheses: null hypotheses and alternative hypotheses. Null hypotheses are denoted H₀, while alternative hypotheses are denoted either H₁ or H_a.

Examples:

           H₀: μ₁ = μ₂
           H₁: μ₁ ≠ μ₂
           
           H₀: μ_i = μ_j for all i and j
           H₁: μ_i ≠ μ_j for some i and j

Types of Errors

		Truth about Population
		H₀ True	H₀ False
Decision Based on Sample	Reject H₀	Type I Error	Correct Decision
Decision Based on Sample	Fail To Reject H₀	Correct Decision	Type II Error

Probabilities

α is the probability of making a Type I Error and is called the level of significance or the α-level. α is the probability that you will reject the null hypothesis when it is true.
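
The definition of α can be checked by simulation. The sketch below is only an illustration, not part of the original notes: it assumes a two-tail z test on a normal population with known σ = 1, and the function name, sample size, number of trials, and seed are all hypothetical choices. It repeatedly samples from a population in which H₀ is true and counts how often the test rejects anyway.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=16, trials=10000, seed=1):
    """Estimate how often a two-tail z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)                        # fixed seed for repeatability
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tail critical value
    rejections = 0
    for _ in range(trials):
        # Sample from a population where H0 (mu = 0, sigma = 1) really is true.
        sample_mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
        z = sample_mean / (1 / n ** 0.5)             # (mean - mu0) / (sigma / sqrt(n))
        if abs(z) > critical:
            rejections += 1                          # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Proportion of true nulls rejected: {rate:.3f}")
```

Under these assumptions the printed proportion should land close to the chosen α of .05, which is what "the probability of rejecting the null hypothesis when it is true" means in practice.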

1 - α is called the level of confidence.

β is the probability of making a Type II Error.

1 - β is known as the power of a test. The power of a test is the ability of a statistical test to detect true effects when they exist. Thus, power is the probability that you will reject the null hypothesis when it is false, i.e., the probability that you will detect true differences when they exist.

Researchers can select the alpha level they wish to use. Common alpha levels include .05 and .01. Beta and power are controlled indirectly through:

Sample Size
Alpha Level
Strength of the Treatment/Effect Size
Amount of Variability (in particular, error variability)
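
How the four factors listed above move power can be sketched for one simple case. The example assumes a one-tail z test for a mean with known population σ, for which power = 1 - Φ(z_crit - δ/(σ/√n)). The function name `z_test_power` and all the numbers are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def z_test_power(mean_diff, sigma, n, alpha=0.05):
    """Power (1 - beta) of a one-tail z test for a mean, population sigma known."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)       # start of the rejection region
    shift = mean_diff / (sigma / n ** 0.5)         # true effect in standard-error units
    return 1 - std_normal.cdf(critical - shift)    # P(reject H0 | H0 false)

baseline = z_test_power(mean_diff=5, sigma=15, n=25, alpha=0.05)
print(f"baseline                 power = {baseline:.3f}")
print(f"larger sample (n = 100)  power = {z_test_power(5, 15, 100, 0.05):.3f}")
print(f"stricter alpha (.01)     power = {z_test_power(5, 15, 25, 0.01):.3f}")
print(f"stronger effect (10)     power = {z_test_power(10, 15, 25, 0.05):.3f}")
print(f"less variability (10)    power = {z_test_power(5, 10, 25, 0.05):.3f}")
```

In this sketch a larger sample, a stronger effect, or less variability each raise power above the baseline, while a smaller α lowers it.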

Choosing an Alpha Level

Make α too large and you will commit too many Type I Errors.

Make α too small and you will not have enough power to detect true effects when they exist.

Abuses of Statistical Tests

Statistical inference is not valid for all sets of data.

Beware of searching for significance (kitchen sink research).

Don't overlook non-significance.
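
The danger of searching for significance can be made concrete with a little arithmetic. The sketch below assumes the tests are independent and that every null hypothesis is true, which is a simplification. The function name is hypothetical.

```python
def familywise_error_rate(alpha, num_tests):
    """P(at least one Type I error) across independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 5, 20):
    rate = familywise_error_rate(0.05, num_tests)
    print(f"{num_tests:2d} tests at alpha = .05: P(at least one false rejection) = {rate:.3f}")
```

Under these assumptions, running 20 tests at α = .05 gives roughly a 64% chance of at least one "significant" result even though no real effect exists.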

The Meaning of Statistical Significance

It does not mean that the effect is large, important or meaningful.

It means that the observed result would be unlikely to occur by chance alone if H₀ were true.

It suggests that the results are reliable and likely to be repeatable, although significance by itself does not guarantee replication.
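
The first point in this section, that significance does not imply a large effect, can be illustrated numerically. The sketch assumes a one-sample two-tail z test, where z equals the standardized effect size times √n. The effect size of 0.02 standard deviations and the function name are hypothetical.

```python
from statistics import NormalDist

def two_tail_p_from_effect(effect_size, n):
    """Two-tail P-value of a one-sample z test, given a standardized effect and n."""
    z = effect_size * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

tiny_effect = 0.02  # two hundredths of a standard deviation (hypothetical)
for n in (100, 10_000, 100_000):
    print(f"n = {n:7d}  P-value = {two_tail_p_from_effect(tiny_effect, n):.4g}")
```

The same tiny effect is far from significant with 100 cases but falls below .05 once the sample is large enough, so a small P-value by itself says nothing about practical importance.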

The P-Value

The probability, computed assuming that H₀ is true, of obtaining a result at least as extreme as the one actually observed.

The smaller the P-value, the stronger the evidence against H₀. We reject H₀ when the P-value is less than or equal to α.
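
A P-value can be computed directly for a simple case. The sketch below assumes a two-tail one-sample z test with known σ. The data values (a sample mean of 104.5 from 36 cases, tested against a null mean of 100 with σ = 15) and the function name are made up for illustration.

```python
from statistics import NormalDist

def z_test_p_value(sample_mean, null_mean, sigma, n):
    """Return (z, two-tail P-value) for a one-sample z test with known sigma."""
    z = (sample_mean - null_mean) / (sigma / n ** 0.5)
    # Probability, assuming H0 is true, of a z at least this far from 0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = z_test_p_value(sample_mean=104.5, null_mean=100, sigma=15, n=36)
alpha = 0.05
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, P-value = {p:.4f}, decision at alpha = {alpha}: {decision}")
```

With these hypothetical numbers z is 1.8 and the P-value is a little above .05, so H₀ is not rejected at the .05 level.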

Probability Regions

Rejection regions.

Failure to reject region.

One-tail vs Two-tail Tests

One-tail tests have a single rejection region.

One-tail tests are somewhat more powerful than two-tail tests at the same α, provided the effect lies in the predicted direction.
One-tail tests should only be used when:
- Theory makes a directional prediction.
- There is strong empirical evidence of directional differences.
           H₀: μ₁ ≥ μ₂
           H₁: μ₁ < μ₂
or
           H₀: μ₁ ≤ μ₂
           H₁: μ₁ > μ₂

Two-tail tests have two rejection regions.

           H₀: μ₁ = μ₂
           H₁: μ₁ ≠ μ₂
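
The difference between one-tail and two-tail rejection regions can be sketched as follows. The example assumes a z statistic at α = .05. The function name, the `tail` labels, and the observed value of 1.8 are hypothetical.

```python
from statistics import NormalDist

def rejects_h0(z, alpha=0.05, tail="two"):
    """Return True when z falls in the rejection region of a z test.

    tail: "two", "upper" (H1: mu1 > mu2), or "lower" (H1: mu1 < mu2).
    """
    std_normal = NormalDist()
    if tail == "two":
        # alpha is split between the two tails
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    critical = std_normal.inv_cdf(1 - alpha)   # all of alpha in one tail
    if tail == "upper":
        return z > critical
    if tail == "lower":
        return z < -critical
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

z = 1.8  # hypothetical observed test statistic
print("one-tail, predicted direction:  ", rejects_h0(z, tail="upper"))
print("two-tail:                       ", rejects_h0(z, tail="two"))
print("one-tail, unpredicted direction:", rejects_h0(-z, tail="upper"))
```

Here the one-tail test rejects H₀ where the two-tail test does not, which is the extra power; the price is that an effect in the unpredicted direction can never be declared significant.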

Distributions based upon squared values, such as chi-square and F, have the entire rejection region in one tail but are, in fact, two-tail tests of hypotheses.

Critical Values

Critical values of a statistic indicate the beginning of the rejection regions. For example, consider some critical values for the standard normal distribution:

Alpha	One-tail	Two-tail
.01	2.33	±2.58
.05	1.645	±1.96
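
The tabled critical values can be reproduced from the inverse of the standard normal cumulative distribution. This is a minimal sketch using Python's standard library. The helper name `critical_values` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_values(alpha):
    """Return (one-tail, two-tail) critical z values for a given alpha."""
    one_tail = std_normal.inv_cdf(1 - alpha)       # all of alpha in one tail
    two_tail = std_normal.inv_cdf(1 - alpha / 2)   # alpha split across both tails
    return one_tail, two_tail

print("Alpha\tOne-tail\tTwo-tail")
for alpha in (0.01, 0.05):
    one_tail, two_tail = critical_values(alpha)
    print(f"{alpha}\t{one_tail:.3f}\t+/-{two_tail:.3f}")
```

To three decimals the values are 2.326 and 2.576 for α = .01, and 1.645 and 1.960 for α = .05; the table rounds the first pair to 2.33 and 2.58.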

Phil Ender, 30Jun98