Introduction
Many of the statistical tests that we have looked at so far make assumptions about the distribution of the observations. For example, the t-test assumes that the observations come from normal populations that have equal variances. Nonparametric statistics allow the researcher to test hypotheses without making these assumptions. They are very useful when the data are badly behaved, but they are less powerful than parametric tests when the distributional assumptions are met.
We will illustrate some nonparametric tests using the hsb2 dataset, looking at the differences in writing test scores for males and females.
Student's t-test (Parametric)
We will begin with the standard Student's t-test (a parametric test) so that we have a basis for comparing our results. The results from the t-test are statistically significant.
use http://www.philender.com/courses/data/hsb2, clear
ttest write, by(female)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
  female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
---------+--------------------------------------------------------------------
combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
---------+--------------------------------------------------------------------
    diff |           -4.869947    1.304191               -7.441835   -2.298059
------------------------------------------------------------------------------
Degrees of freedom: 198
                Ho: mean(male) - mean(female) = diff = 0
     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =  -3.7341                t =  -3.7341              t =  -3.7341
   P < t =   0.0001          P > |t| =   0.0002          P > t =   0.9999
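The assumptions behind this t-test can be checked directly. The lines below are a minimal sketch, assuming the hsb2 data are still in memory. The commands sdtest and swilk and the unequal option are standard Stata, but they are an addition here and were not part of the original analysis, so no output is shown.

```stata
* Sketch only: these checks are an addition, not part of the original analysis.
* Variance-ratio test for equal variances of write in the two groups
sdtest write, by(female)
* Shapiro-Wilk test of normality, run separately for males and females
by female, sort: swilk write
* t-test that does not assume equal variances, for comparison
ttest write, by(female) unequal
```

If either check casts doubt on the assumptions, that is one reason to look at the rank-based tests as well.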
The Wilcoxon (Mann-Whitney) Test
The ranksum command performs the nonparametric equivalent of the two-sample t-test. Like the t-test, it finds a significant difference with these data.
ranksum write, by(female)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test
      female |      obs    rank sum    expected
-------------+---------------------------------
        male |       91        7792      9145.5
      female |      109       12308     10954.5
-------------+---------------------------------
    combined |      200       20100       20100
unadjusted variance   166143.25
adjustment for ties     -852.96
                     ----------
adjusted variance     165290.29
Ho: write(female==male) = write(female==female)
             z =  -3.329
    Prob > |z| =   0.0009
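The pieces of the ranksum output can be reproduced by hand with display. This is a sketch using the numbers printed above, with n1 = 91 males, n2 = 109 females and N = 200. It assumes the usual normal approximation for the rank-sum statistic, and normal() assumes a reasonably recent version of Stata.

```stata
* Sketch: reproduce the ranksum figures from the printed output.
* Expected rank sum for males under Ho: n1*(N+1)/2
display 91*(200 + 1)/2
* Unadjusted variance of the rank sum: n1*n2*(N+1)/12
display 91*109*(200 + 1)/12
* z statistic: (observed - expected) / sqrt(adjusted variance)
display (7792 - 9145.5)/sqrt(165290.29)
* Two-sided p-value from the standard normal distribution
display 2*normal(-3.329)
```

The first two lines return 9145.5 and 166143.25, matching the expected rank sum and the unadjusted variance in the table. The last two agree with the reported z of -3.329 and p-value of 0.0009 after rounding.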
Looking at K-groups
We will stay with the hsb2 dataset but focus our attention on the differences in program type (prog).
One-way ANOVA (Parametric)
The one-way ANOVA, a parametric test, will be used for comparison. Note that the results are statistically significant.
anova write prog
             Number of obs =     200     R-squared     =  0.1776
             Root MSE      = 8.63918     Adj R-squared =  0.1693
    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  3175.69786     2  1587.84893      21.27     0.0000
           |
      prog |  3175.69786     2  1587.84893      21.27     0.0000
           |
  Residual |  14703.1771   197   74.635417
-----------+----------------------------------------------------
     Total |   17878.875   199   89.843593
The Kruskal-Wallis Test
The Kruskal-Wallis test is the nonparametric equivalent of the one-way ANOVA. Like the ANOVA, it finds a significant difference among the program types.
kwallis write, by(prog)
Test: Equality of populations (Kruskal-Wallis test)
    prog     _Obs   _RankSum
 general       45    4079.00
academic      105   12764.00
vocation       50    3257.00
chi-squared =    33.870 with 2 d.f.
probability =     0.0001
chi-squared with ties =    34.045 with 2 d.f.
probability =     0.0001
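The Kruskal-Wallis statistic without the tie correction can also be checked by hand from the rank sums above. This is a sketch using the textbook formula H = 12/(N(N+1)) * sum(R_j^2/n_j) - 3(N+1), with N = 200. The formula is the standard one and is assumed here, not quoted from the kwallis documentation.

```stata
* Sketch: Kruskal-Wallis H (no tie correction) from the printed rank sums.
display 12/(200*201)*(4079^2/45 + 12764^2/105 + 3257^2/50) - 3*201
* Upper-tail probability of a chi-squared with 2 d.f. at the reported value
display chi2tail(2, 33.870)
```

The first line reproduces the reported chi-squared of 33.870 after rounding. The second returns a value far below 0.0001, which suggests that the printed probability is limited by the display format and is not the exact p-value.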
Phil Ender, 24Sep01