Introduction
Many of the statistical tests that we have looked at so far make assumptions about the distribution of the observations. For example, the t-test assumes that the observations come from normal populations that have equal variances. Nonparametric statistics allow the researcher to test hypotheses without making these assumptions. They are very useful when the data are badly behaved (for example, strongly skewed or containing outliers), but they are less powerful than parametric tests when the distributional assumptions are met.
We will illustrate some nonparametric tests using the hsb2 dataset, looking at the differences in writing test scores for males and females.
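Before choosing between a parametric and a nonparametric test, it can help to look at how well the t-test's assumptions hold for these data. The following is only a sketch, not part of the original analysis. It assumes that female is coded 0 for males and 1 for females, and it uses sdtest (variance ratio test) and swilk (Shapiro-Wilk test) as one possible pair of checks.

```stata
* Load the hsb2 data (same URL as used in these notes)
use http://www.philender.com/courses/data/hsb2, clear

* Variance ratio test: are the variances of write equal for males and females?
sdtest write, by(female)

* Shapiro-Wilk test of normality of write within each group
* (assumption: female is coded 0 = male, 1 = female)
swilk write if female == 0
swilk write if female == 1
```

Small p-values from either check would suggest that the corresponding assumption is doubtful, which is the situation in which a nonparametric test becomes attractive.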
Student's t-test (Parametric)
We will begin with the standard Student's t-test (a parametric test) so that we have a baseline for comparing the nonparametric results. The results from the t-test are statistically significant (t = -3.7341, df = 198, two-tailed p = 0.0002).
use http://www.philender.com/courses/data/hsb2, clear

ttest write, by(female)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
  female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
---------+--------------------------------------------------------------------
combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
---------+--------------------------------------------------------------------
    diff |           -4.869947    1.304191               -7.441835   -2.298059
------------------------------------------------------------------------------
Degrees of freedom: 198

                Ho: mean(male) - mean(female) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =  -3.7341                t =  -3.7341              t =  -3.7341
   P < t =   0.0001          P > |t| =   0.0002          P > t =   0.9999

The Wilcoxon (Mann-Whitney) Test
The ranksum command performs the Wilcoxon rank-sum (Mann-Whitney) test, the nonparametric equivalent of the independent-samples t-test. Like the t-test, it finds a significant difference with these data (z = -3.329, p = 0.0009).
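The z statistic that ranksum reports can be reproduced by hand with a few display commands. This is a minimal sketch rather than anything ranksum requires: the group sizes (91 and 109), the observed rank sum for males (7792), and the tie-adjusted variance (165290.29) are copied from the ranksum output for these data, and the formulas are the usual large-sample ones for the rank-sum test.

```stata
* Expected rank sum for males under Ho: n1*(n1 + n2 + 1)/2
display 91*(91 + 109 + 1)/2

* Unadjusted variance of the rank sum: n1*n2*(n1 + n2 + 1)/12
display 91*109*(91 + 109 + 1)/12

* z = (observed rank sum - expected rank sum) / sqrt(adjusted variance)
* The adjusted variance is the unadjusted variance plus the tie adjustment.
display (7792 - 9145.5)/sqrt(166143.25 - 852.96)

* Two-sided p-value from the standard normal distribution
display 2*normal(-abs((7792 - 9145.5)/sqrt(166143.25 - 852.96)))
```

The first two lines give 9145.5 and 166143.25, matching the expected rank sum and unadjusted variance in the output; the last two give approximately -3.329 and 0.0009.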
ranksum write, by(female)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

      female |      obs    rank sum    expected
-------------+---------------------------------
        male |       91        7792      9145.5
      female |      109       12308     10954.5
-------------+---------------------------------
    combined |      200       20100       20100

unadjusted variance   166143.25
adjustment for ties     -852.96
                     ----------
adjusted variance     165290.29

Ho: write(female==male) = write(female==female)
             z =  -3.329
    Prob > |z| =   0.0009

Looking at K-groups
We will stay with the hsb2 dataset but now compare writing scores (write) across the three program types (prog).
One-way ANOVA (Parametric)
The one-way ANOVA, a parametric test, will be used for comparison. Note that the results are statistically significant (F(2, 197) = 21.27, p < 0.0001).
anova write prog

                   Number of obs =     200     R-squared     =  0.1776
                   Root MSE      = 8.63918     Adj R-squared =  0.1693

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  3175.69786     2  1587.84893      21.27     0.0000
           |
      prog |  3175.69786     2  1587.84893      21.27     0.0000
           |
  Residual |  14703.1771   197   74.635417
-----------+----------------------------------------------------
     Total |   17878.875   199   89.843593

The Kruskal-Wallis test is the nonparametric equivalent of the one-way ANOVA.
kwallis write, by(prog)

Test: Equality of populations (Kruskal-Wallis test)

      prog     _Obs   _RankSum
   general       45    4079.00
  academic      105   12764.00
  vocation       50    3257.00

chi-squared =     33.870 with 2 d.f.
probability =     0.0001

chi-squared with ties =     34.045 with 2 d.f.
probability =     0.0001
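The uncorrected chi-squared value in the kwallis output can be checked by hand from the group sizes and rank sums. The sketch below assumes the standard Kruskal-Wallis formula, H = 12/(N(N+1)) * sum(R_i^2/n_i) - 3(N+1), with N = 200 and the rank sums copied from the table above. It does not reproduce the tie-corrected value, which needs the raw data.

```stata
* Rank sums must add up to N*(N + 1)/2
display 4079 + 12764 + 3257
display 200*(200 + 1)/2

* Kruskal-Wallis H (without the correction for ties)
display 12/(200*201) * (4079^2/45 + 12764^2/105 + 3257^2/50) - 3*201

* Upper-tail probability of H on k - 1 = 2 degrees of freedom
display chi2tail(2, 33.870)
```

Both of the first two lines give 20100, and the third gives approximately 33.870, matching the output. The tail probability is far smaller than the 0.0001 shown, so the printed value appears to be the smallest value the output format reports rather than the exact p-value.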
Phil Ender, 24Sep01