Linear Statistical Models

Robust Anova

Updated for Stata 11

When data do not completely meet the assumptions underlying the analysis of variance, or when there are outliers or influential data points, robust anova procedures can be used.

The most basic robust procedures are to analyze the data using regression with robust standard errors or to use the robust regression command rreg. Regress with the robust option is more appropriate for designs with heterogeneity of variance. The rreg command is less useful except in situations with extreme outliers, although there may be other ways of dealing with these kinds of outliers. Of course, it is necessary to correctly code each of the categorical variables to use the regression approach.
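As a minimal sketch of the two regression approaches, the commands below fit a one-factor model both ways. The dataset (hsb2, loaded here from the UCLA site) and the variables write and prog are assumptions made for illustration; factor-variable notation (i.prog) takes care of coding the categorical variable.

```stata
* Assumed data: hsb2, with write as the response and prog as a 3-level factor
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* OLS regression with heteroskedasticity-robust standard errors
regress write i.prog, vce(robust)
* Joint test of the prog indicators (the robust analog of the anova F-test)
testparm i.prog

* Robust regression, which downweights observations with large residuals
rreg write i.prog
testparm i.prog
```

In both cases testparm gives the overall test of the factor, which is the quantity to compare with the F-ratio from an ordinary one-way anova.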

One-Factor Designs

In addition to robust regression techniques, the fstar and wtest commands can be used with one-factor designs. Both fstar and wtest are more robust to heteroscedasticity than regular one-way anova.

Consider an example using write as the response variable and prog as the categorical variable.
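A sketch of how this example might be run follows. Both fstar and wtest are user-written commands that must be installed first (for example, via findit), and the "response group" argument order shown for them is an assumption to check against each command's help file. The hsb2 dataset location is also an assumption.

```stata
* Assumed data: hsb2, with write as the response and prog as the group variable
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Ordinary one-way anova for reference, with group means and SDs
oneway write prog, tabulate

* User-written commands (locate and install with: findit fstar / findit wtest)
* Assumed syntax: response variable first, then the grouping variable
fstar write prog
wtest write prog
```

Comparing the group standard deviations from oneway with the fstar and wtest results shows how much the heteroscedasticity corrections matter for a given dataset.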

Nonparametric Test

We can also try a nonparametric test, such as the Kruskal-Wallis test. The Kruskal-Wallis test is the nonparametric analog of the one-way anova; it is computed on the ranks of the response rather than on the raw values.
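For the write by prog example, the test can be run with Stata's kwallis command. The dataset location below is an assumption made for illustration.

```stata
* Assumed data: hsb2, with write as the response and prog as the group variable
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Kruskal-Wallis equality-of-populations rank test
kwallis write, by(prog)
```

The output reports the rank sum for each group along with chi-squared statistics, with and without a correction for ties.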

Bootstrap Standard Errors

Least Absolute Deviation

Least absolute deviation regression minimizes the sum of the absolute values of the residuals, so it models the conditional median rather than the mean, which is why it is often known as median regression. We will run the example using bootstrap standard errors with 200 replications. bsqreg does not work with factor variables, so we will use the xi: prefix.
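A minimal sketch of this analysis for the write by prog example is shown below. The dataset location and the seed value are assumptions; the seed is arbitrary and is set only so that the bootstrap results can be reproduced.

```stata
* Assumed data: hsb2, with write as the response and prog as a 3-level factor
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Arbitrary seed so the bootstrap replications are reproducible
set seed 12345

* Median regression with bootstrap standard errors, 200 replications
* xi: expands i.prog into indicator variables named _Iprog_*
xi: bsqreg write i.prog, reps(200)

* Joint test of the prog indicators created by xi
testparm _Iprog*
```

Because the standard errors are bootstrapped, the test statistic will vary somewhat from run to run unless the seed is fixed.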

Permutation Tests

Permutation tests are based on Monte Carlo simulations and can be used to test the differences among the levels of the independent variable. For each repetition, the values of the group variable are randomly permuted, the test statistic is computed, and a count is kept of whether this value of the test statistic is at least as extreme as the observed test statistic. The proportion of repetitions counted this way is the permutation p-value.

Stata uses the permute command along with an ado program to perform permutation tests. Here is an example of a program that can be used with one-factor designs. It can be placed in the data directory and should be named panova1.ado.
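The following is a sketch of what such a program could look like, written for the current permute syntax; it is not the original panova1.ado. The program body, the returned scalar name F, the seed, the number of repetitions, and the use of the hsb2 variables write and prog are all assumptions made for illustration.

```stata
* Hypothetical version of panova1: runs a one-way anova and returns its F-ratio
capture program drop panova1
program define panova1, rclass
    version 11
    args y group
    quietly anova `y' `group'
    * e(F) is the overall model F stored by anova
    return scalar F = e(F)
end

* Assumed data: hsb2, with write as the response and prog as the group variable
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Permute prog 1000 times, recomputing F each time (arbitrary seed)
permute prog F=r(F), reps(1000) seed(12345) nodots: panova1 write prog
```

When saved as an ado file, only the program definition (from program define through end) belongs in panova1.ado. The permute output reports how many of the permuted F-ratios were at least as large as the observed one, and the corresponding p-value.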

Here is how it is used with cr4new. Next we will modify the dataset so that the results are not significant and repeat the permutation tests.

Two-Factor Designs

Consider a two-factor design using write as the response variable and female and prog as the categorical independent variables.
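One way to set up this design is sketched below, using the regression approach with robust standard errors. The dataset location is an assumption made for illustration.

```stata
* Assumed data: hsb2, with write as the response and female and prog as factors
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Conventional two-way factorial anova for reference
anova write female##prog

* The same model as a regression with robust standard errors
regress write i.female##i.prog, vce(robust)

* Robust test of the female by prog interaction
testparm i.female#i.prog
```

With indicator coding, the individual i.female and i.prog coefficients in the interaction model are simple effects at the reference level of the other factor, so tests of them are not the same as the anova main effects.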


Phil Ender, 17sep10, 15may06, 4apr06, 2apr02