Linear Statistical Models: Regression

Problems with Stepwise Regression

This statement by Singer & Willett (2003) is one of the best concerning the use of stepwise approaches:

These comments are from the Stata FAQ pages:

Frank Harrell's comments:

Here are some of the problems with stepwise variable selection:

1. It yields R-squared values that are badly biased to be high.

2. The F and chi-squared tests quoted next to each variable on the printout do not have the claimed distribution.

3. The method yields confidence intervals for effects and predicted values that are falsely narrow (see Altman and Andersen, 1989).

4. It yields p-values that do not have the proper meaning, and the proper correction for them is a difficult problem.

5. It gives biased regression coefficients that need shrinkage (the coefficients for remaining variables are too large; see Tibshirani, 1996).

6. It has severe problems in the presence of collinearity.

7. It is based on methods (e.g., F tests for nested models) that were intended to be used to test prespecified hypotheses.

8. Increasing the sample size does not help very much (see Derksen and Keselman, 1992).

9. It allows us to not think about the problem.

10. It uses a lot of paper.

Note that "all possible subsets" regression does not solve any of these problems.
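Several of these problems are easy to demonstrate by simulation. The sketch below is my illustration, not part of the original page: the sample size, the number of candidate predictors, and the F-to-enter threshold of 2.1 (roughly the conventional p < .15 entry criterion) are all arbitrary assumptions. It runs forward selection on a response that is pure noise; the procedure nonetheless tends to admit predictors whose printed statistics overstate their significance.

```python
import numpy as np

# Pure-noise data: the response has no relationship to any predictor.
rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

def rss(cols):
    """Residual sum of squares of an OLS fit with intercept."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

selected, remaining = [], list(range(p))
while remaining:
    base = rss(selected)
    # Partial F statistic for adding each remaining candidate.
    best_F, best_j = 0.0, None
    for j in remaining:
        new = rss(selected + [j])
        df2 = n - len(selected) - 2      # residual df of the larger model
        F = (base - new) / (new / df2)
        if F > best_F:
            best_F, best_j = F, j
    # F-to-enter of 2.1 approximates the common p < .15 entry criterion.
    if best_F < 2.1:
        break
    selected.append(best_j)
    remaining.remove(best_j)

print(len(selected))  # number of pure-noise predictors the procedure admits
```

Backward elimination and full stepwise variants can be built the same way by testing the partial F for dropping variables; they share the same defect, since every retained variable has survived a sequence of data-driven tests.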


Altman, D. G. and P. K. Andersen. 1989. Bootstrap investigation of the stability of a Cox regression model. Statistics in Medicine 8: 771-783.

Derksen, S. and H. J. Keselman. 1992. Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology 45: 265-282.


Roecker, Ellen B. 1991. Prediction error and its estimation for subset-selected models. Technometrics 33: 459-468.

Mantel, Nathan. 1970. Why stepdown procedures in variable selection. Technometrics 12: 621-625.

Hurvich, C. M. and C. L. Tsai. 1990. The impact of model selection on inference in linear regression. American Statistician 44: 214-217.

Copas, J. B. 1983. Regression, prediction and shrinkage (with discussion). Journal of the Royal Statistical Society B 45: 311-354.

Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58: 267-288.

Ronan Conroy's comments:

I am struck by the fact that Judd and McClelland in their excellent book Data Analysis: A Model Comparison Approach (Harcourt Brace Jovanovich, ISBN 0-15-516765-0) devote less than 2 pages to stepwise methods. What they do say, however, is worth repeating:

They end with a quote from Henderson and Velleman's paper "Building multiple regression models interactively" (1981, Biometrics 37: 391-411): "The data analyst knows more than the computer," and they add, "failure to use that knowledge produces inadequate data analysis."

Personally, I would no more let an automatic routine select my model than I would let some best-fit procedure pack my suitcase.

Summary by Steve Blinkhorn:

So here is a brief abstract of the BJMSP paper, plus odd extracts from elsewhere:

The use of automated subset search algorithms is reviewed and issues concerning model selection and selection criteria are discussed. In addition, a Monte Carlo study is reported which presents data regarding the frequency with which authentic and noise variables are selected by automated subset algorithms. In particular, the effects of the correlation between predictor variables, the number of candidate predictor variables, the size of the sample, and the level of significance for entry and deletion of variables were studied for three automated subset selection algorithms: BACKWARD ELIMINATION, FORWARD SELECTION and STEPWISE. Results indicated that: (1) the degree of correlation between the predictor variables affected the frequency with which authentic predictor variables found their way into the final model; (2) the number of candidate predictor variables affected the number of noise variables that gained entry to the model; (3) the size of the sample was of little practical importance in determining the number of authentic variables contained in the final model; and (4) the population multiple coefficient of determination could be faithfully estimated by adopting a statistic that is adjusted by the total number of candidate predictor variables rather than the number of variables in the final model.

... the degree of collinearity between predictor variables was the most important factor influencing the selection of authentic variables ...

... the number of candidate predictor variables affected the number of noise variables that gained entry to the model ...

... Even in the most favourable case investigated ... 20 per cent of the variables finding their way into the model were noise. In the worst case ... 74 per cent of the selected variables were noise.

... the average number of authentic variables found in the final subset models was always less than half the number of available authentic predictor variables.

... the 'data mining' approach to model building is likely to result in final models containing a large percentage of noise variables which will be interpreted incorrectly as authentic.
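The size of the noise problem is easy to reproduce in miniature. The following is a rough Monte Carlo sketch in the spirit of Derksen and Keselman's design, not a replication of it: the sample size, the 0.3 coefficients on the four authentic predictors, the 16 noise candidates, and the F-to-enter threshold are all arbitrary assumptions of mine. It runs forward selection repeatedly and reports the average share of noise variables among those selected.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_auth, n_noise, reps = 100, 4, 16, 30

def forward_select(X, y, f_enter=2.1):
    """Forward selection by partial F test (F-to-enter ~ p < .15)."""
    n_obs, p = X.shape
    def rss(cols):
        A = np.column_stack([np.ones(n_obs)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r
    selected, remaining = [], list(range(p))
    while remaining:
        base = rss(selected)
        scores = []
        for j in remaining:
            new = rss(selected + [j])
            F = (base - new) / (new / (n_obs - len(selected) - 2))
            scores.append((F, j))
        F, j = max(scores)
        if F < f_enter:
            break
        selected.append(j)
        remaining.remove(j)
    return selected

noise_share = []
for _ in range(reps):
    X = rng.standard_normal((n, n_auth + n_noise))
    # First 4 predictors are authentic (coefficient 0.3); the rest are noise.
    beta = np.concatenate([np.full(n_auth, 0.3), np.zeros(n_noise)])
    y = X @ beta + rng.standard_normal(n)
    sel = forward_select(X, y)
    if sel:
        noise_share.append(np.mean([j >= n_auth for j in sel]))

print(round(float(np.mean(noise_share)), 2))  # average share of noise among selections
```

Increasing the number of noise candidates, or adding correlation among the predictors, tends to push this share up, consistent with findings (1) and (2) in the abstract above.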

Linear Statistical Models Course

Phil Ender, 14jan00