Linear Statistical Models: Regression

Selection & Prediction

Variables in Regression

Variables used in a regression model generally serve one of two purposes:

  1. Variables whose effect on the dependent variable is to be studied
  2. Variables to be controlled (partialed out)
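What "partialed out" means can be sketched in Stata. This is a minimal illustration, not part of the course materials: it assumes Stata's bundled auto dataset, and the variable names e_price and e_weight are made up for the example. Here weight is the variable under study and mpg is the variable to be controlled.

```stata
* Assumption: Stata's bundled auto dataset stands in for the course data
sysuse auto, clear

* Full model: effect of weight on price, controlling for mpg
regress price weight mpg

* Partial mpg out of the dependent variable
regress price mpg
predict e_price, residuals

* Partial mpg out of the variable under study
regress weight mpg
predict e_weight, residuals

* The slope on e_weight reproduces the weight coefficient from the full model
regress e_price e_weight
```

The coefficient in the last regression matches the weight coefficient from the full model; the standard errors differ slightly because the residual degrees of freedom differ.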

Purposeful Selection: The Best Way to Select Variables

  • Theory
  • Past empirical research
  • A thorough knowledge of the variables in your dataset


  • Prediction and explanation are central concepts in scientific research
  • The criterion par excellence of true knowledge is ... the ability to predict ... (De Groot, 1969)
  • It is possible to predict phenomena without being able to explain them ... (Scriven, 1959)
  • This motivates a distinction between predictive and explanatory research.

Predictive Research

  • The goal is to optimize prediction of criteria.
  • Choice of variables is determined by their contribution to the prediction.
  • It does not matter whether the predictor works because it is a symptom or a cause.
  • The role of predictor and criterion can sometimes be interchanged.
  • In predictive research, take care not to treat predictors and criteria as independent and dependent variables in the causal sense.


  • One of the major uses of regression analysis in predictive research is for selection purposes.
  • Examples:

Selecting Variables for Prediction

  • Manual Selection
  • All Possible Regressions - Example
  • Forward Selection - Example
  • Backward Elimination - Example
  • Stepwise Selection - Example
  • Hierarchical Selection - Example
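Several of the automated methods listed above can be sketched with Stata's stepwise prefix, available in recent Stata versions. The dataset (Stata's bundled auto data), the candidate predictors, and the 0.05 and 0.10 thresholds are assumptions chosen only for illustration. The hierarchical option shown is Stata's own definition, which may or may not match what these notes mean by Hierarchical Selection.

```stata
* Assumption: auto dataset and the 0.05 / 0.10 thresholds are illustrative only
sysuse auto, clear

* Forward selection: start empty, add terms with p < 0.05
stepwise, pe(0.05): regress price mpg weight length headroom trunk

* Backward elimination: start full, drop terms with p >= 0.10
stepwise, pr(0.10): regress price mpg weight length headroom trunk

* Stepwise (backward with re-entry): removed terms may re-enter at p < 0.05
stepwise, pr(0.10) pe(0.05): regress price mpg weight length headroom trunk

* Hierarchical (forward): terms are considered only in the order listed
stepwise, pe(0.05) hierarchical: regress price mpg weight length headroom trunk
```

Because each method can settle on a different subset of predictors, comparing the four outputs is a quick way to see how sensitive the selection is to the procedure.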

Confidence Intervals for Predicted Scores

  • One Predictor:

  • Multiple Predictors:

  • In practice, it is easier to let Stata compute the confidence intervals
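The formulas for the One Predictor and Multiple Predictors bullets did not survive in this copy. For reference, the standard textbook forms for a confidence interval around a predicted mean score are sketched below; they are an assumption about what the notes showed, chosen to match the stdp quantity used in the Stata example. Here s_{y.x} denotes the standard error of estimate (root mean square error), N the sample size, and k the number of predictors.

```latex
% One predictor: standard error of the predicted mean score at X = X_0
s_{\hat{Y}} = s_{y.x} \sqrt{ \frac{1}{N} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2} }
\qquad
\hat{Y} \pm t_{\alpha/2,\, N-2} \; s_{\hat{Y}}

% Multiple predictors (k predictors): x_0 is the predictor vector with a leading 1
s_{\hat{Y}} = s_{y.12 \ldots k} \sqrt{ \mathbf{x}_0' (\mathbf{X}'\mathbf{X})^{-1} \mathbf{x}_0 }
\qquad
\hat{Y} \pm t_{\alpha/2,\, N-k-1} \; s_{\hat{Y}}
```

In both cases the interval is narrowest at the predictor means and widens as the predictor values move away from them.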

Stata Example

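    * NOTE: the dataset name is missing in the source; supply a file containing write and science
    use, clear
    regress write science
    * predicted scores (linear prediction, the default)
    predict hat
    * standard error of the predicted mean score
    predict s, stdp
    * approximate 95% limits using the normal critical value 1.96
    gen low = hat - 1.96*s
    gen hi  = hat + 1.96*s
    * shaded confidence band with the jittered data on top
    twoway (rarea low hi science, sort bcolor(gs14)) (scatter write science, jitter(2))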
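The example above builds a band for the predicted mean score using the normal value 1.96. A possible variant, sketched here under assumptions, uses the exact t critical value and adds the wider interval for an individual predicted score (stdf). The bundled auto dataset and all new variable names (yhat, se_mean, se_ind, and the limits) are hypothetical stand-ins, not part of the course materials.

```stata
* Assumption: auto dataset replaces the course data, whose name is missing
sysuse auto, clear
regress price weight

* t critical value on the residual degrees of freedom
scalar tcrit = invttail(e(df_r), 0.025)

predict yhat
* standard error of the predicted mean score
predict se_mean, stdp
* standard error of an individual predicted score (forecast)
predict se_ind, stdf

gen lo_mean = yhat - tcrit*se_mean
gen hi_mean = yhat + tcrit*se_mean
gen lo_ind  = yhat - tcrit*se_ind
gen hi_ind  = yhat + tcrit*se_ind

* outer band: individual scores; inner band: mean predicted score
twoway (rarea lo_ind hi_ind weight, sort bcolor(gs15)) (rarea lo_mean hi_mean weight, sort bcolor(gs12)) (scatter price weight)
```

The individual-score band is always wider than the mean-score band because it also includes the residual variance around the regression line.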

Linear Statistical Models Course

Phil Ender, 29Jan98