Linear Statistical Models: Regression
Selection & Prediction
Variables in Regression
Variables used in a regression model generally serve one of two purposes:
- Variables whose effect on the dependent variable is to be studied
- Variables to be controlled (partialed out)
Purposeful Selection: The Best Way to Select Variables
- Past empirical research
- A thorough knowledge of the variables in your dataset
Prediction and explanation are central concepts in scientific research.
The criterion par excellence of true knowledge is ... the ability to predict ... (De Groot, 1969)
It is possible to predict phenomena without being able to explain them ... (Scriven, 1959)
Distinction between predictive and explanatory research. In predictive research:
- The goal is to optimize prediction of criteria.
- The choice of variables is determined by their contribution to the prediction.
- It does not matter whether a predictor works because it is a symptom or a cause.
- The roles of predictor and criterion can sometimes be interchanged.
- Care must be taken not to read dependent-variable and independent-variable roles into the predictor and the criterion.
One of the major uses of regression analysis in predictive research is for selection purposes.
- Selecting students to be admitted to a college.
- Selecting among applicants for specialized training.
- Selecting candidates to receive specialized treatment.
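As a rough illustration of regression used for selection, the sketch below fits a prediction equation, computes predicted criterion scores, and flags the highest-scoring cases. It assumes that the hsbdemo data used later in these notes contains id, write, read, and math. The names yhat and selected and the cutoff of 20 are hypothetical. In practice the equation would normally be estimated on an earlier group and then applied to new applicants.

```stata
* Assumption: hsbdemo contains id, write, read, and math
use http://www.philender.com/courses/data/hsbdemo, clear

* Fit the prediction equation for the criterion (write)
regress write read math

* Predicted criterion score for each case
predict yhat

* Rank cases from highest to lowest predicted score
gsort -yhat

* Flag the top 20 cases (hypothetical cutoff)
gen byte selected = (_n <= 20)
list id yhat selected in 1/25
```

Ranking on the predicted score treats the predictors purely as predictors: nothing in this procedure says whether read or math is a cause of write.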
Selecting Variables for Prediction
- All possible regressions
- Forward selection
- Backward elimination
- Stepwise selection
- Hierarchical selection
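Most of the procedures listed above can be sketched with Stata's stepwise and nestreg prefix commands. This is an illustration only, not the course's own examples. It assumes that hsbdemo contains write, read, math, science, socst, and female. The entry and removal levels (.05 and .10) and the ordering of the blocks are arbitrary choices made for the sketch. All possible regressions is left out.

```stata
* Assumption: hsbdemo contains write, read, math, science, socst, female
use http://www.philender.com/courses/data/hsbdemo, clear

* Forward selection: add a term while its entry p-value is below .05
stepwise, pe(.05): regress write read math science socst female

* Backward elimination: drop a term while its p-value is .10 or above
stepwise, pr(.10): regress write read math science socst female

* Stepwise selection: forward entry, with entered terms re-tested for removal
stepwise, pe(.05) pr(.10) forward: regress write read math science socst female

* Hierarchical selection: blocks are entered in an order set by the analyst
nestreg: regress write (female) (read math) (science socst)
```

In the first three commands the data decide which predictors stay. In the nestreg command the analyst fixes the order of entry, and the output reports the change in R-squared as each block is added.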
Confidence Intervals for Predicted Scores
It is easier to let Stata compute the confidence intervals:
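use http://www.philender.com/courses/data/hsbdemo, clear
regress write science
* predicted (fitted) scores
predict hat
* standard error of the predicted mean
predict s, stdp
* approximate 95% confidence limits
gen low = hat - 1.96*s
gen hi = hat + 1.96*s
* shaded confidence band with the jittered data on top
twoway (rarea low hi science, sort bcolor(gs14)) (scatter write science, jitter(2))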
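For a regression with one predictor, such as write on science, the standard error that the stdp option returns can be written out explicitly. The formula below is the standard textbook result and is not taken from these notes. The symbols are introduced here for illustration: x_0 is the science score at which the prediction is made, n is the number of cases, and s_e is the root mean squared error of the regression.

```latex
\[
\mathrm{se}(\hat{y}_0) = s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
\hat{y}_0 \pm t_{1-\alpha/2,\; n-2}\,\mathrm{se}(\hat{y}_0)
\]
```

The Stata commands use 1.96 in place of the t quantile, which is a close approximation when n is large. The band is narrowest at the mean of science and widens toward the extremes. It describes the predicted mean of write at each science score. An interval for an individual's score would be wider and would use the standard error of the forecast (the stdf option of predict) instead.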
Linear Statistical Models Course
Phil Ender, 29Jan98