### Linear Statistical Models: Regression

### Selection & Prediction

**Variables in Regression**

Variables used in a regression model generally serve one of two purposes:

- Variables whose effect on the dependent variable is to be studied
- Variables to be controlled (partialed out)
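
To make "partialed out" concrete, here is a minimal Stata sketch. It assumes that the hsbdemo data used in this section's Stata example can still be loaded from that URL, and it chooses `read` as the control variable purely for illustration. The coefficient for `science` in the full model matches the slope you get by regressing the part of `write` not explained by `read` on the part of `science` not explained by `read`.

```stata
use http://www.philender.com/courses/data/hsbdemo, clear
* Full model: effect of science on write, with read controlled
regress write science read
scalar b_full = _b[science]
* Partial read out of the dependent variable
quietly regress write read
predict double e_write, residuals
* Partial read out of the predictor of interest
quietly regress science read
predict double e_sci, residuals
* Residual-on-residual slope equals the full-model coefficient for science
regress e_write e_sci
display "full model: " b_full "   residual regression: " _b[e_sci]
```
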

**Purposeful Selection: The Best Way to Select Variables**

- Theory
- Past empirical research
- A thorough knowledge of the variables in your dataset

**Prediction**

- Prediction and explanation are central concepts in scientific research.
- "The criterion par excellence of true knowledge is ... the ability to predict..." (De Groot, 1969)
- "It is possible to predict phenomena without being able to explain them..." (Scriven, 1959)
- This leads to a distinction between predictive and explanatory research.

**Predictive Research**

- The goal is to optimize the prediction of criteria.
- Variables are chosen according to their contribution to prediction.
- It does not matter whether a predictor works because it is a symptom or a cause.
- The roles of predictor and criterion can sometimes be interchanged.
- Take care, therefore, not to infer dependent-variable and independent-variable roles from predictive research.

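
The point about interchangeable roles can be checked with a small Stata sketch. It assumes the hsbdemo data from this section's Stata example is reachable, and the choice of `write` and `read` is only illustrative. With a single predictor, R-squared is the squared correlation, so it does not change when predictor and criterion swap roles. The slopes do differ, and that is why predictive accuracy alone says nothing about which variable is the cause.

```stata
use http://www.philender.com/courses/data/hsbdemo, clear
* Predict write from read
quietly regress write read
scalar r2_wr = e(r2)
* Swap roles: predict read from write
quietly regress read write
scalar r2_rw = e(r2)
* Both directions give the same R-squared
display "R2 write on read: " r2_wr "   R2 read on write: " r2_rw
```
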
**Selection**

One of the major uses of regression analysis in predictive research is for selection purposes.
Examples:
- Selecting students to be admitted to a college.
- Selecting among applicants for specialized training.
- Selecting candidates to receive specialized treatment.
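
Selection with regression usually works like this: fit the prediction equation, compute a predicted criterion score for each candidate, and select the candidates above a cutoff. The Stata sketch below illustrates those steps under stated assumptions. The hsbdemo data must be reachable, `read` and `math` are arbitrary predictors of `write`, and `admit` is a hypothetical variable with a hypothetical top-25% cutoff. In real use the equation would be fit on an earlier cohort and then applied to new applicants. Here it is fit and applied to the same sample only to keep the example short.

```stata
use http://www.philender.com/courses/data/hsbdemo, clear
* Fit the prediction equation on cases with known criterion scores
regress write read math
* Predicted criterion score for every case
predict write_hat, xb
* Hypothetical rule: select cases in the top 25% of predicted scores
quietly summarize write_hat, detail
gen byte admit = write_hat >= r(p75)
tabulate admit
```
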

**Selecting Variables for Prediction**

- Manual selection
- All possible regressions
- Forward selection
- Backward elimination
- Stepwise selection
- Hierarchical selection

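
Below is a sketch of how most of these methods could be run in Stata. It assumes the hsbdemo data is reachable, uses `read`, `math`, `science` and `socst` as an arbitrary candidate pool, and uses illustrative entry/removal p-values of .10 and .15. It also assumes that "hierarchical selection" means entering blocks of predictors in an order the researcher chooses, which `nestreg` does. (The `hierarchical` option of `stepwise` means something different.) All possible regressions are outside the scope of `stepwise` and are not shown.

```stata
use http://www.philender.com/courses/data/hsbdemo, clear
* Forward selection: add terms while their p-value is below pe()
stepwise, pe(.10): regress write read math science socst
* Backward elimination: start with all terms, drop those with p >= pr()
stepwise, pr(.10): regress write read math science socst
* Forward stepwise: terms may be added and later removed; pr() must exceed pe()
stepwise, pe(.10) pr(.15) forward: regress write read math science socst
* Hierarchical: researcher-ordered blocks, reporting the R-squared change per block
nestreg: regress write (read) (math science socst)
```

Automated procedures such as `stepwise` choose terms from sample p-values only. The hierarchical (`nestreg`) approach keeps the order of entry under the researcher's control, which fits the purposeful selection advocated above.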
**Confidence Intervals for Predicted Scores**

One Predictor:
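
The formula for this case is not reproduced in these notes. A standard form is given here as an assumption, using conventional notation. The confidence interval for the mean predicted score at a given predictor value $X_0$, where $s_{y.x}$ is the standard error of estimate from the regression, is:

$$
\hat{Y}' \pm t_{\alpha/2,\,N-2}\; s_{\hat{Y}'}, \qquad
s_{\hat{Y}'} = s_{y.x}\sqrt{\frac{1}{N} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}}
$$

To get the wider interval for an individual's predicted score, add $1$ inside the square root.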

Multiple Predictors:
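
Again a standard matrix form is assumed here, since the original formula is not shown. Let $\mathbf{X}$ be the $N \times (k+1)$ predictor matrix with a leading column of ones, and let $\mathbf{x}_0 = (1, X_{01}, \ldots, X_{0k})^{\top}$. Then:

$$
\hat{Y}' \pm t_{\alpha/2,\,N-k-1}\; s_{\hat{Y}'}, \qquad
s_{\hat{Y}'} = s_{y.12\cdots k}\sqrt{\mathbf{x}_0^{\top}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{x}_0}
$$

When $k = 1$ this reduces to the one-predictor formula. As before, adding $1$ under the square root gives the interval for an individual's score.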

In practice, it is easier to let Stata compute the confidence intervals.

**Stata Example**

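```
* Load the hsbdemo data and regress writing on science
use http://www.philender.com/courses/data/hsbdemo, clear
regress write science
* Predicted scores and the standard error of each predicted mean
predict hat, xb
predict s, stdp
* Approximate 95% confidence band (1.96 is the large-sample z value)
gen low = hat - 1.96*s
gen hi = hat + 1.96*s
twoway (rarea low hi science, sort color(gs14)) (scatter write science, jitter(2))
```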
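
As a check on what `stdp` computes, the sketch below rebuilds it from the one-predictor formula. It uses the root MSE, the mean of `science`, and its sum of squared deviations. It also shows `stdf`, the standard error for an individual's predicted score, and the exact t critical value with N - 2 residual df that could replace 1.96. The names `s_hand`, `rmse0`, `nobs`, `xbar`, `ssx` and `tcrit` are hypothetical, introduced only for this illustration, and the hsbdemo URL is assumed to be reachable.

```stata
use http://www.philender.com/courses/data/hsbdemo, clear
regress write science
* Standard errors for the predicted mean (stdp) and for an individual (stdf)
predict s, stdp
predict sf, stdf
* Pieces of the one-predictor formula
scalar rmse0 = e(rmse)
quietly summarize science if e(sample)
scalar nobs = r(N)
scalar xbar = r(mean)
scalar ssx = r(Var)*(r(N) - 1)
* Hand-computed standard error of the predicted mean
gen s_hand = rmse0*sqrt(1/nobs + (science - xbar)^2/ssx)
* t critical value with the model's residual df, an alternative to 1.96
scalar tcrit = invttail(e(df_r), .025)
display "t critical value: " tcrit
list science s s_hand sf in 1/5
```
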

Linear Statistical Models Course

Phil Ender, 29Jan98