Linear Statistical Models: Regression

Some Scaling Issues

Updated for Stata 11


The effect of scaling predictor variables can be easily demonstrated using the variable read in the hsbdemo dataset. We will begin with a model regressing write on female and read.

Example 1

The coefficient for read (.57) indicates how much change is expected in write for a one unit increase in read with female held constant. The concern here is that a one unit change might not be terribly meaningful. Suppose that research has indicated that a 12 point change in read is meaningful. Here is what you could do: create a new variable, read12, equal to read divided by 12, and rerun the regression with read12 in place of read. Now a one unit change in read12 predicts a 6.8 point change in write with female held constant, which is 12 times the coefficient for read. A one point change in read12 is equivalent to a 12 point change in read.
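
The rescaling can be checked with a minimal sketch in Python. It assumes read12 is read divided by 12, uses a small invented dataset rather than hsbdemo, and relies on two hypothetical helpers (solve and ols) written only for this illustration. It does not reproduce the .57 and 6.8 reported above; it only shows that the coefficient for the rescaled predictor is 12 times the original while the coefficient for female is unchanged.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    design = [[1.0] * len(y)] + columns  # prepend the constant term
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in design]
    return solve(xtx, xty)


# Invented toy data for illustration only; these are NOT the hsbdemo values.
female = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
read = [34, 39, 44, 47, 50, 52, 57, 60, 63, 68]
write = [35, 44, 41, 52, 46, 54, 52, 59, 57, 65]

# Assumed rescaling: one unit of read12 equals 12 points of read.
read12 = [r / 12 for r in read]

b = ols([female, read], write)      # [constant, female, read]
b12 = ols([female, read12], write)  # [constant, female, read12]

print("coefficient for read:  ", b[2])
print("coefficient for read12:", b12[2])
print("ratio:                 ", b12[2] / b[2])
```

Only the units of the predictor change, so the fitted values, and therefore the overall fit of the model, are the same in both regressions.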

Note that the standardized coefficients are identical for read and read12.
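
A short derivation shows why this has to be so. It assumes the usual definition of a standardized coefficient (the raw coefficient times the ratio of the predictor's standard deviation to the outcome's standard deviation) and assumes that read12 is read divided by 12; the symbols b, s and beta-star are notation introduced here, not taken from the output.

```latex
\begin{align*}
\beta^{*}_{\mathit{read}} &= b_{\mathit{read}} \, \frac{s_{\mathit{read}}}{s_{\mathit{write}}} \\[4pt]
\mathit{read12} = \frac{\mathit{read}}{12}
  \;&\Rightarrow\;
  b_{\mathit{read12}} = 12 \, b_{\mathit{read}}, \qquad
  s_{\mathit{read12}} = \frac{s_{\mathit{read}}}{12} \\[4pt]
\beta^{*}_{\mathit{read12}} &= 12 \, b_{\mathit{read}} \cdot \frac{s_{\mathit{read}} / 12}{s_{\mathit{write}}}
  = \beta^{*}_{\mathit{read}}
\end{align*}
```

The factor of 12 that inflates the raw coefficient is cancelled by the factor of 12 that shrinks the standard deviation of the predictor.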

Example 2

Now, what if reading were a categorical variable? We will divide read into five categories. Please realize that I am not suggesting that you should take a continuous variable and break it up into categories; the point is only to show the effect of scaling read as a categorical variable.
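
As a rough sketch of what dividing read into five categories involves, the Python fragment below bins scores with four cutpoints. The cutpoints, the function name read_category and the example scores are all hypothetical, chosen only for illustration; the categories actually used for readcat may be defined differently.

```python
from bisect import bisect_right

# Hypothetical cutpoints, for illustration only; four cutpoints give five categories.
CUTPOINTS = [40, 48, 55, 62]


def read_category(score):
    """Map a reading score to a category from 1 to 5.

    A score equal to a cutpoint falls in the higher category.
    """
    return bisect_right(CUTPOINTS, score) + 1


scores = [34, 40, 47, 50, 55, 61, 68]  # invented example scores
categories = [read_category(s) for s in scores]
print(categories)
```

Note how much information is discarded: under these cutpoints, scores of 40 and 47 are treated as identical once they fall in the same category.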

Let's run a regression with dummy coded readcat. We see that overall readcat is a significant predictor of write. The R² for this model is .4199 as compared to .4394 when read is continuous. Next, let's use readcat in a model but treat it as a one degree of freedom linear predictor. The linear form of readcat is still significant but the R² for the model has gone down to .4154, a trivial difference for a gain of three degrees of freedom in the residual (the four dummy coded terms are replaced by a single linear term).
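
The relationship between the two codings can be sketched in Python. To keep the example short it leaves female out of the model, so it is a simplification of the regressions described above, and it uses invented data and two hypothetical helper functions. With a single categorical predictor, the dummy coded model fits each category's mean, so its R² is the between-category sum of squares divided by the total sum of squares; the linear model's R² is the squared correlation between the category code and the outcome.

```python
def mean(values):
    return sum(values) / len(values)


def r2_dummy(cat, y):
    """R-squared with cat dummy coded: fitted values are the category means."""
    grand = mean(y)
    groups = {}
    for c, v in zip(cat, y):
        groups.setdefault(c, []).append(v)
    ss_total = sum((v - grand) ** 2 for v in y)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total


def r2_linear(cat, y):
    """R-squared with cat as one numeric predictor: squared correlation."""
    mx, my = mean(cat), mean(y)
    sxy = sum((c - mx) * (v - my) for c, v in zip(cat, y))
    sxx = sum((c - mx) ** 2 for c in cat)
    syy = sum((v - my) ** 2 for v in y)
    return sxy ** 2 / (sxx * syy)


# Invented toy data for illustration only; these are NOT the hsbdemo values.
readcat = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
write = [35, 44, 41, 52, 46, 54, 52, 59, 57, 65]

dummy_fit = r2_dummy(readcat, write)    # uses 4 degrees of freedom
linear_fit = r2_linear(readcat, write)  # uses 1 degree of freedom
print("R-squared, dummy coded:", dummy_fit)
print("R-squared, linear:     ", linear_fit)
```

The dummy coded R² can never be smaller than the linear one, because the linear model is a restricted version of the dummy coded model. The question is whether the extra fit is worth the extra degrees of freedom.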

We can test to see if the difference between using read and readcat is significant by including both in a model. The significant coefficient for read suggests that the continuous form of read accounts for variability in reading that is not captured by the categorical form.
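
One way to see what this test is doing is through the change in R² between the model with readcat alone and the model with both readcat and read. The expression below is the standard incremental F test for nested models, given here as a sketch; the symbols are notation introduced for this note (n is the number of observations and k is the number of predictors in the full model), and no values from the output are assumed.

```latex
F_{1,\; n-k-1}
  = \frac{\left( R^{2}_{\mathit{full}} - R^{2}_{\mathit{reduced}} \right) / 1}
         {\left( 1 - R^{2}_{\mathit{full}} \right) / \left( n - k - 1 \right)}
  = t^{2}_{\mathit{read}}
```

Because only one predictor is added, this F is the square of the t statistic for read in the full model, so the t test on the coefficient for read and the test of the increase in R² lead to the same conclusion.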



Phil Ender, 20sep10, 22dec00