Education 231C

Applied Categorical & Nonnormal Data Analysis

Regression with Measurement Error

As you will most likely recall, one of the assumptions of regression is that the predictor variables are measured without error. The problem is that measurement error in predictor variables in OLS regression generally leads to underestimation (attenuation toward zero) of the regression coefficients. Errors-in-variables regression models are useful when one or more of the independent variables are measured with error. One can adjust for the bias if one knows the reliability of each such variable.

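The model we wish to estimate is

$$y = X^{*}\beta + e, \qquad X = X^{*} + U,$$

where $X^{*}$ are the true values, $X$ are the observed values, and $U$ is the measurement error. The estimates $b$ of $\beta$ are obtained by

$$b = A^{-1}X'y, \qquad A = X'X - S,$$

where $S$ is a diagonal matrix with elements $N(1 - r_i)s_i^2$, the $r_i$ are the reliability coefficients, and $s_i^2$ is the variance of the $i$-th observed predictor.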
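To see what the adjustment does, it may help to work through the simplest case: one predictor $x$ with reliability $r$, plus an intercept. The sketch below assumes the estimator has the form $b = (X'X - S)^{-1}X'y$, that the diagonal element of $S$ for the constant is zero, and that $s_x^2$ is the sample variance computed with divisor $N-1$. The labels $b_{\text{EIV}}$ and $b_{\text{OLS}}$ are introduced here for illustration only.

```latex
\begin{aligned}
b_{\text{EIV}}
  &= \frac{\sum_j (x_j - \bar{x})(y_j - \bar{y})}
          {\sum_j (x_j - \bar{x})^2 - N(1 - r)\,s_x^2} \\[4pt]
  &= \frac{b_{\text{OLS}}}{1 - \frac{N}{N-1}(1 - r)}
   \;\approx\; \frac{b_{\text{OLS}}}{r}
   \qquad (\text{large } N)
\end{aligned}
```

Under these assumptions, a reliability of .9 makes the corrected slope roughly 11% larger than the OLS slope, since $1/.9 \approx 1.11$.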

Stata's eivreg command uses user-specified reliability coefficients to compute the S matrix, which in turn takes measurement error into account when estimating the coefficients for the model.

Let's look at a regression using the hsb2 dataset.
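The commands for this step are not reproduced in these notes, so the following is only a sketch of what an OLS fit on hsb2 might look like. The dataset URL and the choice of write as the outcome variable are assumptions made for illustration.

```stata
* Sketch only: the dataset URL and the outcome variable write are assumptions
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Ordinary least squares, treating read as if it were measured without error
regress write read
```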

The predictor read is a standardized test score. Every test has measurement error. We don't know the exact reliability of read, but using .9 for the reliability would probably not be far off. We will now estimate the same regression model with the Stata eivreg command, which stands for errors-in-variables regression.
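A minimal sketch of the corresponding eivreg call, again assuming write as the outcome and the dataset URL shown (neither is given in these notes). The reliab() option takes pairs of a predictor name and its reliability; predictors not listed are treated as having reliability 1.

```stata
* Sketch only: the dataset URL and the outcome variable write are assumptions
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Errors-in-variables regression with an assumed reliability of .9 for read
eivreg write read, reliab(read .9)
```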

Note that the F-ratio and the R² increased along with the regression coefficient for read. Additionally, there is an increase in the standard error for read.

Now, let's try a model with read, math and socst as predictors. First, we will run a standard OLS regression.
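As a sketch, the three-predictor OLS model could be fit as follows. The outcome variable write and the dataset URL are assumptions, since the notes do not show the command.

```stata
* Sketch only: the dataset URL and the outcome variable write are assumptions
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Standard OLS with three test scores as predictors
regress write read math socst
```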

Now, let's try to account for the measurement error by using the following reliabilities: read - .9, math - .9, socst - .8.
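With several error-prone predictors, each name and reliability pair goes inside a single reliab() option. The sketch below uses the reliabilities stated above; the outcome variable write and the dataset URL remain assumptions.

```stata
* Sketch only: the dataset URL and the outcome variable write are assumptions
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Errors-in-variables regression: read = .9, math = .9, socst = .8
eivreg write read math socst, reliab(read .9 math .9 socst .8)
```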

Note that the overall F and R² went up, but that the coefficient for read is no longer statistically significant.

Categorical Data Analysis Course

Phil Ender