Linear Statistical Models: Regression

Generalized Linear Models

In this class we have looked at two different linear models, the OLS regression model and the logistic regression model. Consider the hsb2 dataset. We will create a binary version of write called honcomp by choosing a cutoff point of 60. We could then analyze the two models using the commands:
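A minimal Python sketch of the dichotomization step. The writing scores below are made up for illustration (they are not the hsb2 values), and coding scores of 60 or higher as 1 is an assumption about how the cutoff is applied; the function name make_honcomp is hypothetical.

```python
# Hypothetical writing scores for illustration only; these are NOT hsb2 values.
write = [52, 59, 33, 44, 60, 65, 59, 46, 57, 67]


def make_honcomp(scores, cutoff=60):
    """Code each score 1 if it is at or above the cutoff, else 0.

    Assumption: ">= cutoff" is used; the text only says "a cutoff point of 60".
    """
    return [1 if s >= cutoff else 0 for s in scores]


honcomp = make_honcomp(write)
print(honcomp)
```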

These two models have several things in common. First, each has a random component that follows a particular probability distribution: the normal (Gaussian) distribution is used with OLS regression and the binomial (Bernoulli) distribution is used with logistic regression.

Second, each has a structural component, that is, the arrangement of continuous and coded categorical predictor variables. In this particular example the structural components of the two models are identical.

The biggest difference is that OLS regression models the expected value of the response variable directly, while logistic regression models a logit transform of that expected value (the log odds that honcomp = 1). It seems as if these two models are just two instances of a more general model. They are, and that model is called the generalized linear model.
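To make the difference concrete, here is a small Python sketch (the function names are hypothetical, chosen for illustration). The same linear predictor, the value produced by the shared structural component, is read directly as E(y) under the identity link, but under the logit link it is a log odds that must be inverted to recover a probability.

```python
import math


def identity(mu):
    """Identity link used by OLS regression: g(mu) = mu."""
    return mu


def logit(p):
    """Logit link used by logistic regression: g(p) = ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse logit: maps any real linear predictor back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# The same value of the linear predictor (structural component) ...
eta = 0.8
# ... is the expected value itself under the identity link,
print(identity(eta))
# ... but is the log odds under the logit link, so E(y) = inv_logit(eta).
print(inv_logit(eta))
```

Because inv_logit always returns a value strictly between 0 and 1, the logistic model can never predict an impossible probability, whereas the identity link places no such bound on E(y).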

Generalized Linear Models

Generalized linear models take the form:

    g(E(y)) = b0 + b1x1 + b2x2 + ... + bkxk,   y ~ F

where F is the distribution family and g( ) is the link function.

You might recognize this example more easily if it were rewritten as follows:

Now we can replace yhat with E(y), the expected value of y. A link function defines a transformation of the expected (predicted) values. In OLS regression the distribution family is normal (Gaussian), i.e., y ~ Gaussian, and the link function is the identity, g(y) = y, so g(E(y)) can be written simply as E(y). In logistic regression, by contrast, the distribution family is binomial and the link function is the logit, g(p) = logit(p) = ln(p/(1-p)), where p = E(y) is the probability that y = 1.

In addition to the normal and binomial distribution families, GLM allows the inverse normal (inverse Gaussian), Poisson, negative binomial, and gamma distributions. And in addition to the identity and logit link functions, GLM allows the log, probit, complementary log-log, power, and negative binomial link functions. Not every combination of distribution family and link function makes sense, but overall generalized linear models allow considerable flexibility and show the relationships among many seemingly different models.
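The unifying idea can be sketched in code: one fitting routine handles both models once it is told the inverse link, the derivative of the link, and the variance function of the family. Below is a minimal pure-Python sketch of iteratively reweighted least squares (IRLS), one standard algorithm for fitting GLMs; it is not claimed to be how any particular package fits them. It covers only the two family/link pairs discussed in these notes, and the names FAMILIES, solve, and fit_glm are hypothetical. It returns point estimates only (no standard errors or dispersion estimate).

```python
import math

# Hypothetical minimal table: (inverse link, derivative of link, variance function)
FAMILIES = {
    "gaussian/identity": (
        lambda eta: eta,                          # g^-1(eta) = eta
        lambda mu: 1.0,                           # g'(mu) = 1
        lambda mu: 1.0,                           # V(mu) constant (normal)
    ),
    "binomial/logit": (
        lambda eta: 1.0 / (1.0 + math.exp(-eta)), # inverse logit
        lambda mu: 1.0 / (mu * (1.0 - mu)),       # derivative of logit
        lambda mu: mu * (1.0 - mu),               # Bernoulli variance
    ),
}


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_glm(X, y, family, tol=1e-10, max_iter=50):
    """Fit g(E(y)) = X b by IRLS. Rows of X include a leading 1.0 (intercept)."""
    ginv, dlink, var = FAMILIES[family]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))  # linear predictor
            mu = ginv(eta)                               # fitted E(y)
            d = dlink(mu)
            z = eta + (yi - mu) * d                      # working response
            w = 1.0 / (d * d * var(mu))                  # working weight
            for j in range(p):
                xtwz[j] += w * row[j] * z
                for k in range(p):
                    xtwx[j][k] += w * row[j] * row[k]
        new_beta = solve(xtwx, xtwz)                     # weighted least squares
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Demo on made-up data: one binary predictor x plus an intercept column.
X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
y = [0, 0, 0, 1, 0, 1, 1, 1]
print(fit_glm(X, y, "gaussian/identity"))  # OLS fit
print(fit_glm(X, y, "binomial/logit"))     # logistic regression fit
```

With the Gaussian family and identity link the working weights are all 1 and the working response is y itself, so the first iteration is exactly OLS. With the binomial family and logit link and a single binary predictor, the estimates reproduce the logits of the two group proportions, here ln(1/3) for the intercept and 2 ln(3) for the slope.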


Linear Statistical Models Course

Phil Ender, 20feb01