Education 231C

Applied Categorical & Nonnormal Data Analysis

Generalized Linear Models

Most students are introduced to linear models through either multiple regression or analysis of variance. With these methods the expected value of the response variable is modeled directly, that is, it is expressed as a linear combination of the explanatory variables. With categorical and count response variables, the expected value is bounded (probabilities lie between 0 and 1, and counts cannot be negative), so the regression cannot be linear. The problem of nonlinearity is handled through nonlinear functions that transform the expected value of the categorical or count variable into a linear function of the explanatory variables. Such transformations are referred to as link functions.

For example, in the analysis of count data, the expected frequencies must be nonnegative. To ensure that the predicted values from the linear model satisfy this constraint, the log link is used to transform the expected value of the response variable. This loglinear transformation serves two purposes: it ensures that the fitted values are appropriate for count data, and it permits the unknown regression parameters to take any real value.
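The effect of the log link can be written out in a line or two. In this sketch, xβ stands for the linear combination of the explanatory variables, and the value -2 is an arbitrary illustration:

```latex
\[
\ln\bigl(E(y)\bigr) = x\beta
\quad\Longrightarrow\quad
E(y) = \exp(x\beta) > 0 \quad \text{for every real value of } x\beta .
\]
\[
\text{For instance, } x\beta = -2 \ \text{ gives } \ E(y) = e^{-2} \approx 0.135 .
\]
```

However negative the linear combination becomes, the fitted count stays positive.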

Different types of response variables use different link functions: the logit and probit link functions both work with binomial response variables, while the log link function works with both poisson and negative binomial response variables. Growing out of the work of Nelder & Wedderburn (1972) and McCullagh & Nelder (1989), generalized linear models provide a unified framework that can be applied to various 'linear' models.

Generalized linear models take the form:

    g(E(y)) = xβ,   y ~ F

where xβ is the linear combination of the explanatory variables, F is the distribution family and g( ) is the link function.

You might recognize this form more easily from OLS regression, in which the predicted value Y' is written as a linear combination of the explanatory variables.

Now we can replace Y' with E(y). In OLS the distribution family is gaussian (normal), i.e., y ~ gaussian, and the link function is the identity, i.e., g(y) = y. Thus, we can write g(E(y)) as just E(y).

Another example is poisson regression, in which the distribution family is poisson, i.e., y ~ poisson, and the link function is the natural log, i.e., g(y) = ln(y). The glm model would then be written as

    ln(E(y)) = xβ,   y ~ poisson

with xβ the linear combination of the explanatory variables.

Here are examples of distributions and link functions for some common estimation procedures:

type of                   distribution   link
estimation                family         function
OLS regression            gaussian       identity
logistic regression       binomial       logit
probit                    binomial       probit
cloglog                   binomial       cloglog
poisson regression        poisson        log
neg binomial regression   neg binomial   log
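For reference, the link functions named in the table have the following standard definitions. The symbol μ is shorthand for E(y) introduced for this sketch, and Φ is the standard normal cumulative distribution function:

```latex
\begin{align*}
\text{identity:} \quad & g(\mu) = \mu \\
\text{logit:}    \quad & g(\mu) = \ln\!\left(\frac{\mu}{1-\mu}\right) \\
\text{probit:}   \quad & g(\mu) = \Phi^{-1}(\mu) \\
\text{cloglog:}  \quad & g(\mu) = \ln\bigl(-\ln(1-\mu)\bigr) \\
\text{log:}      \quad & g(\mu) = \ln(\mu)
\end{align*}
```

The three binomial links map a probability between 0 and 1 onto the whole real line, and the log link does the same for a positive mean.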

Stata's GLM Procedure

Stata's glm procedure estimates generalized linear models in which the user can specify both the distribution family and the link function. Here is the basic syntax of the glm procedure:

    glm depvar [indepvars], family(fname) link(lname)

where depvar is the response variable, indepvars are the explanatory variables, fname can take on the values gaussian | igaussian | binomial | poisson | nbinomial | gamma
and lname can take on the values identity | log | logit | probit | cloglog | nbinomial | power | opower | loglog | logc.

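An OLS regression would look like this using regress and glm (depvar and indepvars are placeholders for the response and explanatory variables):

    regress depvar indepvars
    glm depvar indepvars, family(gaussian) link(identity)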
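As a concrete sketch of the comparison, the commands below fit one model both ways. The auto dataset that ships with Stata and the variables mpg, weight and foreign are assumptions chosen for illustration; the notes do not say which data were used.

```stata
* Load the auto dataset bundled with Stata (assumed example data)
sysuse auto, clear

* OLS with the dedicated command
regress mpg weight foreign

* The same model as a generalized linear model:
* gaussian family with the identity link
glm mpg weight foreign, family(gaussian) link(identity)
```

The two commands should report the same coefficient estimates. glm presents its tests as z statistics where regress uses t statistics, so the reported p-values can differ slightly.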

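A logistic regression would look like this, with depvar and indepvars as placeholders for the response and explanatory variables:

    glm depvar indepvars, family(binomial) link(logit)

A poisson regression would look like this:

    glm depvar indepvars, family(poisson) link(log)

A negative binomial regression would look like this:

    glm depvar indepvars, family(nbinomial) link(log)

Here is a list of the allowable distribution families:

    gaussian     gaussian (normal)
    igaussian    inverse gaussian
    binomial     binomial
    poisson      poisson
    nbinomial    negative binomial
    gamma        gamma

And here is a list of the link functions that are available:

    identity     identity
    log          log
    logit        logit
    probit       probit
    cloglog      complementary log-log
    nbinomial    negative binomial
    power        power
    opower       odds power
    loglog       log-log
    logc         log-complement

Of course, if all that glm could do was duplicate OLS, logistic, poisson and negative binomial regression, it would not appear to be very useful. However, it is possible to combine distribution families and link functions in ways that do not duplicate existing estimation procedures. The table below gives the possible combinations that make sense from a data analysis perspective: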

                    iden  log  logit  probit  cloglog  nbinom  power  opower  loglog  logc
gaussian            X     X                                    X
inverse gaussian    X     X                                    X
binomial            X     X    X      X       X                X      X       X       X
poisson             X     X                                    X
negative binomial   X     X                            X       X
gamma               X     X                                    X
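
Two of the combinations in the table that have no dedicated estimation command can be sketched as follows. The auto dataset and the choice of price as a positive, right-skewed response are assumptions made for illustration, not part of the original notes.

```stata
* Load the auto dataset bundled with Stata (assumed example data)
sysuse auto, clear

* gamma family with the log link: a positive, right-skewed response
glm price mpg weight, family(gamma) link(log)

* gaussian family with the log link: normal errors around an
* exponential mean, instead of log-transforming the response
glm price mpg weight, family(gaussian) link(log)
```

Both models use the log link, so their coefficients are on the same scale and can be compared directly; only the assumed distribution of the response differs.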


Categorical Data Analysis Course

Phil Ender