Generalized Linear Models
Most students are introduced to linear models through either multiple regression or analysis of variance. In these methods the expected value of the response variable is modeled directly, that is, it is expressed as a linear combination of the explanatory variables. With categorical and count response variables, the relationship between the expected value and the explanatory variables generally cannot be linear. This nonlinearity is handled by applying a nonlinear function to the expected value of the categorical or count variable so that the transformed value can be expressed as a linear function of the explanatory variables. Such transformations are referred to as link functions.
For example, in the analysis of count data, the expected frequencies must be nonnegative. To ensure that the predicted values from the linear model satisfy this constraint, the log link is used to transform the expected value of the response variable. This loglinear transformation serves two purposes: it ensures that the fitted values are appropriate for count data, and it permits the unknown regression parameters to range over the entire real line.
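To make the log link concrete, consider a single explanatory variable x (a purely hypothetical illustration, not one of the variables used in the examples below). On the link scale and on the count scale the model is:

ln(y) = b0 + b1x,   or equivalently   y = exp(b0 + b1x)

Because exp() of any real number is positive, the fitted counts can never be negative, no matter what real values b0 and b1 take.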
Different types of response variables call for different link functions: the logit and probit link functions are used with binomial response variables, while the log link function is used with both poisson and negative binomial response variables. Growing out of the work of Nelder & Wedderburn (1972) and McCullagh & Nelder (1989), generalized linear models provide a unified framework that encompasses these various 'linear' models.
Generalized linear models take the form:

g(y) = b0 + b1x1 + b2x2 + ... + bkxk

where the response y follows a distribution from the exponential family and g() is the link function applied to the expected value of y. For example, OLS regression is a glm in which the distribution family is gaussian, i.e., y -> {gaussian}, and the link function is the identity, i.e., g(y) = y.
You might recognize this example more easily if it were rewritten as follows:

y = b0 + b1x1 + b2x2 + ... + bkxk + e,  where e -> {gaussian}
Another example is poisson regression, in which the distribution family is poisson, i.e., y -> {poisson}, and the link function is the natural log, i.e., g(y) = ln(y). The glm model would then be written as,

ln(y) = b0 + b1x1 + b2x2 + ... + bkxk
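As a minimal Stata sketch of the same idea (y, x1 and x2 are placeholder names, not variables from the datasets used below), the poisson glm can be fit and its fitted means recovered on the original count scale:

glm y x1 x2, fam(poisson) link(log)
predict muhat, mu        /* fitted means exp(xb), always nonnegative */

The mu option of predict requests the expected value of y rather than the linear predictor xb.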
Here are examples of distributions and link functions for some common estimation procedures:
type of estimation         distribution family    link function
------------------         -------------------    -------------
OLS regression             gaussian               identity
logistic regression        binomial               logit
probit                     binomial               probit
cloglog                    binomial               cloglog
poisson regression         poisson                log
neg binomial regression    neg binomial           log
An OLS regression would look like this using regress and glm:
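For instance, with a hypothetical continuous response y and predictors x1, x2 and x3 (placeholder names), the two equivalent commands would be:

regress y x1 x2 x3
glm y x1 x2 x3, link(iden) fam(gauss)

Both fit the same gaussian/identity model; the hsb2 example below shows the matching output from each command.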
                    iden    log     logit   probit  cloglog nbinom  power   opower  loglog  logc
gaussian            X       X                                       X
inverse gaussian    X       X                                       X
binomial            X       X       X       X       X               X       X       X       X
poisson             X       X                                       X
negative binomial   X       X                               X       X
gamma               X       X                                       X
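Reading across a row of this matrix shows which glm calls are legal for each family. For example, the binomial row also allows the complementary log-log link, so a cloglog model for a hypothetical binary outcome y could be requested with:

glm y x1 x2, fam(bin) link(cloglog)

(Here y, x1 and x2 are placeholder names; the examples below stick to the identity, logit, probit and log links.)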
Examples
use http://www.gseis.ucla.edu/courses/data/hsb2
generate hon = write>=60

regress write read math female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   72.52
       Model |  9405.34864     3  3135.11621           Prob > F      =  0.0000
    Residual |  8473.52636   196  43.2322773           R-squared     =  0.5261
-------------+------------------------------           Adj R-squared =  0.5188
       Total |   17878.875   199   89.843593           Root MSE      =  6.5751

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2054613    .4450166
        math |   .3974826   .0664037     5.99   0.000      .266525    .5284401
      female |    5.44337   .9349987     5.82   0.000      3.59942    7.287319
       _cons |   11.89566   2.862845     4.16   0.000     6.249728     17.5416
------------------------------------------------------------------------------

glm write read math female, link(iden) fam(gauss) nolog

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =  43.23228
Deviance         =  8473.526357                    (1/df) Deviance =  43.23228
Pearson          =  8473.526357                    (1/df) Pearson  =  43.23228

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   = -658.4261736                    AIC             =  6.624262
BIC              =  7435.056153

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2062009     .444277
        math |   .3974826   .0664037     5.99   0.000     .2673336    .5276315
      female |    5.44337   .9349987     5.82   0.000     3.610806    7.275934
       _cons |   11.89566   2.862845     4.16   0.000      6.28459    17.50674
------------------------------------------------------------------------------

logit hon read math female, nolog

Logit estimates                                    Number of obs   =       200
                                                   LR chi2(3)      =     80.87
                                                   Prob > chi2     =    0.0000
Log likelihood = -75.209827                        Pseudo R2       =    0.3496

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424    .027577     2.73   0.006     .0211924    .1292924
        math |   .1317117   .0324607     4.06   0.000       .06809    .1953335
      female |   1.154801   .4340856     2.66   0.008      .304009    2.005593
       _cons |  -13.12749   1.850769    -7.09   0.000    -16.75493    -9.50005
------------------------------------------------------------------------------

logit, or

Logit estimates                                    Number of obs   =       200
                                                   LR chi2(3)      =     80.87
                                                   Prob > chi2     =    0.0000
Log likelihood = -75.209827                        Pseudo R2       =    0.3496

------------------------------------------------------------------------------
         hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.078145   .0297321     2.73   0.006     1.021419    1.138023
        math |   1.140779   .0370305     4.06   0.000     1.070462    1.215716
      female |   3.173393   1.377524     2.66   0.008     1.355281    7.430502
------------------------------------------------------------------------------

glm hon read math female, link(logit) fam(bin) nolog

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  150.4196543                    (1/df) Deviance =  .7674472
Pearson          =  164.2509104                    (1/df) Pearson  =  .8380148

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -75.20982717                    AIC             =  .7920983
BIC              = -888.0505495

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424   .0275779     2.73   0.006     .0211906    .1292941
        math |   .1317117   .0324623     4.06   0.000     .0680869    .1953366
      female |   1.154801   .4341012     2.66   0.008     .3039785    2.005624
       _cons |  -13.12749   1.850893    -7.09   0.000    -16.75517   -9.499808
------------------------------------------------------------------------------

glm, eform

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  150.4196543                    (1/df) Deviance =  .7674472
Pearson          =  164.2509104                    (1/df) Pearson  =  .8380148

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -75.20982717                    AIC             =  .7920983
BIC              = -888.0505495

------------------------------------------------------------------------------
         hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.078145    .029733     2.73   0.006     1.021417    1.138025
        math |   1.140779   .0370323     4.06   0.000     1.070458     1.21572
      female |   3.173393   1.377573     2.66   0.008      1.35524    7.430728
------------------------------------------------------------------------------

probit hon read math female, nolog

Probit estimates                                   Number of obs   =       200
                                                   LR chi2(3)      =     81.80
                                                   Prob > chi2     =    0.0000
Log likelihood = -74.745943                        Pseudo R2       =    0.3537

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0473262   .0157561     3.00   0.003     .0164449    .0782076
        math |   .0735256   .0173216     4.24   0.000     .0395759    .1074754
      female |   .6824682   .2447275     2.79   0.005     .2028112    1.162125
       _cons |  -7.663304   .9921289    -7.72   0.000    -9.607841   -5.718767
------------------------------------------------------------------------------

glm hon read math female, link(probit) fam(bin) nolog

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  149.4918859                    (1/df) Deviance =  .7627137
Pearson          =  160.9679286                    (1/df) Pearson  =  .8212649

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = invnorm(u)               [Probit]
Standard errors  : OIM

Log likelihood   = -74.74594294                    AIC             =  .7874594
BIC              = -888.978318

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0473262   .0157561     3.00   0.003     .0164448    .0782077
        math |   .0735256   .0173217     4.24   0.000     .0395758    .1074755
      female |   .6824681   .2447281     2.79   0.005     .2028098    1.162126
       _cons |  -7.663303   .9921345    -7.72   0.000    -9.607851   -5.718755
------------------------------------------------------------------------------

use http://www.gseis.ucla.edu/courses/data/lahigh, clear

poisson daysabs langnce gender, nolog

Poisson regression                                 Number of obs   =       316
                                                   LR chi2(2)      =    171.50
                                                   Prob > chi2     =    0.0000
Log likelihood = -1549.8567                        Pseudo R2       =    0.0524

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
      gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
       _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
------------------------------------------------------------------------------

poisson, irr

Poisson regression                                 Number of obs   =       316
                                                   LR chi2(2)      =    171.50
                                                   Prob > chi2     =    0.0000
Log likelihood = -1549.8567                        Pseudo R2       =    0.0524

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
      gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
------------------------------------------------------------------------------

glm daysabs langnce gender, link(log) fam(poisson) nolog

Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  2238.317597                    (1/df) Deviance =  7.151174
Pearson          =  2752.913231                    (1/df) Pearson  =   8.79525

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -1549.85665                     AIC             =  9.828207
BIC              =  436.7702841

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
      gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
       _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
------------------------------------------------------------------------------

glm, eform

Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  2238.317597                    (1/df) Deviance =  7.151174
Pearson          =  2752.913231                    (1/df) Pearson  =   8.79525

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -1549.85665                     AIC             =  9.828207
BIC              =  436.7702841

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
      gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
------------------------------------------------------------------------------

nbreg daysabs langnce gender, nolog

Negative binomial regression                       Number of obs   =       316
                                                   LR chi2(2)      =     20.63
                                                   Prob > chi2     =    0.0000
Log likelihood = -880.9274                         Pseudo R2       =    0.0116

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156493   .0039485    -3.96   0.000    -.0233882   -.0079104
      gender |  -.4312069   .1396913    -3.09   0.002    -.7049968   -.1574169
       _cons |    2.70344   .2292762    11.79   0.000     2.254067    3.152813
-------------+----------------------------------------------------------------
    /lnalpha |     .25394    .095509                      .0667457    .4411342
-------------+----------------------------------------------------------------
       alpha |   1.289094   .1231201                      1.069024    1.554469
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) = 1337.86  Prob>=chibar2 = 0.000

glm daysabs langnce gender, link(log) fam(nbin) nolog

Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  425.603464                     (1/df) Deviance =  1.359755
Pearson          =  415.6288036                    (1/df) Pearson  =  1.327888

Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -884.4953535                    AIC             =  5.617059
BIC              = -1375.943849

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156357   .0035438    -4.41   0.000    -.0225814   -.0086899
      gender |  -.4307736   .1253082    -3.44   0.001    -.6763732    -.185174
       _cons |   2.702606   .2052709    13.17   0.000     2.300282    3.104929
------------------------------------------------------------------------------

glm, eform

Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  425.603464                     (1/df) Deviance =  1.359755
Pearson          =  415.6288036                    (1/df) Pearson  =  1.327888

Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -884.4953535                    AIC             =  5.617059
BIC              = -1375.943849

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9844859   .0034888    -4.41   0.000     .9776716    .9913477
      gender |    .650006   .0814511    -3.44   0.001     .5084577    .8309596
------------------------------------------------------------------------------

glm daysabs langnce gender, fam(gamma) link(log) nolog

Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =  1.583724
Deviance         =  251.8270233                    (1/df) Deviance =  .8045592
Pearson          =  495.7055497                    (1/df) Pearson  =  1.583724

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -856.2487643                    AIC             =  5.438283
BIC              = -1549.72029

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156852   .0040626    -3.86   0.000    -.0236478   -.0077226
      gender |  -.4326492   .1443719    -3.00   0.003    -.7156129   -.1496854
       _cons |   2.705757   .2383799    11.35   0.000     2.238541    3.172973
------------------------------------------------------------------------------

glm, eform

Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =  1.583724
Deviance         =  251.8270233                    (1/df) Deviance =  .8045592
Pearson          =  495.7055497                    (1/df) Pearson  =  1.583724

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -856.2487643                    AIC             =  5.438283
BIC              = -1549.72029

------------------------------------------------------------------------------
     daysabs |       ExpB   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9844372   .0039994    -3.86   0.000     .9766296    .9923071
      gender |   .6487881   .0936668    -3.00   0.003     .4888924    .8609788
------------------------------------------------------------------------------
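The poisson fit above shows substantial overdispersion ((1/df) Pearson = 8.8). One further option within the glm framework, sketched here as one possibility rather than shown in the examples above, is to keep the poisson structure but rescale the standard errors by the Pearson dispersion:

glm daysabs langnce gender, link(log) fam(poisson) scale(x2) nolog

The coefficients are unchanged from the poisson fit; only the standard errors, and hence the z statistics and confidence intervals, are inflated to reflect the overdispersion.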
Categorical Data Analysis Course
Phil Ender