In this class we have looked at two different linear models, the OLS regression model and the logistic regression model. Consider the hsb2 dataset. We will create a binary version of write called honcomp by choosing a cutoff point of 60. We could then analyze the two models using the commands:
regress write read math female

logit honcomp read math female

These two models have several things in common. First, they each have a random component that follows a particular probability distribution. The normal (Gaussian) distribution is used with OLS regression and the binomial (Bernoulli) distribution is used with logistic regression.
Second, they each have a structural component, that is, the arrangement of continuous and coded categorical variables. In this particular example the structural components are identical in the two models.
The biggest difference is that in OLS regression we model the values of the response variable directly, while in logistic regression we model a logit transform of the response variable. It seems as if these two models are just two instances of a more general model. They are, and it is called the generalized linear model.
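The dichotomization that produces honcomp can be sketched outside of Stata as well. This is a minimal Python illustration, not part of the original notes; it assumes scores at or above the cutoff of 60 are coded 1.

```python
# Minimal sketch of dichotomizing write into honcomp.
# Assumption: scores at or above the cutoff of 60 are coded 1.
def make_honcomp(write_scores, cutoff=60):
    """Return a 0/1 indicator for each writing score."""
    return [1 if w >= cutoff else 0 for w in write_scores]

print(make_honcomp([41, 59, 60, 65]))  # -> [0, 0, 1, 1]
```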
Generalized Linear Models
Generalized linear models take the form:

g(y) = b0 + b1x1 + b2x2 + ... + bkxk

where g() is the link function and the response follows a distribution from a particular family. In OLS regression the distribution family is Gaussian and the link function is the identity, g(y) = y.
You might recognize this example more easily if it were rewritten as follows:
glm write read math female, link(iden) fam(gauss)

In logistic regression the distribution family is binomial and the link function is the logit, g(y) = logit(y) = ln(y/(1-y)).
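The logit link and its inverse can be checked numerically. Here is a small Python sketch (not part of the original notes) of the two functions:

```python
import math

def logit(p):
    """Link function: map a probability in (0, 1) to the real line."""
    return math.log(p / (1 - p))

def invlogit(eta):
    """Inverse link: map a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

# The link is zero at p = 0.5, and invlogit undoes logit.
print(logit(0.5))                       # -> 0.0
print(round(invlogit(logit(0.3)), 10))  # -> 0.3
```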
glm honcomp read math female, link(logit) fam(bin)

In addition to the normal and binomial distribution families, glm allows inverse Gaussian, Poisson, negative binomial and gamma distributions. And in addition to the identity and logit link functions, glm allows log, probit, complementary log-log, power and negative binomial link functions. Not all combinations of distribution families and link functions make sense, but overall the generalized linear model allows for considerable flexibility and shows the relationship among many seemingly different models.
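To make the "one general model" point concrete, here is a hedged Python sketch of iteratively reweighted least squares for a one-predictor GLM. This is a textbook illustration, not Stata's glm implementation, and the data are made up: swapping the link and variance functions moves between OLS (identity link, constant variance) and logistic regression (logit link, Bernoulli variance).

```python
import math

def irls(x, y, inv_link, dmu_deta, variance, iters=50):
    """Fit E(y) = inv_link(b0 + b1*x) by iteratively reweighted least
    squares; a minimal GLM sketch, not Stata's glm implementation."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = inv_link(eta)
            d = dmu_deta(eta)            # d(mu)/d(eta)
            w = d * d / variance(mu)     # IRLS weight
            z = eta + (yi - mu) / d      # working response
            sw += w; swx += w * xi; swxx += w * xi * xi
            swz += w * z; swxz += w * xi * z
        det = sw * swxx - swx * swx      # solve the 2x2 normal equations
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

# Identity link + constant variance reproduces OLS (made-up data):
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = irls(x, y, lambda e: e, lambda e: 1.0, lambda m: 1.0)
print(round(b0, 6), round(b1, 6))  # -> 0.14 1.96

# Logit link + Bernoulli variance u*(1-u) gives logistic regression:
yb = [0, 1, 0, 1, 1]
inv = lambda e: 1 / (1 + math.exp(-e))
g0, g1 = irls(x, yb, inv,
              lambda e: inv(e) * (1 - inv(e)),
              lambda m: m * (1 - m))
```

With an intercept in the model, the logistic fit satisfies the score equation for the constant, so the fitted probabilities average to the sample mean of yb.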
Example
use http://www.philender.com/courses/data/hsbdemo, clear

regress write read math female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   72.52
       Model |  9405.34864     3  3135.11621           Prob > F      =  0.0000
    Residual |  8473.52636   196  43.2322773           R-squared     =  0.5261
-------------+------------------------------           Adj R-squared =  0.5188
       Total |   17878.875   199   89.843593           Root MSE      =  6.5751

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2054613    .4450166
        math |   .3974826   .0664037     5.99   0.000      .266525    .5284401
      female |    5.44337   .9349987     5.82   0.000      3.59942    7.287319
       _cons |   11.89566   2.862845     4.16   0.000     6.249728     17.5416
------------------------------------------------------------------------------

glm write read math female, link(iden) fam(gauss)

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale param     =  43.23228
Deviance         =  8473.526357                    (1/df) Deviance =  43.23228
Pearson          =  8473.526357                    (1/df) Pearson  =  43.23228

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   =  -658.4261736                   AIC             =  6.624262
BIC              =  8452.333087

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2062009     .444277
        math |   .3974826   .0664037     5.99   0.000     .2673336    .5276315
      female |    5.44337   .9349987     5.82   0.000     3.610806    7.275934
       _cons |   11.89566   2.862845     4.16   0.000      6.28459    17.50674
------------------------------------------------------------------------------

logit honcomp read math female

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      80.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424    .027577     2.73   0.006     .0211924    .1292924
        math |   .1317117   .0324607     4.06   0.000       .06809    .1953335
      female |   1.154801   .4340856     2.66   0.008      .304009    2.005593
       _cons |  -13.12749   1.850769    -7.09   0.000    -16.75493    -9.50005
------------------------------------------------------------------------------

glm honcomp read math female, link(logit) fam(bin)

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale param     =         1
Deviance         =  150.4196543                    (1/df) Deviance =  .7674472
Pearson          =  164.2509104                    (1/df) Pearson  =  .8380148

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   =  -75.20982717                   AIC             =  .7920983
BIC              =  129.2263849

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424   .0275779     2.73   0.006     .0211906    .1292941
        math |   .1317117   .0324623     4.06   0.000     .0680869    .1953366
      female |   1.154801   .4341012     2.66   0.008     .3039785    2.005624
       _cons |  -13.12749   1.850893    -7.09   0.000    -16.75517   -9.499808
------------------------------------------------------------------------------
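One detail worth noting in the glm output: Stata's glm reports AIC per observation, (-2*lnL + 2*k)/N, where k counts the estimated parameters (four here: read, math, female, _cons). The reported values can be reproduced from the log likelihoods above; a quick Python check:

```python
# Reproduce the AIC values in the glm output above, assuming the
# per-observation formula AIC = (-2*lnL + 2*k) / N with k = 4 and N = 200.
def glm_aic(log_likelihood, k, n):
    return (-2 * log_likelihood + 2 * k) / n

print(round(glm_aic(-658.4261736, 4, 200), 6))   # -> 6.624262
print(round(glm_aic(-75.20982717, 4, 200), 7))   # -> 0.7920983
```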