Negative Binomial Models
Negative binomial regression is used to estimate count models when Poisson estimation is inappropriate because of overdispersion (which is most of the time). In a Poisson distribution the mean and variance are equal; when the variance is greater than the mean, the distribution is said to display overdispersion. The nbreg command estimates an ancillary parameter, α, that measures the degree of overdispersion. For computational purposes, Stata estimates lnα, which is then converted to α. When α is zero, the negative binomial distribution reduces to the Poisson. The larger α is, the greater the amount of overdispersion in the data.
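The link between lnα, α and the amount of overdispersion can be checked by hand. The sketch below assumes the default mean-dispersion form of the model, in which the variance is μ(1 + αμ). The values of lnα and μ are illustrative only and are not taken from any fitted model; the local macro names are arbitrary.

```stata
* Illustrative values only (assumptions, not estimates)
local lnalpha = 0.25      // the quantity Stata estimates
local mu      = 5         // a hypothetical expected count

* Convert ln(alpha) back to alpha
local alpha = exp(`lnalpha')
display "alpha              = " `alpha'

* Poisson: the variance equals the mean
display "Poisson variance   = " `mu'

* Negative binomial (mean dispersion): variance = mu*(1 + alpha*mu)
display "neg. bin. variance = " `mu'*(1 + `alpha'*`mu')
```

Setting lnalpha to a large negative number drives α toward zero and the two variances toward each other.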
When there is overdispersion, the Poisson estimates are inefficient and their standard errors are biased downward, yielding spuriously large z-values.
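One way to see the downward bias is to fit both models to the same data and put the standard errors side by side. This is a sketch, assuming the lahigh dataset used in the example in this section is still reachable at its original address; the stored-estimate names pois and nb are arbitrary.

```stata
* Load the example data (assumes the URL is still live)
use http://www.gseis.ucla.edu/courses/data/lahigh, clear

* Rough check: is the variance of the count much larger than its mean?
summarize daysabs, detail

* Fit the Poisson and negative binomial models with the same predictors
poisson daysabs gender langnce
estimates store pois
nbreg daysabs gender langnce
estimates store nb

* Coefficients and standard errors, side by side
estimates table pois nb, b(%9.4f) se(%9.4f)
```

If overdispersion is present, the Poisson standard errors should come out visibly smaller than the negative binomial ones.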
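In the mean-dispersion form that nbreg fits by default, the negative binomial distribution is given by

$$\Pr(Y = y \mid \mu, \alpha) = \frac{\Gamma(y + \alpha^{-1})}{\Gamma(y + 1)\,\Gamma(\alpha^{-1})} \left(\frac{1}{1 + \alpha\mu}\right)^{\alpha^{-1}} \left(\frac{\alpha\mu}{1 + \alpha\mu}\right)^{y}, \qquad y = 0, 1, 2, \ldots$$

where μ = exp(xβ) is the expected count. The mean is μ and the variance is μ(1 + αμ), which reduces to the Poisson variance μ when α = 0.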
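The mass function can be evaluated numerically with lngamma(). The sketch below assumes the mean-dispersion form, with shape 1/α and success probability 1/(1 + αμ). The values of α and μ are borrowed from the example later in this section (the estimated alpha and the predicted count at the means); the local macro names are arbitrary, and the /// continuation means the block is meant to be run from a do-file.

```stata
local alpha = 1.289094            // alpha from the nbreg output
local mu    = 5.5280363           // predicted count at the means (mfx output)
local r     = 1/`alpha'           // shape parameter 1/alpha
local p     = 1/(1 + `alpha'*`mu')

* Pr(Y = y) = Gamma(y + r) / (Gamma(y + 1) Gamma(r)) * p^r * (1 - p)^y,
* computed on the log scale for y = 0, 1, ..., 5
forvalues y = 0/5 {
    local lnpr = lngamma(`y' + `r') - lngamma(`y' + 1) - lngamma(`r') ///
                 + `r'*ln(`p') + `y'*ln(1 - `p')
    display "Pr(Y = `y') = " %6.4f exp(`lnpr')
}
```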
Negative Binomial Example
We will continue with the lahigh dataset.
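The output that follows reports the same fitted model on several scales. The arithmetic linking them can be reproduced with display. All numbers below are copied from the nbreg and prtab output in this section, and gender is assumed to be coded 0 for female and 1 for male, which is what the predicted rates imply.

```stata
* IRR = exp(coefficient): compare with the nbreg, irr table
display exp(-.4312069)                 // gender
display exp(-.0156493)                 // langnce

* Percentage change in the expected count: compare with listcoef, percent
display 100*(exp(-.4312069) - 1)

* alpha = exp(lnalpha)
display exp(.25394)

* Predicted rates with langnce held at its mean (50.063794):
* compare with prtab gender
display exp(2.70344 - .0156493*50.063794)              // gender = 0
display exp(2.70344 - .4312069 - .0156493*50.063794)   // gender = 1
```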
use http://www.gseis.ucla.edu/courses/data/lahigh

nbreg daysabs gender langnce

Negative binomial regression                      Number of obs   =        316
                                                  LR chi2(2)      =      20.63
                                                  Prob > chi2     =     0.0000
Log likelihood =  -880.9274                       Pseudo R2       =     0.0116

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -.4312069   .1396913    -3.09   0.002    -.7049968   -.1574169
     langnce |  -.0156493   .0039485    -3.96   0.000    -.0233882   -.0079104
       _cons |    2.70344   .2292762    11.79   0.000     2.254067    3.152813
-------------+----------------------------------------------------------------
    /lnalpha |     .25394    .095509                      .0667457    .4411342
-------------+----------------------------------------------------------------
       alpha |   1.289094   .1231201                      1.069024    1.554469
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) = 1337.86 Prob>=chibar2 = 0.000

nbreg, irr

Negative binomial regression                      Number of obs   =        316
                                                  LR chi2(2)      =      20.63
                                                  Prob > chi2     =     0.0000
Log likelihood =  -880.9274                       Pseudo R2       =     0.0116

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .6497245   .0907609    -3.09   0.002     .4941101    .8543478
     langnce |   .9844725   .0038872    -3.96   0.000     .9768832    .9921208
-------------+----------------------------------------------------------------
    /lnalpha |     .25394    .095509                      .0667457    .4411342
-------------+----------------------------------------------------------------
       alpha |   1.289094   .1231201                      1.069024    1.554469
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) = 1337.86 Prob>=chibar2 = 0.000

listcoef

nbreg (N=316): Factor Change in Expected Count

  Observed SD: 7.4490028

------------------------------------------------------------------
 daysabs |      b         z     P>|z|    e^b    e^bStdX      SDofX
---------+--------------------------------------------------------
  gender |  -0.43121   -3.087   0.002   0.6497   0.8058     0.5006
 langnce |  -0.01565   -3.963   0.000   0.9845   0.7552    17.9392
---------+--------------------------------------------------------
ln alpha |   0.25394    2.659
------------------------------------------------------------------

listcoef, percent

nbreg (N=316): Percentage Change in Expected Count

  Observed SD: 7.4490028

----------------------------------------------------------------------
     daysabs |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
      gender |  -0.43121   -3.087   0.002    -35.0    -19.4     0.5006
     langnce |  -0.01565   -3.963   0.000     -1.6    -24.5    17.9392
-------------+--------------------------------------------------------
    ln alpha |   0.25394
       alpha |   1.28909   SE(alpha) = 0.12312
----------------------------------------------------------------------
  LR test of alpha=0: 1337.86   Prob>=LRX2 = 0.000
----------------------------------------------------------------------

mfx compute

Marginal effects after nbreg
      y  = predicted number of events (predict)
         =  5.5280363
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
  gender*|  -2.389162      .79423   -3.01   0.003  -3.94582 -.832503   .487342
 langnce |  -.0865098      .02241   -3.86   0.000  -.130442 -.042578   50.0638
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

prchange

nbreg: Changes in Predicted Rate for daysabs

         min->max      0->1     -+1/2    -+sd/2  MargEfct
 gender   -2.3892   -2.3892   -2.4022   -1.1957   -2.3837
langnce   -9.3413   -0.1879   -0.0865   -1.5570   -0.0865

exp(xb):   5.5280

         gender  langnce
    x=  .487342  50.0638
sd(x)=  .500633  17.9392

prtab gender

nbreg: Predicted rates for daysabs

----------------------
   gender | Prediction
----------+-----------
   female |     6.8208
     male |     4.4316
----------------------

       gender    langnce
x=  .48734177  50.063794

prtab langnce

nbreg: Predicted rates for daysabs

----------------------
ctbs lang |
      nce | Prediction
----------+-----------
 1.007114 |    11.9119
 6.748048 |    10.8883
 10.39049 |    10.2850
 13.13055 |     9.8533
 15.35938 |     9.5156
 17.25647 |     9.2372
 20.40919 |     8.7926
  21.7637 |     8.6081
 23.01052 |     8.4418
 24.16932 |     8.2901
 25.25478 |     8.1505
  26.2782 |     8.0210
 27.24847 |     7.9001
 28.17271 |     7.7867
 29.05672 |     7.6797
 29.90528 |     7.5784
 30.72241 |     7.4821
  31.5115 |     7.3903
 32.27546 |     7.3024
 33.01677 |     7.2182
 34.43988 |     7.0592
 35.12527 |     6.9839
 35.79525 |     6.9111
 36.45115 |     6.8405
 37.09416 |     6.7720
 37.72536 |     6.7054
 38.34572 |     6.6407
 38.95612 |     6.5775
 39.55739 |     6.5159
 40.15026 |     6.4558
 40.73543 |     6.3969
 41.31353 |     6.3393
 41.88515 |     6.2828
 42.45086 |     6.2275
 43.01117 |     6.1731
 43.56657 |     6.1197
 44.11754 |     6.0671
 44.66451 |     6.0154
  45.2079 |     5.9645
 45.74812 |     5.9143
 46.28556 |     5.8647
 46.82059 |     5.8158
 47.35357 |     5.7675
 47.88486 |     5.7198
 48.41482 |     5.6725
 48.94376 |     5.6258
 49.47205 |     5.5795
       50 |     5.5336
 50.52795 |     5.4880
 51.05624 |     5.4428
 51.58518 |     5.3980
 52.11514 |     5.3534
 52.64643 |     5.3091
 53.17941 |     5.2650
 53.71444 |     5.2211
 54.25188 |     5.1773
  54.7921 |     5.1338
 55.33549 |     5.0903
 55.88246 |     5.0469
 56.43343 |     5.0036
 56.98883 |     4.9603
 57.54914 |     4.9170
 58.11485 |     4.8736
 58.68647 |     4.8302
 59.26457 |     4.7867
 59.84974 |     4.7431
 60.44261 |     4.6993
 61.04388 |     4.6553
 62.27464 |     4.5665
 63.54885 |     4.4763
 64.20476 |     4.4306
 64.87473 |     4.3844
 65.56011 |     4.3376
 66.26239 |     4.2902
 66.98323 |     4.2421
 67.72454 |     4.1932
 68.48849 |     4.1433
 69.27759 |     4.0925
 70.09472 |     4.0405
 70.94328 |     3.9872
 71.82729 |     3.9324
 73.72179 |     3.8175
 74.74522 |     3.7569
  78.2363 |     3.5571
 79.59081 |     3.4825
 81.08016 |     3.4023
 82.74353 |     3.3149
 84.64062 |     3.2179
 86.86945 |     3.1076
 89.60951 |     2.9772
 93.25195 |     2.8122
 98.99289 |     2.5706
----------------------

       gender    langnce
x=  .48734177  50.063794

prtab langnce gender

nbreg: Predicted rates for daysabs

----------------------------
ctbs lang |      gender
      nce |   female    male
----------+-----------------
 1.007114 |  14.6975  9.5493
 6.748048 |  13.4347  8.7288
 10.39049 |  12.6903  8.2452
 13.13055 |  12.1576  7.8991
 15.35938 |  11.7409  7.6283
 17.25647 |  11.3974  7.4052
 20.40919 |  10.8488  7.0487
  21.7637 |  10.6212  6.9009
 23.01052 |  10.4160  6.7675
 24.16932 |  10.2288  6.6459
 25.25478 |  10.0565  6.5340
  26.2782 |   9.8967  6.4302
 27.24847 |   9.7476  6.3333
 28.17271 |   9.6076  6.2423
 29.05672 |   9.4756  6.1565
 29.90528 |   9.3506  6.0753
 30.72241 |   9.2318  5.9981
  31.5115 |   9.1185  5.9245
 32.27546 |   9.0102  5.8541
 33.01677 |   8.9062  5.7866
 34.43988 |   8.7101  5.6592
 35.12527 |   8.6172  5.5988
 35.79525 |   8.5273  5.5404
 36.45115 |   8.4402  5.4838
 37.09416 |   8.3557  5.4289
 37.72536 |   8.2736  5.3755
 38.34572 |   8.1936  5.3236
 38.95612 |   8.1157  5.2730
 39.55739 |   8.0397  5.2236
 40.15026 |   7.9655  5.1754
 40.73543 |   7.8929  5.1282
 41.31353 |   7.8218  5.0820
 41.88515 |   7.7521  5.0367
 42.45086 |   7.6838  4.9924
 43.01117 |   7.6167  4.9488
 43.56657 |   7.5508  4.9059
 44.11754 |   7.4860  4.8638
 44.66451 |   7.4222  4.8224
  45.2079 |   7.3593  4.7815
 45.74812 |   7.2974  4.7413
 46.28556 |   7.2363  4.7016
 46.82059 |   7.1759  4.6624
 47.35357 |   7.1163  4.6236
 47.88486 |   7.0574  4.5854
 48.41482 |   6.9991  4.5475
 48.94376 |   6.9414  4.5100
 49.47205 |   6.8843  4.4729
       50 |   6.8276  4.4361
 50.52795 |   6.7714  4.3996
 51.05624 |   6.7157  4.3633
 51.58518 |   6.6603  4.3274
 52.11514 |   6.6053  4.2916
 52.64643 |   6.5506  4.2561
 53.17941 |   6.4962  4.2208
 53.71444 |   6.4421  4.1856
 54.25188 |   6.3881  4.1505
  54.7921 |   6.3343  4.1156
 55.33549 |   6.2807  4.0807
 55.88246 |   6.2272  4.0459
 56.43343 |   6.1737  4.0112
 56.98883 |   6.1203  3.9765
 57.54914 |   6.0668  3.9418
 58.11485 |   6.0134  3.9070
 58.68647 |   5.9598  3.8722
 59.26457 |   5.9061  3.8374
 59.84974 |   5.8523  3.8024
 60.44261 |   5.7983  3.7673
 61.04388 |   5.7440  3.7320
 62.27464 |   5.6344  3.6608
 63.54885 |   5.5231  3.5885
 64.20476 |   5.4667  3.5519
 64.87473 |   5.4097  3.5148
 65.56011 |   5.3520  3.4773
 66.26239 |   5.2935  3.4393
 66.98323 |   5.2341  3.4007
 67.72454 |   5.1738  3.3615
 68.48849 |   5.1123  3.3216
 69.27759 |   5.0495  3.2808
 70.09472 |   4.9854  3.2391
 70.94328 |   4.9196  3.1964
 71.82729 |   4.8520  3.1525
 73.72179 |   4.7103  3.0604
 74.74522 |   4.6354  3.0118
  78.2363 |   4.3890  2.8516
 79.59081 |   4.2969  2.7918
 81.08016 |   4.1979  2.7275
 82.74353 |   4.0901  2.6574
 84.64062 |   3.9704  2.5797
 86.86945 |   3.8343  2.4913
 89.60951 |   3.6734  2.3867
 93.25195 |   3.4699  2.2545
 98.99289 |   3.1717  2.0607
----------------------------

       gender    langnce
x=  .48734177  50.063794

Generalized Negative Binomial
It is possible to estimate a generalized version of the negative binomial model. The gnbreg command allows lnα to be modeled as a function of one or more variables.
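With a single predictor in the lnalpha equation, the model implies a separate α for each value of that predictor: lnα = γ0 + γ1·school. The sketch below back-transforms the lnalpha coefficients from the first gnbreg fit reported in this section. It assumes school is coded 1 and 2, as the if school==1 and if school==2 runs suggest; the macro names g0 and g1 are arbitrary.

```stata
* lnalpha equation from: gnbreg daysabs gender langnce, lnalpha(school)
local g0 = -.6092282    // _cons in the lnalpha equation
local g1 =  .5881709    // school in the lnalpha equation

* Implied overdispersion parameter in each school
display "alpha, school 1 = " exp(`g0' + `g1'*1)
display "alpha, school 2 = " exp(`g0' + `g1'*2)
```

These values can be compared with the alpha estimates from the two school-specific nbreg fits. They need not match exactly, because gnbreg shares the mean-equation coefficients across schools.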
nbreg daysabs gender langnce if school==1, nolog

Negative binomial regression                      Number of obs   =        159
                                                  LR chi2(2)      =      11.63
                                                  Prob > chi2     =     0.0030
Log likelihood = -495.81829                       Pseudo R2       =     0.0116

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -.5643986   .1653818    -3.41   0.001     -.888541   -.2402562
     langnce |  -.0061867   .0047883    -1.29   0.196    -.0155717    .0031982
       _cons |   2.627209   .2509819    10.47   0.000     2.135294    3.119125
-------------+----------------------------------------------------------------
    /lnalpha |  -.0919676   .1319768                     -.3506374    .1667022
-------------+----------------------------------------------------------------
       alpha |   .9121347   .1203806                      .7042391    1.181402
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) =  680.70 Prob>=chibar2 = 0.000

nbreg daysabs gender langnce if school==2, nolog

Negative binomial regression                      Number of obs   =        157
                                                  LR chi2(2)      =       2.56
                                                  Prob > chi2     =     0.2778
Log likelihood = -367.07632                       Pseudo R2       =     0.0035

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -.2016785   .2224413    -0.91   0.365    -.6376554    .2342985
     langnce |  -.0102065   .0067666    -1.51   0.131    -.0234687    .0030558
       _cons |   1.898195   .4318886     4.40   0.000     1.051709    2.744681
-------------+----------------------------------------------------------------
    /lnalpha |   .4011526   .1464837                      .1140498    .6882554
-------------+----------------------------------------------------------------
       alpha |   1.493545     .21878                      1.120808     1.99024
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) =  449.37 Prob>=chibar2 = 0.000

gnbreg daysabs gender langnce, lnalpha(school)

Generalized negative binomial regression          Number of obs   =        316
                                                  LR chi2(2)      =      20.22
                                                  Prob > chi2     =     0.0000
Log likelihood = -876.90377                       Pseudo R2       =     0.0114

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
daysabs      |
      gender |  -.4731664   .1386795    -3.41   0.001    -.7449732   -.2013596
     langnce |  -.0142358    .003965    -3.59   0.000     -.022007   -.0064646
       _cons |   2.733251   .2213494    12.35   0.000     2.299414    3.167088
-------------+----------------------------------------------------------------
lnalpha      |
      school |   .5881709   .2058519     2.86   0.004     .1847085    .9916333
       _cons |  -.6092282   .3159621    -1.93   0.054    -1.228502    .0100461
------------------------------------------------------------------------------

gnbreg, irr

Generalized negative binomial regression          Number of obs   =        316
                                                  LR chi2(2)      =      20.22
                                                  Prob > chi2     =     0.0000
Log likelihood = -876.90377                       Pseudo R2       =     0.0114

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .6230264    .086401    -3.41   0.001      .474747    .8176184
     langnce |   .9858651   .0039089    -3.59   0.000     .9782334    .9935563
-------------+----------------------------------------------------------------
lnalpha      |  (type gnbreg to see ln(alpha) coefficient estimates)
------------------------------------------------------------------------------

display -2*-876.90377
1753.8075

mfx compute

Marginal effects after gnbreg
      y  = predicted number of events (predict)
         =  5.9892103
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
  gender*|  -2.843323      .87245   -3.26   0.001   -4.5533 -1.13335   .487342
 langnce |  -.0852611      .02344   -3.64   0.000  -.131207 -.039316   50.0638
  school |  (no effect)                                                1.49684
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

gnbreg daysabs gender langnce school, lnalpha(school) nolog

Generalized negative binomial regression          Number of obs   =        316
                                                  LR chi2(3)      =      45.88
                                                  Prob > chi2     =     0.0000
Log likelihood = -864.07066                       Pseudo R2       =     0.0259

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
daysabs      |
      gender |  -.4321288   .1333284    -3.24   0.001    -.6934476     -.17081
     langnce |  -.0076396   .0039562    -1.93   0.053    -.0153937    .0001144
      school |  -.7655276   .1437565    -5.33   0.000    -1.047285   -.4837701
       _cons |   3.387622   .2465595    13.74   0.000     2.904374     3.87087
-------------+----------------------------------------------------------------
lnalpha      |
      school |   .5002206   .1980169     2.53   0.012     .1121145    .8883267
       _cons |  -.5857683   .3032156    -1.93   0.053     -1.18006    .0085234
------------------------------------------------------------------------------

gnbreg, irr

Generalized negative binomial regression          Number of obs   =        316
                                                  LR chi2(3)      =      45.88
                                                  Prob > chi2     =     0.0000
Log likelihood = -864.07066                       Pseudo R2       =     0.0259

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .6491258   .0865469    -3.24   0.001     .4998498    .8429818
     langnce |   .9923895   .0039261    -1.93   0.053     .9847242    1.000114
      school |   .4650885   .0668595    -5.33   0.000     .3508891    .6164549
-------------+----------------------------------------------------------------
lnalpha      |  (type gnbreg to see ln(alpha) coefficient estimates)
------------------------------------------------------------------------------

display -2*-864.07066
1728.1413

display -2*(-876.90377-(-864.07066))
25.66622
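The last display is a likelihood-ratio statistic comparing the two gnbreg models, which differ by one parameter (school in the mean equation). A sketch of the remaining step, attaching a p-value with chi2tail() on 1 degree of freedom; the macro names are arbitrary and the log likelihoods are copied from the output above.

```stata
* Log likelihoods copied from the two gnbreg fits
local ll_restricted = -876.90377    // school only in the lnalpha equation
local ll_full       = -864.07066    // school also in the mean equation

* LR statistic and its p-value (1 df: one extra coefficient)
local lr = -2*(`ll_restricted' - `ll_full')
display "LR chi2(1)  = " `lr'
display "Prob > chi2 = " chi2tail(1, `lr')
```

With both fitted models still in memory, estimates store followed by lrtest should carry out the same comparison without copying numbers by hand.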
Categorical Data Analysis Course
Phil Ender