In many count data sets, the number of zeros is inflated because some of the zeros are generated by a different process than the remaining counts. Take data on doctoral publications as an example: while many scientists are actively involved in research and publication, some have jobs in which research and publishing are not required or even possible.
We will illustrate zero-inflated count models using Long's data on doctoral publications.
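As a rough first check for excess zeros, one can compare the observed share of zero counts with the share that a Poisson distribution with the same mean would predict. The sketch below assumes the couart data set used in this section is already loaded. The hard-coded figures in the last two commands (275 zeros among 915 observations, and a mean of 1.692896 for art) are taken from the output reported in this section.

```stata
* Observed share of zeros in the outcome
count if art == 0
display r(N) / _N

* Share of zeros implied by an intercept-only Poisson with the sample mean
quietly summarize art
display exp(-r(mean))

* The same comparison by hand, using figures reported in this section
display 275/915
display exp(-1.692896)
```

The observed share is about 0.30, while the Poisson figure is about 0.18. This comparison ignores the covariates, so it is only a heuristic, but a gap of this size is the kind of pattern that motivates a zero-inflated model.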
Zero-inflated Poisson
use http://www.gseis.ucla.edu/courses/data/couart

describe

Contains data from http://www.gseis.ucla.edu/courses/data/couart.dta
  obs:           915                          Scientific Productivity of Bioc
 vars:             7                          18 Oct 2001 22:21
 size:        18,300 (99.7% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
fem             byte   %9.0g       sexlbl     Sex: 1=female, 0=male.
ment            float  %9.0g                  Article by mentor in last 3 yrs
phd             float  %9.0g                  Prestige of PhD department.
mar             byte   %9.0g       marlbl     Married: 1=yes, 0=no.
kid5            byte   %9.0g                  Number of children <= 5.
art             byte   %9.0g                  Articles in last 3 yrs of PhD.
lnart           float  %9.0g                  Log of art + .5.
-------------------------------------------------------------------------------

summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+-----------------------------------------------------
         fem |       915    .4601093    .4986788          0          1
        ment |       915    8.767212    9.483915          0   76.99998
         phd |       915    3.103109    .9842491       .755       4.62
         mar |       915    .6622951     .473186          0          1
        kid5 |       915     .495082      .76488          0          3
         art |       915    1.692896    1.926069          0         19
       lnart |       915    .4399161    .8566493  -.6931472   2.970414

poisson art fem mar kid5 phd ment

Poisson regression                                Number of obs   =        915
                                                  LR chi2(5)      =     183.03
                                                  Prob > chi2     =     0.0000
Log likelihood = -1651.0563                       Pseudo R2       =     0.0525

------------------------------------------------------------------------------
         art |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         fem |  -.2245942   .0546138    -4.11   0.000    -.3316352   -.1175532
         mar |   .1552434   .0613747     2.53   0.011     .0349512    .2755356
        kid5 |  -.1848827   .0401272    -4.61   0.000    -.2635305   -.1062349
         phd |   .0128226   .0263972     0.49   0.627     -.038915    .0645601
        ment |   .0255427   .0020061    12.73   0.000     .0216109    .0294746
       _cons |   .3046168   .1029822     2.96   0.003     .1027755    .5064581
------------------------------------------------------------------------------

quietly fitstat, saving(0)

zip art fem mar kid5 phd ment, inflate(fem mar kid5 phd ment) vuong

Zero-inflated poisson regression                  Number of obs   =        915
                                                  Nonzero obs     =        640
                                                  Zero obs        =        275

Inflation model = logit                           LR chi2(5)      =      78.56
Log likelihood  = -1604.773                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         art |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
art          |
         fem |  -.2091446   .0634047    -3.30   0.001    -.3334155   -.0848737
         mar |    .103751    .071111     1.46   0.145     -.035624     .243126
        kid5 |  -.1433196   .0474293    -3.02   0.003    -.2362793   -.0503599
         phd |  -.0061662   .0310086    -0.20   0.842     -.066942    .0546096
        ment |   .0180977   .0022948     7.89   0.000     .0135999    .0225955
       _cons |   .6408391   .1213072     5.28   0.000     .4030814    .8785967
-------------+----------------------------------------------------------------
inflate      |
         fem |   .1097465   .2800813     0.39   0.695    -.4392028    .6586958
         mar |  -.3540108   .3176103    -1.11   0.265    -.9765156    .2684941
        kid5 |   .2171001    .196481     1.10   0.269    -.1679956    .6021958
         phd |   .0012702   .1452639     0.01   0.993    -.2834418    .2859821
        ment |   -.134111   .0452462    -2.96   0.003    -.2227918   -.0454302
       _cons |  -.5770618   .5093853    -1.13   0.257    -1.575439     .421315
------------------------------------------------------------------------------
Vuong Test of Zip vs. Poisson: Std. Normal =   4.18    Pr> Z = 0.0000

The vuong option is included to obtain a test of zip versus poisson, which in this case favors zip.
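The two blocks of the zip output are read differently: the art block is on the log-count scale, while the inflate block is a logit for membership in the always-zero group. The following is a minimal sketch of how the printed coefficients might be put on a ratio scale. It uses only display with the estimates shown above, and the choice of fem and ment as examples is ours.

```stata
* Count equation: factor change in the expected count for fem = 1 vs fem = 0
display exp(-.2091446)

* Inflate equation: factor change in the odds of being an "always zero"
* for each additional article by the mentor
display exp(-.134111)

* Probability of the always-zero group when every inflate covariate is 0
* (an extrapolation, since phd is never 0 in these data)
display invlogit(-.5770618)
```

The first two values are about 0.81 and 0.87. Holding the other variables constant, women who are not in the always-zero group are expected to publish about 19% fewer articles, and each additional mentor article lowers the odds of always-zero membership by about 13%.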
fitstat, using(0) force

Measures of Fit for zip of art

Warning: Current model estimated by zip, but saved model estimated by poisson

                                 Current            Saved       Difference
Model:                               zip          poisson
N:                                   915              915                0
Log-Lik Intercept Only:        -1679.391        -1742.573           63.182
Log-Lik Full Model:            -1604.773        -1651.056           46.283
D:                         3209.546(903)    3302.113(909)        92.567(6)
LR:                          149.236(10)       183.034(5)        33.798(5)
Prob > LR:                         0.000            0.000            0.000
McFadden's R2:                     0.044            0.053           -0.008
McFadden's Adj R2:                 0.037            0.049           -0.012
Maximum Likelihood R2:             0.150            0.181           -0.031
Cragg & Uhler's R2:                0.154            0.185           -0.031
AIC:                               3.534            3.622           -0.088
AIC*n:                          3233.546         3314.113          -80.567
BIC:                           -2947.943        -2896.289          -51.653
BIC':                            -81.047         -148.940           67.892

Note: p-value for difference in LR is only valid if models are nested.
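The information criteria in the fitstat table can be checked by hand from the log likelihoods. The sketch below assumes the usual definitions, AIC*n = -2 lnL + 2k and BIC = D - df ln(N), which appear to be consistent with the figures printed above. The parameter counts (12 for zip, 6 for poisson) are read off the coefficient tables.

```stata
* AIC*n = -2*lnL + 2*k, where k is the number of estimated parameters
* zip: 6 count-equation parameters plus 6 inflate-equation parameters
display -2*(-1604.773) + 2*12
* poisson: 6 parameters
display -2*(-1651.0563) + 2*6

* AIC as reported by fitstat is AIC*n divided by N
display (-2*(-1604.773) + 2*12) / 915

* BIC = D - df*ln(N), where D = -2*lnL and df = N - k
display 3209.546 - 903*ln(915)
```

The first command returns 3233.546, matching the AIC*n entry for zip. Smaller AIC and more negative BIC values indicate the preferred model, which here is zip on both criteria.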
Zero-inflated Negative Binomial
nbreg art fem mar kid5 phd ment

Negative binomial regression                      Number of obs   =        915
                                                  LR chi2(5)      =      97.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -1560.9583                       Pseudo R2       =     0.0304

------------------------------------------------------------------------------
         art |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         fem |  -.2164184   .0726724    -2.98   0.003    -.3588537   -.0739832
         mar |   .1504895   .0821063     1.83   0.067    -.0104359    .3114148
        kid5 |  -.1764152   .0530598    -3.32   0.001    -.2804105     -.07242
         phd |   .0152712   .0360396     0.42   0.672    -.0553652    .0859075
        ment |   .0290823   .0034701     8.38   0.000     .0222811    .0358836
       _cons |    .256144   .1385604     1.85   0.065    -.0154294    .5277174
-------------+----------------------------------------------------------------
    /lnalpha |  -.8173044   .1199372                     -1.052377   -.5822318
-------------+----------------------------------------------------------------
       alpha |   .4416205   .0529667                      .3491069    .5586502
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) =  180.20 Prob>=chibar2 = 0.000

quietly fitstat, saving(0)

zinb art fem mar kid5 phd ment, inflate(fem mar kid5 phd ment) vuong zip

Zero-inflated negative binomial regression        Number of obs   =        915
                                                  Nonzero obs     =        640
                                                  Zero obs        =        275

Inflation model = logit                           LR chi2(5)      =      67.97
Log likelihood  = -1549.991                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         art |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
art          |
         fem |  -.1955068   .0755926    -2.59   0.010    -.3436655   -.0473481
         mar |   .0975826    .084452     1.16   0.248    -.0679402    .2631054
        kid5 |  -.1517325    .054206    -2.80   0.005    -.2579744   -.0454906
         phd |  -.0007001   .0362696    -0.02   0.985    -.0717872    .0703869
        ment |   .0247862   .0034924     7.10   0.000     .0179412    .0316312
       _cons |   .4167466   .1435962     2.90   0.004     .1353032      .69819
-------------+----------------------------------------------------------------
inflate      |
         fem |   .6359327   .8489175     0.75   0.454    -1.027915    2.299781
         mar |  -1.499469     .93867    -1.60   0.110    -3.339228    .3402907
        kid5 |   .6284274   .4427825     1.42   0.156    -.2394104    1.496265
         phd |  -.0377153   .3080086    -0.12   0.903     -.641401    .5659705
        ment |  -.8822932   .3162277    -2.79   0.005    -1.502088   -.2624984
       _cons |  -.1916864   1.322821    -0.14   0.885    -2.784368    2.400995
-------------+----------------------------------------------------------------
    /lnalpha |  -.9763565   .1354679    -7.21   0.000    -1.241869   -.7108443
-------------+----------------------------------------------------------------
       alpha |    .376681   .0510282                       .288844    .4912293
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) =  109.56 Pr>=chibar2 = 0.0000
Vuong Test of Zinb vs. Neg. Bin: Std. Normal =   2.24    Pr> Z = 0.0125

fitstat, using(0) force

Measures of Fit for zinb of art

Warning: Current model estimated by zinb, but saved model estimated by nbreg

                                 Current            Saved       Difference
Model:                              zinb            nbreg
N:                                   915              915                0
Log-Lik Intercept Only:        -1609.937        -1609.937           -0.000
Log-Lik Full Model:            -1549.991        -1560.958           10.967
D:                         3099.982(902)    3121.917(908)        21.935(6)
LR:                          119.892(11)        97.957(5)        21.935(6)
Prob > LR:                         0.000            0.000            0.001
McFadden's R2:                     0.037            0.030            0.007
McFadden's Adj R2:                 0.029            0.026            0.003
Maximum Likelihood R2:             0.123            0.102            0.021
Cragg & Uhler's R2:                0.127            0.105            0.022
AIC:                               3.416            3.427           -0.011
AIC*n:                          3125.982         3135.917           -9.935
BIC:                           -3050.688        -3069.666           18.979
BIC':                            -44.884          -63.862           18.979

Difference of 18.979 in BIC' provides very strong support for saved model.

Note: p-value for difference in LR is only valid if models are nested.

We have included the vuong and zip options. The zip option requests a likelihood-ratio test comparing zinb with zip (the test of alpha=0). The results indicate that zinb is the better choice. The vuong option was used to obtain a test of the zinb versus nbreg models. In general, Vuong tests that are significantly positive support the zero-inflated models, while those that are significantly negative favor the non-zero-inflated models. The Vuong test above supports the use of a zero-inflated approach.
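The likelihood-ratio test of alpha=0 requested by the zip option can be reproduced from the two log likelihoods reported in this section (-1549.991 for zinb and -1604.773 for zip). This is a sketch of the arithmetic rather than of what zinb does internally. Halving the chi-squared tail reflects the usual treatment of a parameter tested on the boundary, which is our reading of the chibar2(01) label.

```stata
* LR statistic for alpha = 0: twice the gap between the zinb and zip
* log likelihoods
display 2*(-1549.991 - (-1604.773))

* chibar2(01) p-value: half the upper tail of a chi-squared with 1 df
display 0.5*chi2tail(1, 109.56)
```

The first command returns 109.564, which agrees with the chibar2(01) value of 109.56 in the output above.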
Let's try again and see if we can improve our model by removing some non-significant variables.
zinb art fem mar kid5 ment, inflate(ment) vuong

Zero-inflated negative binomial regression        Number of obs   =        915
                                                  Nonzero obs     =        640
                                                  Zero obs        =        275

Inflation model = logit                           LR chi2(4)      =      71.91
Log likelihood  = -1553.273                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         art |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
art          |
         fem |  -.2119365   .0719188    -2.95   0.003    -.3528948   -.0709782
         mar |   .1389895   .0807376     1.72   0.085    -.0192532    .2972323
        kid5 |  -.1676594   .0524524    -3.20   0.001    -.2704641   -.0648546
        ment |    .024431   .0034497     7.08   0.000     .0176696    .0311923
       _cons |   .4101993   .0863877     4.75   0.000     .2408825    .5795161
-------------+----------------------------------------------------------------
inflate      |
        ment |  -.6096804   .2456692    -2.48   0.013    -1.091183   -.1281775
       _cons |  -.8053801   .3520712    -2.29   0.022    -1.495427   -.1153333
-------------+----------------------------------------------------------------
    /lnalpha |  -1.003111   .1427915    -7.03   0.000    -1.282977   -.7232447
-------------+----------------------------------------------------------------
       alpha |   .3667368   .0523669                      .2772108    .4851755
------------------------------------------------------------------------------
Vuong Test of Zinb vs. Neg. Bin: Std. Normal =   1.88    Pr> Z = 0.0299

fitstat, using(0) force

Measures of Fit for zinb of art

Warning: Current model estimated by zinb, but saved model estimated by nbreg

                                 Current            Saved       Difference
Model:                              zinb            nbreg
N:                                   915              915                0
Log-Lik Intercept Only:        -1609.937        -1609.937           -0.000
Log-Lik Full Model:            -1553.273        -1560.958            7.686
D:                         3106.545(907)    3121.917(908)        15.371(1)
LR:                           113.328(6)        97.957(5)        15.371(1)
Prob > LR:                         0.000            0.000            0.000
McFadden's R2:                     0.035            0.030            0.005
McFadden's Adj R2:                 0.030            0.026            0.004
Maximum Likelihood R2:             0.116            0.102            0.015
Cragg & Uhler's R2:                0.120            0.105            0.015
AIC:                               3.413            3.427           -0.015
AIC*n:                          3122.545         3135.917          -13.371
BIC:                           -3078.219        -3069.666           -8.552
BIC':                            -72.415          -63.862           -8.552

Difference of 8.552 in BIC' provides strong support for current model.

Note: p-value for difference in LR is only valid if models are nested.
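Because the reduced zinb model is nested in the full one, the two can also be compared with a likelihood-ratio test. This check is not part of the original analysis; it is a sketch using the log likelihoods reported above (-1549.991 for the full model, -1553.273 for the reduced one). The last five commands assume a version of Stata that supports estimates store and lrtest with stored names, and the names full and reduced are our own.

```stata
* LR statistic: twice the difference in log likelihoods (full minus reduced)
display 2*(-1549.991 - (-1553.273))

* 5 restrictions: phd in the count equation, plus fem, mar, kid5, and phd
* in the inflate equation
display chi2tail(5, 2*(-1549.991 - (-1553.273)))

* Equivalent workflow with stored estimates (refits both models)
quietly zinb art fem mar kid5 phd ment, inflate(fem mar kid5 phd ment)
estimates store full
quietly zinb art fem mar kid5 ment, inflate(ment)
estimates store reduced
lrtest full reduced
```

The statistic is 6.564 on 5 degrees of freedom, below the 5% critical value of 11.07, so dropping the five terms does not significantly worsen the fit.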