Ed231C: Zero-Inflated Count Models

Applied Categorical & Nonnormal Data Analysis

Zero-Inflated Count Models

In many instances a count variable contains more zeros than a standard count model predicts, because some of the zeros are generated by a different process than the remaining counts. In the data on doctoral publications, for example, many scientists are actively involved in research and publication, but some have jobs in which research and publishing are not required or even possible.
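
One common way to write this two-process idea, shown here for the Poisson case, is as a mixture of an "always zero" group and an ordinary count process. The notation is an assumption introduced for illustration and does not appear in the handout: \pi_i is the probability that observation i belongs to the always-zero group, \mu_i is the Poisson mean for the remaining observations, x_i holds the count-equation predictors, and z_i holds the inflate() predictors.

```latex
\begin{align*}
\Pr(y_i = 0) &= \pi_i + (1 - \pi_i)\, e^{-\mu_i} \\
\Pr(y_i = k) &= (1 - \pi_i)\, \frac{e^{-\mu_i} \mu_i^{k}}{k!}, \qquad k = 1, 2, \ldots \\
\log \mu_i &= x_i'\beta, \qquad \operatorname{logit}(\pi_i) = z_i'\gamma
\end{align*}
```

Under this formulation a zero can arise either from the always-zero group or from the count process itself, which is why the observed share of zeros exceeds what the count process alone would produce.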

We will illustrate zero-inflated count models using Long's data on doctoral publications.

Zero-inflated Poisson

use http://www.gseis.ucla.edu/courses/data/couart

describe

Contains data from http://www.gseis.ucla.edu/courses/data/couart.dta
  obs:           915                          Scientific Productivity of Bioc
 vars:             7                          18 Oct 2001 22:21
 size:        18,300 (99.7% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
fem             byte   %9.0g       sexlbl     Sex: 1=female, 0=male.
ment            float  %9.0g                  Article by mentor in last 3 yrs
phd             float  %9.0g                  Prestige of PhD department.
mar             byte   %9.0g       marlbl     Married: 1=yes, 0=no.
kid5            byte   %9.0g                  Number of children <= 5.
art             byte   %9.0g                  Articles in last 3 yrs of PhD.
lnart           float  %9.0g                  Log of art + .5.
-------------------------------------------------------------------------------

summarize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         fem |     915    .4601093   .4986788          0          1
        ment |     915    8.767212   9.483915          0   76.99998
         phd |     915    3.103109   .9842491       .755       4.62
         mar |     915    .6622951    .473186          0          1
        kid5 |     915     .495082     .76488          0          3
         art |     915    1.692896   1.926069          0         19
       lnart |     915    .4399161   .8566493  -.6931472   2.970414

poisson art fem mar kid5 phd ment 
 
Poisson regression                                Number of obs   =        915
                                                  LR chi2(5)      =     183.03
                                                  Prob > chi2     =     0.0000
Log likelihood = -1651.0563                       Pseudo R2       =     0.0525

------------------------------------------------------------------------------
         art |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         fem |  -.2245942   .0546138    -4.11   0.000    -.3316352   -.1175532
         mar |   .1552434   .0613747     2.53   0.011     .0349512    .2755356
        kid5 |  -.1848827   .0401272    -4.61   0.000    -.2635305   -.1062349
         phd |   .0128226   .0263972     0.49   0.627     -.038915    .0645601
        ment |   .0255427   .0020061    12.73   0.000     .0216109    .0294746
       _cons |   .3046168   .1029822     2.96   0.003     .1027755    .5064581
------------------------------------------------------------------------------

quietly fitstat, saving(0)

zip art fem mar kid5 phd ment, inflate(fem mar kid5 phd ment) vuong

Zero-inflated poisson regression                  Number of obs   =        915
                                                  Nonzero obs     =        640
                                                  Zero obs        =        275

Inflation model = logit                           LR chi2(5)      =      78.56
Log likelihood  = -1604.773                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         art |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
art          |
         fem |  -.2091446   .0634047    -3.30   0.001    -.3334155   -.0848737
         mar |    .103751    .071111     1.46   0.145     -.035624     .243126
        kid5 |  -.1433196   .0474293    -3.02   0.003    -.2362793   -.0503599
         phd |  -.0061662   .0310086    -0.20   0.842     -.066942    .0546096
        ment |   .0180977   .0022948     7.89   0.000     .0135999    .0225955
       _cons |   .6408391   .1213072     5.28   0.000     .4030814    .8785967
-------------+----------------------------------------------------------------
inflate      |
         fem |   .1097465   .2800813     0.39   0.695    -.4392028    .6586958
         mar |  -.3540108   .3176103    -1.11   0.265    -.9765156    .2684941
        kid5 |   .2171001    .196481     1.10   0.269    -.1679956    .6021958
         phd |   .0012702   .1452639     0.01   0.993    -.2834418    .2859821
        ment |   -.134111   .0452462    -2.96   0.003    -.2227918   -.0454302
       _cons |  -.5770618   .5093853    -1.13   0.257    -1.575439     .421315
------------------------------------------------------------------------------
Vuong Test of Zip vs. Poisson: Std. Normal  =     4.18   Pr> Z       =  0.0000

The vuong option requests a Vuong test of zip versus poisson. The statistic is significantly positive (z = 4.18), which in this case favors zip.
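
The Vuong statistic is referred to a standard normal distribution, so the reported p-value can be checked with Stata's normal() function. This is a minimal sketch that assumes the statistic is typed in from the output above rather than read from stored results.

```stata
* One-sided p-value for the reported Vuong statistic (z = 4.18)
display %8.6f 1 - normal(4.18)
* One-sided 5% critical value of the standard normal, for comparison
display %5.3f invnormal(0.95)
```

A statistic well above the critical value, as here, corresponds to the Pr> Z of 0.0000 printed by zip.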

fitstat, using(0) force

Measures of Fit for zip of art

Warning: Current model estimated by zip, but saved model estimated by poisson

                             Current            Saved       Difference
Model:                           zip          poisson
N:                               915              915                0
Log-Lik Intercept Only:    -1679.391        -1742.573           63.182
Log-Lik Full Model:        -1604.773        -1651.056           46.283
D:                          3209.546(903)    3302.113(909)      92.567(6)
LR:                          149.236(10)      183.034(5)        33.798(5)
Prob > LR:                     0.000            0.000            0.000
McFadden's R2:                 0.044            0.053           -0.008
McFadden's Adj R2:             0.037            0.049           -0.012
Maximum Likelihood R2:         0.150            0.181           -0.031
Cragg & Uhler's R2:            0.154            0.185           -0.031
AIC:                           3.534            3.622           -0.088
AIC*n:                      3233.546         3314.113          -80.567
BIC:                       -2947.943        -2896.289          -51.653
BIC':                        -81.047         -148.940           67.892

Note: p-value for difference in LR is only valid if models are nested.
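
Several rows of the fitstat table can be reproduced by hand from the two log likelihoods, which helps show what each measure is. The sketch below uses Stata's display calculator with values typed in from the output above. The parameter counts (12 for zip, 6 for poisson) are an assumption based on counting the coefficients in the two tables.

```stata
* Deviance: D = -2 * (log likelihood of the full model)
display %9.3f -2*(-1604.773)
display %9.3f -2*(-1651.0563)
* AIC*n = D + 2 * (number of parameters)
display %9.3f -2*(-1604.773) + 2*12
display %9.3f -2*(-1651.0563) + 2*6
* AIC = (AIC*n) / N, with N = 915
display %6.3f (-2*(-1604.773) + 2*12) / 915
* Difference in deviance between the two models
display %9.3f 2*(-1604.773 - (-1651.0563))
```

These lines should match the D, AIC*n, AIC, and D difference rows for zip and poisson. The lower AIC for zip agrees with the Vuong test in favoring the zero-inflated model.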

Zero-inflated Negative Binomial

nbreg art fem mar kid5 phd ment 

Negative binomial regression                      Number of obs   =        915
                                                  LR chi2(5)      =      97.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -1560.9583                       Pseudo R2       =     0.0304

------------------------------------------------------------------------------
         art |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         fem |  -.2164184   .0726724    -2.98   0.003    -.3588537   -.0739832
         mar |   .1504895   .0821063     1.83   0.067    -.0104359    .3114148
        kid5 |  -.1764152   .0530598    -3.32   0.001    -.2804105     -.07242
         phd |   .0152712   .0360396     0.42   0.672    -.0553652    .0859075
        ment |   .0290823   .0034701     8.38   0.000     .0222811    .0358836
       _cons |    .256144   .1385604     1.85   0.065    -.0154294    .5277174
-------------+----------------------------------------------------------------
    /lnalpha |  -.8173044   .1199372                     -1.052377   -.5822318
-------------+----------------------------------------------------------------
       alpha |   .4416205   .0529667                      .3491069    .5586502
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) =  180.20 Prob>=chibar2 = 0.000

quietly fitstat, saving(0)

zinb art fem mar kid5 phd ment, inflate(fem mar kid5 phd ment) vuong zip

Zero-inflated negative binomial regression        Number of obs   =        915
                                                  Nonzero obs     =        640
                                                  Zero obs        =        275

Inflation model = logit                           LR chi2(5)      =      67.97
Log likelihood  = -1549.991                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         art |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
art          |
         fem |  -.1955068   .0755926    -2.59   0.010    -.3436655   -.0473481
         mar |   .0975826    .084452     1.16   0.248    -.0679402    .2631054
        kid5 |  -.1517325    .054206    -2.80   0.005    -.2579744   -.0454906
         phd |  -.0007001   .0362696    -0.02   0.985    -.0717872    .0703869
        ment |   .0247862   .0034924     7.10   0.000     .0179412    .0316312
       _cons |   .4167466   .1435962     2.90   0.004     .1353032      .69819
-------------+----------------------------------------------------------------
inflate      |
         fem |   .6359327   .8489175     0.75   0.454    -1.027915    2.299781
         mar |  -1.499469     .93867    -1.60   0.110    -3.339228    .3402907
        kid5 |   .6284274   .4427825     1.42   0.156    -.2394104    1.496265
         phd |  -.0377153   .3080086    -0.12   0.903     -.641401    .5659705
        ment |  -.8822932   .3162277    -2.79   0.005    -1.502088   -.2624984
       _cons |  -.1916864   1.322821    -0.14   0.885    -2.784368    2.400995
-------------+----------------------------------------------------------------
    /lnalpha |  -.9763565   .1354679    -7.21   0.000    -1.241869   -.7108443
-------------+----------------------------------------------------------------
       alpha |    .376681   .0510282                       .288844    .4912293
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0: chibar2(01) =   109.56 Pr>=chibar2 =  0.0000
Vuong Test of Zinb vs. Neg. Bin: Std. Normal  =     2.24 Pr> Z       =  0.0125

fitstat, using(0) force

Measures of Fit for zinb of art

Warning: Current model estimated by zinb, but saved model estimated by nbreg

                             Current            Saved       Difference
Model:                          zinb            nbreg
N:                               915              915                0
Log-Lik Intercept Only:    -1609.937        -1609.937           -0.000
Log-Lik Full Model:        -1549.991        -1560.958           10.967
D:                          3099.982(902)    3121.917(908)      21.935(6)
LR:                          119.892(11)       97.957(5)        21.935(6)
Prob > LR:                     0.000            0.000            0.001
McFadden's R2:                 0.037            0.030            0.007
McFadden's Adj R2:             0.029            0.026            0.003
Maximum Likelihood R2:         0.123            0.102            0.021
Cragg & Uhler's R2:            0.127            0.105            0.022
AIC:                           3.416            3.427           -0.011
AIC*n:                      3125.982         3135.917           -9.935
BIC:                       -3050.688        -3069.666           18.979
BIC':                        -44.884          -63.862           18.979

Difference of   18.979 in BIC' provides very strong support for saved model.

Note: p-value for difference in LR is only valid if models are nested.

We have included the vuong and zip options. The zip option requests a likelihood-ratio test comparing zinb with zip, reported as the test of alpha=0. The result, chibar2(01) = 109.56, indicates that zinb is the better choice. The vuong option requests a test of the zinb versus nbreg models. In general, Vuong tests that are significantly positive support the zero-inflated model, while those that are significantly negative favor the model without zero inflation. The Vuong test above (z = 2.24, p = 0.0125) supports the use of a zero-inflated approach, although the BIC' difference reported by fitstat favors the simpler nbreg model.
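
The likelihood-ratio test requested by the zip option can be checked from the zinb and zip log likelihoods reported above. This is a minimal sketch with the values typed in by hand. Because alpha = 0 lies on the boundary of the parameter space, the p-value is taken as half the upper tail of a chi-squared distribution with 1 degree of freedom, which is what the chibar2(01) label refers to.

```stata
* LR statistic for zinb versus zip: 2 * (ll_zinb - ll_zip)
display %9.2f 2*(-1549.991 - (-1604.773))
* chibar2(01) p-value: half the upper tail of a chi-squared with 1 df
display 0.5*chi2tail(1, 2*(-1549.991 - (-1604.773)))
```

The first line should reproduce the 109.56 shown in the zinb output.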

Let's try again and see if we can improve our model by removing some non-significant variables: phd from the count equation and everything except ment from the inflate equation.

zinb art fem mar kid5 ment, inflate(ment) vuong 

Zero-inflated negative binomial regression        Number of obs   =        915
                                                  Nonzero obs     =        640
                                                  Zero obs        =        275

Inflation model = logit                           LR chi2(4)      =      71.91
Log likelihood  = -1553.273                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         art |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
art          |
         fem |  -.2119365   .0719188    -2.95   0.003    -.3528948   -.0709782
         mar |   .1389895   .0807376     1.72   0.085    -.0192532    .2972323
        kid5 |  -.1676594   .0524524    -3.20   0.001    -.2704641   -.0648546
        ment |    .024431   .0034497     7.08   0.000     .0176696    .0311923
       _cons |   .4101993   .0863877     4.75   0.000     .2408825    .5795161
-------------+----------------------------------------------------------------
inflate      |
        ment |  -.6096804   .2456692    -2.48   0.013    -1.091183   -.1281775
       _cons |  -.8053801   .3520712    -2.29   0.022    -1.495427   -.1153333
-------------+----------------------------------------------------------------
    /lnalpha |  -1.003111   .1427915    -7.03   0.000    -1.282977   -.7232447
-------------+----------------------------------------------------------------
       alpha |   .3667368   .0523669                      .2772108    .4851755
------------------------------------------------------------------------------
Vuong Test of Zinb vs. Neg. Bin: Std. Normal  =     1.88 Pr> Z       =  0.0299
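
The inflate equation is on the logit scale, so its coefficients can be turned into predicted probabilities of belonging to the always-zero group. The sketch below does this for a few values of ment using the estimates above. The scalar names b_cons and b_ment are hypothetical names chosen for illustration.

```stata
* Logit-scale coefficients from the inflate equation of the reduced zinb model
scalar b_cons = -.8053801
scalar b_ment = -.6096804
* Predicted probability of the always-zero group at ment = 0, 1, 2, 3
forvalues m = 0/3 {
    display "ment = `m'   Pr(always zero) = " %6.4f invlogit(b_cons + b_ment*`m')
}
```

Because the coefficient on ment is negative, the predicted probability of being in the always-zero group falls as the mentor's article count rises.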

fitstat, using(0) force

Measures of Fit for zinb of art

Warning: Current model estimated by zinb, but saved model estimated by nbreg

                             Current            Saved       Difference
Model:                          zinb            nbreg
N:                               915              915                0
Log-Lik Intercept Only:    -1609.937        -1609.937           -0.000
Log-Lik Full Model:        -1553.273        -1560.958            7.686
D:                          3106.545(907)    3121.917(908)      15.371(1)
LR:                          113.328(6)        97.957(5)        15.371(1)
Prob > LR:                     0.000            0.000            0.000
McFadden's R2:                 0.035            0.030            0.005
McFadden's Adj R2:             0.030            0.026            0.004
Maximum Likelihood R2:         0.116            0.102            0.015
Cragg & Uhler's R2:            0.120            0.105            0.015
AIC:                           3.413            3.427           -0.015
AIC*n:                      3122.545         3135.917          -13.371
BIC:                       -3078.219        -3069.666           -8.552
BIC':                        -72.415          -63.862           -8.552

Difference of    8.552 in BIC' provides strong support for current model.

Note: p-value for difference in LR is only valid if models are nested.
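
The reduced zinb model is nested in the full zinb model fitted earlier, so the two can also be compared with a likelihood-ratio test. This is a minimal sketch using the reported log likelihoods. The 5 degrees of freedom are an assumption based on counting the dropped terms: phd in the count equation, and fem, mar, kid5, and phd in the inflate equation.

```stata
* LR statistic: 2 * (ll_full - ll_reduced)
display %6.3f 2*(-1549.991 - (-1553.273))
* Upper-tail p-value on 5 degrees of freedom
display %6.4f chi2tail(5, 2*(-1549.991 - (-1553.273)))
```

The statistic is 6.564, well below the 5% critical value of 11.07 for 5 degrees of freedom, so dropping these terms does not significantly worsen the fit.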

Categorical Data Analysis Course

Phil Ender