In this unit we turn to the opposite situation from the zero-inflated models: data that contain no zeros at all, which call for the so-called zero-truncated models. If one fits a standard Poisson or negative binomial model to such data, the fitting procedure still assigns probability to zero counts. A probability model that excludes zero values should therefore describe the data more accurately.
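For reference, the standard zero-truncated distributions can be written out explicitly. This is general textbook material rather than anything specific to this unit. The symbols are assumptions introduced here: \lambda and \mu are the means of the untruncated Poisson and negative binomial distributions, and \alpha is the negative binomial dispersion parameter.

```latex
% Zero-truncated Poisson
\[
P(Y = y \mid Y > 0) = \frac{e^{-\lambda}\,\lambda^{y}}{y!\,\bigl(1 - e^{-\lambda}\bigr)},
\qquad y = 1, 2, \dots
\]
% Zero-truncated negative binomial (mean \mu, dispersion \alpha)
\[
P(Y = y \mid Y > 0) = \frac{P_{\mathrm{NB}}(y;\,\mu,\alpha)}{1 - (1 + \alpha\mu)^{-1/\alpha}},
\qquad y = 1, 2, \dots
\]
```

In both cases the ordinary probabilities are divided by one minus the probability of a zero, so that they sum to one over the positive counts.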
We will illustrate zero-truncated count models by examining length of hospital stay (los) from the 1997 MedPar dataset. Length of stay does not and cannot have any zero values. It begins with a value of one and grows from there.
Stata 9 introduced two new commands, ztp for zero-truncated Poisson and ztnb for zero-truncated negative binomial. We will use both of these commands in this unit.
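Readers on a more recent Stata release may find that ztp and ztnb have been superseded by tpoisson and tnbreg. The sketch below assumes such a release and assumes that the ll(0) option (truncation at zero) and vce(cluster ...) behave as in the official documentation. It is meant as a rough equivalent of the models fitted later in this unit, not as the method used here.

```stata
* Assumption: a later Stata release where tpoisson/tnbreg are available
use http://www.gseis.ucla.edu/courses/data/medpar, clear

* zero-truncated Poisson: ll(0) sets the lower truncation point at zero
tpoisson los died hmo type2 type3, ll(0) vce(cluster provnum) nolog

* zero-truncated negative binomial with the same truncation point
tnbreg los died hmo type2 type3, ll(0) vce(cluster provnum) nolog
```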
Note: With Stata 8 and below, the trpois0 and trnbin0 ado-files can be used instead. These commands and the medpar dataset were taken from a Stata Technical Bulletin article (STB-47, January 1999) by Joseph Hilbe of Arizona State University.
Looking at the Data
The response variable in this example is length of hospital stay. With length of hospital stay, regardless of how little time is spent in the hospital, patients are credited as having at least one day.
use http://www.gseis.ucla.edu/courses/data/medpar, clear

describe

Contains data from medpar.dta
  obs:         1,495
 vars:            10                          30 Jun 1998 13:10
 size:        43,355 (98.6% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
provnum         str6   %9s                    Provider number
died            float  %9.0g
white           float  %9.0g
hmo             byte   %9.0g                  HMO/readmit'
los             int    %9.0g                  Length of Stay
age80           float  %9.0g
age             byte   %9.0g                  Age Group
type1           byte   %8.0g                  type== 1.0000
type2           byte   %8.0g                  type== 2.0000
type3           byte   %8.0g                  type== 3.0000
-------------------------------------------------------------------------------

summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     provnum |         0
        died |      1495    .3431438    .4749179          0          1
       white |      1495    .9150502    .2789003          0          1
         hmo |      1495    .1598662    .3666046          0          1
         los |      1495    9.854181    8.832906          1        116
-------------+--------------------------------------------------------
       age80 |      1495    .2207358    .4148815          0          1
         age |      1495    5.235452    1.668898          1          9
       type1 |      1495    .7585284    .4281187          0          1
       type2 |      1495    .1772575    .3820143          0          1
       type3 |      1495     .064214    .2452159          0          1

tabstat los, stat(n mean sd var)

    variable |         N      mean        sd  variance
-------------+----------------------------------------
         los |      1495  9.854181  8.832906  78.02022
------------------------------------------------------

/* note: mean and variance are very different */

tabulate los

  Length of |
       Stay |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        126        8.43        8.43
          2 |         71        4.75       13.18
          3 |         75        5.02       18.19
          4 |        104        6.96       25.15
          5 |        123        8.23       33.38
          6 |         97        6.49       39.87
          7 |        116        7.76       47.63
          8 |         92        6.15       53.78
          9 |         74        4.95       58.73
         10 |         89        5.95       64.68
         11 |         70        4.68       69.36
         12 |         70        4.68       74.05
         13 |         43        2.88       76.92
         14 |         49        3.28       80.20
         15 |         41        2.74       82.94
         16 |         43        2.88       85.82
         17 |         29        1.94       87.76
         18 |         23        1.54       89.30
         19 |         24        1.61       90.90
         20 |         19        1.27       92.17
         21 |         18        1.20       93.38
         22 |         15        1.00       94.38
         23 |         10        0.67       95.05
         24 |         11        0.74       95.79
         25 |          4        0.27       96.05
         26 |          7        0.47       96.52
         27 |          7        0.47       96.99
         28 |          5        0.33       97.32
         29 |          3        0.20       97.53
         30 |          1        0.07       97.59
         31 |          2        0.13       97.73
         32 |          6        0.40       98.13
         33 |          2        0.13       98.26
         34 |          5        0.33       98.60
         36 |          1        0.07       98.66
         42 |          1        0.07       98.73
         43 |          1        0.07       98.80
         44 |          2        0.13       98.93
         46 |          3        0.20       99.13
         48 |          1        0.07       99.20
         49 |          1        0.07       99.26
         50 |          1        0.07       99.33
         52 |          1        0.07       99.40
         57 |          1        0.07       99.46
         59 |          1        0.07       99.53
         60 |          1        0.07       99.60
         63 |          1        0.07       99.67
         65 |          1        0.07       99.73
         70 |          1        0.07       99.80
         74 |          1        0.07       99.87
         91 |          1        0.07       99.93
        116 |          1        0.07      100.00
------------+-----------------------------------
      Total |      1,495      100.00

nbvargr los, n(15)

Obtaining Parameter Estimates
(36 observations deleted)
here

Negative Binomial Probabilities with mean = 9.854181 & overdispersion = .4902339

     +------------------------------+
     |  k       nbprob        nbcum |
     |------------------------------|
  1. |  0   0.02741744   0.02741744 |
  2. |  1   0.04633566   0.07375310 |
  3. |  2   0.05834830   0.13210140 |
  4. |  3   0.06509732   0.19719872 |
  5. |  4   0.06795350   0.26515222 |
     |------------------------------|
  6. |  5   0.06800788   0.33316010 |
  7. |  6   0.06610931   0.39926943 |
  8. |  7   0.06290771   0.46217713 |
  9. |  8   0.05889338   0.52107054 |
 10. |  9   0.05443054   0.57550102 |
     |------------------------------|
 11. | 10   0.04978486   0.62528592 |
 12. | 11   0.04514578   0.67043167 |
 13. | 12   0.04064433   0.71107602 |
 14. | 13   0.03636726   0.74744326 |
 15. | 14   0.03236813   0.77981138 |
     |------------------------------|
 16. | 15   0.02867597   0.80848736 |
     +------------------------------+
k was int now float

Poisson Probabilities for lambda = 9.854181

     +------------------------------+
     |  k        pprob         pcum |
     |------------------------------|
  1. |  0   0.00005253   0.00005253 |
  2. |  1   0.00051761   0.00057014 |
  3. |  2   0.00255032   0.00312046 |
  4. |  3   0.00837710   0.01149756 |
  5. |  4   0.02063738   0.03213494 |
     |------------------------------|
  6. |  5   0.04067289   0.07280783 |
  7. |  6   0.06679966   0.13960749 |
  8. |  7   0.09403657   0.23364405 |
  9. |  8   0.11583167   0.34947574 |
 10. |  9   0.12682514   0.47630087 |
     |------------------------------|
 11. | 10   0.12497579   0.60127664 |
 12. | 11   0.11195764   0.71323431 |
 13. | 12   0.09193757   0.80517185 |
 14. | 13   0.06968996   0.87486184 |
 15. | 14   0.04905268   0.92391449 |
     |------------------------------|
 16. | 15   0.03222493   0.95613945 |
     +------------------------------+
(1 observation deleted)

Tricking Stata
It's clear from the nbvargr output that neither the Poisson nor the negative binomial distribution fits the observed data very well. In addition, the negative binomial distribution expects some zero values (approximately 40), whereas the data contain none.
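The figure of roughly 40 expected zeros can be checked by hand. The sketch below assumes the usual mean-dispersion form of the negative binomial, in which the probability of a zero is (1 + alpha*mu)^(-1/alpha), and plugs in the mean and overdispersion reported by nbvargr. The scalar names are chosen here for illustration only.

```stata
* values reported by nbvargr for los
scalar mu    = 9.854181
scalar alpha = .4902339

* negative binomial probability of a zero count (mean-dispersion form)
scalar p0 = (1 + alpha*mu)^(-1/alpha)
display "P(los = 0) under the negative binomial: " p0

* expected number of zeros among the 1495 observations
display "expected zeros: " 1495*p0
```

The first value should agree with the nbprob entry for k = 0 in the table above, and multiplying it by 1495 gives about 41 expected zeros.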
We will run standard poisson and negative binomial regressions and then we will trick Stata by subtracting one from the value of length of stay and rerunning these models.
poisson los died hmo type2 type3, nolog cluster(provnum)

Poisson regression                                Number of obs   =       1495
                                                  Wald chi2(4)    =      30.71
Log pseudolikelihood = -6846.9485                 Prob > chi2     =     0.0000

                               (Std. Err. adjusted for 54 clusters in provnum)
------------------------------------------------------------------------------
             |               Robust
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        died |  -.2483158   .0633446    -3.92   0.000    -.3724689   -.1241627
         hmo |  -.0753708   .0502589    -1.50   0.134    -.1738764    .0231348
       type2 |   .2498558   .0646699     3.86   0.000     .1231051    .3766066
       type3 |   .7501452   .2184939     3.43   0.001     .3219049    1.178385
       _cons |   2.264575   .0335312    67.54   0.000     2.198855    2.330295
------------------------------------------------------------------------------

/* compute aic */
display (-2*-6846.948518+2*4)/1495
9.1651485

nbreg los died hmo type2 type3, nolog cluster(provnum)

Negative binomial regression                      Number of obs   =       1495
Dispersion           = mean                       Wald chi2(4)    =      36.13
Log pseudolikelihood = -4782.5989                 Prob > chi2     =     0.0000

                               (Std. Err. adjusted for 54 clusters in provnum)
------------------------------------------------------------------------------
             |               Robust
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        died |   -.236979   .0573431    -4.13   0.000    -.3493694   -.1245886
         hmo |  -.0705928    .049645    -1.42   0.155    -.1678953    .0267097
       type2 |   .2532097   .0634972     3.99   0.000     .1287575    .3776619
       type3 |   .7365274   .2115372     3.48   0.000     .3219221    1.151133
       _cons |   2.260834   .0327839    68.96   0.000     2.196578    2.325089
-------------+----------------------------------------------------------------
    /lnalpha |  -.8318959   .0634521                     -.9562597    -.707532
-------------+----------------------------------------------------------------
       alpha |   .4352234   .0276158                      .3843277    .4928591
------------------------------------------------------------------------------

/* compute aic */
display (-2*-4782.5989+2*4)/1495
6.4034768

/* create new variable with zero */
generate newlos = los - 1

histogram newlos, discrete

nbvargr newlos, n(15)

Obtaining Parameter Estimates
(36 observations deleted)
here

Negative Binomial Probabilities with mean = 8.854181 & overdispersion = .7120889

     +------------------------------+
     |  k       nbprob        nbcum |
     |------------------------------|
  1. |  0   0.06126391   0.06126391 |
  2. |  1   0.07425659   0.13552050 |
  3. |  2   0.07704805   0.21256854 |
  4. |  3   0.07546319   0.28803173 |
  5. |  4   0.07171640   0.35974813 |
     |------------------------------|
  6. |  5   0.06690429   0.42665243 |
  7. |  6   0.06163682   0.48828924 |
  8. |  7   0.05627193   0.54456115 |
  9. |  8   0.05102334   0.59558451 |
 10. |  9   0.04601700   0.64160150 |
     |------------------------------|
 11. | 10   0.04132344   0.68292499 |
 12. | 11   0.03697751   0.71990246 |
 13. | 12   0.03299088   0.75289333 |
 14. | 13   0.02936026   0.78225362 |
 15. | 14   0.02607288   0.80832648 |
     |------------------------------|
 16. | 15   0.02311026   0.83143675 |
     +------------------------------+

Poisson Probabilities for lambda = 8.854181

     +------------------------------+
     |  k        pprob         pcum |
     |------------------------------|
  1. |  0   0.00014278   0.00014278 |
  2. |  1   0.00126423   0.00140701 |
  3. |  2   0.00559686   0.00700388 |
  4. |  3   0.01651855   0.02352243 |
  5. |  4   0.03656456   0.06008700 |
     |------------------------------|
  6. |  5   0.06474985   0.12483685 |
  7. |  6   0.09555116   0.22038800 |
  8. |  7   0.12086103   0.34124902 |
  9. |  8   0.13376568   0.47501472 |
 10. |  9   0.13159840   0.60661310 |
     |------------------------------|
 11. | 10   0.11651960   0.72313273 |
 12. | 11   0.09378960   0.81692231 |
 13. | 12   0.06920251   0.88612479 |
 14. | 13   0.04713320   0.93325800 |
 15. | 14   0.02980899   0.96306700 |
     |------------------------------|
 16. | 15   0.01759561   0.98066264 |
     +------------------------------+
(0 observations deleted)

poisson newlos died hmo type2 type3, nolog cluster(provnum)

Poisson regression                                Number of obs   =       1495
                                                  Wald chi2(4)    =      31.42
Log pseudolikelihood = -7229.6375                 Prob > chi2     =     0.0000

                               (Std. Err. adjusted for 54 clusters in provnum)
------------------------------------------------------------------------------
             |               Robust
      newlos |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        died |   -.277442   .0703259    -3.95   0.000    -.4152782   -.1396057
         hmo |  -.0849026    .056734    -1.50   0.135    -.1960993    .0262941
       type2 |   .2778412    .071253     3.90   0.000     .1381879    .4174945
       type3 |   .8166476   .2318683     3.52   0.000     .3621941    1.271101
       _cons |   2.153754   .0372609    57.80   0.000     2.080724    2.226784
------------------------------------------------------------------------------

/* compute aic */
display (-2*-7229.6375+2*4)/1495
9.677107

nbreg newlos died hmo type2 type3, nolog cluster(provnum)

Negative binomial regression                      Number of obs   =       1495
Dispersion           = mean                       Wald chi2(4)    =      37.00
Log pseudolikelihood = -4742.6087                 Prob > chi2     =     0.0000

                               (Std. Err. adjusted for 54 clusters in provnum)
------------------------------------------------------------------------------
             |               Robust
      newlos |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        died |  -.2650532   .0644419    -4.11   0.000    -.3913571   -.1387494
         hmo |  -.0793184      .0563    -1.41   0.159    -.1896643    .0310275
       type2 |   .2826808    .069884     4.05   0.000     .1457107    .4196509
       type3 |   .8011306    .224282     3.57   0.000      .361546    1.240715
       _cons |   2.149526   .0365384    58.83   0.000     2.077912     2.22114
-------------+----------------------------------------------------------------
    /lnalpha |   -.448078   .0559217                     -.5576824   -.3384736
-------------+----------------------------------------------------------------
       alpha |   .6388549   .0357258                      .5725344    .7128576
------------------------------------------------------------------------------

/* compute aic */
display (-2*-4742.6087+2*4)/1495
6.3499782

/* Summary Table

   variable   model     log likelihood   aic
   los        poisson   -6846.9485       9.1651485
   los        nbreg     -4782.5989       6.4034768
   newlos     poisson   -7229.6375       9.677107
   newlos     nbreg     -4742.6087       6.3499782
*/

The negative binomial regression with the trick is only slightly better and the poisson regression with the trick is actually worse.
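The AIC values in the summary table come from the display commands above, which compute a per-observation AIC. Written out, with \ln L the log (pseudo)likelihood, k the number of predictors and n the number of observations (the symbols are assumptions introduced here), the calculation is:

```latex
\[
\mathrm{AIC}_n = \frac{-2\ln L + 2k}{n},
\qquad k = 4,\quad n = 1495
\]
% Worked example: negative binomial on newlos
\[
\frac{-2(-4742.6087) + 2(4)}{1495} = \frac{9493.2174}{1495} \approx 6.34998
\]
```

Note that k here counts only the four predictors and not the constant or alpha. Counting them would shift each value slightly without changing the ordering of the models in the table.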
Zero-truncated Poisson
We will begin the zero-truncated models with a zero-truncated poisson regression even though it is unlikely that a poisson distribution will be appropriate for these data since the mean and variance of los are nowhere near equal.
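How far apart the mean and variance are can be put as a simple ratio, using the values from the tabstat output. The scalar names below are chosen here for illustration. An untruncated Poisson variable would give a ratio of one.

```stata
* mean and variance of los, copied from the tabstat output
scalar m = 9.854181
scalar v = 78.02022

* variance-to-mean ratio; values far above 1 indicate overdispersion
display "variance/mean = " v/m
```

The ratio comes out at about 7.9, which is why the Poisson models are expected to fit poorly here.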
ztp los died hmo type2 type3, nolog cluster(provnum)

Zero-truncated Poisson regression                 Number of obs   =       1495
                                                  Wald chi2(4)    =      30.68
Log pseudolikelihood = -6846.6528                 Prob > chi2     =     0.0000

                               (Std. Err. adjusted for 54 clusters in provnum)
------------------------------------------------------------------------------
             |               Robust
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        died |   -.248681   .0634856    -3.92   0.000    -.3731105   -.1242514
         hmo |  -.0755112   .0503728    -1.50   0.134    -.1742401    .0232177
       type2 |   .2500681   .0647042     3.86   0.000     .1232501     .376886
       type3 |   .7503999   .2185408     3.43   0.001     .3220678    1.178732
       _cons |   2.264474   .0335532    67.49   0.000     2.198711    2.330237
------------------------------------------------------------------------------

/* compute aic */
display (-2*-6846.6528+2*4)/1495
9.1647529

ztp, irr

Zero-truncated Poisson regression                 Number of obs   =       1495
                                                  Wald chi2(4)    =      30.68
Log pseudolikelihood = -6846.6528                 Prob > chi2     =     0.0000

                               (Std. Err. adjusted for 54 clusters in provnum)
------------------------------------------------------------------------------
             |               Robust
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        died |   .7798287   .0495079    -3.92   0.000     .6885891    .8831577
         hmo |   .9272693   .0467091    -1.50   0.134     .8400952    1.023489
       type2 |   1.284113   .0830875     3.86   0.000     1.131167    1.457738
       type3 |   2.117847    .462836     3.43   0.001     1.379978    3.250251
------------------------------------------------------------------------------

Zero-truncated Negative Binomial
ztnb los died hmo type2 type3, nolog cluster(provnum)

Zero-truncated negative binomial regression       Number of obs   =       1495
Dispersion     = mean                             Wald chi2(4)    =      36.01
Log likelihood = -4737.535                        Prob > chi2     =     0.0000

                               (Std. Err. adjusted for 54 clusters in provnum)
------------------------------------------------------------------------------
             |               Robust
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        died |  -.2521884    .061533    -4.10   0.000    -.3727908   -.1315859
         hmo |  -.0754173   .0533132    -1.41   0.157    -.1799091    .0290746
       type2 |   .2685095   .0666474     4.03   0.000      .137883    .3991359
       type3 |   .7668101   .2183505     3.51   0.000      .338851    1.194769
       _cons |   2.224028    .034727    64.04   0.000     2.155964    2.292091
-------------+----------------------------------------------------------------
    /lnalpha |   -.630108   .0764019                      -.779853    -.480363
-------------+----------------------------------------------------------------
       alpha |   .5325343   .0406866                      .4584734    .6185588
------------------------------------------------------------------------------

/* compute aic */
display (-2*-4737.535+2*4)/1495
6.3431906

ztnb, irr

Zero-truncated negative binomial regression       Number of obs   =       1495
Dispersion     = mean                             Wald chi2(4)    =      36.01
Log likelihood = -4737.535                        Prob > chi2     =     0.0000

                               (Std. Err. adjusted for 54 clusters in provnum)
------------------------------------------------------------------------------
             |               Robust
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        died |   .7770984   .0478172    -4.10   0.000     .6888093    .8767039
         hmo |   .9273564   .0494403    -1.41   0.157     .8353461    1.029501
       type2 |   1.308013   .0871756     4.03   0.000     1.147841    1.490536
       type3 |   2.152888   .4700841     3.51   0.000     1.403334    3.302795
-------------+----------------------------------------------------------------
    /lnalpha |   -.630108   .0764019                      -.779853    -.480363
-------------+----------------------------------------------------------------
       alpha |   .5325343   .0406866                      .4584734    .6185588
------------------------------------------------------------------------------

predict plos

tablist los plos, sort(v) clean

 los       plos   Freq
   1   6.662014     18
   1   7.183877     70
   1   8.572936      2
   1   8.714004      2
   1   9.244488     13
   1   9.396606      8
   1   12.09191      7
   1   15.46608      4
   1   19.90235      2
   2   6.662014      7
   2   7.183877     22
   2   8.572936      3
   2   9.244488     22
   2   9.396606      5
   2   12.09191      6
   2   15.46608      5
   2   19.90235      1
   3   6.662014      3
   3   7.183877     17
   3   8.572936      9
   3   8.714004      2
   3   9.244488     33
   3   9.396606      5
   3   11.21351      1
   3   12.09191      2
   3   15.46608      3
   4   6.662014      5
   4   7.183877     15
   4   8.572936     11
   4   8.714004      1
   4   9.244488     50
   4   9.396606      9
   4   11.21351      1
   4   12.09191      8
   4   15.46608      2
   4   19.90235      2
   5   6.662014      2
   5   7.183877     19
   5   8.572936     16
   5   9.244488     61
   5   9.396606      5
   5   11.21351      3
   5   12.09191      9
   5   14.34257      1
   5   15.46608      5
   5   18.45657      1
   5   19.90235      1
   6   6.662014      3
   6   7.183877     10
   6   8.572936     11
   6   9.244488     50
   6   9.396606      6
   6   11.21351      2
   6   12.09191     11
   6   15.46608      1
   6   19.90235      3
   7   6.662014      3
   7   7.183877     20
   7   8.572936     16
   7   8.714004      1
   7   9.244488     54
   7   9.396606     10
   7   11.21351      2
   7   12.09191      8
   7   15.46608      1
   7   19.90235      1
   8   6.662014      3
   8   7.183877     18
   8   8.572936      8
   8   8.714004      1
   8   9.244488     49
   8   9.396606      4
   8   12.09191      7
   8   15.46608      1
   8   19.90235      1
   9   6.662014      3
   9   7.183877      5
   9   8.572936     15
   9   9.244488     34
   9   9.396606      7
   9   12.09191      6
   9   15.46608      1
   9   19.90235      3
  10   6.662014      3
  10   7.183877     18
  10   8.572936      2
  10   9.244488     53
  10   9.396606      2
  10   12.09191      7
  10   15.46608      1
  10   19.90235      3
  11   6.662014      3
  11   7.183877     10
  11   8.572936      9
  11   9.244488     32
  11   9.396606      1
  11   11.21351      1
  11   12.09191      8
  11   15.46608      2
  11   19.90235      4
  12   6.662014      2
  12   7.183877     10
  12   8.572936      6
  12   9.244488     35
  12   9.396606      3
  12   11.21351      2
  12   12.09191     10
  12   19.90235      2
  13   6.662014      3
  13   7.183877      6
  13   8.572936      5
  13   9.244488     19
  13   9.396606      2
  13   11.21351      1
  13   12.09191      6
  13   15.46608      1
  14   6.662014      6
  14   7.183877      9
  14   8.572936      3
  14   9.244488     19
  14   9.396606      3
  14   11.21351      2
  14   12.09191      3
  14   15.46608      1
  14   19.90235      3
  15   6.662014      1
  15   7.183877      6
  15   8.572936      2
  15   9.244488     18
  15   9.396606      3
  15   12.09191      8
  15   19.90235      3
  16   7.183877      8
  16   8.572936      2
  16   9.244488     15
  16   9.396606      1
  16   11.21351      2
  16   12.09191     12
  16   15.46608      2
  16   19.90235      1
  17   6.662014      1
  17   7.183877      6
  17   8.572936      3
  17   9.244488     11
  17   9.396606      4
  17   15.46608      2
  17   18.45657      1
  17   19.90235      1
  18   6.662014      1
  18   7.183877      3
  18   8.572936      3
  18   8.714004      1
  18   9.244488     13
  18   12.09191      1
  18   19.90235      1
  19   6.662014      1
  19   7.183877      2
  19   8.572936      3
  19   8.714004      2
  19   9.244488      8
  19   12.09191      4
  19   15.46608      4
  20   6.662014      1
  20   7.183877      4
  20   9.244488      9
  20   9.396606      2
  20   12.09191      3
  21   7.183877      3
  21   8.572936      1
  21   9.244488      8
  21   9.396606      2
  21   11.21351      1
  21   12.09191      3
  22   7.183877      2
  22   8.572936      1
  22   8.714004      1
  22   9.244488      4
  22   9.396606      1
  22   12.09191      2
  22   15.46608      3
  22   19.90235      1
  23   7.183877      1
  23   8.572936      2
  23   9.244488      6
  23   9.396606      1
  24   7.183877      3
  24   9.244488      5
  24   9.396606      2
  24   19.90235      1
  25   7.183877      2
  25   9.244488      2
  26   7.183877      2
  26   9.244488      2
  26   12.09191      2
  26   19.90235      1
  27   7.183877      1
  27   9.244488      1
  27   9.396606      1
  27   11.21351      1
  27   12.09191      1
  27   19.90235      2
  28   9.244488      1
  28   9.396606      1
  28   12.09191      1
  28   15.46608      2
  29   8.572936      1
  29   9.244488      1
  29   19.90235      1
  30   9.396606      1
  31   8.572936      1
  31   9.244488      1
  32   7.183877      1
  32   9.244488      2
  32   9.396606      1
  32   12.09191      1
  32   19.90235      1
  33   9.244488      1
  33   9.396606      1
  34   9.244488      1
  34   9.396606      2
  34   11.21351      1
  34   12.09191      1
  36   6.662014      1
  42   19.90235      1
  43   12.09191      1
  44   12.09191      2
  46   9.244488      1
  46   19.90235      2
  48   19.90235      1
  49   15.46608      1
  50   7.183877      1
  52   19.90235      1
  57   19.90235      1
  59   19.90235      1
  60   9.244488      1
  63   12.09191      1
  65   19.90235      1
  70   15.46608      1
  74   19.90235      1
  91   15.46608      1
 116   19.90235      1

tab plos

  predicted |
  number of |
     events |      Freq.     Percent        Cum.
------------+-----------------------------------
   6.662014 |         70        4.68        4.68
   7.183877 |        294       19.67       24.35
   8.572936 |        135        9.03       33.38
   8.714004 |         11        0.74       34.11
   9.244488 |        635       42.47       76.59
   9.396606 |         93        6.22       82.81
   11.21351 |         20        1.34       84.15
   12.09191 |        141        9.43       93.58
   14.34257 |          1        0.07       93.65
   15.46608 |         44        2.94       96.59
   18.45657 |          2        0.13       96.72
   19.90235 |         49        3.28      100.00
------------+-----------------------------------
      Total |      1,495      100.00

univar los plos

                                        -------------- Quantiles --------------
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
-------------------------------------------------------------------------------
     los    1495     9.85     8.83     1.00     4.00     8.00    13.00   116.00
    plos    1495     9.51     2.63     6.66     8.57     9.24     9.24    19.90
-------------------------------------------------------------------------------

corr los plos
(obs=1495)

             |      los     plos
-------------+------------------
         los |   1.0000
        plos |   0.3060   1.0000

/* Summary Table

   variable   model     log likelihood   aic
   los        poisson   -6846.9485       9.1651485
   los        nbreg     -4782.5989       6.4034768
   newlos     poisson   -7229.6375       9.677107
   newlos     nbreg     -4742.6087       6.3499782
   los        ztp       -6846.6528       9.1647529
   los        ztnb      -4737.535        6.3431906
*/

The zero-truncated negative binomial provided only a slight improvement over the negative binomial with the subtraction trick, and the zero-truncated poisson was only slightly better than the standard poisson regression.
In the final analysis, the predicted counts match the observed counts only moderately well. This may be due, in part, to the fact that there are only twelve different covariate patterns among the predictors (the twelve distinct predicted values in the tab plos output), and one of the predictors, hmo, was not significant.
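Both points can be checked directly in Stata. The sketch below assumes the medpar data are still in memory and that plos was created by predict after ztnb, as above. The variable name pattern is hypothetical and introduced only for this illustration.

```stata
* number each distinct combination of the four predictors
egen pattern = group(died hmo type2 type3)

* one row per covariate pattern; every observation in a pattern
* receives the same predicted count
tabulate pattern

* squared correlation between observed and predicted counts,
* using the value reported by corr los plos
display "squared correlation = " 0.3060^2
```

The number of rows in the tabulate output should equal the number of distinct values of plos. The squared correlation is a little under 0.10, which gives a rough sense of how much of the variation in los the predicted counts capture.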