Example with Binary Response Variable
The binary response example is derived from the previous example by converting depression scores to 0/1 values with a cut point of 11 and retaining only the 61 observations with complete data.
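The recoding step described above can be sketched roughly as follows. This is an illustration only, not the author's actual preparation code: the toy data and the variable names score1-score3 are invented, and the direction of the cut (scores at or above 11 coded as 1) is an assumption, since the text gives only the cut point.

```stata
* Toy data, invented for illustration only (not the depression data).
clear
input id score1 score2 score3
1 18 12  9
2 27  . 10
3 16 11  4
end

* Keep only subjects with no missing scores (complete cases).
egen nmiss = rowmiss(score1-score3)
keep if nmiss == 0
drop nmiss

* Assumed rule: a score at or above the cut point of 11 is coded 1.
* Missing values compare as larger than any number in Stata, so the
* complete-case filter above must run before this recode.
forvalues v = 1/3 {
    generate byte dep`v' = (score`v' >= 11)
}

list id dep1-dep3, noobs
```

Subject 2 is dropped because of the missing second score, which mirrors how the full data set is reduced to the 61 complete cases.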
use http://www.gseis.ucla.edu/courses/data/deprl, clear

list in 1/3, nodisplay noobs nolabel    /* output edited */

  id  dep1  dep2  dep3  dep4  dep5  dep6  treat  pre
   1     1     1     1     1     1     1      0   18
   2     1     1     1     1     1     0      0   27
   3     1     1     1     0     0     0      0   16

summarize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
          id |      61          31   17.75293          1         61
        dep1 |      61    .7213115   .4520748          0          1
        dep2 |      61    .6885246   .4669398          0          1
        dep3 |      61    .5409836    .502453          0          1
        dep4 |      61    .4754098   .5035394          0          1
        dep5 |      61    .3770492   .4886694          0          1
        dep6 |      61    .2459016   .4341942          0          1
       treat |      61     .557377    .500819          0          1
         pre |      61    21.03279   3.710199         15         28

tab1 dep1 dep2 dep3 dep4 dep5 dep6 treat

-> tabulation of dep1

      1 dep |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         17       27.87       27.87
          1 |         44       72.13      100.00
------------+-----------------------------------
      Total |         61      100.00

-> tabulation of dep2

      2 dep |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         19       31.15       31.15
          1 |         42       68.85      100.00
------------+-----------------------------------
      Total |         61      100.00

-> tabulation of dep3

      3 dep |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         28       45.90       45.90
          1 |         33       54.10      100.00
------------+-----------------------------------
      Total |         61      100.00

-> tabulation of dep4

      4 dep |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         32       52.46       52.46
          1 |         29       47.54      100.00
------------+-----------------------------------
      Total |         61      100.00

-> tabulation of dep5

      5 dep |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         38       62.30       62.30
          1 |         23       37.70      100.00
------------+-----------------------------------
      Total |         61      100.00

-> tabulation of dep6

      6 dep |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         46       75.41       75.41
          1 |         15       24.59      100.00
------------+-----------------------------------
      Total |         61      100.00

-> tabulation of treat

      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
    placebo |         27       44.26       44.26
   estrogen |         34       55.74      100.00
------------+-----------------------------------
      Total |         61      100.00

corr dep1 dep2 dep3 dep4 dep5 dep6
(obs=61)

             |     dep1     dep2     dep3     dep4     dep5     dep6
-------------+------------------------------------------------------
        dep1 |   1.0000
        dep2 |   0.4504   1.0000
        dep3 |   0.3079   0.6591   1.0000
        dep4 |   0.2256   0.4985   0.6134   1.0000
        dep5 |   0.2573   0.5233   0.5130   0.7495   1.0000
        dep6 |   0.1851   0.3019   0.5260   0.5236   0.6554   1.0000

reshape long dep, i(id) j(visit)
(note: j = 1 2 3 4 5 6)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       61   ->     366
Number of variables                   9   ->       5
j variable (6 values)                     ->   visit
xij variables:
                     dep1 dep2 ... dep6   ->   dep
-----------------------------------------------------------------------------

list in 1/18, nolabel

        id  visit  dep  treat  pre
  1.     1      1    1      0   18
  2.     1      2    1      0   18
  3.     1      3    1      0   18
  4.     1      4    1      0   18
  5.     1      5    1      0   18
  6.     1      6    1      0   18
  7.     2      1    1      0   27
  8.     2      2    1      0   27
  9.     2      3    1      0   27
 10.     2      4    1      0   27
 11.     2      5    1      0   27
 12.     2      6    0      0   27
 13.     3      1    1      0   16
 14.     3      2    1      0   16
 15.     3      3    1      0   16
 16.     3      4    0      0   16
 17.     3      5    0      0   16
 18.     3      6    0      0   16

logit dep pre treat

Logit estimates                                   Number of obs   =        366
                                                  LR chi2(2)      =      67.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -220.08461                       Pseudo R2       =     0.1323

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1672391   .0337122     4.96   0.000     .1011644    .2333139
       treat |  -1.573125   .2415083    -6.51   0.000    -2.046473   -1.099778
       _cons |  -2.586276   .6907273    -3.74   0.000    -3.940077   -1.232476
------------------------------------------------------------------------------

xtgee dep pre treat, i(id) link(logit) fam(bin) corr(ind)

GEE population-averaged model                   Number of obs      =       366
Group variable:                         id      Number of groups   =        61
Link:                                logit      Obs per group: min =         6
Family:                           binomial                     avg =       6.0
Correlation:                   independent                     max =         6
                                                Wald chi2(2)       =     53.47
Scale parameter:                         1      Prob > chi2        =    0.0000

Pearson chi2(366):                  369.76      Deviance           =    440.17
Dispersion (Pearson):             1.010272      Dispersion         =  1.202648

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1672391   .0337125     4.96   0.000     .1011638    .2333145
       treat |  -1.573125   .2415102    -6.51   0.000    -2.046476   -1.099774
       _cons |  -2.586276   .6907322    -3.74   0.000    -3.940087   -1.232466
------------------------------------------------------------------------------

xtcorr

Estimated within-id correlation matrix R:

        c1      c2      c3      c4      c5      c6
r1  1.0000
r2  0.0000  1.0000
r3  0.0000  0.0000  1.0000
r4  0.0000  0.0000  0.0000  1.0000
r5  0.0000  0.0000  0.0000  0.0000  1.0000
r6  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000

xtgee dep pre treat, i(id) link(logit) fam(bin) corr(exc)

GEE population-averaged model                   Number of obs      =       366
Group variable:                         id      Number of groups   =        61
Link:                                logit      Obs per group: min =         6
Family:                           binomial                     avg =       6.0
Correlation:                  exchangeable                     max =         6
                                                Wald chi2(2)       =     23.63
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1672391   .0507097     3.30   0.001     .0678499    .2666284
       treat |  -1.573125   .3632751    -4.33   0.000    -2.285131   -.8611189
       _cons |  -2.586276   1.038986    -2.49   0.013    -4.622652   -.5499009
------------------------------------------------------------------------------

xtcorr

Estimated within-id correlation matrix R:

        c1      c2      c3      c4      c5      c6
r1  1.0000
r2  0.2525  1.0000
r3  0.2525  0.2525  1.0000
r4  0.2525  0.2525  0.2525  1.0000
r5  0.2525  0.2525  0.2525  0.2525  1.0000
r6  0.2525  0.2525  0.2525  0.2525  0.2525  1.0000

generate pxt = pre*treat

xtgee dep pre treat pxt, i(id) link(logit) fam(bin) corr(ar1) t(visit)

GEE population-averaged model                   Number of obs      =       366
Group and time vars:              id visit      Number of groups   =        61
Link:                                logit      Obs per group: min =         6
Family:                           binomial                     avg =       6.0
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(3)       =     19.59
Scale parameter:                         1      Prob > chi2        =    0.0002

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1526517   .0748497     2.04   0.041     .0059491    .2993544
       treat |  -.9262238   2.107262    -0.44   0.660    -5.056382    3.203935
         pxt |  -.0245282   .1003177    -0.24   0.807    -.2211473    .1720909
       _cons |  -2.378658   1.516685    -1.57   0.117    -5.351307    .5939899
------------------------------------------------------------------------------

xtgee dep pre treat, i(id) link(logit) fam(bin) corr(ar1) t(visit)

GEE population-averaged model                   Number of obs      =       366
Group and time vars:              id visit      Number of groups   =        61
Link:                                logit      Obs per group: min =         6
Family:                           binomial                     avg =       6.0
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(2)       =     19.71
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1390761    .049729     2.80   0.005      .041609    .2365432
       treat |  -1.434432    .359136    -3.99   0.000    -2.138326   -.7305387
       _cons |  -2.107566   1.030142    -2.05   0.041    -4.126608   -.0885242
------------------------------------------------------------------------------

xtcorr

Estimated within-id correlation matrix R:

        c1      c2      c3      c4      c5      c6
r1  1.0000
r2  0.5256  1.0000
r3  0.2762  0.5256  1.0000
r4  0.1452  0.2762  0.5256  1.0000
r5  0.0763  0.1452  0.2762  0.5256  1.0000
r6  0.0401  0.0763  0.1452  0.2762  0.5256  1.0000

xtgee, eform

GEE population-averaged model                   Number of obs      =       366
Group and time vars:              id visit      Number of groups   =        61
Link:                                logit      Obs per group: min =         6
Family:                           binomial                     avg =       6.0
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(2)       =     19.71
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   1.149212   .0571492     2.80   0.005     1.042487    1.266862
       treat |   .2382506   .0855644    -3.99   0.000     .1178519    .4816495
------------------------------------------------------------------------------

xi: xtgee dep pre treat i.visit, i(id) link(logit) fam(bin) corr(ar1) t(visit)

GEE population-averaged model                   Number of obs      =       366
Group and time vars:              id visit      Number of groups   =        61
Link:                                logit      Obs per group: min =         6
Family:                           binomial                     avg =       6.0
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     43.18
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1663743   .0534293     3.11   0.002     .0616548    .2710938
       treat |  -1.736828   .3977053    -4.37   0.000    -2.516316   -.9573399
   _Ivisit_2 |  -.1606584   .3089872    -0.52   0.603    -.7662623    .4449455
   _Ivisit_3 |  -.9535964   .3704544    -2.57   0.010    -1.679674   -.2275192
   _Ivisit_4 |  -1.301396   .4028895    -3.23   0.001    -2.091045   -.5117472
   _Ivisit_5 |  -1.806927   .4283058    -4.22   0.000    -2.646391   -.9674631
   _Ivisit_6 |  -2.567095   .4682141    -5.48   0.000    -3.484778   -1.649412
       _cons |  -1.335994   1.104602    -1.21   0.226    -3.500974    .8289849
------------------------------------------------------------------------------

xtgee dep pre treat visit, i(id) link(logit) fam(bin) corr(ar1) t(visit)

GEE population-averaged model                   Number of obs      =       366
Group and time vars:              id visit      Number of groups   =        61
Link:                                logit      Obs per group: min =         6
Family:                           binomial                     avg =       6.0
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(3)       =     42.84
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1756632   .0533018     3.30   0.001     .0711937    .2801328
       treat |  -1.759441   .3946169    -4.46   0.000    -2.532876   -.9860058
       visit |  -.5189469   .0917666    -5.66   0.000    -.6988061   -.3390876
       _cons |  -.8539732   1.095211    -0.78   0.436    -3.000546      1.2926
------------------------------------------------------------------------------

/* test visit categorical versus visit continuous */

xtgee dep pre treat visit _Ivisit_3 _Ivisit_4 _Ivisit_5 _Ivisit_6, i(id) link(logit) fam(bin) corr(ar1) t(visit)

GEE population-averaged model                   Number of obs      =       366
Group and time vars:              id visit      Number of groups   =        61
Link:                                logit      Obs per group: min =         6
Family:                           binomial                     avg =       6.0
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     43.18
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1663743   .0534293     3.11   0.002     .0616548    .2710938
       treat |  -1.736828   .3977053    -4.37   0.000    -2.516316   -.9573399
       visit |  -.1606584   .3089872    -0.52   0.603    -.7662623    .4449455
   _Ivisit_3 |  -.6322796   .4825321    -1.31   0.190    -1.578025    .3134658
   _Ivisit_4 |  -.8194209   .8082586    -1.01   0.311    -2.403579     .764737
   _Ivisit_5 |  -1.164294   1.121156    -1.04   0.299    -3.361719    1.033132
   _Ivisit_6 |  -1.763803   1.434124    -1.23   0.219    -4.574634    1.047028
       _cons |  -1.175336   1.185178    -0.99   0.321    -3.498242     1.14757
------------------------------------------------------------------------------

test _Ivisit_3 _Ivisit_4 _Ivisit_5 _Ivisit_6

 ( 1)  _Ivisit_3 = 0
 ( 2)  _Ivisit_4 = 0
 ( 3)  _Ivisit_5 = 0
 ( 4)  _Ivisit_6 = 0

           chi2(  4) =    2.58
         Prob > chi2 =    0.6303

/* rerun model with continuous time */

xtgee dep pre treat visit, i(id) link(logit) fam(bin) corr(ar1) t(visit)

Iteration 1: tolerance = .14672104
Iteration 2: tolerance = .00108396
Iteration 3: tolerance = .00009351
Iteration 4: tolerance = 3.269e-06
Iteration 5: tolerance = 2.043e-07

GEE population-averaged model                   Number of obs      =       366
Group and time vars:              id visit      Number of groups   =        61
Link:                                logit      Obs per group: min =         6
Family:                           binomial                     avg =       6.0
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(3)       =     42.84
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1756632   .0533018     3.30   0.001     .0711937    .2801328
       treat |  -1.759441   .3946169    -4.46   0.000    -2.532876   -.9860058
       visit |  -.5189469   .0917666    -5.66   0.000    -.6988061   -.3390876
       _cons |  -.8539732   1.095211    -0.78   0.436    -3.000546      1.2926
------------------------------------------------------------------------------

predict p

table visit treat, cont(mean dep mean p)

------------------------------
          |       treat
    visit |  placebo  estrogen
----------+-------------------
        1 | .8518519  .6176471    /* observed proportion */
          | .8917946  .6337553    /* predicted proportion */
          |
        2 | .8518519  .5588235
          | .8336766  .5178159
          |
        3 | .8148148  .3235294
          | .7546193  .4001624
          |
        4 | .6666667  .3235294
          | .6557054  .2923599
          |
        5 | .5555556  .2352941
          | .5431298  .2027277
          |
        6 | .4074074  .1176471
          | .4270517  .1344281
------------------------------

Example with Count Response Variable
In this section we will use data on executions in each of the 50 states for the years 1995, 1997 and 1999.
use http://www.gseis.ucla.edu/courses/data/execute2

describe

Contains data from execute2.dta
  obs:           150                          2000 us stat abstracts
 vars:             7                          13 Feb 2002 21:47
 size:         4,650 (89.7% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sid             float  %9.0g
state           str3   %9s
execute         float  %9.0g                  # executions
murder          float  %9.0g                  murder rate
unemp           float  %9.0g                  unemployment rate
confed          float  %9.0g                  confederate state
year            float  %9.0g
-------------------------------------------------------------------------------

univar sid execute-year

                                        -------------- Quantiles --------------
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
-------------------------------------------------------------------------------
     sid     150    25.50    14.48     1.00    13.00    25.50    38.00    50.00
 execute     150     1.43     4.74     0.00     0.00     0.00     1.00    37.00
  murder     150     5.96     3.31     0.50     3.20     5.85     8.10    17.00
   unemp     150     4.67     1.18     2.50     3.70     4.70     5.40     7.90
  confed     150     0.22     0.42     0.00     0.00     0.00     0.00     1.00
    year     150  1997.00     1.64  1995.00  1995.00  1997.00  1999.00  1999.00
-------------------------------------------------------------------------------

tabstat execute, by(year) stat(n mean var)

Summary for variables: execute
     by categories of: year

    year |         N      mean  variance
---------+------------------------------
    1995 |        50      1.04  8.733061
    1997 |        50      1.42  29.14653
    1999 |        50      1.82  30.02816
---------+------------------------------
   Total |       150  1.426667  22.43418
----------------------------------------

nbvargr execute

separate execute, by(year)

graph execute1995 execute1997 execute1999 sid, s(iii) c(ll[_]l[-])

drop murder-confed execute1995-execute1999

reshape wide exec, i(sid) j(year)
(note: j = 1995 1997 1999)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      150   ->      50
Number of variables                   4   ->       5
j variable (3 values)              year   ->   (dropped)
xij variables:
                                execute   ->   execute1995 execute1997 execute1999
-----------------------------------------------------------------------------

corr execute1995 execute1997 execute1999
(obs=50)

             | exe~1995 exe~1997 exe~1999
-------------+---------------------------
 execute1995 |   1.0000
 execute1997 |   0.9481   1.0000
 execute1999 |   0.9406   0.9608   1.0000

use http://www.gseis.ucla.edu/courses/data/execute2

xi: nbreg execute murder unemp confed i.year, cluster(sid)
i.year            _Iyear_1995-1999    (naturally coded; _Iyear_1995 omitted)

Negative binomial regression                      Number of obs   =        150
                                                  Wald chi2(5)    =      37.12
Log likelihood =  -167.2556                       Prob > chi2     =     0.0000

                              (standard errors adjusted for clustering on sid)
------------------------------------------------------------------------------
             |               Robust
     execute |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      murder |   .4056555   .1236221     3.28   0.001     .1633606    .6479504
       unemp |  -.6013436    .241296    -2.49   0.013    -1.074275   -.1284122
      confed |   2.244357   .7560092     2.97   0.003     .7626066    3.726108
 _Iyear_1997 |   .4385796    .304441     1.44   0.150    -.1581139    1.035273
 _Iyear_1999 |    .729746   .4136177     1.76   0.078    -.0809298    1.540422
       _cons |  -1.186817   1.327187    -0.89   0.371    -3.788056    1.414422
-------------+----------------------------------------------------------------
    /lnalpha |   1.272271   .2545488                      .7733648    1.771178
-------------+----------------------------------------------------------------
       alpha |    3.56895   .9084719                      2.167046    5.877772
------------------------------------------------------------------------------

test _Iyear_1997 _Iyear_1999

 ( 1)  [execute]_Iyear_1997 = 0.0
 ( 2)  [execute]_Iyear_1999 = 0.0

           chi2(  2) =    3.12
         Prob > chi2 =    0.2103

xi: xtgee execute murder unemp confed i.year, i(sid) fam(nbin) link(log) corr(exc)
i.year            _Iyear_1995-1999    (naturally coded; _Iyear_1995 omitted)

GEE population-averaged model                   Number of obs      =       150
Group variable:                        sid      Number of groups   =        50
Link:                                  log      Obs per group: min =         3
Family:             negative binomial(k=1)                     avg =       3.0
Correlation:                  exchangeable                     max =         3
                                                Wald chi2(5)       =     53.39
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     execute |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      murder |   .0860412   .0466163     1.85   0.065     -.005325    .1774075
       unemp |  -.1054161   .1413111    -0.75   0.456    -.3823807    .1715486
      confed |   2.076369   .4320253     4.81   0.000     1.229615    2.923123
 _Iyear_1997 |   .2408024   .1687292     1.43   0.154    -.0899006    .5715055
 _Iyear_1999 |   .6461118   .2150903     3.00   0.003     .2245425    1.067681
       _cons |  -1.040887   .7862023    -1.32   0.186    -2.581815    .5000416
------------------------------------------------------------------------------

test _Iyear_1997 _Iyear_1999

 ( 1)  _Iyear_1997 = 0.0
 ( 2)  _Iyear_1999 = 0.0

           chi2(  2) =    9.53
         Prob > chi2 =    0.0085

xtcorr

Estimated within-sid correlation matrix R:

        c1      c2      c3
r1  1.0000
r2  0.7856  1.0000
r3  0.7856  0.7856  1.0000
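
The coefficients in the log-link model above are on the log scale, so they may be easier to read after exponentiating, which turns each one into a rate ratio. The sketch below is not part of the original run; it simply applies exp() to three coefficients copied from the exchangeable xtgee output. Replaying the model with xtgee, eform, as was done for the logit model earlier, should report the same quantities.

```stata
* Coefficients are copied by hand from the exchangeable xtgee output above.
* Under a log link, exp(b) is the multiplicative change in the expected
* count for a one-unit increase in the predictor.
display "confed (vs. non-confederate): " %6.3f exp(2.076369)
display "1999 (vs. 1995):              " %6.3f exp(.6461118)
display "murder rate (per unit):       " %6.3f exp(.0860412)
```

Each ratio is conditional on the other predictors in the model and, because this is a GEE fit, describes the population-averaged effect.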
Categorical Data Analysis Course
Phil Ender