In all of the examples so far, the observations have been independent. But what if the observations were matched or even repeated? You might think that it would be possible to include dummy-coded variables to indicate the matching. For example, if you had 56 matched pairs, you could include 55 dummy variables to account for the non-independence, along with whatever covariates you wanted in the model. However, logistic regression has problems when the number of parameters being estimated is close to the total degrees of freedom available, and 55 dummy variables plus covariates would use up most of what 112 observations provide. In a situation such as this, the conditional logistic model is recommended.
Conditional logistic regression, also known as fixed effects logistic regression, is designed to work with matched subjects or repeated measures. Stata's clogit command will work with 1:1 matching, 1:k matching and repeated measures models. The repeated measures models are also called panel models or cross-sectional time-series models.
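To see why conditioning removes the need for one dummy variable per group, here is a brief sketch for groups that contain exactly one case, as in the matched designs below. The notation is an assumption introduced for illustration and does not come from the source: i indexes groups, j indexes members within a group of size n_i, alpha_i is a group-specific intercept, x_ij is the covariate vector, and beta is the coefficient vector. Outcomes are assumed independent within a group once alpha_i is given.

```latex
\begin{align*}
\Pr(y_{ij}=1) &= \frac{\exp(\alpha_i + x_{ij}'\beta)}{1+\exp(\alpha_i + x_{ij}'\beta)} \\[4pt]
\Pr\Bigl(y_{ij}=1 \,\Bigm|\, \textstyle\sum_{k=1}^{n_i} y_{ik}=1\Bigr)
  &= \frac{\exp(\alpha_i + x_{ij}'\beta)}{\sum_{k=1}^{n_i}\exp(\alpha_i + x_{ik}'\beta)}
   = \frac{\exp(x_{ij}'\beta)}{\sum_{k=1}^{n_i}\exp(x_{ik}'\beta)} \\[4pt]
\text{1:1 pair:}\quad
\Pr(\text{member 1 is the case})
  &= \frac{\exp(x_{i1}'\beta)}{\exp(x_{i1}'\beta)+\exp(x_{i2}'\beta)}
\end{align*}
```

The factor exp(alpha_i) appears in every term and cancels, so the group intercepts never have to be estimated. Any covariate that is constant within a group, such as the matching variable itself, cancels in the same way and cannot be estimated from the conditional likelihood.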
Example 1: 1-1 Matching
This example is adapted from Hosmer & Lemeshow (2000). Mothers of low birth weight babies were matched by age with mothers of normal weight babies. Low birth weight is defined as less than 2500 grams. The variable pairid indicates which mothers were matched.
use http://www.gseis.ucla.edu/courses/data/lbwt11, clear

describe

Contains data from http://www.gseis.ucla.edu/courses/data/lbwt11.dta
  obs:           112
 vars:             9                          10 Feb 2001 12:40
 size:         4,480 (99.9% of memory free)
-------------------------------------------------------------------------------
   1. pairid    float  %9.0g
   2. lbwt      float  %9.0g                  low brth wt < 2500g
   3. age       float  %9.0g                  mother's age
   4. lastwt    float  %9.0g                  last weight
   5. race      float  %9.0g       rl         1 wht 2 blk 3 oth
   6. smoke     float  %9.0g                  smoke during pregnancy
   7. ptd       float  %9.0g                  previous preterm delivery
   8. ht        float  %9.0g                  hypertension
   9. ui        float  %9.0g                  uterine irritability
-------------------------------------------------------------------------------

summarize

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
  pairid |     112        28.5   16.23587          1         56
    lbwt |     112          .5   .5022472          0          1
     age |     112    22.50893   4.341286         14         35
  lastwt |     112    127.1696   30.46986         80        241
    race |     112    2.026786   .9050392          1          3
   smoke |     112    .4107143   .4941746          0          1
     ptd |     112    .2232143   .4182723          0          1
      ht |     112    .0892857   .2864373          0          1
      ui |     112    .1785714   .3847144          0          1

tab1 smoke ptd ht ui

-> tabulation of smoke

      smoke |
     during |
  pregnancy |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         66       58.93       58.93
          1 |         46       41.07      100.00
------------+-----------------------------------
      Total |        112      100.00

-> tabulation of ptd

   previous |
    preterm |
   delivery |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         87       77.68       77.68
          1 |         25       22.32      100.00
------------+-----------------------------------
      Total |        112      100.00

-> tabulation of ht

hypertensio |
          n |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        102       91.07       91.07
          1 |         10        8.93      100.00
------------+-----------------------------------
      Total |        112      100.00

-> tabulation of ui

    uterine |
irritabilit |
          y |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         92       82.14       82.14
          1 |         20       17.86      100.00
------------+-----------------------------------
      Total |        112      100.00

tab race, gen(race)

1 wht 2 blk |
      3 oth |      Freq.     Percent        Cum.
------------+-----------------------------------
      white |         44       39.29       39.29
      black |         21       18.75       58.04
      other |         47       41.96      100.00
------------+-----------------------------------
      Total |        112      100.00

tabulate lbwt smoke, lrchi2 exp

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
+--------------------+

           |     smoke during
  low brth |       pregnancy
wt < 2500g |         0          1 |     Total
-----------+----------------------+----------
         0 |        40         16 |        56
           |      33.0       23.0 |      56.0
-----------+----------------------+----------
         1 |        26         30 |        56
           |      33.0       23.0 |      56.0
-----------+----------------------+----------
     Total |        66         46 |       112
           |      66.0       46.0 |     112.0

 likelihood-ratio chi2(1) =   7.3216   Pr = 0.007

logit lbwt smoke, nolog   /* does not take into account the matched pairs */

Logit estimates                                   Number of obs   =        112
                                                  LR chi2(1)      =       7.32
                                                  Prob > chi2     =     0.0068
Log likelihood = -73.971688                       Pseudo R2       =     0.0472

------------------------------------------------------------------------------
        lbwt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   1.059392   .3991154     2.65   0.008     .2771398    1.841643
       _cons |  -.4307829   .2519156    -1.71   0.087    -.9245285    .0629626
------------------------------------------------------------------------------

clogit lbwt smoke, group(pairid) nolog

Conditional (fixed-effects) logistic regression   Number of obs   =        112
                                                  LR chi2(1)      =       6.79
                                                  Prob > chi2     =     0.0091
Log likelihood = -35.419282                       Pseudo R2       =     0.0875

------------------------------------------------------------------------------
        lbwt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   1.011601   .4128614     2.45   0.014     .2024075    1.820794
------------------------------------------------------------------------------

predict p1
(option pc1 assumed; conditional probability for single outcome within group)

list pairid lbwt smoke p1, sep(2)

     +----------------------------------+
     | pairid   lbwt   smoke         p1 |
     |----------------------------------|
  1. |      1      0       0   .2666667 |
  2. |      1      1       1   .7333333 |
     |----------------------------------|
  3. |      2      0       0         .5 |
  4. |      2      1       0         .5 |
     |----------------------------------|
  5. |      3      0       0         .5 |
  6. |      3      1       0         .5 |
     |----------------------------------|
  7. |      4      0       0   .2666667 |
  8. |      4      1       1   .7333333 |
     |----------------------------------|
  9. |      5      0       1         .5 |
 10. |      5      1       1         .5 |
     |----------------------------------|
 11. |      6      0       0   .2666667 |
 12. |      6      1       1   .7333333 |
     |----------------------------------|
 13. |      7      0       0         .5 |
 14. |      7      1       0         .5 |
     |----------------------------------|
 15. |      8      0       0         .5 |
 16. |      8      1       0         .5 |
     |----------------------------------|
 17. |      9      0       1   .7333333 |
 18. |      9      1       0   .2666667 |
     |----------------------------------|

/* manual computation of the probabilities */

display exp(1*1.011601)/(exp(1*1.011601)+exp(0*1.011601))
.73333335

display exp(0*1.011601)/(exp(1*1.011601)+exp(0*1.011601))
.26666665

display exp(1*1.011601)/(exp(1*1.011601)+exp(1*1.011601))
.5

display exp(0*1.011601)/(exp(0*1.011601)+exp(0*1.011601))
.5

clogit lbwt lastwt smoke race2 race3 ptd ht ui, group(pairid) nolog

Conditional (fixed-effects) logistic regression   Number of obs   =        112
                                                  LR chi2(7)      =      26.04
                                                  Prob > chi2     =     0.0005
Log likelihood = -25.794271                       Pseudo R2       =     0.3355

------------------------------------------------------------------------------
        lbwt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lastwt |  -.0183757   .0100806    -1.82   0.068    -.0381333    .0013819
       smoke |   1.400656   .6278396     2.23   0.026     .1701131    2.631199
       race2 |   .5713643   .6896449     0.83   0.407    -.7803149    1.923044
       race3 |  -.0253148   .6992044    -0.04   0.971     -1.39573    1.345101
         ptd |   1.808009   .7886502     2.29   0.022     .2622829    3.353735
          ht |   2.361152   1.086128     2.17   0.030     .2323797    4.489924
          ui |   1.401929   .6961585     2.01   0.044     .0374836    2.766375
------------------------------------------------------------------------------

test race2 race3

 ( 1)  race2 = 0.0
 ( 2)  race3 = 0.0

           chi2(  2) =    0.88
         Prob > chi2 =    0.6436

estimates store M1

clogit lbwt lastwt smoke ptd ht ui, group(pairid) nolog

Conditional (fixed-effects) logistic regression   Number of obs   =        112
                                                  LR chi2(5)      =      25.16
                                                  Prob > chi2     =     0.0001
Log likelihood = -26.236872                       Pseudo R2       =     0.3241

------------------------------------------------------------------------------
        lbwt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lastwt |  -.0150834   .0081465    -1.85   0.064    -.0310503    .0008834
       smoke |   1.479564   .5620191     2.63   0.008     .3780272    2.581102
         ptd |   1.670594   .7468062     2.24   0.025      .206881    3.134308
          ht |   2.329361   1.002549     2.32   0.020     .3644009    4.294322
          ui |   1.344895    .693843     1.94   0.053    -.0150127    2.704802
------------------------------------------------------------------------------

lrtest M1

likelihood-ratio test                                  LR chi2(2)  =      0.89
(Assumption: . nested in M1)                           Prob > chi2 =    0.6424

listcoef

clogit (N=112): Factor Change in Odds

  Odds of: 1 vs 0

--------------------------------------------------
        lbwt |      b         z     P>|z|    e^b
-------------+------------------------------------
      lastwt |  -0.01508   -1.852   0.064   0.9850
       smoke |   1.47956    2.633   0.008   4.3910
         ptd |   1.67059    2.237   0.025   5.3153
          ht |   2.32936    2.323   0.020  10.2714
          ui |   1.34489    1.938   0.053   3.8378
--------------------------------------------------

listcoef, percent

clogit (N=112): Percentage Change in Odds

  Odds of: 1 vs 0

--------------------------------------------------
        lbwt |      b         z     P>|z|      %
-------------+------------------------------------
      lastwt |  -0.01508   -1.852   0.064     -1.5
       smoke |   1.47956    2.633   0.008    339.1
         ptd |   1.67059    2.237   0.025    431.5
          ht |   2.32936    2.323   0.020    927.1
          ui |   1.34489    1.938   0.053    283.8
--------------------------------------------------

predict p

list pairid lbwt p, sep(2)

     +--------------------------+
     | pairid   lbwt          p |
     |--------------------------|
  1. |      1      0   .0250138 |
  2. |      1      1   .9749862 |
     |--------------------------|
  3. |      2      0   .2519053 |
  4. |      2      1   .7480947 |
     |--------------------------|
  5. |      3      0   .6289979 |
  6. |      3      1   .3710021 |
     |--------------------------|
  7. |      4      0   .0164993 |
  8. |      4      1   .9835007 |
     |--------------------------|
  9. |      5      0   .4548728 |
 10. |      5      1   .5451272 |
     |--------------------------|
 11. |      6      0   .2019775 |
 12. |      6      1   .7980225 |
     |--------------------------|
 13. |      7      0   .5263715 |
 14. |      7      1   .4736285 |
     |--------------------------|
 15. |      8      0   .1210587 |
 16. |      8      1   .8789413 |
     |--------------------------|
 17. |      9      0   .9005696 |
 18. |      9      1   .0994304 |
     |--------------------------|
 19. |     10      0   .4939925 |
 20. |     10      1   .5060075 |
     |--------------------------|
 21. |     11      0    .004564 |
 22. |     11      1    .995436 |
     |--------------------------|
 23. |     12      0   .4511353 |
 24. |     12      1   .5488647 |
     |--------------------------|
 25. |     13      0   .2950889 |
 26. |     13      1   .7049111 |
     |--------------------------|
 27. |     14      0    .578796 |
 28. |     14      1   .4212039 |
     |--------------------------|
 29. |     15      0   .2663816 |
 30. |     15      1   .7336184 |
     |--------------------------|

prchange

prchange does not work for last model estimated.

prtab lastwt

prtab does not work for the last type of model estimated.

Example 2: 1-M Matching
Here is another example from Hosmer & Lemeshow, this one involving 1-M matching. In this case, three controls are matched with each diagnosed case of breast cancer.
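Before fitting a 1-M model it can be worth confirming that every stratum really has one case and three controls. The lines below are a minimal sketch of one way to check this. The variable names ncases and nobs are hypothetical helpers introduced here, and the egen total() function assumes a reasonably recent Stata release (older versions used sum() for the same purpose).

```stata
* Sketch: check the 1:3 matching structure (helper names are illustrative).
use http://www.gseis.ucla.edu/courses/data/bbdm13, clear

* Number of cases (fndx == 1) in each stratum.
bysort str: egen ncases = total(fndx)

* Number of observations in each stratum.
bysort str: generate nobs = _N

* With clean 1:3 matching, every row falls in the cell ncases = 1, nobs = 4.
tabulate ncases nobs
```

Strata with no cases, or with only cases, would contribute nothing to the conditional likelihood, so a check like this also shows how many groups actually inform the estimates.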
use http://www.gseis.ucla.edu/courses/data/bbdm13, clear

describe

Contains data from bbdm13.dta
  obs:           200
 vars:            15                          5 Nov 2001 19:37
 size:        10,400 (96.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
str             float  %9.0g                  stratum
obs             float  %9.0g
agmt            float  %9.0g                  age at interview
fndx            float  %9.0g                  final diagnosis
chk             float  %9.0g                  regular check-ups
agp1            float  %9.0g                  age at 1st preg
agmn            float  %9.0g                  age at menarche
nlv             float  %9.0g                  number stillbirths
liv             float  %9.0g                  number live births
wt              float  %9.0g                  wt at interview
mst             float  %9.0g                  marital status
mar             byte   %8.0g                  married
mod             byte   %8.0g                  div or sep
wid             byte   %8.0g                  widowed
nvmr            byte   %8.0g                  never married
-------------------------------------------------------------------------------

summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         str |       200        25.5    14.46708          1         50
         obs |       200         2.5     1.12084          1          4
        agmt |       200      46.185    10.29323         27         68
        fndx |       200         .25    .4340993          0          1
         chk |       200       1.405    .4921239          1          2
        agp1 |       178    23.57865     4.05847         14         40
        agmn |       200       12.95    1.744338          8         17
         nlv |       178    .5168539    .9638946          0          7
         liv |       178    2.853933    1.544449          0         11
          wt |       200     143.715    31.92994         80        280
         mst |       200       1.655    1.234339          1          5
         mar |       200        .725    .4476348          0          1
         mod |       200         .13    .3371474          0          1
         wid |       200        .085    .2795815          0          1
        nvmr |       200         .06    .2380828          0          1

clogit fndx chk agmn wt mod wid nvmr, group(str) nolog

Conditional (fixed-effects) logistic regression   Number of obs   =        200
                                                  LR chi2(6)      =      48.20
                                                  Prob > chi2     =     0.0000
Log likelihood = -45.214824                       Pseudo R2       =     0.3477

------------------------------------------------------------------------------
        fndx |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         chk |  -1.121849   .4474471    -2.51   0.012    -1.998829   -.2448688
        agmn |   .3561333   .1291722     2.76   0.006     .1029605    .6093061
          wt |  -.0283565   .0099776    -2.84   0.004    -.0479122   -.0088009
         mod |  -.2030472   .6472909    -0.31   0.754    -1.471714     1.06562
         wid |  -.4915826   .8173094    -0.60   0.548     -2.09348    1.110314
        nvmr |   1.472195   .7582064     1.94   0.052    -.0138621    2.958252
------------------------------------------------------------------------------

listcoef

clogit (N=200): Factor Change in Odds

  Odds of: 1 vs 0

--------------------------------------------------
        fndx |      b         z     P>|z|    e^b
-------------+------------------------------------
         chk |  -1.12185   -2.507   0.012   0.3257
        agmn |   0.35613    2.757   0.006   1.4278
          wt |  -0.02836   -2.842   0.004   0.9720
         mod |  -0.20305   -0.314   0.754   0.8162
         wid |  -0.49158   -0.601   0.548   0.6117
        nvmr |   1.47220    1.942   0.052   4.3588
--------------------------------------------------

listcoef, percent

clogit (N=200): Percentage Change in Odds

  Odds of: 1 vs 0

--------------------------------------------------
        fndx |      b         z     P>|z|      %
-------------+------------------------------------
         chk |  -1.12185   -2.507   0.012    -67.4
        agmn |   0.35613    2.757   0.006     42.8
          wt |  -0.02836   -2.842   0.004     -2.8
         mod |  -0.20305   -0.314   0.754    -18.4
         wid |  -0.49158   -0.601   0.548    -38.8
        nvmr |   1.47220    1.942   0.052    335.9
--------------------------------------------------

test mod wid nvmr

 ( 1)  mod = 0.0
 ( 2)  wid = 0.0
 ( 3)  nvmr = 0.0

           chi2(  3) =    4.99
         Prob > chi2 =    0.1724

xtlogit fndx chk agmn wt mod wid nvmr, i(str) fe nolog

Conditional fixed-effects logistic regression   Number of obs      =       200
Group variable (i): str                         Number of groups   =        50

                                                Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4

                                                LR chi2(6)         =     48.20
Log likelihood  = -45.214824                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        fndx |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         chk |  -1.121849   .4474471    -2.51   0.012    -1.998829   -.2448688
        agmn |   .3561333   .1291722     2.76   0.006     .1029605    .6093061
          wt |  -.0283565   .0099776    -2.84   0.004    -.0479122   -.0088009
         mod |  -.2030472   .6472909    -0.31   0.754    -1.471714     1.06562
         wid |  -.4915826   .8173094    -0.60   0.548     -2.09348    1.110314
        nvmr |   1.472195   .7582064     1.94   0.052    -.0138621    2.958252
------------------------------------------------------------------------------

clogit fndx chk agmn wt nvmr, group(str) nolog

Conditional (fixed-effects) logistic regression   Number of obs   =        200
                                                  LR chi2(4)      =      47.75
                                                  Prob > chi2     =     0.0000
Log likelihood = -45.439011                       Pseudo R2       =     0.3445

------------------------------------------------------------------------------
        fndx |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         chk |  -1.161303   .4469763    -2.60   0.009    -2.037361    -.285246
        agmn |   .3592472   .1278849     2.81   0.005     .1085973     .609897
          wt |  -.0282355   .0099785    -2.83   0.005     -.047793   -.0086781
        nvmr |   1.593384   .7360284     2.16   0.030     .1507946    3.035973
------------------------------------------------------------------------------

listcoef

clogit (N=200): Factor Change in Odds

  Odds of: 1 vs 0

--------------------------------------------------
        fndx |      b         z     P>|z|    e^b
-------------+------------------------------------
         chk |  -1.16130   -2.598   0.009   0.3131
        agmn |   0.35925    2.809   0.005   1.4323
          wt |  -0.02824   -2.830   0.005   0.9722
        nvmr |   1.59338    2.165   0.030   4.9204
--------------------------------------------------

listcoef, percent

clogit (N=200): Percentage Change in Odds

  Odds of: 1 vs 0

--------------------------------------------------
        fndx |      b         z     P>|z|      %
-------------+------------------------------------
         chk |  -1.16130   -2.598   0.009    -68.7
        agmn |   0.35925    2.809   0.005     43.2
          wt |  -0.02824   -2.830   0.005     -2.8
        nvmr |   1.59338    2.165   0.030    392.0
--------------------------------------------------
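The factor and percentage changes reported by listcoef are simple transformations of the coefficients, exp(b) and 100*(exp(b) - 1). As a rough sketch, and assuming the reduced model is the one being refit, the same quantities can be obtained from clogit itself with the or option, or by hand from the stored coefficients.

```stata
* Sketch: refit the reduced model and display odds ratios instead of coefficients.
clogit fndx chk agmn wt nvmr, group(str) or nolog

* Reproduce one row of the listcoef tables by hand from the stored coefficient.
* Factor change in the odds for nvmr:
display exp(_b[nvmr])

* Percentage change in the odds for nvmr:
display 100*(exp(_b[nvmr]) - 1)
```

If the fit matches the output above, the two display lines should agree with the listcoef entries for nvmr, 4.9204 and 392.0. The or option only changes how the results are displayed; the model being estimated is the same.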
Categorical Data Analysis Course
Phil Ender