As noted in the multinomial logistic regression unit, the model is equivalent to simultaneous tests of k-1 comparisons against a reference group using binary logistic regression. This is similar to what happens when a multicategory predictor variable is used as a right-hand side variable. It also allows us to illustrate left/right equivalency, that is, the equivalence of using variables as either left-hand side (lhs) or right-hand side (rhs) variables.
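For two dichotomous variables, left/right equivalency comes down to the fact that the odds ratio of a 2x2 table does not change when the table is transposed. A minimal sketch of that idea, using made-up cell counts a, b, c and d (hypothetical values, not taken from any dataset):

```stata
* hypothetical 2x2 cell counts (illustration only)
*   a: y=0, x=0     b: y=0, x=1
*   c: y=1, x=0     d: y=1, x=1
scalar a = 20
scalar b = 30
scalar c = 10
scalar d = 40

* log odds ratio with y as the outcome (the slope in: logit y x)
display ln((d/b)/(c/a))

* log odds ratio with x as the outcome (the slope in: logit x y)
display ln((d/c)/(b/a))
```

Both expressions reduce to ln(a*d/(b*c)), so the two display commands should print the same value. Only the constants of the two models differ.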
Example 1
Consider the following example using two dichotomous variables, female and public, from the High School and Beyond dataset (hsb2).
use http://www.gseis.ucla.edu/courses/data/hsb2

generate public=schtyp==1

tabulate public female

           |        female
    public |      male     female |     Total
-----------+----------------------+----------
         0 |        14         18 |        32
         1 |        77         91 |       168
-----------+----------------------+----------
     Total |        91        109 |       200

display (14*91)/(77*18) /* odds ratio */
.91919192

tabulate female public

           |        public
    female |         0          1 |     Total
-----------+----------------------+----------
      male |        14         77 |        91
    female |        18         91 |       109
-----------+----------------------+----------
     Total |        32        168 |       200

display (14*91)/(18*77) /* odds ratio */
.91919192

logit public female

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(1)      =       0.05
                                                  Prob > chi2     =     0.8281
Log likelihood = -87.910407                       Pseudo R2       =     0.0003

------------------------------------------------------------------------------
      public |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.0842603   .3885359    -0.22   0.828    -.8457767     .677256
       _cons |   1.704748   .2905436     5.87   0.000     1.135293    2.274203
------------------------------------------------------------------------------

listcoef

  Odds of: 1 (public) vs 0 (private)

----------------------------------------------------------------------
      public |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
      female |  -0.08426   -0.217   0.828   0.9192   0.9588     0.4992
----------------------------------------------------------------------

logit female public

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(1)      =       0.05
                                                  Prob > chi2     =     0.8281
Log likelihood = -137.79477                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
      female |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      public |  -.0842603   .3885359    -0.22   0.828    -.8457767     .677256
       _cons |   .2513144   .3563483     0.71   0.481    -.4471154    .9497443
------------------------------------------------------------------------------

listcoef

  Odds of: female vs male

----------------------------------------------------------------------
      female |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
      public |  -0.08426   -0.217   0.828   0.9192   0.9695     0.3675
----------------------------------------------------------------------

Example 2
Now, we will use a dichotomous variable (female) and a multicategory variable (ses).
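With a single categorical predictor, each slope in the models for this example is a log odds ratio that can be worked out by hand. As a rough check, the sketch below uses the cell counts from the ses by female crosstabulation reported in this example. The results should agree, up to rounding, with the female coefficients from mlogit and with the _Ises_2, _Ises_3 and _cons coefficients from logit.

```stata
* cell counts from the ses by female table in this example:
*   low:    15 male, 32 female
*   middle: 47 male, 48 female
*   high:   29 male, 29 female

* middle vs low: female slope in the middle equation, and _Ises_2 in the logit
display ln((48/32)/(47/15))

* high vs low: female slope in the high equation, and _Ises_3 in the logit
display ln((29/32)/(29/15))

* constant in the logit of female on ses: log odds of being female in the low group
display ln(32/15)
```

The same cross-product ratio appears whether ses or female is treated as the outcome, which is why the slopes coincide across the two models while the constants do not.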
tabulate ses female, row

           |        female
       ses |      male     female |     Total
-----------+----------------------+----------
       low |        15         32 |        47
           |     31.91      68.09 |    100.00
-----------+----------------------+----------
    middle |        47         48 |        95
           |     49.47      50.53 |    100.00
-----------+----------------------+----------
      high |        29         29 |        58
           |     50.00      50.00 |    100.00
-----------+----------------------+----------
     Total |        91        109 |       200
           |     45.50      54.50 |    100.00

tabulate female ses, col

           |               ses
    female |       low     middle       high |     Total
-----------+---------------------------------+----------
      male |        15         47         29 |        91
           |     31.91      49.47      50.00 |     45.50
-----------+---------------------------------+----------
    female |        32         48         29 |       109
           |     68.09      50.53      50.00 |     54.50
-----------+---------------------------------+----------
     Total |        47         95         58 |       200
           |    100.00     100.00     100.00 |    100.00

mlogit ses female, base(1)

Multinomial regression                            Number of obs   =        200
                                                  LR chi2(2)      =       4.68
                                                  Prob > chi2     =     0.0964
Log likelihood = -208.24309                       Pseudo R2       =     0.0111

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
middle       |
      female |  -.7366323   .3742013    -1.97   0.049    -1.470053   -.0032113
       _cons |   1.142097   .2965523     3.85   0.000     .5608656    1.723329
-------------+----------------------------------------------------------------
high         |
      female |  -.7576857   .4085121    -1.85   0.064    -1.558355    .0429834
       _cons |   .6592456     .31804     2.07   0.038     .0358988    1.282592
------------------------------------------------------------------------------
(Outcome ses==low is the comparison group)

xi: logit female i.ses

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(2)      =       4.68
                                                  Prob > chi2     =     0.0964
Log likelihood = -135.47889                       Pseudo R2       =     0.0170

------------------------------------------------------------------------------
      female |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Ises_2 |  -.7366323   .3742013    -1.97   0.049    -1.470053   -.0032113
     _Ises_3 |  -.7576857   .4085122    -1.85   0.064    -1.558355    .0429834
       _cons |   .7576857   .3129164     2.42   0.015     .1443808    1.370991
------------------------------------------------------------------------------

Example 3
We can do the same thing using two multicategory variables as lhs and rhs variables, in this case, ses and prog.
xi: mlogit ses i.prog, base(1)

Multinomial regression                            Number of obs   =        200
                                                  LR chi2(4)      =      16.78
                                                  Prob > chi2     =     0.0021
Log likelihood = -202.19105                       Pseudo R2       =     0.0398

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
middle       |
    _Iprog_2 |   .6166071   .4334269     1.42   0.155     -.232894    1.466108
    _Iprog_3 |    .725937   .4775892     1.52   0.129    -.2101205    1.661995
       _cons |   .2231436   .3354102     0.67   0.506    -.4342484    .8805355
-------------+----------------------------------------------------------------
high         |
    _Iprog_2 |   1.368595   .5000522     2.74   0.006     .3885105    2.348679
    _Iprog_3 |   .0363676   .6322987     0.06   0.954    -1.202915     1.27565
       _cons |  -.5753641   .4166667    -1.38   0.167    -1.392016    .2412875
------------------------------------------------------------------------------
(Outcome ses==low is the comparison group)

xi: mlogit prog i.ses, base(1)

Multinomial regression                            Number of obs   =        200
                                                  LR chi2(4)      =      16.78
                                                  Prob > chi2     =     0.0021
Log likelihood = -195.70519                       Pseudo R2       =     0.0411

------------------------------------------------------------------------------
        prog |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
academic     |
     _Ises_2 |   .6166071   .4334269     1.42   0.155     -.232894    1.466108
     _Ises_3 |   1.368595   .5000522     2.74   0.006     .3885105    2.348679
       _cons |   .1718503   .3393104     0.51   0.613     -.493186    .8368865
-------------+----------------------------------------------------------------
vocation     |
     _Ises_2 |    .725937   .4775892     1.52   0.129    -.2101205    1.661995
     _Ises_3 |   .0363676   .6322987     0.06   0.954    -1.202915     1.27565
       _cons |  -.2876821   .3818813    -0.75   0.451    -1.036156    .4607915
------------------------------------------------------------------------------
(Outcome prog==general is the comparison group)

Example 4
Finally, let's try this with a categorical variable (prog) and a continuous variable (write).
anova write prog

                           Number of obs =     200     R-squared     =  0.1776
                           Root MSE      = 8.63918     Adj R-squared =  0.1693

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  3175.69786     2  1587.84893      21.27     0.0000
                         |
                    prog |  3175.69786     2  1587.84893      21.27     0.0000
                         |
                Residual |  14703.1771   197   74.635417
              -----------+----------------------------------------------------
                   Total |   17878.875   199   89.843593

daoneway write, by(prog)

One-way Disciminant Function Analysis

Observations = 200
Variables    = 1
Groups       = 3

                    Pct of    Cum   Canonical   After   Wilks'
 Fcn  Eigenvalue  Variance    Pct     Corr       Fcn    Lambda   Chi-square  df  P-value
                                               |   0   0.82238      38.525    2   0.0000
   1     0.2160     100.00  100.00   0.4215    |

[output omitted]

display "approximate F-ratio = " 38.525/2
approximate F-ratio = 19.2625

mlogit prog write

Multinomial regression                            Number of obs   =        200
                                                  LR chi2(2)      =      37.17
                                                  Prob > chi2     =     0.0000
Log likelihood = -185.51084                       Pseudo R2       =     0.0911

------------------------------------------------------------------------------
        prog |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
general      |
       write |  -.0660079   .0210013    -3.14   0.002    -.1071696   -.0248461
       _cons |    2.71249   1.132801     2.39   0.017     .4922405    4.932739
-------------+----------------------------------------------------------------
vocation     |
       write |  -.1178089   .0216186    -5.45   0.000    -.1601806   -.0754372
       _cons |   5.358994   1.115256     4.81   0.000     3.173132    7.544856
------------------------------------------------------------------------------
(Outcome prog==academic is the comparison group)

display "approximate F-ratio = " 37.17/2
approximate F-ratio = 18.585
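One way to put the three tests above on a common footing is to convert each statistic to a tail probability. The sketch below does this with Stata's Ftail() and chi2tail() functions, using the statistics copied from the output. It is only meant to illustrate how close the three results are and is not part of the original analysis.

```stata
* anova: F(2, 197) = 21.27
display Ftail(2, 197, 21.27)

* discriminant analysis: chi-square(2) = 38.525
display chi2tail(2, 38.525)

* mlogit: LR chi-square(2) = 37.17
display chi2tail(2, 37.17)
```

All three probabilities should be very small, consistent with the 0.0000 p-values shown in the output.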
Categorical Data Analysis Course
Phil Ender