Contingency tables are sufficient for analyzing associations in two-way tables. Three-way and higher-order tables, however, contain many more associations that can be explored, which calls for a flexible method of generating the expected frequencies under the different hypotheses. Loglinear regression models are one such approach. Loglinear regression models have the general form

    log(μ_ij...) = λ + λ_i^X + λ_j^Y + ... + λ_ij^XY + ...

where μ is the expected frequency in a cell, λ is a constant, and the remaining λ terms are main effects and interactions of the classification variables. Hypotheses about the table are expressed by including or omitting interaction terms.
We will illustrate loglinear regression models for contingency tables using the acm dataset from Agresti. The acm file contains information about alcohol, cigarette, and marijuana use. Let's begin with a two-variable example looking at the relationship between cigarette smoking and marijuana use. We will start off with a traditional contingency table analysis that tests the independence of cigarette and marijuana use.
Two-Variable Example
use http://www.philender.com/courses/data/acm, clear

describe

Contains data from http://www.gseis.ucla.edu/courses/data/acm.dta
  obs:             8
 vars:             4                          28 Nov 2001 14:28
 size:            72 (100.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
a               byte   %8.0g       yn         alcohol use
c               byte   %8.0g       yn         cigarette use
m               byte   %8.0g       yn         marijuana use
freq            int    %8.0g
-------------------------------------------------------------------------------

list

        a     c     m   freq
 1.    no    no   yes      2
 2.    no   yes   yes      3
 3.    no   yes    no     43
 4.   yes    no   yes     44
 5.    no    no    no    279
 6.   yes    no    no    456
 7.   yes   yes    no    538
 8.   yes   yes   yes    911

tabulate c m [fw=freq], exp all

+--------------------+
|        Key         |
|--------------------|
|     frequency      |
| expected frequency |
+--------------------+

 cigarette |    marijuana use
       use |        no        yes |     Total
-----------+----------------------+----------
        no |       735         46 |       781
           |     451.6      329.4 |     781.0
-----------+----------------------+----------
       yes |       581        914 |     1,495
           |     864.4      630.6 |   1,495.0
-----------+----------------------+----------
     Total |     1,316        960 |     2,276
           |   1,316.0      960.0 |   2,276.0

          Pearson chi2(1) = 642.0347   Pr = 0.000
 likelihood-ratio chi2(1) = 751.8083   Pr = 0.000
               Cramer's V =   0.5311
                    gamma =   0.9235   ASE = 0.012
          Kendall's tau-b =   0.5311   ASE = 0.014

ipf [fw=freq], fit(c+m) exp

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : c
marginal model 2
varlist : m
unique varlist c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 1

Goodness of Fit Tests
---------------------
df = 1
Likelihood Ratio Statistic G^2 = 751.8083   p-value = 0.000
Pearson Statistic          X^2 = 642.0347   p-value = 0.000

  c   m       Efreq   Ofreq        prob
  1   1   630.57996     914   .27705622
  1   2   864.42004     581   .37979791
  2   1   329.42004      46    .1447364
  2   2   451.57996     735   .19840947

ipf [fw=freq], fit(c*m) exp

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : c m
unique varlist c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 0

Goodness of Fit Tests
---------------------
df = 0
Likelihood Ratio Statistic G^2 = 0.0000   p-value = .
Pearson Statistic          X^2 = 0.0000   p-value = .

  c   m   Efreq   Ofreq        prob
  1   1     914     914   .40158172
  1   2     581     581   .25527241
  2   1      46      46    .0202109
  2   2     735     735   .32293497

The last model, using c*m, perfectly reproduces the observed frequencies because it uses information concerning both the joint and marginal distributions of cigarettes and marijuana. It is equivalent to fitting c+m+c*m and is called the saturated model, since with zero residual degrees of freedom it reproduces the observed data exactly.
The loglinear model for independence looks like this:

    log(μ_ij) = λ + λ_i^C + λ_j^M
The loglinear model for the saturated case looks like this:

    log(μ_ij) = λ + λ_i^C + λ_j^M + λ_ij^CM
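With 0/1 dummy coding (as in the glm runs later on this page), the interaction term in the saturated 2x2 model equals the log of the observed odds ratio. A quick check, using only the observed c by m frequencies from the table above:

display (735*914)/(581*46)        /* observed odds ratio = 25.14 */
display ln((735*914)/(581*46))    /* log odds ratio = 3.22, the dummy-coded λ^CM */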
With three or more variables, the questions that can be asked become more interesting than with only two. If we add alcohol to our model, then testing a+c+m is equivalent to running a 3-way contingency table test for independence. This is known as the mutual independence model. It isn't really very interesting and, fortunately, isn't all that common either.
The loglinear model for mutual independence would look something like this:

    log(μ_ijk) = λ + λ_i^A + λ_j^C + λ_k^M
A shorthand notation for this model would be (a,c,m).
Consider the following model:

    log(μ_ijk) = λ + λ_i^A + λ_j^C + λ_k^M + λ_ik^AM
This is the joint independence model, which asserts the two-way independence between cigarettes and the four combinations of levels of alcohol and marijuana. It can be denoted (c,am).
Consider the following conditional independence model:

    log(μ_ijk) = λ + λ_i^A + λ_j^C + λ_k^M + λ_jk^CM + λ_ik^AM
This model accounts for the association between cigarettes and marijuana controlling for alcohol, and for the association between alcohol and marijuana controlling for cigarettes. It specifies the conditional independence of alcohol and cigarettes while controlling for marijuana, and can be denoted (cm,am).
A model that contains all of the two-way interactions is known as a homogeneous association model:

    log(μ_ijk) = λ + λ_i^A + λ_j^C + λ_k^M + λ_ij^AC + λ_ik^AM + λ_jk^CM
In this model each pair of variables is conditionally associated given the level of the third variable, and the model is denoted (am,cm,ac). The saturated model, (acm), with all possible interactions, looks like this:

    log(μ_ijk) = λ + λ_i^A + λ_j^C + λ_k^M + λ_ij^AC + λ_ik^AM + λ_jk^CM + λ_ijk^ACM
Three-Variable Example
table c m [fw=freq], by(a)

----------------------
alcohol   |
use and   | marijuana
cigarette |    use
use       |   no  yes
----------+-----------
no        |
       no |  279    2
      yes |   43    3
----------+-----------
yes       |
       no |  456   44
      yes |  538  911
----------------------

ipf [fw=freq], fit(a+c+m)    /* mutual independence model */

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : a
marginal model 2
varlist : c
marginal model 3
varlist : m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 4

Goodness of Fit Tests
---------------------
df = 4
Likelihood Ratio Statistic G^2 = 1286.0199   p-value = 0.000
Pearson Statistic          X^2 = 1411.3860   p-value = 0.000

ipf [fw=freq], fit(a*c+m)

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : a c
marginal model 2
varlist : m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 3

Goodness of Fit Tests
---------------------
df = 3
Likelihood Ratio Statistic G^2 = 843.8267   p-value = 0.000
Pearson Statistic          X^2 = 704.9071   p-value = 0.000

ipf [fw=freq], fit(a*m+c)

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : a m
marginal model 2
varlist : c
unique varlist a m c
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 3

Goodness of Fit Tests
---------------------
df = 3
Likelihood Ratio Statistic G^2 = 939.5626   p-value = 0.000
Pearson Statistic          X^2 = 824.1630   p-value = 0.000

ipf [fw=freq], fit(c*m+a)

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : c m
marginal model 2
varlist : a
unique varlist c m a
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 3

Goodness of Fit Tests
---------------------
df = 3
Likelihood Ratio Statistic G^2 = 534.2117   p-value = 0.000
Pearson Statistic          X^2 = 505.5977   p-value = 0.000

ipf [fw=freq], fit(c*m+a*m)

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : c m
marginal model 2
varlist : a m
unique varlist c m a
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 2

Goodness of Fit Tests
---------------------
df = 2
Likelihood Ratio Statistic G^2 = 187.7543   p-value = 0.000
Pearson Statistic          X^2 = 177.6149   p-value = 0.000

ipf [fw=freq], fit(c*a+m*a)

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : c a
marginal model 2
varlist : m a
unique varlist c a m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 2

Goodness of Fit Tests
---------------------
df = 2
Likelihood Ratio Statistic G^2 = 497.3693   p-value = 0.000
Pearson Statistic          X^2 = 443.7611   p-value = 0.000

ipf [fw=freq], fit(a*c+c*m)

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : a c
marginal model 2
varlist : c m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 2

Goodness of Fit Tests
---------------------
df = 2
Likelihood Ratio Statistic G^2 = 92.0184   p-value = 0.000
Pearson Statistic          X^2 = 80.8148   p-value = 0.000

ipf [fw=freq], fit(c*m+a*m+a*c)    /* homogeneous association model */

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : c m
marginal model 2
varlist : a m
marginal model 3
varlist : a c
unique varlist c m a
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 1

Goodness of Fit Tests
---------------------
df = 1
Likelihood Ratio Statistic G^2 = 0.3740   p-value = 0.541
Pearson Statistic          X^2 = 0.4011   p-value = 0.527

ipf [fw=freq], fit(a*c*m)    /* fully saturated model */

Expansion of the various marginal models
----------------------------------------
marginal model 1
varlist : a c m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 0

Goodness of Fit Tests
---------------------
df = 0
Likelihood Ratio Statistic G^2 = 0.0000   p-value = .
Pearson Statistic          X^2 = 0.0000   p-value = .

Let's collect all of the results together into a table.
Model            df      G^2       X^2
a+c+m             4   1286.02   1411.39   mutual independence
a*c+m             3    843.83    704.91   joint independence
a*m+c             3    939.56    824.16   joint independence
c*m+a             3    534.21    505.60   joint independence
c*m+a*m           2    187.75    177.61   conditional independence
c*a+m*a           2    497.37    443.76   conditional independence
a*c+c*m           2     92.02     80.81   conditional independence
c*m+a*m+a*c       1      0.37      0.40   homogeneous association
a*c*m             0      0.00      0.00   fully saturated

It is possible to test λ^AC = 0 in the (c*m+a*m+a*c) model. This is accomplished by computing the difference in the G^2's of the two nested models, which is equivalent to a likelihood ratio test.
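For example, dropping a*c from the homogeneous association model leaves the (c*m+a*m) model, so the test of λ^AC = 0 is the difference in G^2's, 187.75 - 0.37 = 187.38, on 2 - 1 = 1 degree of freedom. The arithmetic can be done directly with the G^2 values from the table:

display 187.7543 - .3740                /* difference in G^2 = 187.38 */
display chi2tail(1, 187.7543 - .3740)   /* p-value for the 1 df test */

The same 187.38 appears from the lrtest command in the next section.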
Using the Poisson Distribution
Loglinear regression models can also be estimated using the Poisson distribution. In this example we will use the glm command with family(poisson) and link(log) to estimate the models (a+c+m), (c*m+a*m+a*c) and (c*m+a*m). We will be discussing generalized linear models, glm, later in the course.
glm freq a c m, family(pois) link(log) nolog

Generalized linear models                          No. of obs      =         8
Optimization     : ML: Newton-Raphson              Residual df     =         4
                                                   Scale parameter =         1
Deviance         =  1286.019954                    (1/df) Deviance =   321.505
Pearson          =  1411.386007                    (1/df) Pearson  =  352.8465

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -667.5316914                    AIC             =  167.8829
BIC              =  1277.702188

------------------------------------------------------------------------------
        freq |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |   1.785112   .0597594    29.87   0.000     1.667985    1.902238
           c |   .6493063   .0441509    14.71   0.000     .5627721    .7358406
           m |  -.3154188   .0424446    -7.43   0.000    -.3986087   -.2322289
       _cons |   4.172538   .0649589    64.23   0.000     4.045221    4.299855
------------------------------------------------------------------------------

tabulate c m [fw=freq], cell

 cigarette |    marijuana use
       use |        no        yes |     Total
-----------+----------------------+----------
        no |       735         46 |       781
           |     32.29       2.02 |     34.31
-----------+----------------------+----------
       yes |       581        914 |     1,495
           |     25.53      40.16 |     65.69
-----------+----------------------+----------
     Total |     1,316        960 |     2,276
           |     57.82      42.18 |    100.00

prtesti 781 .3431 1495 .6569

Two-sample test of proportion                   x: Number of obs =        781
                                                y: Number of obs =       1495
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |      .3431   .0169877                      .3098047    .3763953
           y |      .6569   .0122783                      .6328349    .6809651
-------------+----------------------------------------------------------------
        diff |     -.3138   .0209604                     -.3548817   -.2727183
             |  under Ho:   .0219682   -14.28   0.000
------------------------------------------------------------------------------
Ho: proportion(x) - proportion(y) = diff = 0

    Ha: diff < 0              Ha: diff != 0             Ha: diff > 0
     z = -14.284               z = -14.284               z = -14.284
 P < z  =  0.0000          P > |z| =  0.0000         P > z  =  1.0000

The constant in the glm model is the log of the expected frequency for the cell a=0, c=0, m=0; thus exp(4.172538) = 64.879909. The coefficient for c is the difference in the logs of the expected frequencies from c=0 to c=1 when a=0 and m=0. The test of this coefficient corresponds to the test of the difference in the marginal proportions for cigarette use shown by prtesti above.
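These interpretations can be verified directly from the estimates above (a quick check; nothing new is estimated):

display exp(4.172538)    /* 64.88, expected frequency for the a=0, c=0, m=0 cell */
display exp(.6493063)    /* 1.914, the ratio of the cigarette margins */
display 1495/781         /* 1.914, yes/no margin totals from the tabulate above */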
We will need to create three interaction terms: a*m, a*c and c*m. The response variable will be freq, the number of observations in each cell.
generate am = a*m
generate ac = a*c
generate cm = c*m

glm freq a c m ac cm am, family(pois) link(log)

Generalized linear models                          No. of obs      =         8
Optimization     : ML: Newton-Raphson              Residual df     =         1
                                                   Scale param     =         1
Deviance         =  .3739858701                    (1/df) Deviance =  .3739859
Pearson          =  .4011005168                    (1/df) Pearson  =  .4011005

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -24.70870712                    AIC             =  7.927177
BIC              = -14.18210492

------------------------------------------------------------------------------
        freq |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |    .487719   .0757672     6.44   0.000      .339218      .63622
           c |  -1.886669    .162697   -11.60   0.000    -2.205549   -1.567789
           m |  -5.309042    .475197   -11.17   0.000    -6.240411   -4.377673
          ac |   2.054534   .1740643    11.80   0.000     1.713374    2.395694
          cm |   2.847889   .1638394    17.38   0.000      2.52677    3.169009
          am |   2.986014    .464678     6.43   0.000     2.075262    3.896767
       _cons |    5.63342   .0597008    94.36   0.000     5.516409    5.750432
------------------------------------------------------------------------------

lrtest, saving(0)

glm freq a c m am cm, family(pois) link(log)

Generalized linear models                          No. of obs      =         8
Optimization     : ML: Newton-Raphson              Residual df     =         2
                                                   Scale param     =         1
Deviance         =  187.7543029                    (1/df) Deviance =  93.87715
Pearson          =  177.6148606                    (1/df) Pearson  =  88.80743

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -118.3988656                    AIC             =  31.09972
BIC              =  175.2776537

------------------------------------------------------------------------------
        freq |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |   1.127186    .064122    17.58   0.000     1.001509    1.252862
           c |  -.2351197   .0555132    -4.24   0.000    -.3439236   -.1263159
           m |  -6.620924   .4737127   -13.98   0.000    -7.549384   -5.692464
          am |   4.125088   .4529445     9.11   0.000     3.237333    5.012843
          cm |   3.224309   .1609812    20.03   0.000     2.908792    3.539826
       _cons |    5.19207    .060879    85.29   0.000     5.072749    5.311391
------------------------------------------------------------------------------

lrtest

Glm:  likelihood-ratio test                   chi2(1)     =     187.38
                                              Prob > chi2 =     0.0000

Note that the Deviance and the Pearson are the same as the G^2 and X^2, respectively, in the table above.
Connection to Logistic Regression
We haven't begun our discussions of logistic regression yet, but this will serve as a demonstration that there is a relationship between logistic regression and loglinear regression models.
Logistic regression requires that we select one of the variables to be the response variable. Let's select m, marijuana.
use http://www.philender.com/courses/data/acm, clear
generate ac = a*c

logit m a c ac [fw=freq]

Logit estimates                                   Number of obs   =       2276
                                                  LR chi2(3)      =     843.83
                                                  Prob > chi2     =     0.0000
Log likelihood = -1127.7332                       Pseudo R2       =     0.2723

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |  -3.778783   1.402386    -2.69   0.007    -6.527409   -1.030157
           c |  -3.454498   .9857381    -3.50   0.000    -5.386509   -1.522487
          ac |   .5895107   .9423638     0.63   0.532    -1.257488     2.43651
       _cons |   7.170455   1.441154     4.98   0.000     4.345845    9.995064
------------------------------------------------------------------------------

The likelihood ratio chi-square in this model is the same as the G^2 of the a*c+m loglinear regression model above.
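As a cross-check (a minimal sketch reusing the ac term generated above), the Poisson formulation of the a*c+m model has a deviance equal to this likelihood ratio chi-square:

glm freq a c m ac, family(pois) link(log)    /* Deviance = 843.83, matching LR chi2(3) */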
hsbdemo Example
Here is an example derived from the hsbdemo dataset, in which write and read have been categorized by quantile, and the data, along with female, contracted to frequencies.
clear
input f w r n
0 1 1 18
0 1 2 9
0 1 3 5
0 1 4 2
0 2 1 5
0 2 2 5
0 2 3 8
0 2 4 4
0 3 2 4
0 3 3 5
0 3 4 9
0 4 2 1
0 4 3 7
0 4 4 9
1 1 1 10
1 1 2 5
1 1 3 1
1 2 1 14
1 2 2 13
1 2 3 6
1 2 4 2
1 3 1 5
1 3 2 5
1 3 3 10
1 3 4 6
1 4 2 7
1 4 3 10
1 4 4 15
end

tab f w [fw=n], chi2 lr row
tab f r [fw=n], chi2 lr row
tab w r [fw=n], chi2 lr row

ipf [fw=n], fit(f+w+r)         /* mutual independence */
ipf [fw=n], fit(f*w+r)         /* joint independence */
ipf [fw=n], fit(f*r+w)
ipf [fw=n], fit(w*r+f)
ipf [fw=n], fit(f*w+f*r)       /* conditional independence */
ipf [fw=n], fit(w*f+w*r)
ipf [fw=n], fit(r*w+r*f)
ipf [fw=n], fit(f*w+f*r+w*r)   /* homogeneous association */
ipf [fw=n], fit(f*w*r)         /* fully saturated */

Summary of models

Model            df      G^2      X^2
f+w+r            24   117.10   101.33   mutual independence
f*w+r            21   102.99    85.56   joint independence
f*r+w            21   115.44   102.94   joint independence
w*r+f            15    33.08    29.39   joint independence
f*w+f*r          18   101.32    84.14   conditional independence
w*f+w*r          12    18.96    16.24   conditional independence (n.s.)
r*w+r*f          12    31.42    28.03   conditional independence
f*w+f*r+w*r       9     6.53     5.67   homogeneous association (n.s.)
f*w*r             0     0.00     0.00   fully saturated

xi3: poisson n i.w*i.f i.w*i.r
fitstat, saving(m1)
xi3: poisson n i.w*i.f i.w*i.r i.f*i.r
fitstat, using(m1)

Measures of Fit for poisson of n

                              Current        Saved     Difference
Model:                        poisson      poisson
N:                                 28           28              0
Log-Lik Intercept Only:       -84.603      -84.603          0.000
Log-Lik Full Model:           -52.752      -56.749          3.997
D:                         105.504(6)   113.498(9)       7.994(3)
LR:                        63.702(21)   55.707(18)       7.994(3)
Prob > LR:                      0.000        0.000          0.046
McFadden's R2:                  0.376        0.329          0.047
McFadden's Adj R2:              0.116        0.105          0.012
Maximum Likelihood R2:          0.897        0.863          0.034
Cragg & Uhler's R2:             0.899        0.865          0.034
AIC:                            5.339        5.411         -0.071
AIC*n:                        149.504      151.498         -1.994
BIC:                           85.510       83.508          2.002
BIC':                           6.275        4.272          2.002

Difference of 2.002 in BIC' provides positive support for saved model.
Note: p-value for difference in LR is only valid if models are nested.
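The fitstat comparison is itself a difference-in-deviance test, so the reported p-value can be reproduced directly from the deviances above (a quick check):

display chi2tail(3, 113.498 - 105.504)    /* = 0.046, the Prob > LR reported by fitstat */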
Categorical Data Analysis Course
Phil Ender