So Far...
Example Using hsbdemo
We will look at a model that uses write as the response variable and female and prog as predictors.
```
use http://www.philender.com/courses/data/hsbdemo, clear

tab1 female prog

-> tabulation of female

     female |      Freq.     Percent        Cum.
------------+-----------------------------------
       male |         91       45.50       45.50
     female |        109       54.50      100.00
------------+-----------------------------------
      Total |        200      100.00

-> tabulation of prog

    type of |
    program |      Freq.     Percent        Cum.
------------+-----------------------------------
    general |         45       22.50       22.50
   academic |        105       52.50       75.00
   vocation |         50       25.00      100.00
------------+-----------------------------------
      Total |        200      100.00

table prog female, cont(mean write sd write freq)

------------------------------
  type of |      female
  program |     male    female
----------+-------------------
  general | 49.14286     53.25
          | 10.36478  8.205248
          |       21        24
          |
 academic | 54.61702  57.58621
          | 8.656622  7.115672
          |       47        58
          |
 vocation | 41.82609  50.96296
          | 8.003705  8.341193
          |       23        27
------------------------------

/* model 1 -- no interaction */

anova write female prog

                       Number of obs =     200     R-squared     =  0.2408
                       Root MSE      = 8.32211     Adj R-squared =  0.2291

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  4304.40272     3   1434.80091     20.72     0.0000
                     |
              female |  1128.70487     1   1128.70487     16.30     0.0001
                prog |  3128.18888     2   1564.09444     22.58     0.0000
                     |
            Residual |  13574.4723   196   69.2575116
          -----------+----------------------------------------------------
               Total |   17878.875   199    89.843593

regress write i.female i.prog

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   20.72
       Model |  4304.40272     3  1434.80091           Prob > F      =  0.0000
    Residual |  13574.4723   196  69.2575116           R-squared     =  0.2408
-------------+------------------------------           Adj R-squared =  0.2291
       Total |   17878.875   199   89.843593           Root MSE      =  8.3221

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.female |   4.771211   1.181876     4.04   0.000     2.440385    7.102037
             |
        prog |
          2  |   4.832929   1.482956     3.26   0.001     1.908331    7.757528
          3  |  -4.605141   1.710049    -2.69   0.008      -7.9776   -1.232683
             |
       _cons |   48.78869   1.391537    35.06   0.000     46.04438      51.533
------------------------------------------------------------------------------

test 1.female

 ( 1)  1.female = 0

       F(  1,   196) =   16.30
            Prob > F =    0.0001

testparm i.prog

 ( 1)  2.prog = 0
 ( 2)  3.prog = 0

       F(  2,   196) =   22.58
            Prob > F =    0.0000

/* model 2 -- interaction */

anova write female prog female#prog

                       Number of obs =     200     R-squared     =  0.2590
                       Root MSE      = 8.26386     Adj R-squared =  0.2399

              Source |  Partial SS    df       MS           F     Prob > F
         ------------+----------------------------------------------------
               Model |  4630.36091     5   926.072182     13.56     0.0000
                     |
              female |  1261.85329     1   1261.85329     18.48     0.0000
                prog |  3274.35082     2   1637.17541     23.97     0.0000
         female#prog |  325.958189     2   162.979094      2.39     0.0946
                     |
            Residual |  13248.5141   194   68.2913097
         ------------+----------------------------------------------------
               Total |   17878.875   199    89.843593

regress write i.female##i.prog

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  5,   194) =   13.56
       Model |  4630.36091     5   926.072182          Prob > F      =  0.0000
    Residual |  13248.5141   194  68.2913097           R-squared     =  0.2590
-------------+------------------------------           Adj R-squared =  0.2399
       Total |   17878.875   199   89.843593           Root MSE      =  8.2639

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.female |   4.107143   2.469299     1.66   0.098    -.7629757    8.977261
             |
        prog |
          2  |   5.474164   2.169095     2.52   0.012     1.196128      9.7522
          3  |   -7.31677   2.494224    -2.93   0.004    -12.23605   -2.397493
             |
 female#prog |
        1 2  |  -1.137957   2.954299    -0.39   0.701    -6.964625     4.68871
        1 3  |   5.029733    3.40528     1.48   0.141    -1.686391    11.74586
             |
       _cons |   49.14286   1.803321    27.25   0.000     45.58623    52.69949
------------------------------------------------------------------------------

test 1.female#2.prog 1.female#3.prog

 ( 1)  1.female#2.prog = 0
 ( 2)  1.female#3.prog = 0

       F(  2,   194) =    2.39
            Prob > F =    0.0946

test 1.female

 ( 1)  1.female = 0

       F(  1,   194) =    2.77
            Prob > F =    0.0979

testparm i.prog

 ( 1)  2.prog = 0
 ( 2)  3.prog = 0

       F(  2,   194) =   18.69
            Prob > F =    0.0000
```

Please note: With dummy coding, the test of the highest-order interaction is the same as the one anova reports. However, the tests of the main effects will not match anova; a different approach is needed, such as the anovalator program (findit anovalator).
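As the anovalator output shows, the "scaled as F-ratio" value is simply the Wald chi-square statistic divided by its degrees of freedom. A minimal Python check of that arithmetic, using the values from the anovalator output:

```python
# anovalator's "scaled as F-ratio" is just chi2 / df
# (values below are copied from the anovalator output for the hsbdemo model).

def chi2_to_f(chi2, df):
    """Scale a Wald chi-square statistic to an F-ratio."""
    return chi2 / df

print(chi2_to_f(18.477509, 1))   # female: df = 1, so F equals the chi-square
print(chi2_to_f(47.946815, 2))   # prog:   chi2 / 2, about 23.973408
```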
```
anovalator female prog, main fratio

anovalator main-effect for female
chi2(1)           =  18.477509
p-value           =  .00001719
scaled as F-ratio =  18.477509

anovalator main-effect for prog
chi2(2)           =  47.946815
p-value           =  3.877e-11
scaled as F-ratio =  23.973408
```

Some examples of 2x2 interactions
Consider the following 2x2 table of cell means and the regression results. The categorical predictors are A and B, each with two levels. The first example will not have a significant interaction effect.
|    | A0    | A1    |
|----|-------|-------|
| B0 | 50.02 | 55.09 |
| B1 | 54.61 | 60.09 |
```
regress y1 a##b

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  3,    36) =  185.27
       Model |  507.825013     3  169.275004           Prob > F      =  0.0000
    Residual |  32.8925799    36  .913682774           R-squared     =  0.9392
-------------+------------------------------           Adj R-squared =  0.9341
       Total |  540.717593    39  13.8645537           Root MSE      =  .95587

------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.a |   5.068859    .427477    11.86   0.000     4.201896    5.935823
         1.b |   4.582972    .427477    10.72   0.000     3.716009    5.449936
             |
         a#b |
        1 1  |    .410215   .6045437     0.68   0.502    -.8158565    1.636286
             |
       _cons |   50.02439   .3022719   165.49   0.000     49.41135    50.63743
------------------------------------------------------------------------------
```

Let's interpret this regression table.
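With dummy (0/1) coding, each cell mean is a sum of coefficients: the constant, plus the A and B effects that are "switched on" for that cell, plus the interaction term for the A1,B1 cell. A small Python sketch (coefficient values copied from the y1 output) reconstructs the 2x2 table of cell means:

```python
# Recover the four cell means of the y1 example from the regression
# coefficients (values copied from the regress y1 a##b output):
b0   = 50.02439   # _cons   -> mean of the A0,B0 cell
a1   = 5.068859   # 1.a     -> effect of A1 when B = 0
b1   = 4.582972   # 1.b     -> effect of B1 when A = 0
a1b1 = 0.410215   # 1.a#1.b -> interaction adjustment for the A1,B1 cell

means = {
    ("A0", "B0"): b0,
    ("A1", "B0"): b0 + a1,
    ("A0", "B1"): b0 + b1,
    ("A1", "B1"): b0 + a1 + b1 + a1b1,
}
for cell, m in means.items():
    print(cell, round(m, 2))   # reproduces the 2x2 table of cell means
```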
Next, let's try an example in which the interaction term is significant and positive.
|    | A0    | A1    |
|----|-------|-------|
| B0 | 50.25 | 54.73 |
| B1 | 55.10 | 65.08 |
```
regress y2 a##b

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  3,    36) =  753.03
       Model |  1175.55141     3  391.850469           Prob > F      =  0.0000
    Residual |  18.7332456    36  .520367934           R-squared     =  0.9843
-------------+------------------------------           Adj R-squared =  0.9830
       Total |  1194.28465    39  30.6226834           Root MSE      =  .72137

------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.a |   4.485349   .3226044    13.90   0.000     3.831077    5.139621
         1.b |   4.848378   .3226044    15.03   0.000     4.194106     5.50265
             |
         a#b |
        1 1  |   5.494711   .4562315    12.04   0.000     4.569431    6.419991
             |
       _cons |   50.24988   .2281157   220.28   0.000     49.78724    50.71252
------------------------------------------------------------------------------
```

Let's interpret this regression table for y2.
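Because the interaction is significant here, the effect of A is not constant across levels of B: the interaction coefficient is how much the A effect changes when B moves from 0 to 1. A quick check in Python (coefficients copied from the y2 output):

```python
# Simple effects of A at each level of B, from the y2 coefficients:
a1   = 4.485349   # 1.a     -> effect of A at B = 0
a1b1 = 5.494711   # 1.a#1.b -> change in the A effect at B = 1

effect_A_at_B0 = a1           # matches the cell-mean difference 54.73 - 50.25
effect_A_at_B1 = a1 + a1b1    # matches the cell-mean difference 65.08 - 55.10
print(round(effect_A_at_B0, 2), round(effect_A_at_B1, 2))
```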
Finally, we will run a model in which the interaction coefficient is negative and statistically significant.
|    | A0    | A1    |
|----|-------|-------|
| B0 | 50.33 | 55.21 |
| B1 | 54.76 | 55.43 |
```
regress y3 a##b

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  3,    36) =   33.39
       Model |  175.137941     3  58.3793136           Prob > F      =  0.0000
    Residual |   62.938775    36  1.74829931           R-squared     =  0.7356
-------------+------------------------------           Adj R-squared =  0.7136
       Total |  238.076716    39  6.10453118           Root MSE      =  1.3222

------------------------------------------------------------------------------
          y3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.a |   4.879437   .5913204     8.25   0.000     3.680183     6.07869
         1.b |   4.427636   .5913204     7.49   0.000     3.228383    5.626889
             |
         a#b |
        1 1  |  -4.213134   .8362534    -5.04   0.000    -5.909134   -2.517133
             |
       _cons |   50.33252   .4181267   120.38   0.000     49.48452    51.18052
------------------------------------------------------------------------------
```

Let's interpret this regression table for y3.
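Here the negative interaction nearly cancels the A effect at B1, which is why the B1 row of the cell-mean table is almost flat. A short Python sketch (coefficients copied from the y3 output):

```python
# Simple effects of A for the y3 model:
a1   = 4.879437    # 1.a     -> effect of A at B = 0
a1b1 = -4.213134   # 1.a#1.b -> negative interaction adjustment

print(round(a1, 2))          # effect of A at B0, about 4.88
print(round(a1 + a1b1, 2))   # effect of A at B1, about 0.67 -- nearly zero
```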
We can see from the tables and regression results that, with dummy coding, the interaction coefficient is the amount added to (or subtracted from) the constant plus the A and B coefficients to yield the mean of the A1,B1 cell. Equivalently, it is the difference of differences of the cell means.
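The "difference of differences" reading can be verified directly from the three 2x2 tables; the results match the a#b coefficients in the three regressions (up to the rounding in the tables):

```python
# With dummy coding, the interaction coefficient equals the
# "difference of differences" of the four cell means:
#   (mean_A1B1 - mean_A0B1) - (mean_A1B0 - mean_A0B0)
# Cell means (A0, A1) per row of B, taken from the three tables above:
tables = {
    "y1": {"B0": (50.02, 55.09), "B1": (54.61, 60.09)},
    "y2": {"B0": (50.25, 54.73), "B1": (55.10, 65.08)},
    "y3": {"B0": (50.33, 55.21), "B1": (54.76, 55.43)},
}

def diff_of_diffs(cells):
    """Interaction effect implied by the 2x2 table of cell means."""
    a0b0, a1b0 = cells["B0"]
    a0b1, a1b1 = cells["B1"]
    return (a1b1 - a0b1) - (a1b0 - a0b0)

for name, cells in tables.items():
    # y1 -> about 0.41, y2 -> about 5.49, y3 -> about -4.21,
    # matching the a#b coefficients in the three regressions.
    print(name, round(diff_of_diffs(cells), 2))
```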
Linear Statistical Models Course
Phil Ender, 24sep10, 18dec99