The linear models for anova and regression look a little bit different from each other. How might they be related? Let's begin by looking at each model.
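For reference, the two models can be written side by side. This is a sketch in the notation used further down this page (μ, αj, εi(j), b0, b1, ei); the subscripts on Y and X are an assumption added here for clarity.

```latex
\begin{align*}
Y_{ij} &= \mu + \alpha_j + \varepsilon_{i(j)} && \text{(anova model)} \\
Y_i    &= b_0 + b_1 X_i + e_i                 && \text{(regression model)}
\end{align*}
```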
Linear Model for Anova
use http://www.philender.com/courses/data/hsb2, clear

tabstat write, by(female) stat(n mean sd)

Summary for variables: write
     by categories of: female

female |         N      mean        sd
-------+------------------------------
  male |        91  50.12088  10.30516
female |       109  54.99083  8.133715
-------+------------------------------
 Total |       200    52.775  9.478586
--------------------------------------

anova write female

                       Number of obs =     200     R-squared     =  0.0658
                       Root MSE      =  9.1846     Adj R-squared =  0.0611

     Source |  Partial SS    df       MS           F     Prob > F
 -----------+----------------------------------------------------
      Model |  1176.21384     1  1176.21384      13.94     0.0002
            |
     female |  1176.21384     1  1176.21384      13.94     0.0002
            |
   Residual |  16702.6612   198  84.3568745
 -----------+----------------------------------------------------
      Total |   17878.875   199   89.843593

predict e1, resid

In the anova model, μ is the grand mean and is equal to 52.775. The αj's are the treatment effects for being in the jth group. Each one is the mean of group j minus the grand mean. In this example the treatment effects are:
50.12088 - 52.775 = -2.65412 for males
54.99083 - 52.775 = 2.21583 for females.
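The arithmetic above can also be checked in Stata. The following is a minimal sketch, assuming the hsb2 data load from the same URL used earlier; the scalar names grand, a_male and a_female are hypothetical names chosen for illustration.

```stata
* Sketch: recompute each treatment effect as group mean minus grand mean.
use http://www.philender.com/courses/data/hsb2, clear

* Grand mean (mu in the anova model)
quietly summarize write
scalar grand = r(mean)

* Treatment effect for males (female == 0)
quietly summarize write if female == 0
scalar a_male = r(mean) - scalar(grand)

* Treatment effect for females (female == 1)
quietly summarize write if female == 1
scalar a_female = r(mean) - scalar(grand)

display "alpha (males)   = " scalar(a_male)
display "alpha (females) = " scalar(a_female)

* Weighted by group size (91 males, 109 females), the effects sum to zero.
display "weighted sum    = " (91*scalar(a_male) + 109*scalar(a_female))
```

Because the grand mean here is the mean of all 200 observations, the effects sum to zero only when each is weighted by its group size.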
Next, let's run a regression using a manually generated orthogonal coding.
generate oc = 91 if female==1
replace oc=-109 if female==0

tab oc

         oc |      Freq.     Percent        Cum.
------------+-----------------------------------
       -109 |         91       45.50       45.50
         91 |        109       54.50      100.00
------------+-----------------------------------
      Total |        200      100.00

regress write oc

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =   13.94
       Model |  1176.21384     1  1176.21384           Prob > F      =  0.0002
    Residual |  16702.6612   198  84.3568745           R-squared     =  0.0658
-------------+------------------------------           Adj R-squared =  0.0611
       Total |   17878.875   199   89.843593           Root MSE      =  9.1846

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          oc |   .0243497    .006521     3.73   0.000     .0114903    .0372092
       _cons |     52.775   .6494493    81.26   0.000     51.49427    54.05573
------------------------------------------------------------------------------

You can see that the constant in this model, 52.775, is equal to the grand mean.
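predict e2, resid

Both models describe the same scores, so we can set them equal: μ + αj + εi(j) = b0 + b1X + ei. First, the constant in the regression analysis is equal to the grand mean, so b0 in the regression model is equal to μ in the anova model, allowing us to simplify the equation to αj + εi(j) = b1X + ei.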
summarize e1 e2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          e1 |       200    3.90e-08    9.161494  -19.99083   16.87912
          e2 |       200    3.90e-08    9.161494  -19.99083   16.87912

compare e1 e2

                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
e1=e2                         200
                       ----------
jointly defined               200             0            0           0
                       ----------
total                         200

The residuals from the two models are identical, so ei equals εi(j) and we can again simplify the equation, leaving us with αj = b1X:
.0243497 * -109 = -2.65412 for males
.0243497 * 91 = 2.215823 for females
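The reason this coding works can be sketched algebraically. The symbols n_m, n_f, N and the subscripted means below are notation introduced here for illustration, not taken from the original; the numbers are the ones reported in the output above.

```latex
% Codes: X = n_m = 91 for females, X = -n_f = -109 for males, N = n_m + n_f = 200.
\begin{align*}
\sum X &= n_f\,n_m + n_m\,(-n_f) = 0
  \;\Rightarrow\; \bar{X} = 0
  \;\Rightarrow\; b_0 = \bar{Y} - b_1\bar{X} = \bar{Y} = \mu \\[4pt]
b_1 &= \frac{\bar{Y}_f - \bar{Y}_m}{n_m - (-n_f)}
     = \frac{54.99083 - 50.12088}{200}
     = 0.02434975 \\[4pt]
b_1\,n_m &= \frac{n_m\,(\bar{Y}_f - \bar{Y}_m)}{N}
          = \bar{Y}_f - \frac{n_m\bar{Y}_m + n_f\bar{Y}_f}{N}
          = \bar{Y}_f - \bar{Y} = \alpha_f \\[4pt]
b_1\,(-n_f) &= -\frac{n_f\,(\bar{Y}_f - \bar{Y}_m)}{N}
             = \bar{Y}_m - \bar{Y} = \alpha_m
\end{align*}
```

The first line shows why the constant equals the grand mean: the codes sum to zero across the 200 observations. The remaining lines show that multiplying the slope by each group's code gives that group's mean minus the grand mean.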
Please note that the equivalence between b1X and αj does not hold for dummy coding. For the dummy coded model it is true only that the predicted values agree: b0 + b1X = μ + αj.
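The dummy coded case can be checked the same way. This is a minimal sketch, assuming the 0/1 variable female from hsb2 is used directly as the dummy predictor; it is an illustration, not part of the original analysis.

```stata
* Sketch: dummy coding, with female (0 = male, 1 = female) as the predictor.
use http://www.philender.com/courses/data/hsb2, clear
regress write female

* With dummy coding the constant is the mean of the group coded 0 (males),
* not the grand mean, and the slope is the difference between group means.
display "b0 (male mean)        = " _b[_cons]
display "b0 + b1 (female mean) = " (_b[_cons] + _b[female])
display "b1 (mean difference)  = " _b[female]
```

Here b1X is 0 for males and the difference between the group means for females, so it does not match either treatment effect. The predicted values b0 + b1X still reproduce the two group means, 50.12088 and 54.99083, which are also μ + αj from the anova model.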
Linear Statistical Models Course
Phil Ender, 17sep10, 31dec04