Consider the Following 4 Group Design:
Level     a1     a2     a3     a4   Total
           1      2      5     10
           3      3      6     10
           2      4      4      9
           2      3      5     11
Mean     2.0    3.0    5.0   10.0     5.0
Dummy Coding
Dummy coded variables are also known as indicator variables.
    input y grp d1 d2 d3
     1 1 1 0 0
     3 1 1 0 0
     2 1 1 0 0
     2 1 1 0 0
     2 2 0 1 0
     3 2 0 1 0
     4 2 0 1 0
     3 2 0 1 0
     5 3 0 0 1
     6 3 0 0 1
     4 3 0 0 1
     5 3 0 0 1
    10 4 0 0 0
    10 4 0 0 0
     9 4 0 0 0
    11 4 0 0 0
    end

    tabstat y, by(grp)

    Summary for variables: y
         by categories of: grp

         grp |      mean
    ---------+----------
           1 |         2
           2 |         3
           3 |         5
           4 |        10
    ---------+----------
       Total |         5
    --------------------

    regress y d1 d2 d3

      Source |       SS       df       MS                  Number of obs =      16
    ---------+------------------------------               F(  3,    12) =   76.00
       Model |      152.00     3  50.6666667               Prob > F      =  0.0000
    Residual |        8.00    12  .666666667               R-squared     =  0.9500
    ---------+------------------------------               Adj R-squared =  0.9375
       Total |      160.00    15  10.6666667               Root MSE      =   .8165

    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
          d1 |         -8   .5773503    -13.856   0.000      -9.257938   -6.742062
          d2 |         -7   .5773503    -12.124   0.000      -8.257938   -5.742062
          d3 |         -5   .5773503     -8.660   0.000      -6.257938   -3.742062
       _cons |         10   .4082483     24.495   0.000       9.110503     10.8895
    ------------------------------------------------------------------------------

Factor variables, introduced in Stata 11, generate dummy coded variables automatically for most estimation commands.
    regress y i.grp

          Source |       SS       df       MS              Number of obs =      16
    -------------+------------------------------           F(  3,    12) =   76.00
           Model |         152     3  50.6666667           Prob > F      =  0.0000
        Residual |           8    12  .666666667           R-squared     =  0.9500
    -------------+------------------------------           Adj R-squared =  0.9375
           Total |         160    15  10.6666667           Root MSE      =   .8165

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             grp |
              2  |          1   .5773503     1.73   0.109    -.2579382    2.257938
              3  |          3   .5773503     5.20   0.000     1.742062    4.257938
              4  |          8   .5773503    13.86   0.000     6.742062    9.257938
                 |
           _cons |          2   .4082483     4.90   0.000     1.110503    2.889497
    ------------------------------------------------------------------------------

    /* change reference group to grp 4 */

    regress y ib4.grp

          Source |       SS       df       MS              Number of obs =      16
    -------------+------------------------------           F(  3,    12) =   76.00
           Model |         152     3  50.6666667           Prob > F      =  0.0000
        Residual |           8    12  .666666667           R-squared     =  0.9500
    -------------+------------------------------           Adj R-squared =  0.9375
           Total |         160    15  10.6666667           Root MSE      =   .8165

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             grp |
              1  |         -8   .5773503   -13.86   0.000    -9.257938   -6.742062
              2  |         -7   .5773503   -12.12   0.000    -8.257938   -5.742062
              3  |         -5   .5773503    -8.66   0.000    -6.257938   -3.742062
                 |
           _cons |         10   .4082483    24.49   0.000     9.110503     10.8895
    ------------------------------------------------------------------------------

    /* anova treats all predictors as categorical unless otherwise indicated */

    anova y grp

               Number of obs =      16     R-squared     =  0.9500
               Root MSE      = .816497     Adj R-squared =  0.9375

         Source |  Partial SS    df       MS           F     Prob > F
     -----------+----------------------------------------------------
          Model |      152.00     3  50.6666667      76.00     0.0000
                |
            grp |      152.00     3  50.6666667      76.00     0.0000
                |
       Residual |        8.00    12  .666666667
     -----------+----------------------------------------------------
          Total |      160.00    15  10.6666667

    regress

          Source |       SS       df       MS              Number of obs =      16
    -------------+------------------------------           F(  3,    12) =   76.00
           Model |         152     3  50.6666667           Prob > F      =  0.0000
        Residual |           8    12  .666666667           R-squared     =  0.9500
    -------------+------------------------------           Adj R-squared =  0.9375
           Total |         160    15  10.6666667           Root MSE      =   .8165

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             grp |
              2  |          1   .5773503     1.73   0.109    -.2579382    2.257938
              3  |          3   .5773503     5.20   0.000     1.742062    4.257938
              4  |          8   .5773503    13.86   0.000     6.742062    9.257938
                 |
           _cons |          2   .4082483     4.90   0.000     1.110503    2.889497
    ------------------------------------------------------------------------------
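
The indicator variables above were typed in by hand. As a minimal sketch, they can also be built from grp itself. The stub name g is a hypothetical choice made here to avoid clashing with d1-d3, and the block is meant to be run as a do-file. It also illustrates how to read the coefficients: each slope is a group mean minus the reference-group mean, and the constant is the reference-group mean.

```stata
* Sketch only: rebuild the example data with just y and grp.
clear
input y grp
 1 1
 3 1
 2 1
 2 1
 2 2
 3 2
 4 2
 3 2
 5 3
 6 3
 4 3
 5 3
10 4
10 4
 9 4
11 4
end

* tabulate with generate() creates one indicator per level: g1-g4.
tabulate grp, generate(g)

* Leaving g4 out of the model makes group 4 the reference group.
regress y g1 g2 g3

* Slope for g1 is mean(group 1) - mean(group 4), i.e. 2 - 10.
display _b[g1]
* The constant is the mean of the reference group.
display _b[_cons]

* The factor-variable version of the same model; margins then
* reports the predicted mean for each level of grp.
regress y ib4.grp
margins grp
```

Whichever group is left out, the fitted group means are the same; only the comparisons that the individual coefficients express change.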
Effect Coding
Effect coding is sometimes known as deviation coding.
    input y grp e1 e2 e3
     1 1  1  0  0
     3 1  1  0  0
     2 1  1  0  0
     2 1  1  0  0
     2 2  0  1  0
     3 2  0  1  0
     4 2  0  1  0
     3 2  0  1  0
     5 3  0  0  1
     6 3  0  0  1
     4 3  0  0  1
     5 3  0  0  1
    10 4 -1 -1 -1
    10 4 -1 -1 -1
     9 4 -1 -1 -1
    11 4 -1 -1 -1
    end

    regress y e1 e2 e3

      Source |       SS       df       MS                  Number of obs =      16
    ---------+------------------------------               F(  3,    12) =   76.00
       Model |      152.00     3  50.6666667               Prob > F      =  0.0000
    Residual |        8.00    12  .666666667               R-squared     =  0.9500
    ---------+------------------------------               Adj R-squared =  0.9375
       Total |      160.00    15  10.6666667               Root MSE      =   .8165

    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
          e1 |         -3   .3535534     -8.485   0.000      -3.770327   -2.229673
          e2 |         -2   .3535534     -5.657   0.000      -2.770327   -1.229673
          e3 |          0   .3535534      0.000   1.000      -.7703266    .7703266
       _cons |          5   .2041241     24.495   0.000       4.555252    5.444748
    ------------------------------------------------------------------------------

    test e1 e2 e3

     ( 1)  e1 = 0
     ( 2)  e2 = 0
     ( 3)  e3 = 0

           F(  3,    12) =   76.00
                Prob > F =    0.0000
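
In the regression above, the constant (5) is the grand mean and each slope is a group mean minus the grand mean (2 - 5, 3 - 5, 5 - 5). With equal group sizes, as here, the grand mean equals the unweighted mean of the group means. The following is a minimal sketch of deriving the effect codes from grp and recovering the effect for group 4, which has no coefficient of its own. The variable names c1-c3 are hypothetical, chosen to avoid clashing with e1-e3.

```stata
* Sketch only: rebuild the example data with just y and grp.
clear
input y grp
 1 1
 3 1
 2 1
 2 1
 2 2
 3 2
 4 2
 3 2
 5 3
 6 3
 4 3
 5 3
10 4
10 4
 9 4
11 4
end

* Effect code j is 1 for group j, -1 for the last group, 0 otherwise.
forvalues j = 1/3 {
    generate c`j' = (grp == `j') - (grp == 4)
}

regress y c1 c2 c3

* The effects sum to zero across groups, so the effect for group 4
* is minus the sum of the three estimated effects (here 10 - 5).
display -(_b[c1] + _b[c2] + _b[c3])
```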
Orthogonal Coding
Example Using Orthogonal Coding
    input y grp x1 x2 x3
     1 1  1  1  1
     3 1  1  1  1
     2 1  1  1  1
     2 1  1  1  1
     2 2 -1  1  1
     3 2 -1  1  1
     4 2 -1  1  1
     3 2 -1  1  1
     5 3  0 -2  1
     6 3  0 -2  1
     4 3  0 -2  1
     5 3  0 -2  1
    10 4  0  0 -3
    10 4  0  0 -3
     9 4  0  0 -3
    11 4  0  0 -3
    end

    table grp, contents(freq mean y sd y)

    ----------------------------------------------
          grp |      Freq.     mean(y)       sd(y)
    ----------+-----------------------------------
            1 |          4           2    .8164966
            2 |          4           3    .8164966
            3 |          4           5    .8164966
            4 |          4          10    .8164966
    ----------------------------------------------

    corr x1 x2 x3
    (obs=16)

                 |       x1       x2       x3
    -------------+---------------------------
              x1 |   1.0000
              x2 |   0.0000   1.0000
              x3 |   0.0000   0.0000   1.0000

Anova

    anova y grp

               Number of obs =      16     R-squared     =  0.9500
               Root MSE      = .816497     Adj R-squared =  0.9375

         Source |  Partial SS    df       MS           F     Prob > F
     -----------+----------------------------------------------------
          Model |      152.00     3  50.6666667      76.00     0.0000
                |
            grp |      152.00     3  50.6666667      76.00     0.0000
                |
       Residual |        8.00    12  .666666667
     -----------+----------------------------------------------------
          Total |      160.00    15  10.6666667

Regression Analysis Using Orthogonal Coding

    regress y x1 x2 x3

      Source |       SS       df       MS                  Number of obs =      16
    ---------+------------------------------               F(  3,    12) =   76.00
       Model |      152.00     3  50.6666667               Prob > F      =  0.0000
    Residual |        8.00    12  .666666667               R-squared     =  0.9500
    ---------+------------------------------               Adj R-squared =  0.9375
       Total |      160.00    15  10.6666667               Root MSE      =   .8165

    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
          x1 |        -.5   .2886751     -1.732   0.109      -1.128969    .1289691
          x2 |  -.8333333   .1666667     -5.000   0.000      -1.196469   -.4701979
          x3 |  -1.666667   .1178511    -14.142   0.000      -1.923442   -1.409891
       _cons |          5   .2041241     24.495   0.000       4.555252    5.444748
    ------------------------------------------------------------------------------

    test x1 x2 x3

     ( 1)  x1 = 0
     ( 2)  x2 = 0
     ( 3)  x3 = 0

           F(  3,    12) =   76.00
                Prob > F =    0.0000
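
The slopes in the orthogonal-coding regression can be checked by hand. The following worked calculation is a sketch that assumes equal group sizes and mutually orthogonal columns that each sum to zero, as in this example. The symbols $c_{kj}$ (the code that column $x_k$ assigns to group $j$) and $\bar{y}_j$ (the mean of group $j$) are notation introduced here, not part of the original notes.

```latex
b_k = \frac{\sum_{j} c_{kj}\,\bar{y}_j}{\sum_{j} c_{kj}^{2}}, \qquad b_0 = \bar{y} = 5

b_1 = \frac{(1)(2) + (-1)(3)}{1^2 + (-1)^2} = \frac{-1}{2} = -0.5

b_2 = \frac{(1)(2) + (1)(3) + (-2)(5)}{1^2 + 1^2 + (-2)^2} = \frac{-5}{6} \approx -0.8333

b_3 = \frac{(1)(2) + (1)(3) + (1)(5) + (-3)(10)}{1^2 + 1^2 + 1^2 + (-3)^2} = \frac{-20}{12} \approx -1.6667
```

Each slope is therefore a scaled comparison of the mean of the preceding groups with the next group: x1 compares group 1 with group 2, x2 compares groups 1 and 2 with group 3, and x3 compares groups 1 through 3 with group 4.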
Orthogonal Coding Schema
Grp   X1   X2   X3   X4   X5   X6   X7   X8   X9
  1    1    1    1    1    1    1    1    1    1
  2   -1    1    1    1    1    1    1    1    1
  3    0   -2    1    1    1    1    1    1    1
  4    0    0   -3    1    1    1    1    1    1
  5    0    0    0   -4    1    1    1    1    1
  6    0    0    0    0   -5    1    1    1    1
  7    0    0    0    0    0   -6    1    1    1
  8    0    0    0    0    0    0   -7    1    1
  9    0    0    0    0    0    0    0   -8    1
 10    0    0    0    0    0    0    0    0   -9
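
The schema follows a simple rule: column j is 1 for groups 1 through j, -j for group j+1, and 0 for all later groups. As a minimal sketch, the columns can be generated from grp with a loop. The local macro k and the variable names h1-h3 are hypothetical, and the block is meant to be run as a do-file. The columns are uncorrelated only when the group sizes are equal, as they are in this example.

```stata
* Sketch only: rebuild the example data with just y and grp.
clear
input y grp
 1 1
 3 1
 2 1
 2 1
 2 2
 3 2
 4 2
 3 2
 5 3
 6 3
 4 3
 5 3
10 4
10 4
 9 4
11 4
end

* k is the number of groups; k - 1 coded columns are needed.
local k = 4
local km1 = `k' - 1

forvalues j = 1/`km1' {
    * Column j: 1 for groups 1..j, -j for group j+1, 0 afterwards.
    generate h`j' = cond(grp <= `j', 1, cond(grp == `j' + 1, -`j', 0))
}

* Check that the generated columns are uncorrelated, then fit the model.
corr h1 h2 h3
regress y h1 h2 h3
```

Changing k to match the number of levels of grp produces the corresponding columns of the schema above.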
Linear Statistical Models Course
Phil Ender, 17sep10, 21Feb02, 17Mar98