Dichotomous Variables
Interpreting Coefficients
Level      a1     a2   Total
            1      5
            3      6
            2      4
            2      5
            2     10
            3     10
            4      9
            3     11
Mean      2.5    7.5     5.0
Example Using Dummy Coding
input y grp x1 x2 x3 x4 onetwo
 1 1 1 0  1    326 1
 3 1 1 0  1    326 1
 2 1 1 0  1    326 1
 2 1 1 0  1    326 1
 2 1 1 0  1    326 1
 3 1 1 0  1    326 1
 4 1 1 0  1    326 1
 3 1 1 0  1    326 1
 5 2 0 1 -1 -11814 2
 6 2 0 1 -1 -11814 2
 4 2 0 1 -1 -11814 2
 5 2 0 1 -1 -11814 2
10 2 0 1 -1 -11814 2
10 2 0 1 -1 -11814 2
 9 2 0 1 -1 -11814 2
11 2 0 1 -1 -11814 2
end

regress y grp, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
     grp |          5   1.035098      4.830   0.000                   .7905694
   _cons |       -2.5   1.636634     -1.528   0.149                          .
------------------------------------------------------------------------------

regress y x1, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x1 |         -5   1.035098     -4.830   0.000                  -.7905694
   _cons |        7.5   .7319251     10.247   0.000                          .
------------------------------------------------------------------------------

regress y x2, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x2 |          5   1.035098      4.830   0.000                   .7905694
   _cons |        2.5   .7319251      3.416   0.004                          .
------------------------------------------------------------------------------

regress y x3, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x3 |       -2.5   .5175492     -4.830   0.000                  -.7905694
   _cons |          5   .5175492      9.661   0.000                          .
------------------------------------------------------------------------------

regress y x4, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x4 |  -.0004119   .0000853     -4.830   0.000                  -.7905694
   _cons |   2.634267   .7125415      3.697   0.002                          .
------------------------------------------------------------------------------

regress y x1 x2, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x1 |         -5   1.035098     -4.830   0.000                  -.7905694
      x2 |  (dropped)
   _cons |        7.5   .7319251     10.247   0.000                          .
------------------------------------------------------------------------------
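Every pair of coefficients above follows from the two cell means, 2.5 and 7.5, and the codes assigned to the groups. The following is a minimal sketch, not part of the original notes, that reproduces the constants and slopes by hand. The scalar names (m1, m2, c_x2, b_x2, and so on) are made up for this illustration.

```stata
* Cell means from the table at the top of the page
scalar m1 = 2.5
scalar m2 = 7.5

* x2 codes a1 = 0 and a2 = 1: the constant is the mean of the group
* coded 0 and the slope is the difference between the means
scalar c_x2 = m1
scalar b_x2 = m2 - m1

* x3 codes a1 = 1 and a2 = -1: the constant is the mean of the two
* group means and the slope is half their difference
scalar c_x3 = (m1 + m2)/2
scalar b_x3 = (m1 - m2)/2

* x4 uses the arbitrary codes 326 and -11814: the slope is the
* difference in means divided by the difference in codes
scalar b_x4 = (m1 - m2)/(326 - (-11814))
scalar c_x4 = m1 - 326*b_x4

scalar list c_x2 b_x2 c_x3 b_x3 c_x4 b_x4
```

The listed values should agree with the regress output to the digits shown. The fit itself (F, R-squared, Root MSE) is the same under every coding, and the Beta column has the same magnitude throughout: the square root of the R-squared of 0.6250, with a sign that depends only on which group gets the larger code.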
Well, why not just use 1's and 2's? Why all this 0/1 or 1/-1 coding?
regress y onetwo

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  1,    14) =   23.33
       Model |         100     1         100           Prob > F      =  0.0003
    Residual |          60    14  4.28571429           R-squared     =  0.6250
-------------+------------------------------           Adj R-squared =  0.5982
       Total |         160    15  10.6666667           Root MSE      =  2.0702

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      onetwo |          5   1.035098     4.83   0.000     2.779935    7.220065
       _cons |       -2.5   1.636634    -1.53   0.149    -6.010231    1.010231
------------------------------------------------------------------------------
As you can see, the coefficient for the groups is the same as for 0/1 dummy coding (x2). However, the constant is not as informative, since it represents the predicted mean for a group coded zero, a group that does not, in fact, exist. In this respect, dummy coding is much more informative.
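The constant of -2.5 is the fitted line extended back to onetwo = 0, one unit below group 1. The following is a minimal sketch, not part of the original notes, that uses lincom to recover the two real group means from the same fit. Only y and onetwo are re-entered so that the block runs on its own.

```stata
clear
input y onetwo
 1 1
 3 1
 2 1
 2 1
 2 1
 3 1
 4 1
 3 1
 5 2
 6 2
 4 2
 5 2
10 2
10 2
 9 2
11 2
end

regress y onetwo

* Predicted mean for group 1: _cons + 1*b = -2.5 + 5
lincom _cons + onetwo

* Predicted mean for group 2: _cons + 2*b = -2.5 + 10
lincom _cons + 2*onetwo
```

The two lincom estimates should be the cell means, 2.5 and 7.5. With 0/1 coding the constant already is one of these means, which is why no extra step is needed there.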
Automatic Dummy Coding
Stata introduced factor variables in Stata 11, which allow for the automatic coding of dummy variables. It is also easy to change the reference group when using factor variables.
regress y i.grp

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  1,    14) =   23.33
       Model |         100     1         100           Prob > F      =  0.0003
    Residual |          60    14  4.28571429           R-squared     =  0.6250
-------------+------------------------------           Adj R-squared =  0.5982
       Total |         160    15  10.6666667           Root MSE      =  2.0702

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       2.grp |          5   1.035098     4.83   0.000     2.779935    7.220065
       _cons |        2.5   .7319251     3.42   0.004     .9301769    4.069823
------------------------------------------------------------------------------

/* changing the reference grp */

regress y ib2.grp

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  1,    14) =   23.33
       Model |         100     1         100           Prob > F      =  0.0003
    Residual |          60    14  4.28571429           R-squared     =  0.6250
-------------+------------------------------           Adj R-squared =  0.5982
       Total |         160    15  10.6666667           Root MSE      =  2.0702

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.grp |         -5   1.035098    -4.83   0.000    -7.220065   -2.779935
       _cons |        7.5   .7319251    10.25   0.000     5.930177    9.069823
------------------------------------------------------------------------------
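Besides ib2., factor-variable notation offers other ways to choose the reference group. The following is a minimal sketch, not part of the original notes, assuming Stata 11 or later. Only y and grp are re-entered so that the block runs on its own.

```stata
clear
input y grp
 1 1
 3 1
 2 1
 2 1
 2 1
 3 1
 4 1
 3 1
 5 2
 6 2
 4 2
 5 2
10 2
10 2
 9 2
11 2
end

* ib(last). takes the highest level (here 2) as the reference group,
* so this should match the ib2.grp fit
regress y ib(last).grp

* fvset base stores the choice with the variable, so that a plain
* i.grp now uses level 2 as the reference
fvset base 2 grp
regress y i.grp

* restore the default (lowest level as the reference)
fvset clear grp
```

Both regress calls should report 1.grp = -5 and _cons = 7.5. Whether to write the base into each command or set it once with fvset is a matter of taste; the per-command form keeps the choice visible in the output log.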
Linear Statistical Models Course
Phil Ender, 17sep10, 11Feb99