Linear Statistical Models: Regression

Dichotomous Variables

Updated for Stata 11



  • A categorical variable with two levels.
  • Observations fall into one of two groups: male/female, group 1/group 2, true/false, yes/no, etc.
  • The two levels can be coded 1/0, 1/-1, or any other pair of distinct values, even 1/2 (see below).

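    Interpreting Coefficients

  • Dummy coding (0/1): the constant is the mean of the group coded 0, and the coefficient is the mean of the group coded 1 minus the mean of the group coded 0.
  • Effect coding (1/-1): the constant is the mean of the two group means (here equal to the grand mean, because the groups are the same size), and the coefficient is the mean of the group coded 1 minus that overall mean.

    Consider the following two-group design: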

    Level      a1      a2    Total
                1       5
                3       6
                2       4
                2       5
                2      10
                3      10
                4       9
                3      11
    Mean      2.5     7.5      5.0
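As a quick cross-check, the group means and the grand mean in this table can be recomputed in plain Python. This is only an illustrative sketch; the course itself works in Stata, and the names `a1`, `a2` and `group_means` are our own.

```python
# Scores for each level of the design, copied from the table above
a1 = [1, 3, 2, 2, 2, 3, 4, 3]
a2 = [5, 6, 4, 5, 10, 10, 9, 11]


def group_means(a1, a2):
    """Return (mean of a1, mean of a2, grand mean over all scores)."""
    m1 = sum(a1) / len(a1)
    m2 = sum(a2) / len(a2)
    grand = sum(a1 + a2) / len(a1 + a2)
    return m1, m2, grand


m1, m2, grand = group_means(a1, a2)
print(m1, m2, grand)  # 2.5 7.5 5.0
```

The difference between the two group means, 7.5 - 2.5 = 5, is the quantity every coding scheme below recovers in one form or another.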

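    Example Using Dummy, Effect, and Other Codings

    In the data below, grp and onetwo both code the groups as 1/2; x1 and x2 are dummy codes (x1 = 1 for group 1, x2 = 1 for group 2); x3 is an effect code (1 for group 1, -1 for group 2); and x4 uses an arbitrary pair of values (326 for group 1, -11814 for group 2).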

    clear
    input y  grp x1 x2 x3 x4 onetwo
     1   1  1  0   1   326   1
     3   1  1  0   1   326   1
     2   1  1  0   1   326   1
     2   1  1  0   1   326   1
     2   1  1  0   1   326   1
     3   1  1  0   1   326   1
     4   1  1  0   1   326   1
     3   1  1  0   1   326   1
     5   2  0  1  -1 -11814  2
     6   2  0  1  -1 -11814  2
     4   2  0  1  -1 -11814  2
     5   2  0  1  -1 -11814  2
    10   2  0  1  -1 -11814  2
    10   2  0  1  -1 -11814  2
     9   2  0  1  -1 -11814  2
    11   2  0  1  -1 -11814  2
    end
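Every predictor in this data set is a relabeling of grp. The Python sketch below (an illustration, not part of the original Stata session) rebuilds x1 through x4 from grp and checks one property that explains the x2 (dropped) result for regress y x1 x2: the two dummies always add up to 1.

```python
# grp as entered above: eight cases in group 1, then eight in group 2
grp = [1] * 8 + [2] * 8

x1 = [1 if g == 1 else 0 for g in grp]          # dummy: 1 = group 1
x2 = [1 if g == 2 else 0 for g in grp]          # dummy: 1 = group 2
x3 = [1 if g == 1 else -1 for g in grp]         # effect code: 1 / -1
x4 = [326 if g == 1 else -11814 for g in grp]   # arbitrary pair of codes

# x1 + x2 equals 1 for every case, i.e. it duplicates the constant
print(all(a + b == 1 for a, b in zip(x1, x2)))  # True
```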
    
    regress y grp, beta
    
      Source |       SS       df       MS                  Number of obs =      16
    ---------+------------------------------               F(  1,    14) =   23.33
       Model |      100.00     1      100.00               Prob > F      =  0.0003
    Residual |       60.00    14  4.28571429               R-squared     =  0.6250
    ---------+------------------------------               Adj R-squared =  0.5982
       Total |      160.00    15  10.6666667               Root MSE      =  2.0702
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
         grp |          5   1.035098      4.830   0.000                   .7905694
       _cons |       -2.5   1.636634     -1.528   0.149                          .
    ------------------------------------------------------------------------------
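The Coef. and Beta columns can be reproduced with the textbook least-squares formulas for a single predictor. In this plain-Python sketch (`simple_ols` is a hypothetical helper name), Beta is the slope rescaled by sd(x)/sd(y). With one predictor it equals the correlation between x and y, so its square, .7905694² ≈ .625, is the R-squared.

```python
from math import sqrt

y = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
grp = [1] * 8 + [2] * 8


def simple_ols(x, y):
    """Slope, intercept and standardized slope (Beta) for one predictor."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    beta = slope * sqrt(sxx / syy)  # slope * sd(x) / sd(y)
    return slope, intercept, beta


b, a, beta = simple_ols(grp, y)
print(b, a, round(beta, 7))  # 5.0 -2.5 0.7905694
```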
    
    regress y x1, beta
    
      Source |       SS       df       MS                  Number of obs =      16
    ---------+------------------------------               F(  1,    14) =   23.33
       Model |      100.00     1      100.00               Prob > F      =  0.0003
    Residual |       60.00    14  4.28571429               R-squared     =  0.6250
    ---------+------------------------------               Adj R-squared =  0.5982
       Total |      160.00    15  10.6666667               Root MSE      =  2.0702
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
          x1 |         -5   1.035098     -4.830   0.000                  -.7905694
       _cons |        7.5   .7319251     10.247   0.000                          .
    ------------------------------------------------------------------------------
    
    regress y x2, beta
    
      Source |       SS       df       MS                  Number of obs =      16
    ---------+------------------------------               F(  1,    14) =   23.33
       Model |      100.00     1      100.00               Prob > F      =  0.0003
    Residual |       60.00    14  4.28571429               R-squared     =  0.6250
    ---------+------------------------------               Adj R-squared =  0.5982
       Total |      160.00    15  10.6666667               Root MSE      =  2.0702
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
          x2 |          5   1.035098      4.830   0.000                   .7905694
       _cons |        2.5   .7319251      3.416   0.004                          .
    ------------------------------------------------------------------------------
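With 0/1 dummy coding the constant is the mean of the group coded 0 and the slope is the difference between the two group means. That is why x1 and x2 give constants of 7.5 and 2.5 and slopes of opposite sign. A minimal Python check, using a small hypothetical helper `slope_intercept`:

```python
y = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
x1 = [1] * 8 + [0] * 8   # group 1 coded 1, group 2 is the reference
x2 = [0] * 8 + [1] * 8   # group 2 coded 1, group 1 is the reference


def slope_intercept(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, ybar - slope * xbar


print(slope_intercept(x1, y))  # (-5.0, 7.5)
print(slope_intercept(x2, y))  # (5.0, 2.5)
```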
    
    regress y x3, beta
    
      Source |       SS       df       MS                  Number of obs =      16
    ---------+------------------------------               F(  1,    14) =   23.33
       Model |      100.00     1      100.00               Prob > F      =  0.0003
    Residual |       60.00    14  4.28571429               R-squared     =  0.6250
    ---------+------------------------------               Adj R-squared =  0.5982
       Total |      160.00    15  10.6666667               Root MSE      =  2.0702
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
          x3 |       -2.5   .5175492     -4.830   0.000                  -.7905694
       _cons |          5   .5175492      9.661   0.000                          .
    ------------------------------------------------------------------------------
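With 1/-1 effect coding the constant is the mean of the two group means (5, which equals the grand mean here because the groups are the same size), and the slope is the distance of the group coded 1 from it: 2.5 - 5 = -2.5. Because the codes are two units apart, the slope is half the raw mean difference. A small Python sketch confirming this (variable names are our own):

```python
y = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
x3 = [1] * 8 + [-1] * 8   # effect code: group 1 = 1, group 2 = -1

n = len(y)
xbar, ybar = sum(x3) / n, sum(y) / n
sxy = sum((x - xbar) * (v - ybar) for x, v in zip(x3, y))
sxx = sum((x - xbar) ** 2 for x in x3)
slope = sxy / sxx
intercept = ybar - slope * xbar

mean_g1 = sum(y[:8]) / 8   # mean of the group coded 1
print(slope, intercept)    # -2.5 5.0
```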
    
    regress y x4, beta
    
      Source |       SS       df       MS                  Number of obs =      16
    ---------+------------------------------               F(  1,    14) =   23.33
       Model |      100.00     1      100.00               Prob > F      =  0.0003
    Residual |       60.00    14  4.28571429               R-squared     =  0.6250
    ---------+------------------------------               Adj R-squared =  0.5982
       Total |      160.00    15  10.6666667               Root MSE      =  2.0702
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
          x4 |  -.0004119   .0000853     -4.830   0.000                  -.7905694
       _cons |   2.634267   .7125415      3.697   0.002                          .
    ------------------------------------------------------------------------------
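The odd-looking x4 coefficient is the mean difference divided by the distance between the codes: -5 / (326 - (-11814)) = -5/12140 ≈ -.0004119. Rescaling the codes changes the coefficient but not the fit, so t, R-squared and Beta are the same for every coding. The Python sketch below (our own illustration, with a hypothetical `t_stat` helper) computes the slope and its t statistic for each coding used so far:

```python
from math import sqrt

y = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
codings = {
    "grp": [1] * 8 + [2] * 8,
    "x1": [1] * 8 + [0] * 8,
    "x2": [0] * 8 + [1] * 8,
    "x3": [1] * 8 + [-1] * 8,
    "x4": [326] * 8 + [-11814] * 8,
}


def t_stat(x, y):
    """Slope and its t statistic from a one-predictor OLS fit."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    slope = sxy / sxx
    mse = (syy - slope * sxy) / (n - 2)  # residual mean square
    return slope, slope / sqrt(mse / sxx)


for name, x in codings.items():
    slope, t = t_stat(x, y)
    print(f"{name:4s} slope = {slope:.7g}  t = {t:.3f}")
```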
    
    regress y x1 x2, beta
    
      Source |       SS       df       MS                  Number of obs =      16
    ---------+------------------------------               F(  1,    14) =   23.33
       Model |      100.00     1      100.00               Prob > F      =  0.0003
    Residual |       60.00    14  4.28571429               R-squared     =  0.6250
    ---------+------------------------------               Adj R-squared =  0.5982
       Total |      160.00    15  10.6666667               Root MSE      =  2.0702
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
          x1 |         -5   1.035098     -4.830   0.000                  -.7905694
          x2 |  (dropped)
       _cons |        7.5   .7319251     10.247   0.000                          .
    ------------------------------------------------------------------------------
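x2 is dropped because, with a constant in the model, x1 and x2 carry the same information: x2 = 1 - x1 for every case, so the columns (constant, x1, x2) are linearly dependent and X'X has no inverse. The plain-Python sketch below (illustration only) builds X'X for these three columns and shows that its determinant is zero.

```python
x1 = [1] * 8 + [0] * 8
x2 = [0] * 8 + [1] * 8
const = [1] * 16

columns = [const, x1, x2]
# X'X: inner products of every pair of design-matrix columns
xtx = [[sum(a * b for a, b in zip(u, v)) for v in columns] for u in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


print(xtx)        # [[16, 8, 8], [8, 8, 0], [8, 0, 8]]
print(det3(xtx))  # 0
```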
    

    Why not just use 1s and 2s? Why bother with 0/1 or 1/-1 coding?

    regress y onetwo
    
          Source |       SS       df       MS              Number of obs =      16
    -------------+------------------------------           F(  1,    14) =   23.33
           Model |         100     1         100           Prob > F      =  0.0003
        Residual |          60    14  4.28571429           R-squared     =  0.6250
    -------------+------------------------------           Adj R-squared =  0.5982
           Total |         160    15  10.6666667           Root MSE      =  2.0702
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          onetwo |          5   1.035098     4.83   0.000     2.779935    7.220065
           _cons |       -2.5   1.636634    -1.53   0.149    -6.010231    1.010231
    ------------------------------------------------------------------------------

    As you can see, the coefficient for the groups is the same as with dummy coding. However, the constant is less informative: it is the predicted mean for a group coded zero, a group that does not actually exist. In this respect, dummy coding is much more informative.
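Because the predictor takes only two values, the fitted line passes through the two group means, so the constant is that line extended back to a code of 0. A minimal sketch with a hypothetical helper `two_group_line`, applied to three of the codings above:

```python
def two_group_line(code_a, mean_a, code_b, mean_b):
    """Line through (code_a, mean_a) and (code_b, mean_b): (slope, intercept)."""
    slope = (mean_b - mean_a) / (code_b - code_a)
    intercept = mean_a - slope * code_a
    return slope, intercept


# Group 1 mean is 2.5, group 2 mean is 7.5
print(two_group_line(1, 2.5, 2, 7.5))   # onetwo 1/2: (5.0, -2.5)
print(two_group_line(0, 2.5, 1, 7.5))   # x2 0/1:     (5.0, 2.5)
print(two_group_line(1, 2.5, -1, 7.5))  # x3 1/-1:    (-2.5, 5.0)
```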

    Automatic Dummy Coding

    Stata 11 introduced factor variables, which create dummy variables automatically. By default, i.grp uses the lowest level (here group 1) as the reference group; the ib#. prefix, as in ib2.grp, makes level # the reference group instead.

    regress y i.grp
    
          Source |       SS       df       MS              Number of obs =      16
    -------------+------------------------------           F(  1,    14) =   23.33
           Model |         100     1         100           Prob > F      =  0.0003
        Residual |          60    14  4.28571429           R-squared     =  0.6250
    -------------+------------------------------           Adj R-squared =  0.5982
           Total |         160    15  10.6666667           Root MSE      =  2.0702
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           2.grp |          5   1.035098     4.83   0.000     2.779935    7.220065
           _cons |        2.5   .7319251     3.42   0.004     .9301769    4.069823
    ------------------------------------------------------------------------------
    
    /* changing the reference grp */
    
    regress y ib2.grp
    
          Source |       SS       df       MS              Number of obs =      16
    -------------+------------------------------           F(  1,    14) =   23.33
           Model |         100     1         100           Prob > F      =  0.0003
        Residual |          60    14  4.28571429           R-squared     =  0.6250
    -------------+------------------------------           Adj R-squared =  0.5982
           Total |         160    15  10.6666667           Root MSE      =  2.0702
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           1.grp |         -5   1.035098    -4.83   0.000    -7.220065   -2.779935
           _cons |        7.5   .7319251    10.25   0.000     5.930177    9.069823
    ------------------------------------------------------------------------------
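The idea behind i.grp and ib2.grp can be sketched outside Stata: pick a base level, create an indicator for the other level, and fit as usual. The helper below is hypothetical and only illustrates the coding, not how Stata implements factor variables; it reproduces the 2.grp and 1.grp coefficients and constants above.

```python
y = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
grp = [1] * 8 + [2] * 8


def fit_with_base(levels, y, base):
    """Indicator for the non-base level, then OLS: returns (coef, _cons)."""
    d = [0 if g == base else 1 for g in levels]
    n = len(y)
    dbar, ybar = sum(d) / n, sum(y) / n
    sdy = sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y))
    sdd = sum((di - dbar) ** 2 for di in d)
    coef = sdy / sdd
    return coef, ybar - coef * dbar


print(fit_with_base(grp, y, base=1))  # like i.grp:   (5.0, 2.5)
print(fit_with_base(grp, y, base=2))  # like ib2.grp: (-5.0, 7.5)
```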


    Linear Statistical Models Course

    Phil Ender, 17sep10, 11Feb99