Ed230B/C

Linear Statistical Models

Analysis of Covariance

Updated for Stata 11



  • Controlling for an unwanted nuisance variable -- statistical control.
  • An alternative to blocking for controlling extraneous sources of variability.
  • In ANOVA terminology, a covariate is a continuous independent variable.

    Linear Model
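
    The model formula itself is not reproduced on this page. As a sketch, a common textbook form of the one-factor, one-covariate ANCOVA model is given here. The notation is an assumption, not necessarily the course's own: Y_ij is the score of subject i in treatment level j, mu is the grand mean, alpha_j is the treatment effect, beta_w is the common within-group regression slope, X_ij is the covariate, and epsilon_ij is the error.

```latex
Y_{ij} = \mu + \alpha_j + \beta_w \left( X_{ij} - \bar{X}_{..} \right) + \varepsilon_{ij},
\qquad i = 1, \dots, n_j, \quad j = 1, \dots, p
```

    Centering the covariate at its grand mean is a convention; it makes mu + alpha_j the mean of group j at the average value of the covariate.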

    Hypotheses
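
    The hypotheses are not reproduced on this page either. A sketch of the usual ANCOVA hypotheses follows, assuming alpha_j denotes the effect of treatment level j after adjusting for the covariate and p is the number of treatment levels (both symbols are assumptions about notation).

```latex
H_0 : \alpha_1 = \alpha_2 = \cdots = \alpha_p = 0
\qquad \text{versus} \qquad
H_1 : \alpha_j \neq 0 \ \text{for at least one } j
```

    In words, the null hypothesis states that the population means adjusted for the covariate are all equal.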

    Assumptions

    1. Independence.
    2. Normality.
    3. Homogeneity of Variance.
    4. Population within-group regression coefficients are equal. Homogeneity of regression coefficients.

    5. Regression residuals are NID with mean 0 and equal variances.
    6. Relationship between the covariate and the dependent variable is linear.
    7. Covariate is measured without error.
    8. * Covariate is related to the dependent variable but is independent of the treatment.

    Selecting a Covariate

    1. One or more extraneous variables which affect the dependent variable but are irrelevant to the objectives of the experiment.
    2. Experimental control is not possible or not feasible.
    3. Covariate is independent of the categorical independent variable. It is either
      1. collected prior to the presentation of the treatments,
      2. collected after the treatments but before they take effect, or
      3. assumed not to be affected by the treatment.

    Schematic with Example Data

        a1          a2          a3          a4
     Y    X      Y    X      Y    X      Y    X
     3   42      4   47      7   61      7   65
     6   57      5   49      8   65      8   74
     3   33      4   42      7   64      9   80
     3   47      3   41      6   56      8   73
     1   32      2   38      5   52     10   85
     2   35      3   43      6   58     10   82
     2   33      4   48      5   53      9   78
     2   39      3   45      6   54     11   89

    ANCOVA Summary Table

      Source          SS        df   MS        F        Error Term
    1 Covariate       33.950     1   33.950    130.09   [3]
    2 A                1.793     3    0.598      2.29   [3]
    3 Error            7.047    27    0.261
      Adj Total        8.840    30
      Grand Total    235.500    31

    Compare with this ANOVA Summary Table

    Source    SS       df   MS        F
    A         194.5     3   64.833    44.28
    Error      41.0    28    1.464
    Total     235.5    31

    Table of the F-distribution

    Comparing ANCOVA with Randomized Block Designs

  • Inspect correlation between covariate and the dependent variable.
  • RB better when r < 0.4
  • ANCOVA and RB about equal when .4 < r < .6
  • ANCOVA better when r > .6
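
    As a rough illustration of this rule of thumb, the correlation for the example data can be recovered from the R-squared of the simple regression of y on x reported in the Stata Example on this page (0.9625). The scalar name r is chosen here for illustration only.

```stata
* r is the square root of R-squared from "regress y x";
* the sign is positive because y increases with x in these data
scalar r = sqrt(0.9625)
display "r = " %5.3f r

* apply the rule of thumb from the list above
if r > .6 display "r > .6: ANCOVA is expected to do better than a randomized block design"
```

    With the data in memory, `correlate y x` would give the same correlation directly.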

    Some Stata Tricks

    One Factor Design with One Covariate:
    anova y a                    analysis of variance
    anova y x a                  analysis of covariance
    anova y x a x*a              tests homogeneity of slopes
    Two Factor Design with One Covariate:
    anova y a b a*b              analysis of variance
    anova y x a b a*b            analysis of covariance
    anova y x a b a*b x*a*b      tests homogeneity of slopes
    One Factor Design with Two Covariates:
    anova y a                    analysis of variance
    anova y x z a                analysis of covariance
    anova y x a x*a              homogeneity of x slopes
    anova y z a z*a              homogeneity of z slopes
    Two Factor Design with Two Covariates:
    anova y a b a*b              analysis of variance
    anova y x z a b a*b          analysis of covariance
    anova y x a b a*b x*a*b      homogeneity of x slopes
    anova y z a b a*b z*a*b      homogeneity of z slopes
    Note: These commands use the syntax of Stata 10 and earlier, in which every covariate must also be named in the cont() option of the ancova, for example cont(x). In Stata 11, prefix each covariate with c. and write interactions with #, as in the Stata Example that follows.

    Stata Example

    input x y a x1 x2 x3
    42  3 1  1  1  1
    57  6 1  1  1  1
    33  3 1  1  1  1
    47  3 1  1  1  1
    32  1 1  1  1  1
    35  2 1  1  1  1
    33  2 1  1  1  1
    39  2 1  1  1  1
    47  4 2 -1  1  1
    49  5 2 -1  1  1
    42  4 2 -1  1  1
    41  3 2 -1  1  1
    38  2 2 -1  1  1
    43  3 2 -1  1  1
    48  4 2 -1  1  1
    45  3 2 -1  1  1
    61  7 3  0 -2  1
    65  8 3  0 -2  1
    64  7 3  0 -2  1
    56  6 3  0 -2  1
    52  5 3  0 -2  1
    58  6 3  0 -2  1
    53  5 3  0 -2  1
    54  6 3  0 -2  1
    65  7 4  0  0 -3
    74  8 4  0  0 -3
    80  9 4  0  0 -3
    73  8 4  0  0 -3
    85 10 4  0  0 -3
    82 10 4  0  0 -3
    78  9 4  0  0 -3
    89 11 4  0  0 -3
    end
    
    anova y a c.x
    
                               Number of obs =      32     R-squared     =  0.9701
                               Root MSE      = .510876     Adj R-squared =  0.9656
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  228.453154     4  57.1132885     218.83     0.0000
                             |
                           a |  1.79283521     3  .597611737       2.29     0.1010
                           x |  33.9531542     1  33.9531542     130.09     0.0000
                             |
                    Residual |  7.04684582    27   .26099429   
                  -----------+----------------------------------------------------
                       Total |       235.5    31  7.59677419 
    
    margins a, asbalanced
    
    Predictive margins                                Number of obs   =         32
    
    Expression   : Linear prediction, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               a |
              1  |   5.310127   .2881078    18.43   0.000     4.745446    5.874807
              2  |   5.325664   .2413402    22.07   0.000     4.852646    5.798682
              3  |   5.767353   .1855126    31.09   0.000     5.403755    6.130951
              4  |   5.096856   .3869503    13.17   0.000     4.338448    5.855265
    ------------------------------------------------------------------------------
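
    The margins above are the adjusted group means. As a sketch in conventional notation (the symbols are assumptions, not taken from the course), each adjusted mean is the observed group mean corrected for how far the group's covariate mean lies from the grand covariate mean, using the pooled within-group slope b_w. The group 1 figures below are computed from the example data; the value of b_w is back-solved from the margins output rather than reported by Stata here.

```latex
\bar{Y}_{j(\mathrm{adj})} = \bar{Y}_{j} - b_w \left( \bar{X}_{j} - \bar{X}_{..} \right)

% Group 1: \bar{Y}_1 = 2.75, \ \bar{X}_1 = 39.75, \ \bar{X}_{..} = 55, \ b_w \approx 0.1679
\bar{Y}_{1(\mathrm{adj})} \approx 2.75 - 0.1679 \, (39.75 - 55) \approx 5.31
```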
    
    /* pairwise comparisons using anovalator */
    /* these tests have not been adjusted for multiplicity */
    
    anovalator a, pair quietly
    
    anovalator pairwise comparisons for a  
    
    Comparison          Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    1 vs 2          -.0155375     .26343    -.059   0.953    -.5318595    .5007846
    1 vs 3           -.457227    .369347    -1.24   0.216    -1.181147    .2666942
    1 vs 4             .21327    .621578     .343   0.732    -1.005023    1.431564
    2 vs 3           -.441689    .325894    -1.36   0.175    -1.080441    .1970623
    2 vs 4            .228808    .563495     .406   0.685    -.8756423    1.333258
    3 vs 4            .670497    .393934      1.7   0.089    -.1016129    1.442607
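
    Because these tests are not adjusted for multiplicity, one simple option is a Bonferroni correction, which multiplies each p-value by the number of comparisons and caps the result at 1. This is a sketch of one possible adjustment, not a method the notes prescribe; the p-values are copied from the table above, and the names k and p34_adj are illustrative.

```stata
* number of pairwise comparisons among 4 groups: 4*3/2 = 6
local k = 6

* unadjusted p-values from the anovalator table, in the order listed
foreach p in 0.953 0.216 0.732 0.175 0.685 0.089 {
    display "unadjusted p = " %5.3f `p' "   Bonferroni p = " %5.3f min(1, `k'*`p')
}

* the smallest p-value (3 vs 4) after adjustment
scalar p34_adj = min(1, 6*0.089)
display "3 vs 4, Bonferroni p = " %5.3f p34_adj
```

    Bonferroni is conservative; other adjustments could be substituted without changing the unadjusted results.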
    
    
    
    /* test for homogeneity of regression slopes */
    anova y a c.x a#c.x
    
                               Number of obs =      32     R-squared     =  0.9719
                               Root MSE      = .525009     Adj R-squared =  0.9637
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  228.884782     7  32.6978259     118.63     0.0000
                             |
                           a |  .355072259     3   .11835742       0.43     0.7338
                           x |  25.8488494     1  25.8488494      93.78     0.0000
                         a#x |  .431627333     3  .143875778       0.52     0.6713
                             |
                    Residual |  6.61521849    24  .275634104   
                  -----------+----------------------------------------------------
                       Total |       235.5    31  7.59677419 
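
    The F-ratio for the a#x term can also be obtained by comparing the residual sums of squares of the two models, a sketch of which follows. The values are copied from the two anova tables above; the scalar names are illustrative.

```stata
* residual SS from "anova y a c.x" (27 df) and "anova y a c.x a#c.x" (24 df)
scalar ss_res_reduced = 7.04684582
scalar ss_res_full    = 6.61521849

* drop in residual SS per added slope parameter (3 df), over the full-model MS residual
scalar F_slopes = ((ss_res_reduced - ss_res_full)/3) / (ss_res_full/24)
display "F(3, 24) = " %5.2f F_slopes "   Prob > F = " %6.4f Ftail(3, 24, F_slopes)
```

    This should agree with the a#x row of the table (F = 0.52), so there is no evidence here against homogeneity of the regression slopes.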

    Stata Example Continued

    regress y x x1 x2 x3
    
      Source |       SS       df       MS                  Number of obs =      32
    ---------+------------------------------               F(  4,    27) =  218.83
       Model |  228.453154     4  57.1132885               Prob > F      =  0.0000
    Residual |  7.04684582    27   .26099429               R-squared     =  0.9701
    ---------+------------------------------               Adj R-squared =  0.9656
       Total |      235.50    31  7.59677419               Root MSE      =  .51088
       
    [remainder of output omitted]
    
    regress y x 
    
      Source |       SS       df       MS                  Number of obs =      32
    ---------+------------------------------               F(  1,    30) =  769.24
       Model |  226.660319     1  226.660319               Prob > F      =  0.0000
    Residual |  8.83968103    30  .294656034               R-squared     =  0.9625
    ---------+------------------------------               Adj R-squared =  0.9612
       Total |      235.50    31  7.59677419               Root MSE      =  .54282
    
    [remainder of output omitted]
    
    regress y x1 x2 x3
    
      Source |       SS       df       MS                  Number of obs =      32
    ---------+------------------------------               F(  3,    28) =   44.28
       Model |      194.50     3  64.8333333               Prob > F      =  0.0000
    Residual |       41.00    28  1.46428571               R-squared     =  0.8259
    ---------+------------------------------               Adj R-squared =  0.8072
       Total |      235.50    31  7.59677419               Root MSE      =  1.2101
    
    [remainder of output omitted]
    
    Regression Results Summarized
    
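    Model: M0 (regress y x x1 x2 x3)     R-square       0.9701
    Model: M1 (regress y x)              R-square       0.9625
    Model: M2 (regress y x1 x2 x3)       R-square       0.8259
    

    F-ratios Using Regression

    Covariate:  F = [(R2_M0 - R2_M2) / 1] / [(1 - R2_M0) / 27] = 130.09
    with 1 and 27 degrees of freedom

    Factor A:   F = [(R2_M0 - R2_M1) / 3] / [(1 - R2_M0) / 27] = 2.29
    with 3 and 27 degrees of freedom

    These values are computed from the unrounded R-square values and match the ANCOVA Summary Table; the four-decimal values listed above give slightly different results.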
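
    The two F-ratios can be reproduced from the sums of squares in the three regress outputs above, which avoids the rounding in the four-decimal R-square values. This is a sketch; the scalar names are illustrative and the numbers are copied from the output.

```stata
* model SS from the three regressions
scalar ss_full  = 228.453154
scalar ss_x     = 226.660319
scalar ss_group = 194.50
* (ss_full: y on x x1 x2 x3;  ss_x: y on x;  ss_group: y on x1 x2 x3)

* residual mean square of the full model (27 df)
scalar ms_res = 0.26099429

* covariate: full model versus the model without x
scalar F_cov = ((ss_full - ss_group)/1) / ms_res
* factor A: full model versus the model without x1 x2 x3
scalar F_a   = ((ss_full - ss_x)/3) / ms_res

display "Covariate: F(1, 27) = " %7.2f F_cov
display "Factor A:  F(3, 27) = " %7.2f F_a
```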

    Example with Two Covariates

    input id  y  c1  c2 grp
     1  6   1   6   1      
     2  9   1   7   1  
     3  8   2  15   1   
     4  8   3  13   1   
     5 12   3  18   1   
     6 12   4   9   1   
     7 10   4  16   1   
     8  8   5  10   1  
     9 12   5  16   1   
    10 13   6  18   1     
    11 13   4  12   2   
    12 16   4  12   2  
    13 15   5  17   2   
    14 16   6   9   2   
    15 19   6  20   2   
    16 17   8  18   2   
    17 19   8  16   2  
    18 23   9  20   2  
    19 19  10  10   2   
    20 22  10  17   2      
    21 20   7   8   3   
    22 22   7  14   3   
    23 24   9  11   3   
    24 26   9  11   3   
    25 24  10  16   3   
    26 25  11  20   3  
    27 28  11  19   3   
    28 27  12  19   3  
    29 29  13  12   3  
    30 26  13  16   3   
    31 27   7  16   4 
    32 28   8  10   4 
    33 25   8  13   4 
    34 27   9   7   4  
    35 31   9  15   4  
    36 29  10  20   4  
    37 32  10  16   4  
    38 30  12  21   4  
    39 32  12  15   4  
    40 33  14  21   4 
    end
    
    tabstat y c1 c2, by(grp) stat(n mean sd) col(stat)  
    
    Summary for variables: y c1 c2
         by categories of: grp 
    
         grp |         N      mean        sd
    ---------+------------------------------
           1 |        10       9.8  2.347576
             |        10       3.4  1.712698
             |        10      12.8  4.491968
    ---------+------------------------------
           2 |        10      17.9  3.107339
             |        10         7  2.309401
             |        10      15.1  4.040077
    ---------+------------------------------
           3 |        10      25.1  2.726414
             |        10      10.2   2.20101
             |        10      14.6  4.060651
    ---------+------------------------------
           4 |        10      29.4  2.633122
             |        10       9.9   2.18327
             |        10      15.4  4.599517
    ---------+------------------------------
       Total |        40     20.55  7.977372
             |        40     7.625  3.439495
             |        40    14.475  4.260658
    ----------------------------------------
    
    anova y grp  /* 0 covariates */
    
                               Number of obs =      40     R-squared     =  0.8929
                               Root MSE      = 2.71723     Adj R-squared =  0.8840
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |      2216.1     3       738.7     100.05     0.0000
                             |
                         grp |      2216.1     3       738.7     100.05     0.0000
                             |
                    Residual |       265.8    36  7.38333333   
                  -----------+----------------------------------------------------
                       Total |      2481.9    39  63.6384615   
    
    anova y grp c.c1  /* 1 covariate */
    
                               Number of obs =      40     R-squared     =  0.9594
                               Root MSE      = 1.69598     Adj R-squared =  0.9548
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  2381.22741     4  595.306852     206.97     0.0000
                             |
                         grp |  415.841199     3  138.613733      48.19     0.0000
                          c1 |  165.127408     1  165.127408      57.41     0.0000
                             |
                    Residual |  100.672592    35  2.87635976   
                  -----------+----------------------------------------------------
                       Total |      2481.9    39  63.6384615 
    
    anova y grp c.c1 c.c2  /* 2 covariates */
    
                               Number of obs =      40     R-squared     =  0.9624
                               Root MSE      = 1.65656     Adj R-squared =  0.9569
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  2388.59757     5  477.719513     174.08     0.0000
                             |
                         grp |  420.189396     3  140.063132      51.04     0.0000
                          c1 |   98.974038     1   98.974038      36.07     0.0000
                          c2 |  7.37015734     1  7.37015734       2.69     0.1105
                             |
                    Residual |  93.3024343    34  2.74418925   
                  -----------+----------------------------------------------------
                       Total |      2481.9    39  63.6384615 
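
    Whether the second covariate is worth keeping can be judged by how much it lowers the residual sum of squares. The sketch below recomputes the F-ratio for c2 from the one-covariate and two-covariate tables above; the scalar names are illustrative.

```stata
* residual SS with c1 only (35 df) and with c1 and c2 (34 df)
scalar ss_res_1cov = 100.672592
scalar ss_res_2cov = 93.3024343

* reduction in residual SS from adding c2 (1 df), over the two-covariate MS residual
scalar F_c2 = ((ss_res_1cov - ss_res_2cov)/1) / (ss_res_2cov/34)
display "F(1, 34) = " %5.2f F_c2 "   Prob > F = " %6.4f Ftail(1, 34, F_c2)
```

    The result should match the c2 row (F = 2.69): once c1 is in the model, c2 removes little additional error variance.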
    
    margins grp, asbalanced
    
    
    Predictive margins                                Number of obs   =         40
    
    Expression   : Linear prediction, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             grp |
              1  |   13.78338   .7820854    17.62   0.000     12.25052    15.31624
              2  |   18.38456    .537869    34.18   0.000     17.33035    19.43876
              3  |   22.77973    .646886    35.21   0.000     21.51186     24.0476
              4  |   27.25234   .6098128    44.69   0.000     26.05713    28.44755
    ------------------------------------------------------------------------------
    
    
      
    /* pairwise comparisons using anovalator */
    /* these tests have not been adjusted for multiplicity */
    
    anovalator grp, pair quietly
    
    anovalator pairwise comparisons for grp  
    
    Comparison          Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    1 vs 2           -4.60118     .88207    -5.22   0.000    -6.330038   -2.872323
    1 vs 3           -8.99635    1.21035    -7.43   0.000    -11.36863   -6.624072
    1 vs 4            -13.469    1.16021    -11.6   0.000    -15.74298   -11.19494
    2 vs 3           -4.39517    .891386    -4.93   0.000    -6.142288   -2.648057
    2 vs 4           -8.86778    .852674    -10.4   0.000    -10.53902   -7.196539
    3 vs 4           -4.47261    .746185    -5.99   0.000     -5.93513   -3.010083
    
    anova y grp c.c1 c.c1#grp  /* check homogeneity of regression for c1 */
    
                               Number of obs =      40     R-squared     =  0.9598
                               Root MSE      = 1.76482     Adj R-squared =  0.9511
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  2382.23359     7  340.319084     109.27     0.0000
                             |
                         grp |  70.1635717     3  23.3878572       7.51     0.0006
                          c1 |  152.279387     1  152.279387      48.89     0.0000
                      grp#c1 |  1.00618243     3  .335394144       0.11     0.9550
                             |
                    Residual |  99.6664092    32  3.11457529   
                  -----------+----------------------------------------------------
                       Total |      2481.9    39  63.6384615    
    
    anova y grp c.c2 c.c2#grp  /* check homogeneity of regression for c2 */
    
                               Number of obs =      40     R-squared     =  0.9228
                               Root MSE      = 2.44624     Adj R-squared =  0.9060
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  2290.40886     7  327.201265      54.68     0.0000
                             |
                         grp |  182.057287     3  60.6857623      10.14     0.0001
                          c2 |  73.6130056     1  73.6130056      12.30     0.0014
                      grp#c2 |  .785330753     3  .261776918       0.04     0.9876
                             |
                    Residual |  191.491142    32  5.98409817   
                  -----------+----------------------------------------------------
                       Total |      2481.9    39  63.6384615 


    Linear Statistical Models Course

    Phil Ender, 17sep10, 13may06, 11apr06, 25May00