Linear Statistical Models

Completely Randomized MANOVA Design

Updated for Stata 11


It's a multivariate world after all...

There are many situations in which you will have access to more than one outcome variable. In those situations you have three options:

The MANOVA Linear Model

y1 y2 y3 = constant + grp + error
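In matrix form, the model line above can be sketched as follows. The symbols (Y, X, B, E, Sigma, n, p, q) are notation assumed here for illustration and do not appear in the original; this is one common parameterization, not the only one.

```latex
% Multivariate linear model: p = 3 outcomes (y1, y2, y3) measured on n cases
\[
  \underset{n \times p}{\mathbf{Y}}
  \;=\;
  \underset{n \times q}{\mathbf{X}}\;
  \underset{q \times p}{\mathbf{B}}
  \;+\;
  \underset{n \times p}{\mathbf{E}}
\]
% X holds the constant and the indicator columns for grp;
% B has one column of coefficients per outcome variable;
% each row of E is an error vector with covariance matrix \Sigma (p x p).
```

With a single outcome (p = 1), Y, B, and E reduce to column vectors and the expression becomes the familiar univariate linear model.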

What is different in this model from the univariate model? What elements are the same?

Hypotheses

  • Treatment Effect
    H0: the population centroids are equal for all groups
    H1: the population centroids are not equal for at least two groups
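Written in symbols, the treatment-effect hypotheses might look like the sketch below. The notation (mu_j for the centroid of group j, k for the number of groups) is assumed here and is not part of the original.

```latex
% Centroid of group j: the vector of population means on y1, y2, y3
\[
  \boldsymbol{\mu}_j = (\mu_{j1},\ \mu_{j2},\ \mu_{j3})', \qquad j = 1, \dots, k
\]
\begin{align*}
  H_0 &: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_k \\
  H_1 &: \boldsymbol{\mu}_j \neq \boldsymbol{\mu}_{j'} \quad \text{for at least one pair } j \neq j'
\end{align*}
```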

    Assumptions

    1. The sets of observations are independent of one another.
    2. The variables within each group come from multivariate normal populations.
    3. The variance-covariance matrices for each group are equal in the population.
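Assumptions 2 and 3 can be checked informally with the mvtest command, which was added in Stata 11. The sketch below assumes the example data from the Stata Computer Example section (y1, y2, y3, grp with three groups) are already in memory. With only 11 cases per group these tests have little power, so read them as rough diagnostics rather than proof that the assumptions hold.

```stata
* Assumption 2: multivariate normality, checked separately within each group
forvalues g = 1/3 {
    display "multivariate normality tests for grp `g'"
    mvtest normality y1 y2 y3 if grp == `g', stats(all)
}

* Assumption 3: equality of the covariance matrices across groups (Box's M)
mvtest covariances y1 y2 y3, by(grp)
```

Assumption 1 (independence) is a matter of design and cannot be verified by a test on these variables.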

    Schematic with Example Data

    Level
    Group 1          Group 2          Group 3
    y11 y21 y31      y11 y21 y31      y11 y21 y31
    ... ... ...      ... ... ...      ... ... ...
    y1n y2n y3n      y1n y2n y3n      y1n y2n y3n

    Stata Computer Example

    input y1 y2 y3 grp
    19.6  5.15  9.5 1
    15.4  5.75  9.1 1
    22.3  4.35  3.3 1
    24.3  7.55  5.0 1
    22.5  8.50  6.0 1
    20.5 10.25  5.0 1
    14.1  5.95 18.8 1
    13.0  6.30 16.5 1
    14.1  5.45  8.9 1
    16.7  3.75  6.0 1
    16.8  5.10  7.4 1
    17.1  9.00  7.5 2
    15.7  5.30  8.5 2
    14.9  9.85  6.0 2
    19.7  3.60  2.9 2
    17.2  4.05  0.2 2
    16.0  4.40  2.6 2
    12.8  7.15  7.0 2
    13.6  7.25  3.2 2
    14.2  5.30  6.2 2
    13.1  3.10  5.5 2
    16.5  2.40  6.6 2
    16.0  4.55  2.9 3
    12.5  2.65  0.7 3
    18.5  6.50  5.3 3
    19.2  4.85  8.3 3
    12.0  8.75  9.0 3
    13.0  5.20 10.3 3
    11.9  4.75  8.5 3
    12.0  5.85  9.5 3
    19.8  2.85  2.3 3
    16.5  6.55  3.3 3
    17.4  6.60  1.9 3
    end
     
    sort grp
     
    by grp: summarize y1 y2 y3
    
    -> grp=        1  
    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
          y1 |      11    18.11818   3.903797         13       24.3  
          y2 |      11    6.190909   1.899713       3.75      10.25  
          y3 |      11    8.681818   4.863089        3.3       18.8  
    
    -> grp=        2  
    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
          y1 |      11    15.52727   2.075616       12.8       19.7  
          y2 |      11    5.581818   2.434263        2.4       9.85  
          y3 |      11    5.109091   2.531187         .2        8.5  
    
    -> grp=        3  
    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
          y1 |      11    15.34545   3.138268       11.9       19.8  
          y2 |      11    5.372727   1.759029       2.65       8.75  
          y3 |      11    5.636364   3.546907         .7       10.3 
    
     
    manova y1 y2 y3 = grp
    
                               Number of obs =      33
    
                               W = Wilks' lambda      L = Lawley-Hotelling trace
                               P = Pillai's trace     R = Roy's largest root
    
                      Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
                  -----------+--------------------------------------------------
                         grp | W   0.5258      2     6.0    56.0     3.54 0.0049 e
                             | P   0.4767            6.0    58.0     3.02 0.0122 a
                             | L   0.8972            6.0    54.0     4.04 0.0021 a
                             | R   0.8920            3.0    29.0     8.62 0.0003 u
                             |--------------------------------------------------
                    Residual |                30
                  -----------+--------------------------------------------------
                       Total |                32
                  --------------------------------------------------------------
                               e = exact, a = approximate, u = upper bound on F
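The four statistics in the table above are all functions of the eigenvalues of E^{-1}H, where H and E are the hypothesis and error sums-of-squares-and-cross-products matrices. As a worked check, the sketch below backs the two eigenvalues out of the rounded values of R and L in the table rather than taking them from Stata, so the last digit is approximate. The symbols lambda_1, lambda_2, and s are notation assumed here.

```latex
% s = min(p, k - 1) = min(3, 2) = 2 nonzero eigenvalues
\begin{align*}
  R &= \lambda_1 = 0.8920 \\
  L &= \lambda_1 + \lambda_2 = 0.8972
      \;\Rightarrow\; \lambda_2 \approx 0.0052 \\
  W &= \prod_{i=1}^{s} \frac{1}{1+\lambda_i}
     = \frac{1}{(1.8920)(1.0052)} \approx 0.5258 \\
  P &= \sum_{i=1}^{s} \frac{\lambda_i}{1+\lambda_i}
     = \frac{0.8920}{1.8920} + \frac{0.0052}{1.0052} \approx 0.477
\end{align*}
```

Both values agree with the reported W = 0.5258 and P = 0.4767 up to rounding. Because the first eigenvalue dominates, nearly all of the group separation lies along a single dimension.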
     
    forvalues i=1/3 {
      display
      display "anova for y`i'"
      display
      anova y`i' grp
    }
    
    anova for y1
    
    
                               Number of obs =      33     R-squared     =  0.1526
                               Root MSE      = 3.13031     Adj R-squared =  0.0961
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  52.9242378     2  26.4621189       2.70     0.0835
                             |
                         grp |  52.9242378     2  26.4621189       2.70     0.0835
                             |
                    Residual |  293.965442    30  9.79884808   
                  -----------+----------------------------------------------------
                       Total |   346.88968    32  10.8403025   
    
    anova for y2
    
    
                               Number of obs =      33     R-squared     =  0.0305
                               Root MSE      = 2.05173     Adj R-squared = -0.0341
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  3.97515121     2   1.9875756       0.47     0.6282
                             |
                         grp |  3.97515121     2   1.9875756       0.47     0.6282
                             |
                    Residual |  126.287277    30  4.20957589   
                  -----------+----------------------------------------------------
                       Total |  130.262428    32  4.07070087   
    
    anova for y3
    
    
                               Number of obs =      33     R-squared     =  0.1610
                               Root MSE      = 3.76993     Adj R-squared =  0.1051
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  81.8296936     2  40.9148468       2.88     0.0718
                             |
                         grp |  81.8296936     2  40.9148468       2.88     0.0718
                             |
                    Residual |  426.370896    30  14.2123632   
                  -----------+----------------------------------------------------
                       Total |   508.20059    32  15.8812684

    Interestingly, the multivariate test was significant, but none of the univariate F-ratios reached significance at the .05 level. Simultaneous confidence intervals are a better tool for examining the multivariate effects.

    simulci y1 y2 y3, by(grp) cv(.31)
    
    s=2  m=0  n=13 cv= .31
    
    group variable:   grp
    
                                        pairwise simultaneous
    comparison           difference      confidence intervals
    dv: y1
    grp 1 vs grp 2        2.591*        0.292         4.889
    grp 1 vs grp 3        2.773*        0.474         5.071
    grp 2 vs grp 3        0.182        -2.117         2.480
    
    dv: y2
    grp 1 vs grp 2        0.609        -0.897         2.116
    grp 1 vs grp 3        0.818        -0.688         2.325
    grp 2 vs grp 3        0.209        -1.297         1.716
    
    dv: y3
    grp 1 vs grp 2        3.573*        0.805         6.341
    grp 1 vs grp 3        3.045*        0.277         5.814
    grp 2 vs grp 3       -0.527        -3.295         2.241
    

    These results show that group 1 differs significantly from both group 2 and group 3 on y1 and y3. Groups 2 and 3 do not differ significantly on any variable, and none of the pairwise differences on y2 is significant.

    Example Using HSB2

    use http://www.philender.com/courses/data/hsb2, clear
    
    manova read write math science = prog
    
                               Number of obs =     200
    
                               W = Wilks' lambda      L = Lawley-Hotelling trace
                               P = Pillai's trace     R = Roy's largest root
    
                      Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
                  -----------+--------------------------------------------------
                        prog | W   0.6942      2     8.0   388.0     9.71 0.0000 e
                             | P   0.3134            8.0   390.0     9.06 0.0000 a
                             | L   0.4296            8.0   386.0    10.36 0.0000 a
                             | R   0.4023            4.0   195.0    19.61 0.0000 u
                             |--------------------------------------------------
                    Residual |               197
                  -----------+--------------------------------------------------
                       Total |               199
                  --------------------------------------------------------------
                               e = exact, a = approximate, u = upper bound on F
     
    simulci read write math science, by(prog) cv(.075)
    
    s=2  m=.5  n=96 cv= .075
    
    group variable:   prog
    
                                        pairwise simultaneous
    comparison           difference      confidence intervals
    dv: read
    prog 1 vs prog 2       -6.406       -13.876         1.063
    prog 1 vs prog 3        3.556        -3.914        11.025
    prog 2 vs prog 3        9.962*        2.492        17.431
    
    dv: write
    prog 1 vs prog 2       -4.924       -11.829         1.982
    prog 1 vs prog 3        4.573        -2.332        11.479
    prog 2 vs prog 3        9.497*        2.592        16.403
    
    dv: math
    prog 1 vs prog 2       -6.711*      -13.319        -0.103
    prog 1 vs prog 3        3.602        -3.006        10.210
    prog 2 vs prog 3       10.313*        3.705        16.921
    
    dv: science
    prog 1 vs prog 2       -1.356        -9.000         6.289
    prog 1 vs prog 3        5.224        -2.420        12.869
    prog 2 vs prog 3        6.580        -1.065        14.225
    
    Factorial MANOVA Example

    use http://www.philender.com/courses/data/hsb2, clear
    
    manova read math science = female prog female#prog
    
                               Number of obs =     200
    
                               W = Wilks' lambda      L = Lawley-Hotelling trace
                               P = Pillai's trace     R = Roy's largest root
    
                      Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
                 ------------+--------------------------------------------------
                       Model | W   0.6719      5    15.0   530.4     5.48 0.0000 a
                             | P   0.3516           15.0   582.0     5.15 0.0000 a
                             | L   0.4541           15.0   572.0     5.77 0.0000 a
                             | R   0.3665            5.0   194.0    14.22 0.0000 u
                             |--------------------------------------------------
                    Residual |               194
                 ------------+--------------------------------------------------
                      female | W   0.9823      1     3.0   192.0     1.15 0.3283 e
                             | P   0.0177            3.0   192.0     1.15 0.3283 e
                             | L   0.0180            3.0   192.0     1.15 0.3283 e
                             | R   0.0180            3.0   192.0     1.15 0.3283 e
                             |--------------------------------------------------
                        prog | W   0.7177      2     6.0   384.0    11.55 0.0000 e
                             | P   0.2892            6.0   386.0    10.87 0.0000 a
                             | L   0.3839            6.0   382.0    12.22 0.0000 a
                             | R   0.3573            3.0   193.0    22.99 0.0000 u
                             |--------------------------------------------------
                 female#prog | W   0.9586      2     6.0   384.0     1.37 0.2273 e
                             | P   0.0416            6.0   386.0     1.37 0.2268 a
                             | L   0.0429            6.0   382.0     1.36 0.2278 a
                             | R   0.0353            3.0   193.0     2.27 0.0819 u
                             |--------------------------------------------------
                    Residual |               194
                 ------------+--------------------------------------------------
                       Total |               199
                 ---------------------------------------------------------------
                               e = exact, a = approximate, u = upper bound on F
    Of the three effects tested (female, prog, and female#prog), only the multivariate test of the prog main effect was statistically significant.
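To re-test a single term without refitting, the manovatest postestimation command can be used. The sketch below assumes the factorial manova above was the most recent estimation command; with the default residual error term, the result should match the female#prog rows of the table.

```stata
* assumes: manova read math science = female prog female#prog  was just run
* multivariate tests for the interaction term only
manovatest female#prog

* list the columns of the design matrix (useful when building custom tests)
manovatest, showorder
```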


    Linear Statistical Models Course

    Phil Ender, 17sep00, 26apr00