Ed230B/C

Linear Statistical Models

Unbalanced Designs

Updated for Stata 11


A Very Small Example

Level   b1  b2
a1      3, 6, 4, 5
a2      1, 2, 2

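Check the orthogonality of this very small example. Because the cell sizes are unequal, our usual algorithm for orthogonal coding does not yield a pairwise orthogonal system for the A main effect, the B main effect, and the A*B interaction. This implies that the various sums of squares cannot be estimated independently: the sum of squares attributed to an effect depends on which other effects have already been taken into account.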

Estimation of Sums of Squares: Two Factor Design

anova y a b a#b

Type 1        Type 2         Type 3
SS(A)         SS(A|B)        SS(A|B, A*B)
SS(B|A)       SS(B|A)        SS(B|A, A*B)
SS(A*B|A, B)  SS(A*B|A, B)   SS(A*B|A, B)

Estimation of Sums of Squares: Three Factor Design

anova y a b c a#b a#c b#c a#b#c

Type 1                      Type 2                      Type 3  
SS(A)                       SS(A|B,C)                   SS(A|B,C,A*B,A*C,B*C,A*B*C)
SS(B|A)                     SS(B|A,C)                   SS(B|A,C,A*B,A*C,B*C,A*B*C)
SS(C|A,B)                   SS(C|A,B)                   SS(C|A,B,A*B,A*C,B*C,A*B*C)
SS(A*B|A,B,C)               SS(A*B|A,B,C,A*C,B*C)       SS(A*B|A,B,C,A*C,B*C,A*B*C)
SS(A*C|A,B,C,A*B)           SS(A*C|A,B,C,A*B,B*C)       SS(A*C|A,B,C,A*B,B*C,A*B*C)
SS(B*C|A,B,C,A*B,A*C)       SS(B*C|A,B,C,A*B,A*C)       SS(B*C|A,B,C,A*B,A*C,A*B*C)
SS(A*B*C|A,B,C,A*B,A*C,B*C) SS(A*B*C|A,B,C,A*B,A*C,B*C) SS(A*B*C|A,B,C,A*B,A*C,B*C)

Schematic with Example Data

Level   b1           b2           b3           b4
a1      3 6 3        4 5 4 3 3    7 8 7 6      7 8 9 8
a2      1 2 2 2      2 3 4 3      5 6 5 6      10 10 9 11

Three ANOVA Summary Tables

Type 1            SS  df      MS      F
A              3.125   1   3.125   4.04
B|A          193.931   3  64.644  83.64
A*B|A, B      19.894   3   6.631   8.58
Error         18.550  24   0.773
Total        235.500  31

Type 2            SS  df      MS      F
A|B            2.707   1   2.707   3.50
B|A          193.931   3  64.644  83.64
A*B|A, B      19.894   3   6.631   8.58
Error         18.550  24   0.773
Total        235.500  31

Type 3            SS  df      MS      F
A|B, A*B       3.198   1   3.198   4.14
B|A, A*B     188.726   3  62.909  81.39
A*B|A, B      19.894   3   6.631   8.58
Error         18.550  24   0.773
Total        235.500  31

Using Stata

input y a b x1 x2 x3 x4
 3 1 1   1  1  1  1 
 6 1 1   1  1  1  1 
 3 1 1   1  1  1  1
 1 2 1  -1  1  1  1
 2 2 1  -1  1  1  1
 2 2 1  -1  1  1  1
 2 2 1  -1  1  1  1 
 4 1 2   1 -1  1  1 
 5 1 2   1 -1  1  1   
 4 1 2   1 -1  1  1 
 3 1 2   1 -1  1  1 
 3 1 2   1 -1  1  1
 2 2 2  -1 -1  1  1 
 3 2 2  -1 -1  1  1
 4 2 2  -1 -1  1  1
 3 2 2  -1 -1  1  1
 7 1 3   1  0 -2  1 
 8 1 3   1  0 -2  1
 7 1 3   1  0 -2  1  
 6 1 3   1  0 -2  1  
 5 2 3  -1  0 -2  1
 6 2 3  -1  0 -2  1
 5 2 3  -1  0 -2  1 
 6 2 3  -1  0 -2  1
 7 1 4   1  0  0 -3
 8 1 4   1  0  0 -3  
 9 1 4   1  0  0 -3 
 8 1 4   1  0  0 -3 
10 2 4  -1  0  0 -3
10 2 4  -1  0  0 -3
 9 2 4  -1  0  0 -3
11 2 4  -1  0  0 -3
end
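
Before fitting any model, it can help to confirm that the design really is unbalanced. The lines below are a minimal sketch that assumes the input block above has just been run, so that y, a and b are in memory. Counting the rows of the input block by hand, the a1 row should show 3, 5, 4 and 4 observations and the a2 row 4 observations in every cell.

```stata
* Cell frequencies for the a x b layout; unequal counts mean an unbalanced design
tabulate a b

* Cell means, standard deviations and frequencies of y
tabulate a b, summarize(y)
```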

/* Type 1 SS; order a b a#b */
anova y a b a#b, sequential

                           Number of obs =      32     R-squared     =  0.9212
                           Root MSE      = .879157     Adj R-squared =  0.8983

                  Source |    Seq. SS     df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |      216.95     7  30.9928571      40.10     0.0000
                         |
                       a |       3.125     1       3.125       4.04     0.0557
                       b |     193.931     3  64.6436667      83.64     0.0000
                     a#b |      19.894     3  6.63133333       8.58     0.0005
                         |
                Residual |       18.55    24  .772916667   
              -----------+----------------------------------------------------
                   Total |       235.5    31  7.59677419  

/* Type 1 SS; order b a a#b */
anova y b a a#b, sequential

                           Number of obs =      32     R-squared     =  0.9212
                           Root MSE      = .879157     Adj R-squared =  0.8983

                  Source |    Seq. SS     df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |      216.95     7  30.9928571      40.10     0.0000
                         |
                       b |  194.349206     3  64.7830688      83.82     0.0000
                       a |  2.70679365     1  2.70679365       3.50     0.0735
                     a#b |      19.894     3  6.63133333       8.58     0.0005
                         |
                Residual |       18.55    24  .772916667   
              -----------+----------------------------------------------------
                   Total |       235.5    31  7.59677419 

/* Type 2 SS; constructed from the two sequential analyses above: a from the b-a-a#b run, b from the a-b-a#b run */

                     Number of obs =      32     R-squared     =  0.9212
                     Root MSE      = .879157     Adj R-squared =  0.8983

            Source |    Seq. SS     df       MS           F     Prob > F
        -----------+----------------------------------------------------
             Model |      216.95     7  30.9928571      40.10     0.0000
                   |
                 a |  2.70679365     1  2.70679365       3.50     0.0735
                 b |     193.931     3  64.6436667      83.64     0.0000
               a*b |      19.894     3  6.63133333       8.58     0.0005
                   |
          Residual |       18.55    24  .772916667   
        -----------+----------------------------------------------------
             Total |      235.50    31  7.59677419

/* Type 3 SS */
anova y a b a#b

                           Number of obs =      32     R-squared     =  0.9212
                           Root MSE      = .879157     Adj R-squared =  0.8983

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |      216.95     7  30.9928571      40.10     0.0000
                         |
                       a |  3.19795082     1  3.19795082       4.14     0.0531
                       b |     188.726     3  62.9086667      81.39     0.0000
                     a#b |      19.894     3  6.63133333       8.58     0.0005
                         |
                Residual |       18.55    24  .772916667   
              -----------+----------------------------------------------------
                   Total |       235.5    31  7.59677419 
  
generate x5=x1*x2
generate x6=x1*x3
generate x7=x1*x4
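
The orthogonality check suggested for the very small example can be run on this data set as well. What follows is only a sketch: it assumes x1 through x7 exist as created above, and the matrix name XX is an arbitrary choice. If the coding vectors were pairwise orthogonal, every off-diagonal entry of the cross-product matrix would be zero. Working the x1 by x2 entry by hand gives 3 - 4 - 5 + 4 = -2, so at least that pair is not orthogonal.

```stata
* Cross-product matrix X'X of the seven coding vectors plus the constant.
* The _cons row holds the column sums; off-diagonal zeros would indicate
* pairwise orthogonal vectors.
matrix accum XX = x1 x2 x3 x4 x5 x6 x7
matrix list XX

* The same question on a correlation scale. Correlations are computed after
* centering, so they can differ from the raw cross-products whenever a
* vector does not sum to zero.
correlate x1 x2 x3 x4 x5 x6 x7
```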

regress y x1 x2 x3 x4 x5 x6 x7  /* Model: M0 */

  Source |       SS       df       MS               Number of obs =      32
---------+------------------------------            F(  7,    24) =   40.10
   Model |      216.95     7  30.9928571            Prob > F      =  0.0000
Residual |       18.55    24  .772916667            R-squared     =  0.9212
---------+------------------------------            Adj R-squared =  0.8983
   Total |      235.50    31  7.59677419            Root MSE      =  .87916

[remainder of output omitted]
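
The full model also gives a shortcut to the Type 3 tests. As a hedged sketch, the postestimation command test can be applied to each block of coding vectors after M0. Each test compares the full model with the model that omits that block, so the three F-ratios should reproduce the partial SS table from anova shown earlier (4.14, 81.39 and 8.58, each with 24 denominator degrees of freedom).

```stata
* Refit the full model (M0) without output, then test each block of vectors
quietly regress y x1 x2 x3 x4 x5 x6 x7

* A given B and A*B
test x1

* B given A and A*B
test x2 x3 x4

* A*B given A and B
test x5 x6 x7
```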

regress y x1  /* Model: M1 */

  Source |       SS       df       MS               Number of obs =      32
---------+------------------------------            F(  1,    30) =    0.40
   Model |       3.125     1       3.125            Prob > F      =  0.5301
Residual |     232.375    30  7.74583333            R-squared     =  0.0133
---------+------------------------------            Adj R-squared = -0.0196
   Total |      235.50    31  7.59677419            Root MSE      =  2.7831

[remainder of output omitted]

regress y x2 x3 x4    /* Model: M2 */

  Source |       SS       df       MS               Number of obs =      32
---------+------------------------------            F(  3,    28) =   44.08
   Model |  194.349206     3  64.7830688            Prob > F      =  0.0000
Residual |  41.1507937    28   1.4696712            R-squared     =  0.8253
---------+------------------------------            Adj R-squared =  0.8065
   Total |      235.50    31  7.59677419            Root MSE      =  1.2123

[remainder of output omitted]

regress y x5 x6 x7    /* Model: M3 */

  Source |       SS       df       MS               Number of obs =      32
---------+------------------------------            F(  3,    28) =    1.07
   Model |  24.2569444     3  8.08564815            Prob > F      =  0.3770
Residual |  211.243056    28  7.54439484            R-squared     =  0.1030
---------+------------------------------            Adj R-squared =  0.0069
   Total |      235.50    31  7.59677419            Root MSE      =  2.7467

[remainder of output omitted]

regress y x1 x2 x3 x4    /* Model: M4 */

  Source |       SS       df       MS               Number of obs =      32
---------+------------------------------            F(  4,    27) =   34.60
   Model |     197.056     4      49.264            Prob > F      =  0.0000
Residual |      38.444    27  1.42385185            R-squared     =  0.8368
---------+------------------------------            Adj R-squared =  0.8126
   Total |      235.50    31  7.59677419            Root MSE      =  1.1933

[remainder of output omitted]

regress y x1 x5 x6 x7    /* Model: M5 */

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  4,    27) =    0.92
       Model |      28.224     4       7.056           Prob > F      =  0.4672
    Residual |     207.276    27  7.67688889           R-squared     =  0.1198
-------------+------------------------------           Adj R-squared = -0.0105
       Total |      235.50    31  7.59677419           Root MSE      =  2.7707

[remainder of output omitted]

regress y x2 x3 x4 x5 x6 x7    /* Model: M6 */

  Source |       SS       df       MS               Number of obs =      32
---------+------------------------------            F(  6,    25) =   40.95
   Model |  213.752049     6  35.6253415            Prob > F      =  0.0000
Residual |  21.7479508    25  .869918033            R-squared     =  0.9077
---------+------------------------------            Adj R-squared =  0.8855
   Total |      235.50    31  7.59677419            Root MSE      =  .93269

[remainder of output omitted]

Sums of Squares Summary

Model: M0     SSA:B:A*B      216.950  (Full Model)
Model: M1     SSA              3.125  (A Main Effect)
Model: M2     SSB            194.349  (B Main Effect)
Model: M3     SSA*B           24.257  (A*B Interaction)
Model: M4     SSA:B          197.056  (A, B Main Effects)
Model: M5     SSA:A*B         28.224  (A, A*B)
Model: M6     SSB:A*B        213.752  (B, A*B)

Computing the Three Types of Sums of Squares

Type 1 Sums of Squares (sequential)
SSA                           =   3.125   (from M1) [SSA]
SSB|A     = 197.056 -   3.125 = 193.931   (M4 - M1) [SSA:B     - SSA]
SSA*B|A,B = 216.950 - 197.056 =  19.894   (M0 - M4) [SSA:B:A*B - SSA:B]

Type 2 Sums of Squares
SSA|B     = 197.056 - 194.349 =   2.707   (M4 - M2) [SSA:B     - SSB]
SSB|A     = 197.056 -   3.125 = 193.931   (M4 - M1) [SSA:B     - SSA]
SSA*B|A,B = 216.950 - 197.056 =  19.894   (M0 - M4) [SSA:B:A*B - SSA:B]

Type 3 Sums of Squares
SSA|B,A*B = 216.950 - 213.752 =   3.198   (M0 - M6) [SSA:B:A*B - SSB:A*B]
SSB|A,A*B = 216.950 -  28.224 = 188.726   (M0 - M5) [SSA:B:A*B - SSA:A*B]
SSA*B|A,B = 216.950 - 197.056 =  19.894   (M0 - M4) [SSA:B:A*B - SSA:B]

Using Stata: Continued

  m0: regress y x1 x2 x3 x4 x5 x6 x7
  m1: regress y x1
  m2: regress y x2 x3 x4
  m3: regress y x5 x6 x7
  m4: regress y x1 x2 x3 x4
  m5: regress y x1 x5 x6 x7
  m6: regress y x2 x3 x4 x5 x6 x7
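
The subtractions in "Computing the Three Types of Sums of Squares" can also be carried out inside Stata rather than by copying numbers from the output. This is a minimal sketch, assuming x1 through x7 are in memory. The scalar names ss0 through ss6 are made up for this illustration, and e(mss) is the model sum of squares that regress leaves behind. Model M3 is skipped because none of the three types uses it.

```stata
* Fit each model without output and keep its model sum of squares
quietly regress y x1 x2 x3 x4 x5 x6 x7
scalar ss0 = e(mss)
quietly regress y x1
scalar ss1 = e(mss)
quietly regress y x2 x3 x4
scalar ss2 = e(mss)
quietly regress y x1 x2 x3 x4
scalar ss4 = e(mss)
quietly regress y x1 x5 x6 x7
scalar ss5 = e(mss)
quietly regress y x2 x3 x4 x5 x6 x7
scalar ss6 = e(mss)

* Type 1 (order a, b, a#b)
display "SSA       = " ss1
display "SSB|A     = " ss4 - ss1
display "SSA*B|A,B = " ss0 - ss4

* Type 2 main effects (the interaction line is the same as in Type 1)
display "SSA|B     = " ss4 - ss2
display "SSB|A     = " ss4 - ss1

* Type 3 main effects (the interaction line is the same as in Type 1)
display "SSA|B,A*B = " ss0 - ss6
display "SSB|A,A*B = " ss0 - ss5
```

The displayed values should agree with the hand computations above up to rounding.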

Summary of Regression Results

Model: M0     R-square       0.9212  (Full Model)
Model: M1     R-square       0.0133  (A Main Effect)
Model: M2     R-square       0.8253  (B Main Effect)
Model: M3     R-square       0.1030  (A*B Interaction)
Model: M4     R-square       0.8368  (A, B Main Effects)
Model: M5     R-square       0.1198  (A, A*B)
Model: M6     R-square       0.9077  (B, A*B)
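
Each R-square in this summary is the model sum of squares from the earlier summary divided by the total sum of squares, 235.50. As a small worked check using numbers already printed in the M0 and M4 output, the two lines below should return values that round to 0.9212 and 0.8368.

```stata
* R-square = model SS / total SS
display 216.95/235.5
display 197.056/235.5
```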

Computing F-ratios from Regression

  • A*B Interaction [Same for SS1, SS2 & SS3] A*B|A, B

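    F-ratio numerator for A*B|A, B = (R2y.x1-x7 - R2y.x1-x4)/(k1 - k2)

    F-ratio denominator for all fixed effects = (1 - R2y.x1-x7)/(N - k1 - 1)

    Here R2y.x1-x7 is the R-square from regressing y on x1 through x7, k1 = 7 is the number of predictors in the full model, k2 = 4 is the number in the main-effects model, k in the formulas below is the number of predictors in the remaining model named in each formula, and N = 32.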

        (.9212 - .8368)/(7-4)
    F = ------------------------ = 8.57
        (1 - .9212)/(32 - 7 - 1)

  • Type I [SS1] A & B|A

    F-ratio numerator for A = R2y.x1/k

        .0133/1
    F = ------------------------ = 4.05
        (1 - .9212)/(32 - 7 - 1)

    F-ratio numerator for B|A = (R2y.x1-x4 - R2y.x1)/(k2 - k)

        (.8368 - .0133)/(4-1)
    F = ------------------------ = 83.60
        (1 - .9212)/(32 - 7 - 1)

  • Type II [SS2] A|B & B|A

    F-ratio numerator for A|B = (R2y.x1-x4 - R2y.x2-x4)/(k2 - k)

        (.8368 - .8253)/(4-3)
    F = ------------------------ = 3.50
        (1 - .9212)/(32 - 7 - 1)

    F-ratio numerator for B|A = (R2y.x1-x4 - R2y.x1)/(k2 - k)

        (.8368 - .0133)/(4-1)
    F = ------------------------ = 83.60
        (1 - .9212)/(32 - 7 - 1)

  • Type III [SS3] A|B, A*B & B|A, A*B

    F-ratio numerator for A|B, A*B = (R2y.x1-x7 - R2y.x2-x7)/(k1 - k)

        (.9212 - .9077)/(7-6)
    F = ------------------------ = 4.11
        (1 - .9212)/(32 - 7 - 1)

    F-ratio numerator for B|A, A*B = (R2y.x1-x7 - R2y.x1,x5-x7)/(k1 - k)

        (.9212 - .1198)/(7-4)
    F = ------------------------ = 81.36
        (1 - .9212)/(32 - 7 - 1)
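
The hand computations above use R-square values rounded to four decimals, which is why they differ slightly from the anova output (8.57 here against 8.58 there, for example). The sketch below repeats the A*B|A, B computation with the unrounded values that regress stores in e(r2), e(df_m) and e(N). The scalar names are hypothetical, and x1 through x7 are assumed to be in memory. The other F-ratios follow the same pattern with a different pair of models.

```stata
* Full model (M0): keep R-square, number of predictors, and N
quietly regress y x1 x2 x3 x4 x5 x6 x7
scalar r2_full = e(r2)
scalar k_full  = e(df_m)
scalar n_obs   = e(N)

* Main-effects model (M4)
quietly regress y x1 x2 x3 x4
scalar r2_main = e(r2)
scalar k_main  = e(df_m)

* F-ratio for A*B|A,B and its upper-tail p-value
scalar f_ab = ((r2_full - r2_main)/(k_full - k_main)) / ((1 - r2_full)/(n_obs - k_full - 1))
display "F = " f_ab "   p = " Ftail(k_full - k_main, n_obs - k_full - 1, f_ab)
```

With unrounded R-square values the result should be close to the F of 8.58 reported by anova.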


    Linear Statistical Models Course

    Phil Ender, 17sep10, 30May00