Multiple Comparisons

Linear Statistical Models

Organization of Multiple Comparisons

Planned comparisons
- Planned orthogonal comparisons
- Planned nonorthogonal comparisons
  - Pairwise versus control group (Dunnett's Test)
Post-hoc comparisons
- All pairwise (Fisher-Hayter, Tukey's HSD)
- Pairwise versus control group (Dunnett's Test)
- Non-pairwise (Scheffé's Test)
- Bonferroni Method

The Problem with Multiple Comparisons

If n independent contrasts are each tested at α, then the probability of making at least one type I error is 1 - (1 - α)ⁿ.

The table below gives the probability of making at least one type I error for four different numbers of comparisons:

 n    probability
 3      .1426
 5      .2262
10      .4013
15      .5367
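
The table entries follow from the formula above with α = .05. A minimal Python sketch that reproduces them (Python is not used elsewhere in these notes, and the function name is only illustrative):

```python
def familywise_error(alpha, n):
    """Probability of at least one type I error among n independent tests."""
    return 1 - (1 - alpha) ** n

# Reproduce the four rows of the table for alpha = .05
for n in (3, 5, 10, 15):
    print(f"{n:2d}  {familywise_error(0.05, n):.4f}")
```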

Conceptual Error Rates

Error rate contrastwise
The probability that a contrast will be falsely declared significant.

Error rate experimentwise
The probability that at least one contrast will be falsely declared significant in an experiment.

Error rate familywise
The probability that at least one contrast will be falsely declared significant in a family of contrasts.

Changing the critical value of the statistical test is what controls the conceptual error rate.

Beware

You are generally safe sticking with the following post-hoc comparison techniques: Dunnett, Fisher-Hayter, Tukey HSD, Tukey-Kramer, Scheffé or Bonferroni. They are known to strongly protect the familywise error rate. However, post-hoc techniques such as Fisher's least significant difference (LSD), Student-Newman-Keuls, and Duncan's multiple range test fail to strongly protect the familywise error rate. Such procedures are said to protect the familywise error rate only in a weak sense; avoid them if possible.

Contrasts

A contrast or comparison among means is a difference among the means.

Consider the following four group means: M₁, M₂, M₃, & M₄

A contrast can then be thought of as a set of weights, c, that are multiplied by the group means and summed. The weights sum to zero.

The Greek letter psi (ψ) is used to indicate contrasts.

Some examples:

Group 1 vs Group 2: ψ₁ = (1)M₁ + (-1)M₂ + (0)M₃ + (0)M₄
c₁ = 1 -1 0 0

Group 1 vs Group 3: ψ₂ = (1)M₁ + (0)M₂ + (-1)M₃ + (0)M₄
c₂ = 1 0 -1 0

Group 3 vs Group 4: ψ₃ = (0)M₁ + (0)M₂ + (1)M₃ + (-1)M₄
c₃ = 0 0 1 -1

Groups 1 & 2 vs Groups 3 & 4: ψ₄ = (1)M₁ + (1)M₂ + (-1)M₃ + (-1)M₄
c₄ = 1 1 -1 -1

Group 1 vs Group 4: ψ₅ = (1)M₁ + (0)M₂ + (0)M₃ + (-1)M₄
c₅ = 1 0 0 -1

Orthogonal Contrasts

Contrasts are orthogonal when the dot product of their weights, c, equals zero.

Some examples:

ψ₁ & ψ₂ = (1)(1) + (-1)(0) + (0)(-1) + (0)(0) = 1 [not orthogonal]

ψ₁ & ψ₃ = (1)(0) + (-1)(0) + (0)(1) + (0)(-1) = 0 [orthogonal]

ψ₁ & ψ₄ = (1)(1) + (-1)(1) + (0)(-1) + (0)(-1) = 0 [orthogonal]

ψ₂ & ψ₄ = (1)(1) + (0)(1) + (-1)(-1) + (0)(-1) = 2 [not orthogonal]

ψ₃ & ψ₄ = (0)(1) + (0)(1) + (1)(-1) + (-1)(-1) = 0 [orthogonal]
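
The dot products can be checked mechanically. The sketch below is a minimal Python illustration, assuming equal group sizes as in the example; the dictionary keys and the helper name `dot` are illustrative only.

```python
from itertools import combinations

# Contrast weights c1..c5 from the examples above
weights = {
    "psi1": (1, -1, 0, 0),
    "psi2": (1, 0, -1, 0),
    "psi3": (0, 0, 1, -1),
    "psi4": (1, 1, -1, -1),
    "psi5": (1, 0, 0, -1),
}

def dot(c_a, c_b):
    """Dot product of two sets of contrast weights."""
    return sum(a * b for a, b in zip(c_a, c_b))

# Check every pair of contrasts
for (name_a, c_a), (name_b, c_b) in combinations(weights.items(), 2):
    d = dot(c_a, c_b)
    label = "orthogonal" if d == 0 else "not orthogonal"
    print(f"{name_a} & {name_b}: {d} [{label}]")
```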

Planned Orthogonal Comparisons

Requirements

They must be planned
They must be orthogonal

With p groups, a set of mutually orthogonal comparisons can contain at most p-1 comparisons

t Tests for Orthogonal Comparisons

An Example

Using contrasts ψ₁, ψ₃ & ψ₄

The sets of weights are:
c₁ = 1 -1 0 0
c₃ = 0 0 1 -1
c₄ = 1 1 -1 -1

MSerr = 2.179 and n = 8 for each group

The three t-tests are:
t₁ = -0.68 --> t² = F = 0.46
t₂ = -2.71 --> t² = F = 7.34
t₃ = -3.83 --> t² = F = 14.69
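
These values can be reproduced from the group means (3.00, 3.50, 4.25, 6.25), MSerr and n. The Python sketch below assumes the usual equal-n form t = ψ / sqrt(MSerr · Σc² / n); the function name `contrast_t` is illustrative.

```python
from math import sqrt

means = (3.00, 3.50, 4.25, 6.25)  # group means for the four groups
ms_err = 2.179                    # MSerr from the example
n = 8                             # observations per group

def contrast_t(c):
    """t statistic for a contrast with weights c (equal group sizes assumed)."""
    psi = sum(w * m for w, m in zip(c, means))
    se = sqrt(ms_err * sum(w * w for w in c) / n)
    return psi / se

for label, c in (("psi1", (1, -1, 0, 0)),
                 ("psi3", (0, 0, 1, -1)),
                 ("psi4", (1, 1, -1, -1))):
    t = contrast_t(c)
    print(f"{label}: t = {t:.2f}, F = t^2 = {t * t:.2f}")
```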

Using Stata

This section makes use of the anovacontrast.ado file, which can be obtained from UCLA ATS via the Internet.

use http://www.philender.com/courses/data/cr4new, clear

table a, cont(freq mean y sd y)

----------------------------------------------
        a |      Freq.     mean(y)       sd(y)
----------+-----------------------------------
        1 |          8           3    1.511858
        2 |          8         3.5    .9258201
        3 |          8        4.25    1.035098
        4 |          8        6.25     2.12132
----------------------------------------------

anova y a


                           Number of obs =      32     R-squared     =  0.4455
                           Root MSE      =   1.476     Adj R-squared =  0.3860

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |       49.00     3  16.3333333       7.50     0.0008
                         |
                       a |       49.00     3  16.3333333       7.50     0.0008
                         |
                Residual |       61.00    28  2.17857143   
              -----------+----------------------------------------------------
                   Total |      110.00    31   3.5483871  


anovacontrast a, values(1 -1  0  0) title(1vs2)

1vs2
Contrast variable a (1 -1 0 0)                 Dep Var  =        y
source           SS          df      MS        Contrast =    -0.50
---------+---------------------------------    N of obs =       32
contrast |          1         1      1.0000    F        =     0.46
error    |         61        28      2.1786    Prob > F =   0.5036
---------+---------------------------------    t        =     0.68

anovacontrast a, values(0  0  1 -1) title(3vs4)

3vs4
Contrast variable a (0 0 1 -1)                 Dep Var  =        y
source           SS          df      MS        Contrast =    -2.00
---------+---------------------------------    N of obs =       32
contrast |         16         1     16.0000    F        =     7.34
error    |         61        28      2.1786    Prob > F =   0.0114
---------+---------------------------------    t        =     2.71

anovacontrast a, values(1  1 -1 -1) title(12vs34)

12vs34
Contrast variable a (1 1 -1 -1)                Dep Var  =        y
source           SS          df      MS        Contrast =    -4.00
---------+---------------------------------    N of obs =       32
contrast |         32         1     32.0000    F        =    14.69
error    |         61        28      2.1786    Prob > F =   0.0007
---------+---------------------------------    t        =     3.83

anovalator a, wgt(1 -1 0 0)

Adjusted predictions                              Number of obs   =         32

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |
          1  |          3   .5218443     5.75   0.000     1.977204    4.022796
          2  |        3.5   .5218443     6.71   0.000     2.477204    4.522796
          3  |       4.25   .5218443     8.14   0.000     3.227204    5.272796
          4  |       6.25   .5218443    11.98   0.000     5.227204    7.272796
------------------------------------------------------------------------------

anovalator contrast for a  


 ( 1)  1bn.a - 2.a = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |        -.5   .7379992    -0.68   0.498    -1.946452    .9464519
------------------------------------------------------------------------------

anovalator a, wgt(0 0 1 -1) quietly

anovalator contrast for a  


 ( 1)  3.a - 4.a = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |         -2   .7379992    -2.71   0.007    -3.446452   -.5535481
------------------------------------------------------------------------------

anovalator a, wgt(1 1 -1 -1) quietly

anovalator contrast for a  


 ( 1)  1bn.a + 2.a - 3.a - 4.a = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |         -4   1.043689    -3.83   0.000    -6.045592   -1.954408
------------------------------------------------------------------------------

Recall

The group means for the 4 group one-way anova

Group    1     2     3     4
Mean    3.00  3.50  4.25  6.25

The MSerr = 2.179

Dunnett's Test (Pairwise versus Control Group)

Compare p-1 treatment groups with a control group.

df for Dunnett's t is the same as the df associated with the MSerr

Critical values found in Dunnett's table of critical values using p and df for MSerr

In our four group example p = 4, dferr = 28, and the nearest critical value of Dunnett's t is 2.51.

Dunnett's t squared is an F and the critical value is 6.30.

An Example

Using contrasts ψ₁, ψ₂ & ψ₅

The sets of weights are:
c₁ = 1 -1 0 0
c₂ = 1 0 -1 0
c₅ = 1 0 0 -1

MSerr = 2.179 and n = 8 for each group

The three t-tests are:
t₁ = -0.68 --> t² = F = 0.46 -- n.s.
t₂ = -1.69 --> t² = F = 2.87 -- n.s.
t₃ = -4.40 --> t² = F = 19.39 -- sig. at .05
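
The decision rule is to compare each |t| with Dunnett's critical value. A minimal Python sketch, assuming group 1 is the control group and using the critical value 2.51 quoted above (variable and function names are illustrative):

```python
from math import sqrt

means = (3.00, 3.50, 4.25, 6.25)  # group 1 is treated as the control group
ms_err = 2.179
n = 8
dunnett_crit = 2.51               # Dunnett's t quoted in these notes

def contrast_t(c):
    """t statistic for a contrast with weights c (equal group sizes assumed)."""
    psi = sum(w * m for w, m in zip(c, means))
    return psi / sqrt(ms_err * sum(w * w for w in c) / n)

results = {}
for label, c in (("1vs2", (1, -1, 0, 0)),
                 ("1vs3", (1, 0, -1, 0)),
                 ("1vs4", (1, 0, 0, -1))):
    t = contrast_t(c)
    results[label] = (t, abs(t) > dunnett_crit)
    flag = "sig. at .05" if results[label][1] else "n.s."
    print(f"{label}: t = {t:.2f}, F = {t * t:.2f} -- {flag}")
```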

Using Stata

use http://www.philender.com/courses/data/cr4new, clear

anova y a

                           Number of obs =      32     R-squared     =  0.4455
                           Root MSE      =   1.476     Adj R-squared =  0.3860

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |       49.00     3  16.3333333       7.50     0.0008
                         |
                       a |       49.00     3  16.3333333       7.50     0.0008
                         |
                Residual |       61.00    28  2.17857143   
              -----------+----------------------------------------------------
                   Total |      110.00    31   3.5483871  
				   
anovacontrast a, values(1 -1  0  0) title(1vs2)

1vs2
Contrast variable a (1 -1 0 0)                 Dep Var  =        y
source           SS          df      MS        Contrast =    -0.50
---------+---------------------------------    N of obs =       32
contrast |          1         1      1.0000    F        =     0.46
error    |         61        28      2.1786    Prob > F =   0.5036
---------+---------------------------------    t        =     0.68

anovacontrast a, values(1 0 -1 0) title(1vs3)

1vs3
Contrast variable a (1 0 -1 0)                 Dep Var  =        y
source           SS          df      MS        Contrast =    -1.25
---------+---------------------------------    N of obs =       32
contrast |       6.25         1      6.2500    F        =     2.87
error    |         61        28      2.1786    Prob > F =   0.1014
---------+---------------------------------    t        =     1.69

anovacontrast a, values(1 0 0 -1) title(1vs4)

1vs4
Contrast variable a (1 0 0 -1)                 Dep Var  =        y
source           SS          df      MS        Contrast =    -3.25
---------+---------------------------------    N of obs =       32
contrast |      42.25         1     42.2500    F        =    19.39
error    |         61        28      2.1786    Prob > F =   0.0001
---------+---------------------------------    t        =     4.40

/* use regression with appropriate reference group */

regress y ib1.a

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  3,    28) =    7.50
       Model |          49     3  16.3333333           Prob > F      =  0.0008
    Residual |          61    28  2.17857143           R-squared     =  0.4455
-------------+------------------------------           Adj R-squared =  0.3860
       Total |         110    31   3.5483871           Root MSE      =   1.476

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |
          2  |         .5   .7379992     0.68   0.504    -1.011723    2.011723
          3  |       1.25   .7379992     1.69   0.101    -.2617229    2.761723
          4  |       3.25   .7379992     4.40   0.000     1.738277    4.761723
             |
       _cons |          3   .5218443     5.75   0.000      1.93105     4.06895
------------------------------------------------------------------------------

Recall

The group means for the 4 group one-way anova

Group    1     2     3     4
Mean    3.00  3.50  4.25  6.25

The MSerr = 2.179

Fisher-Hayter Pairwise Comparisons

A post-hoc technique that tests all pairwise comparisons of means

Based on the Studentized Range Distribution the critical value q(α, k-1, dfe) = q (.05, 3, 28) = 3.4994064

Let nn = (1/n1 + 1/n2) = (1/8 + 1/8) = 2/8 = 0.25

Let the critical difference = sqrt(mse * nn / 2) * q = sqrt(2.18 * 0.25 / 2) * 3.5 = 1.826

Compare each of the pairwise differences in means with the critical difference

1vs2  -0.50  n.s.
1vs3  -1.25  n.s.
1vs4  -3.25  sig.
2vs3  -0.75  n.s.
2vs4  -2.75  sig.
3vs4  -2.00  sig.
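
The critical difference and the six decisions can be reproduced as follows. This is a minimal Python sketch that takes q from the value quoted above rather than computing it; the names are illustrative.

```python
from math import sqrt
from itertools import combinations

means = {1: 3.00, 2: 3.50, 3: 4.25, 4: 6.25}
mse = 61 / 28        # residual MS from the ANOVA table, about 2.179
n_per_group = 8
q_crit = 3.4994064   # q(.05, k-1 = 3, 28) as quoted in these notes

nn = 1 / n_per_group + 1 / n_per_group
critical_diff = sqrt(mse * nn / 2) * q_crit

# A pair differs significantly when |difference| exceeds the critical difference
decisions = {}
for a, b in combinations(sorted(means), 2):
    diff = means[a] - means[b]
    decisions[(a, b)] = abs(diff) > critical_diff
    print(f"{a}vs{b}  {diff:6.2f}  {'sig.' if decisions[(a, b)] else 'n.s.'}")

print(f"critical difference = {critical_diff:.4f}")
```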

Alternatively

Alternatively, you could compute a kind of t-test for each pairwise difference in means and compare its absolute value to q.

From above q (.05, 3, 28) = 3.5

Using this formula the t values are:

1vs2  -0.96  n.s.
1vs3  -2.39  n.s.
1vs4  -6.23  sig.
2vs3  -1.44  n.s.
2vs4  -5.27  sig.
3vs4  -3.83  sig.
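
The t values above are consistent with dividing each mean difference by sqrt(MSerr * nn / 2), the same quantity that multiplies q in the critical difference. A minimal Python sketch under that assumption (the name `t_q` is illustrative):

```python
from math import sqrt
from itertools import combinations

means = {1: 3.00, 2: 3.50, 3: 4.25, 4: 6.25}
sizes = {1: 8, 2: 8, 3: 8, 4: 8}
mse = 61 / 28        # residual MS from the ANOVA table, about 2.179
q_crit = 3.4994064   # q(.05, 3, 28) as quoted in these notes

def t_q(a, b):
    """Mean difference scaled so it can be compared directly with q (assumed form)."""
    nn = 1 / sizes[a] + 1 / sizes[b]
    return (means[a] - means[b]) / sqrt(mse * nn / 2)

for a, b in combinations(sorted(means), 2):
    stat = t_q(a, b)
    flag = "sig." if abs(stat) > q_crit else "n.s."
    print(f"{a}vs{b}  {stat:8.4f}  {flag}")
```

Swapping in q(.05, 4, 28) = 3.8613586 gives the Tukey-Kramer decisions, where 3vs4 is no longer significant.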

Using Stata

use http://www.philender.com/courses/data/cr4new, clear

anova y a

                           Number of obs =      32     R-squared     =  0.4455
                           Root MSE      =   1.476     Adj R-squared =  0.3860

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |       49.00     3  16.3333333       7.50     0.0008
                         |
                       a |       49.00     3  16.3333333       7.50     0.0008
                         |
                Residual |       61.00    28  2.17857143   
              -----------+----------------------------------------------------
                   Total |      110.00    31   3.5483871  

fhcomp a

 Fisher-Hayter pairwise comparisons for variable grp
studentized range critical value(.05, 3, 28) = 3.4994064

                                      mean     critical
grp vs grp       group means          dif        dif
-------------------------------------------------------
  1 vs   2     3.0000     3.5000     0.5000    1.8261
  1 vs   3     3.0000     4.2500     1.2500    1.8261
  1 vs   4     3.0000     6.2500     3.2500*   1.8261
  2 vs   3     3.5000     4.2500     0.7500    1.8261
  2 vs   4     3.5000     6.2500     2.7500*   1.8261
  3 vs   4     4.2500     6.2500     2.0000*   1.8261

Tukey's HSD Pairwise Comparisons

A post-hoc technique that tests all pairwise comparisons of means

Based on the Studentized Range Distribution the critical value q(α, k, dfe) = q (.05, 4, 28) = 3.8613586

If the sample sizes are unequal, use the harmonic mean sample size

For the harmonic mean sample size let n = k/(1/n1 + 1/n2 + 1/n3 + 1/n4) = 4/(1/8 + 1/8 + 1/8 + 1/8) = 8

Let the critical difference = sqrt(mse / n) * q = sqrt(2.18 / 8) * 3.86 = 2.01

Compare each of the pairwise differences in means with the critical difference

1vs2  -0.50  n.s.
1vs3  -1.25  n.s.
1vs4  -3.25  sig.
2vs3  -0.75  n.s.
2vs4  -2.75  sig.
3vs4  -2.00  n.s.
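
The same steps in a minimal Python sketch, with q taken from the value quoted above rather than computed (names are illustrative):

```python
from math import sqrt
from itertools import combinations

means = {1: 3.00, 2: 3.50, 3: 4.25, 4: 6.25}
group_sizes = (8, 8, 8, 8)
mse = 61 / 28        # residual MS from the ANOVA table, about 2.179
q_crit = 3.8613586   # q(.05, k = 4, 28) as quoted in these notes

# Harmonic mean sample size: k / sum(1/n_j)
k = len(group_sizes)
n_harmonic = k / sum(1 / n for n in group_sizes)

critical_diff = sqrt(mse / n_harmonic) * q_crit

decisions = {}
for a, b in combinations(sorted(means), 2):
    diff = means[a] - means[b]
    decisions[(a, b)] = abs(diff) > critical_diff
    print(f"{a}vs{b}  {diff:6.2f}  {'sig.' if decisions[(a, b)] else 'n.s.'}")

print(f"harmonic n = {n_harmonic:.3f}, critical difference = {critical_diff:.4f}")
```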

Alternatively

Alternatively, you could compute a kind of t-test for each pairwise difference in means and compare its absolute value to q.

From above q (.05, 4, 28) = 3.86

Using this formula the t values are:

1vs2  -0.96  n.s.
1vs3  -2.39  n.s.
1vs4  -6.23  sig.
2vs3  -1.44  n.s.
2vs4  -5.27  sig.
3vs4  -3.83  n.s.

Using Stata

use http://www.philender.com/courses/data/cr4new, clear

anova y a

                           Number of obs =      32     R-squared     =  0.4455
                           Root MSE      =   1.476     Adj R-squared =  0.3860

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |       49.00     3  16.3333333       7.50     0.0008
                         |
                       a |       49.00     3  16.3333333       7.50     0.0008
                         |
                Residual |       61.00    28  2.17857143   
              -----------+----------------------------------------------------
                   Total |      110.00    31   3.5483871  
                  
tukeyhsd a

Tukey HSD pairwise comparisons for variable a
studentized range critical value(.05, 4, 28) = 3.8613586
uses harmonica mean sample size =    8.000

                                      mean     critical
grp vs grp       group means          dif        dif
-------------------------------------------------------
  1 vs   2     3.0000     3.5000     0.5000    2.0150
  1 vs   3     3.0000     4.2500     1.2500    2.0150
  1 vs   4     3.0000     6.2500     3.2500*   2.0150
  2 vs   3     3.5000     4.2500     0.7500    2.0150
  2 vs   4     3.5000     6.2500     2.7500*   2.0150
  3 vs   4     4.2500     6.2500     2.0000    2.0150
  
tkcomp a

Tukey-Kramer pairwise comparisons for variable a
studentized range critical value(.05, 4, 28) = 3.8613586

                                      mean 
grp vs grp       group means          dif     TK-test
-------------------------------------------------------
  1 vs   2     3.0000     3.5000      0.5000   0.9581 
  1 vs   3     3.0000     4.2500      1.2500   2.3954 
  1 vs   4     3.0000     6.2500      3.2500   6.2279*
  2 vs   3     3.5000     4.2500      0.7500   1.4372 
  2 vs   4     3.5000     6.2500      2.7500   5.2698*
  3 vs   4     4.2500     6.2500      2.0000   3.8326

Comparing Tukey's HSD with Tukey-Kramer

When cell sizes are equal Tukey HSD and Tukey-Kramer give the same results.

Tukey-Kramer handles unequal cell sizes better than Tukey HSD.

Comparing Fisher-Hayter with Tukey's HSD

When the cell sizes are equal, Fisher-Hayter and Tukey's HSD give the same test statistic for each pair, i.e., t_q = q_T; only the critical values differ.

Fisher-Hayter is a little bit more powerful since the critical value from the Studentized Range is based on k-1 and not k as in Tukey's HSD.

Fisher-Hayter handles unequal cell sizes in a cleaner manner.

Recall

The group means for the 4 group one-way anova

Group    1     2     3     4
Mean    3.00  3.50  4.25  6.25

The MSerr = 2.179

Scheffé's Test

Can perform an unlimited number of contrasts

Usually used to perform non-pairwise comparisons.

Create a new critical value of F
F_S = (p-1)F where F is found using p-1 & df_error degrees of freedom

The critical value of F for the ANOVA at α = .05 was F_3,28 = 2.95

The critical value for F_S = (4-1)2.95 = 8.85

Formula for Scheffé

From Our Example

Using contrasts ψ_a, ψ_b, ψ_c & ψ_d

The sets of weights are:

c_a = 3 -1 -1 -1
c_b = 2  0 -1 -1
c_c = 1  1 -1 -1 
c_d = 1  1 -2  0

MSerr = 2.179 and n = 8 for each group

The four F-tests are:

F_a =   7.65 -- n.s.
F_b =  12.39 -- sig.
F_c =  14.69 -- sig.
F_d =   2.45 -- n.s.
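
The four F values and the Scheffé decisions can be reproduced as follows. This is a minimal Python sketch assuming equal group sizes, with SS for a contrast computed as n·ψ² / Σc²; the critical F of 2.95 is taken from the notes, and the names are illustrative.

```python
means = (3.00, 3.50, 4.25, 6.25)
ms_err = 61 / 28                   # residual MS from the ANOVA table, about 2.179
n = 8
p = len(means)
f_crit_anova = 2.95                # F(.05; 3, 28) as quoted in these notes
f_scheffe = (p - 1) * f_crit_anova # Scheffe critical value, 8.85

def contrast_f(c):
    """F statistic (1 df) for a contrast with weights c, equal group sizes."""
    psi = sum(w * m for w, m in zip(c, means))
    ss_contrast = n * psi ** 2 / sum(w * w for w in c)
    return ss_contrast / ms_err

contrasts = {"a": (3, -1, -1, -1), "b": (2, 0, -1, -1),
             "c": (1, 1, -1, -1), "d": (1, 1, -2, 0)}

for name, c in contrasts.items():
    f = contrast_f(c)
    print(f"F_{name} = {f:6.2f} -- {'sig.' if f > f_scheffe else 'n.s.'}")
```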

Using Stata

When using coded contrasts the critical value should be 8.85

use http://www.philender.com/courses/data/cr4new, clear

anova y a

                           Number of obs =      32     R-squared     =  0.4455
                           Root MSE      =   1.476     Adj R-squared =  0.3860

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |       49.00     3  16.3333333       7.50     0.0008
                         |
                       a |       49.00     3  16.3333333       7.50     0.0008
                         |
                Residual |       61.00    28  2.17857143   
              -----------+----------------------------------------------------
                   Total |      110.00    31   3.5483871    

anovacontrast a, values(3 -1  -1  -1) title(1vs234)

1vs234
Contrast variable a (3 -1 -1 -1)               Dep Var  =        y
source           SS          df      MS        Contrast =    -5.00
---------+---------------------------------    N of obs =       32
contrast | 16.6666667         1     16.6667    F        =     7.65
error    |         61        28      2.1786    Prob > F =   0.0099
---------+---------------------------------    t        =     2.77

anovacontrast a, values(2  0  -1  -1) title(1vs34)

1vs34
Contrast variable a (2 0 -1 -1)                Dep Var  =        y
source           SS          df      MS        Contrast =    -4.50
---------+---------------------------------    N of obs =       32
contrast |         27         1     27.0000    F        =    12.39
error    |         61        28      2.1786    Prob > F =   0.0015
---------+---------------------------------    t        =     3.52

anovacontrast a, values(1  1  -1  -1) title(12vs34)

12vs34
Contrast variable a (1 1 -1 -1)                Dep Var  =        y
source           SS          df      MS        Contrast =    -4.00
---------+---------------------------------    N of obs =       32
contrast |         32         1     32.0000    F        =    14.69
error    |         61        28      2.1786    Prob > F =   0.0007
---------+---------------------------------    t        =     3.83

anovacontrast a, values(1  1  -2   0) title(12vs3)

12vs3
Contrast variable a (1 1 -2 0)                 Dep Var  =        y
source           SS          df      MS        Contrast =    -2.00
---------+---------------------------------    N of obs =       32
contrast | 5.33333333         1      5.3333    F        =     2.45
error    |         61        28      2.1786    Prob > F =   0.1289
---------+---------------------------------    t        =     1.56

Bonferroni & Sidak Methods

Quick and dirty

Not very powerful for pairwise comparisons

Divide the alpha level by the number of contrasts, k, to arrive at a new alpha level

α_B = α/k

Sidak is a modification of the Bonferroni approach

α_Si = 1 - (1-α)^(1/k)

For the four non-pairwise contrasts in the section above, the Bonferroni alpha level would be α_B = .05/4 = .0125, which equates to a critical value of F_B = 4.33.

The Sidak alpha level is α_Si = 1 - (1-.05)^(1/4) = .01274146, which equates to a critical value of F_Si = 4.31.

Compare these critical values with the Scheffé critical value of 8.85.
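
The two adjusted alpha levels are simple arithmetic. A minimal Python sketch (function names are illustrative; it computes only the alpha levels, not the corresponding critical values of F):

```python
def bonferroni_alpha(alpha, k):
    """Per-contrast alpha level for k contrasts (Bonferroni)."""
    return alpha / k

def sidak_alpha(alpha, k):
    """Per-contrast alpha level for k contrasts (Sidak)."""
    return 1 - (1 - alpha) ** (1 / k)

k = 4  # the four non-pairwise contrasts from the Scheffe example
print(f"Bonferroni: {bonferroni_alpha(0.05, k):.8f}")
print(f"Sidak:      {sidak_alpha(0.05, k):.8f}")
```

The Sidak level is always slightly larger than the Bonferroni level, so Sidak is marginally less conservative.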

Comparing the Comparisons

Consider a four group design with error df=28. Here are the critical values for pairwise comparisons using various methods at α = 0.05.

Method                     Critical Value of t
Ordinary Student's t                  2.048
Dunnett's test                        2.157
Fisher-Hayter                         2.474*
Tukey HSD                             2.730*
Tukey-Kramer                          2.730*
Sidak                                 2.830
Bonferroni                            2.839
Scheffé                               2.975

* requires rescaling the studentized range statistic
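
The rescaled entries can be checked by hand: for a pairwise comparison the studentized range statistic equals t times sqrt(2), so the t-scale critical value is q / sqrt(2), and the Scheffé value is the square root of (p-1)F. A minimal Python sketch using the q and F values quoted earlier in these notes (variable names are illustrative):

```python
from math import sqrt

q_fisher_hayter = 3.4994064  # q(.05, 3, 28)
q_tukey = 3.8613586          # q(.05, 4, 28)
f_crit_anova = 2.95          # F(.05; 3, 28)
p = 4                        # number of groups

# Put each critical value on the t scale
t_fisher_hayter = q_fisher_hayter / sqrt(2)
t_tukey = q_tukey / sqrt(2)
t_scheffe = sqrt((p - 1) * f_crit_anova)

print(f"Fisher-Hayter  {t_fisher_hayter:.3f}")
print(f"Tukey HSD      {t_tukey:.3f}")
print(f"Scheffe        {t_scheffe:.3f}")
```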

Linear Statistical Models Course

Phil Ender, 17sep10, 13apr06, 12Feb98