Multivariate Analysis
Multivariate Multiple Regression


Multivariate multiple regression is a logical extension of the multiple regression concept to allow for multiple response (dependent) variables. Multivariate regression estimates the same coefficients and standard errors as one would obtain using separate OLS regressions. In addition, multivariate regression, being a joint estimator, also estimates the between-equation covariances. This means that it is possible to test coefficients across equations.

The matrix formula for multivariate regression is virtually identical to the OLS formula, the only change being that Y is a matrix of response variables rather than a vector.

The residual covariance matrix can be obtained from the residuals of the fitted equations.

In Stata, mvreg is the command used for multivariate multiple regression estimates. In addition, mvtest by David E. Moore (Cincinnati University) can be used to produce traditional multivariate tests on the estimates.
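
For reference, the coefficient formula and the residual covariance matrix mentioned above can be written out as follows. This is a sketch in standard textbook notation, not a transcription of the original formulas; the symbols n (observations), k (columns of X, including the constant) and p (response variables) are labels introduced here.

```latex
% Assumed notation: Y is n x p, X is n x k, \hat{B} is k x p.
\[
\hat{B} = (X'X)^{-1} X'Y,
\qquad
\hat{\Sigma} = \frac{(Y - X\hat{B})'(Y - X\hat{B})}{n - k}
\]
```

Each column of the coefficient matrix is the OLS coefficient vector for the corresponding response variable, which is why the regress and mvreg coefficient tables in the example below agree.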

Stata Example

use http://www.gseis.ucla.edu/courses/data/hsb2
 
xi: regress read female i.prog
i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   14.45
       Model |  3789.28412     3  1263.09471           Prob > F      =  0.0000
    Residual |  17130.1359   196  87.3986524           R-squared     =  0.1811
-------------+------------------------------           Adj R-squared =  0.1686
       Total |    20919.42   199  105.122714           Root MSE      =  9.3487

------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -1.208582   1.327672    -0.91   0.364    -3.826939    1.409774
    _Iprog_2 |    6.42937   1.665893     3.86   0.000     3.143993    9.714746
    _Iprog_3 |  -3.547498   1.921001    -1.85   0.066    -7.335983    .2409862
       _cons |   50.40013   1.563197    32.24   0.000     47.31729    53.48298
------------------------------------------------------------------------------
 
xi: regress write female i.prog
i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   20.72
       Model |  4304.40272     3  1434.80091           Prob > F      =  0.0000
    Residual |  13574.4723   196  69.2575116           R-squared     =  0.2408
-------------+------------------------------           Adj R-squared =  0.2291
       Total |   17878.875   199   89.843593           Root MSE      =  8.3221

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   4.771211   1.181876     4.04   0.000     2.440385    7.102037
    _Iprog_2 |   4.832929   1.482956     3.26   0.001     1.908331    7.757528
    _Iprog_3 |  -4.605141   1.710049    -2.69   0.008      -7.9776   -1.232683
       _cons |   48.78869   1.391537    35.06   0.000     46.04438      51.533
------------------------------------------------------------------------------
 
xi: regress math  female i.prog
i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   19.56
       Model |  4024.61221     3   1341.5374           Prob > F      =  0.0000
    Residual |  13441.1828   196  68.5774632           R-squared     =  0.2304
-------------+------------------------------           Adj R-squared =  0.2186
       Total |   17465.795   199  87.7678141           Root MSE      =  8.2812

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.6737673   1.176059    -0.57   0.567    -2.993122    1.645587
    _Iprog_2 |   6.723945   1.475657     4.56   0.000      3.81374    9.634149
    _Iprog_3 |   -3.59773   1.701633    -2.11   0.036    -6.953591   -.2418702
       _cons |   50.38156   1.384689    36.38   0.000     47.65076    53.11237
------------------------------------------------------------------------------
 
xi: mvreg read write math = female i.prog
i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

Equation          Obs  Parms        RMSE    "R-sq"          F        P
----------------------------------------------------------------------
read              200      4    9.348725    0.1811   14.45211   0.0000
write             200      4     8.32211    0.2408    20.7169   0.0000
math              200      4    8.281151    0.2304   19.56237   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
read         |
      female |  -1.208582   1.327672    -0.91   0.364    -3.826939    1.409774
    _Iprog_2 |    6.42937   1.665893     3.86   0.000     3.143993    9.714746
    _Iprog_3 |  -3.547498   1.921001    -1.85   0.066    -7.335983    .2409862
       _cons |   50.40013   1.563197    32.24   0.000     47.31729    53.48298
-------------+----------------------------------------------------------------
write        |
      female |   4.771211   1.181876     4.04   0.000     2.440385    7.102037
    _Iprog_2 |   4.832929   1.482956     3.26   0.001     1.908331    7.757528
    _Iprog_3 |  -4.605141   1.710049    -2.69   0.008      -7.9776   -1.232683
       _cons |   48.78869   1.391537    35.06   0.000     46.04438      51.533
-------------+----------------------------------------------------------------
math         |
      female |  -.6737673   1.176059    -0.57   0.567    -2.993122    1.645587
    _Iprog_2 |   6.723945   1.475657     4.56   0.000      3.81374    9.634149
    _Iprog_3 |   -3.59773   1.701633    -2.11   0.036    -6.953591   -.2418702
       _cons |   50.38156   1.384689    36.38   0.000     47.65076    53.11237
------------------------------------------------------------------------------
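
The between-equation covariances are not printed by default. A minimal sketch of how one might inspect them, assuming mvreg leaves the residual covariance matrix in e(Sigma) as described in the Stata documentation; the matrix names S and R are arbitrary choices made here.

```stata
* Refit the model; the corr option also prints the residual correlations.
use http://www.gseis.ucla.edu/courses/data/hsb2, clear
xi: mvreg read write math = female i.prog, corr

* Residual covariance matrix across the three equations
matrix S = e(Sigma)
matrix list S

* Rescale the covariance matrix to a correlation matrix
matrix R = corr(S)
matrix list R
```

If the covariance matrix is computed with the residual degrees of freedom as divisor, its diagonal should match the residual mean squares from the separate regressions (for example, about 87.40 for read).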
 
test female

 ( 1)  [read]female = 0.0
 ( 2)  [write]female = 0.0
 ( 3)  [math]female = 0.0

       F(  3,   196) =   11.63
            Prob > F =    0.0000
 
test _Iprog_2 _Iprog_3

 ( 1)  [read]_Iprog_2 = 0.0
 ( 2)  [write]_Iprog_2 = 0.0
 ( 3)  [math]_Iprog_2 = 0.0
 ( 4)  [read]_Iprog_3 = 0.0
 ( 5)  [write]_Iprog_3 = 0.0
 ( 6)  [math]_Iprog_3 = 0.0

       F(  6,   196) =   11.83
            Prob > F =    0.0000
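
The xi: prefix used above predates factor-variable notation. As a sketch, assuming Stata 11 or later, the same model and tests can be written without it; the coefficients are then named 2.prog and 3.prog instead of _Iprog_2 and _Iprog_3.

```stata
* Assumes Stata 11 or later (factor-variable notation).
use http://www.gseis.ucla.edu/courses/data/hsb2, clear
mvreg read write math = female i.prog

* Joint test of female in all three equations
test female

* Joint test of the two prog indicators in all three equations
test 2.prog 3.prog
```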
 

The same model can be run using the manova command to obtain the multivariate tests.

 manova read write math = female prog

                           Number of obs =     200

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                   Model | W   0.6231      3     9.0   472.3    11.26 0.0000 a
                         | P   0.4170            9.0   588.0    10.55 0.0000 a
                         | L   0.5406            9.0   578.0    11.57 0.0000 a
                         | R   0.3642            3.0   196.0    23.79 0.0000 u
                         |--------------------------------------------------
                Residual |               196
              -----------+--------------------------------------------------
                  female | W   0.8489      1     3.0   194.0    11.51 0.0000 e
                         | P   0.1511            3.0   194.0    11.51 0.0000 e
                         | L   0.1780            3.0   194.0    11.51 0.0000 e
                         | R   0.1780            3.0   194.0    11.51 0.0000 e
                         |--------------------------------------------------
                    prog | W   0.7329      2     6.0   388.0    10.87 0.0000 e
                         | P   0.2686            6.0   390.0    10.08 0.0000 a
                         | L   0.3623            6.0   386.0    11.65 0.0000 a
                         | R   0.3564            3.0   195.0    23.16 0.0000 u
                         |--------------------------------------------------
                Residual |               196
              -----------+--------------------------------------------------
                   Total |               199
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F
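
When the hypothesis has a single degree of freedom, as for female, all four statistics are functions of one eigenvalue and lead to the same exact F. The arithmetic can be checked by hand from the table. The conversion below is the standard one for a one-degree-of-freedom hypothesis, with 3 response variables and 196 residual degrees of freedom taken from the output above; the scalar name W is arbitrary.

```stata
* Wilks' lambda for female, copied from the manova table
scalar W = 0.8489

* Pillai's trace is 1 - W; Lawley-Hotelling trace and Roy's root are (1 - W)/W
display 1 - W
display (1 - W)/W

* Exact F with (3, 194) degrees of freedom, where 194 = 196 - 3 + 1
display ((1 - W)/W) * (194/3)
display Ftail(3, 194, ((1 - W)/W) * (194/3))
```

Up to rounding, these lines should reproduce the values 0.1511, 0.1780 and 11.51 reported for female.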

Example 2

Next, we will perform an mvreg that is equivalent to a factorial multivariate analysis of variance. Using xi3 will ensure that the main effects are estimated correctly.

xi3: mvreg read write math = e.female*e.prog
e.female          _Ifemale_0-1        (naturally coded; _Ifemale_0 omitted)
e.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

Equation          Obs  Parms        RMSE    "R-sq"          F        P
----------------------------------------------------------------------
read              200      6    9.301994    0.1976   9.553455   0.0000
write             200      6    8.263856    0.2590   13.56062   0.0000
math              200      6     8.32305    0.2306   11.62587   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
read         |
  _Ifemale_1 |  -.8645308   .7076023    -1.22   0.223    -2.260112    .5310502
    _Iprog_2 |   5.410261   .8822916     6.13   0.000     3.670146    7.150376
    _Iprog_3 |  -4.603099   1.039838    -4.43   0.000    -6.653939    -2.55226
   _Ife1Xpr2 |   .7607157   .8822916     0.86   0.390    -.9793993    2.500831
   _Ife1Xpr3 |   1.371777   1.039838     1.32   0.189    -.6790623    3.422617
       _cons |   50.76252   .7076023    71.74   0.000     49.36694     52.1581
-------------+----------------------------------------------------------------
write        |
  _Ifemale_1 |   2.702201   .6286312     4.30   0.000     1.462372     3.94203
    _Iprog_2 |   4.870758   .7838244     6.21   0.000     3.324847     6.41667
    _Iprog_3 |  -4.836331   .9237884    -5.24   0.000    -6.658289   -3.014373
   _Ife1Xpr2 |  -1.217608   .7838244    -1.55   0.122    -2.763519    .3283035
   _Ife1Xpr3 |   1.866237   .9237884     2.02   0.045     .0442793    3.688195
       _cons |   51.23086   .6286312    81.50   0.000     49.99103    52.47068
-------------+----------------------------------------------------------------
math         |
  _Ifemale_1 |   -.323731    .633134    -0.51   0.610    -1.572441    .9249787
    _Iprog_2 |   5.684064   .7894389     7.20   0.000     4.127079    7.241049
    _Iprog_3 |   -4.63014   .9304055    -4.98   0.000    -6.465149   -2.795132
   _Ife1Xpr2 |  -.0332022   .7894389    -0.04   0.966    -1.590187    1.523783
   _Ife1Xpr3 |  -.1327907   .9304055    -0.14   0.887    -1.967799    1.702218
       _cons |   51.08666    .633134    80.69   0.000     49.83795    52.33537
------------------------------------------------------------------------------


/* using  manova (multivariate analysis of variance) */

manova read write math = female prog female*prog

                           Number of obs =     200

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
             ------------+--------------------------------------------------
                   Model | W   0.5808      5    15.0   530.4     7.69 0.0000 a
                         | P   0.4796           15.0   582.0     7.38 0.0000 a
                         | L   0.6206           15.0   572.0     7.89 0.0000 a
                         | R   0.3762            5.0   194.0    14.59 0.0000 u
                         |--------------------------------------------------
                Residual |               194
             ------------+--------------------------------------------------
                  female | W   0.8238      1     3.0   192.0    13.69 0.0000 e
                         | P   0.1762            3.0   192.0    13.69 0.0000 e
                         | L   0.2139            3.0   192.0    13.69 0.0000 e
                         | R   0.2139            3.0   192.0    13.69 0.0000 e
                         |--------------------------------------------------
                    prog | W   0.7305      2     6.0   384.0    10.88 0.0000 e
                         | P   0.2712            6.0   386.0    10.09 0.0000 a
                         | L   0.3666            6.0   382.0    11.67 0.0000 a
                         | R   0.3602            3.0   193.0    23.17 0.0000 u
                         |--------------------------------------------------
             female*prog | W   0.9321      2     6.0   384.0     2.29 0.0347 e
                         | P   0.0691            6.0   386.0     2.30 0.0338 a
                         | L   0.0716            6.0   382.0     2.28 0.0356 a
                         | R   0.0381            3.0   193.0     2.45 0.0646 u
                         |--------------------------------------------------
                Residual |               194
             ------------+--------------------------------------------------
                   Total |               199
             ---------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F
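
The manova syntax above comes from an older Stata release. In Stata 11 and later, interactions in manova are written with factor-variable operators rather than the * operator. A sketch under that assumption:

```stata
* Assumes Stata 11 or later; ## expands to both main effects plus the interaction.
use http://www.gseis.ucla.edu/courses/data/hsb2, clear
manova read write math = female##prog
```

A corresponding mvreg call with i.female##i.prog would use indicator (dummy) coding, so its main-effect coefficients would not match the xi3 results shown earlier, although the fit of each equation should be unchanged.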

Example 3

Here is another example of multivariate regression. By including the corr option we can see how highly the residuals of the two equations are correlated. We also get the Breusch-Pagan test of independence.

mvreg math science = read write,  corr

Equation          Obs  Parms        RMSE    "R-sq"          F        P
----------------------------------------------------------------------
math              200      3    6.555315    0.5153   104.7222   0.0000
science           200      3    7.340989    0.4558   82.49331   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
math         |
        read |   .4169486   .0564838     7.38   0.000     .3055581    .5283391
       write |   .3411219   .0610982     5.58   0.000     .2206314    .4616124
       _cons |   12.86507    2.82162     4.56   0.000      7.30061    18.42952
-------------+----------------------------------------------------------------
science      |
        read |   .4345423   .0632535     6.87   0.000     .3098013    .5592832
       write |   .3153468    .068421     4.61   0.000     .1804151    .4502784
       _cons |   12.51143   3.159799     3.96   0.000     6.280058     18.7428
------------------------------------------------------------------------------

Correlation matrix of residuals:

            math  science
   math   1.0000
science   0.2849   1.0000

Breusch-Pagan test of independence: chi2(1) =    16.230, Pr = 0.0001
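
With only two equations, the Breusch-Pagan statistic reported above is the number of observations times the squared residual correlation, referred to a chi-square distribution with one degree of freedom. A quick check using the rounded values from the output, so agreement is only to about two decimals:

```stata
* n = 200 observations, residual correlation r = 0.2849 (rounded in the output)
display 200 * 0.2849^2

* Upper-tail chi-square probability with 1 degree of freedom
display chi2tail(1, 200 * 0.2849^2)
```

The first line should give about 16.23, in line with the reported 16.230, and the second a probability that rounds to the reported 0.0001.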

The command test read tests whether the coefficient for read is zero in both equations. A more interesting test might be whether the coefficient for read is the same in each equation, that is, whether the effect of read is the same for math as it is for science.

test read

 ( 1)  [math]read = 0
 ( 2)  [science]read = 0

       F(  2,   197) =   39.61
            Prob > F =    0.0000

test [math=science]: read

 ( 1)  [math]read - [science]read = 0

       F(  1,   197) =    0.06
            Prob > F =    0.8067
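
The bracketed syntax above is shorthand. An equivalent way to state the same hypothesis, and to obtain the estimated difference with its standard error, is sketched below. It assumes that test and lincom accept [equation]coefficient references after mvreg, which is how coefficients in multiple-equation models are normally addressed in Stata.

```stata
use http://www.gseis.ucla.edu/courses/data/hsb2, clear
mvreg math science = read write

* Same hypothesis as test [math=science]: read, written out in full
test [math]read = [science]read

* Point estimate and confidence interval for the difference
lincom [math]read - [science]read
```

Because the hypothesis has one degree of freedom, the squared t statistic from lincom should equal the F statistic from test.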

Seemingly Unrelated Regression

Seemingly unrelated regression allows us to estimate multiple models simultaneously while accounting for the correlated errors that arise because the models involve the same observations. This can lead to more efficient estimates of the coefficients and standard errors. By including the corr option with sureg we can also obtain an estimate of the correlation between the errors of the two models. Note that both the coefficient estimates and their standard errors differ from the separate OLS estimates produced by the regress commands below. The bottom of the sureg output provides a Breusch-Pagan test of whether the residuals from the two equations are independent (in this case, the residuals were not independent, chi-square = 6.290, Pr = 0.0121).
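
For readers who want the estimator behind sureg, the usual feasible generalized least squares form of seemingly unrelated regression can be sketched as follows. The notation (M equations, n observations, stacked y and block-diagonal X) is introduced here for illustration and is not taken from the original notes.

```latex
% Assumed notation: equation m is y_m = X_m \beta_m + \varepsilon_m, m = 1, ..., M.
% \hat{\Sigma} is the M x M residual covariance matrix estimated from equation-by-equation OLS.
\[
y = \begin{pmatrix} y_1 \\ \vdots \\ y_M \end{pmatrix},
\qquad
X = \begin{pmatrix} X_1 & & \\ & \ddots & \\ & & X_M \end{pmatrix},
\qquad
\hat{\beta}_{SUR} = \left[ X' (\hat{\Sigma}^{-1} \otimes I_n) X \right]^{-1}
                    X' (\hat{\Sigma}^{-1} \otimes I_n)\, y
\]
```

When every equation has the same regressors, this expression reduces to equation-by-equation OLS, which is why mvreg reproduces the separate regress results in the multivariate regression examples.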

use http://www.gseis.ucla.edu/courses/data/hsb2
 
xi: regress write read female i.prog
i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   195) =   43.58
       Model |  8438.77721     4   2109.6943           Prob > F      =  0.0000
    Residual |  9440.09779   195  48.4107579           R-squared     =  0.4720
-------------+------------------------------           Adj R-squared =  0.4612
       Total |   17878.875   199   89.843593           Root MSE      =  6.9578

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .4912748   .0531607     9.24   0.000     .3864311    .5961185
      female |   5.364957   .9902058     5.42   0.000     3.412069    7.317845
    _Iprog_2 |   1.674342   1.286089     1.30   0.194    -.8620872    4.210771
    _Iprog_3 |  -2.862345   1.442088    -1.98   0.049    -5.706437   -.0182527
       _cons |   24.02837   2.920993     8.23   0.000     18.26758    29.78916
------------------------------------------------------------------------------
 
test read

 ( 1)  read = 0.0

       F(  1,   195) =   85.40
            Prob > F =    0.0000
 
test _Iprog_2 _Iprog_3

 ( 1)  _Iprog_2 = 0.0
 ( 2)  _Iprog_3 = 0.0

       F(  2,   195) =    6.02
            Prob > F =    0.0029
 
xi: regress science math female i.prog
i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   195) =   36.25
       Model |  8318.90574     4  2079.72643           Prob > F      =  0.0000
    Residual |  11188.5943   195  57.3774065           R-squared     =  0.4264
-------------+------------------------------           Adj R-squared =  0.4147
       Total |    19507.50   199  98.0276382           Root MSE      =  7.5748

------------------------------------------------------------------------------
     science |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |   .6954811   .0653359    10.64   0.000     .5666254    .8243368
      female |  -2.113129   1.076644    -1.96   0.051    -4.236491    .0102329
    _Iprog_2 |  -3.271645    1.41948    -2.30   0.022    -6.071149   -.4721421
    _Iprog_3 |  -2.705079   1.574137    -1.72   0.087    -5.809598    .3994395
       _cons |   18.78194   3.526991     5.33   0.000     11.82599    25.73788
------------------------------------------------------------------------------
 
test math

 ( 1)  math = 0.0

       F(  1,   195) =  113.31
            Prob > F =    0.0000
 
test _Iprog_2 _Iprog_3

 ( 1)  _Iprog_2 = 0.0
 ( 2)  _Iprog_3 = 0.0

       F(  2,   195) =    2.84
            Prob > F =    0.0611
 
xi: sureg (write read female i.prog) (science math female i.prog), corr small
i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

Seemingly unrelated regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
write             200      4    6.970941    0.4700      41.20   0.0000
science           200      4    7.587139    0.4246      33.52   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
write        |
        read |   .4456005    .051933     8.58   0.000     .3434969    .5477041
      female |   5.309756   .9777063     5.43   0.000     3.387521     7.23199
    _Iprog_2 |   1.967999    1.26896     1.55   0.122    -.5268599    4.462858
    _Iprog_3 |  -3.024374    1.42369    -2.12   0.034    -5.823442   -.2253068
       _cons |   26.33036   2.858429     9.21   0.000     20.71051    31.95022
-------------+----------------------------------------------------------------
science      |
        math |   .6433571    .063827    10.08   0.000     .5178691    .7688452
      female |  -2.148248   1.063082    -2.02   0.044    -4.238337   -.0581596
    _Iprog_2 |  -2.921167   1.400201    -2.09   0.038    -5.674053   -.1682801
    _Iprog_3 |  -2.892607   1.553968    -1.86   0.063    -5.947811     .162596
       _cons |   21.40802   3.450343     6.20   0.000     14.62442    28.19162
------------------------------------------------------------------------------

Correlation matrix of residuals:

           write  science
  write   1.0000
science   0.1773   1.0000

Breusch-Pagan test of independence: chi2(1) =     6.290, Pr = 0.0121
 
test math

 ( 1)  [science]math = 0.0


       F(  1,   393) =   30.93
            Prob > F =    0.0000
 
test read

 ( 1)  [write]read = 0.0

       F(  1,   393) =   31.75
            Prob > F =    0.0000
 
test _Iprog_2 _Iprog_3

 ( 1)  [write]_Iprog_2 = 0.0
 ( 2)  [science]_Iprog_2 = 0.0
 ( 3)  [write]_Iprog_3 = 0.0
 ( 4)  [science]_Iprog_3 = 0.0

       F(  2,   393) =    8.31
            Prob > F =    0.0003
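
Naming a variable in test after sureg applies the hypothesis to every equation in which that variable appears. To restrict a test to a single equation, or to compare coefficients between equations, the coefficients can be addressed in the [equation]coefficient form. A sketch, assuming the same xi-generated variable names as above:

```stata
use http://www.gseis.ucla.edu/courses/data/hsb2, clear
xi: sureg (write read female i.prog) (science math female i.prog), corr small

* Program type in the write equation only
test [write]_Iprog_2 [write]_Iprog_3

* Is the effect of female the same in both equations?
test [write]female = [science]female
```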

Second Example

The most extreme case of seemingly unrelated regression occurs when the equations have no predictor variables in common.

xi: regress socst i.prog write
i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   46.50
       Model |  9537.34999     3  3179.11666           Prob > F      =  0.0000
    Residual |   13398.845   196  68.3614541           R-squared     =  0.4158
-------------+------------------------------           Adj R-squared =  0.4069
       Total |   22936.195   199  115.257261           Root MSE      =  8.2681

------------------------------------------------------------------------------
       socst |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iprog_2 |   3.302177   1.510935     2.19   0.030     .3223989    6.281954
    _Iprog_3 |  -2.985748   1.727315    -1.73   0.085    -6.392257    .4207608
       write |   .5672562   .0681868     8.32   0.000     .4327823    .7017301
       _cons |   21.48085   3.710919     5.79   0.000     14.16239     28.7993
------------------------------------------------------------------------------

test _Iprog_2 _Iprog_3

 ( 1)  _Iprog_2 = 0.0
 ( 2)  _Iprog_3 = 0.0

       F(  2,   196) =    8.40
            Prob > F =    0.0003

regress science math read

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   90.27
       Model |  9328.73944     2  4664.36972           Prob > F      =  0.0000
    Residual |  10178.7606   197  51.6688353           R-squared     =  0.4782
-------------+------------------------------           Adj R-squared =  0.4729
       Total |    19507.50   199  98.0276382           Root MSE      =  7.1881

------------------------------------------------------------------------------
     science |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |   .4017207   .0725922     5.53   0.000     .2585632    .5448782
        read |   .3654205   .0663299     5.51   0.000     .2346128    .4962282
       _cons |    11.6155   3.054262     3.80   0.000     5.592255    17.63875
------------------------------------------------------------------------------

xi: sureg (socst i.prog write) (science math read), small
i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

Seemingly unrelated regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
socst             200      3    8.268303    0.4158      47.67   0.0000
science           200      2    7.188272    0.4782      92.77   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
socst        |
    _Iprog_2 |   3.177816   1.495345     2.13   0.034     .2379403    6.117692
    _Iprog_3 |  -3.030467   1.709478    -1.77   0.077    -6.391332    .3303987
       write |   .5720854   .0674874     8.48   0.000     .4394039    .7047669
       _cons |   21.30246   3.672902     5.80   0.000     14.08146    28.52345
-------------+----------------------------------------------------------------
science      |
        math |   .4005705   .0720278     5.56   0.000     .2589625    .5421784
        read |   .3708297   .0658133     5.63   0.000     .2414395    .5002198
       _cons |   11.39354   3.030858     3.76   0.000     5.434813    17.35226
------------------------------------------------------------------------------

test _Iprog_2 _Iprog_3

 ( 1)  [socst]_Iprog_2 = 0.0
 ( 2)  [socst]_Iprog_3 = 0.0

       F(  2,   393) =    8.31
            Prob > F =    0.0003

test write=read

 ( 1)  [socst]write - [science]read = 0.0

       F(  1,   393) =    4.54
            Prob > F =    0.0338



Phil Ender, 23apr05, 21may02