Multivariate multiple regression is a logical extension of multiple regression that allows for several response (dependent) variables. Multivariate regression estimates the same coefficients and standard errors as one would obtain from separate OLS regressions. In addition, because it is a joint estimator, multivariate regression also estimates the between-equation covariances. This makes it possible to test coefficients across equations.
The matrix formula for multivariate regression is virtually identical to the OLS formula; the only change is that Y is a matrix of response variables rather than a vector.
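As a sketch of that formula, using notation that is assumed here rather than taken from these notes (n observations, k columns in X including the constant, p response variables):

```latex
\begin{aligned}
Y_{n \times p} &= X_{n \times k}\, B_{k \times p} + E_{n \times p} \\
\hat{B} &= (X'X)^{-1} X'Y \\
\hat{\Sigma} &= \frac{(Y - X\hat{B})'(Y - X\hat{B})}{n - k}
\end{aligned}
```

Column j of \hat{B} is (X'X)^{-1}X'y_j, which is the OLS estimate for the j-th response taken alone; this is why the coefficients agree with the separate regressions. The off-diagonal elements of \hat{\Sigma} are the between-equation residual covariances. The n - k divisor is an assumption about the small-sample convention, chosen because it makes each equation's RMSE equal the corresponding OLS root MSE.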
Stata Example
use http://www.gseis.ucla.edu/courses/data/hsb2

xi: regress read female i.prog

i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   14.45
       Model |  3789.28412     3  1263.09471           Prob > F      =  0.0000
    Residual |  17130.1359   196  87.3986524           R-squared     =  0.1811
-------------+------------------------------           Adj R-squared =  0.1686
       Total |    20919.42   199  105.122714           Root MSE      =  9.3487

------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -1.208582   1.327672    -0.91   0.364    -3.826939    1.409774
    _Iprog_2 |    6.42937   1.665893     3.86   0.000     3.143993    9.714746
    _Iprog_3 |  -3.547498   1.921001    -1.85   0.066    -7.335983    .2409862
       _cons |   50.40013   1.563197    32.24   0.000     47.31729    53.48298
------------------------------------------------------------------------------

xi: regress write female i.prog

i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   20.72
       Model |  4304.40272     3  1434.80091           Prob > F      =  0.0000
    Residual |  13574.4723   196  69.2575116           R-squared     =  0.2408
-------------+------------------------------           Adj R-squared =  0.2291
       Total |   17878.875   199   89.843593           Root MSE      =  8.3221

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   4.771211   1.181876     4.04   0.000     2.440385    7.102037
    _Iprog_2 |   4.832929   1.482956     3.26   0.001     1.908331    7.757528
    _Iprog_3 |  -4.605141   1.710049    -2.69   0.008      -7.9776   -1.232683
       _cons |   48.78869   1.391537    35.06   0.000     46.04438      51.533
------------------------------------------------------------------------------

xi: regress math female i.prog

i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   19.56
       Model |  4024.61221     3   1341.5374           Prob > F      =  0.0000
    Residual |  13441.1828   196  68.5774632           R-squared     =  0.2304
-------------+------------------------------           Adj R-squared =  0.2186
       Total |   17465.795   199  87.7678141           Root MSE      =  8.2812

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.6737673   1.176059    -0.57   0.567    -2.993122    1.645587
    _Iprog_2 |   6.723945   1.475657     4.56   0.000      3.81374    9.634149
    _Iprog_3 |   -3.59773   1.701633    -2.11   0.036    -6.953591   -.2418702
       _cons |   50.38156   1.384689    36.38   0.000     47.65076    53.11237
------------------------------------------------------------------------------

xi: mvreg read write math = female i.prog

i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

Equation          Obs  Parms        RMSE    "R-sq"          F        P
----------------------------------------------------------------------
read              200      4    9.348725    0.1811   14.45211   0.0000
write             200      4     8.32211    0.2408    20.7169   0.0000
math              200      4    8.281151    0.2304   19.56237   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
read         |
      female |  -1.208582   1.327672    -0.91   0.364    -3.826939    1.409774
    _Iprog_2 |    6.42937   1.665893     3.86   0.000     3.143993    9.714746
    _Iprog_3 |  -3.547498   1.921001    -1.85   0.066    -7.335983    .2409862
       _cons |   50.40013   1.563197    32.24   0.000     47.31729    53.48298
-------------+----------------------------------------------------------------
write        |
      female |   4.771211   1.181876     4.04   0.000     2.440385    7.102037
    _Iprog_2 |   4.832929   1.482956     3.26   0.001     1.908331    7.757528
    _Iprog_3 |  -4.605141   1.710049    -2.69   0.008      -7.9776   -1.232683
       _cons |   48.78869   1.391537    35.06   0.000     46.04438      51.533
-------------+----------------------------------------------------------------
math         |
      female |  -.6737673   1.176059    -0.57   0.567    -2.993122    1.645587
    _Iprog_2 |   6.723945   1.475657     4.56   0.000      3.81374    9.634149
    _Iprog_3 |   -3.59773   1.701633    -2.11   0.036    -6.953591   -.2418702
       _cons |   50.38156   1.384689    36.38   0.000     47.65076    53.11237
------------------------------------------------------------------------------

test female

 ( 1)  [read]female = 0.0
 ( 2)  [write]female = 0.0
 ( 3)  [math]female = 0.0

       F(  3,   196) =   11.63
            Prob > F =    0.0000

test _Iprog_2 _Iprog_3

 ( 1)  [read]_Iprog_2 = 0.0
 ( 2)  [write]_Iprog_2 = 0.0
 ( 3)  [math]_Iprog_2 = 0.0
 ( 4)  [read]_Iprog_3 = 0.0
 ( 5)  [write]_Iprog_3 = 0.0
 ( 6)  [math]_Iprog_3 = 0.0

       F(  6,   196) =   11.83
            Prob > F =    0.0000
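The between-equation covariance matrix is not printed by default. A minimal sketch of how it might be inspected, together with a test that a coefficient is equal across two equations, is given here. It assumes the hsb2 data are still reachable at the URL used above and that mvreg leaves the residual covariance matrix in e(Sigma), as the Stata documentation describes; treat it as an illustration rather than part of the original session.

```stata
* Sketch only: refit the joint model, then inspect the joint-estimation results
use http://www.gseis.ucla.edu/courses/data/hsb2, clear
xi: mvreg read write math = female i.prog

* residual (between-equation) covariance matrix stored by mvreg
matrix list e(Sigma)

* is the coefficient for female the same in the read and write equations?
test [read]female = [write]female
```

The last command is only possible after joint estimation; separate regress runs do not provide the covariance between coefficients from different equations.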
The same model can be run using the manova command to obtain the multivariate tests.
manova read write math = female prog

                  Number of obs =     200

                  W = Wilks' lambda      L = Lawley-Hotelling trace
                  P = Pillai's trace     R = Roy's largest root

     Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
 -----------+--------------------------------------------------
      Model | W   0.6231      3     9.0   472.3    11.26 0.0000 a
            | P   0.4170            9.0   588.0    10.55 0.0000 a
            | L   0.5406            9.0   578.0    11.57 0.0000 a
            | R   0.3642            3.0   196.0    23.79 0.0000 u
            |--------------------------------------------------
   Residual |               196
 -----------+--------------------------------------------------
     female | W   0.8489      1     3.0   194.0    11.51 0.0000 e
            | P   0.1511            3.0   194.0    11.51 0.0000 e
            | L   0.1780            3.0   194.0    11.51 0.0000 e
            | R   0.1780            3.0   194.0    11.51 0.0000 e
            |--------------------------------------------------
       prog | W   0.7329      2     6.0   388.0    10.87 0.0000 e
            | P   0.2686            6.0   390.0    10.08 0.0000 a
            | L   0.3623            6.0   386.0    11.65 0.0000 a
            | R   0.3564            3.0   195.0    23.16 0.0000 u
            |--------------------------------------------------
   Residual |               196
 -----------+--------------------------------------------------
      Total |               199
 --------------------------------------------------------------
            e = exact, a = approximate, u = upper bound on F

Example 2
Next, we will perform an mvreg that is equivalent to a factorial multivariate analysis of variance. Using xi3 with effect coding (the e. prefix) ensures that the main effects are estimated correctly in the presence of the interaction.
xi3: mvreg read write math = e.female*e.prog

e.female          _Ifemale_0-1        (naturally coded; _Ifemale_0 omitted)
e.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

Equation          Obs  Parms        RMSE    "R-sq"          F        P
----------------------------------------------------------------------
read              200      6    9.301994    0.1976   9.553455   0.0000
write             200      6    8.263856    0.2590   13.56062   0.0000
math              200      6     8.32305    0.2306   11.62587   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
read         |
  _Ifemale_1 |  -.8645308   .7076023    -1.22   0.223    -2.260112    .5310502
    _Iprog_2 |   5.410261   .8822916     6.13   0.000     3.670146    7.150376
    _Iprog_3 |  -4.603099   1.039838    -4.43   0.000    -6.653939    -2.55226
   _Ife1Xpr2 |   .7607157   .8822916     0.86   0.390    -.9793993    2.500831
   _Ife1Xpr3 |   1.371777   1.039838     1.32   0.189    -.6790623    3.422617
       _cons |   50.76252   .7076023    71.74   0.000     49.36694     52.1581
-------------+----------------------------------------------------------------
write        |
  _Ifemale_1 |   2.702201   .6286312     4.30   0.000     1.462372     3.94203
    _Iprog_2 |   4.870758   .7838244     6.21   0.000     3.324847     6.41667
    _Iprog_3 |  -4.836331   .9237884    -5.24   0.000    -6.658289   -3.014373
   _Ife1Xpr2 |  -1.217608   .7838244    -1.55   0.122    -2.763519    .3283035
   _Ife1Xpr3 |   1.866237   .9237884     2.02   0.045     .0442793    3.688195
       _cons |   51.23086   .6286312    81.50   0.000     49.99103    52.47068
-------------+----------------------------------------------------------------
math         |
  _Ifemale_1 |   -.323731    .633134    -0.51   0.610    -1.572441    .9249787
    _Iprog_2 |   5.684064   .7894389     7.20   0.000     4.127079    7.241049
    _Iprog_3 |   -4.63014   .9304055    -4.98   0.000    -6.465149   -2.795132
   _Ife1Xpr2 |  -.0332022   .7894389    -0.04   0.966    -1.590187    1.523783
   _Ife1Xpr3 |  -.1327907   .9304055    -0.14   0.887    -1.967799    1.702218
       _cons |   51.08666    .633134    80.69   0.000     49.83795    52.33537
------------------------------------------------------------------------------

/* using manova (multivariate analysis of variance) */

manova read write math = female prog female*prog

                   Number of obs =     200

                   W = Wilks' lambda      L = Lawley-Hotelling trace
                   P = Pillai's trace     R = Roy's largest root

      Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
 ------------+--------------------------------------------------
       Model | W   0.5808      5    15.0   530.4     7.69 0.0000 a
             | P   0.4796           15.0   582.0     7.38 0.0000 a
             | L   0.6206           15.0   572.0     7.89 0.0000 a
             | R   0.3762            5.0   194.0    14.59 0.0000 u
             |--------------------------------------------------
    Residual |               194
 ------------+--------------------------------------------------
      female | W   0.8238      1     3.0   192.0    13.69 0.0000 e
             | P   0.1762            3.0   192.0    13.69 0.0000 e
             | L   0.2139            3.0   192.0    13.69 0.0000 e
             | R   0.2139            3.0   192.0    13.69 0.0000 e
             |--------------------------------------------------
        prog | W   0.7305      2     6.0   384.0    10.88 0.0000 e
             | P   0.2712            6.0   386.0    10.09 0.0000 a
             | L   0.3666            6.0   382.0    11.67 0.0000 a
             | R   0.3602            3.0   193.0    23.17 0.0000 u
             |--------------------------------------------------
 female*prog | W   0.9321      2     6.0   384.0     2.29 0.0347 e
             | P   0.0691            6.0   386.0     2.30 0.0338 a
             | L   0.0716            6.0   382.0     2.28 0.0356 a
             | R   0.0381            3.0   193.0     2.45 0.0646 u
             |--------------------------------------------------
    Residual |               194
 ------------+--------------------------------------------------
       Total |               199
 ---------------------------------------------------------------
             e = exact, a = approximate, u = upper bound on F
Example 3
Here is another example of multivariate regression. By including the corr option we can see how highly the residuals of the two equations are correlated. We also get the Breusch-Pagan test of independence of the residuals.
mvreg math science = read write, corr

Equation          Obs  Parms        RMSE    "R-sq"          F        P
----------------------------------------------------------------------
math              200      3    6.555315    0.5153   104.7222   0.0000
science           200      3    7.340989    0.4558   82.49331   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
math         |
        read |   .4169486   .0564838     7.38   0.000     .3055581    .5283391
       write |   .3411219   .0610982     5.58   0.000     .2206314    .4616124
       _cons |   12.86507    2.82162     4.56   0.000      7.30061    18.42952
-------------+----------------------------------------------------------------
science      |
        read |   .4345423   .0632535     6.87   0.000     .3098013    .5592832
       write |   .3153468    .068421     4.61   0.000     .1804151    .4502784
       _cons |   12.51143   3.159799     3.96   0.000     6.280058     18.7428
------------------------------------------------------------------------------

Correlation matrix of residuals:

            math  science
   math   1.0000
science   0.2849   1.0000

Breusch-Pagan test of independence: chi2(1) =    16.230, Pr = 0.0001

The command test read tests whether the coefficient for read is zero in both equations. A more interesting test is whether the coefficient for read is the same in each equation, that is, whether the effect of read is the same for math as it is for science.

test read

 ( 1)  [math]read = 0
 ( 2)  [science]read = 0

       F(  2,   197) =   39.61
            Prob > F =    0.0000

test [math=science]: read

 ( 1)  [math]read - [science]read = 0

       F(  1,   197) =    0.06
            Prob > F =    0.8067

Seemingly Unrelated Regression
Seemingly unrelated regression allows us to estimate multiple models simultaneously while accounting for the correlated errors that arise because the models involve the same observations. This leads to more efficient estimates of the coefficients and standard errors. By including the corr option with sureg we can also obtain an estimate of the correlation between the errors of the two models. Note that both the coefficient estimates and their standard errors differ from the OLS estimates shown below for the same two equations. The bottom of the sureg output provides a Breusch-Pagan test of whether the residuals from the two equations are independent (in this case the residuals were not independent, chi-square = 6.290, Pr = 0.0121).
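As a rough check on that test, the Breusch-Pagan statistic for independence can be reproduced from the printed residual correlations. The notation here is an assumption added for illustration (n observations, M equations, r_ij the residual correlation between equations i and j); the correlations themselves are the ones reported in the outputs.

```latex
\lambda = n \sum_{i=2}^{M} \sum_{j=1}^{i-1} r_{ij}^{2}
\;\sim\; \chi^{2}\!\left(\tfrac{M(M-1)}{2}\right)
\qquad\text{so for } M = 2:\quad
\lambda = n\, r_{21}^{2} = 200 \times 0.1773^{2} \approx 6.29
```

The same arithmetic applied to the mvreg example above gives 200 x 0.2849^2, or about 16.23, in line with the reported chi2(1) = 16.230. Small differences in the third decimal place come from using the rounded correlations.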
use http://www.gseis.ucla.edu/courses/data/hsb2

xi: regress write read female i.prog

i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   195) =   43.58
       Model |  8438.77721     4   2109.6943           Prob > F      =  0.0000
    Residual |  9440.09779   195  48.4107579           R-squared     =  0.4720
-------------+------------------------------           Adj R-squared =  0.4612
       Total |   17878.875   199   89.843593           Root MSE      =  6.9578

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .4912748   .0531607     9.24   0.000     .3864311    .5961185
      female |   5.364957   .9902058     5.42   0.000     3.412069    7.317845
    _Iprog_2 |   1.674342   1.286089     1.30   0.194    -.8620872    4.210771
    _Iprog_3 |  -2.862345   1.442088    -1.98   0.049    -5.706437   -.0182527
       _cons |   24.02837   2.920993     8.23   0.000     18.26758    29.78916
------------------------------------------------------------------------------

test read

 ( 1)  read = 0.0

       F(  1,   195) =   85.40
            Prob > F =    0.0000

test _Iprog_2 _Iprog_3

 ( 1)  _Iprog_2 = 0.0
 ( 2)  _Iprog_3 = 0.0

       F(  2,   195) =    6.02
            Prob > F =    0.0029

xi: regress science math female i.prog

i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   195) =   36.25
       Model |  8318.90574     4  2079.72643           Prob > F      =  0.0000
    Residual |  11188.5943   195  57.3774065           R-squared     =  0.4264
-------------+------------------------------           Adj R-squared =  0.4147
       Total |    19507.50   199  98.0276382           Root MSE      =  7.5748

------------------------------------------------------------------------------
     science |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |   .6954811   .0653359    10.64   0.000     .5666254    .8243368
      female |  -2.113129   1.076644    -1.96   0.051    -4.236491    .0102329
    _Iprog_2 |  -3.271645    1.41948    -2.30   0.022    -6.071149   -.4721421
    _Iprog_3 |  -2.705079   1.574137    -1.72   0.087    -5.809598    .3994395
       _cons |   18.78194   3.526991     5.33   0.000     11.82599    25.73788
------------------------------------------------------------------------------

test math

 ( 1)  math = 0.0

       F(  1,   195) =  113.31
            Prob > F =    0.0000

test _Iprog_2 _Iprog_3

 ( 1)  _Iprog_2 = 0.0
 ( 2)  _Iprog_3 = 0.0

       F(  2,   195) =    2.84
            Prob > F =    0.0611

xi: sureg (write read female i.prog) (science math female i.prog), corr small

i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

Seemingly unrelated regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
write             200      4    6.970941    0.4700      41.20   0.0000
science           200      4    7.587139    0.4246      33.52   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
write        |
        read |   .4456005    .051933     8.58   0.000     .3434969    .5477041
      female |   5.309756   .9777063     5.43   0.000     3.387521     7.23199
    _Iprog_2 |   1.967999    1.26896     1.55   0.122    -.5268599    4.462858
    _Iprog_3 |  -3.024374    1.42369    -2.12   0.034    -5.823442   -.2253068
       _cons |   26.33036   2.858429     9.21   0.000     20.71051    31.95022
-------------+----------------------------------------------------------------
science      |
        math |   .6433571    .063827    10.08   0.000     .5178691    .7688452
      female |  -2.148248   1.063082    -2.02   0.044    -4.238337   -.0581596
    _Iprog_2 |  -2.921167   1.400201    -2.09   0.038    -5.674053   -.1682801
    _Iprog_3 |  -2.892607   1.553968    -1.86   0.063    -5.947811     .162596
       _cons |   21.40802   3.450343     6.20   0.000     14.62442    28.19162
------------------------------------------------------------------------------

Correlation matrix of residuals:

           write  science
  write   1.0000
science   0.1773   1.0000

Breusch-Pagan test of independence: chi2(1) =     6.290, Pr = 0.0121

test math

 ( 1)  [science]math = 0.0

       F(  1,   393) =   30.93
            Prob > F =    0.0000

test read

 ( 1)  [write]read = 0.0

       F(  1,   393) =   31.75
            Prob > F =    0.0000

test _Iprog_2 _Iprog_3

 ( 1)  [write]_Iprog_2 = 0.0
 ( 2)  [science]_Iprog_2 = 0.0
 ( 3)  [write]_Iprog_3 = 0.0
 ( 4)  [science]_Iprog_3 = 0.0

       F(  2,   393) =    8.31
            Prob > F =    0.0003

Second Example
The extreme case of seemingly unrelated regression occurs when the equations have no predictor variables in common.
xi: regress socst i.prog write

i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   46.50
       Model |  9537.34999     3  3179.11666           Prob > F      =  0.0000
    Residual |   13398.845   196  68.3614541           R-squared     =  0.4158
-------------+------------------------------           Adj R-squared =  0.4069
       Total |   22936.195   199  115.257261           Root MSE      =  8.2681

------------------------------------------------------------------------------
       socst |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iprog_2 |   3.302177   1.510935     2.19   0.030     .3223989    6.281954
    _Iprog_3 |  -2.985748   1.727315    -1.73   0.085    -6.392257    .4207608
       write |   .5672562   .0681868     8.32   0.000     .4327823    .7017301
       _cons |   21.48085   3.710919     5.79   0.000     14.16239     28.7993
------------------------------------------------------------------------------

test _Iprog_2 _Iprog_3

 ( 1)  _Iprog_2 = 0.0
 ( 2)  _Iprog_3 = 0.0

       F(  2,   196) =    8.40
            Prob > F =    0.0003

regress science math read

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   90.27
       Model |  9328.73944     2  4664.36972           Prob > F      =  0.0000
    Residual |  10178.7606   197  51.6688353           R-squared     =  0.4782
-------------+------------------------------           Adj R-squared =  0.4729
       Total |    19507.50   199  98.0276382           Root MSE      =  7.1881

------------------------------------------------------------------------------
     science |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |   .4017207   .0725922     5.53   0.000     .2585632    .5448782
        read |   .3654205   .0663299     5.51   0.000     .2346128    .4962282
       _cons |    11.6155   3.054262     3.80   0.000     5.592255    17.63875
------------------------------------------------------------------------------

xi: sureg (socst i.prog write) (science math read), small

i.prog            _Iprog_1-3          (naturally coded; _Iprog_1 omitted)

Seemingly unrelated regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
socst             200      3    8.268303    0.4158      47.67   0.0000
science           200      2    7.188272    0.4782      92.77   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
socst        |
    _Iprog_2 |   3.177816   1.495345     2.13   0.034     .2379403    6.117692
    _Iprog_3 |  -3.030467   1.709478    -1.77   0.077    -6.391332    .3303987
       write |   .5720854   .0674874     8.48   0.000     .4394039    .7047669
       _cons |   21.30246   3.672902     5.80   0.000     14.08146    28.52345
-------------+----------------------------------------------------------------
science      |
        math |   .4005705   .0720278     5.56   0.000     .2589625    .5421784
        read |   .3708297   .0658133     5.63   0.000     .2414395    .5002198
       _cons |   11.39354   3.030858     3.76   0.000     5.434813    17.35226
------------------------------------------------------------------------------

test _Iprog_2 _Iprog_3

 ( 1)  [socst]_Iprog_2 = 0.0
 ( 2)  [socst]_Iprog_3 = 0.0

       F(  2,   393) =    8.31
            Prob > F =    0.0003

test write=read

 ( 1)  [socst]write - [science]read = 0.0

       F(  1,   393) =    4.54
            Prob > F =    0.0338
Phil Ender, 23apr05, 21may02