Lecture Notes: Introduction

Ed231A
Multivariate Analysis

Instructor: Phil Ender

Moore Hall 3030

(310) 206-3195

Textbook:

Computer-Aided Multivariate Analysis (4th Edition)
by Afifi, Clark and May
Publisher: Chapman & Hall/CRC
Year: 2004
ISBN 1-58488-308-1

You can view textbook examples for this book using several different statistical software packages at the ATS website: Afifi, Clark & May -- Textbook Examples.

Topics Covered by Afifi et al vs Lecture

Textbook                          Lecture
                                  matrix algebra
simple linear regression          simple linear regression
multiple linear regression        multiple linear regression
                                  multivariate multiple regression
                                  Hotelling's T²
                                  multivariate analysis of variance
canonical correlation             canonical correlation
discriminant analysis             discriminant analysis
logistic regression               probit regression
survival analysis
principal components analysis     principal components analysis
factor analysis                   factor analysis
cluster analysis                  cluster analysis
log-linear analysis

Course Organization

No exams

10 Computer Assignments

Programming using either Stata, SAS or R

Note: There will be class the Wednesday before Thanksgiving

Electronic Support

Multivariate Course Webpage

http://www.philender.com/courses/multivariate/

Syllabus

Lecture Notes

Help Sheets

Computer Assignments

ed231a_583244200_ender

Lecture Notes

Lecture notes will be used in class.

Lecture notes will be available on the Multivariate Course Web site.

About Assignments

Write your own programs

It is usually obvious when people copy someone else's program

Make programs general

Include comments & labels

Computers Running Stata

16 Macs in Moore Hall*

20 Macs in GSE&IS Building*

Macs & PCs in CLICC Labs in Powell Library

PCs in Social Sciences Computing Lab**

*May Require Technology Fee
**Social Science students only

Relative Course Difficulty

Let's get started...

What makes a model multivariate?

Is multiple regression multivariate?

The Afifi, Clark & May view of multivariate.

Every model has a

lhs - left hand side, and a
rhs - right hand side
model lhs = rhs

lhs variables are response variables (also called dependent or outcome variables).
rhs variables are predictor or explanatory variables (also called independent variables).

Here are two univariate models.

y = x
y = x1 x2 x3

And two multivariate models.

y1 y2 y3 = x
y1 y2 y3 = x1 x2 x3

For the purposes of this class, multivariate will be taken to mean models with multiple lhs variables.

The concept of right hand side and left hand side equivalence.
There are times when rhs variables and lhs variables can be exchanged and the two models yield the same results.

y1 y2 y3 = x
x = y1 y2 y3

Examples:

/* multivariate anova -- female is a rhs variable */
manova read write math = female

                           Number of obs =     200

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                  female | W   0.8501      1     3.0   196.0    11.52 0.0000 e
                         | P   0.1499            3.0   196.0    11.52 0.0000 e
                         | L   0.1763            3.0   196.0    11.52 0.0000 e
                         | R   0.1763            3.0   196.0    11.52 0.0000 e
                         |--------------------------------------------------
                Residual |               198
              -----------+--------------------------------------------------
                   Total |               199
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* OLS regression -- female is a lhs variable */
/* in SAS: model female = read write math     */
regress female read write math

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   11.52
       Model |  7.43351627     3  2.47783876           Prob > F      =  0.0000
    Residual |  42.1614837   196  .215109611           R-squared     =  0.1499
-------------+------------------------------           Adj R-squared =  0.1369
       Total |      49.595   199  .249221106           Root MSE      =   .4638

------------------------------------------------------------------------------
      female |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |  -.0112975   .0045153    -2.50   0.013    -.0202023   -.0023926
       write |   .0270844   .0046522     5.82   0.000     .0179095    .0362593
        math |  -.0102947   .0050408    -2.04   0.042     -.020236   -.0003535
       _cons |   .2476519   .2099033     1.18   0.239    -.1663071     .661611
------------------------------------------------------------------------------

The role of matrix algebra in multivariate analysis.

Matrix algebra gives us a concise and elegant way in which to represent multivariate models. If you are intimidated by it, please realize that the alternatives to matrix representation are worse.

Consider this univariate multiple regression model:

b = (X'X)^-1X'y

Contrast it with this multivariate multiple regression model:

B = (X'X)^-1X'Y
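
The two formulas can be checked numerically. The following is a minimal sketch, assuming Stata's built-in auto dataset (loaded with sysuse) rather than the course data; the choice of price and headroom as responses and mpg and weight as predictors is purely illustrative. It computes b and B in Mata so they can be compared with the coefficients reported by regress and mvreg.

```stata
sysuse auto, clear

/* univariate: b = (X'X)^-1 X'y, to compare with regress */
regress price mpg weight

mata:
    /* predictors plus a column of ones for the constant */
    X = st_data(., ("mpg", "weight")), J(st_nobs(), 1, 1)
    y = st_data(., "price")
    b = invsym(X'X) * X'y
    b'
end

/* multivariate: B = (X'X)^-1 X'Y, to compare with mvreg */
mvreg price headroom = mpg weight

mata:
    /* Y now has one column per response variable; X is reused from above */
    Y = st_data(., ("price", "headroom"))
    B = invsym(X'X) * X'Y
    B
end
```

The only change between the two computations is that the vector y is replaced by the matrix Y, so each column of B holds the coefficients for one response variable.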

Some Examples of Multivariate Generalization of Univariate Models

These examples are in statistical-package pseudo-code.

Regression:
model       y  = x1            /* simple linear regression */
model       y  = x1 x2 x3      /* multiple linear regression */
model y1 y2 y3 = x1 x2 x3      /* multivariate multiple regression */

Probit Analysis (the z's are binary, 0/1, variables):
model       z  = x1            /* simple probit analysis */
model       z  = x1 x2 x3      /* multiple probit analysis */
model z1 z2 z3 = x1 x2 x3      /* multivariate probit analysis */

Correlation:
model           ry,x           /* Pearson correlation */
model           Ry.x1,x2,x3    /* multiple correlation */
model R_C y1,y2,y3 = x1,x2,x3  /* canonical correlation */

Anova:
model       y  = a             /* one-way anova */
model       y  = a b a*b       /* two-way anova */
model y1 y2 y3 = a             /* one-way multivariate anova (manova) */
model y1 y2 y3 = a b a*b       /* two-way multivariate anova (manova) */
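
As a rough guide, most of the pseudo-code above maps onto Stata commands as sketched below. This assumes Stata's built-in auto dataset (not the course data), and the variable choices are illustrative only. Multivariate probit and the two-way designs are left out of the sketch.

```stata
sysuse auto, clear

/* Regression */
regress price weight                              /* simple linear regression */
regress price weight length mpg                   /* multiple linear regression */
mvreg price headroom trunk = weight length mpg    /* multivariate multiple regression */

/* Probit analysis (foreign is a 0/1 variable) */
probit foreign weight                             /* simple probit analysis */
probit foreign weight mpg                         /* multiple probit analysis */

/* Correlation */
correlate price weight                            /* Pearson correlation */
regress price weight length mpg
display sqrt(e(r2))                               /* multiple correlation */
canon (price headroom trunk) (weight length mpg)  /* canonical correlation */

/* Anova (rep78 is a five-level categorical variable) */
anova price rep78                                 /* one-way anova */
manova price headroom trunk = rep78               /* one-way multivariate anova (manova) */
```

Note that regress, probit and anova take the lhs variable first with no equals sign, while mvreg and manova use an equals sign to separate the lhs and rhs variables, and canon puts each set of variables in parentheses.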

Classifying Multivariate Models

I. Testing effects; discriminating among groups

II. Simplification of variable structure; determining dimensionality; rank reduction

III. Other

Some Multivariate Analogs to Univariate Procedures

anova -> manova

multiple linear regression -> multivariate multiple regression

multiple linear regression -> canonical correlation analysis

To be a well-behaved multivariate analog, the multivariate procedure with one response variable should yield results equivalent to those of the univariate procedure.

Examples:

ttest write, by(female)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
  female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
---------+--------------------------------------------------------------------
combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
---------+--------------------------------------------------------------------
    diff |           -4.869947    1.304191               -7.441835   -2.298059
------------------------------------------------------------------------------
Degrees of freedom: 198

                  Ho: mean(male) - mean(female) = diff = 0

     Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
       t =  -3.7341                t =  -3.7341              t =  -3.7341
   P < t =   0.0001          P > |t| =   0.0002          P > t =   0.9999

hotel write, by(female) notable

2-group Hotelling's T-squared = 13.943308
F test statistic: ((200-1-1)/(200-2)(1)) x 13.943308 = 13.943308

H0: Vectors of means are equal for the two groups
              F(1,198) =   13.9433
       Prob > F(1,198) =    0.0002

display sqrt(r(T2))
3.7340739

anova write prog

                           Number of obs =     200     R-squared     =  0.1776
                           Root MSE      = 8.63918     Adj R-squared =  0.1693

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  3175.69786     2  1587.84893      21.27     0.0000
                         |
                    prog |  3175.69786     2  1587.84893      21.27     0.0000
                         |
                Residual |  14703.1771   197   74.635417   
              -----------+----------------------------------------------------
                   Total |   17878.875   199   89.843593   

manova write = prog

                           Number of obs =     200

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                    prog | W   0.8224      2     2.0   197.0    21.27 0.0000 e
                         | P   0.1776            2.0   197.0    21.27 0.0000 e
                         | L   0.2160            2.0   197.0    21.27 0.0000 e
                         | R   0.2160            2.0   197.0    21.27 0.0000 e
                         |--------------------------------------------------
                Residual |               197
              -----------+--------------------------------------------------
                   Total |               199
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

regress write read female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   77.21
       Model |  7856.32118     2  3928.16059           Prob > F      =  0.0000
    Residual |  10022.5538   197  50.8759077           R-squared     =  0.4394
-------------+------------------------------           Adj R-squared =  0.4337
       Total |   17878.875   199   89.843593           Root MSE      =  7.1327

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
      female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
       _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
------------------------------------------------------------------------------

display sqrt(.4394192130387506) /* multiple correlation */

.66288703

mvreg write = read female

Equation          Obs  Parms        RMSE    "R-sq"          F        P
----------------------------------------------------------------------
write             200      3    7.132735    0.4394   77.21062   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
write        |
        read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
      female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
       _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
------------------------------------------------------------------------------


canon (write) (read female)

Linear combinations for canonical correlation 1        Number of obs =     200
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u            |
       write |    .105501   .0084684    12.46   0.000     .0888016    .1222004
-------------+----------------------------------------------------------------
v            |
        read |    .090063   .0078598    11.46   0.000     .0745639    .1055622
      female |   .8732598   .1614235     5.41   0.000     .5549397     1.19158
------------------------------------------------------------------------------
                                     (Standard errors estimated conditionally)
Canonical correlations:
  0.6629
  
display .66288703^2  /* canonical correlation squared */
  
.43941921

Phil Ender, 12jul07, 30sep05, 24jan05
