Multivariate Analysis
Canonical Correlation Analysis


According to Webster

ca•non•i•cal   | kə'nänikəl |
reduced to the simplest or clearest schema possible.

From the Hacker's Dictionary
The usual or standard state or manner of something. This word has a somewhat more technical meaning in mathematics. Two formulas such as 9 + x and x + 9 are said to be equivalent because they mean the same thing, but the second one is in 'canonical form' because it is written in the usual way, with the highest power of x first.

In the beginning...

Consider two sets of variables:

Construct

Construct the linear combinations:

Such that $r_{zw}$, the correlation between the two linear combinations, is a maximum.

Let

The XY Matrix

Consider a matrix XY made up of p Y's and q X's.

Partitioning the Covariance Matrix

Let S be the XY covariance matrix, thus,

And

Thus, the sum of squared deviation scores can be obtained without transforming the raw scores.
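As a minimal sketch of this idea in Python (the function names and the toy data are illustrative assumptions, not part of the course materials), the deviation sums of squares and cross-products can be accumulated from raw sums alone, because Σ(x - x̄)(y - ȳ) = Σxy - (Σx)(Σy)/n. Dividing by n - 1 then gives the covariance matrix.

```python
def sscp_from_raw(rows):
    """Deviation sums of squares and cross-products from raw scores.

    rows: list of observations, each a list of k raw scores.
    Uses sum(x*y) - sum(x)*sum(y)/n, so no score is ever centered.
    """
    n = len(rows)
    k = len(rows[0])
    sums = [sum(row[j] for row in rows) for j in range(k)]
    raw_cp = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
              for i in range(k)]
    return [[raw_cp[i][j] - sums[i] * sums[j] / n for j in range(k)]
            for i in range(k)]


def covariance_from_raw(rows):
    """Sample covariance matrix: deviation SSCP divided by n - 1."""
    n = len(rows)
    return [[v / (n - 1) for v in row] for row in sscp_from_raw(rows)]


# Toy data (hypothetical): 4 observations on 2 variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 5.0], [4.0, 9.0]]
S = covariance_from_raw(data)
```

With a first block of Y columns followed by X columns, the same matrix S can then be cut into its within-set and between-set blocks.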

Criteria

Choose u and v such that

Compute

Now let

Let $\mu_i^2$ = eigenvalues of A

Let u = eigenvectors of A

Next let

A and B will have the same nonzero eigenvalues.

Let v = eigenvectors of B.
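The definitions of A and B are not reproduced in these notes. One reading that is consistent with the Computational Notes later in this unit, and is only an assumption here, takes the blocks of the partitioned covariance matrix to be $S_{yy}$, $S_{yx}$, $S_{xy}$ and $S_{xx}$ and writes:

```latex
\begin{aligned}
A &= S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}, \\
B &= S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}.
\end{aligned}
% If A u = \mu^2 u, put v = S_{xx}^{-1} S_{xy} u. Then
\begin{aligned}
B v &= S_{xx}^{-1} S_{xy} \left( S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy} \right) u
     = S_{xx}^{-1} S_{xy} A u
     = \mu^2 v .
\end{aligned}
```

Under that reading, every nonzero eigenvalue of A is also an eigenvalue of B, and an eigenvector of B can be obtained (up to scale) from the matching eigenvector of A.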

Computing v

Let

Canonical Correlation

The eigenvalues of A are the squared canonical correlations; therefore, each canonical correlation is the square root of the corresponding eigenvalue.

Computational Notes

Matrix A is not symmetric, and the symeigen command requires a symmetric matrix, so a few additional steps are needed to get the eigenvalues and eigenvectors of A.

1)  C = Syx*inv(Sxx)*Sxy
2)  F = cholesky(inv(Syy))
3)  D = F'*C*F
4)  symeigen W L = D    /* L has eigenvalues  of A */
5)  U = F*W             /* U has eigenvectors of A */

Remember the elements of L are $\mu_i^2$.
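A short derivation of why these steps work may help. It assumes, as steps 1 to 3 imply, that A is $S_{yy}^{-1}$ times C, and that cholesky() returns a lower-triangular F with $F F' = S_{yy}^{-1}$:

```latex
\begin{aligned}
A &= S_{yy}^{-1} C = F F' C, \\
D w &= F' C F \, w = \lambda w \\
\Rightarrow \quad F F' C \, (F w) &= \lambda \, (F w) \\
\Rightarrow \quad A \, (F w) &= \lambda \, (F w).
\end{aligned}
```

So the symmetric matrix D has the same eigenvalues as A, and each column of U = F*W is an eigenvector of A.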

Different Eigenvalues

Each canonical correlation has an eigenvalue related to Wilks' Lambda.
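The relation itself is not shown above. A common convention, assumed in this sketch, is $\lambda_i = r_i^2 / (1 - r_i^2)$, with Wilks' Lambda equal to the product of $1/(1 + \lambda_i)$, which is the same as the product of $(1 - r_i^2)$. The function names are illustrative, and the canonical correlations are the rounded values printed in the Stata example below.

```python
def hypothesis_eigenvalues(canon_corrs):
    """lambda_i = r_i^2 / (1 - r_i^2) for each canonical correlation r_i."""
    return [r * r / (1.0 - r * r) for r in canon_corrs]


def wilks_lambda(canon_corrs, first=0):
    """Wilks' Lambda for correlations first..last.

    Computed as the product of 1 / (1 + lambda_i), which equals the
    product of (1 - r_i^2).
    """
    lam = 1.0
    for ev in hypothesis_eigenvalues(canon_corrs[first:]):
        lam *= 1.0 / (1.0 + ev)
    return lam


r = [0.7165, 0.4906, 0.2668]   # rounded values from the Stata output
eigs = hypothesis_eigenvalues(r)
lambdas = [wilks_lambda(r, k) for k in range(len(r))]
```

Under this assumption, `eigs[0]` agrees with Roy's largest root in the Stata output, and `lambdas` agrees with the three Wilks' lambda statistics, up to rounding of the printed correlations.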

Tests of Significance

Wilks' Lambdas

Compute m = n - 3/2 - (p + q)/2 once, using the full p and q.

The following steps are repeated, subtracting one from both p and q each time, until either is equal to one. Thus,

First time     p=3    q=5
Second time    p=2    q=4
Third time     p=1    q=3

df1 = pq

df2 = ms - pq/2 + 1

Rao's F Approximation

with df1 and df2 degrees of freedom.
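The formulas for s and F are not reproduced above. The sketch below assumes the usual forms, s = sqrt((p²q² - 4)/(p² + q² - 5)), taken as 1 when the denominator is not positive, and F = ((1 - Λ^(1/s)) / Λ^(1/s)) · (df2/df1). The function name `rao_f` is illustrative. With n = 37 and the Wilks' lambdas from the Stata example below, it reproduces the reported df2 and F values up to rounding.

```python
import math


def rao_f(wilks, n, p, q, m=None):
    """Rao's F approximation for a Wilks' Lambda with p and q variables.

    m is computed once from the full p and q and passed in for later steps.
    Returns (F, df1, df2).
    """
    if m is None:
        m = n - 1.5 - (p + q) / 2.0
    denom = p * p + q * q - 5
    # Assumed convention: s = 1 when the denominator is not positive.
    s = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    df1 = p * q
    df2 = m * s - p * q / 2.0 + 1
    root = wilks ** (1.0 / s)
    f = (1.0 - root) / root * df2 / df1
    return f, df1, df2


n, p, q = 37, 3, 5
m = n - 1.5 - (p + q) / 2.0             # computed once: 31.5
wilks = [0.343169, 0.705252, 0.92883]   # from the Stata output
# One is subtracted from both p and q at each step.
results = [rao_f(w, n, p - k, q - k, m) for k, w in enumerate(wilks)]
```

Each entry of `results` is an (F, df1, df2) triple for the tests of correlations 1-3, 2-3 and 3.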

Canonical Redundancy Coefficients

A measure of association between two sets of variables.

This measure is asymmetric:

$R^2_{x.y}$ is the redundancy of set X given set Y

$R^2_{y.x}$ is the redundancy of set Y given set X.

Canonical Redundancy Note

$R_c^2$ is an estimate of the shared variance of two linear combinations of variables and not of the variance of the variables themselves. Thus, even when $R_c^2$ is high, the redundancy of Y, X, or both may be very low.

Although it is always possible to compute both $R^2_{x.y}$ and $R^2_{y.x}$, it is not always the case that both redundancy measures are meaningful. For example, when the Y variables are true dependent variables, $R^2_{y.x}$ is useful while $R^2_{x.y}$ does not make sense.

Redundancy

Each redundancy coefficient is a weighted sum of the squared canonical correlations, where each weight is the proportion of the aggregate variance of the variables in the set accounted for by the corresponding canonical variate of that set.
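As a rough illustration (a sketch, not the canred implementation), the contribution of one canonical variate to the redundancy can be computed as the mean squared loading of the set's variables on their own variate, multiplied by the squared canonical correlation. The loadings and correlation are copied from the Stata example later in this unit; the function names are illustrative.

```python
def variance_extracted(loadings):
    """Proportion of a set's standardized variance accounted for by its
    own canonical variate: the mean of the squared loadings."""
    return sum(a * a for a in loadings) / len(loadings)


def redundancy(loadings, canon_corr):
    """Proportion of the set's variance accounted for by the opposite
    set's variate: variance extracted times the squared correlation."""
    return variance_extracted(loadings) * canon_corr ** 2


u1_loadings = [0.4341, 0.9967, 0.3976]                   # apt, ppvt, rpmt
v1_loadings = [0.6320, 0.3879, 0.6588, 0.9422, 0.8371]   # n, s, ns, na, ss
r1 = 0.7165                                              # first canonical correlation

own_u, red_u = variance_extracted(u1_loadings), redundancy(u1_loadings, r1)
own_v, red_v = variance_extracted(v1_loadings), redundancy(v1_loadings, r1)
```

These four values match the "own variate" and "opposite variate" columns that canred reports for the first canonical correlation, up to rounding. Summing the per-variate redundancies over all canonical variates gives the total redundancy for a set.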

What Canonical Correlation Analysis Does...

Best

Questions concerning the number and nature of mutually independent relations (dimensions) between two sets of variables.

Mediocre

Questions concerning the degree of overlap or redundancy between two sets of variables.

Not Very Well

Questions concerning the similarity between two within-set correlation or covariance matrices.

Stata Example

The canonical correlation procedure was completely rewritten in Stata 9.

use http://www.philender.com/courses/data/timm, clear

canon (apt ppvt rpmt) (n s ns na ss), test(1 2 3)


Linear combinations for canonical correlations         Number of obs =      37
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u1           |
         apt |   .0032264   .0082904     0.39   0.699    -.0135873      .02004
        ppvt |   .0762248   .0152914     4.98   0.000     .0452124    .1072372
        rpmt |   .0141323   .0588196     0.24   0.811    -.1051594    .1334239
-------------+----------------------------------------------------------------
v1           |
           n |  -.0071509    .082705    -0.09   0.932    -.1748843    .1605826
           s |  -.0756585   .0465969    -1.62   0.113    -.1701612    .0188443
          ns |  -.0353218   .0450604    -0.78   0.438    -.1267084    .0560649
          na |   .1579155    .051318     3.08   0.004     .0538377    .2619933
          ss |   .0435563   .0544885     0.80   0.429    -.0669515     .154064
-------------+----------------------------------------------------------------
u2           |
         apt |   -.033176   .0151245    -2.19   0.035      -.06385    -.002502
        ppvt |  -.0016715   .0278969    -0.06   0.953     -.058249     .054906
        rpmt |   .2756473   .1073075     2.57   0.014     .0580176     .493277
-------------+----------------------------------------------------------------
v2           |
           n |   .1599911   .1508828     1.06   0.296    -.1460134    .4659955
           s |    .042124   .0850089     0.50   0.623    -.1302821    .2145302
          ns |   .2381605   .0822059     2.90   0.006     .0714393    .4048817
          na |  -.0594188    .093622    -0.63   0.530    -.2492931    .1304555
          ss |  -.1823911    .099406    -1.83   0.075    -.3839959    .0192137
-------------+----------------------------------------------------------------
u3           |
         apt |   .0358057    .030764     1.16   0.252    -.0265865    .0981979
        ppvt |  -.0482553   .0567435    -0.85   0.401    -.1633363    .0668258
        rpmt |   .2104353   .2182681     0.96   0.341     -.232233    .6531035
-------------+----------------------------------------------------------------
v3           |
           n |   .0992871   .3069021     0.32   0.748    -.5231393    .7217135
           s |   .1746239   .1729119     1.01   0.319    -.1760577    .5253054
          ns |  -.0100806   .1672103    -0.06   0.952    -.3491988    .3290376
          na |  -.2290303   .1904313    -1.20   0.237    -.6152428    .1571822
          ss |   .2019493   .2021962     1.00   0.325    -.2081236    .6120222
------------------------------------------------------------------------------
                                     (Standard errors estimated conditionally)
Canonical correlations:
  0.7165  0.4906  0.2668

----------------------------------------------------------------------------
Tests of significance of all canonical correlations

                         Statistic      df1      df2            F     Prob>F
         Wilks' lambda     .343169       15  80.4576       2.5381     0.0039 a
        Pillai's trace     .825289       15       93       2.3529     0.0066 a
Lawley-Hotelling trace     1.44876       15       83       2.6722     0.0023 a
    Roy's largest root     1.05512        5       31       6.5417     0.0003 u
----------------------------------------------------------------------------
Test of significance of canonical correlations 1-3

                         Statistic      df1      df2            F     Prob>F
         Wilks' lambda     .343169       15  80.4576       2.5381     0.0039 a
----------------------------------------------------------------------------
Test of significance of canonical correlations 2-3

                         Statistic      df1      df2            F     Prob>F
         Wilks' lambda     .705252        8       60       1.4308     0.2025 e
----------------------------------------------------------------------------
Test of significance of canonical correlation 3

                         Statistic      df1      df2            F     Prob>F
         Wilks' lambda      .92883        3       31       0.7918     0.5078 e
----------------------------------------------------------------------------
                            e = exact, a = approximate, u = upper bound on F

canon, stdcoef


Canonical correlation analysis                         Number of obs =      37

Standardized coefficients for the first variable set

                 |        1         2         3 
    -------------+------------------------------
             apt |   0.0713   -0.7332    0.7913 
            ppvt |   0.9548   -0.0209   -0.6044 
            rpmt |   0.0437    0.8531    0.6513 
    --------------------------------------------


Standardized coefficients for the second variable set

                 |        1         2         3 
    -------------+------------------------------
               n |  -0.0211    0.4719    0.2928 
               s |  -0.3835    0.2135    0.8850 
              ns |  -0.2244    1.5132   -0.0640 
              na |   1.1438   -0.4304   -1.6589 
              ss |   0.2774   -1.1618    1.2864 
    --------------------------------------------

Canonical correlations:
  0.7165  0.4906  0.2668
  
estat correlations

Correlations for variable list 1

                 |      apt      ppvt      rpmt 
    -------------+------------------------------
             apt |   1.0000                     
            ppvt |   0.3703    1.0000           
            rpmt |   0.2114    0.3548    1.0000 
    --------------------------------------------

Correlations for variable list 2

                 |        n         s        ns        na        ss 
    -------------+--------------------------------------------------
               n |   1.0000                                         
               s |   0.4007    1.0000                               
              ns |   0.5370    0.3523    1.0000                     
              na |   0.6481    0.6478    0.7136    1.0000           
              ss |   0.6704    0.4252    0.7695    0.7951    1.0000 
    ----------------------------------------------------------------

Correlations between variable lists 1 and 2

                 |      apt      ppvt      rpmt 
    -------------+------------------------------
               n |   0.1860    0.4444    0.3504 
               s |   0.1609    0.2682    0.2386 
              ns |   0.0685    0.4692    0.4388 
              na |   0.2617    0.6720    0.3390 
              ss |   0.3341    0.5876    0.3404 
    --------------------------------------------


estat loadings

Canonical loadings for variable list 1

                 |        1         2         3 
    -------------+------------------------------
             apt |   0.4341   -0.5606    0.7052 
            ppvt |   0.9967    0.0102   -0.0803 
            rpmt |   0.3976    0.6906    0.6041 
    --------------------------------------------

Canonical loadings for variable list 2

                 |        1         2         3 
    -------------+------------------------------
               n |   0.6320    0.3122    0.4004 
               s |   0.3879    0.1630    0.4521 
              ns |   0.6588    0.6406    0.2112 
              na |   0.9422    0.1697    0.0814 
              ss |   0.8371    0.0675    0.4906 
    --------------------------------------------

Correlation between variable list 1 and canonical variates from list 2

                 |        1         2         3 
    -------------+------------------------------
             apt |   0.3111   -0.2750    0.1881 
            ppvt |   0.7142    0.0050   -0.0214 
            rpmt |   0.2849    0.3388    0.1612 
    --------------------------------------------

Correlation between variable list 2 and canonical variates from list 1

                 |        1         2         3 
    -------------+------------------------------
               n |   0.4529    0.1532    0.1068 
               s |   0.2780    0.0800    0.1206 
              ns |   0.4721    0.3143    0.0563 
              na |   0.6751    0.0833    0.0217 
              ss |   0.5998    0.0331    0.1309 
    --------------------------------------------
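The two cross-correlation tables above follow from the loadings and the canonical correlations: a variable's correlation with the k-th variate of the opposite set is its own-set loading times the k-th canonical correlation. A small check in Python, using the list 1 values printed above (the names are illustrative):

```python
canon_corrs = [0.7165, 0.4906, 0.2668]

# Canonical loadings for variable list 1, copied from the output above.
loadings_list1 = {
    "apt":  [0.4341, -0.5606, 0.7052],
    "ppvt": [0.9967, 0.0102, -0.0803],
    "rpmt": [0.3976, 0.6906, 0.6041],
}


def cross_loadings(loadings, corrs):
    """Scale each own-set loading by the matching canonical correlation."""
    return {name: [a * r for a, r in zip(row, corrs)]
            for name, row in loadings.items()}


cross = cross_loadings(loadings_list1, canon_corrs)
```

The entries of `cross` agree with the table "Correlation between variable list 1 and canonical variates from list 2" up to rounding of the printed values.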

canred 1  /* findit canred */

Canonical redundancy analysis for canonical correlation 1

Canonical correlation coefficient          0.7165
Squared canonical correlation coefficient  0.5134

                                       own    opposite
Proportion of standardized variance  variate   variate 
           of u variables with ...    0.4467    0.2293
           of v variables with ...    0.5145    0.2641

canred 2  /* findit canred */

Canonical redundancy analysis for canonical correlation 2

Canonical correlation coefficient          0.4906
Squared canonical correlation coefficient  0.2407

                                       own    opposite
Proportion of standardized variance  variate   variate 
           of u variables with ...    0.2638    0.0635
           of v variables with ...    0.1136    0.0273


Phil Ender, 2may05, 29Jan98