Multivariate Analysis
More Principal Components


In this unit we will explore some properties of principal components.

We will start with the variables read and write, which correlate about .6. There are two eigenvalues, which are the variances of the two principal components. The larger variance is about four times the smaller.
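The "four times" figure can be worked out by hand. The following is a short derivation, assuming that pca analyzes the correlation matrix of the two variables (consistent with the eigenvalues in the output summing to 2); the symbols R, r, and lambda are notation introduced here, not part of the original notes.

```latex
\[
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix},
\qquad
\det(R - \lambda I) = (1 - \lambda)^2 - r^2 = 0
\;\Longrightarrow\;
\lambda_1 = 1 + |r|, \quad \lambda_2 = 1 - |r| .
\]
\[
\text{For } r > 0:\quad
e_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\quad
e_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix},
\qquad
\tfrac{1}{\sqrt{2}} \approx 0.70711 .
\]
\[
r \approx 0.6:\quad
\lambda_1 \approx 1.6, \quad \lambda_2 \approx 0.4, \quad
\lambda_1 / \lambda_2 \approx 4 .
\]
```

With only two variables, the eigenvectors are the same (up to sign and order) for any nonzero correlation; only the eigenvalues change with r.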

use http://www.gseis.ucla.edu/courses/data/hsb2
 
corr read write
(obs=200)
 
             |     read    write
-------------+------------------
        read |   1.0000
       write |   0.5968   1.0000
 
scatter read write
 

 
pca read write
(obs=200)
 
            (principal components; 2 components retained)
Component    Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        1.59678         1.19355      0.7984         0.7984
     2        0.40322               .      0.2016         1.0000
 
               Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       write |   0.70711   -0.70711
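
One way to check these eigenvalues is to decompose the correlation matrix directly. This is only a sketch for a do-file, assuming the hsb2 data are in memory; the matrix names R, V, and lambda are hypothetical names chosen for illustration.

```stata
* Sketch: reproduce the pca eigenvalues from the correlation matrix.
quietly correlate read write
matrix R = r(C)                  // 2 x 2 correlation matrix left behind by correlate
matrix symeigen V lambda = R     // V holds eigenvectors, lambda holds eigenvalues
matrix list lambda               // compare with the Eigenvalue column above
matrix list V                    // compare with the Eigenvectors table (signs may flip)
```

The sign of an eigenvector is arbitrary, so a column of V may appear multiplied by -1 relative to the pca table.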
       
predict f1 f2
            (based on unrotated principal components)
               Scoring Coefficients
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       write |   0.70711   -0.70711
 
tabstat f1 f2, stat(mean sd var) col(stat)
 
    variable |      mean        sd  variance
-------------+------------------------------
          f1 |  3.56e-09  1.263636  1.596776
          f2 |  2.44e-09  .6349988  .4032235
--------------------------------------------
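
The scores can also be built by hand, which makes it clearer why their variances equal the eigenvalues. The sketch below assumes that predict applies the scoring coefficients to standardized variables, which is consistent with the means of zero and the variances shown above; the names zr, zw, f1_check, and f2_check are hypothetical.

```stata
* Sketch: rebuild the component scores from standardized variables.
egen zr = std(read)                            // standardized read
egen zw = std(write)                           // standardized write
generate f1_check = 0.70711*zr + 0.70711*zw    // first column of coefficients
generate f2_check = 0.70711*zr - 0.70711*zw    // second column of coefficients
tabstat f1_check f2_check, stat(mean sd var) col(stat)
```

If the assumption holds, the variances of f1_check and f2_check should agree with those of f1 and f2 up to rounding in the coefficients.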
 
corr f1 f2
(obs=200)
 
             |       f1       f2
-------------+------------------
          f1 |   1.0000
          f2 |   0.0000   1.0000
 
scatter f1 f2, xline(0) yline(0)
 
Next, we will create a random normal variable, rnorm. Its correlation with read is close to zero, so the two eigenvalues are very nearly equal.
generate rnorm = invnorm(uniform())
 
corr read rnorm
(obs=200)
 
             |     read    rnorm
-------------+------------------
        read |   1.0000
       rnorm |  -0.0539   1.0000
 
scatter read rnorm
 

 
pca read rnorm
(obs=200)
 
            (principal components; 2 components retained)
Component    Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        1.05392         0.10785      0.5270         0.5270
     2        0.94608               .      0.4730         1.0000
 
               Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |  -0.70711    0.70711
       rnorm |   0.70711    0.70711
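
Because rnorm is random, the correlation and eigenvalues will differ from run to run. A minimal sketch of a reproducible version follows, assuming a more recent Stata in which rnormal() is available; the seed value and the variable name rnorm2 are arbitrary choices made for illustration.

```stata
* Sketch: a reproducible random normal variable and the implied eigenvalues.
set seed 12345                   // arbitrary seed so the draw can be repeated
generate rnorm2 = rnormal()      // standard normal draws
quietly correlate read rnorm2
display "eigenvalues: " 1 + abs(r(rho)) " and " 1 - abs(r(rho))
```

For two variables the eigenvalues of the correlation matrix are 1 plus and 1 minus the absolute correlation, so the display line should agree with what pca read rnorm2 reports.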
Now we will create a variable that is highly correlated with read and call it read2. The correlation is about .92, so almost all of the variance falls in the first principal component.
generate read2 = read + 15*uniform()

corr read read2
(obs=200)

             |     read    read2
-------------+------------------
        read |   1.0000
       read2 |   0.9231   1.0000

scatter read read2



pca read read2
(obs=200)
 
            (principal components; 2 components retained)
Component    Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        1.92305         1.84611      0.9615         0.9615
     2        0.07695               .      0.0385         1.0000
 
               Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       read2 |   0.70711   -0.70711
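Finally, we will do two linear transformations of our original variables read and write. The first transformation will create deviation scores and the second transformation will create standard scores. Note that the eigenvalues and eigenvectors are the same in each case, because the correlation between two variables is not changed by subtracting a constant from either one or by dividing either one by a positive constant.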
summarize read
 
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        read |       200       52.23    10.25294         28         76
 
generate dread = read-r(mean)
 
summarize write

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       write |       200      52.775    9.478586         31         67
 
generate dwrite = write-r(mean)
 
egen zread=std(read)
egen zwrite=std(write)
 
pca read write
(obs=200)
 
            (principal components; 2 components retained)
Component    Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        1.59678         1.19355      0.7984         0.7984
     2        0.40322               .      0.2016         1.0000
 
               Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       write |   0.70711   -0.70711

pca dread dwrite
(obs=200)
 
            (principal components; 2 components retained)
Component    Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        1.59678         1.19355      0.7984         0.7984
     2        0.40322               .      0.2016         1.0000

               Eigenvectors
    Variable |      1          2
-------------+---------------------
       dread |   0.70711    0.70711
      dwrite |   0.70711   -0.70711

pca zread zwrite
(obs=200)

            (principal components; 2 components retained)
Component    Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        1.59678         1.19355      0.7984         0.7984
     2        0.40322               .      0.2016         1.0000
 
               Eigenvectors
    Variable |      1          2
-------------+---------------------
       zread |   0.70711    0.70711
      zwrite |   0.70711   -0.70711
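
The three analyses agree because they all start from the same correlation matrix. As a contrast, the sketch below repeats them on the covariance matrix, assuming a Stata version whose pca accepts the covariance option.

```stata
* Sketch: the same three analyses based on covariances instead of correlations.
pca read write, covariance      // eigenvalues are now in squared test-score units
pca dread dwrite, covariance    // same as the line above: centering leaves covariances unchanged
pca zread zwrite, covariance    // covariance of standard scores is the correlation matrix
```

Only the last of these should reproduce the eigenvalues 1.59678 and 0.40322, since rescaling changes covariances but not correlations.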


Phil Ender, 25may02; 29jan98