In this unit we will explore some properties of principal components.
We will start with the variables read and write, which correlate about .6. There are two eigenvalues, which are the variances of the two principal components. The larger variance is about four times the smaller.
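For two variables, the eigenvalues can be worked out by hand from the correlation alone. The short derivation below is a sketch that assumes the analysis is carried out on the correlation matrix; the symbol r for the correlation is introduced here and is not part of the original notes.

```latex
\[
\begin{aligned}
R &= \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \\
\det(R - \lambda I) &= (1 - \lambda)^2 - r^2 = 0, \\
\lambda_1 &= 1 + |r|, \qquad \lambda_2 = 1 - |r|.
\end{aligned}
\]
```

With r = 0.5968 this gives eigenvalues of about 1.5968 and 0.4032, a ratio of roughly 3.96. For any nonzero r the eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2), so every coefficient has absolute value of about 0.70711. The sign of r determines which of the two vectors goes with the larger eigenvalue.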
use http://www.gseis.ucla.edu/courses/data/hsb2

corr read write
(obs=200)

             |     read    write
-------------+------------------
        read |   1.0000
       write |   0.5968   1.0000

scatter read write

pca read write
(obs=200)

            (principal components; 2 components retained)
Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1         1.59678       1.19355       0.7984        0.7984
     2         0.40322             .       0.2016        1.0000

            Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       write |   0.70711   -0.70711

predict f1 f2
(based on unrotated principal components)

        Scoring Coefficients
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       write |   0.70711   -0.70711

tabstat f1 f2, stat(mean sd var) col(stat)

    variable |      mean        sd  variance
-------------+------------------------------
          f1 |  3.56e-09  1.263636  1.596776
          f2 |  2.44e-09  .6349988  .4032235
--------------------------------------------

corr f1 f2
(obs=200)

             |       f1       f2
-------------+------------------
          f1 |   1.0000
          f2 |   0.0000   1.0000

scatter f1 f2, xline(0) yline(0)

Next, we will create a random normal variable rnorm. The correlation is close to zero and the two eigenvalues are very nearly equal.
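Because rnorm is drawn at random, the correlation and eigenvalues will differ a little on every run. The sketch below is one way to make the draw repeatable. It assumes hsb2 is already in memory and that a Stata release providing rnormal() is in use; the seed value is arbitrary. It is meant to be run in place of the generate command in the notes, and it will not reproduce the exact numbers shown there.

```stata
* Assumes hsb2 is already in memory, as in the session above.
* The seed value 12345 is an arbitrary choice, not from the original notes.
set seed 12345

* rnormal() draws from a standard normal distribution; it is the newer
* equivalent of invnorm(uniform()).
generate rnorm = rnormal()

* The correlation with read should be near zero, though not exactly zero.
corr read rnorm
```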
generate rnorm = invnorm(uniform())

corr read rnorm
(obs=200)

             |     read    rnorm
-------------+------------------
        read |   1.0000
       rnorm |  -0.0539   1.0000

scatter read rnorm

pca read rnorm
(obs=200)

            (principal components; 2 components retained)
Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1         1.05392       0.10785       0.5270        0.5270
     2         0.94608             .       0.4730        1.0000

            Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |  -0.70711    0.70711
       rnorm |   0.70711    0.70711

Now we will create a variable that is highly correlated with read and call it read2. The correlation is about .92 and almost all of the variance falls in the first principal component.
generate read2 = read + 15*uniform()

corr read read2
(obs=200)

             |     read    read2
-------------+------------------
        read |   1.0000
       read2 |   0.9231   1.0000

scatter read read2

pca read read2
(obs=200)

            (principal components; 2 components retained)
Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1         1.92305       1.84611       0.9615        0.9615
     2         0.07695             .       0.0385        1.0000

            Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       read2 |   0.70711   -0.70711

Finally, we will apply two linear transformations to our original variables read and write. The first transformation creates deviation scores and the second creates standard scores. Note that the eigenvalues and eigenvectors are the same in each case. This is because pca analyzes the correlation matrix by default, and correlations are unchanged when a variable is shifted by a constant or rescaled by a positive constant.
summarize read

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        read |     200       52.23   10.25294         28         76

generate dread = read-r(mean)

summarize write

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       write |     200      52.775   9.478586         31         67

generate dwrite = write-r(mean)

egen zread=std(read)
egen zwrite=std(write)

pca read write
(obs=200)

            (principal components; 2 components retained)
Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1         1.59678       1.19355       0.7984        0.7984
     2         0.40322             .       0.2016        1.0000

            Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       write |   0.70711   -0.70711

pca dread dwrite
(obs=200)

            (principal components; 2 components retained)
Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1         1.59678       1.19355       0.7984        0.7984
     2         0.40322             .       0.2016        1.0000

            Eigenvectors
    Variable |      1          2
-------------+---------------------
       dread |   0.70711    0.70711
      dwrite |   0.70711   -0.70711

pca zread zwrite
(obs=200)

            (principal components; 2 components retained)
Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1         1.59678       1.19355       0.7984        0.7984
     2         0.40322             .       0.2016        1.0000

            Eigenvectors
    Variable |      1          2
-------------+---------------------
       zread |   0.70711    0.70711
      zwrite |   0.70711   -0.70711
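The standard scores created above also make it possible to build the two components by hand, which shows where the eigenvector coefficients of 0.70711 (that is, 1/sqrt(2)) come from. The following is a minimal sketch that assumes the session above is still open, so that zread and zwrite exist. The names f1_hand and f2_hand are hypothetical and do not appear in the original notes.

```stata
* Assumes the session above: hsb2 is in memory and zread, zwrite exist.
* f1_hand and f2_hand are hypothetical names used only for illustration.

* First component: scaled sum of the standard scores.
generate f1_hand = (zread + zwrite)/sqrt(2)

* Second component: scaled difference of the standard scores.
generate f2_hand = (zread - zwrite)/sqrt(2)

* The two variances should be close to the two eigenvalues reported by pca.
tabstat f1_hand f2_hand, stat(mean sd var) col(stat)

* The hand-built components should be essentially uncorrelated.
corr f1_hand f2_hand
```

If r is the correlation between read and write, the variance of the scaled sum works out to 1 + r and the variance of the scaled difference to 1 - r, which is why the tabstat variances are expected to agree with the eigenvalues.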
Phil Ender, 25may02; 29jan98