First Thoughts
Many students and researchers are familiar with collinearity issues through the study of OLS regression. But concerns about collinearity are common to many types of statistical models including categorical and count models. Here are some first thoughts on the matter:
Multicollinearity
Common Indicators of Collinearity
Effects of Collinearity
Checking for Collinearity in Stata
Stata Example Using -collin-
Most statistical software packages offer options with their regression commands for checking collinearity. But because collinearity is a property of the set of predictor variables alone, it is not necessary to run a regression in order to check for it. The -collin- command (findit collin) computes several collinearity diagnostics: the variance inflation factor (VIF), tolerance, and R-squared for each predictor, plus the eigenvalues, condition indices, and determinant of the predictor correlation matrix.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
collin female schtyp read write math science socst
  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
    female      1.25    1.12    0.8027      0.1973
    schtyp      1.02    1.01    0.9819      0.0181
      read      2.45    1.57    0.4080      0.5920
     write      2.52    1.59    0.3962      0.6038
      math      2.28    1.51    0.4378      0.5622
   science      2.12    1.46    0.4717      0.5283
     socst      1.91    1.38    0.5224      0.4776
----------------------------------------------------
  Mean VIF      1.94

                           Cond
        Eigenval          Index
---------------------------------
    1     3.4004          1.0000
    2     1.1347          1.7311
    3     0.9782          1.8644
    4     0.5229          2.5502
    5     0.3577          3.0831
    6     0.3299          3.2104
    7     0.2762          3.5087
---------------------------------
Condition Number          3.5087
Eigenvalues & Cond Index computed from deviation sscp (no intercept)
Det(correlation matrix)    0.0643
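The columns of the first table are tied together by simple identities: tolerance is 1 minus the R-squared from regressing a predictor on the remaining predictors, VIF is the reciprocal of tolerance, and SQRT VIF is its square root. The Python sketch below is not part of the original Stata session, and the names `r_squared` and `diagnostics` are made up for illustration. It rebuilds those columns from the R-squared values printed above.

```python
import math

# R-squared of each predictor on the other predictors, copied from the
# hsbdemo -collin- output above (rounded to four decimals).
r_squared = {
    "female": 0.1973, "schtyp": 0.0181, "read": 0.5920, "write": 0.6038,
    "math": 0.5622, "science": 0.5283, "socst": 0.4776,
}

def diagnostics(r2):
    """Return (VIF, sqrt(VIF), tolerance) for one predictor's R-squared."""
    tolerance = 1.0 - r2
    vif = 1.0 / tolerance
    return vif, math.sqrt(vif), tolerance

table = {name: diagnostics(r2) for name, r2 in r_squared.items()}
for name, (vif, root, tol) in table.items():
    print(f"{name:<8} {vif:5.2f} {root:5.2f} {tol:7.4f}")

# Mean VIF is the plain average of the per-predictor VIFs.
mean_vif = sum(v[0] for v in table.values()) / len(table)
print(f"Mean VIF {mean_vif:.2f}")
```

Because the inputs are rounded, the recomputed values should agree with the Stata table only to the printed precision.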
use http://www.philender.com/courses/data/lahigh, clear
collin mathnce langnce mathpr langpr
  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
   mathnce     24.20    4.92    0.0413      0.9587
   langnce     28.31    5.32    0.0353      0.9647
    mathpr     25.02    5.00    0.0400      0.9600
    langpr     29.09    5.39    0.0344      0.9656
----------------------------------------------------
  Mean VIF     26.65

                           Cond
        Eigenval          Index
---------------------------------
    1     3.3643          1.0000
    2     0.5926          2.3827
    3     0.0287         10.8179
    4     0.0143         15.3294
---------------------------------
Condition Number         15.3294
Eigenvalues & Cond Index computed from deviation sscp (no intercept)
Det(correlation matrix)    0.0008
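The lower table can be read the same way. Each condition index appears to be the square root of the largest eigenvalue divided by the eigenvalue in that row, and the determinant of the correlation matrix equals the product of the eigenvalues. The Python sketch below checks this against the lahigh output, assuming the printed eigenvalues are those of the predictor correlation matrix. The variable names are illustrative only.

```python
import math

# Eigenvalues copied from the lahigh -collin- output above
# (rounded to four decimals, so results are approximate).
eigenvalues = [3.3643, 0.5926, 0.0287, 0.0143]

# Condition index for each row: sqrt(largest eigenvalue / this eigenvalue).
cond_index = [math.sqrt(eigenvalues[0] / ev) for ev in eigenvalues]
condition_number = max(cond_index)

# The determinant of a matrix is the product of its eigenvalues.
determinant = math.prod(eigenvalues)

# Eigenvalues of a correlation matrix sum to the number of variables.
trace = sum(eigenvalues)

for ev, ci in zip(eigenvalues, cond_index):
    print(f"{ev:8.4f} {ci:9.4f}")
print(f"Condition Number {condition_number:.4f}")
print(f"Det {determinant:.4f}  Sum of eigenvalues {trace:.4f}")
```

The two smallest eigenvalues are printed with only three significant digits, so the recomputed indices for rows 3 and 4 differ slightly from Stata's 10.8179 and 15.3294. A determinant this close to zero, together with VIFs above 20, is what marks this set of predictors as highly collinear.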
collin mathnce langnce
  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
   mathnce      1.90    1.38    0.5256      0.4744
   langnce      1.90    1.38    0.5256      0.4744
----------------------------------------------------
  Mean VIF      1.90

                           Cond
        Eigenval          Index
---------------------------------
    1     1.6888          1.0000
    2     0.3112          2.3295
---------------------------------
Condition Number          2.3295
Eigenvalues & Cond Index computed from deviation sscp (no intercept)
Det(correlation matrix)    0.5256
Computational Examples
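The following computational examples show some of the effects of high collinearity on standardized regression coefficients. Each example gives the correlations among three predictors (1, 2, 3) and the criterion Y, followed by the squared multiple correlation and the standardized coefficients with their standard errors and F tests. Examples A and B also report the determinant of the predictor correlation matrix. In Examples A and C the correlation between predictors 2 and 3 is low (.10); in Examples B and D it is high (.85).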
Example A
      1      2      3      Y
1     -     .20    .20    .50
2            -     .10    .50
3                   -     .50
Y                          -
R2 = .56373   Det = .918
      Beta      Std Err    F
1     .34314    .07001     24.025
2     .39216    .06894     32.360
3     .39216    .06894     32.360
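The Beta, R2, and Det values in Example A follow directly from the correlations. The standardized coefficients solve R b = r, where R holds the predictor intercorrelations and r the predictor-criterion correlations, and R2 is the sum of each beta times its correlation with Y. The Python sketch below uses Cramer's rule for the 3x3 case; the function names `det3` and `standardized_betas` are made up for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def standardized_betas(R, r_y):
    """Solve R * beta = r_y by Cramer's rule; return (betas, R2, det(R))."""
    d = det3(R)
    betas = []
    for j in range(3):
        # Replace column j of R with the predictor-criterion correlations.
        Rj = [[r_y[i] if col == j else R[i][col] for col in range(3)]
              for i in range(3)]
        betas.append(det3(Rj) / d)
    # Squared multiple correlation: sum of beta_j * r_yj.
    r2 = sum(b * r for b, r in zip(betas, r_y))
    return betas, r2, d

# Example A: correlations among predictors 1, 2, 3 and with Y.
R_a = [[1.00, 0.20, 0.20],
       [0.20, 1.00, 0.10],
       [0.20, 0.10, 1.00]]
r_y = [0.50, 0.50, 0.50]

betas_a, r2_a, det_a = standardized_betas(R_a, r_y)
print([round(b, 5) for b in betas_a], round(r2_a, 5), round(det_a, 3))
```

Changing the .10 entries in `R_a` to .85 should give the Beta, R2, and Det values of Example B.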
Example B
      1      2      3      Y
1     -     .20    .20    .50
2            -     .85    .50
3                   -     .50
Y                          -
R2 = .43079   Det = .2655
      Beta      Std Err    F
1     .40960    .07872     27.073
2     .22599    .14642      2.382
3     .22599    .14642      2.382
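Comparing Examples A and B, the standard errors of predictors 2 and 3 roughly double when their intercorrelation rises from .10 to .85. One way to see why: the standard error of a standardized coefficient can be written as sqrt((1 - R2) * VIF / (N - k - 1)), where VIF is the matching diagonal element of the inverse of the predictor correlation matrix. The notes do not state the sample size. N = 100 is an assumption here, chosen because it reproduces the printed standard errors. The helper names in this Python sketch are illustrative.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def beta_std_errors(r23, r_y, n_obs=100):
    """Betas, standard errors, and VIFs; n_obs=100 is an assumed sample size."""
    R = [[1.00, 0.20, 0.20], [0.20, 1.00, r23], [0.20, r23, 1.00]]
    Rinv = invert(R)
    k = len(R)
    betas = [sum(Rinv[i][j] * r_y[j] for j in range(k)) for i in range(k)]
    r2 = sum(b * r for b, r in zip(betas, r_y))
    vifs = [Rinv[j][j] for j in range(k)]   # VIF_j = j-th diagonal of R inverse
    df = n_obs - k - 1                      # residual degrees of freedom
    ses = [math.sqrt((1.0 - r2) * v / df) for v in vifs]
    return betas, ses, vifs

betas_a, se_a, vif_a = beta_std_errors(0.10, [0.50, 0.50, 0.50])  # Example A
betas_b, se_b, vif_b = beta_std_errors(0.85, [0.50, 0.50, 0.50])  # Example B
for label, b, s in (("A", betas_a, se_a), ("B", betas_b, se_b)):
    for j in range(3):
        print(label, j + 1, f"{b[j]:.5f} {s[j]:.5f} {(b[j] / s[j]) ** 2:.3f}")
```

The last printed column is the F statistic, computed as the squared ratio of each beta to its standard error.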
Example C
      1      2      3      Y
1     -     .20    .20    .50
2            -     .10    .50
3                   -     .52
Y                          -
R2 = .57983
      Beta      Std Err    F
1     .33922    .06870     24.378
2     .39085    .06765     33.376
3     .41307    .06765     37.279
Example D
      1      2      3      Y
1     -     .20    .20    .50
2            -     .85    .50
3                   -     .52
Y                          -
R2 = .44128
      Beta      Std Err    F
1     .40734    .07799     27.277
2     .16497    .14507      1.293
3     .29831    .14507      4.229
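Examples C and D differ from A and B only in the correlation of predictor 3 with Y, which rises from .50 to .52. Because the betas are a linear function of those correlations (beta = R^-1 r), the resulting shift in the betas is .02 times the third column of R^-1. The Python sketch below computes that column from cofactors for the low-collinearity and high-collinearity cases; the function name is illustrative.

```python
def third_column_of_inverse(r12, r13, r23):
    """Third column of the inverse of a 3x3 correlation matrix (cofactors / det)."""
    det = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    return [
        (r12 * r23 - r13) / det,   # effect on beta 1
        (r12 * r13 - r23) / det,   # effect on beta 2
        (1 - r12 ** 2) / det,      # effect on beta 3
    ]

delta = 0.02  # change in the correlation of predictor 3 with Y

# Low collinearity (r23 = .10): the shift from Example A to Example C.
low = [delta * c for c in third_column_of_inverse(0.20, 0.20, 0.10)]
# High collinearity (r23 = .85): the shift from Example B to Example D.
high = [delta * c for c in third_column_of_inverse(0.20, 0.20, 0.85)]

print("A -> C:", [round(x, 5) for x in low])
print("B -> D:", [round(x, 5) for x in high])
```

With r23 = .10 the same .02 change moves beta 3 by about .02 and barely touches beta 2. With r23 = .85 it pushes beta 2 down by about .06 and beta 3 up by about .07, which is why two predictors with nearly identical correlations with Y end up with quite different coefficients in Example D.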
Remedies
Categorical Data Analysis Course
Phil Ender