First Thoughts
Many students and researchers are familiar with collinearity issues through the study of OLS regression. But concerns about collinearity are common to many types of statistical models, including categorical and count models. Here are some first thoughts on the matter:
Multicollinearity
Common Indicators of Collinearity
Effects of Collinearity
Checking for Collinearity in Stata
Stata Example Using -collin-
Most statistical software packages have options associated with their regression programs that are designed to check for collinearity problems. But since collinearity is a property of the set of predictor variables, it is not necessary to run a regression in order to check for high collinearity. The -collin- command (type findit collin to locate and install it) computes a number of collinearity diagnostics directly from a list of variables.
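For each variable, -collin- reports VIF, SQRT VIF, Tolerance, and R-Squared. These match the standard definitions: VIF for variable j is the j-th diagonal element of the inverse of the correlation matrix of the variables, Tolerance = 1/VIF, and R-Squared = 1 - Tolerance is the R2 from regressing variable j on the other variables (the output below is consistent with this, e.g. 1/0.8027 is about 1.25 for female). The Python sketch below illustrates those definitions. It is not -collin-'s own code. The function names invert and collin_table are hypothetical, and the input is the predictor correlation matrix from Example B in the computational examples.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I]; row-reducing it yields [I | A^-1].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def collin_table(names, corr):
    """Return (name, VIF, sqrt VIF, tolerance, R-squared) for each variable.

    VIF_j is the j-th diagonal element of the inverse correlation matrix.
    Tolerance = 1 / VIF_j, and R-squared = 1 - tolerance is the R^2 from
    regressing variable j on all the other variables.
    """
    inv = invert(corr)
    rows = []
    for j, name in enumerate(names):
        vif = inv[j][j]
        tol = 1.0 / vif
        rows.append((name, vif, math.sqrt(vif), tol, 1.0 - tol))
    return rows


# Predictor correlations from Example B (r12 = r13 = .20, r23 = .85).
corr_b = [[1.00, 0.20, 0.20],
          [0.20, 1.00, 0.85],
          [0.20, 0.85, 1.00]]

print(f"{'Variable':>8} {'VIF':>8} {'SQRT VIF':>9} {'Tolerance':>10} {'R-Squared':>10}")
for name, vif, root, tol, r2 in collin_table(["x1", "x2", "x3"], corr_b):
    print(f"{name:>8} {vif:8.2f} {root:9.2f} {tol:10.4f} {r2:10.4f}")
```

For these correlations the sketch gives a VIF of about 1.05 for predictor 1 and about 3.62 for predictors 2 and 3, so roughly 72% of the variance of predictor 2 is shared with the other two. A singular matrix (perfect collinearity) has no inverse, which the sketch reports as an error.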
. use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

. collin female schtyp read write math science socst

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
  ----------------------------------------------------
    female      1.25    1.12    0.8027      0.1973
    schtyp      1.02    1.01    0.9819      0.0181
      read      2.45    1.57    0.4080      0.5920
     write      2.52    1.59    0.3962      0.6038
      math      2.28    1.51    0.4378      0.5622
   science      2.12    1.46    0.4717      0.5283
     socst      1.91    1.38    0.5224      0.4776
  ----------------------------------------------------
  Mean VIF      1.94

                           Cond
          Eigenval        Index
      ---------------------------------
      1     3.4004        1.0000
      2     1.1347        1.7311
      3     0.9782        1.8644
      4     0.5229        2.5502
      5     0.3577        3.0831
      6     0.3299        3.2104
      7     0.2762        3.5087
      ---------------------------------
      Condition Number    3.5087
      Eigenvalues & Cond Index computed from deviation sscp (no intercept)
      Det(correlation matrix)    0.0643

. use http://www.philender.com/courses/data/lahigh, clear

. collin mathnce langnce mathpr langpr

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
  ----------------------------------------------------
   mathnce     24.20    4.92    0.0413      0.9587
   langnce     28.31    5.32    0.0353      0.9647
    mathpr     25.02    5.00    0.0400      0.9600
    langpr     29.09    5.39    0.0344      0.9656
  ----------------------------------------------------
  Mean VIF     26.65

                           Cond
          Eigenval        Index
      ---------------------------------
      1     3.3643        1.0000
      2     0.5926        2.3827
      3     0.0287       10.8179
      4     0.0143       15.3294
      ---------------------------------
      Condition Number   15.3294
      Eigenvalues & Cond Index computed from deviation sscp (no intercept)
      Det(correlation matrix)    0.0008

. collin mathnce langnce

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
  ----------------------------------------------------
   mathnce      1.90    1.38    0.5256      0.4744
   langnce      1.90    1.38    0.5256      0.4744
  ----------------------------------------------------
  Mean VIF      1.90

                           Cond
          Eigenval        Index
      ---------------------------------
      1     1.6888        1.0000
      2     0.3112        2.3295
      ---------------------------------
      Condition Number    2.3295
      Eigenvalues & Cond Index computed from deviation sscp (no intercept)
      Det(correlation matrix)    0.5256
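The lower half of each -collin- listing reports eigenvalues and condition indices. Under the usual definitions, the condition index for eigenvalue i is sqrt(largest eigenvalue / eigenvalue i), the condition number is the largest of these, and the determinant of a matrix equals the product of its eigenvalues. The eigenvalues in these listings sum to (approximately) the number of variables (7.0000 for hsbdemo, 3.9999 for lahigh), which is consistent with eigenvalues of the correlation matrix. The sketch below, with hypothetical helper names, checks those relationships against the printed eigenvalues. Because the printed eigenvalues are rounded to four decimals, recomputed values can differ in the last digits, most visibly for the smallest lahigh eigenvalue (.0143).

```python
import math


def condition_indices(eigenvalues):
    """Condition index of each eigenvalue: sqrt(largest eigenvalue / this eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


def det_from_eigenvalues(eigenvalues):
    """The determinant of a square matrix equals the product of its eigenvalues."""
    return math.prod(eigenvalues)


# Eigenvalues as printed by -collin- above (rounded to four decimals).
hsb = [3.4004, 1.1347, 0.9782, 0.5229, 0.3577, 0.3299, 0.2762]
lahigh = [3.3643, 0.5926, 0.0287, 0.0143]

# With two variables the eigenvalues of the correlation matrix are 1 + r and 1 - r
# (the sign of r does not change the pair); r is recovered from R-Squared = .4744.
r = math.sqrt(0.4744)
two_vars = [1 + r, 1 - r]

for label, eigs in [("hsbdemo", hsb), ("lahigh", lahigh), ("lahigh, 2 vars", two_vars)]:
    cond = max(condition_indices(eigs))
    print(f"{label:>15}: sum {sum(eigs):.4f}  condition number {cond:.4f}  "
          f"det {det_from_eigenvalues(eigs):.4f}")
```

A determinant near zero and a large condition number point in the same direction: both flag the four lahigh variables as strongly collinear, while the two-variable run has a determinant of .5256 and a condition number of about 2.33.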
Computational Examples
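The following computational examples show some of the effects of high collinearity on standardized regression coefficients. Each example lists the correlations among three predictors (1, 2, 3) and the criterion Y, followed by R2 (and, for Examples A and B, Det, the determinant of the predictor correlation matrix) and each predictor's standardized coefficient (Beta), standard error, and F ratio. Examples A and B differ only in the correlation between predictors 2 and 3 (.10 versus .85); Examples C and D make the same comparison after raising the correlation of predictor 3 with Y from .50 to .52.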
Example A
          1      2      3      Y
    1     -    .20    .20    .50
    2            -    .10    .50
    3                   -    .50
    Y                          -

    R2 = .56373    Det = .918

          Beta     Std Err       F
    1    .34314    .07001    24.025
    2    .39216    .06894    32.360
    3    .39216    .06894    32.360
Example B
          1      2      3      Y
    1     -    .20    .20    .50
    2            -    .85    .50
    3                   -    .50
    Y                          -

    R2 = .43079    Det = .2655

          Beta     Std Err       F
    1    .40960    .07872    27.073
    2    .22599    .14642     2.382
    3    .22599    .14642     2.382
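Examples A and B differ only in r23, and the consequences can be traced by hand. For three predictors, the determinant of the predictor correlation matrix R and the diagonal element of its inverse for predictor 2 (that predictor's VIF) have closed forms, and the standard error of a standardized coefficient is proportional to the square root of that diagonal element. The worked arithmetic below uses these standard results with the tabled values (k = 3 predictors, N cases) to show why the standard error for predictor 2 roughly doubles:

```latex
\begin{aligned}
\det R &= 1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2\,r_{12}\,r_{13}\,r_{23} \\
\text{A: } \det R &= 1 - .04 - .04 - .01 + .008 = .918 \\
\text{B: } \det R &= 1 - .04 - .04 - .7225 + .068 = .2655 \\[4pt]
\mathrm{VIF}_2 = [R^{-1}]_{22} &= \frac{1 - r_{13}^2}{\det R} \\
\text{A: } \mathrm{VIF}_2 &= .96 / .918 \approx 1.046 \\
\text{B: } \mathrm{VIF}_2 &= .96 / .2655 \approx 3.616 \\[4pt]
\mathrm{SE}(\beta_j) &= \sqrt{\frac{(1 - R^2)\,[R^{-1}]_{jj}}{N - k - 1}} \\
\frac{\mathrm{SE}_B(\beta_2)}{\mathrm{SE}_A(\beta_2)} &= \sqrt{\frac{(1 - .43079)(3.616)}{(1 - .56373)(1.046)}} \approx 2.12
\end{aligned}
```

The ratio does not depend on N, and .06894 x 2.12 is about .146, in line with the standard error printed for Example B.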
Example C
          1      2      3      Y
    1     -    .20    .20    .50
    2            -    .10    .50
    3                   -    .52
    Y                          -

    R2 = .57983

          Beta     Std Err       F
    1    .33922    .06870    24.378
    2    .39085    .06765    33.376
    3    .41307    .06765    37.279
Example D
          1      2      3      Y
    1     -    .20    .20    .50
    2            -    .85    .50
    3                   -    .52
    Y                          -

    R2 = .44128

          Beta     Std Err       F
    1    .40734    .07799    27.277
    2    .16497    .14507     1.293
    3    .29831    .14507     4.229
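All four examples can be reproduced from the correlations alone with the standard formulas for standardized OLS: Beta = inverse(Rxx) times rxy, R2 = rxy'Beta, SE_j = sqrt((1 - R2) x [inverse(Rxx)]_jj / (N - k - 1)), and F_j = (Beta_j / SE_j)^2. The sample size is not stated in the examples; N = 100 is an assumption, chosen because it reproduces the printed standard errors. The sketch below is illustrative, and the helper names (invert, standardized_regression, predictor_corr) are hypothetical.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def standardized_regression(rxx, rxy, n):
    """Standardized OLS results computed from correlations alone.

    beta = Rxx^-1 rxy and R^2 = rxy . beta;
    SE_j = sqrt((1 - R^2) * [Rxx^-1]_jj / (n - k - 1)); F_j = (beta_j / SE_j)^2.
    """
    k = len(rxy)
    inv = invert(rxx)
    beta = [sum(inv[i][j] * rxy[j] for j in range(k)) for i in range(k)]
    r2 = sum(b * r for b, r in zip(beta, rxy))
    se = [math.sqrt((1.0 - r2) * inv[j][j] / (n - k - 1)) for j in range(k)]
    f = [(b / s) ** 2 for b, s in zip(beta, se)]
    return beta, r2, se, f


def predictor_corr(r23):
    """Predictor correlation matrix shared by the examples: r12 = r13 = .20."""
    return [[1.00, 0.20, 0.20],
            [0.20, 1.00, r23],
            [0.20, r23, 1.00]]


N = 100  # ASSUMPTION: N is not stated; 100 reproduces the printed standard errors.

examples = {
    "A": (predictor_corr(0.10), [0.50, 0.50, 0.50]),
    "B": (predictor_corr(0.85), [0.50, 0.50, 0.50]),
    "C": (predictor_corr(0.10), [0.50, 0.50, 0.52]),
    "D": (predictor_corr(0.85), [0.50, 0.50, 0.52]),
}

for label, (rxx, rxy) in examples.items():
    beta, r2, se, f = standardized_regression(rxx, rxy, N)
    print(f"Example {label}: R2 = {r2:.5f}")
    for j in range(3):
        print(f"  {j + 1}  Beta {beta[j]:.5f}  Std Err {se[j]:.5f}  F {f[j]:.3f}")
```

The output also makes the sensitivity to small changes visible: the gap between Beta 3 and Beta 2 is about .022 in Example C but about .133 in Example D, for the same .02 change in r3Y. When r12 = r13, subtracting the second normal equation from the third gives (1 - r23)(Beta3 - Beta2) = r3Y - r2Y, so the .02 difference is divided by .90 in C but by only .15 in D.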
Remedies
Categorical Data Analysis Course
Phil Ender