**First Thoughts**

Many students and researchers first meet collinearity in the study of OLS regression, but the concern applies to many types of statistical models, including categorical and count models. Here are some first questions about what counts as collinearity:

- Is it any degree of correlation among the predictors? Or
- Is it a matter of a high degree of intercorrelation?
- What constitutes a high degree of intercorrelation?

**Multicollinearity**

**Common Indicators of Collinearity**

- VIF values are large
  - an individual VIF greater than 10 should be inspected
  - an average VIF greater than 6
- tolerance values are small, close to zero
  - tolerance less than .1
  - tolerance = 1/VIF
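
The two indicators above are tied together by the standard definitions, sketched here for reference. The symbol $R_j^2$ is notation introduced for this illustration: it denotes the R-squared from regressing predictor $j$ on all of the other predictors.

$$
\mathrm{tolerance}_j = 1 - R_j^2, \qquad \mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \frac{1}{\mathrm{tolerance}_j}
$$

For example, $R_j^2 = .90$ gives a tolerance of $.10$ and a VIF of $10$, so the "VIF greater than 10" and "tolerance less than .1" rules of thumb describe the same cutoff.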

**Effects of Collinearity**

**Checking for Collinearity in Stata**
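
One built-in route, sketched here as an illustration rather than as part of the original notes, is the `estat vif` postestimation command, which is run after `regress`. The snippet assumes Stata's bundled `auto` dataset and an arbitrary choice of predictors, used only so that it runs without downloading anything.

```stata
* Illustrative only: the auto dataset and these variables are not from the course notes
sysuse auto, clear

* estat vif is a postestimation command, so fit an OLS model first
regress price mpg weight length

* Report the VIF and 1/VIF (tolerance) for each predictor
estat vif
```

Because `estat vif` depends only on the predictors, the VIFs it reports do not change if a different outcome variable is used with the same right-hand side.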

**Stata Example Using -collin-**

Most statistical software packages have options associated with their regression programs that
are designed to check for collinearity problems. But since collinearity is a property of the
set of predictor variables, it is not necessary to run a regression in order to check for
high collinearity. The user-written **-collin-** command (locate it with **findit collin**)
computes a number of collinearity diagnostics directly from a list of variables.

    use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
    collin female schtyp read write math science socst

      Collinearity Diagnostics

                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
        female      1.25    1.12    0.8027      0.1973
        schtyp      1.02    1.01    0.9819      0.0181
          read      2.45    1.57    0.4080      0.5920
         write      2.52    1.59    0.3962      0.6038
          math      2.28    1.51    0.4378      0.5622
       science      2.12    1.46    0.4717      0.5283
         socst      1.91    1.38    0.5224      0.4776
    ----------------------------------------------------
      Mean VIF      1.94

                               Cond
            Eigenval          Index
    ---------------------------------
        1     3.4004          1.0000
        2     1.1347          1.7311
        3     0.9782          1.8644
        4     0.5229          2.5502
        5     0.3577          3.0831
        6     0.3299          3.2104
        7     0.2762          3.5087
    ---------------------------------
     Condition Number         3.5087
     Eigenvalues & Cond Index computed from deviation sscp (no intercept)
     Det(correlation matrix)    0.0643

    use http://www.philender.com/courses/data/lahigh, clear
    collin mathnce langnce mathpr langpr

      Collinearity Diagnostics

                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
       mathnce     24.20    4.92    0.0413      0.9587
       langnce     28.31    5.32    0.0353      0.9647
        mathpr     25.02    5.00    0.0400      0.9600
        langpr     29.09    5.39    0.0344      0.9656
    ----------------------------------------------------
      Mean VIF     26.65

                               Cond
            Eigenval          Index
    ---------------------------------
        1     3.3643          1.0000
        2     0.5926          2.3827
        3     0.0287         10.8179
        4     0.0143         15.3294
    ---------------------------------
     Condition Number        15.3294
     Eigenvalues & Cond Index computed from deviation sscp (no intercept)
     Det(correlation matrix)    0.0008

    collin mathnce langnce

      Collinearity Diagnostics

                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
       mathnce      1.90    1.38    0.5256      0.4744
       langnce      1.90    1.38    0.5256      0.4744
    ----------------------------------------------------
      Mean VIF      1.90

                               Cond
            Eigenval          Index
    ---------------------------------
        1     1.6888          1.0000
        2     0.3112          2.3295
    ---------------------------------
     Condition Number         2.3295
     Eigenvalues & Cond Index computed from deviation sscp (no intercept)
     Det(correlation matrix)    0.5256
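
The eigenvalue columns in this output can be checked by hand. The relations below are the usual definitions, offered as a worked sketch and not taken from the notes; $\lambda_j$ (the $j$-th eigenvalue) and $r$ (the correlation between the two predictors) are symbols introduced here.

$$
\text{Cond Index}_j = \sqrt{\frac{\lambda_{\max}}{\lambda_j}}, \qquad \det(\text{correlation matrix}) = \prod_j \lambda_j
$$

For the two-variable run (mathnce and langnce):

$$
\sqrt{\frac{1.6888}{0.3112}} \approx 2.3295, \qquad 1.6888 \times 0.3112 \approx 0.5256 = 1 - r^2 = 1 - 0.4744
$$

With only two predictors the determinant equals the tolerance, which is why 0.5256 appears in both places. In the four-variable run the determinant falls to 0.0008, which signals that the predictor set is close to linearly dependent.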

**Computational Examples**

The following computational examples show some of the effects of high collinearity on standardized regression coefficients.

**Example A**

               1      2      3      Y
          1    -     .20    .20    .50
          2           -     .10    .50
          3                  -     .50
          Y                         -

    R2 = .56373    Det = .918

             Beta     Std Err       F
       1    .34314    .07001     24.025
       2    .39216    .06894     32.360
       3    .39216    .06894     32.360

**Example B**

               1      2      3      Y
          1    -     .20    .20    .50
          2           -     .85    .50
          3                  -     .50
          Y                         -

    R2 = .43079    Det = .2655

             Beta     Std Err       F
       1    .40960    .07872     27.073
       2    .22599    .14642      2.382
       3    .22599    .14642      2.382
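
The jump in the standard errors for predictors 2 and 3 between Examples A and B can be traced with the usual formulas for standardized coefficients. This is a worked sketch using standard results, not something stated in the notes; $n$ (sample size) and $k$ (number of predictors, 3 here) are symbols introduced for illustration.

$$
\mathrm{SE}(\hat\beta_j) = \sqrt{\frac{1 - R^2}{n - k - 1}\,\mathrm{VIF}_j}, \qquad \mathrm{VIF}_2 = \frac{1 - r_{13}^2}{\mathrm{Det}}
$$

$$
\mathrm{VIF}_2^{(A)} = \frac{.96}{.918} \approx 1.05, \qquad \mathrm{VIF}_2^{(B)} = \frac{.96}{.2655} \approx 3.62
$$

$$
\frac{\mathrm{SE}_B}{\mathrm{SE}_A} = \sqrt{\frac{(1 - .43079)(3.62)}{(1 - .56373)(1.05)}} \approx 2.12
$$

This agrees with the ratio of the reported standard errors, $.14642 / .06894 \approx 2.12$, and it does not depend on $n$ as long as both examples share the same sample size.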

**Example C**

               1      2      3      Y
          1    -     .20    .20    .50
          2           -     .10    .50
          3                  -     .52
          Y                         -

    R2 = .57983

             Beta     Std Err       F
       1    .33922    .06870     24.378
       2    .39085    .06765     33.376
       3    .41307    .06765     37.279

**Example D**

               1      2      3      Y
          1    -     .20    .20    .50
          2           -     .85    .50
          3                  -     .52
          Y                         -

    R2 = .44128

             Beta     Std Err       F
       1    .40734    .07799     27.277
       2    .16497    .14507      1.293
       3    .29831    .14507      4.229
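
Examples like these can be reproduced from the correlation matrix alone. The sketch below does so for Example B using `corr2data`, which builds a dataset whose sample correlations match a supplied matrix. The variable names `x1`, `x2`, `x3`, `y` and the matrix name `C` are hypothetical, and n = 100 is an assumption: the notes do not state the sample size, although 100 is consistent with the reported standard errors.

```stata
* Sketch: rebuild Example B from its correlation matrix
* n = 100 is an assumption; the notes do not give the sample size
clear
matrix C = (1, .20, .20, .50 \ .20, 1, .85, .50 \ .20, .85, 1, .50 \ .50, .50, .50, 1)

* corr2data creates variables with exactly this correlation structure
* (means 0 and standard deviations 1 by default)
corr2data x1 x2 x3 y, n(100) corr(C)

* With unit-variance variables the raw coefficients are the standardized betas
regress y x1 x2 x3

* VIFs for the three predictors
estat vif
```

Under these assumptions the coefficients and standard errors should match the Beta and Std Err columns of Example B up to rounding. `regress` reports t statistics, whose squares correspond to the F column. Changing the .85 entries to .10 gives Example A.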

**Remedies**

Categorical Data Analysis Course