What about using dichotomous variables?
Since factor analysis is based on a correlation or covariance matrix, it assumes that the observed indicators are measured continuously, are normally distributed, and are linearly associated with one another. That said, exploratory factor analysis is often used as a data reduction technique with ordered categorical indicators and with dichotomous items scored 0-1. It is not appropriate for unordered categorical indicators with more than two categories (e.g., Religion coded 1 if Protestant, 2 if Catholic, 3 if Jewish, 4 if Other Religion, 5 if no religious preference). Note that the solution will not be optimal, because the marginal distributions of the observed variables generally restrict the maximum value that product-moment correlations can attain to something less than 1.0.
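The ceiling that the marginals place on a product-moment (phi) correlation between two 0-1 items can be worked out by hand. The sketch below uses the standard expression for the maximum phi given the marginals; the proportions p1 and p2 are made up for illustration and do not come from any dataset used on this page.

```stata
* Maximum attainable phi for two 0-1 items, given only their marginals.
* p1 and p2 are hypothetical proportions scoring 1, with p1 >= p2.
scalar p1 = 0.50
scalar p2 = 0.20

* phi is largest when every case scoring 1 on the rarer item
* also scores 1 on the more common item
scalar phimax = sqrt((p2*(1 - p1)) / (p1*(1 - p2)))
display "maximum phi = " phimax
```

With one item split 50/50 and the other 20/80, phi cannot exceed .5 no matter how strongly the two items are related.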
If the dichotomous variables are indicators of underlying continuous latent variables, some researchers recommend using tetrachoric correlations in the factor analysis. Using tetrachoric correlations assumes that the dichotomous measured variables are imperfect measures of underlying continuous latent variables. Factor analysis also assumes that these underlying variables are normally, or at least symmetrically, distributed. In some cases, such as strength of attitudes or opinions measured on a Likert scale, this assumption may be reasonable. In others, such as a nominal variable like gender or race, it clearly does not make sense. So whether the available techniques for factor analyzing categorical variables are "appropriate" depends on what you are willing to believe about the existence of a latent variable and about its relationship to the observed variable that is intended to measure it.
With crudely measured indicators, many researchers favor principal components, which provide an exact mathematical solution. In any case, you should not put any faith in the tests of significance associated with maximum likelihood solutions.
Please note: Sample tetrachoric correlations are computed for one pair of variables at a time because it is numerically intractable to compute them jointly for all variables. Because they are computed pairwise, the resulting correlation matrix may not be positive definite. The best way to avoid this situation is to have a very large sample. A sample size of 500 is good, but 1,000 is even better.
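One way to check whether a matrix of pairwise correlations is positive definite is to look at its smallest eigenvalue. The following is a minimal sketch; the 3 x 3 matrix is hypothetical (it is not computed from hsbfactor), and the names R, X, lambda, and minev are chosen here only for illustration.

```stata
* Hypothetical matrix of pairwise correlations. Each entry is a legal
* correlation, but the three values are mutually inconsistent.
matrix R = (1, .9, -.9 \ .9, 1, .9 \ -.9, .9, 1)

* Eigenvalues of a symmetric matrix, returned largest first in a row vector
matrix symeigen X lambda = R
scalar minev = lambda[1, colsof(lambda)]

display "smallest eigenvalue = " minev
if minev <= 0 {
    display "R is not positive definite"
}
```

A smallest eigenvalue at or below zero signals the problem described above.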
Here is an example with 600 observations. We will factor eight continuous (interval) variables and the same eight variables dichotomized (binary) at their median. Varimax loadings greater than .30 are shown in boldface.
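The binary variables are already stored in the dataset with a b prefix. As a rough sketch of how such a median split could be produced, the loop below builds new 0-1 variables with a hypothetical d prefix. How the original b variables treated values exactly equal to the median is not documented here, so coding ties as 0 is an assumption.

```stata
use hsbfactor, clear

* Median split: 1 if above the variable's median, 0 otherwise.
* The d prefix is hypothetical, chosen to avoid clashing with the
* b-prefixed variables already in the dataset.
foreach v of varlist locus concept mot read write math sci ss {
    quietly summarize `v', detail
    generate byte d`v' = (`v' > r(p50)) if !missing(`v')
}
```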
use hsbfactor
/* with continuous (interval) variables */
correlate locus concept mot career read write math sci ss
(obs=600)
| locus concept mot career read write math sci ss
-------------+---------------------------------------------------------------------------------
locus | 1.0000
concept | 0.1712 1.0000
mot | 0.2451 0.2886 1.0000
career | 0.1156 0.0261 0.0839 1.0000
read | 0.3736 0.0607 0.2106 0.1182 1.0000
write | 0.3589 0.0194 0.2542 0.0830 0.6286 1.0000
math | 0.3373 0.0536 0.1950 0.1412 0.6793 0.6327 1.0000
sci | 0.3246 0.0698 0.1157 0.1206 0.6907 0.5691 0.6495 1.0000
ss | 0.2820 0.0115 0.2007 0.0756 0.5899 0.5852 0.5342 0.5167 1.0000
factor locus concept mot read write math sci ss, ipf factors(3)
(obs=600)
(iterated principal factors; 3 factors retained)
Factor Eigenvalue Difference Proportion Cumulative
------------------------------------------------------------------
1 3.36998 2.70623 0.7904 0.7904
2 0.66375 0.43393 0.1557 0.9461
3 0.22982 0.19956 0.0539 1.0000
4 0.03026 0.01431 0.0071 1.0071
5 0.01594 0.01069 0.0037 1.0108
6 0.00526 0.01102 0.0012 1.0121
7 -0.00577 0.03996 -0.0014 1.0107
8 -0.04572 . -0.0107 1.0000
Factor Loadings
Variable | 1 2 3 Uniqueness
-------------+-------------------------------------------
locus | 0.45184 0.21364 -0.00027 0.75020
concept | 0.11235 0.56244 0.20573 0.62871
mot | 0.30328 0.51324 -0.16866 0.61616
read | 0.83927 -0.07172 0.06727 0.28596
write | 0.78954 -0.03899 -0.22922 0.32256
math | 0.79434 -0.07828 0.03614 0.36158
sci | 0.79106 -0.14349 0.28261 0.27376
ss | 0.69044 -0.07054 -0.14425 0.49751
rotate, horst
(varimax rotation)
Rotated Factor Loadings
Variable | 1 2 3 Uniqueness
-------------+-------------------------------------------
locus | 0.38267 0.30164 -0.11126 0.75020
concept | -0.00095 0.60213 0.09342 0.62871
mot | 0.14523 0.52453 -0.29600 0.61616
read | 0.83303 0.12445 -0.06798 0.28596
write | 0.73829 0.08850 -0.35290 0.32256
math | 0.78733 0.10257 -0.08950 0.36158
sci | 0.83238 0.08647 0.16095 0.27376
ss | 0.66183 0.05321 -0.24828 0.49751
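The Uniqueness column is one minus the sum of squared loadings across the retained factors, which is why it is unchanged by an orthogonal rotation. As a quick check, the sketch below recomputes it for locus from the unrotated and the rotated loadings printed above; the scalar names are arbitrary.

```stata
* Loadings for locus copied from the unrotated and rotated tables above
scalar u_unrot = 1 - (0.45184^2 + 0.21364^2 + (-0.00027)^2)
scalar u_rot   = 1 - (0.38267^2 + 0.30164^2 + (-0.11126)^2)

display "uniqueness from unrotated loadings: " %7.5f u_unrot
display "uniqueness from rotated loadings:   " %7.5f u_rot
```

Both values agree with the 0.75020 reported for locus, up to rounding of the printed loadings.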
/* ordinary factor analysis with dichotomous (binary) variables */
correlate blocus bconcept bmot bread bwrite bmath bsci bss
(obs=600)
| blocus bconcept bmot bread bwrite bmath bsci bss
-------------+------------------------------------------------------------------------
blocus | 1.0000
bconcept | 0.1215 1.0000
bmot | 0.2002 0.1752 1.0000
bread | 0.2468 0.0435 0.2030 1.0000
bwrite | 0.1770 0.0093 0.1804 0.4882 1.0000
bmath | 0.2233 0.0356 0.1682 0.5164 0.4867 1.0000
bsci | 0.2168 0.0599 0.0612 0.5555 0.4110 0.4864 1.0000
bss | 0.1805 0.0503 0.1827 0.4388 0.4128 0.3534 0.3714 1.0000
factor blocus bconcept bmot bread bwrite bmath bsci bss, ipf factors(3)
(obs=600)
(iterated principal factors; 3 factors retained)
Factor Eigenvalue Difference Proportion Cumulative
------------------------------------------------------------------
1 2.52155 2.04495 0.7791 0.7791
2 0.47659 0.23840 0.1473 0.9264
3 0.23820 0.19963 0.0736 1.0000
4 0.03856 0.00594 0.0119 1.0119
5 0.03263 0.02976 0.0101 1.0220
6 0.00286 0.02751 0.0009 1.0229
7 -0.02465 0.02478 -0.0076 1.0153
8 -0.04943 . -0.0153 1.0000
Factor Loadings
Variable | 1 2 3 Uniqueness
-------------+-------------------------------------------
blocus | 0.33350 0.21359 0.12579 0.82734
bconcept | 0.09424 0.28948 0.19617 0.86884
bmot | 0.28537 0.52153 -0.02046 0.64615
bread | 0.75381 -0.03574 0.00444 0.43048
bwrite | 0.67340 -0.02867 -0.28238 0.46597
bmath | 0.67726 -0.04869 -0.05441 0.53599
bsci | 0.72381 -0.26296 0.30413 0.31445
bss | 0.56190 0.03954 -0.09090 0.67445
rotate, horst
(varimax rotation)
Rotated Factor Loadings
Variable | 1 2 3 Uniqueness
-------------+-------------------------------------------
blocus | 0.22068 0.32669 0.13130 0.82734
bconcept | -0.03121 0.35398 0.06993 0.86884
bmot | 0.21216 0.53763 -0.14065 0.64615
bread | 0.67356 0.17573 0.29147 0.43048
bwrite | 0.72803 0.05904 0.02278 0.46597
bmath | 0.63262 0.12248 0.22091 0.53599
bsci | 0.53664 0.06971 0.62666 0.31445
bss | 0.53724 0.15704 0.11077 0.67445
As you can see, the rotated factor loadings for the binary variables differ substantially from those for the continuous variables. Next, we will run the factor analysis in Mplus, which uses tetrachoric correlations in computing the factor solution.
Title: Tetrachoric correlations for binary variables
Data:
File is ..\data\hsbfactor.dat ;
Variable:
Names are
blocus bconcept bmot bread bwrite bmath bsci bss;
Usevariables are
blocus bconcept bmot bread bwrite bmath bsci bss;
Categorical are
blocus bconcept bmot bread bwrite bmath bsci bss;
Analysis:
Type = basic;
SAMPLE TETRACHORIC CORRELATIONS
            BLOCUS  BCONCEPT      BMOT     BREAD    BWRITE     BMATH      BSCI       BSS
BLOCUS
BCONCEPT     0.195
BMOT         0.324     0.285
BREAD        0.378     0.070     0.327
BWRITE       0.275     0.015     0.291     0.694
BMATH        0.344     0.057     0.273     0.725     0.693
BSCI         0.334     0.097     0.100     0.766     0.602     0.692
BSS          0.288     0.083     0.296     0.653     0.618     0.543     0.565
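These tetrachoric values can be roughly reproduced from the ordinary correlations of the binary variables. If two bivariate normal variables with correlation rho are both cut exactly at their medians, the phi coefficient of the resulting 0-1 items equals (2/pi)*arcsin(rho), so rho = sin(pi*phi/2). The sketch below applies this to two phi values taken from the binary correlation table shown earlier; it assumes exact 50/50 splits, which need not hold in these data (for example, because of ties at the median), and the scalar names are arbitrary.

```stata
* Product-moment (phi) correlations copied from the binary correlation table
scalar phi_rw = 0.4882    // bread with bwrite
scalar phi_lc = 0.1215    // blocus with bconcept

* Implied tetrachoric correlation when both items are cut at the median
scalar rho_rw = sin(_pi*phi_rw/2)
scalar rho_lc = sin(_pi*phi_lc/2)

display "bread-bwrite:    " %5.3f rho_rw
display "blocus-bconcept: " %5.3f rho_lc
```

The results, about .69 and .19, sit close to the 0.694 and 0.195 that Mplus reports. The remaining gap is what one would expect if the splits are not exactly even.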
Title: Factor analysis with binary variables
Data:
File is ..\data\hsbfactor.dat ;
Variable:
Names are
blocus bconcept bmot bread bwrite bmath bsci bss;
Usevariables are
blocus bconcept bmot bread bwrite bmath bsci bss;
Categorical are
blocus bconcept bmot bread bwrite bmath bsci bss;
Analysis:
Type = efa 3 4 ;
INPUT READING TERMINATED NORMALLY
Factor analysis with binary variables
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 600
Number of dependent variables 8
Number of independent variables 0
Number of continuous latent variables 0
Observed dependent variables
Binary and ordered categorical (ordinal)
BLOCUS BCONCEPT BMOT BREAD BWRITE BMATH
BSCI BSS
Estimator ULS
Maximum number of iterations 1000
Convergence criterion 0.500D-04
Maximum number of steepest descent iterations 20
Input data file(s)
..\data\hsbfactor.dat
Input data format FREE
RESULTS FOR EXPLORATORY FACTOR ANALYSIS
EIGENVALUES FOR SAMPLE CORRELATION MATRIX
1 2 3 4 5
________ ________ ________ ________ ________
3.974 1.281 0.749 0.722 0.469
EIGENVALUES FOR SAMPLE CORRELATION MATRIX
6 7 8
________ ________ ________
0.356 0.262 0.186
EXPLORATORY ANALYSIS WITH 3 FACTOR(S) :
ROOT MEAN SQUARE RESIDUAL IS 0.0136
VARIMAX ROTATED LOADINGS
1 2 3
________ ________ ________
BLOCUS 0.304 0.408 0.129
BCONCEPT -0.032 0.466 0.095
BMOT 0.276 0.676 -0.243
BREAD 0.832 0.214 0.224
BWRITE 0.857 0.072 -0.031
BMATH 0.780 0.153 0.179
BSCI 0.715 0.100 0.669
BSS 0.690 0.188 0.075
We can do something very similar in Stata, using polychoric (a user-written command by Stas Kolenikov; type findit polychoric to locate it) together with factormat (introduced in Stata 9) to factor a matrix of tetrachoric correlations. When the variables are binary, polychoric produces tetrachoric correlations.
polychoric blocus bconcept bmot bread bwrite bmath bsci bss
Polychoric correlation matrix
blocus bconcept bmot bread bwrite bmath bsci bss
blocus 1
bconcept .19521351 1
bmot .32394358 .28496224 1
bread .37831861 .07019049 .32710018 1
bwrite .27491692 .01501151 .291141 .69446683 1
bmath .34366457 .05746446 .27300471 .72544653 .69332124 1
bsci .33427115 .09668219 .10016842 .76617994 .60225076 .69222132 1
bss .2874693 .0827877 .29619049 .65309799 .61768139 .54247674 .56492181 1
matrix R = r(R)
factormat R, n(600) pcf /* factormat is Stata 9 */
(obs=600)
Factor analysis/correlation Number of obs = 600
Method: principal-component factors Retained factors = 2
Rotation: (unrotated) Number of params = 15
--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 3.97411 2.69304 0.4968 0.4968
Factor2 | 1.28107 0.53179 0.1601 0.6569
Factor3 | 0.74929 0.02711 0.0937 0.7506
Factor4 | 0.72218 0.25349 0.0903 0.8408
Factor5 | 0.46869 0.11275 0.0586 0.8994
Factor6 | 0.35594 0.09368 0.0445 0.9439
Factor7 | 0.26226 0.07579 0.0328 0.9767
Factor8 | 0.18647 . 0.0233 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(28) = 2263.93 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
-------------------------------------------------
Variable | Factor1 Factor2 | Uniqueness
-------------+--------------------+--------------
blocus | 0.5147 0.4141 | 0.5637
bconcept | 0.1656 0.7648 | 0.3877
bmot | 0.4348 0.6347 | 0.4081
bread | 0.8997 -0.1138 | 0.1776
bwrite | 0.8267 -0.1794 | 0.2843
bmath | 0.8480 -0.1473 | 0.2593
bsci | 0.8221 -0.2228 | 0.2744
bss | 0.7778 -0.0732 | 0.3897
-------------------------------------------------
Note: The eigenvalues are the same as for Mplus.
factormat R, n(600) ipf factors(3)
(obs=600)
Factor analysis/correlation Number of obs = 600
Method: iterated principal factors Retained factors = 3
Rotation: (unrotated) Number of params = 21
--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 3.68334 2.92815 0.7719 0.7719
Factor2 | 0.75519 0.42193 0.1583 0.9302
Factor3 | 0.33327 0.28288 0.0698 1.0000
Factor4 | 0.05039 0.00336 0.0106 1.0106
Factor5 | 0.04704 0.04313 0.0099 1.0204
Factor6 | 0.00390 0.04003 0.0008 1.0212
Factor7 | -0.03613 0.02909 -0.0076 1.0137
Factor8 | -0.06522 . -0.0137 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(28) = 2263.93 Prob>chi2 = 0.0000
/* note iterated eigenvalues */
Factor loadings (pattern matrix) and unique variances
-----------------------------------------------------------
Variable | Factor1 Factor2 Factor3 | Uniqueness
-------------+------------------------------+--------------
blocus | 0.4338 0.2539 0.1524 | 0.7242
bconcept | 0.1329 0.3759 0.2595 | 0.7737
bmot | 0.3946 0.6583 -0.0659 | 0.4065
bread | 0.8866 -0.0482 -0.0185 | 0.2112
bwrite | 0.8043 -0.0678 -0.3003 | 0.2584
bmath | 0.8086 -0.0740 -0.0621 | 0.3368
bsci | 0.8544 -0.3215 0.3636 | 0.0344
bss | 0.7105 0.0168 -0.1085 | 0.4831
-----------------------------------------------------------
rotate, horst blanks(.3)
Factor analysis/correlation Number of obs = 600
Method: iterated principal factors Retained factors = 3
Rotation: orthogonal varimax (Horst on) Number of params = 21
--------------------------------------------------------------------------
Factor | Variance Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 3.20244 2.24256 0.6711 0.6711
Factor2 | 0.95988 0.35040 0.2012 0.8723
Factor3 | 0.60948 . 0.1277 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(28) = 2263.93 Prob>chi2 = 0.0000
Rotated factor loadings (pattern matrix) and unique variances
-----------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3 |   Uniqueness
-------------+------------------------------+--------------
      blocus |   0.3048    0.4083           |      0.7242
    bconcept |             0.4652           |      0.7737
        bmot |             0.6761           |      0.4065
       bread |   0.8334                     |      0.2112
      bwrite |   0.8575                     |      0.2584
       bmath |   0.7807                     |      0.3368
        bsci |   0.7192              0.6621 |      0.0344
         bss |   0.6905                     |      0.4831
-----------------------------------------------------------
(blanks represent abs(loading)<.3)
Factor rotation matrix
-----------------------------------------
| Factor1 Factor2 Factor3
-------------+---------------------------
Factor1 | 0.9235 0.2974 0.2424
Factor2 | -0.1711 0.8847 -0.4337
Factor3 | -0.3434 0.3590 0.8679
-----------------------------------------
These results are very close to those using Mplus.
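More recent Stata releases also include a built-in tetrachoric command, which makes the user-written polychoric unnecessary for binary items. The sketch below is an assumption about how the same analysis could be run that way; it was not part of the analysis shown on this page, so no output is reported for it.

```stata
use hsbfactor, clear

* Built-in tetrachoric correlations; posdef adjusts the matrix if the
* pairwise estimates turn out not to be positive semidefinite
tetrachoric blocus bconcept bmot bread bwrite bmath bsci bss, posdef
matrix R = r(Rho)

* Same factoring and rotation as above, applied to the tetrachoric matrix
factormat R, n(600) ipf factors(3)
rotate, horst blanks(.3)
```

The posdef option addresses the non-positive definite case mentioned in the note on pairwise estimation.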
Phil Ender, 16nov05, 26nov03