What about using dichotomous variables?
Since factor analysis is based on a correlation or covariance matrix, it assumes that the observed indicators are measured continuously, are normally distributed, and are linearly related to one another. That said, exploratory factor analysis is often used as a data reduction technique with ordered categorical indicators and with dichotomous items scored 0-1. It would not be appropriate with unordered categorical indicators that have more than two categories (e.g., religion coded 1 if Protestant, 2 if Catholic, 3 if Jewish, 4 if other religion, 5 if no religious preference). Note also that the solution will not be optimal, because the marginal distributions of the observed variables generally restrict the maximum value that a product-moment correlation between them can attain to something less than 1.0.
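To see the restriction concretely: for two 0/1 items scored 1 by proportions p1 ≤ p2, the phi (product-moment) correlation is largest when everyone who scores 1 on the rarer item also scores 1 on the other, which gives phi_max = sqrt(p1(1 - p2) / (p2(1 - p1))). The Python sketch below computes phi from a 2x2 table and this bound; the function names are illustrative, not taken from any package. With p1 = .2 and p2 = .5, even a perfectly nested response pattern yields a phi of only .5.

```python
import math

def phi(n00, n01, n10, n11):
    """Pearson (phi) correlation between two 0/1 items.
    nXY = number of cases scoring X on item 1 and Y on item 2."""
    n = n00 + n01 + n10 + n11
    p1 = (n10 + n11) / n   # proportion scoring 1 on item 1
    p2 = (n01 + n11) / n   # proportion scoring 1 on item 2
    p11 = n11 / n          # proportion scoring 1 on both items
    return (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

def phi_max(p1, p2):
    """Largest phi attainable for two items with these proportions of 1s."""
    lo, hi = sorted((p1, p2))
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))

# 100 cases: 20 score 1 on item 1, 50 score 1 on item 2,
# and all 20 who score 1 on item 1 also score 1 on item 2.
print(phi(50, 30, 0, 20), phi_max(0.2, 0.5))
```

Only items with identical marginal splits (p1 = p2) can reach a phi of 1.0.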
If the dichotomous variables are indicators of underlying continuous latent variables, some researchers recommend factoring tetrachoric correlations instead. Using tetrachoric estimators assumes that the dichotomous measured variables are imperfect measures of underlying continuous latent variables. Factor analysis also assumes underlying normality, or at least symmetrically distributed variables. In some cases, for example the strength of attitudes or opinions measured on a Likert scale, this assumption may be reasonable. For others, such as a nominal variable like gender or race, it clearly doesn't make sense. So whether the available techniques for estimating factor models with categorical variables are "appropriate" depends on what you are willing to believe about whether a latent variable exists and how it relates to the observed variable intended to measure it.
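As a rough illustration of what a tetrachoric correlation is, the Python sketch below estimates one from a 2x2 table. It assumes each item is a threshold split of a standard bivariate normal pair, takes the thresholds from the marginal proportions, and then searches for the correlation whose bivariate normal probability for the (0,0) cell matches the observed proportion. Because a 2x2 table has three free cell probabilities and the model has three parameters, matching the table this way should coincide with the maximum likelihood estimate. The function names, the Simpson's-rule integration, and the bisection search are illustrative choices, not the algorithm used by Mplus or by polychoric.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def bivariate_normal_cdf(h, k, rho, steps=400):
    """P(X <= h, Y <= k) for a standard bivariate normal pair with correlation rho.
    Uses Phi2(h, k; rho) = Phi(h) * Phi(k) + integral from 0 to rho of the
    bivariate normal density at (h, k; r) dr, evaluated with Simpson's rule."""
    def density(r):
        q = 1.0 - r * r
        return math.exp(-(h * h - 2.0 * r * h * k + k * k) / (2.0 * q)) / (2.0 * math.pi * math.sqrt(q))

    width = rho / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(rho)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * density(i * width)
    return STD_NORMAL.cdf(h) * STD_NORMAL.cdf(k) + total * width / 3.0

def tetrachoric(n00, n01, n10, n11, iterations=60):
    """Tetrachoric correlation from a 2x2 table of counts.
    nXY = number of cases scoring X on item 1 and Y on item 2 (0 or 1).
    Both margins of both items must be nonzero, or a threshold is infinite."""
    n = n00 + n01 + n10 + n11
    tau1 = STD_NORMAL.inv_cdf((n00 + n01) / n)  # latent threshold for item 1
    tau2 = STD_NORMAL.inv_cdf((n00 + n10) / n)  # latent threshold for item 2
    target = n00 / n                            # observed share scoring 0 on both
    lo, hi = -0.999, 0.999
    for _ in range(iterations):  # bisection: the (0,0) probability rises with rho
        mid = (lo + hi) / 2.0
        if bivariate_normal_cdf(tau1, tau2, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(tetrachoric(350, 150, 150, 350), 3))
```

For the symmetric table in the usage line the phi correlation is only 0.40, while the tetrachoric estimate is about 0.588: the tetrachoric coefficient describes the assumed latent continuous variables rather than the 0/1 scores themselves.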
With crudely measured indicators, many researchers favor principal components analysis, which provides an exact mathematical solution. Certainly, you should not put any faith in the significance tests associated with maximum likelihood solutions.
Please note: sample tetrachoric correlations are computed one pair of variables at a time, because it is numerically intractable to compute them jointly in a multivariate fashion. Because they are computed pairwise, the resulting correlation matrix may not be positive definite. The best way to avoid this situation is to have a very large sample: a sample size of 500 is good, but 1,000 is even better.
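A quick way to check whether a pairwise-estimated matrix has this problem is to attempt a Cholesky factorization, which exists only for a positive definite matrix. The Python sketch below is a minimal stand-alone check (is_positive_definite is a name chosen for this illustration, not a Stata or Mplus function). The first example shows how pairwise estimates can fail jointly: each correlation is a legal value on its own, but one variable cannot correlate .9 with two others while those two correlate -.9 with each other. The tetrachoric matrix analyzed later in this example does not have this problem; all of its eigenvalues, the smallest being 0.186, are positive.

```python
import math

def is_positive_definite(r):
    """Attempt a Cholesky factorization r = L L^T; it succeeds only when r is
    positive definite (every pivot strictly positive)."""
    n = len(r)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = r[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:          # non-positive pivot: not positive definite
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

impossible = [[1.0, 0.9, 0.9],
              [0.9, 1.0, -0.9],
              [0.9, -0.9, 1.0]]
ordinary = [[1.0, 0.5, 0.3],
            [0.5, 1.0, 0.4],
            [0.3, 0.4, 1.0]]
print(is_positive_definite(impossible), is_positive_definite(ordinary))
```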
Here is an example with 600 observations. We will factor eight continuous (interval) variables and then the same eight variables dichotomized (binary) at their medians. Varimax loadings greater than .30 are shown in boldface.
```
use hsbfactor

/* with continuous (interval) variables */
correlate locus concept mot career read write math sci ss
(obs=600)

             |    locus  concept      mot   career     read    write     math      sci       ss
-------------+---------------------------------------------------------------------------------
       locus |   1.0000
     concept |   0.1712   1.0000
         mot |   0.2451   0.2886   1.0000
      career |   0.1156   0.0261   0.0839   1.0000
        read |   0.3736   0.0607   0.2106   0.1182   1.0000
       write |   0.3589   0.0194   0.2542   0.0830   0.6286   1.0000
        math |   0.3373   0.0536   0.1950   0.1412   0.6793   0.6327   1.0000
         sci |   0.3246   0.0698   0.1157   0.1206   0.6907   0.5691   0.6495   1.0000
          ss |   0.2820   0.0115   0.2007   0.0756   0.5899   0.5852   0.5342   0.5167   1.0000

factor locus concept mot read write math sci ss, ipf factors(3)
(obs=600)

            (iterated principal factors; 3 factors retained)
  Factor     Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        3.36998        2.70623        0.7904        0.7904
     2        0.66375        0.43393        0.1557        0.9461
     3        0.22982        0.19956        0.0539        1.0000
     4        0.03026        0.01431        0.0071        1.0071
     5        0.01594        0.01069        0.0037        1.0108
     6        0.00526        0.01102        0.0012        1.0121
     7       -0.00577        0.03996       -0.0014        1.0107
     8       -0.04572              .       -0.0107        1.0000

                Factor Loadings
    Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
       locus |  0.45184    0.21364   -0.00027    0.75020
     concept |  0.11235    0.56244    0.20573    0.62871
         mot |  0.30328    0.51324   -0.16866    0.61616
        read |  0.83927   -0.07172    0.06727    0.28596
       write |  0.78954   -0.03899   -0.22922    0.32256
        math |  0.79434   -0.07828    0.03614    0.36158
         sci |  0.79106   -0.14349    0.28261    0.27376
          ss |  0.69044   -0.07054   -0.14425    0.49751

rotate, horst
            (varimax rotation)
                Rotated Factor Loadings
    Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
       locus |  0.38267    0.30164   -0.11126    0.75020
     concept | -0.00095    0.60213    0.09342    0.62871
         mot |  0.14523    0.52453   -0.29600    0.61616
        read |  0.83303    0.12445   -0.06798    0.28596
       write |  0.73829    0.08850   -0.35290    0.32256
        math |  0.78733    0.10257   -0.08950    0.36158
         sci |  0.83238    0.08647    0.16095    0.27376
          ss |  0.66183    0.05321   -0.24828    0.49751

/* ordinary factor analysis with dichotomous (binary) variables */
correlate blocus bconcept bmot bread bwrite bmath bsci bss
(obs=600)

             |   blocus bconcept     bmot    bread   bwrite    bmath     bsci      bss
-------------+------------------------------------------------------------------------
      blocus |   1.0000
    bconcept |   0.1215   1.0000
        bmot |   0.2002   0.1752   1.0000
       bread |   0.2468   0.0435   0.2030   1.0000
      bwrite |   0.1770   0.0093   0.1804   0.4882   1.0000
       bmath |   0.2233   0.0356   0.1682   0.5164   0.4867   1.0000
        bsci |   0.2168   0.0599   0.0612   0.5555   0.4110   0.4864   1.0000
         bss |   0.1805   0.0503   0.1827   0.4388   0.4128   0.3534   0.3714   1.0000

factor blocus bconcept bmot bread bwrite bmath bsci bss, ipf factors(3)
(obs=600)

            (iterated principal factors; 3 factors retained)
  Factor     Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        2.52155        2.04495        0.7791        0.7791
     2        0.47659        0.23840        0.1473        0.9264
     3        0.23820        0.19963        0.0736        1.0000
     4        0.03856        0.00594        0.0119        1.0119
     5        0.03263        0.02976        0.0101        1.0220
     6        0.00286        0.02751        0.0009        1.0229
     7       -0.02465        0.02478       -0.0076        1.0153
     8       -0.04943              .       -0.0153        1.0000

                Factor Loadings
    Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
      blocus |  0.33350    0.21359    0.12579    0.82734
    bconcept |  0.09424    0.28948    0.19617    0.86884
        bmot |  0.28537    0.52153   -0.02046    0.64615
       bread |  0.75381   -0.03574    0.00444    0.43048
      bwrite |  0.67340   -0.02867   -0.28238    0.46597
       bmath |  0.67726   -0.04869   -0.05441    0.53599
        bsci |  0.72381   -0.26296    0.30413    0.31445
         bss |  0.56190    0.03954   -0.09090    0.67445

rotate, horst
            (varimax rotation)
                Rotated Factor Loadings
    Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
      blocus |  0.22068    0.32669    0.13130    0.82734
    bconcept | -0.03121    0.35398    0.06993    0.86884
        bmot |  0.21216    0.53763   -0.14065    0.64615
       bread |  0.67356    0.17573    0.29147    0.43048
      bwrite |  0.72803    0.05904    0.02278    0.46597
       bmath |  0.63262    0.12248    0.22091    0.53599
        bsci |  0.53664    0.06971    0.62666    0.31445
         bss |  0.53724    0.15704    0.11077    0.67445
```

As you can see, the rotated factor loadings for the binary variables differ substantially from those for the continuous variables.
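One reason the binary solution looks different is that dichotomizing shrinks the correlations themselves. If two variables are standard bivariate normal with correlation rho and each is split exactly at its median, the standard orthant-probability result P(X > 0, Y > 0) = 1/4 + arcsin(rho)/(2π) implies a phi coefficient of (2/π)·arcsin(rho). The Python sketch below implements that relation and its inverse (the function names are illustrative). It assumes bivariate normality and exact 50/50 splits, so it should not be expected to reproduce the change from the continuous to the binary correlation tables entry by entry. Under those assumptions a latent correlation of 0.6 shows up as a phi of only about 0.41, while inverting the bread/bwrite phi of 0.4882 gives about 0.694, the tetrachoric value Mplus reports for that pair.

```python
import math

def phi_from_rho(rho):
    """Phi expected when two standard bivariate-normal variables with
    correlation rho are each split exactly at their median (zero)."""
    return 2.0 / math.pi * math.asin(rho)

def rho_from_phi(phi):
    """Inverse of phi_from_rho: the latent (tetrachoric) correlation implied
    by phi under the same exact 50/50 split assumption."""
    return math.sin(math.pi * phi / 2.0)

print(round(phi_from_rho(0.6), 3))     # attenuation of a latent correlation of 0.6
print(round(rho_from_phi(0.4882), 3))  # bread/bwrite phi mapped back to the latent scale
```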
Next, we will run the factor analysis in Mplus, which computes the factor solution from tetrachoric correlations when the variables are declared categorical.
```
Title:     Tetrachoric correlations for binary variables
Data:      File is ..\data\hsbfactor.dat ;
Variable:  Names are blocus bconcept bmot bread bwrite bmath bsci bss;
           Usevariables are blocus bconcept bmot bread bwrite bmath bsci bss;
           Categorical are blocus bconcept bmot bread bwrite bmath bsci bss;
Analysis:  Type = basic;

SAMPLE TETRACHORIC CORRELATIONS
             BLOCUS   BCONCEPT   BMOT     BREAD    BWRITE   BMATH    BSCI     BSS
 BLOCUS
 BCONCEPT    0.195
 BMOT        0.324    0.285
 BREAD       0.378    0.070    0.327
 BWRITE      0.275    0.015    0.291    0.694
 BMATH       0.344    0.057    0.273    0.725    0.693
 BSCI        0.334    0.097    0.100    0.766    0.602    0.692
 BSS         0.288    0.083    0.296    0.653    0.618    0.543    0.565
```

```
Title:     Factor analysis with binary variables
Data:      File is ..\data\hsbfactor.dat ;
Variable:  Names are blocus bconcept bmot bread bwrite bmath bsci bss;
           Usevariables are blocus bconcept bmot bread bwrite bmath bsci bss;
           Categorical are blocus bconcept bmot bread bwrite bmath bsci bss;
Analysis:  Type = efa 3 4 ;

INPUT READING TERMINATED NORMALLY

Factor analysis with binary variables

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         600

Number of dependent variables                                    8
Number of independent variables                                  0
Number of continuous latent variables                            0

Observed dependent variables

  Binary and ordered categorical (ordinal)
   BLOCUS   BCONCEPT   BMOT   BREAD   BWRITE   BMATH   BSCI   BSS

Estimator                                                      ULS
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20

Input data file(s)
  ..\data\hsbfactor.dat

Input data format  FREE

RESULTS FOR EXPLORATORY FACTOR ANALYSIS

           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  1             2             3             4             5
              ________      ________      ________      ________      ________
                3.974         1.281         0.749         0.722         0.469

           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  6             7             8
              ________      ________      ________
                0.356         0.262         0.186

EXPLORATORY ANALYSIS WITH 3 FACTOR(S) :

           ROOT MEAN SQUARE RESIDUAL IS 0.0136

           VARIMAX ROTATED LOADINGS
                  1             2             3
              ________      ________      ________
 BLOCUS         0.304         0.408         0.129
 BCONCEPT      -0.032         0.466         0.095
 BMOT           0.276         0.676        -0.243
 BREAD          0.832         0.214         0.224
 BWRITE         0.857         0.072        -0.031
 BMATH          0.780         0.153         0.179
 BSCI           0.715         0.100         0.669
 BSS            0.690         0.188         0.075
```

We can do something very similar in Stata using polychoric, a user-written command by Stas Kolenikov (type findit polychoric to locate it), together with factormat (new in Stata 9) to factor the tetrachoric correlation matrix. When the variables are binary, polychoric produces tetrachoric correlations.
```
polychoric blocus bconcept bmot bread bwrite bmath bsci bss

Polychoric correlation matrix

              blocus   bconcept       bmot      bread     bwrite      bmath       bsci        bss
  blocus           1
bconcept   .19521351          1
    bmot   .32394358  .28496224          1
   bread   .37831861  .07019049  .32710018          1
  bwrite   .27491692  .01501151    .291141  .69446683          1
   bmath   .34366457  .05746446  .27300471  .72544653  .69332124          1
    bsci   .33427115  .09668219  .10016842  .76617994  .60225076  .69222132          1
     bss    .2874693   .0827877  .29619049  .65309799  .61768139  .54247674  .56492181          1

matrix R = r(R)

factormat R, n(600) pcf   /* factormat is Stata 9 */
(obs=600)

Factor analysis/correlation                        Number of obs    =      600
    Method: principal-component factors            Retained factors =        2
    Rotation: (unrotated)                          Number of params =       15

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.97411      2.69304            0.4968       0.4968
        Factor2  |      1.28107      0.53179            0.1601       0.6569
        Factor3  |      0.74929      0.02711            0.0937       0.7506
        Factor4  |      0.72218      0.25349            0.0903       0.8408
        Factor5  |      0.46869      0.11275            0.0586       0.8994
        Factor6  |      0.35594      0.09368            0.0445       0.9439
        Factor7  |      0.26226      0.07579            0.0328       0.9767
        Factor8  |      0.18647            .            0.0233       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(28) = 2263.93 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    -------------------------------------------------
        Variable |  Factor1   Factor2 |   Uniqueness
    -------------+--------------------+--------------
          blocus |   0.5147    0.4141 |      0.5637
        bconcept |   0.1656    0.7648 |      0.3877
            bmot |   0.4348    0.6347 |      0.4081
           bread |   0.8997   -0.1138 |      0.1776
          bwrite |   0.8267   -0.1794 |      0.2843
           bmath |   0.8480   -0.1473 |      0.2593
            bsci |   0.8221   -0.2228 |      0.2744
             bss |   0.7778   -0.0732 |      0.3897
    -------------------------------------------------
```

Note: The eigenvalues are the same as those reported by Mplus.
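With pcf, factormat keeps 1s on the diagonal of R, so the factors are the principal components of the tetrachoric matrix: each loading is the square root of an eigenvalue times the matching eigenvector element, and each uniqueness is 1 minus the row sum of squared loadings. That is why these eigenvalues match the ones Mplus prints for its sample correlation matrix. The Python sketch below repeats the calculation on the polychoric matrix shown above, using a hand-written Jacobi eigen-solver so that only the standard library is needed. Retaining eigenvalues above 1 (min_eigen) is our assumption about how Stata arrived at two retained factors, and eigenvector signs are arbitrary, so a column may come out with the opposite sign from Stata's. The Proportion column is each eigenvalue divided by the sum of all eigenvalues, which is 8 here. In the ipf runs the divisor is instead the sum of the eigenvalues of the reduced matrix, including the negative ones (for the first continuous run, 3.36998/4.26352 ≈ 0.7904), which is why the Cumulative column there rises above 1 before returning to 1.0000.

```python
import math

def jacobi_eigen(matrix, tol=1e-14, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[j]
    is the unit eigenvector belonging to values[j]."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that element (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # columns p and q: A <- A P, V <- V P
                    for k in range(n):
                        xp, xq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * xp - s * xq, s * xp + c * xq
                for k in range(n):  # rows p and q: A <- P^T A
                    xp, xq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * xp - s * xq, s * xp + c * xq
    order = sorted(range(n), key=lambda j: a[j][j], reverse=True)
    return [a[j][j] for j in order], [[v[k][j] for k in range(n)] for j in order]

NAMES = ["blocus", "bconcept", "bmot", "bread", "bwrite", "bmath", "bsci", "bss"]
# Lower triangle of the matrix printed by polychoric above.
LOWER = [
    [1.0],
    [0.19521351, 1.0],
    [0.32394358, 0.28496224, 1.0],
    [0.37831861, 0.07019049, 0.32710018, 1.0],
    [0.27491692, 0.01501151, 0.291141, 0.69446683, 1.0],
    [0.34366457, 0.05746446, 0.27300471, 0.72544653, 0.69332124, 1.0],
    [0.33427115, 0.09668219, 0.10016842, 0.76617994, 0.60225076, 0.69222132, 1.0],
    [0.2874693, 0.0827877, 0.29619049, 0.65309799, 0.61768139, 0.54247674, 0.56492181, 1.0],
]
R = [[LOWER[max(i, j)][min(i, j)] for j in range(8)] for i in range(8)]

def principal_component_factors(r, min_eigen=1.0):
    """Loadings = sqrt(eigenvalue) * eigenvector for components whose
    eigenvalue exceeds min_eigen (an assumed retention rule)."""
    values, vectors = jacobi_eigen(r)
    total = sum(values)  # equals the trace of r, i.e. the number of variables
    proportion = [val / total for val in values]
    keep = [j for j, val in enumerate(values) if val > min_eigen]
    loadings = [[math.sqrt(values[j]) * vectors[j][i] for j in keep] for i in range(len(r))]
    uniqueness = [1.0 - sum(x * x for x in row) for row in loadings]
    return values, proportion, loadings, uniqueness

values, proportion, loadings, uniqueness = principal_component_factors(R)
for name, row, u in zip(NAMES, loadings, uniqueness):
    print(f"{name:>9}", " ".join(f"{x:8.4f}" for x in row), f"{u:8.4f}")
```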
```
factormat R, n(600) ipf factors(3)
(obs=600)

Factor analysis/correlation                        Number of obs    =      600
    Method: iterated principal factors             Retained factors =        3
    Rotation: (unrotated)                          Number of params =       21

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.68334      2.92815            0.7719       0.7719
        Factor2  |      0.75519      0.42193            0.1583       0.9302
        Factor3  |      0.33327      0.28288            0.0698       1.0000
        Factor4  |      0.05039      0.00336            0.0106       1.0106
        Factor5  |      0.04704      0.04313            0.0099       1.0204
        Factor6  |      0.00390      0.04003            0.0008       1.0212
        Factor7  |     -0.03613      0.02909           -0.0076       1.0137
        Factor8  |     -0.06522            .           -0.0137       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(28) = 2263.93 Prob>chi2 = 0.0000

/* note iterated eigenvalues */

Factor loadings (pattern matrix) and unique variances

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness
    -------------+------------------------------+--------------
          blocus |   0.4338    0.2539    0.1524 |      0.7242
        bconcept |   0.1329    0.3759    0.2595 |      0.7737
            bmot |   0.3946    0.6583   -0.0659 |      0.4065
           bread |   0.8866   -0.0482   -0.0185 |      0.2112
          bwrite |   0.8043   -0.0678   -0.3003 |      0.2584
           bmath |   0.8086   -0.0740   -0.0621 |      0.3368
            bsci |   0.8544   -0.3215    0.3636 |      0.0344
             bss |   0.7105    0.0168   -0.1085 |      0.4831
    -----------------------------------------------------------

rotate, horst blanks(.3)

Factor analysis/correlation                        Number of obs    =      600
    Method: iterated principal factors             Retained factors =        3
    Rotation: orthogonal varimax (Horst on)        Number of params =       21

    --------------------------------------------------------------------------
         Factor  |     Variance   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.20244      2.24256            0.6711       0.6711
        Factor2  |      0.95988      0.35040            0.2012       0.8723
        Factor3  |      0.60948            .            0.1277       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(28) = 2263.93 Prob>chi2 = 0.0000

Rotated factor loadings (pattern matrix) and unique variances

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness
    -------------+------------------------------+--------------
          blocus |   0.3048    0.4083           |      0.7242
        bconcept |             0.4652           |      0.7737
            bmot |             0.6761           |      0.4065
           bread |   0.8334                     |      0.2112
          bwrite |   0.8575                     |      0.2584
           bmath |   0.7807                     |      0.3368
            bsci |   0.7192              0.6621 |      0.0344
             bss |   0.6905                     |      0.4831
    -----------------------------------------------------------
    (blanks represent abs(loading)<.3)

Factor rotation matrix

    -----------------------------------------
                 |  Factor1   Factor2   Factor3
    -------------+---------------------------
         Factor1 |   0.9235    0.2974    0.2424
         Factor2 |  -0.1711    0.8847   -0.4337
         Factor3 |  -0.3434    0.3590    0.8679
    -----------------------------------------
```

These results are very close to those from Mplus.
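The rotate, horst step applies a varimax rotation. The Python sketch below uses Kaiser's original pairwise method: for each pair of factors it computes the planar rotation angle that maximizes the varimax criterion, and it sweeps over all pairs until no angle changes appreciably. With normalize=True the rows are scaled to unit length before rotating and scaled back afterwards (Kaiser normalization), which is our assumption about what Stata's horst option does. Stata's own algorithm may differ, so results can differ slightly and columns may come out permuted or with flipped signs. The usage lines apply it to the unrotated ipf loadings printed above and blank loadings below .3 in absolute value, mimicking blanks(.3).

```python
import math

def varimax(loadings, normalize=True, tol=1e-10, max_sweeps=500):
    """Varimax rotation by Kaiser's pairwise (planar) rotations.
    If normalize is True, rows are scaled to unit length before rotating
    and scaled back afterwards (Kaiser normalization)."""
    p, k = len(loadings), len(loadings[0])
    h = [math.sqrt(sum(x * x for x in row)) for row in loadings]
    if normalize:
        a = [[x / hi if hi > 0 else 0.0 for x in row] for row, hi in zip(loadings, h)]
    else:
        a = [list(row) for row in loadings]
    for _ in range(max_sweeps):
        biggest = 0.0
        for j in range(k - 1):
            for m in range(j + 1, k):
                u = [a[i][j] ** 2 - a[i][m] ** 2 for i in range(p)]
                v = [2.0 * a[i][j] * a[i][m] for i in range(p)]
                A, B = sum(u), sum(v)
                C = sum(x * x - y * y for x, y in zip(u, v))
                D = 2.0 * sum(x * y for x, y in zip(u, v))
                # Angle that maximizes the varimax criterion for columns j and m.
                phi = 0.25 * math.atan2(D - 2.0 * A * B / p, C - (A * A - B * B) / p)
                biggest = max(biggest, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for i in range(p):
                    x, y = a[i][j], a[i][m]
                    a[i][j], a[i][m] = c * x + s * y, -s * x + c * y
        if biggest < tol:  # no pair needed a meaningful rotation: converged
            break
    if normalize:
        a = [[x * hi for x in row] for row, hi in zip(a, h)]
    return a

NAMES = ["blocus", "bconcept", "bmot", "bread", "bwrite", "bmath", "bsci", "bss"]
# Unrotated ipf loadings from the factormat output above.
UNROTATED = [
    [0.4338, 0.2539, 0.1524],
    [0.1329, 0.3759, 0.2595],
    [0.3946, 0.6583, -0.0659],
    [0.8866, -0.0482, -0.0185],
    [0.8043, -0.0678, -0.3003],
    [0.8086, -0.0740, -0.0621],
    [0.8544, -0.3215, 0.3636],
    [0.7105, 0.0168, -0.1085],
]
rotated = varimax(UNROTATED)
for name, row in zip(NAMES, rotated):
    # Mimic blanks(.3): hide loadings whose absolute value is below .3.
    print(f"{name:>9}", " ".join(f"{x:8.4f}" if abs(x) >= 0.3 else " " * 8 for x in row))
```

Because the rotation is orthogonal, each variable's communality (row sum of squared loadings), and therefore its uniqueness, is unchanged, which is why the Uniqueness column is identical before and after rotation in the Stata output.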
Phil Ender, 16nov05, 26nov03