Multivariate Analysis
More Exploratory Factor Analysis


What about using dichotomous variables?

Since factor analysis is based on a correlation or covariance matrix, it assumes that the observed indicators are measured continuously, are distributed normally, and that the associations among indicators are linear. That said, exploratory factor analysis is often used as a data reduction technique with ordered categorical indicators and with dichotomous items scored 0-1. It would not be appropriate with non-orderable categorical indicators that have more than 2 categories (e.g., Religion coded 1 if Protestant, 2 if Catholic, 3 if Jewish, 4 if Other Religion, 5 if no religious preference). Note that the solution will not be optimal, because the marginal distributions of the observed variables generally restrict the maximum value that product-moment correlations can attain to something less than 1.0.
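
To make the ceiling on product-moment correlations concrete, here is a minimal Python sketch (Python is not part of the Stata/Mplus workflow in this handout, and the marginal proportions are hypothetical). It assumes two 0-1 items with proportions p1 and p2 of cases scored 1, and computes the largest correlation the two items could possibly show.

```python
from math import sqrt

def max_phi(p1: float, p2: float) -> float:
    """Largest Pearson (phi) correlation two 0/1 items can attain,
    given the proportions p1 and p2 of cases scored 1 (0 < p < 1)."""
    lo, hi = min(p1, p2), max(p1, p2)
    # The joint proportion P(both items = 1) can be at most the smaller
    # marginal; plugging that bound into the phi formula gives the ceiling.
    return sqrt(lo * (1 - hi) / (hi * (1 - lo)))

# Hypothetical marginals, chosen only for illustration.
for p1, p2 in [(0.5, 0.5), (0.2, 0.5), (0.1, 0.9)]:
    print(f"p1={p1:.1f}  p2={p2:.1f}  max phi={max_phi(p1, p2):.3f}")
```

Only when the two items have identical marginals can the correlation reach 1.0; the more the marginals differ, the lower the ceiling.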

If the dichotomous variables are indicators of underlying continuous latent variables, some researchers recommend using tetrachoric correlations in the factor analysis. Using tetrachoric correlations assumes that the dichotomous measured variables are imperfect measures of underlying continuous latent variables. Factor analysis also assumes underlying normality, or at least symmetrically distributed variables. In some cases--for example, strength of attitudes or opinions measured using a Likert scale--this assumption may be reasonable. For others, such as a nominal variable like gender or race, it clearly doesn't make sense. So whether the available techniques for factor analyzing categorical variables are "appropriate" depends on what you are willing to believe about the existence of a latent variable and about its relationship to the observed variable that is intended to measure it.

With crudely measured indicators, many researchers favor principal components, which provides an exact mathematical solution. Certainly you should not put any faith in the tests of significance associated with maximum likelihood solutions.

Please Note: Sample tetrachoric correlations are computed for one pair of variables at a time because it is numerically intractable to compute them in a multivariate fashion. Because they are computed pairwise, it is possible to get a non-positive definite correlation matrix. The best way to avoid this situation is to have a very large sample. A sample size of 500 is good, but 1,000 is even better.
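
One way to check a pairwise-estimated matrix before factoring it is to attempt a Cholesky factorization, which succeeds only for a positive definite matrix. The following is a rough Python sketch, not something from the original Stata/Mplus runs. The 3 x 3 matrix is hypothetical and deliberately inconsistent; the 8 x 8 matrix is typed in from the Mplus tetrachoric output shown later in this handout.

```python
from math import sqrt

def is_positive_definite(matrix):
    """Return True if a symmetric matrix has a Cholesky factorization,
    which exists exactly when the matrix is positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False  # a non-positive pivot means not positive definite
                lower[i][j] = sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True

def full_from_lower(rows):
    """Build a full correlation matrix from its below-diagonal rows."""
    n = len(rows) + 1
    full = [[1.0] * n for _ in range(n)]
    for i, row in enumerate(rows, start=1):
        for j, r in enumerate(row):
            full[i][j] = full[j][i] = r
    return full

# Hypothetical matrix: each entry is a legal correlation on its own,
# but no three variables could produce all three values at once.
inconsistent = [[1.0, 0.9, 0.9],
                [0.9, 1.0, -0.9],
                [0.9, -0.9, 1.0]]

# Tetrachoric correlations from the Mplus output in this handout
# (rows: bconcept, bmot, bread, bwrite, bmath, bsci, bss).
tetrachoric = full_from_lower([
    [0.195],
    [0.324, 0.285],
    [0.378, 0.070, 0.327],
    [0.275, 0.015, 0.291, 0.694],
    [0.344, 0.057, 0.273, 0.725, 0.693],
    [0.334, 0.097, 0.100, 0.766, 0.602, 0.692],
    [0.288, 0.083, 0.296, 0.653, 0.618, 0.543, 0.565],
])

print("hypothetical matrix positive definite:", is_positive_definite(inconsistent))
print("handout tetrachoric matrix positive definite:", is_positive_definite(tetrachoric))
```

With 600 observations the matrix from this example passes the check, which is consistent with the all-positive eigenvalues that Mplus reports for it.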

Here is an example with 600 observations. We will factor eight continuous (interval) variables and the same eight variables dichotomized (binary) at their median. Varimax loadings greater than .30 are shown in boldface.

use hsbfactor

/* with continuous (interval) variables */

correlate locus concept mot career read write math sci ss
(obs=600)

             |    locus  concept      mot   career     read    write     math      sci       ss
-------------+---------------------------------------------------------------------------------
       locus |   1.0000
     concept |   0.1712   1.0000
         mot |   0.2451   0.2886   1.0000
      career |   0.1156   0.0261   0.0839   1.0000
        read |   0.3736   0.0607   0.2106   0.1182   1.0000
       write |   0.3589   0.0194   0.2542   0.0830   0.6286   1.0000
        math |   0.3373   0.0536   0.1950   0.1412   0.6793   0.6327   1.0000
         sci |   0.3246   0.0698   0.1157   0.1206   0.6907   0.5691   0.6495   1.0000
          ss |   0.2820   0.0115   0.2007   0.0756   0.5899   0.5852   0.5342   0.5167   1.0000


factor locus concept mot read write math sci ss, ipf factors(3)

(obs=600)
            (iterated principal factors; 3 factors retained)
  Factor     Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        3.36998         2.70623      0.7904         0.7904
     2        0.66375         0.43393      0.1557         0.9461
     3        0.22982         0.19956      0.0539         1.0000
     4        0.03026         0.01431      0.0071         1.0071
     5        0.01594         0.01069      0.0037         1.0108
     6        0.00526         0.01102      0.0012         1.0121
     7       -0.00577         0.03996     -0.0014         1.0107
     8       -0.04572               .     -0.0107         1.0000

               Factor Loadings
    Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
       locus |   0.45184    0.21364   -0.00027    0.75020
     concept |   0.11235    0.56244    0.20573    0.62871
         mot |   0.30328    0.51324   -0.16866    0.61616
        read |   0.83927   -0.07172    0.06727    0.28596
       write |   0.78954   -0.03899   -0.22922    0.32256
        math |   0.79434   -0.07828    0.03614    0.36158
         sci |   0.79106   -0.14349    0.28261    0.27376
          ss |   0.69044   -0.07054   -0.14425    0.49751

rotate, horst

            (varimax rotation)
               Rotated Factor Loadings
    Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
       locus |   0.38267    0.30164   -0.11126    0.75020
     concept |  -0.00095    0.60213    0.09342    0.62871
         mot |   0.14523    0.52453   -0.29600    0.61616
        read |   0.83303    0.12445   -0.06798    0.28596
       write |   0.73829    0.08850   -0.35290    0.32256
        math |   0.78733    0.10257   -0.08950    0.36158
         sci |   0.83238    0.08647    0.16095    0.27376
          ss |   0.66183    0.05321   -0.24828    0.49751
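
The Uniqueness column in both tables above is one minus the sum of the squared loadings in that row (one minus the communality). Because varimax is an orthogonal rotation, it redistributes the loadings across factors without changing that sum. A small Python sketch, using the locus row from the two tables (Python is used here only for the arithmetic, it is not part of the Stata session):

```python
def uniqueness(loadings):
    """Uniqueness = 1 - communality, where the communality is the
    sum of squared loadings of one variable across the factors."""
    return 1.0 - sum(a * a for a in loadings)

# Loadings for locus, copied from the unrotated and rotated tables above.
locus_unrotated = [0.45184, 0.21364, -0.00027]
locus_rotated = [0.38267, 0.30164, -0.11126]

print(round(uniqueness(locus_unrotated), 4))  # Stata reports 0.75020
print(round(uniqueness(locus_rotated), 4))    # same value after rotation
```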

/* ordinary factor analysis with dichotomous (binary) variables */

correlate blocus bconcept bmot bread bwrite bmath bsci bss
(obs=600)

             |   blocus bconcept     bmot    bread   bwrite    bmath     bsci      bss
-------------+------------------------------------------------------------------------
      blocus |   1.0000
    bconcept |   0.1215   1.0000
        bmot |   0.2002   0.1752   1.0000
       bread |   0.2468   0.0435   0.2030   1.0000
      bwrite |   0.1770   0.0093   0.1804   0.4882   1.0000
       bmath |   0.2233   0.0356   0.1682   0.5164   0.4867   1.0000
        bsci |   0.2168   0.0599   0.0612   0.5555   0.4110   0.4864   1.0000
         bss |   0.1805   0.0503   0.1827   0.4388   0.4128   0.3534   0.3714   1.0000


factor blocus bconcept bmot bread bwrite bmath bsci bss, ipf factors(3)

(obs=600)

 (iterated principal factors; 3 factors retained)
  Factor     Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
     1        2.52155         2.04495      0.7791         0.7791
     2        0.47659         0.23840      0.1473         0.9264
     3        0.23820         0.19963      0.0736         1.0000
     4        0.03856         0.00594      0.0119         1.0119
     5        0.03263         0.02976      0.0101         1.0220
     6        0.00286         0.02751      0.0009         1.0229
     7       -0.02465         0.02478     -0.0076         1.0153
     8       -0.04943               .     -0.0153         1.0000

               Factor Loadings
    Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
      blocus |   0.33350    0.21359    0.12579    0.82734
    bconcept |   0.09424    0.28948    0.19617    0.86884
        bmot |   0.28537    0.52153   -0.02046    0.64615
       bread |   0.75381   -0.03574    0.00444    0.43048
      bwrite |   0.67340   -0.02867   -0.28238    0.46597
       bmath |   0.67726   -0.04869   -0.05441    0.53599
        bsci |   0.72381   -0.26296    0.30413    0.31445
         bss |   0.56190    0.03954   -0.09090    0.67445

rotate, horst
 
            (varimax rotation)
               Rotated Factor Loadings
    Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
      blocus |   0.22068    0.32669    0.13130    0.82734
    bconcept |  -0.03121    0.35398    0.06993    0.86884
        bmot |   0.21216    0.53763   -0.14065    0.64615
       bread |   0.67356    0.17573    0.29147    0.43048
      bwrite |   0.72803    0.05904    0.02278    0.46597
       bmath |   0.63262    0.12248    0.22091    0.53599
        bsci |   0.53664    0.06971    0.62666    0.31445
         bss |   0.53724    0.15704    0.11077    0.67445

As you can see, the rotated factor loadings for the continuous variables are substantially different from those for the binary ones.

Next, we will run the factor analysis in Mplus, which uses tetrachoric correlations in computing the factor solutions.

Title:  Tetrachoric correlations for binary variables
Data:
  File is ..\data\hsbfactor.dat ;
Variable:
  Names are
      blocus bconcept bmot bread bwrite bmath bsci bss;
  Usevariables are
      blocus bconcept bmot bread bwrite bmath bsci bss;
  Categorical are
      blocus bconcept bmot bread bwrite bmath bsci bss;
Analysis:
  Type = basic;

           SAMPLE TETRACHORIC CORRELATIONS

              BLOCUS       BCONCEPT    BMOT      BREAD     BWRITE    BMATH     BSCI     BSS
 BLOCUS
 BCONCEPT      0.195
 BMOT          0.324       0.285
 BREAD         0.378       0.070       0.327
 BWRITE        0.275       0.015       0.291     0.694
 BMATH         0.344       0.057       0.273     0.725     0.693
 BSCI          0.334       0.097       0.100     0.766     0.602     0.692
 BSS           0.288       0.083       0.296     0.653     0.618     0.543     0.565
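
These tetrachoric values can be roughly checked by hand. Assuming the latent variables are bivariate normal and both items are cut exactly at the median (a 50/50 split, which the sample medians here only approximate), the ordinary correlation phi between two binary items and the tetrachoric correlation are related by r_tet = sin(pi * phi / 2). The Python sketch below applies that shortcut to a few of the binary correlations from the Stata correlate output above. It is not the estimator Mplus uses, only a check on the magnitudes.

```python
from math import pi, sin

def tetrachoric_from_phi_median_split(phi):
    """Approximate tetrachoric correlation from the phi coefficient,
    valid only when both items are split at the median of a
    bivariate normal latent distribution."""
    return sin(pi * phi / 2.0)

# (phi from the Stata correlate output, tetrachoric reported by Mplus)
pairs = {
    ("bread", "bwrite"): (0.4882, 0.694),
    ("bread", "bmath"): (0.5164, 0.725),
    ("bread", "bsci"): (0.5555, 0.766),
}

for (a, b), (phi, reported) in pairs.items():
    approx = tetrachoric_from_phi_median_split(phi)
    print(f"{a}-{b}: phi={phi:.4f}  sin(pi*phi/2)={approx:.3f}  Mplus={reported:.3f}")
```

The shortcut makes the pattern in the table easy to see: dichotomizing shrinks the correlations, and the tetrachoric estimates undo most of that shrinkage.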

Title:  Factor analysis with binary variables
Data:
   File is ..\data\hsbfactor.dat ;
Variable:
  Names are
    blocus bconcept bmot bread bwrite bmath bsci bss;
  Usevariables are
    blocus bconcept bmot bread bwrite bmath bsci bss;
  Categorical are
    blocus bconcept bmot bread bwrite bmath bsci bss;
Analysis:
  Type = efa 3 4 ;
  
INPUT READING TERMINATED NORMALLY

Factor analysis with binary variables

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         600

Number of dependent variables                                    8
Number of independent variables                                  0
Number of continuous latent variables                            0

Observed dependent variables

  Binary and ordered categorical (ordinal)
   BLOCUS      BCONCEPT    BMOT        BREAD       BWRITE      BMATH
   BSCI        BSS


Estimator                                                      ULS
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20

Input data file(s)
  ..\data\hsbfactor.dat

Input data format  FREE

RESULTS FOR EXPLORATORY FACTOR ANALYSIS

           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  1             2             3             4             5
              ________      ________      ________      ________      ________
                3.974         1.281         0.749         0.722         0.469

           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  6             7             8
              ________      ________      ________
                0.356         0.262         0.186


           EXPLORATORY ANALYSIS WITH 3 FACTOR(S) :

           ROOT MEAN SQUARE RESIDUAL IS        0.0136

           VARIMAX ROTATED LOADINGS
                  1             2             3
              ________      ________      ________
 BLOCUS         0.304         0.408         0.129
 BCONCEPT      -0.032         0.466         0.095
 BMOT           0.276         0.676        -0.243
 BREAD          0.832         0.214         0.224
 BWRITE         0.857         0.072        -0.031
 BMATH          0.780         0.153         0.179
 BSCI           0.715         0.100         0.669
 BSS            0.690         0.188         0.075

We can do something very similar in Stata, using polychoric (a user-written command by Stas Kolenikov; type findit polychoric to locate it) and factormat (Stata 9) to factor a matrix of tetrachoric correlations. When the variables are binary, polychoric produces tetrachoric correlations.

polychoric blocus bconcept bmot bread bwrite bmath bsci bss

Polychoric correlation matrix

             blocus   bconcept       bmot      bread     bwrite      bmath       bsci   bss
  blocus          1
bconcept  .19521351          1
    bmot  .32394358  .28496224          1
   bread  .37831861  .07019049  .32710018          1
  bwrite  .27491692  .01501151    .291141  .69446683          1
   bmath  .34366457  .05746446  .27300471  .72544653  .69332124          1
    bsci  .33427115  .09668219  .10016842  .76617994  .60225076  .69222132          1
     bss  .2874693   .0827877   .29619049  .65309799  .61768139  .54247674  .56492181     1

matrix R = r(R)

factormat R, n(600) pcf             /* factormat is Stata 9 */
(obs=600)

Factor analysis/correlation                        Number of obs    =      600
    Method: principal-component factors            Retained factors =        2
    Rotation: (unrotated)                          Number of params =       15

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.97411      2.69304            0.4968       0.4968
        Factor2  |      1.28107      0.53179            0.1601       0.6569
        Factor3  |      0.74929      0.02711            0.0937       0.7506
        Factor4  |      0.72218      0.25349            0.0903       0.8408
        Factor5  |      0.46869      0.11275            0.0586       0.8994
        Factor6  |      0.35594      0.09368            0.0445       0.9439
        Factor7  |      0.26226      0.07579            0.0328       0.9767
        Factor8  |      0.18647            .            0.0233       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(28) = 2263.93 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    -------------------------------------------------
        Variable |  Factor1   Factor2 |   Uniqueness 
    -------------+--------------------+--------------
          blocus |   0.5147    0.4141 |      0.5637  
        bconcept |   0.1656    0.7648 |      0.3877  
            bmot |   0.4348    0.6347 |      0.4081  
           bread |   0.8997   -0.1138 |      0.1776  
          bwrite |   0.8267   -0.1794 |      0.2843  
           bmath |   0.8480   -0.1473 |      0.2593  
            bsci |   0.8221   -0.2228 |      0.2744  
             bss |   0.7778   -0.0732 |      0.3897  
    -------------------------------------------------

Note: The eigenvalues are the same as for Mplus.

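
The Proportion and Cumulative columns of the principal-component factor output above follow directly from the eigenvalues: with 8 standardized items the eigenvalues of the correlation matrix sum to 8, so each proportion is the eigenvalue divided by 8. The Python sketch below reproduces those columns. It also counts the eigenvalues of at least 1, on the assumption that this is the retention rule Stata applied by default for pcf, which would account for the 2 retained factors.

```python
# Eigenvalues copied from the factormat ..., pcf output above.
eigenvalues = [3.97411, 1.28107, 0.74929, 0.72218,
               0.46869, 0.35594, 0.26226, 0.18647]

n_items = len(eigenvalues)  # total variance of 8 standardized items
proportions = [ev / n_items for ev in eigenvalues]

cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Assumed retention rule: keep factors with eigenvalue >= 1.
retained = sum(1 for ev in eigenvalues if ev >= 1.0)

for k, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"Factor{k}  {ev:8.5f}  {p:6.4f}  {c:6.4f}")
print("factors with eigenvalue >= 1:", retained)
```
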
factormat R, n(600) ipf factors(3)
(obs=600)

Factor analysis/correlation                        Number of obs    =      600
    Method: iterated principal factors             Retained factors =        3
    Rotation: (unrotated)                          Number of params =       21

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.68334      2.92815            0.7719       0.7719
        Factor2  |      0.75519      0.42193            0.1583       0.9302
        Factor3  |      0.33327      0.28288            0.0698       1.0000
        Factor4  |      0.05039      0.00336            0.0106       1.0106
        Factor5  |      0.04704      0.04313            0.0099       1.0204
        Factor6  |      0.00390      0.04003            0.0008       1.0212
        Factor7  |     -0.03613      0.02909           -0.0076       1.0137
        Factor8  |     -0.06522            .           -0.0137       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(28) = 2263.93 Prob>chi2 = 0.0000
    
    /* note iterated eigenvalues */

Factor loadings (pattern matrix) and unique variances

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness 
    -------------+------------------------------+--------------
          blocus |   0.4338    0.2539    0.1524 |      0.7242  
        bconcept |   0.1329    0.3759    0.2595 |      0.7737  
            bmot |   0.3946    0.6583   -0.0659 |      0.4065  
           bread |   0.8866   -0.0482   -0.0185 |      0.2112  
          bwrite |   0.8043   -0.0678   -0.3003 |      0.2584  
           bmath |   0.8086   -0.0740   -0.0621 |      0.3368  
            bsci |   0.8544   -0.3215    0.3636 |      0.0344  
             bss |   0.7105    0.0168   -0.1085 |      0.4831  
    -----------------------------------------------------------

rotate, horst blanks(.3)

Factor analysis/correlation                        Number of obs    =      600
    Method: iterated principal factors             Retained factors =        3
    Rotation: orthogonal varimax (Horst on)        Number of params =       21

    --------------------------------------------------------------------------
         Factor  |     Variance   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.20244      2.24256            0.6711       0.6711
        Factor2  |      0.95988      0.35040            0.2012       0.8723
        Factor3  |      0.60948            .            0.1277       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(28) = 2263.93 Prob>chi2 = 0.0000

Rotated factor loadings (pattern matrix) and unique variances

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness 
    -------------+------------------------------+--------------
          blocus |   0.3048    0.4083           |      0.7242  
        bconcept |             0.4652           |      0.7737  
            bmot |             0.6761           |      0.4065  
           bread |   0.8334                     |      0.2112  
          bwrite |   0.8575                     |      0.2584  
           bmath |   0.7807                     |      0.3368  
            bsci |   0.7192              0.6621 |      0.0344  
             bss |   0.6905                     |      0.4831  
    -----------------------------------------------------------
    (blanks represent abs(loading)<.3)

Factor rotation matrix

    -----------------------------------------
                 | Factor1  Factor2  Factor3 
    -------------+---------------------------
         Factor1 |  0.9235   0.2974   0.2424 
         Factor2 | -0.1711   0.8847  -0.4337 
         Factor3 | -0.3434   0.3590   0.8679 
    -----------------------------------------

These results are very close to those using Mplus.
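
The rotated loadings in the Stata output above are the unrotated iterated principal factor loadings postmultiplied by the factor rotation matrix. The Python sketch below redoes that multiplication from the printed (rounded) values, so the results agree with Stata's table only to about three decimals. The blanking rule mimics the blanks(.3) option.

```python
# Unrotated loadings copied from the factormat ..., ipf factors(3) output.
unrotated = {
    "blocus":   [0.4338,  0.2539,  0.1524],
    "bconcept": [0.1329,  0.3759,  0.2595],
    "bmot":     [0.3946,  0.6583, -0.0659],
    "bread":    [0.8866, -0.0482, -0.0185],
    "bwrite":   [0.8043, -0.0678, -0.3003],
    "bmath":    [0.8086, -0.0740, -0.0621],
    "bsci":     [0.8544, -0.3215,  0.3636],
    "bss":      [0.7105,  0.0168, -0.1085],
}

# Factor rotation matrix copied from the rotate output.
rotation = [
    [0.9235,  0.2974,  0.2424],
    [-0.1711, 0.8847, -0.4337],
    [-0.3434, 0.3590,  0.8679],
]

def rotate_row(loadings, t):
    """Multiply one row of loadings by the rotation matrix t."""
    return [sum(loadings[k] * t[k][j] for k in range(len(t)))
            for j in range(len(t[0]))]

rotated = {name: rotate_row(row, rotation) for name, row in unrotated.items()}

for name, row in rotated.items():
    # Show only loadings of at least .3 in absolute value.
    cells = [f"{x:8.4f}" if abs(x) >= 0.3 else " " * 8 for x in row]
    print(f"{name:>9} |" + "".join(cells))
```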



Phil Ender, 16nov05, 26nov03