Applied Categorical & Nonnormal Data Analysis

Latent Profile & Latent Class Models

Introduction

Cluster analysis techniques are not the only way to find unobserved groupings in your data. In fact, from several perspectives cluster analysis may not be the best way to determine these groupings. Several latent variable approaches are also available. In this unit we will explore two of them: latent profile analysis and latent class analysis.

The advantage of these approaches over cluster analysis is that they are model based: they generate a probability of group membership for each case, and the models can be tested and their goodness of fit assessed. The downside is that they require specialized software that is more complex to run than typical statistical packages. We will demonstrate these techniques using the Mplus software from Muthén & Muthén. We will also use Stata for descriptive and subsidiary analyses.

The latent profile analysis will use continuous indicator variables and the latent class analysis will use binary indicator variables. For the continuous indicators we will use the reading, writing, math, science and social studies test scores from the hsb6a dataset. For the binary indicators we will do median splits on each of the tests to create hiread, hiwrite, himath, hisci and hiss.
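
As a rough illustration of a median split, here is a minimal Python sketch. It is not the code used to build hiread through hiss: the exact cutoff and the handling of scores tied at the median are not stated in these notes, so coding scores at or above the median as 1 is an assumption, and the scores in the usage lines are made up.

```python
from statistics import median

def median_split(scores, ties_high=True):
    """Return 0/1 indicators for a list of test scores.

    Assumption: scores equal to the median are coded 1 when ties_high
    is True and 0 otherwise. The notes do not say which rule was used.
    """
    cut = median(scores)
    if ties_high:
        return [1 if s >= cut else 0 for s in scores]
    return [1 if s > cut else 0 for s in scores]

# Hypothetical scores, not taken from hsb6a
example_read = [40, 50, 50, 60]
print(median_split(example_read))                   # ties coded high
print(median_split(example_read, ties_high=False))  # ties coded low
```

The tie rule matters whenever many cases sit exactly at the median, which is one reason the means of the binary variables need not be exactly .5.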

Looking at the data

use hsb6a

describe

Contains data from hsb6a.dta
  obs:           600                          highschool and beyond (600
                                                cases)
 vars:            23                          24 Oct 2003 14:18
 size:        31,200 (99.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              int    %9.0g
gender          byte   %9.0g       gl
race            byte   %12.0g      rl
ses             byte   %9.0g       sl
sch             byte   %9.0g       scl
prog            byte   %9.0g       pl
locus           float  %9.0g                  locus of control
concept         float  %9.0g                  self-concept
mot             float  %9.0g                  motivation
career          byte   %14.0g      cl         career choice
read            float  %9.0g                  reading score
write           float  %9.0g                  writing score
math            float  %9.0g                  math score
sci             float  %9.0g                  science score
ss              float  %9.0g                  social studies score
hiread          byte   %9.0g
hiwrite         byte   %9.0g
himath          byte   %9.0g
hisci           byte   %9.0g
hiss            byte   %9.0g

sum read write math sci ss hiread hiwrite himath hisci hiss

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        read |       600    51.90183    10.10298       28.3         76
       write |       600    52.38483    9.726455       25.5       67.1
        math |       600      51.849    9.414736       31.8       75.5
         sci |       600    51.76333    9.706179         26       74.2
          ss |       600    52.04567    9.879228       25.7       70.5
-------------+--------------------------------------------------------
      hiread |       600        .525    .4997913          0          1
     hiwrite |       600         .54    .4988133          0          1
      himath |       600    .4966667    .5004061          0          1
       hisci |       600    .5266667     .499705          0          1
        hiss |       600    .6483333     .477889          0          1

A 2 Class Latent Profile Model

Data:
  File is I:\mplus\hsb6.dat ;
Variable:
  Names are
   id gender race ses sch prog locus concept mot career read write math
   sci ss hiread hiwrite himath hisci hiss academic;
  Usevariables are
     read write math sci ss ;
  classes = c(2);
Analysis:
  Type=mixture;
MODEL:
  %C#1%
  [read math sci ss write  * 30 ];

  %C#2%
  [read math sci ss write  * 60];
OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex1.txt ;
  save is cprob;
  format is free;


THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Loglikelihood

          H0 Value                       -5213.102

Information Criteria

          Number of Free Parameters             16
          Akaike (AIC)                   10458.203
          Bayesian (BIC)                 10517.464
          Sample-Size Adjusted BIC       10466.721
            (n* = (n + 2) / 24)
          Entropy                            0.865



FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        123.03223          0.41011
  Class 2        176.96777          0.58989


CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              120          0.40000
  Class 2              180          0.60000


Average Class Probabilities by Class

                 1        2
  Class 1     0.961    0.039
  Class 2     0.043    0.957


MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

CLASS 1
 Means
    READ              43.151    0.820     52.641
    WRITE             44.524    1.024     43.485
    MATH              43.860    0.757     57.947
    SCI               43.322    1.051     41.239
    SS                45.119    0.946     47.707

 Variances
    READ              49.035    4.175     11.745
    WRITE             44.303    3.927     11.283
    MATH              45.062    3.768     11.958
    SCI               48.986    5.184      9.450
    SS                55.410    4.445     12.465

CLASS 2
 Means
    READ              57.915    0.847     68.403
    WRITE             58.115    0.625     93.039
    MATH              57.136    0.800     71.386
    SCI               56.729    0.668     84.953
    SS                57.220    0.723     79.137

 Variances
    READ              49.035    4.175     11.745
    WRITE             44.303    3.927     11.283
    MATH              45.062    3.768     11.958
    SCI               48.986    5.184      9.450
    SS                55.410    4.445     12.465

LATENT CLASS REGRESSION MODEL PART
 Means
    C#1               -0.364    0.179     -2.032


QUALITY OF NUMERICAL RESULTS

     Condition Number for the Information Matrix              0.462E-03
       (ratio of smallest to largest eigenvalue)
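
The class probabilities that Mplus saves with cprob can be approximated from the estimates above. The following Python sketch applies Bayes' rule to the 2 class solution, assuming the indicators are independent and normally distributed within each class (the output shows only means and variances, with the variances held equal across classes). The function name and the student used in the usage lines are hypothetical.

```python
import math

# Estimates copied from the 2 class MODEL RESULTS above,
# in the order read, write, math, sci, ss
MEANS = {
    1: [43.151, 44.524, 43.860, 43.322, 45.119],
    2: [57.915, 58.115, 57.136, 56.729, 57.220],
}
VARIANCES = [49.035, 44.303, 45.062, 48.986, 55.410]  # equal across classes
PROPORTIONS = {1: 0.41011, 2: 0.58989}

def log_normal_pdf(y, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def posterior(scores):
    """Posterior class probabilities for one case (assumes local independence)."""
    log_joint = {}
    for k in MEANS:
        log_lik = sum(log_normal_pdf(y, m, v)
                      for y, m, v in zip(scores, MEANS[k], VARIANCES))
        log_joint[k] = math.log(PROPORTIONS[k]) + log_lik
    top = max(log_joint.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_joint.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical student scoring 50 on every test
print(posterior([50, 50, 50, 50, 50]))
```

A case is assigned to the class with the largest posterior probability, which is how the "most likely class membership" counts are formed.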

A 3 Class Latent Profile Model

Data:
  File is I:\mplus\hsb6.dat ;
Variable:
  Names are
   id gender race ses sch prog locus concept mot career read write math
   sci ss hiread hiwrite himath hisci hiss academic;
  Usevariables are
     read write math sci ss ;
  classes = c(3);
Analysis:
  Type=mixture;
MODEL:
  %C#1%
  [read math sci ss write  *30 ];

  %C#2%
  [read math sci ss write  *45];

  %C#3%
  [read math sci ss write  *60];
OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex2.txt ;
  save is cprob;
  format is free;


THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Loglikelihood

          H0 Value                       -5100.544

Information Criteria

          Number of Free Parameters             22
          Akaike (AIC)                   10245.087
          Bayesian (BIC)                 10326.571
          Sample-Size Adjusted BIC       10256.800
            (n* = (n + 2) / 24)
          Entropy                            0.877



FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1         98.08460          0.32695
  Class 2        137.86474          0.45955
  Class 3         64.05066          0.21350


CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1               99          0.33000
  Class 2              138          0.46000
  Class 3               63          0.21000


Average Class Probabilities by Class

                 1        2        3
  Class 1     0.961    0.039    0.000
  Class 2     0.021    0.940    0.039
  Class 3     0.000    0.068    0.932


MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

CLASS 1
 Means
    READ              41.866    0.614     68.208
    WRITE             43.080    0.870     49.514
    MATH              42.447    0.549     77.337
    SCI               41.409    0.748     55.358
    SS                44.232    0.819     54.010

 Variances
    READ              33.867    3.334     10.159
    WRITE             40.042    4.168      9.607
    MATH              28.667    2.980      9.619
    SCI               34.199    3.411     10.027
    SS                48.355    4.323     11.185

CLASS 2
 Means
    READ              53.058    0.726     73.044
    WRITE             55.195    0.677     81.493
    MATH              52.704    0.683     77.191
    SCI               53.195    0.600     88.727
    SS                53.377    0.745     71.657

 Variances
    READ              33.867    3.334     10.159
    WRITE             40.042    4.168      9.607
    MATH              28.667    2.980      9.619
    SCI               34.199    3.411     10.027
    SS                48.355    4.323     11.185

CLASS 3
 Means
    READ              64.588    0.949     68.070
    WRITE             61.318    0.624     98.232
    MATH              63.667    0.907     70.167
    SCI               62.043    0.873     71.064
    SS                62.139    0.827     75.163

 Variances
    READ              33.867    3.334     10.159
    WRITE             40.042    4.168      9.607
    MATH              28.667    2.980      9.619
    SCI               34.199    3.411     10.027
    SS                48.355    4.323     11.185

LATENT CLASS REGRESSION MODEL PART
 Means
    C#1                0.426    0.201      2.120
    C#2                0.767    0.196      3.901


QUALITY OF NUMERICAL RESULTS

     Condition Number for the Information Matrix              0.461E-03
       (ratio of smallest to largest eigenvalue)
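
The information criteria for the two latent profile models can be recomputed from the loglikelihood and the number of free parameters. This Python sketch uses the standard formulas AIC = -2LL + 2k and BIC = -2LL + k ln(n), with n replaced by (n + 2) / 24 for the sample-size adjusted BIC. The value n = 300 is an assumption taken from the class counts in the Mplus output; the function name is made up for illustration.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC, sample-size adjusted BIC)."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    adj_bic = deviance + n_params * math.log((n_obs + 2) / 24)
    return aic, bic, adj_bic

N_OBS = 300  # assumption: the class counts in the output sum to 300

models = [
    ("2 class profile model", -5213.102, 16),
    ("3 class profile model", -5100.544, 22),
]
for label, loglik, n_params in models:
    aic, bic, adj_bic = information_criteria(loglik, n_params, N_OBS)
    print(f"{label}: AIC={aic:.3f}  BIC={bic:.3f}  adjusted BIC={adj_bic:.3f}")
```

The results agree with the printed values to within rounding of the loglikelihood. All three criteria are smaller for the 3 class profile model, and smaller values indicate the preferred model.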

A 2 Class Latent Class Model

Data:
  File is h:\mplus\hsb6.dat ;
Variable:
  Names are
   id gender race ses sch prog locus concept mot career read write math
   sci ss hiread hiwrite himath hisci hiss academic;
  Usevariables are
     hiread hiwrite himath hisci hiss ;
  categorical = hiread hiwrite himath hisci hiss;
  classes = c(2);
Analysis:
  Type=mixture;
MODEL:
  %C#1%
  [hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1  *2 ];
  %C#2%
  [hiread$1 *-2 himath$1 *-2 hisci$1 *-2 hiss$1 *-2 hiwrite$1 *-2 ];

OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex7.txt ;
  save is cprob;
  format is free;


THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Loglikelihood

          H0 Value                        -849.157

Information Criteria

          Number of Free Parameters             11
          Akaike (AIC)                    1720.315
          Bayesian (BIC)                  1761.057
          Sample-Size Adjusted BIC        1726.171
            (n* = (n + 2) / 24)
          Entropy                            0.815

Chi-Square Test of Model Fit for the Latent Class Indicator Model Part

          Pearson Chi-Square

          Value                             44.642
          Degrees of Freedom                    20
          P-Value                           0.0012

          Likelihood Ratio Chi-Square

          Value                             45.747
          Degrees of Freedom                    20
          P-Value                           0.0009


FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        123.33019          0.41110
  Class 2        176.66981          0.58890


CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              127          0.42333
  Class 2              173          0.57667


Average Class Probabilities by Class

                 1        2
  Class 1     0.930    0.070
  Class 2     0.030    0.970


MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

CLASS 1
CLASS 2

LATENT CLASS INDICATOR MODEL PART

 Class 1
 Thresholds
    HIREAD$1           2.273    0.424      5.354
    HIWRITE$1          1.376    0.276      4.990
    HIMATH$1           2.081    0.399      5.209
    HISCI$1            2.035    0.411      4.947
    HISS$1             0.642    0.231      2.780

 Class 2
 Thresholds
    HIREAD$1          -1.540    0.264     -5.823
    HIWRITE$1         -1.488    0.244     -6.109
    HIMATH$1          -1.217    0.217     -5.616
    HISCI$1           -1.264    0.213     -5.927
    HISS$1            -2.047    0.279     -7.328

LATENT CLASS REGRESSION MODEL PART
 Means
    C#1               -0.359    0.161     -2.231


LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

 Class 1
 HIREAD
    Category 1         0.907    0.036     25.221
    Category 2         0.093    0.036      2.599
 HIWRITE
    Category 1         0.798    0.044     17.985
    Category 2         0.202    0.044      4.542
 HIMATH
    Category 1         0.889    0.039     22.555
    Category 2         0.111    0.039      2.816
 HISCI
    Category 1         0.884    0.042     21.036
    Category 2         0.116    0.042      2.748
 HISS
    Category 1         0.655    0.052     12.564
    Category 2         0.345    0.052      6.615

 Class 2
 HIREAD
    Category 1         0.177    0.038      4.592
    Category 2         0.823    0.038     21.417
 HIWRITE
    Category 1         0.184    0.037      5.031
    Category 2         0.816    0.037     22.288
 HIMATH
    Category 1         0.228    0.038      5.980
    Category 2         0.772    0.038     20.197
 HISCI
    Category 1         0.220    0.037      6.015
    Category 2         0.780    0.037     21.288
 HISS
    Category 1         0.114    0.028      4.043
    Category 2         0.886    0.028     31.304


QUALITY OF NUMERICAL RESULTS
     Condition Number for the Information Matrix              0.654E-01
       (ratio of smallest to largest eigenvalue)
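
The probability scale results are a logistic transformation of the thresholds, and the class proportions are a multinomial logit transformation of the C#k means with the last class as the reference. The Python sketch below reproduces a few of the values above; the function names are hypothetical.

```python
import math

def threshold_to_probs(tau):
    """Return (P(category 1), P(category 2)) for a binary item threshold."""
    p_cat1 = 1.0 / (1.0 + math.exp(-tau))
    return p_cat1, 1.0 - p_cat1

def class_proportions(logits):
    """Class proportions from the C#k means; the last class has logit 0."""
    exps = [math.exp(v) for v in logits] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]

# Class 1 thresholds from the 2 class latent class model above
for name, tau in [("HIREAD$1", 2.273), ("HISS$1", 0.642)]:
    p1, p2 = threshold_to_probs(tau)
    print(f"{name}: category 1 = {p1:.3f}, category 2 = {p2:.3f}")

# C#1 mean from the 2 class latent class model above
print([round(p, 3) for p in class_proportions([-0.359])])
```

A large positive threshold means a high probability of category 1 (the low half of the split), so Class 1 is the low scoring class and Class 2 the high scoring class.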

A 3 Class Latent Class Model

Data:
  File is h:\mplus\hsb6.dat ;
Variable:
  Names are
   id gender race ses sch prog locus concept mot career read write math
   sci ss hiread hiwrite himath hisci hiss academic;
  Usevariables are
     hiread hiwrite himath hisci hiss ;
  categorical = hiread hiwrite himath hisci hiss;
  classes = c(3);
Analysis:
  Type=mixture;
MODEL:
  %C#1%
  [hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1  *2 ];
  %C#2%
  [hiread$1 *0 himath$1 *0 hisci$1 *0 hiss$1 *0 hiwrite$1  *0 ];
  %C#3%
  [hiread$1 *-2 himath$1 *-2 hisci$1 *-2 hiss$1 *-2 hiwrite$1 *-2 ];

OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex8.txt ;
  save is cprob;
  format is free;


THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Loglikelihood

          H0 Value                        -839.066

Information Criteria

          Number of Free Parameters             17
          Akaike (AIC)                    1712.132
          Bayesian (BIC)                  1775.096
          Sample-Size Adjusted BIC        1721.182
            (n* = (n + 2) / 24)
          Entropy                            0.682

Chi-Square Test of Model Fit for the Latent Class Indicator Model Part

          Pearson Chi-Square

          Value                             21.369
          Degrees of Freedom                    14
          P-Value                           0.0925

          Likelihood Ratio Chi-Square

          Value                             25.564
          Degrees of Freedom                    14
          P-Value                           0.0294



FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1         95.51732          0.31839
  Class 2        127.98211          0.42661
  Class 3         76.50058          0.25500


CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions
  Class 1               94          0.31333
  Class 2              130          0.43333
  Class 3               76          0.25333


Average Class Probabilities by Class

                 1        2        3
  Class 1     0.913    0.087    0.000
  Class 2     0.074    0.826    0.099
  Class 3     0.000    0.163    0.837


MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

CLASS 1
CLASS 2
CLASS 3

LATENT CLASS INDICATOR MODEL PART

 Class 1
 Thresholds
    HIREAD$1           2.883    0.671      4.296
    HIWRITE$1          1.735    0.418      4.150
    HIMATH$1           2.863    0.739      3.877
    HISCI$1            3.007    0.861      3.492
    HISS$1             0.991    0.319      3.106

 Class 2
 Thresholds
    HIREAD$1          -0.392    0.348     -1.128
    HIWRITE$1         -0.451    0.445     -1.013
    HIMATH$1          -0.258    0.342     -0.754
    HISCI$1           -0.453    0.269     -1.688
    HISS$1            -1.201    0.400     -2.999

 Class 3
 Thresholds
    HIREAD$1          -4.377    6.575     -0.666
    HIWRITE$1        -15.000    0.000      0.000
    HIMATH$1          -2.932    1.699     -1.726
    HISCI$1           -2.257    0.986     -2.289
    HISS$1            -3.761    2.143     -1.755

LATENT CLASS REGRESSION MODEL PART
 Means
    C#1                0.222    0.398      0.558
    C#2                0.515    0.499      1.032


LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

 Class 1
 HIREAD
    Category 1         0.947    0.034     28.108
    Category 2         0.053    0.034      1.574
 HIWRITE
    Category 1         0.850    0.053     15.951
    Category 2         0.150    0.053      2.815
 HIMATH
    Category 1         0.946    0.038     25.073
    Category 2         0.054    0.038      1.431
 HISCI
    Category 1         0.953    0.039     24.648
    Category 2         0.047    0.039      1.219
 HISS
    Category 1         0.729    0.063     11.577
    Category 2         0.271    0.063      4.298

 Class 2
 HIREAD
    Category 1         0.403    0.084      4.819
    Category 2         0.597    0.084      7.134
 HIWRITE
    Category 1         0.389    0.106      3.680
    Category 2         0.611    0.106      5.775
 HIMATH
    Category 1         0.436    0.084      5.177
    Category 2         0.564    0.084      6.702
 HISCI
    Category 1         0.389    0.064      6.090
    Category 2         0.611    0.064      9.582
 HISS
    Category 1         0.231    0.071      3.249
    Category 2         0.769    0.071     10.797

 Class 3
 HIREAD
    Category 1         0.012    0.081      0.154
    Category 2         0.988    0.081     12.253
 HIWRITE
    Category 1         0.000    0.000      0.000
    Category 2         1.000    0.000      0.000
 HIMATH
    Category 1         0.051    0.082      0.620
    Category 2         0.949    0.082     11.641
 HISCI
    Category 1         0.095    0.085      1.120
    Category 2         0.905    0.085     10.700
 HISS
    Category 1         0.023    0.048      0.477
    Category 2         0.977    0.048     20.530


QUALITY OF NUMERICAL RESULTS
     Condition Number for the Information Matrix              0.323E-03
       (ratio of smallest to largest eigenvalue)
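
The degrees of freedom and p-values of the chi-square tests for the two latent class models can be checked by hand. With five binary indicators there are 2^5 = 32 response patterns, so the degrees of freedom are 32 - 1 minus the number of free parameters. The Python sketch below also computes the upper tail probability using the closed form that holds for even degrees of freedom; the function names are made up for illustration.

```python
import math

def lca_df(n_binary_items, n_free_params):
    """Degrees of freedom: response patterns - 1 - free parameters."""
    return 2 ** n_binary_items - 1 - n_free_params

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid only for even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Pearson chi-square values from the 2 class and 3 class outputs above
for label, chi2, n_params in [("2 class", 44.642, 11), ("3 class", 21.369, 17)]:
    df = lca_df(5, n_params)
    p = chi2_upper_tail_even_df(chi2, df)
    print(f"{label}: df = {df}, p = {p:.4f}")
```

By the Pearson test the 2 class model is rejected at the .05 level while the 3 class model is not, although the likelihood ratio test for the 3 class model is significant at that level (p = 0.0294).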

Categorical Data Analysis Course

Phil Ender -- 24apr03