Latent class analysis is a type of latent variable analysis in which the observed indicator variables are categorical and the latent (unobserved) variable is also categorical. More formally, latent class analysis is a statistical method for finding subtypes of related cases (latent classes) in multivariate categorical data. In this sense it resembles cluster analysis: it attempts to find groups, or classes, of observations that are similar to one another.
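For binary indicators, the latent class model is usually written as a finite mixture in which the indicators are independent within each class (local independence). Below is a sketch in our own notation, not taken from the original notes. Here y_ij is criterion i for pattern j, π_c is the probability of class c, and λ_ic is the class-specific log odds, which gllamm reports as a "location." In the example that follows, C = 2 classes and I = 4 criteria.

```latex
\begin{aligned}
\Pr(\mathbf{y}_j) &= \sum_{c=1}^{C} \pi_c \prod_{i=1}^{I} p_{ic}^{\,y_{ij}} (1 - p_{ic})^{1 - y_{ij}},
  \qquad p_{ic} = \Pr(y_{ij} = 1 \mid c) = \frac{\exp(\lambda_{ic})}{1 + \exp(\lambda_{ic})}, \\
\Pr(c \mid \mathbf{y}_j) &= \frac{\pi_c \prod_{i=1}^{I} p_{ic}^{\,y_{ij}} (1 - p_{ic})^{1 - y_{ij}}}{\Pr(\mathbf{y}_j)}.
\end{aligned}
```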
MI Example
Four diagnostic criteria, each coded as a binary (0/1) indicator
use http://www.gseis.ucla.edu/courses/data/rindskopf2a

list, sep(0) noobs

  +-------------------------------+
  | pat   y1   y2   y3   y4   wt2 |
  |-------------------------------|
  |   1    1    1    1    1    24 |
  |   2    0    1    1    1     5 |
  |   3    1    0    1    1     4 |
  |   4    0    0    1    1     3 |
  |   5    1    1    0    1     3 |
  |   6    0    1    0    1     5 |
  |   7    1    0    0    1     2 |
  |   8    0    0    0    1     7 |
  |   9    1    1    1    0     0 |
  |  10    0    1    1    0     0 |
  |  11    1    0    1    0     0 |
  |  12    0    0    1    0     1 |
  |  13    1    1    0    0     0 |
  |  14    0    1    0    0     7 |
  |  15    1    0    0    0     0 |
  |  16    0    0    0    0    33 |
  +-------------------------------+

reshape long y, i(pat) j(var)

list in 1/8, sep(4)

     +---------------------+
     | pat   var   y   wt2 |
     |---------------------|
  1. |   1     1   1    24 |
  2. |   1     2   1    24 |
  3. |   1     3   1    24 |
  4. |   1     4   1    24 |
     |---------------------|
  5. |   2     1   0     5 |
  6. |   2     2   1     5 |
  7. |   2     3   1     5 |
  8. |   2     4   1     5 |
     +---------------------+

forvalues i = 1/4 {
    generate v`i' = var==`i'
}

list in 1/8, sep(4)

     +-----------------------------------------+
     | pat   var   y   wt2   v1   v2   v3   v4 |
     |-----------------------------------------|
  1. |   1     1   1    24    1    0    0    0 |
  2. |   1     2   1    24    0    1    0    0 |
  3. |   1     3   1    24    0    0    1    0 |
  4. |   1     4   1    24    0    0    0    1 |
     |-----------------------------------------|
  5. |   2     1   0     5    1    0    0    0 |
  6. |   2     2   1     5    0    1    0    0 |
  7. |   2     3   1     5    0    0    1    0 |
  8. |   2     4   1     5    0    0    0    1 |
     +-----------------------------------------+

After reshaping, each response pattern (pat) occupies four rows, one per criterion. The variable y holds the response, and the dummy variables v1-v4 mark which criterion the row refers to. Now that the data are organized the way we want, we can begin the latent class analysis. The four eq commands tie one random effect to each criterion dummy, so each latent class gets its own log odds (location) for each criterion. The option nip(2) requests two mass points, that is, two latent classes.
eq v1: v1
eq v2: v2
eq v3: v3
eq v4: v4
gllamm y, i(pat) ip(fn) nrf(4) eqs(v1 v2 v3 v4) we(wt) nip(2) l(logit) f(binom) nocons

number of level 1 units = 376
number of level 2 units = 94

Condition Number = 4452.1073

gllamm model

log likelihood = -180.69771

No fixed effects

Probabilities and locations of random effects
------------------------------------------------------------------------------

***level 2 (pat)

    loc1: -17.584, 1.1903
  var(1): 87.492802

    loc2: -1.4173, 1.3333
cov(2,1): 12.818057
  var(2): 1.8778983

    loc3: -3.5875, 1.5708
cov(3,1): 24.038823
cov(3,2): 3.5217869
  var(3): 6.6047149

    loc4: -1.4143, 16.845
cov(4,1): 85.093401
cov(4,2): 12.466535
cov(4,3): 23.379583
  var(4): 82.759801

    prob: 0.5422, 0.4578

log odds parameters
class 1
   _cons: .16913132 (.22601687)
------------------------------------------------------------------------------

display 94*0.5422
50.9668

display 94*0.4578
43.0332

/* reorganizing the output manually */
            class1     class2
loc1:      -17.584     1.1903
loc2:      -1.4173     1.3333
loc3:      -3.5875     1.5708
loc4:      -1.4143     16.845
prob:       0.5422     0.4578
ecount:    50.9668    43.0332

Class 2 has positive log odds on all four criteria, so it is the class of patients most likely to have an MI. Its estimated probability is .4578, which corresponds to an expected count of 43.0332 of the 94 patients.
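The class probabilities and expected class counts can be reproduced from the log odds parameter. The Python sketch below assumes, as the output suggests, that the class 1 _cons equals log(π1/π2). The helper name invlogit is ours. Because it uses the unrounded probability, the expected counts come out slightly different from the hand calculation above, which used probabilities rounded to four decimals.

```python
import math


def invlogit(x: float) -> float:
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Reported by gllamm under "log odds parameters, class 1"; assumed to be log(pi1 / pi2).
LOG_ODDS_CLASS1 = 0.16913132
N_PATIENTS = 94  # number of level-2 units (patients)

pi1 = invlogit(LOG_ODDS_CLASS1)  # Pr(class 1)
pi2 = 1.0 - pi1                  # Pr(class 2)

ecount1 = N_PATIENTS * pi1       # expected number of patients in class 1
ecount2 = N_PATIENTS * pi2       # expected number of patients in class 2

print(f"prob:   {pi1:.4f}  {pi2:.4f}")
print(f"ecount: {ecount1:.2f}  {ecount2:.2f}")
```

This prints probabilities of 0.5422 and 0.4578 and expected counts of 50.97 and 43.03.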
If you wish to see the latent class coefficients along with their standard errors, try this code:
estimates store lca
matrix b = e(b)
matrix V = e(V)
ereturn post b V
ereturn display

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
z2_1_1       |
          v1 |  -17.58414   952.4511    -0.02   0.985    -1884.354    1849.186
-------------+----------------------------------------------------------------
z2_2_1       |
          v2 |  -1.417265   .3881472    -3.65   0.000    -2.178019   -.6565103
-------------+----------------------------------------------------------------
z2_3_1       |
          v3 |  -3.587496   1.008792    -3.56   0.000    -5.564692   -1.610299
-------------+----------------------------------------------------------------
z2_4_1       |
          v4 |  -1.414294   .4120723    -3.43   0.001    -2.221941   -.6066475
-------------+----------------------------------------------------------------
p2_1         |
       _cons |   .1691313   .2260169     0.75   0.454    -.2738536    .6121162
-------------+----------------------------------------------------------------
z2_1_2       |
          v1 |   1.190308   .4176254     2.85   0.004     .3717775    2.008839
-------------+----------------------------------------------------------------
z2_2_2       |
          v2 |    1.33327   .3892167     3.43   0.001     .5704188     2.09612
-------------+----------------------------------------------------------------
z2_3_2       |
          v3 |   1.570822   .4738385     3.32   0.001     .6421153    2.499528
-------------+----------------------------------------------------------------
z2_4_2       |
          v4 |   16.84528    701.791     0.02   0.981     -1358.64     1392.33
------------------------------------------------------------------------------

estimates restore lca

The very large standard errors for v1 in class 1 and v4 in class 2 occur because those locations lie at the boundary. The implied probabilities are essentially 0 and 1, so the likelihood is nearly flat in those log odds. The command ereturn post replaces the gllamm results in memory. For that reason the results are saved beforehand with estimates store and brought back afterward with estimates restore, because gllapred needs them. We can obtain the posterior probability that each response pattern belongs to each class, Pr(c=1|yj) and Pr(c=2|yj), using the gllapred command.
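gllapred does this calculation for us, but it is simply Bayes' rule applied to the fitted model. The Python sketch below assumes local independence within each class and uses the estimates from the ereturn display table above. The function names are ours and are not part of gllamm.

```python
import math


def invlogit(x: float) -> float:
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Class-specific locations (log odds that criterion v1..v4 is present),
# copied from the ereturn display table.
LOCATIONS = {
    1: (-17.58414, -1.417265, -3.587496, -1.414294),
    2: (1.190308, 1.33327, 1.570822, 16.84528),
}
PI1 = invlogit(0.1691313)          # class 1 probability from the p2_1 _cons log odds
PRIOR = {1: PI1, 2: 1.0 - PI1}


def pattern_likelihood(y, locs):
    """Pr(y | class) under local independence: a product of Bernoulli terms."""
    lik = 1.0
    for yi, loc in zip(y, locs):
        # invlogit(-loc) equals 1 - invlogit(loc) but avoids cancellation near 1
        lik *= invlogit(loc) if yi == 1 else invlogit(-loc)
    return lik


def posterior(y):
    """Pr(c | y) for c = 1, 2 by Bayes' rule."""
    joint = {c: PRIOR[c] * pattern_likelihood(y, LOCATIONS[c]) for c in PRIOR}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}


for y in [(1, 1, 1, 1), (0, 1, 0, 1), (1, 1, 0, 0), (0, 0, 0, 0)]:
    post = posterior(y)
    print(y, f"{post[1]:.6g}", f"{post[2]:.6g}")
```

For pattern 6 (0 1 0 1), for example, the sketch gives Pr(c=1|yj) of about .58.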
gllapred prob, p
merge m:1 pat using rindskopf2a, keepusing(y1 y2 y3 y4)
list pat y1 y2 y3 y4 wt2 prob1 prob2 if var==1, sep(0) noobs

  +-------------------------------------------------------+
  | pat   y1   y2   y3   y4   wt2       prob1       prob2 |
  |-------------------------------------------------------|
  |   1    1    1    1    1    24   5.589e-11           1 |
  |   2    0    1    1    1     5   .00789839   .99210161 |
  |   3    1    0    1    1     4   8.748e-10           1 |
  |   4    0    0    1    1     3   .11079636   .88920364 |
  |   5    1    1    0    1     3   9.718e-09   .99999999 |
  |   6    0    1    0    1     5   .58057906   .41942094 |
  |   7    1    0    0    1     2   1.521e-07   .99999985 |
  |   8    0    0    0    1     7   .95587857   .04412143 |
  |   9    1    1    1    0     0   .00473495   .99526505 |
  |  10    0    1    1    0     0   .99999852   1.476e-06 |
  |  11    1    0    1    0     0   .06929933   .93070067 |
  |  12    0    0    1    0     1   .99999991   9.428e-08 |
  |  13    1    1    0    0     0   .45271194   .54728806 |
  |  14    0    1    0    0     7   .99999999   8.487e-09 |
  |  15    1    0    0    0     0   .92829674   .07170326 |
  |  16    0    0    0    0    33           1   5.423e-10 |
  +-------------------------------------------------------+

We can assign each response pattern to the latent class with the larger posterior probability.
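With only two classes, choosing the class with the larger posterior probability is the same as checking whether prob2 exceeds .5. The Python sketch below takes the posterior probabilities and weights from the listing above, rounded as printed, and counts how many of the 94 patients each rule places in class 2. The variable names are ours. The posterior-weighted count should match the model's expected count for class 2 (about 43.03), because at the maximum likelihood estimates the class probability equals the weighted average posterior probability.

```python
# Posterior Pr(c=2 | y_j) and frequency weight wt2 for patterns 1..16,
# copied (as printed) from the gllapred listing.
PROB2 = [1, .99210161, 1, .88920364, .99999999, .41942094, .99999985,
         .04412143, .99526505, 1.476e-06, .93070067, 9.428e-08,
         .54728806, 8.487e-09, .07170326, 5.423e-10]
WT2 = [24, 5, 4, 3, 3, 5, 2, 7, 0, 0, 0, 1, 0, 7, 0, 33]

# Modal assignment: class 2 when Pr(c=2|y) > Pr(c=1|y) = 1 - Pr(c=2|y).
class2 = [int(p2 > 1 - p2) for p2 in PROB2]

n_total = sum(WT2)
n_class2_modal = sum(w for w, c in zip(WT2, class2) if c == 1)
n_class2_soft = sum(w * p2 for w, p2 in zip(WT2, PROB2))

print("class2 by pattern:", class2)
print("patients:", n_total)
print("modal count in class 2:", n_class2_modal)
print(f"posterior-weighted count in class 2: {n_class2_soft:.2f}")
```

The modal rule puts 41 patients in class 2, slightly fewer than the 43.03 the model expects.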
generate class2 = prob2>prob1
list pat y1 y2 y3 y4 wt2 class2 if var==1, sep(0) noobs

  +----------------------------------------+
  | pat   y1   y2   y3   y4   wt2   class2 |
  |----------------------------------------|
  |   1    1    1    1    1    24        1 |
  |   2    0    1    1    1     5        1 |
  |   3    1    0    1    1     4        1 |
  |   4    0    0    1    1     3        1 |
  |   5    1    1    0    1     3        1 |
  |   6    0    1    0    1     5        0 |
  |   7    1    0    0    1     2        1 |
  |   8    0    0    0    1     7        0 |
  |   9    1    1    1    0     0        1 |
  |  10    0    1    1    0     0        0 |
  |  11    1    0    1    0     0        1 |
  |  12    0    0    1    0     1        0 |
  |  13    1    1    0    0     0        1 |
  |  14    0    1    0    0     7        0 |
  |  15    1    0    0    0     0        0 |
  |  16    0    0    0    0    33        0 |
  +----------------------------------------+

We can also use gllapred to compute the conditional response probabilities, in particular Pr(yij=1|c=2), which is also known as the sensitivity of each criterion.
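Each conditional response probability is the inverse logit of a class 2 location, so it can also be checked directly. The Python sketch below uses the unrounded class 2 estimates from the ereturn display table. The names are ours.

```python
import math


def invlogit(x: float) -> float:
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Class 2 (MI) locations for criteria v1..v4 from the ereturn display table.
CLASS2_LOCATIONS = [1.190308, 1.33327, 1.570822, 16.84528]

# Sensitivity of each criterion: Pr(y_ij = 1 | c = 2).
sensitivity = [invlogit(loc) for loc in CLASS2_LOCATIONS]

for i, s in enumerate(sensitivity, start=1):
    print(f"v{i}: {s:.4f}")
```

This prints 0.7668, 0.7914, 0.8279 and 1.0000. These agree with gllapred's cprob values to about three decimals; the small differences arise because the Stata code uses rounded locations.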
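The variables e1-e4 are set to the class 2 locations. The us(e) option then makes gllapred evaluate the predicted response probability (mu) at those values rather than at the posterior means.

generate e1 = 1.1903
generate e2 = 1.3333
generate e3 = 1.5706
generate e4 = 16.845
gllapred cprob, mu us(e)
list v1-v4 cprob in 1/4, noobs

  +-------------------------------+
  | v1   v2   v3   v4       cprob |
  |-------------------------------|
  |  1    0    0    0   .76679471 |
  |  0    1    0    0   .79138597 |
  |  0    0    1    0   .82786913 |
  |  0    0    0    1   .99999995 |
  +-------------------------------+

Next, we use gllapred to obtain the log-likelihood contribution of each pattern, log Pr(yj). Multiplying Pr(yj) by the number of patients, e(N) = 94, gives the expected count for each pattern.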
gllapred l, ll
(ll will be stored in l)
log-likelihood:-180.69771

generate count = e(N)*exp(l)
list pat y1 y2 y3 y4 wt2 count if var==1, sep(0) noobs

  +------------------------------------------+
  | pat   y1   y2   y3   y4   wt2      count |
  |------------------------------------------|
  |   1    1    1    1    1    24   21.62042 |
  |   2    0    1    1    1     5   6.627714 |
  |   3    1    0    1    1     4   5.699445 |
  |   4    0    0    1    1     3   1.949337 |
  |   5    1    1    0    1     3    4.49433 |
  |   6    0    1    0    1     5   3.258896 |
  |   7    1    0    0    1     2   1.184768 |
  |   8    0    0    0    1     7   8.166566 |
  |   9    1    1    1    0     0          . |
  |  10    0    1    1    0     0          . |
  |  11    1    0    1    0     0          . |
  |  12    0    0    1    0     1   .8884496 |
  |  13    1    1    0    0     0          . |
  |  14    0    1    0    0     7   7.783092 |
  |  15    1    0    0    0     0          . |
  |  16    0    0    0    0    33   32.11164 |
  +------------------------------------------+
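The listing shows missing expected counts for the five patterns that were never observed (wt2 = 0). Under the model, every pattern has an expected count of N·Pr(yj), so these can be filled in by hand. The Python sketch below assumes local independence within each class and uses the unrounded estimates from the ereturn display table. The function names are ours.

```python
import math
from itertools import product


def invlogit(x: float) -> float:
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Locations for v1..v4 by class, and the class 1 log odds, from ereturn display.
LOCATIONS = {
    1: (-17.58414, -1.417265, -3.587496, -1.414294),
    2: (1.190308, 1.33327, 1.570822, 16.84528),
}
PI1 = invlogit(0.1691313)
PRIOR = {1: PI1, 2: 1.0 - PI1}
N = 94  # number of patients, e(N) in the Stata code


def pattern_prob(y):
    """Marginal Pr(y) = sum over classes of Pr(c) * Pr(y | c)."""
    total = 0.0
    for c, locs in LOCATIONS.items():
        lik = 1.0
        for yi, loc in zip(y, locs):
            lik *= invlogit(loc) if yi == 1 else invlogit(-loc)
        total += PRIOR[c] * lik
    return total


# Patterns in the same order as pat 1..16: y1 changes fastest, y4 slowest.
patterns = [(y1, y2, y3, y4) for y4, y3, y2, y1 in product((1, 0), repeat=4)]
expected = [N * pattern_prob(y) for y in patterns]

for pat, (y, count) in enumerate(zip(patterns, expected), start=1):
    print(pat, *y, f"{count:.4f}")
print(f"total: {sum(expected):.4f}")
```

The expected counts over all 16 patterns sum to 94. The five unobserved patterns together have an expected count of only about 0.2, which is consistent with their zero observed frequencies.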
Categorical Data Analysis Course
Phil Ender