In addition to discriminating between groups, discriminant analysis allows observations to be classified into groups. Classification can be performed using the observed response variables or, more commonly, the values of the linear discriminant functions. We will begin by looking at classification using observed variables.
Using observed variables
Let xbar1 be the mean vector for group 1 and let xbarj be the mean vector for group j on the observed variables X.
Let x1 be a vector of observed scores for subject 1 and let xi be a vector of observed scores for subject i.
Let d11 be the difference vector for subject 1 using the mean vector for group 1, that is, d11 = x1 - xbar1. And let d1j = x1 - xbarj.
Using the pooled within covariance matrix, Cw
Cw = (S1 + S2 + ... + Sk)/(N-k), where k is the number of groups.
For each subject and each group compute the quantity
$\chi^2_{ij} = d_{ij} C_w^{-1} d_{ij}'$
Thus, if there are three groups, three values of $\chi^2_{ij}$ will be computed for each subject. The subject will then be classified into the group with the smallest $\chi^2_{ij}$.
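The pooled rule can be sketched in a few lines of Python. This is only an illustration, not part of the original Stata session: the helper names `inv2` and `quad_form` are hypothetical, and the mean vectors, pooled matrix and scores for subject 1 are copied from the hsb2 example later in this section.

```python
# Illustrative sketch only; inv2 and quad_form are hypothetical helpers.
# All numbers are copied from the hsb2 example (variables write and math).

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(d, m_inv):
    """Row vector d times m_inv times d transposed."""
    return sum(d[r] * m_inv[r][c] * d[c] for r in range(2) for c in range(2))

Cw = [[74.635417, 37.430998], [37.430998, 68.34361]]   # pooled within covariance
means = {1: (51.33, 50.02), 2: (56.26, 56.73), 3: (46.76, 46.42)}
x1 = (52, 41)                                           # subject 1

Cw_inv = inv2(Cw)
chi2 = {}
for j, xbar in means.items():
    d = [x1[0] - xbar[0], x1[1] - xbar[1]]              # difference vector d_1j
    chi2[j] = quad_form(d, Cw_inv)

# Classify into the group with the smallest value.
assigned = min(chi2, key=chi2.get)
print({j: round(v, 2) for j, v in chi2.items()}, assigned)
```

To two decimals the three values agree with the Stata results for subject 1 (1.77, 3.97, 1.67), so the subject is assigned to group 3.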
Using a separate covariance matrix for each group, Cj
Cj = Sj/(nj - 1), where j is the group identifier.
Let xi, xbarj and dij be defined as above.
For each subject and each group compute the quantity
$\chi'^2_{ij} = d_{ij} C_j^{-1} d_{ij}' + \ln|C_j|$
Again, the subject will then be classified into the group with the smallest $\chi'^2_{ij}$.
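A sketch of the separate-covariance rule in Python follows. It is an illustration under the same assumptions as the hsb2 example later in this section (group means, group covariance matrices and subject 1's scores are copied from there); the helper names `det2`, `inv2` and `quad_form` are hypothetical.

```python
import math

# Illustrative sketch only; det2, inv2 and quad_form are hypothetical helpers.

def det2(m):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = det2(m)
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(d, m_inv):
    """Row vector d times m_inv times d transposed."""
    return sum(d[r] * m_inv[r][c] * d[c] for r in range(2) for c in range(2))

# Separate group covariance matrices C1, C2, C3 from the hsb2 example.
C = {
    1: [[88.318182, 25.083333], [25.083333, 55.385859]],
    2: [[63.096703, 42.511538], [42.511538, 76.216667]],
    3: [[86.839184, 37.73551], [37.73551, 63.26898]],
}
means = {1: (51.33, 50.02), 2: (56.26, 56.73), 3: (46.76, 46.42)}
x1 = (52, 41)  # subject 1

chi2_prime = {}
for j in C:
    d = [x1[0] - means[j][0], x1[1] - means[j][1]]
    # Quadratic form with the group's own matrix, plus the log determinant.
    chi2_prime[j] = quad_form(d, inv2(C[j])) + math.log(det2(C[j]))

assigned = min(chi2_prime, key=chi2_prime.get)
print({j: round(v, 2) for j, v in chi2_prime.items()}, assigned)
```

Note that the penalty is the natural log of the determinant, not the determinant itself; with the log, the values for subject 1 come out near 10.12, 11.77 and 9.89, as in the summary table of the example.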
Taking prior probabilities into account
Let xi, xbarj and dij be defined as above.
Let p1 = n1/N be the prior probability for group 1 and let pj = nj/N be the prior probability for group j. For each subject and each group compute the quantity
$\chi''^2_{ij} = \chi'^2_{ij} - 2\ln p_j$
The subject will then be classified into the group with the smallest $\chi''^2_{ij}$.
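The prior adjustment is a simple per-group shift, which the following Python sketch illustrates. It assumes the group sizes from the hsb2 example (45, 105, 50) and starts from subject 1's separate-covariance values as reported, to two decimals, in the summary table of that example; the variable names are hypothetical.

```python
import math

# Illustrative sketch only; names are hypothetical.
n = {1: 45, 2: 105, 3: 50}                   # group sizes from the hsb2 example
N = sum(n.values())

# Separate-covariance values for subject 1, taken from the example's table.
chi2_prime = {1: 10.12, 2: 11.77, 3: 9.89}

chi2_dprime = {}
for j in n:
    p_j = n[j] / N                           # prior probability for group j
    chi2_dprime[j] = chi2_prime[j] - 2 * math.log(p_j)

assigned = min(chi2_dprime, key=chi2_dprime.get)
print({j: round(v, 2) for j, v in chi2_dprime.items()}, assigned)
```

Because ln p_j is negative, the term -2 ln p_j adds a penalty that is larger for smaller groups. Here the largest group (group 2) receives the smallest penalty, but subject 1 is still assigned to group 3.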
Example using hsb2 with variables write and math
/* mean vectors for each group */
mat xb1 = (51.33, 50.02) /* n1 = 45 */
mat xb2 = (56.26, 56.73) /* n2 = 105 */
mat xb3 = (46.76, 46.42) /* n3 = 50 */
/* pooled within covariance matrix */
mat list Cw
symmetric Cw[2,2]
           write       math
write  74.635417
 math  37.430998   68.34361
/* separate group covariance matrices */
mat list C1
symmetric C1[2,2]
           write       math
write  88.318182
 math  25.083333  55.385859
mat list C2
symmetric C2[2,2]
           write       math
write  63.096703
 math  42.511538  76.216667
mat list C3
symmetric C3[2,2]
           write       math
write  86.839184
 math   37.73551   63.26898
/* determinants of covariance matrices */
scalar det1 = det(C1)
scalar det2 = det(C2)
scalar det3 = det(C3)
display det1 " " det2 " " det3
4262.4047 3001.7895 4070.2578
/* prior probabilities */
scalar p1 = 45/200
scalar p2 = 105/200
scalar p3 = 50/200
display p1 " " p2 " " p3
.225 .525 .25
/* scores for subjects 1, 2 and 3 */
mat x1 = (52, 41) /* actual group membership -- group 1 */
mat x2 = (41, 44) /* actual group membership -- group 3 */
mat x3 = (64, 70) /* actual group membership -- group 2 */
/* difference vectors for subject 1 from each group */
mat d11 = x1-xb1
mat d12 = x1-xb2
mat d13 = x1-xb3
/* chi-square for subject 1 */
mat xc11 = d11*syminv(Cw)*d11'
mat xc12 = d12*syminv(Cw)*d12'
mat xc13 = d13*syminv(Cw)*d13'
mat list xc11
symmetric xc11[1,1]
           r1
r1  1.7718562
mat list xc12
symmetric xc12[1,1]
           r1
r1  3.9707944
mat list xc13
symmetric xc13[1,1]
           r1
r1  1.6744838
/* classify subject 1 into group 3 (smallest value) */
/* chi-square prime for subject 1 */
/* values below are shown to two decimals, as in the summary tables */
mat xcp11 = d11*syminv(C1)*d11' + ln(det1)
mat xcp12 = d12*syminv(C2)*d12' + ln(det2)
mat xcp13 = d13*syminv(C3)*d13' + ln(det3)
mat list xcp11
symmetric xcp11[1,1]
       c1
r1  10.12
mat list xcp12
symmetric xcp12[1,1]
       c1
r1  11.77
mat list xcp13
symmetric xcp13[1,1]
       c1
r1   9.89
/* classify subject 1 into group 3 (smallest value) */
/* chi-square double prime for subject 1 */
mat xcpp11 = xcp11 - 2*ln(p1)
mat xcpp12 = xcp12 - 2*ln(p2)
mat xcpp13 = xcp13 - 2*ln(p3)
mat list xcpp11
symmetric xcpp11[1,1]
       c1
r1  13.10
mat list xcpp12
symmetric xcpp12[1,1]
       c1
r1  13.06
mat list xcpp13
symmetric xcpp13[1,1]
       c1
r1  12.66
/* classify subject 1 into group 3 (smallest value) */
/* table for all three subjects */
using pooled within covariance matrix
S    Grp1    Grp2    Grp3   Class as
1    1.77    3.97    1.67      3
2    1.44    3.64     .45      3
3    5.89    2.58    8.48      2
using separate group covariance matrices
S    Grp1    Grp2    Grp3   Class as
1   10.12   11.77    9.89      3
2    9.76   11.82    8.69      3
3   15.74   10.32   17.26      2
using prior probabilities
S    Grp1    Grp2    Grp3   Class as
1   13.10   13.06   12.66      3
2   12.75   13.11   11.47      3
3   18.72   11.61   20.03      2
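As a consistency check on the matrices used in the example above, the pooled within covariance matrix can be recovered from the separate group matrices. Since Cj = Sj/(nj - 1), each Sj equals (nj - 1)Cj, and summing these and dividing by N - k gives Cw. The Python sketch below is illustrative only, with hypothetical variable names and the values copied from the listing.

```python
# Illustrative sketch only; variable names are hypothetical.
n = {1: 45, 2: 105, 3: 50}   # group sizes
C = {
    1: [[88.318182, 25.083333], [25.083333, 55.385859]],
    2: [[63.096703, 42.511538], [42.511538, 76.216667]],
    3: [[86.839184, 37.73551], [37.73551, 63.26898]],
}

N = sum(n.values())          # 200 subjects in total
k = len(n)                   # 3 groups

# Cw = sum_j (n_j - 1) * C_j / (N - k), element by element.
Cw = [
    [sum((n[j] - 1) * C[j][r][c] for j in n) / (N - k) for c in range(2)]
    for r in range(2)
]

for row in Cw:
    print([round(v, 4) for v in row])
```

The result agrees with the Cw listed in the example (74.635417, 37.430998, 68.34361) up to rounding in the inputs.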
Since there are often more observed variables than discriminant functions, it is usually more efficient to do the classification using the discriminant function scores. The computations are exactly the same as with observed variables.
In this small example, with two observed variables and two discriminant functions, there is no particular saving from using the discriminant scores from all of the dimensions.
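To make the parallel concrete, here is a hedged Python sketch of the pooled rule applied to discriminant scores. It uses the rounded raw coefficients and the group means on the two functions reported in the candisc listing that follows. It also assumes that the scores are scaled so that their pooled within covariance matrix is the identity, in which case the pooled quantity reduces to a squared Euclidean distance; that assumption, and the helper names, are mine rather than something stated in the original session.

```python
# Illustrative sketch only; helper names are hypothetical.
# Assumption: pooled within covariance of the scores is the identity matrix,
# so the pooled chi-square is a plain squared distance to the group mean.

def scores(write, math_score):
    """Discriminant function scores from the rounded raw coefficients."""
    f1 = -7.1106 + 0.0486 * write + 0.0863 * math_score
    f2 = 0.7618 - 0.1269 * write + 0.1128 * math_score
    return (f1, f2)

def sq_dist(f, fbar):
    """Squared Euclidean distance between a score vector and a group mean."""
    return sum((a - b) ** 2 for a, b in zip(f, fbar))

group_means = {1: (-0.2965, -0.1128), 2: (0.5222, 0.0191), 3: (-0.8298, 0.0615)}
subjects = {1: (52, 41), 2: (41, 44), 3: (64, 70)}   # (write, math)

dist = {}
assigned = {}
for s, (w, m) in subjects.items():
    f = scores(w, m)
    dist[s] = {j: sq_dist(f, fbar) for j, fbar in group_means.items()}
    assigned[s] = min(dist[s], key=dist[s].get)

for s in subjects:
    print(s, {j: round(v, 2) for j, v in dist[s].items()}, assigned[s])
```

Under this assumption the distances match the pooled values obtained from the observed variables up to rounding, and the three subjects are assigned to groups 3, 3 and 2.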
Using discriminant function scores
candisc write math in 1/200, group(prog) notable nomeans nostruct

Canonical linear discriminant analysis

     |                                 | Like-
     | Canon.   Eigen-     Variance    | lihood
 Fcn | Corr.    value   Prop.   Cumul. | Ratio      F     df1    df2  Prob>F
 ----+---------------------------------+------------------------------------
   1 | 0.5038  .340237  0.9882  0.9882 | 0.7431   15.683    4    392  0.0000 e
   2 | 0.0636  .004058  0.0118  1.0000 | 0.9960   .79952    1    197  0.3723 e
 ---------------------------------------------------------------------------
 Ho: this and smaller canon. corr. are zero;                     e = exact F

Standardized canonical discriminant function coefficients

             |  function1   function2
-------------+----------------------
       write |   .4198644   -1.096543
        math |   .7138331    .9322744

/* display raw coefficients */
mat list e(L_unstd)

e(L_unstd)[3,2]
        function1   function2
write   .04860004  -.12692679
 math   .08634709   .11277032
_cons  -7.1106094   .76176766

/* generate the discriminant function scores */
generate f1 = -7.1106 + 0.0486*write + 0.0863*math
generate f2 = .7618 - 0.1269*write + 0.1128*math

/* mean vector for each function and group */
mat fb1 = (-0.2965, -0.1128)
mat fb2 = ( 0.5222, 0.0191)
mat fb3 = (-0.8298, 0.0615)

/* discriminant function scores for each subject */
mat f1 = (-1.0451, -1.2122)
mat f2 = (-1.3208, .5221)
mat f3 = ( 2.0408, .5362)

/* table for all three subjects using discriminant scores */
using pooled within covariance matrix
differs from using observed variables only by rounding error
S    Grp1    Grp2    Grp3   Class as
1    1.77    3.97    1.67      3
2    1.45    3.65     .45      3
3    5.88    2.57    8.47      2
using separate group covariance matrices
S    Grp1    Grp2    Grp3   Class as
1    1.90    3.55    1.67      3
2    1.55    3.62     .48      3
3    7.50    2.10    9.03      2
using prior probabilities
S    Grp1    Grp2    Grp3   Class as
1    4.89    4.84    4.44      3
2    4.54    4.91    3.25      3
3   10.49    3.39   11.80      2

One advantage of using discriminant function scores is that you may want to use only the scores from the significant dimensions. In our example only the first dimension is statistically significant. Since we are using fewer scores, this approach can be considered to be using reduced dimensionality.
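With a single retained function the pooled computation collapses to a squared difference between a subject's score and each group mean. The Python sketch below illustrates this with the first-function scores and group means from this example; it assumes the scores have pooled within variance 1 (my assumption, which the reported pooled values are consistent with), and the variable names are hypothetical.

```python
# Illustrative sketch only; variable names are hypothetical.
# Assumption: the function-1 scores have pooled within variance equal to 1.

group_means = {1: -0.2965, 2: 0.5222, 3: -0.8298}   # function 1 only
scores = {1: -1.0451, 2: -1.3208, 3: 2.0408}        # subjects 1, 2 and 3

# Squared difference between each subject's score and each group mean.
dist = {
    s: {j: (f - fbar) ** 2 for j, fbar in group_means.items()}
    for s, f in scores.items()
}
assigned = {s: min(d, key=d.get) for s, d in dist.items()}

for s in scores:
    print(s, {j: round(v, 3) for j, v in dist[s].items()}, assigned[s])
```

Dropping the second function leaves the assignments for these three subjects unchanged (groups 3, 3 and 2).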
Using only the significant discriminant function scores
/* mean vector for each group on the single function */
mat fb1 = (-0.2965)
mat fb2 = ( 0.5222)
mat fb3 = (-0.8298)

/* discriminant function scores for each subject */
mat f1 = (-1.0451)
mat f2 = (-1.3208)
mat f3 = ( 2.0408)

/* table for all three subjects using discriminant scores */
using pooled within variance matrix
S    Grp1    Grp2    Grp3   Class as
1     .56    2.46    .046      3
2    1.05    3.40    .241      3
3    5.46    2.31   8.240      2
using separate group variance matrices
S    Grp1    Grp2    Grp3   Class as
1    1.51    3.36   1.040      3
2    2.09    4.24   1.236      3
3    7.40    3.22   9.287      2
using prior probabilities
S    Grp1    Grp2    Grp3   Class as
1    4.49    4.65    3.81      3
2    5.08    5.53    4.01      3
3   10.38    4.51   12.06      2

Classifying with unknown group membership
The real utility of classification comes when you have scores on individuals with unknown group membership. Using the hsb2 dataset, we will create three new cases and then classify them into groups using candisc and predict.
use http://www.gseis.ucla.edu/courses/data/hsb2, clear
set obs 203
replace read =40 in 201
replace write=40 in 201
replace math =40 in 201
replace read =50 in 202
replace write=50 in 202
replace math =50 in 202
replace read =60 in 203
replace write=60 in 203
replace math =60 in 203

list read write math prog in 201/203, clean

       read   write   math   prog
201.     40      40     40      .
202.     50      50     50      .
203.     60      60     60      .

candisc read write math in 1/200, group(prog) notable nostruct

Canonical linear discriminant analysis

     |                                 | Like-
     | Canon.   Eigen-     Variance    | lihood
 Fcn | Corr.    value   Prop.   Cumul. | Ratio      F     df1    df2  Prob>F
 ----+---------------------------------+------------------------------------
   1 | 0.5125  .356283  0.9874  0.9874 | 0.7340    10.87    6    390  0.0000 e
   2 | 0.0672  .004543  0.0126  1.0000 | 0.9955   .44518    2    196  0.6414 e
 ---------------------------------------------------------------------------
 Ho: this and smaller canon. corr. are zero;                     e = exact F

Standardized canonical discriminant function coefficients

             |  function1   function2
-------------+----------------------
        read |   .2728524    .4097932
       write |   .3310784   -1.183414
        math |   .5815538     .655658

Group means on canonical variables

        prog |  function1   function2
-------------+----------------------
     general |  -.3120021   -.1190423
    academic |   .5358515    .0196809
    vocation |  -.8444861    .0658081

predict class, classification

list read write math prog class in 201/203, clean

       read   write   math   prog   class
201.     40      40     40      .       3
202.     50      50     50      .       1
203.     60      60     60      .       2
Phil Ender, 29jul07, 30oct05