Before multinomial logistic regression was developed, many researchers used discriminant function analysis to investigate categorical response variables. Stata does not have a built-in discriminant analysis command, so we will use the daoneway command from ATS. The daoneway command is still in development and does not yet produce classifications, so we also include some SPSS output for the second example.
Keep in mind that discriminant analysis assumes that the predictor variables have a multivariate normal distribution. For more on discriminant analysis, consult a standard multivariate analysis reference, such as Morrison (Multivariate Statistical Methods, 1990) or Afifi & Clark (Computer-Aided Multivariate Analysis, 1996).
Example 1
use http://www.gseis.ucla.edu/courses/data/hsb2
daoneway female math socst, by(prog)

One-way Disciminant Function Analysis

Observations = 200
Variables    = 3
Groups       = 3

                     Pct of     Cum    Canonical | After   Wilks'
 Fcn   Eigenvalue   Variance    Pct      Corr    |  Fcn    Lambda   Chi-square   df   P-value
                                                 |    0   0.71390       66.053    6    0.0000
   1       0.3955      99.05   99.05    0.5324   |    1   0.99623        0.739    2    0.6910
   2       0.0038       0.95  100.00    0.0614   |

Unstandardized discriminant function coefficients

             func1     func2
female      0.0363   -0.4789
math        0.0770   -0.1076
socst       0.0573    0.0989
_cons      -7.0729    0.7421

Standardized discriminant function coefficients

             func1     func2
female      0.0182   -0.2403
math        0.6363   -0.8891
socst       0.5494    0.9485

Discriminant structure matrix

             func1     func2
female      0.0210   -0.1542
math        0.8657   -0.4820
socst       0.8169    0.5634

Group means on discriminant functions

             func1     func2
prog=1     -0.3057    0.1092
prog=2      0.5606   -0.0191
prog=3     -0.9022   -0.0582

Example 2
daoneway female read write math science socst, by(prog)

One-way Disciminant Function Analysis

Observations = 200
Variables    = 6
Groups       = 3

                     Pct of     Cum    Canonical | After   Wilks'
 Fcn   Eigenvalue   Variance    Pct      Corr    |  Fcn    Lambda   Chi-square   df   P-value
                                                 |    0   0.66607       79.038   12    0.0000
   1       0.4559      93.59   93.59    0.5596   |    1   0.96972        5.980    5    0.3082
   2       0.0312       6.41  100.00    0.1740   |

Unstandardized discriminant function coefficients

             func1     func2
female     -0.2219   -0.0360
read        0.0247   -0.0668
write       0.0358    0.0183
math        0.0757   -0.0662
science    -0.0503    0.1225
socst       0.0429    0.0437
_cons      -6.6863   -2.6198

Standardized discriminant function coefficients

             func1     func2
female     -0.1113   -0.0180
read        0.2307   -0.6240
write       0.3093    0.1584
math        0.6262   -0.5473
science    -0.4812    1.1717
socst       0.4118    0.4197

Discriminant structure matrix

             func1     func2
female      0.0205   -0.0480
read        0.6882    0.0704
write       0.6811    0.3800
math        0.8073    0.0610
science     0.3814    0.7202
socst       0.7549    0.4109

Group means on discriminant functions

             func1     func2
prog=1     -0.4162    0.3067
prog=2      0.6158   -0.0430
prog=3     -0.9187   -0.1856

SPSS Output

Summary of Canonical Discriminant Functions

Eigenvalues

Function | Eigenvalue | % of Variance | Cumulative % | Canonical Correlation
1 | .456(a) | 93.6 | 93.6 | .560
2 | .031(a) | 6.4 | 100.0 | .174
a. First 2 canonical discriminant functions were used in the analysis.

Wilks' Lambda

Test of Function(s) | Wilks' Lambda | Chi-square | df | Sig.
1 through 2 | .666 | 79.038 | 12 | .000
2 | .970 | 5.980 | 5 | .308

Standardized Canonical Discriminant Function Coefficients

 | Function 1 | Function 2
FEMALE | -.111 | -.018
reading score | .231 | -.624
writing score | .309 | .158
math score | .626 | -.547
science score | -.481 | 1.172
social studies score | .412 | .420

Structure Matrix

 | Function 1 | Function 2
math score | .807(*) | .061
social studies score | .755(*) | .411
reading score | .688(*) | .070
writing score | .681(*) | .380
FEMALE | .021 | -.048(*)
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
* Largest absolute correlation between each variable and any discriminant function

Functions at Group Centroids

type of program | Function 1 | Function 2
general | -.416 | .307
academic | .616 | -4.305E-02
vocation | -.919 | -.186
Unstandardized canonical discriminant functions evaluated at group means

Classification Statistics

Classification Processing Summary

Processed | 200
Excluded: Missing or out-of-range group codes | 0
Excluded: At least one missing discriminating variable | 0
Used in Output | 200

Prior Probabilities for Groups

type of program | Prior | Cases Used in Analysis (Unweighted) | Cases Used in Analysis (Weighted)
general | .333 | 45 | 45.000
academic | .333 | 105 | 105.000
vocation | .333 | 50 | 50.000
Total | 1.000 | 200 | 200.000

Classification Function Coefficients

 | general | academic | vocation
FEMALE | 1.522 | 1.306 | 1.652
reading score | 7.201E-02 | .121 | 9.248E-02
writing score | .237 | .267 | .210
math score | .341 | .442 | .336
science score | .166 | 7.074E-02 | .130
social studies score | .210 | .239 | .166
(Constant) | -27.548 | -33.589 | -23.204
Fisher's linear discriminant functions

Territorial Map

[Territorial map not reproduced: it plots the regions assigned to each group, with Canonical Discriminant Function 1 on the horizontal axis and Canonical Discriminant Function 2 on the vertical axis, both running from -3.0 to 3.0.]

Symbols used in territorial map

Symbol | Group | Label
1 | 1 | general
2 | 2 | academic
3 | 3 | vocation
* | | Indicates a group centroid

Classification Results(a)

Original, Count:

type of program | Predicted general | Predicted academic | Predicted vocation | Total
general | 18 | 10 | 17 | 45
academic | 21 | 69 | 15 | 105
vocation | 13 | 9 | 28 | 50

Original, %:

type of program | Predicted general | Predicted academic | Predicted vocation | Total
general | 40.0 | 22.2 | 37.8 | 100.0
academic | 20.0 | 65.7 | 14.3 | 100.0
vocation | 26.0 | 18.0 | 56.0 | 100.0

57.5% of original grouped cases correctly classified.

Example 3
daoneway read write math science socst, by(race)

One-way Disciminant Function Analysis

Observations = 200
Variables    = 5
Groups       = 4

                     Pct of     Cum    Canonical | After   Wilks'
 Fcn   Eigenvalue   Variance    Pct      Corr    |  Fcn    Lambda   Chi-square   df   P-value
                                                 |    0   0.75404       54.908   15    0.0000
   1       0.2260      73.62   73.62    0.4294   |    1   0.92447       15.275    8    0.0540
   2       0.0712      23.18   96.80    0.2578   |    2   0.99028        1.901    3    0.5933
   3       0.0098       3.20  100.00    0.0986   |

Unstandardized discriminant function coefficients

             func1     func2     func3
read        0.0018    0.0383    0.0437
write       0.0295   -0.1041    0.0664
math        0.0189   -0.0866   -0.0752
science     0.0887    0.0805   -0.0530
socst      -0.0193    0.0551    0.0604
_cons      -6.2311    0.9914   -2.2475

Standardized discriminant function coefficients

             func1     func2     func3
read        0.0178    0.3785    0.4326
write       0.2666   -0.9393    0.5995
math        0.1687   -0.7729   -0.6715
science     0.8025    0.7289   -0.4798
socst      -0.2048    0.5832    0.6398

Discriminant structure matrix

             func1     func2     func3
read        0.6241    0.1339    0.4481
write       0.6745   -0.4389    0.5860
math        0.6975   -0.3338    0.0034
science     0.9639    0.2541   -0.0284
socst       0.4011    0.1609    0.6931

Group means on discriminant functions

             func1     func2     func3
race=1     -0.7802    0.1218   -0.2051
race=2      0.2333   -1.0659   -0.0790
race=3     -1.0015   -0.1127    0.2032
race=4      0.2496    0.0762    0.0119
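The summary table at the top of each daoneway listing can be rebuilt from the eigenvalues alone, which is a useful check on how the columns relate to one another. The Python sketch below is not part of daoneway; it assumes the textbook formulas (Wilks' lambda as the product of 1/(1 + eigenvalue) over the remaining functions, Bartlett's chi-square approximation, and df = (p - k)(g - k - 1)). The function name canonical_summary and its arguments are ours, chosen for illustration. With the Example 1 eigenvalues it agrees with the reported values to about two decimal places; the small differences arise because the printed eigenvalues are themselves rounded.

```python
import math


def canonical_summary(eigenvalues, n_obs, n_vars, n_groups):
    """Rebuild the summary rows from the eigenvalues (hypothetical helper).

    Assumes the standard formulas; daoneway's internals are not shown in
    these notes, so this is a consistency check rather than its source code.
    """
    total = sum(eigenvalues)
    # Bartlett's multiplier is the same for every sequential test.
    multiplier = n_obs - 1 - (n_vars + n_groups) / 2.0
    rows = []
    for k, ev in enumerate(eigenvalues):
        # Wilks' lambda after the first k functions have been removed.
        wilks = 1.0
        for remaining in eigenvalues[k:]:
            wilks /= 1.0 + remaining
        rows.append({
            "function": k + 1,
            "eigenvalue": ev,
            "pct_variance": 100.0 * ev / total,
            "canonical_corr": math.sqrt(ev / (1.0 + ev)),
            "after_fcn": k,
            "wilks_lambda": wilks,
            "chi_square": -multiplier * math.log(wilks),
            "df": (n_vars - k) * (n_groups - k - 1),
        })
    return rows


# Example 1: eigenvalues as printed, 200 observations, 3 variables, 3 groups.
example1 = canonical_summary([0.3955, 0.0038], n_obs=200, n_vars=3, n_groups=3)
for row in example1:
    print(row)
```

The same call with the eigenvalues, variable counts, and group counts from Examples 2 and 3 can be compared against their tables in the same way.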
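The percentages in the SPSS classification results for Example 2 follow directly from the counts. As a minimal sketch, the Python below recomputes the row percentages and the overall proportion correctly classified from the count table; the dictionary layout and the function names row_percentages and hit_rate are ours, not SPSS's.

```python
# Counts from the Example 2 classification table:
# outer key = actual program type, inner key = predicted program type.
counts = {
    "general":  {"general": 18, "academic": 10, "vocation": 17},
    "academic": {"general": 21, "academic": 69, "vocation": 15},
    "vocation": {"general": 13, "academic": 9,  "vocation": 28},
}


def row_percentages(table):
    """Percentage of each actual group assigned to each predicted group."""
    result = {}
    for actual, predicted in table.items():
        row_total = sum(predicted.values())
        result[actual] = {
            group: 100.0 * n / row_total for group, n in predicted.items()
        }
    return result


def hit_rate(table):
    """Proportion of all cases whose predicted group equals the actual group."""
    correct = sum(table[group][group] for group in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total


percentages = row_percentages(counts)
overall = hit_rate(counts)
print(percentages)
print(f"{100 * overall:.1f}% correctly classified")
```

The diagonal cells (18, 69, and 28) are the correctly classified cases, so the overall figure is their sum divided by the 200 cases used in the analysis.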
Categorical Data Analysis Course
Phil Ender