Displaying Raw Anova Data
Althought it isn't done for every analysis, there will be times that you want to display the raw data for an anova. The tabdisp command allows you to display the anova data in tabular form.
use http://www.philender.com/courses/data/cr4a, clear sort grp by grp: generate order = _n /* number observations within each group */ tabdisp order grp, cellvar(y) ---------------------------------- | grp order | 1 2 3 4 ----------+----------------------- 1 | 3 4 7 7 2 | 6 5 8 8 3 | 3 4 7 9 4 | 3 3 6 8 5 | 1 2 5 10 6 | 2 3 6 10 7 | 2 4 5 9 8 | 2 3 6 11 ---------------------------------- use http://www.philender.com/courses/data/crf33, clear sort a b by a b: generate order = _n /* number observations within each cell */ tabdisp order b a, cellvar(y) -------------------------------------------------------------------- | a and b | ------- 1 ------ ------- 2 ------ ------- 3 ------ order | 1 2 3 1 2 3 1 2 3 ----------+--------------------------------------------------------- 1 | 37 44 38 34 35 36 21 39 52 2 | 42 36 28 30 27 45 31 50 53 3 | 29 27 48 26 40 26 10 34 64 4 | 33 43 29 39 31 46 20 41 42 5 | 24 25 47 21 22 27 18 36 49 --------------------------------------------------------------------General Descriptive and Exploratory Data Analysis
In general, it is important to look at your data, to try to understand it as best you can. You can use all of the tools in descriptive statistics and exploratory data analysis that were covered in the regression part of the course.
use http://www.philender.com/courses/data/hsb2 describe Contains data from http://www.gseis.ucla.edu/courses/data/hsb2.dta obs: 200 highschool and beyond (200 cases) vars: 11 21 Jun 2000 08:54 size: 9,600 (98.9% of memory free) ------------------------------------------------------------------------------- storage display value variable name type format label variable label ------------------------------------------------------------------------------- id float %9.0g female float %9.0g fl race float %12.0g rl ses float %9.0g sl schtyp float %9.0g scl type of school prog float %9.0g sel type of program read float %9.0g reading score write float %9.0g writing score math float %9.0g math score science float %9.0g science score socst float %9.0g social studies score ------------------------------------------------------------------------------- summarize read write math science Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- read | 200 52.23 10.25294 28 76 write | 200 52.775 9.478586 31 67 math | 200 52.645 9.368448 33 75 science | 200 51.85 9.900891 26 74 tab1 female prog -> tabulation of female female | Freq. Percent Cum. ------------+----------------------------------- male | 91 45.50 45.50 female | 109 54.50 100.00 ------------+----------------------------------- Total | 200 100.00 -> tabulation of prog type of | program | Freq. Percent Cum. ------------+----------------------------------- general | 45 22.50 22.50 academic | 105 52.50 75.00 vocation | 50 25.00 100.00 ------------+----------------------------------- Total | 200 100.00 stem write Stem-and-leaf plot for write (writing score) 3* | 1111 3t | 3333 3f | 55 3s | 66777 3. | 899999 4* | 0001111111111 4t | 223 4f | 4444444444445 4s | 66666666677 4. | 99999999999 5* | 00 5t | 2222222222222223 5f | 44444444444444444555 5s | 777777777777 5. | 9999999999999999999999999 6* | 00001111 6t | 2222222222222222223333 6f | 5555555555555555 6s | 7777777 pnorm write qnorm write histogram write, start(30) width(5) normal freq kdensity write, normalLooking at Data by Group
In addition to exploratory data analysis and general descriptive statistics you will want to look at data separately for each group in order to check on how well our data meet the assumptions of analysis of variance, in particular, the assumptions of normality and homogeneity of variance (homoscedasticity).
In analysis of variance, we often want to look at the variability and shape of the distribution within each cell of the design. Say that we wanted to look at the anova model for write with female and prog as our categorical variables. There are two levels of female and three levels of prog resulting in a total of six cells. Here are some commands that we can use to look at the data at the marginal and/or cell level.
graph box write, over(female) graph box write, over(prog) graph box write, over(female) over(prog) tabstat write, stat(n mean sd var) by(prog) Summary for variables: write by categories of: prog (type of program) prog | N mean sd variance ---------+---------------------------------------- general | 45 51.33333 9.397775 88.31818 academic | 105 56.25714 7.943343 63.0967 vocation | 50 46.76 9.318754 86.83918 ---------+---------------------------------------- Total | 200 52.775 9.478586 89.84359 -------------------------------------------------- tabulate female prog, summ(write) Means, Standard Deviations and Frequencies of writing score | type of program female | general academic vocation | Total -----------+---------------------------------+---------- male | 49.142857 54.617021 41.826087 | 50.120879 | 10.364776 8.6566215 8.0037047 | 10.305161 | 21 47 23 | 91 -----------+---------------------------------+---------- female | 53.25 57.586207 50.962963 | 54.990826 | 8.2052475 7.1156721 8.3411929 | 8.1337152 | 24 58 27 | 109 -----------+---------------------------------+---------- Total | 51.333333 56.257143 46.76 | 52.775 | 9.3977754 7.9433433 9.3187544 | 9.478586 | 45 105 50 | 200 /* or you could use */ table female prog, contents(freq mean write sd write) row col -------------------------------------------------- | type of program female | general academic vocation Total ----------+--------------------------------------- male | 21 47 23 91 | 49.14286 54.61702 41.82609 50.12088 | 10.36478 8.656622 8.003705 10.30516 | female | 24 58 27 109 | 53.25 57.58621 50.96296 54.99083 | 8.205248 7.115672 8.341193 8.133716 | Total | 45 105 50 200 | 51.33333 56.25714 46.76 52.775 | 9.397776 7.943343 9.318754 9.478586 -------------------------------------------------- /* or you could use margins */ quietly anova write female##prog margins female#prog, asbalanced Adjusted predictions Number of obs = 200 Expression : Linear prediction, predict() ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- female#prog | 0 1 | 49.14286 1.803321 27.25 0.000 45.60841 52.6773 0 2 | 54.61702 1.205407 45.31 0.000 52.25447 56.97958 0 3 | 41.82609 1.723133 24.27 0.000 38.44881 45.20337 1 1 | 53.25 1.686852 31.57 0.000 49.94383 56.55617 1 2 | 57.58621 1.085097 53.07 0.000 55.45946 59.71296 1 3 | 50.96296 1.59038 32.04 0.000 47.84588 54.08005 ------------------------------------------------------------------------------ /* or you could use */ sort female prog by female prog: sum(write) _______________________________________________________________________________ -> female = male, prog = general Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- write | 21 49.14286 10.36478 31 65 _______________________________________________________________________________ -> female = male, prog = academic Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- write | 47 54.61702 8.656621 33 67 _______________________________________________________________________________ -> female = male, prog = vocation Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- write | 23 41.82609 8.003705 31 63 _______________________________________________________________________________ -> female = female, prog = general Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- write | 24 53.25 8.205248 36 67 _______________________________________________________________________________ -> female = female, prog = academic Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- write | 58 57.58621 7.115672 37 67 _______________________________________________________________________________ -> female = female, prog = vocation Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- write | 27 50.96296 8.341193 35 67 hist write, by(female prog) normal start(30) width(5) /* stata 8 */ (bin=7, start=30, width=5) twoway kdensity write, by(prog) twoway (kdensity write if prog==1)(kdensity write if prog==2)(kdensity write if prog==3), legend(off)Graphing Cell Means
In addition to looking at the data to check on assumptions, it is often useful to graph the cell means as a way to help understand interactions. In this example, we will be using the anovaplot command.
anova write prog female prog#female Number of obs = 200 R-squared = 0.2590 Root MSE = 8.26386 Adj R-squared = 0.2399 Source | Partial SS df MS F Prob > F ------------+---------------------------------------------------- Model | 4630.36091 5 926.072182 13.56 0.0000 | prog | 3274.35082 2 1637.17541 23.97 0.0000 female | 1261.85329 1 1261.85329 18.48 0.0000 prog#female | 325.958189 2 162.979094 2.39 0.0946 | Residual | 13248.5141 194 68.2913097 ------------+---------------------------------------------------- Total | 17878.875 199 89.843593 anovaplot prog female, scatter(msym(none)) /* findit anovaplot */
Linear Statistical Models Course
Phil Ender, 17sep10, 18mar03; 27mar02; 23feb01