Stata is an easy-to-use statistical software package. It is command driven and in some respects similar to SAS, although much easier to learn. This Computer Module is designed to introduce students to Stata and to let them begin analyzing data statistically.
Students have several choices for using Stata: they can use the interactive desktop version in one of the campus computer labs, or they can purchase their own copy for around $100.
View Useful Stata Commands.
View ATS Stata Class Notes.
input id x
13 34
17 21
14 25
9 33
18 40
12 33
4 44
11 41
17 21
end

sort id
list

     +---------+
     | id    x |
     |---------|
  1. | 13   34 |
  2. | 17   21 |
  3. | 14   25 |
  4. |  9   33 |
  5. | 18   40 |
     |---------|
  6. | 12   33 |
  7. |  4   44 |
  8. | 11   41 |
  9. | 17   21 |
     +---------+
use http://www.philender.com/courses/data/hsb2, clear

describe

Contains data from http://www.gseis.ucla.edu/courses/data/hsb2.dta
  obs:           200                          highschool and beyond (200 cases)
 vars:            11                          21 Jun 2000 08:54
 size:         9,600 (98.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
female          float  %9.0g       fl
race            float  %12.0g      rl
ses             float  %9.0g       sl
schtyp          float  %9.0g       scl        type of school
prog            float  %9.0g       sel        type of program
read            float  %9.0g                  reading score
write           float  %9.0g                  writing score
math            float  %9.0g                  math score
science         float  %9.0g                  science score
socst           float  %9.0g                  social studies score
-------------------------------------------------------------------------------

list

Observation 1

          id         70       female       male         race      white
         ses        low       schtyp     public         prog    general
        read         57        write         52         math         41
     science         47        socst         57

Observation 2

          id        121       female     female         race      white
         ses     middle       schtyp     public         prog   vocation
        read         68        write         59         math         53
     science         63        socst         61

Observation 3

          id         86       female       male         race      white
         ses       high       schtyp     public         prog    general
        read         44        write         33         math         54
     science         58        socst         31

Observation 4

          id        141       female       male         race      white
         ses       high       schtyp     public         prog   vocation
        read         63        write         44         math         47
     science         53        socst         56

Observation 5

          id        172       female       male         race      white
         ses     middle       schtyp     public         prog   academic
        read         47        write         52         math         57
     science         53        socst         61

...

list id female race ses prog read in 1/20

     +--------------------------------------------------------+
     |  id   female           race      ses       prog   read |
     |--------------------------------------------------------|
  1. |  70     male          white      low    general     57 |
  2. | 121   female          white   middle   vocation     68 |
  3. |  86     male          white     high    general     44 |
  4. | 141     male          white     high   vocation     63 |
  5. | 172     male          white   middle   academic     47 |
     |--------------------------------------------------------|
  6. | 113     male          white   middle   academic     44 |
  7. |  50     male   african-amer   middle    general     50 |
  8. |  11     male       hispanic   middle   academic     34 |
  9. |  84     male          white   middle    general     63 |
 10. |  48     male   african-amer   middle   academic     57 |
     |--------------------------------------------------------|
 11. |  75     male          white   middle   vocation     60 |
 12. |  60     male          white   middle   academic     57 |
 13. |  95     male          white     high   academic     73 |
 14. | 104     male          white     high   academic     54 |
 15. |  38     male   african-amer      low   academic     45 |
     |--------------------------------------------------------|
 16. | 115     male          white      low    general     42 |
 17. |  76     male          white     high   academic     47 |
 18. | 195     male          white   middle    general     57 |
 19. | 114     male          white     high   academic     68 |
 20. |  85     male          white   middle    general     55 |
     +--------------------------------------------------------+

list id female race ses prog read in 1/20, clean

        id   female           race      ses       prog   read
  1.    70     male          white      low    general     57
  2.   121   female          white   middle   vocation     68
  3.    86     male          white     high    general     44
  4.   141     male          white     high   vocation     63
  5.   172     male          white   middle   academic     47
  6.   113     male          white   middle   academic     44
  7.    50     male   african-amer   middle    general     50
  8.    11     male       hispanic   middle   academic     34
  9.    84     male          white   middle    general     63
 10.    48     male   african-amer   middle   academic     57
 11.    75     male          white   middle   vocation     60
 12.    60     male          white   middle   academic     57
 13.    95     male          white     high   academic     73
 14.   104     male          white     high   academic     54
 15.    38     male   african-amer      low   academic     45
 16.   115     male          white      low    general     42
 17.    76     male          white     high   academic     47
 18.   195     male          white   middle    general     57
 19.   114     male          white     high   academic     68
 20.    85     male          white   middle    general     55

list id female race ses prog read in 1/20, clean nolabel

        id   female   race   ses   prog   read
  1.    70        0      4     1      1     57
  2.   121        1      4     2      3     68
  3.    86        0      4     3      1     44
  4.   141        0      4     3      3     63
  5.   172        0      4     2      2     47
  6.   113        0      4     2      2     44
  7.    50        0      3     2      1     50
  8.    11        0      1     2      2     34
  9.    84        0      4     2      1     63
 10.    48        0      3     2      2     57
 11.    75        0      4     2      3     60
 12.    60        0      4     2      2     57
 13.    95        0      4     3      2     73
 14.   104        0      4     3      2     54
 15.    38        0      3     1      2     45
 16.   115        0      4     1      1     42
 17.    76        0      4     3      2     47
 18.   195        0      4     2      1     57
 19.   114        0      4     3      2     68
 20.    85        0      4     2      1     55

summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
          id |       200       100.5    57.87918          1        200
      female |       200        .545    .4992205          0          1
        race |       200        3.43    1.039472          1          4
         ses |       200       2.055    .7242914          1          3
      schtyp |       200        1.16     .367526          1          2
        prog |       200       2.025    .6904772          1          3
        read |       200       52.23    10.25294         28         76
       write |       200      52.775    9.478586         31         67
        math |       200      52.645    9.368448         33         75
     science |       200       51.85    9.900891         26         74
       socst |       200      52.405    10.73579         26         71

summarize write

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       write |       200      52.775    9.478586         31         67

histogram write

histogram write, start(30) width(5) normal

tabulate write

    writing |
      score |      Freq.     Percent        Cum.
------------+-----------------------------------
         31 |          4        2.00        2.00
         33 |          4        2.00        4.00
         35 |          2        1.00        5.00
         36 |          2        1.00        6.00
         37 |          3        1.50        7.50
         38 |          1        0.50        8.00
         39 |          5        2.50       10.50
         40 |          3        1.50       12.00
         41 |         10        5.00       17.00
         42 |          2        1.00       18.00
         43 |          1        0.50       18.50
         44 |         12        6.00       24.50
         45 |          1        0.50       25.00
         46 |          9        4.50       29.50
         47 |          2        1.00       30.50
         49 |         11        5.50       36.00
         50 |          2        1.00       37.00
         52 |         15        7.50       44.50
         53 |          1        0.50       45.00
         54 |         17        8.50       53.50
         55 |          3        1.50       55.00
         57 |         12        6.00       61.00
         59 |         25       12.50       73.50
         60 |          4        2.00       75.50
         61 |          4        2.00       77.50
         62 |         18        9.00       86.50
         63 |          4        2.00       88.50
         65 |         16        8.00       96.50
         67 |          7        3.50      100.00
------------+-----------------------------------
      Total |        200      100.00

sort prog
by prog: summarize write

_______________________________________________________________________________
-> prog = general

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       write |        45    51.33333    9.397775         31         67

_______________________________________________________________________________
-> prog = academic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       write |       105    56.25714    7.943343         33         67

_______________________________________________________________________________
-> prog = vocation

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       write |        50       46.76    9.318754         31         67

summarize write, detail

                        writing score
-------------------------------------------------------------
      Percentiles      Smallest
 1%           31             31
 5%         35.5             31
10%           39             31       Obs                 200
25%         45.5             31       Sum of Wgt.         200

50%           54                      Mean             52.775
                        Largest       Std. Dev.      9.478586
75%           60             67
90%           65             67       Variance       89.84359
95%           65             67       Skewness      -.4784158
99%           67             67       Kurtosis       2.238527

stem write

Stem-and-leaf plot for write (writing score)

  3* | 1111
  3t | 3333
  3f | 55
  3s | 66777
  3. | 899999
  4* | 0001111111111
  4t | 223
  4f | 4444444444445
  4s | 66666666677
  4. | 99999999999
  5* | 00
  5t | 2222222222222223
  5f | 44444444444444444555
  5s | 777777777777
  5. | 9999999999999999999999999
  6* | 00001111
  6t | 2222222222222222223333
  6f | 5555555555555555
  6s | 7777777

graph box write

graph box write, over(prog)
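The statistics printed by summarize are also left behind as stored results, which is handy when a later command needs the mean or a percentile. The following is a minimal sketch, assuming the hsb2 data used above; it relies only on the standard r() results that summarize with the detail option saves.

```stata
* Sketch: reuse the results of summarize instead of retyping them
use http://www.philender.com/courses/data/hsb2, clear

* quietly suppresses the printed table; the r() results are still saved
quietly summarize write, detail

* show everything summarize left behind
return list

* median and interquartile range of the writing score
display "median = " r(p50)
display "IQR    = " r(p75)-r(p25)
```

The values displayed should agree with the 25%, 50%, and 75% rows of the detail table above.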
use http://www.philender.com/courses/data/hsb2, clear

tabulate prog

    type of |
    program |      Freq.     Percent        Cum.
------------+-----------------------------------
    general |         45       22.50       22.50
   academic |        105       52.50       75.00
   vocation |         50       25.00      100.00
------------+-----------------------------------
      Total |        200      100.00

summarize write if prog==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       write |        45    51.33333    9.397775         31         67
Example 2
You will have to clear and reload the data after this example.
keep if prog==1
(155 observations deleted)

tabulate prog

    type of |
    program |      Freq.     Percent        Cum.
------------+-----------------------------------
    general |         45      100.00      100.00
------------+-----------------------------------
      Total |         45      100.00

summarize write

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       write |        45    51.33333    9.397775         31         67
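Because keep permanently drops the other 155 cases from memory, one way to avoid reloading the file is to wrap the subsetting in preserve and restore. This is a sketch of an alternative, not part of the original example; it assumes the full hsb2 data are loaded first.

```stata
* Sketch: subset temporarily, then get the full data back
use http://www.philender.com/courses/data/hsb2, clear

* take a snapshot of the 200-case dataset
preserve

keep if prog==1
summarize write

* bring back the snapshot; all 200 cases are in memory again
restore
count
```

For a single command, the if qualifier shown in the first example is usually simpler; preserve and restore are more useful when several commands must run on the same subset.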
use http://www.philender.com/courses/data/hsb2, clear

scatter write read

scatter write read, jitter(2)

scatter write read, jitter(2) msym(Oh)

twoway (scatter write read, jitter(2) msym(Oh))(lfit write read)

correlate write read math female
(obs=200)

             |    write     read     math   female
-------------+------------------------------------
       write |   1.0000
        read |   0.5968   1.0000
        math |   0.6174   0.6623   1.0000
      female |   0.2565  -0.0531  -0.0293   1.0000

sort female
by female: correlate write read math

-------------------------------------------------------------------------------
-> female = male
(obs=91)

             |    write     read     math
-------------+---------------------------
       write |   1.0000
        read |   0.6485   1.0000
        math |   0.6268   0.6085   1.0000

--------------------------------------------------------------------------------
-> female = female
(obs=109)

             |    write     read     math
-------------+---------------------------
       write |   1.0000
        read |   0.6209   1.0000
        math |   0.6749   0.7111   1.0000

regress write read

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =  109.52
       Model |  6367.42127     1  6367.42127           Prob > F      =  0.0000
    Residual |  11511.4537   198  58.1386552           R-squared     =  0.3561
-------------+------------------------------           Adj R-squared =  0.3529
       Total |   17878.875   199   89.843593           Root MSE      =  7.6249

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .5517051   .0527178    10.47   0.000     .4477445    .6556656
       _cons |   23.95944   2.805744     8.54   0.000     18.42647    29.49242
------------------------------------------------------------------------------

regress write read female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   77.21
       Model |  7856.32118     2  3928.16059           Prob > F      =  0.0000
    Residual |  10022.5538   197  50.8759077           R-squared     =  0.4394
-------------+------------------------------           Adj R-squared =  0.4337
       Total |   17878.875   199   89.843593           Root MSE      =  7.1327

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
      female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
       _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
------------------------------------------------------------------------------
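The summary figures in the upper right of the first regression (write on read) can be recovered from the sums of squares in the upper left. As a worked check, using only numbers printed in that output:

```latex
\begin{align*}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
     = \frac{6367.42127}{17878.875} \approx 0.3561 \\[4pt]
F(1,198) &= \frac{MS_{\text{Model}}}{MS_{\text{Residual}}}
     = \frac{6367.42127}{58.1386552} \approx 109.52 \\[4pt]
\text{Root MSE} &= \sqrt{MS_{\text{Residual}}}
     = \sqrt{58.1386552} \approx 7.6249
\end{align*}
```

With a single predictor, F is the square of the t statistic for read, which matches up to the rounding of t shown in the table: 10.47 squared is about 109.6.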
use http://www.philender.com/courses/data/hsb2, clear

ttest write, by(female)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
  female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
---------+--------------------------------------------------------------------
combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
---------+--------------------------------------------------------------------
    diff |           -4.869947    1.304191               -7.441835   -2.298059
------------------------------------------------------------------------------
    diff = mean(male) - mean(female)                              t =  -3.7341
Ho: diff = 0                                     degrees of freedom =      198

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0001         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.9999
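The t statistic in the independent-groups output is the difference in means divided by its standard error, both of which appear in the diff row. A worked check using only the printed values:

```latex
\begin{align*}
t &= \frac{\bar{x}_{\text{male}} - \bar{x}_{\text{female}}}{SE_{\text{diff}}}
   = \frac{50.12088 - 54.99083}{1.304191}
   = \frac{-4.869947}{1.304191} \approx -3.7341 \\[4pt]
df &= n_{\text{male}} + n_{\text{female}} - 2 = 91 + 109 - 2 = 198
\end{align*}
```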
Example 2: Dependent t-test
ttest write = read

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   write |     200      52.775    .6702372    9.478586    51.45332    54.09668
    read |     200       52.23    .7249921    10.25294    50.80035    53.65965
---------+--------------------------------------------------------------------
    diff |     200        .545    .6283822    8.886666   -.6941424    1.784142
------------------------------------------------------------------------------
     mean(diff) = mean(write - read)                              t =   0.8673
 Ho: mean(diff) = 0                              degrees of freedom =      199

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.8066         Pr(|T| > |t|) = 0.3868          Pr(T > t) = 0.1934
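A dependent (paired) t-test is the same as a one-sample t-test on the within-person differences. The sketch below illustrates that equivalence; the variable name diff is introduced here for illustration and is not in the hsb2 file.

```stata
* Sketch: paired t-test as a one-sample test on difference scores
use http://www.philender.com/courses/data/hsb2, clear

* hypothetical helper variable: each student's write minus read
generate diff = write - read

* test whether the mean difference is zero
ttest diff == 0
```

The t statistic and degrees of freedom should match the paired output above, where t is the mean difference divided by its standard error (.545 / .6283822, about 0.8673).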
tabulate prog female, all

   type of |        female
   program |      male     female |     Total
-----------+----------------------+----------
   general |        21         24 |        45
  academic |        47         58 |       105
  vocation |        23         27 |        50
-----------+----------------------+----------
     Total |        91        109 |       200

          Pearson chi2(2) =   0.0528   Pr = 0.974
 likelihood-ratio chi2(2) =   0.0528   Pr = 0.974
               Cramér's V =   0.0162
                    gamma =   0.0066  ASE = 0.122
          Kendall's tau-b =   0.0036  ASE = 0.067
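To see why the chi-square is so small, it can help to print the expected frequencies next to the observed ones. This is a sketch assuming the hsb2 data are still in memory; chi2 and expected are standard options of two-way tabulate.

```stata
* Sketch: observed and expected cell counts for the chi-square test
use http://www.philender.com/courses/data/hsb2, clear

* chi2 requests only the Pearson test; expected adds expected counts
tabulate prog female, chi2 expected
```

Each expected count is the row total times the column total divided by the grand total. For general-program males that is 45 × 91 / 200 = 20.475, very close to the observed 21.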
use http://www.philender.com/courses/data/missing, clear

describe

Contains data from missing.dta
  obs:            15
 vars:             2                          14 Jul 2006 17:56
 size:           180 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
mt              float  %9.0g
final           float  %9.0g
-------------------------------------------------------------------------------
Sorted by:

list, clean

       mt   final
  1.   43      48
  2.    .      41
  3.   41      44
  4.   40      44
  5.   38      43
  6.   46      42
  7.   41      40
  8.   48       .
  9.   42      45
 10.   41      40
 11.   43      46
 12.    .      45
 13.   44      48
 14.   39      42
 15.   40      45

generate total = mt + final
(3 missing values generated)

list, clean

       mt   final   total
  1.   43      48      91
  2.    .      41       .
  3.   41      44      85
  4.   40      44      84
  5.   38      43      81
  6.   46      42      88
  7.   41      40      81
  8.   48       .       .
  9.   42      45      87
 10.   41      40      81
 11.   43      46      89
 12.    .      45       .
 13.   44      48      92
 14.   39      42      81
 15.   40      45      85

summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          mt |        13          42    2.798809         38         48
       final |        14    43.78571    2.607049         40         48
       total |        12    85.41667    4.010403         81         92

correlate
(obs=12)

             |       mt    final    total
-------------+---------------------------
          mt |   1.0000
       final |   0.3263   1.0000
       total |   0.7755   0.8498   1.0000

pwcorr, obs

             |       mt    final    total
-------------+---------------------------
          mt |   1.0000
             |       13
             |
       final |   0.3263   1.0000
             |       12       14
             |
       total |   0.7755   0.8498   1.0000
             |       12       12       12
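The output above shows that generate returns missing whenever either score is missing, that correlate drops incomplete cases (12 observations), and that pwcorr uses every available pair. A short sketch for locating the incomplete cases follows; the names nmiss and total2 are hypothetical and introduced only for illustration.

```stata
* Sketch: find incomplete cases and compare two ways of summing
use http://www.philender.com/courses/data/missing, clear

* number of missing values per student across the two exams
egen nmiss = rowmiss(mt final)
list if nmiss > 0, clean

* rowtotal() treats a missing score as 0, unlike mt + final
egen total2 = rowtotal(mt final)
list if nmiss > 0, clean
```

Whether a missing exam should count as zero is a substantive decision; rowtotal() is shown only to contrast with the behavior of generate, not as a recommendation.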
log using mylog1.log

[ a bunch of Stata commands ]

log close

type mylog1.log
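When a session is rerun, log using stops with an error if mylog1.log already exists or if a log is still open. The sketch below shows one common way around both problems; the summarize command merely stands in for whatever analysis is being logged, and it assumes a dataset is already in memory.

```stata
* Sketch: a log sequence that can be rerun safely
* close any log left open by an earlier run (no error if none is open)
capture log close

* replace overwrites an existing mylog1.log; text forces plain-text format
log using mylog1.log, replace text

* placeholder for the commands to be logged
summarize

log close
type mylog1.log
```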
Variable Name   Variable Label          Value Labels
CASENUM         Case number             Possible range = 100 to 3000
MATHTYPE        Level of math class     1-N/A
                                        2-Low
                                        3-Average
                                        4-High
                                        5-Algebra
                                        6-Honors Algebra
LUNCH2          School lunch            1-Yes
                                        2-No
TOTALC          Total accuracy score    Possible range = 0 to 25
use http://www.philender.com/courses/data/clean, clear

describe

Contains data from http://www.philender.com/courses/data/clean.dta
  obs:           199
 vars:             4                          14 Jul 2006 17:54
 size:         3,980 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
mathtype        float  %9.0g
lunch2          float  %9.0g
totalc          float  %9.0g
-------------------------------------------------------------------------------
Sorted by:

list, clean

         id   mathtype   lunch2   totalc
  1.    884          2        1        7
  2.    885          2        1       11
  3.    886          2        1       13
  4.    887          2        1       14
  5.    888          2        1        6
...
195.    756          6        1       18
196.    757          6        1       14
197.    758          6        1       17
198.    761          6        1       15
199.    299          6        2       23

summarize id totalc, detail

                             id
-------------------------------------------------------------
      Percentiles      Smallest
 1%          106            102
 5%          148            106
10%          176            108       Obs                 199
25%          472            141       Sum of Wgt.         199

50%          755                      Mean           819.4774
                        Largest       Std. Dev.      519.5533
75%         1068           2126
90%         1158           2133       Variance       269935.6
95%         2037           3121       Skewness       1.481881
99%         3121           3123       Kurtosis        6.65515

                           totalc
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              0
 5%            3              1
10%            5              1       Obs                 199
25%            9              1       Sum of Wgt.         199

50%           14                      Mean            14.1407
                        Largest       Std. Dev.      7.793438
75%           19             24
90%           22             25       Variance       60.73768
95%           24             33       Skewness       2.549864
99%           33             77       Kurtosis       22.51892

tab1 mathtype lunch2 totalc

-> tabulation of mathtype

   mathtype |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |         44       22.11       22.11
          3 |         44       22.11       44.22
          4 |         44       22.11       66.33
          5 |         43       21.61       87.94
          6 |         21       10.55       98.49
          8 |          1        0.50       98.99
          9 |          2        1.01      100.00
------------+-----------------------------------
      Total |        199      100.00

-> tabulation of lunch2

     lunch2 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          1        0.50        0.50
          1 |        110       55.28       55.78
          2 |         87       43.72       99.50
          3 |          1        0.50      100.00
------------+-----------------------------------
      Total |        199      100.00

-> tabulation of totalc

     totalc |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          1        0.50        0.50
          1 |          3        1.51        2.01
          2 |          2        1.01        3.02
          3 |          5        2.51        5.53
          4 |          3        1.51        7.04
          5 |          8        4.02       11.06
          6 |         10        5.03       16.08
          7 |          8        4.02       20.10
          8 |          7        3.52       23.62
          9 |          7        3.52       27.14
         10 |         11        5.53       32.66
         11 |         13        6.53       39.20
         12 |          6        3.02       42.21
         13 |          9        4.52       46.73
         14 |          8        4.02       50.75
         15 |         10        5.03       55.78
         16 |         13        6.53       62.31
         17 |         10        5.03       67.34
         18 |         10        5.03       72.36
         19 |         12        6.03       78.39
         20 |          9        4.52       82.91
         21 |          5        2.51       85.43
         22 |         10        5.03       90.45
         23 |          9        4.52       94.97
         24 |          7        3.52       98.49
         25 |          1        0.50       98.99
         33 |          1        0.50       99.50
         77 |          1        0.50      100.00
------------+-----------------------------------
      Total |        199      100.00

sort id
list if id == id[_n+1], clean

         id   mathtype   lunch2   totalc
155.   1101          4        1       16

list if id == 1101, clean

         id   mathtype   lunch2   totalc
155.   1101          4        1       16
156.   1101          4        2       12

list if id>3000 | id<1, clean

         id   mathtype   lunch2   totalc
198.   3121          5        2       21
199.   3123          5        2       22
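The same checks can be written more compactly with duplicates and inrange(). The sketch below takes its valid ranges from the codebook table above and assumes that the variable id in clean.dta corresponds to CASENUM; that mapping is an assumption, not something the file states.

```stata
* Sketch: duplicate and out-of-range checks for the clean data
use http://www.philender.com/courses/data/clean, clear

* how many ids occur more than once, and which rows they are
duplicates report id
duplicates list id

* rows whose values fall outside the codebook ranges
* (inrange() is false for missing values, so those rows are listed too)
list if !inrange(id, 100, 3000), clean
list if !inrange(mathtype, 1, 6), clean
list if !inrange(lunch2, 1, 2), clean
list if !inrange(totalc, 0, 25), clean
```

Listing the offending rows rather than only tabulating them makes it easier to go back to the original records and correct each value.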
Phil Ender, 25Sep00