Stata Multiple Regression Session
use http://www.gseis.ucla.edu/courses/data/hsb2, clear

describe

Contains data from http://www.philender.com/courses/data/hsbdemo, clear
  obs:           200                          highschool and beyond (200 cases)
 vars:            11                          21 Jun 2000 08:54
 size:         9,600 (98.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
female          float  %9.0g       fl
race            float  %12.0g      rl
ses             float  %9.0g       sl
schtyp          float  %9.0g       scl        type of school
prog            float  %9.0g       sel        type of program
read            float  %9.0g                  reading score
write           float  %9.0g                  writing score
math            float  %9.0g                  math score
science         float  %9.0g                  science score
socst           float  %9.0g                  social studies score
-------------------------------------------------------------------------------
Sorted by:

summarize write read math female

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       write |     200      52.775   9.478586         31         67
        read |     200       52.23   10.25294         28         76
        math |     200      52.645   9.368448         33         75
      female |     200        .545   .4992205          0          1

stem write

Stem-and-leaf plot for write (writing score)

  3* | 1111
  3t | 3333
  3f | 55
  3s | 66777
  3. | 899999
  4* | 0001111111111
  4t | 223
  4f | 4444444444445
  4s | 66666666677
  4. | 99999999999
  5* | 00
  5t | 2222222222222223
  5f | 44444444444444444555
  5s | 777777777777
  5. | 9999999999999999999999999
  6* | 00001111
  6t | 2222222222222222223333
  6f | 5555555555555555
  6s | 7777777

stem read, lines(2)

Stem-and-leaf plot for read (reading score)

  2. | 8
  3* | 1444444
  3. | 56667799999999
  4* | 112222222222222334444444444444
  4. | 5567777777777777777777777777778
  5* | 0000000000000000002222222222222234
  5. | 555555555555577777777777777
  6* | 00000000013333333333333333
  6. | 555555555688888888888
  7* | 1133333
  7. | 66

stem math, lines(2)

Stem-and-leaf plot for math (math score)

  3* | 3
  3. | 5788999999
  4* | 00000000001111111222222233333334444
  4. | 5555555566666666777888889999999999
  5* | 00000001111111122222233333334444444444
  5. | 555556666666777777777777788888899
  6* | 00000111111122223333344444
  6. | 555666677899
  7* | 011112223
  7. | 55

kdbox write, normal mean   /* findit kdbox */
kdbox read, normal mean
kdbox math, normal mean

[graphs omitted]

/* shortcut for the 3 kdensity graphs */

foreach var of varlist write read math {
  kdbox `var', normal mean
  more
}

[graphs omitted]

foreach var of varlist write read math {
  pnorm `var'
  more
  qnorm `var'
  more
}

[graphs omitted]

graph matrix read math female write, half

[graph omitted]

correlate write read math female
(obs=200)

             |    write     read     math   female
-------------+------------------------------------
       write |   1.0000
        read |   0.5968   1.0000
        math |   0.6174   0.6623   1.0000
      female |   0.2565  -0.0531  -0.0293   1.0000

pcorr write read math female
(obs=200)

Partial correlation of write with

    Variable |    Corr.     Sig.
-------------+------------------
        read |   0.3573    0.000
        math |   0.3931    0.000
      female |   0.3840    0.000

regress write read math female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   72.52
       Model |  9405.34864     3  3135.11621           Prob > F      =  0.0000
    Residual |  8473.52636   196  43.2322773           R-squared     =  0.5261
-------------+------------------------------           Adj R-squared =  0.5188
       Total |   17878.875   199   89.843593           Root MSE      =  6.5751

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2054613    .4450166
        math |   .3974826   .0664037     5.99   0.000      .266525    .5284401
      female |    5.44337   .9349987     5.82   0.000      3.59942    7.287319
       _cons |   11.89566   2.862845     4.16   0.000     6.249728     17.5416
------------------------------------------------------------------------------

predict e, resid
predict rstu, rstu
predict p

graph twoway scatter rstu p, yline(-2.5 2.5) ylabel(-3(1)3) jitter(2)
rvfplot2, rstu yline(2.5 -2.5) jitter(2)   /* findit rvfplot2 */

[graphs omitted]

rvpplot2 read, rstu yline(0 -2.5 2.5) jitter(2)   /* findit rvpplot2 */
rvpplot math, yline(0 -2.5 2.5) jitter(2)
rvpplot female, yline(0 -2.5 2.5) jitter(2)

[graphs omitted]

graph twoway scatter rstu read, yline(0 -2.5 2.5) ylabel(-3(1)3) jitter(2)

[graph omitted]

avplot read
avplot math
avplot female

[graphs omitted]

kdensity e, normal
graph twoway scatter write p, jitter(2)
graph twoway (scatter write p, jitter(2)) (lfit write p)
graph twoway scatter rstu id, yline(0)
indexplot rstu, scatter   /* findit indexplot */

[graphs omitted]

list id write rstu if abs(rstu)>=2.5

        id   write        rstu
 31.   126      31   -2.697508
198.   187      41    -2.72472

lvr2plot, ylabel xlabel

dfbeta

list id write rstu DFread if abs(DFread)>2/sqrt(e(N))

        id   write        rstu      DFread
169.   150      41   -1.113306    .1435211
172.   141      44   -1.092409   -.1484074
190.   170      62    1.636351   -.1785097
194.   103      52   -1.564255   -.2235134
196.    86      33   -2.276461    .2035398
198.     3      65    2.106786    .2715756
199.    62      65     2.00872    .2973564
200.   126      31   -2.697508    .3477473

list id write rstu DFmath if abs(DFmath)>2/sqrt(e(N))

        id   write        rstu      DFmath
166.    24      62    1.074772    .1493585
167.   189      59    1.047505    .1459866
175.    32      67    1.107803    .1665842
189.    83      62    1.871348    -.197515
190.   170      62    1.636351    .1939547
193.   200      54    -1.52912    -.202688
195.    50      59    2.194752   -.2067871
196.    86      33   -2.276461   -.1484884
197.   133      31   -2.026189    .2327446
198.     3      65    2.106786   -.2397425
199.    62      65     2.00872   -.2541649
200.   126      31   -2.697508   -.2931431

list id write rstu DFfemale if abs(DFfemale)>2/sqrt(e(N))

        id   write        rstu    DFfemale
178.    85      39   -2.073712    .1599997
184.    18      33   -2.262443    .1778462
185.    81      43   -1.982814    .1469716
187.    60      65    2.210802   -.1678427
188.    16      31   -2.114106    .1683168
191.   187      41    -2.72472   -.1817335
195.    50      59    2.194752   -.1720256
196.    86      33   -2.276461     .186213
197.   133      31   -2.026189    .1588308
198.     3      65    2.106786   -.1553218
199.    62      65     2.00872    -.146781
200.   126      31   -2.697508    .2246109

/* alternate code */

sort DFread
list id write DFread in 1/10
list id write DFread in -10/l
sort DFmath
list id write DFmath in 1/10
list id write DFmath in -10/l
sort DFfemale
list id write DFfemale in 1/10
list id write DFfemale in -10/l

indexplot leverage, scatter
predict lev, leverage
sort lev
list id write rstu lev in -10/l

        id   write         lev
191.   103      52    .0376407
192.   164      36    .0378285
193.    34      61    .0378289
194.    33      65     .037994
195.   161      62     .037994
196.    19      46    .0387017
197.   200      54    .0389156
198.   143      63    .0417192
199.    61      63    .0425231
200.   167      49    .0752208

indexplot cooksd, scatter
predict d, cooksd
sort d
list id write rstu lev d in -10/l

        id   write        rstu         lev           d
191.   187      41    -2.72472    .0107086    .0194529
192.   117      49    1.634066     .028638    .0195144
193.   200      54    -1.52912    .0389156    .0235088
194.   103      52   -1.564255    .0376407     .023751
195.    50      59    2.194752    .0200704    .0241933
196.    86      33   -2.276461     .018896    .0244312
197.   133      31   -2.026189    .0242461    .0251059
198.     3      65    2.106786    .0285327     .032029
199.    62      65     2.00872    .0335684    .0345036
200.   126      31   -2.697508    .0280834    .0509327

vif

    Variable |       VIF       1/VIF
-------------+----------------------
        read |      1.78    0.560251
        math |      1.78    0.561351
      female |      1.00    0.997122
-------------+----------------------
    Mean VIF |      1.52

collin read math female   /* available from ATS via the Internet */

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
      read      1.78    1.34    0.5603      0.4397
      math      1.78    1.33    0.5614      0.4386
    female      1.00    1.00    0.9971      0.0029
----------------------------------------------------
  Mean VIF      1.52

                           Cond
        Eigenval          Index
---------------------------------
    1     1.6674          1.0000
    2     0.9953          1.2943
    3     0.3373          2.2234
---------------------------------
 Condition Number         2.2234
 Eigenvalues & Cond Index computed from deviation sscp (no intercept)
 Det(correlation matrix)    0.5598

linktest

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =  116.16
       Model |  9674.70222     2  4837.35111           Prob > F      =  0.0000
    Residual |  8204.17278   197  41.6455471           R-squared     =  0.5411
-------------+------------------------------           Adj R-squared =  0.5365
       Total |   17878.875   199   89.843593           Root MSE      =  6.4533

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   3.306865   .9095168     3.64   0.000     1.513226    5.100504
      _hatsq |  -.0215942    .008491    -2.54   0.012    -.0383392   -.0048492
       _cons |  -60.58511   24.08436    -2.52   0.013    -108.0814   -13.08885
------------------------------------------------------------------------------

ovtest

Ramsey RESET test using powers of the fitted values of write
       Ho:  model has no omitted variables
                 F(3, 193) =      3.06
                  Prob > F =      0.0295

hettest

Cook-Weisberg test for heteroskedasticity using fitted values of write
     Ho: Constant variance
         chi2(1)      =     6.64
         Prob > chi2  =   0.0100

whitetst   /* Downloaded from Stata (STB 55, sg137) via the Internet */

White's general test statistic :  15.17126  Chi-sq( 8)  P-value =  .0559
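
/* Optional arithmetic checks: a minimal sketch added for illustration, not
   part of the original session. It assumes the hsb2 data are still in memory.
   The model is refit quietly only as a precaution, so that e() is known to
   hold the results of regress write read math female. e(N), e(mss), e(rss),
   e(df_m) and e(df_r) are standard results stored by regress. */

quietly regress write read math female

display 2/sqrt(e(N))                        /* DFBETA cutoff used in the list commands; about .1414 when N = 200 */
display (e(mss)/e(df_m))/(e(rss)/e(df_r))   /* F = MS(Model)/MS(Residual); should agree with the regress header */
display e(mss)/(e(mss) + e(rss))            /* R-squared = SS(Model)/SS(Total) */
display sqrt(e(rss)/e(df_r))                /* Root MSE = sqrt(MS(Residual)) */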
Linear Statistical Models Course
Phil Ender, 5feb04; 13jan00