Linear Statistical Models: Regression

Multiple Linear Regression Stata Session


use http://www.gseis.ucla.edu/courses/data/hsb2, clear

describe

Contains data from http://www.philender.com/courses/data/hsbdemo
  obs:           200                          highschool and beyond (200
                                                cases)
 vars:            11                          21 Jun 2000 08:54
 size:         9,600 (98.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g                  
female          float  %9.0g       fl         
race            float  %12.0g      rl         
ses             float  %9.0g       sl         
schtyp          float  %9.0g       scl        type of school
prog            float  %9.0g       sel        type of program
read            float  %9.0g                  reading score
write           float  %9.0g                  writing score
math            float  %9.0g                  math score
science         float  %9.0g                  science score
socst           float  %9.0g                  social studies score
-------------------------------------------------------------------------------
Sorted by:  

summarize write read math female

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       write |     200      52.775   9.478586         31         67
        read |     200       52.23   10.25294         28         76
        math |     200      52.645   9.368448         33         75
      female |     200        .545   .4992205          0          1
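
/* Added check (not part of the original session): for a 0/1 variable the
   sample standard deviation follows from the mean p and n as
   sqrt(p*(1-p)*n/(n-1)), so the first display should agree with the
   Std. Dev. reported for female above. */
quietly summarize female
display sqrt(r(mean)*(1 - r(mean))*r(N)/(r(N) - 1))
display r(mean)*r(N)                      /* number of cases coded female = 1 */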

stem write


Stem-and-leaf plot for write (writing score)

  3* | 1111
  3t | 3333
  3f | 55
  3s | 66777
  3. | 899999
  4* | 0001111111111
  4t | 223
  4f | 4444444444445
  4s | 66666666677
  4. | 99999999999
  5* | 00
  5t | 2222222222222223
  5f | 44444444444444444555
  5s | 777777777777
  5. | 9999999999999999999999999
  6* | 00001111
  6t | 2222222222222222223333
  6f | 5555555555555555
  6s | 7777777

stem read, lines(2)

Stem-and-leaf plot for read (reading score)

  2. | 8
  3* | 1444444
  3. | 56667799999999
  4* | 112222222222222334444444444444
  4. | 5567777777777777777777777777778
  5* | 0000000000000000002222222222222234
  5. | 555555555555577777777777777
  6* | 00000000013333333333333333
  6. | 555555555688888888888
  7* | 1133333
  7. | 66

stem math, lines(2)

Stem-and-leaf plot for math (math score)

  3* | 3
  3. | 5788999999
  4* | 00000000001111111222222233333334444
  4. | 5555555566666666777888889999999999
  5* | 00000001111111122222233333334444444444
  5. | 555556666666777777777777788888899
  6* | 00000111111122223333344444
  6. | 555666677899
  7* | 011112223
  7. | 55

kdbox write, normal mean                  /* findit kdbox */
kdbox read, normal mean
kdbox math, normal mean
[graphs omitted]

/* shortcut for the 3 kdbox graphs */
foreach var of varlist write read math { 
  kdbox `var', normal mean 
  more
}
[graphs omitted]

foreach var of varlist write read math { 
  pnorm `var'
  more
  qnorm `var'
  more
}
[graphs omitted]

graph matrix read math female write, half
[graph omitted]

correlate write read math female
(obs=200)

             |    write     read     math   female
-------------+------------------------------------
       write |   1.0000
        read |   0.5968   1.0000
        math |   0.6174   0.6623   1.0000
      female |   0.2565  -0.0531  -0.0293   1.0000

pcorr write read math female 
(obs=200)

Partial correlation of write with

    Variable |    Corr.     Sig.
-------------+------------------
        read |   0.3573    0.000
        math |   0.3931    0.000
      female |   0.3840    0.000
 
regress write read math female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   72.52
       Model |  9405.34864     3  3135.11621           Prob > F      =  0.0000
    Residual |  8473.52636   196  43.2322773           R-squared     =  0.5261
-------------+------------------------------           Adj R-squared =  0.5188
       Total |   17878.875   199   89.843593           Root MSE      =  6.5751

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2054613    .4450166
        math |   .3974826   .0664037     5.99   0.000      .266525    .5284401
      female |    5.44337   .9349987     5.82   0.000      3.59942    7.287319
       _cons |   11.89566   2.862845     4.16   0.000     6.249728     17.5416
------------------------------------------------------------------------------
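
/* Added check (not part of the original session): the header statistics can
   be recomputed from the ANOVA table using the e() results that regress
   leaves behind. The last line uses the standard identity
   partial r = t/sqrt(t^2 + residual df), which should match the pcorr
   value reported for read. */
display e(mss)/(e(mss) + e(rss))          /* R-squared = Model SS / Total SS */
display (e(mss)/e(df_m))/(e(rss)/e(df_r)) /* F = Model MS / Residual MS      */
display sqrt(e(rss)/e(df_r))              /* Root MSE = sqrt(Residual MS)    */
display (_b[read]/_se[read])/sqrt((_b[read]/_se[read])^2 + e(df_r))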

predict e, resid
predict rstu, rstu
predict p

graph twoway scatter rstu p, yline(-2.5 2.5) ylabel(-3(1)3) jitter(2)
rvfplot2, rstu yline(2.5 -2.5) jitter(2)          /* findit rvfplot2 */
[graphs omitted]
  
rvpplot2 read, rstu yline(0 -2.5 2.5) jitter(2)    /* findit rvpplot2  */
rvpplot math, yline(0 -2.5 2.5) jitter(2)
rvpplot female, yline(0 -2.5 2.5)  jitter(2)
[graphs omitted]

graph twoway scatter rstu read, yline(0 -2.5 2.5) ylabel(-3(1)3) jitter(2)
[graph omitted]

avplot read
avplot math
avplot female
[graphs omitted]
  
  
kdensity e, normal
graph twoway scatter write p, jitter(2)
graph twoway (scatter write p, jitter(2)) (lfit write p)
graph twoway scatter rstu id, yline(0) 
indexplot rstu, scatter  /*  findit indexplot  */
[graphs omitted]
  
list id write rstu if abs(rstu)>=2.5

            id      write       rstu 
 31.       126         31  -2.697508  
198.       187         41   -2.72472 

lvr2plot, ylabel xlabel

dfbeta
list id write rstu DFread if abs(DFread)>2/sqrt(e(N))

            id      write       rstu     DFread 
169.       150         41  -1.113306   .1435211  
172.       141         44  -1.092409  -.1484074  
190.       170         62   1.636351  -.1785097  
194.       103         52  -1.564255  -.2235134  
196.        86         33  -2.276461   .2035398  
198.         3         65   2.106786   .2715756  
199.        62         65    2.00872   .2973564  
200.       126         31  -2.697508   .3477473 
 
list id write rstu DFmath if abs(DFmath)>2/sqrt(e(N))

            id      write       rstu     DFmath 
166.        24         62   1.074772   .1493585  
167.       189         59   1.047505   .1459866  
175.        32         67   1.107803   .1665842  
189.        83         62   1.871348   -.197515  
190.       170         62   1.636351   .1939547  
193.       200         54   -1.52912   -.202688  
195.        50         59   2.194752  -.2067871  
196.        86         33  -2.276461  -.1484884  
197.       133         31  -2.026189   .2327446  
198.         3         65   2.106786  -.2397425  
199.        62         65    2.00872  -.2541649  
200.       126         31  -2.697508  -.2931431 
  
list id write rstu DFfemale if abs(DFfemale)>2/sqrt(e(N))

            id      write       rstu   DFfemale 
178.        85         39  -2.073712   .1599997  
184.        18         33  -2.262443   .1778462  
185.        81         43  -1.982814   .1469716  
187.        60         65   2.210802  -.1678427  
188.        16         31  -2.114106   .1683168  
191.       187         41   -2.72472  -.1817335  
195.        50         59   2.194752  -.1720256  
196.        86         33  -2.276461    .186213  
197.       133         31  -2.026189   .1588308  
198.         3         65   2.106786  -.1553218  
199.        62         65    2.00872   -.146781  
200.       126         31  -2.697508   .2246109  
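
/* Added note (not part of the original session): the cutoff 2/sqrt(n) used
   in the three DFBETA lists above works out to about .1414 for n = 200;
   e(N) is the sample size stored by regress. */
display 2/sqrt(e(N))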
 
/* alternate code */
sort DFread
list id write DFread in 1/10
list id write DFread in -10/l
sort DFmath
list id write DFmath in 1/10
list id write DFmath in -10/l
sort DFfemale
list id write DFfemale in 1/10
list id write DFfemale in -10/l 

  
indexplot leverage, scatter
predict lev, leverage
sort lev
list id write lev in -10/l

            id      write        lev 
191.       103         52   .0376407  
192.       164         36   .0378285  
193.        34         61   .0378289  
194.        33         65    .037994  
195.       161         62    .037994  
196.        19         46   .0387017  
197.       200         54   .0389156  
198.       143         63   .0417192  
199.        61         63   .0425231  
200.       167         49   .0752208  
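
/* Added note (not part of the original session): a common rule of thumb,
   assumed here and not stated by the author, flags leverage values above
   (2k + 2)/n, where k is the number of predictors. With k = 3 and n = 200
   that is .04. */
display (2*e(df_m) + 2)/e(N)
list id write lev if lev > (2*e(df_m) + 2)/e(N)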


indexplot cooksd, scatter
predict d, cooksd  
sort d
list id write rstu lev d in -10/l

            id      write       rstu        lev          d 
191.       187         41   -2.72472   .0107086   .0194529  
192.       117         49   1.634066    .028638   .0195144  
193.       200         54   -1.52912   .0389156   .0235088  
194.       103         52  -1.564255   .0376407    .023751  
195.        50         59   2.194752   .0200704   .0241933  
196.        86         33  -2.276461    .018896   .0244312  
197.       133         31  -2.026189   .0242461   .0251059  
198.         3         65   2.106786   .0285327    .032029  
199.        62         65    2.00872   .0335684   .0345036  
200.       126         31  -2.697508   .0280834   .0509327   
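
/* Added note (not part of the original session): a conventional cutoff for
   Cook's D, assumed here and not stated by the author, is 4/n, which is
   .02 for n = 200. */
display 4/e(N)
list id write rstu lev d if d > 4/e(N)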

vif

    Variable |       VIF       1/VIF  
-------------+----------------------
        read |      1.78    0.560251
        math |      1.78    0.561351
      female |      1.00    0.997122
-------------+----------------------
    Mean VIF |      1.52

collin read math female         /* available from ATS via the Internet */
 
  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
      read      1.78    1.34    0.5603      0.4397
      math      1.78    1.33    0.5614      0.4386
    female      1.00    1.00    0.9971      0.0029
----------------------------------------------------
  Mean VIF      1.52

                           Cond
        Eigenval          Index
---------------------------------
    1     1.6674          1.0000
    2     0.9953          1.2943
    3     0.3373          2.2234
---------------------------------
 Condition Number         2.2234 
 Eigenvalues & Cond Index computed from deviation sscp (no intercept)
 Det(correlation matrix)    0.5598
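
/* Added check (not part of the original session): the columns above are
   linked by simple arithmetic. Tolerance = 1 - R-squared, VIF = 1/tolerance,
   and each condition index = sqrt(largest eigenvalue / that eigenvalue).
   The numbers below are copied from the collin output, so results agree
   only up to rounding. */
display 1 - 0.4397                        /* tolerance for read          */
display 1/(1 - 0.4397)                    /* VIF for read                */
display sqrt(1.6674/0.9953)               /* condition index, row 2      */
display sqrt(1.6674/0.3373)               /* condition number, row 3     */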


linktest


      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =  116.16
       Model |  9674.70222     2  4837.35111           Prob > F      =  0.0000
    Residual |  8204.17278   197  41.6455471           R-squared     =  0.5411
-------------+------------------------------           Adj R-squared =  0.5365
       Total |   17878.875   199   89.843593           Root MSE      =  6.4533

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   3.306865   .9095168     3.64   0.000     1.513226    5.100504
      _hatsq |  -.0215942    .008491    -2.54   0.012    -.0383392   -.0048492
       _cons |  -60.58511   24.08436    -2.52   0.013    -108.0814   -13.08885
------------------------------------------------------------------------------

ovtest

Ramsey RESET test using powers of the fitted values of write
       Ho:  model has no omitted variables
                 F(3, 193) =      3.06
                  Prob > F =      0.0295

hettest

Cook-Weisberg test for heteroskedasticity using fitted values of write
     Ho: Constant variance
         chi2(1)      =      6.64
         Prob > chi2  =      0.0100

whitetst  /* Downloaded from Stata (STB 55, sg137) via the Internet */

White's general test statistic :  15.17126  Chi-sq( 8)  P-value =  .0559
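
/* Added check (not part of the original session): the p-values reported by
   ovtest, hettest and whitetst are upper-tail probabilities, which the
   built-in functions Ftail(df1, df2, F) and chi2tail(df, x) return. The
   statistics are copied from the output above, so results match only up
   to rounding. */
display Ftail(3, 193, 3.06)               /* Ramsey RESET test  */
display chi2tail(1, 6.64)                 /* Cook-Weisberg test */
display chi2tail(8, 15.17126)             /* White's test       */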


Linear Statistical Models Course

Phil Ender, 5feb04; 13jan00