Education 231C

Applied Categorical & Nonnormal Data Analysis

OLS versus Logistic


After laying out all of the theoretical foundation for logistic regression, it must be admitted that for many models there is very little difference between the OLS results and the logistic regression results. Here is a small example in which OLS and logit give very similar predictions.
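
Before comparing the output, it may help to write down the two models. Roughly, with Xb standing for the constant plus the linear combination of the predictors:

    OLS (linear probability model):  P(y=1) = Xb
    Logit:  log[ P(y=1) / (1 - P(y=1)) ] = Xb,  so that  P(y=1) = exp(Xb)/(1 + exp(Xb))

OLS fitted values are not constrained to lie between 0 and 1; the logit fitted probabilities are.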

First Example

use http://www.gseis.ucla.edu/courses/data/honors
 
regress honors lang female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   35.85
       Model |  10.3957196     2  5.19785982           Prob > F      =  0.0000
    Residual |  28.5592804   197  .144970966           R-squared     =  0.2669
-------------+------------------------------           Adj R-squared =  0.2594
       Total |      38.955   199  .195753769           Root MSE      =  .38075

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lang |   .0214989   .0026362     8.16   0.000     .0163001    .0266977
      female |   .1467375    .054142     2.71   0.007     .0399652    .2535098
       _cons |  -.9378584   .1448623    -6.47   0.000    -1.223538   -.6521786
------------------------------------------------------------------------------
 
predict p1
 
logit honors lang female, nolog
 
Logit estimates                                   Number of obs   =        200
                                                  LR chi2(2)      =      60.40
                                                  Prob > chi2     =     0.0000
Log likelihood =  -85.44372                       Pseudo R2       =     0.2612

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lang |   .1443657   .0233337     6.19   0.000     .0986325    .1900989
      female |   1.120926   .4081028     2.75   0.006      .321059    1.920793
       _cons |  -9.603365   1.426404    -6.73   0.000    -12.39906   -6.807665
------------------------------------------------------------------------------
 
predict p2
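
The two predict commands return different quantities: after regress, p1 is the linear prediction Xb, while after logit the default prediction p2 is the probability invlogit(Xb). As a sketch (the variable name p2check is ours; output not shown), p2 could be recomputed directly from the logit coefficients:

/* recompute the logit predicted probabilities by hand;
   _b[] still holds the logit estimates at this point */
generate p2check = invlogit(_b[_cons] + _b[lang]*lang + _b[female]*female)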
 
summarize p1 p2

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
          p1 |     200        .265   .2285603  -.2713931   .8427939
          p2 |     200        .265   .2408362   .0058933   .9233922
 
corr p1 p2
(obs=200)

             |       p1       p2
-------------+------------------
          p1 |   1.0000
          p2 |   0.9490   1.0000
 
list p1 p2 in 1/20
 
            p1         p2
  1.  .0473354   .0545689
  2. -.1423999   .0139005
  3. -.1638987   .0120544
  4.   .283823   .2202598
  5.  .5633085   .6485339
  6.  .0725889   .0563498
  7. -.0386602   .0313819
  8.  .1763286   .1206826
  9.  .0473354   .0545689
 10. -.1891523   .0116561
 11.  .3268208   .2738011
 12.  .1800833   .1094523
 13.  .0295912   .0428232
 14.  -.060159   .0272784
 15.  .1548298    .106182
 16.  .2193264   .1548247
 17.    .24458   .1593262
 18.  .2193264   .1548247
 19.  .4343152   .4369393
 20.  .1548298    .106182
 
/* classification tables */ 
 
generate c1 = p1>.5
 
generate c2 = p2>.5
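
A side note on this construction (the variable c1safe below is ours, added for illustration): Stata treats missing values as larger than any number, so p1>.5 would code a missing prediction as 1. That does not matter here, where no predictions are missing, but it is why the counter example below adds an if qualifier. A safer version of the same cut:

/* guard against missing predictions being coded as 1 */
generate c1safe = p1 > .5 if p1 < .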
 
tabulate c1 c2
 
           |          c2
        c1 |         0          1 |     Total
-----------+----------------------+----------
         0 |       164          4 |       168 
         1 |         0         32 |        32 
-----------+----------------------+----------
     Total |       164         36 |       200
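
Stata can also produce a classification table of the logit predictions against the observed outcome; in current versions the postestimation command is estat classification (older releases used lstat). A possible follow-up here, output not shown:

/* built-in classification table versus observed honors */
estat classification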

Note the out-of-range predictions, the negative values of p1, produced by OLS in the example above.
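
A quick way to count the out-of-range OLS predictions (an added check; output not shown; no values of p1 are missing in this data set):

count if p1 < 0 | p1 > 1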

Next, let's look at a counter example in which OLS and logistic produce different results.

Counter Example

use http://www.gseis.ucla.edu/courses/data/apilog, clear
 
regress hiqual enroll meals avg_ed
 
      Source |       SS       df       MS              Number of obs =    1149
-------------+------------------------------           F(  3,  1145) =  522.92
       Model |  145.625156     3  48.5417186           Prob > F      =  0.0000
    Residual |  106.287812  1145  .092827783           R-squared     =  0.5781
-------------+------------------------------           Adj R-squared =  0.5770
       Total |  251.912968  1148  .219436383           Root MSE      =  .30468

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      enroll |   .0000233   .0000521     0.45   0.654    -.0000789    .0001255
       meals |  -.0077637   .0005286   -14.69   0.000    -.0088009   -.0067266
      avg_ed |   .1697195   .0210965     8.04   0.000     .1283274    .2111116
       _cons |   .2513082    .083063     3.03   0.003     .0883353     .414281
------------------------------------------------------------------------------
 
predict p1
(51 missing values generated)
 
logit hiqual enroll meals avg_ed, nolog
 
Logit estimates                                   Number of obs   =       1149
                                                  LR chi2(3)      =     917.65
                                                  Prob > chi2     =     0.0000
Log likelihood = -265.40191                       Pseudo R2       =     0.6335

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      enroll |  -.0019593    .000735    -2.67   0.008    -.0033999   -.0005187
       meals |  -.0785112   .0076189   -10.30   0.000     -.093444   -.0635784
      avg_ed |   2.148565    .299792     7.17   0.000     1.560984    2.736147
       _cons |  -3.302163   1.030206    -3.21   0.001     -5.32133   -1.282996
------------------------------------------------------------------------------
 
predict p2
(51 missing values generated)
 
summarize p1 p2

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
          p1 |    1149    .3246301   .3561617  -.3461998   1.101522
          p2 |    1149    .3246301   .3848081    .000036   .9986064
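
Here the OLS predictions go out of range in both directions: the minimum of p1 is below 0 and the maximum is above 1. To inspect those cases (an added check; the p1 ~= . qualifier keeps the 51 missing predictions out of the list; output not shown):

list hiqual p1 p2 if (p1 < 0 | p1 > 1) & p1 ~= .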
 
corr p1 p2
(obs=1149)

             |       p1       p2
-------------+------------------
          p1 |   1.0000
          p2 |   0.9256   1.0000
 
list p1 p2 in 1/20
 
            p1         p2
  1.  .0686489   .0087564
  2. -.2424376   .0002072
  3.  .4535227    .348036
  4.  .5269313   .4345559
  5.  .9753819   .9948776
  6. -.0621362   .0028704
  7. -.0452434   .0019718
  8.  .4313931   .2918692
  9.  .2042937   .0150685
 10.  .7003989   .8609163
 11.         .          .
 12. -.0974959   .0009283
 13.  .5642942   .6845942
 14.  .1766382   .0204579
 15.  .4774918   .3685073
 16. -.1551605   .0002245
 17.  .8235874   .9654739
 18.  .0041572   .0047186
 19. -.0109633   .0019114
 20. -.1417836   .0005978
  
/* classification tables */  
  
generate c1 = p1>.5 if p1~=.
 
generate c2 = p2>.5 if p1~=.
 
tabulate c1 c2
 
           |          c2
        c1 |         0          1 |     Total
-----------+----------------------+----------
         0 |       750          0 |       750 
         1 |        30        369 |       399 
-----------+----------------------+----------
     Total |       780        369 |      1149 
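
The two classifications now disagree for 30 of the 1149 schools, all of them classified as high quality by OLS (c1 = 1) but not by logit (c2 = 0). To see which schools these are (an added check; output not shown):

list p1 p2 enroll meals avg_ed if c1 ~= c2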


Categorical Data Analysis Course

Phil Ender