Education 231C

Applied Categorical & Nonnormal Data Analysis

OLS versus Logistic

Having laid the theoretical foundation for logistic regression, we must admit that for many models the OLS results and the logistic regression results differ very little. Here is a small example in which OLS and logit give much the same answer.

First Example

regress honors lang female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   35.85
       Model |  10.3957196     2  5.19785982           Prob > F      =  0.0000
    Residual |  28.5592804   197  .144970966           R-squared     =  0.2669
-------------+------------------------------           Adj R-squared =  0.2594
       Total |      38.955   199  .195753769           Root MSE      =  .38075

      honors |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        lang |   .0214989   .0026362     8.16   0.000     .0163001    .0266977
      female |   .1467375    .054142     2.71   0.007     .0399652    .2535098
       _cons |  -.9378584   .1448623    -6.47   0.000    -1.223538   -.6521786
predict p1
logit honors lang female, nolog
Logit estimates                                   Number of obs   =        200
                                                  LR chi2(2)      =      60.40
                                                  Prob > chi2     =     0.0000
Log likelihood =  -85.44372                       Pseudo R2       =     0.2612

      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        lang |   .1443657   .0233337     6.19   0.000     .0986325    .1900989
      female |   1.120926   .4081028     2.75   0.006      .321059    1.920793
       _cons |  -9.603365   1.426404    -6.73   0.000    -12.39906   -6.807665
predict p2
summarize p1 p2

    Variable |     Obs        Mean   Std. Dev.       Min        Max
          p1 |     200        .265   .2285603  -.2713931   .8427939
          p2 |     200        .265   .2408362   .0058933   .9233922
corr p1 p2

             |       p1       p2
          p1 |   1.0000
          p2 |   0.9490   1.0000
list p1 p2 in 1/20
            p1         p2
  1.  .0473354   .0545689
  2. -.1423999   .0139005
  3. -.1638987   .0120544
  4.   .283823   .2202598
  5.  .5633085   .6485339
  6.  .0725889   .0563498
  7. -.0386602   .0313819
  8.  .1763286   .1206826
  9.  .0473354   .0545689
 10. -.1891523   .0116561
 11.  .3268208   .2738011
 12.  .1800833   .1094523
 13.  .0295912   .0428232
 14.  -.060159   .0272784
 15.  .1548298    .106182
 16.  .2193264   .1548247
 17.    .24458   .1593262
 18.  .2193264   .1548247
 19.  .4343152   .4369393
 20.  .1548298    .106182
/* classification tables */ 
generate c1 = p1>.5
generate c2 = p2>.5
tabulate c1 c2
           |          c2
        c1 |         0          1 |     Total
         0 |       164          4 |       168 
         1 |         0         32 |        32 
     Total |       164         36 |       200
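
The table above compares the two sets of classifications with each other. To see how each one does against the observed outcome, a quick sketch, assuming c1 and c2 exist as generated above, is to cross-tabulate each with honors:

```stata
* compare each model's classification with the observed outcome
tabulate honors c1
tabulate honors c2
```

The off-diagonal cells in each table are the cases misclassified under the .5 cutoff.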

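Note the out-of-range predictions in the example above: the OLS predicted values (p1) include negative numbers, which cannot be probabilities, while the logit predicted values (p2) all lie between 0 and 1.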
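
One way to see where the negative values come from: after regress, predict returns the linear prediction itself, while after logit it returns the inverse logit of the linear prediction, which always falls strictly between 0 and 1. The following sketch assumes the logit model is still the most recent estimation and that p1 and p2 exist as created above; xb2 and p2check are names made up for this illustration.

```stata
* how many OLS predictions fall outside the 0-1 range
count if p1 < 0
count if p1 > 1 & !missing(p1)

* rebuild the logit probabilities from the linear predictor (log odds)
predict xb2, xb
generate p2check = invlogit(xb2)
summarize p2 p2check
```

If the sketch is right, p2 and p2check should agree up to rounding.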

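Next, let's look at a counterexample in which OLS and logistic regression lead to different conclusions. Pay particular attention to the coefficient for enroll, which is positive and nonsignificant in the OLS model but negative and significant in the logit model.

Counterexample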

use, clear
regress hiqual enroll meals avg_ed
      Source |       SS       df       MS              Number of obs =    1149
-------------+------------------------------           F(  3,  1145) =  522.92
       Model |  145.625156     3  48.5417186           Prob > F      =  0.0000
    Residual |  106.287812  1145  .092827783           R-squared     =  0.5781
-------------+------------------------------           Adj R-squared =  0.5770
       Total |  251.912968  1148  .219436383           Root MSE      =  .30468

      hiqual |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      enroll |   .0000233   .0000521     0.45   0.654    -.0000789    .0001255
       meals |  -.0077637   .0005286   -14.69   0.000    -.0088009   -.0067266
      avg_ed |   .1697195   .0210965     8.04   0.000     .1283274    .2111116
       _cons |   .2513082    .083063     3.03   0.003     .0883353     .414281
predict p1
(51 missing values generated)
logit hiqual enroll meals avg_ed, nolog
Logit estimates                                   Number of obs   =       1149
                                                  LR chi2(3)      =     917.65
                                                  Prob > chi2     =     0.0000
Log likelihood = -265.40191                       Pseudo R2       =     0.6335

      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      enroll |  -.0019593    .000735    -2.67   0.008    -.0033999   -.0005187
       meals |  -.0785112   .0076189   -10.30   0.000     -.093444   -.0635784
      avg_ed |   2.148565    .299792     7.17   0.000     1.560984    2.736147
       _cons |  -3.302163   1.030206    -3.21   0.001     -5.32133   -1.282996
predict p2
(51 missing values generated)
summarize p1 p2

    Variable |     Obs        Mean   Std. Dev.       Min        Max
          p1 |    1149    .3246301   .3561617  -.3461998   1.101522
          p2 |    1149    .3246301   .3848081    .000036   .9986064
corr p1 p2

             |       p1       p2
          p1 |   1.0000
          p2 |   0.9256   1.0000
list p1 p2 in 1/20
            p1         p2
  1.  .0686489   .0087564
  2. -.2424376   .0002072
  3.  .4535227    .348036
  4.  .5269313   .4345559
  5.  .9753819   .9948776
  6. -.0621362   .0028704
  7. -.0452434   .0019718
  8.  .4313931   .2918692
  9.  .2042937   .0150685
 10.  .7003989   .8609163
 11.         .          .
 12. -.0974959   .0009283
 13.  .5642942   .6845942
 14.  .1766382   .0204579
 15.  .4774918   .3685073
 16. -.1551605   .0002245
 17.  .8235874   .9654739
 18.  .0041572   .0047186
 19. -.0109633   .0019114
 20. -.1417836   .0005978
/* classification tables */  
generate c1 = p1>.5 if p1~=.
generate c2 = p2>.5 if p2~=.
tabulate c1 c2
           |          c2
        c1 |         0          1 |     Total
         0 |       750          0 |       750 
         1 |        30        369 |       399 
     Total |       780        369 |      1149 
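
Because the two sets of coefficients are on different scales (probabilities for OLS, log odds for logit), it is the signs and p-values that are worth comparing, not the magnitudes. Here is a minimal sketch for putting them side by side, assuming a Stata release that provides estimates store and estimates table; ols and lgt are names made up for this illustration.

```stata
* refit each model quietly and store the results
quietly regress hiqual enroll meals avg_ed
estimates store ols
quietly logit hiqual enroll meals avg_ed
estimates store lgt

* coefficients and p-values side by side
estimates table ols lgt, b p

* cases classified 1 by OLS but 0 by logit
count if c1 == 1 & c2 == 0
```

The count should reproduce the 30 in the lower-left cell of the table above.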

Categorical Data Analysis Course

Phil Ender