Linear Statistical Models: Regression

Logistic Regression

Updated for Stata 11


Classical Regression vs Logistic Regression

Different Assumptions

Logistic Regression Assumptions

  1. The model is correctly specified, i.e., 1) the true conditional probabilities are a logistic function of the independent variables, 2) no important variables are omitted, 3) no extraneous variables are included, and 4) the independent variables are measured without error.
  2. The cases are independent.
  3. The independent variables are not linear combinations of each other. Perfect multicollinearity makes estimation impossible, while strong multicollinearity makes estimates imprecise.

Logit

Note: I would like to thank John Napier (1550-1617), lord of Merchiston (near Edinburgh), for developing the idea of logarithms.
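
For reference, the standard definitions behind the logit can be written as follows. Here p is the probability that the outcome equals 1; the symbols \beta_0, ..., \beta_k and x_1, ..., x_k are notation introduced here for illustration rather than taken from the original page.

```latex
\begin{align*}
\text{odds} &= \frac{p}{1-p} \\
\operatorname{logit}(p) &= \ln\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \\
p &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\end{align*}
```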

About Logistic Regression

Interpreting Logistic Coefficients

  • A logistic slope coefficient is the change in the predicted logit for a one-unit change in the X variable, with the other variables in the model held constant. That is, it shows how a one-unit change in X affects the log of the odds when the other variables in the model are held constant.

    Interpreting Odds Ratios

  • An odds ratio in logistic regression is the factor by which the predicted odds are multiplied for a one-unit change in X, with the other variables in the model held constant.
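
As a small illustration of how probabilities, odds, and logits relate, the following Stata sketch uses only display and built-in functions. The probability .7 and the slope 0.5 are hypothetical values chosen for illustration, not results from this page.

```stata
* hypothetical probability of success (an assumption for illustration)
local p = .7

* odds = p/(1-p); logit = ln(odds)
display "odds  = " `p'/(1-`p')
display "logit = " ln(`p'/(1-`p'))

* the built-in logit() and invlogit() functions do the same conversions
display "logit(p)           = " logit(`p')
display "invlogit(logit(p)) = " invlogit(logit(`p'))

* hypothetical slope b = 0.5: a one-unit increase in X adds 0.5 to the logit
* and multiplies the odds by exp(0.5), whatever the starting odds are
local b = 0.5
display "odds multiplier = " exp(`b')
```

Local macros are used instead of scalars so that the names cannot be mistaken for abbreviated variable names in whatever dataset happens to be in memory.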

    Example Dataset

    
    input apt gender admit
    8 1 1
    7 1 0
    5 1 1
    3 1 0
    3 1 0
    5 1 1
    7 1 1
    8 1 1
    5 1 1
    5 1 1
    4 0 0
    7 0 1
    3 0 1
    2 0 0
    4 0 0
    2 0 0
    3 0 0
    4 0 1
    3 0 0
    2 0 0
    end
    
      
    Example 1: Categorical Independent Variable
      
    logit admit i.gender
    
    Iteration 0:   log likelihood = -13.862944  
    Iteration 1:   log likelihood = -12.222013  
    Iteration 2:   log likelihood = -12.217286  
    Iteration 3:   log likelihood = -12.217286  
    
    Logistic regression                               Number of obs   =         20
                                                      LR chi2(1)      =       3.29
                                                      Prob > chi2     =     0.0696
    Log likelihood = -12.217286                       Pseudo R2       =     0.1187
    
    ------------------------------------------------------------------------------
           admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        1.gender |   1.694596   .9759001     1.74   0.082    -.2181333    3.607325
           _cons |  -.8472978   .6900656    -1.23   0.220    -2.199801    .5052058
    ------------------------------------------------------------------------------
    
    logit admit i.gender, or
    
    Logistic regression                               Number of obs   =         20
                                                      LR chi2(1)      =       3.29
                                                      Prob > chi2     =     0.0696
    Log likelihood = -12.217286                       Pseudo R2       =     0.1187
    
    ------------------------------------------------------------------------------
           admit | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        1.gender |   5.444444   5.313233     1.74   0.082     .8040182    36.86729
    ------------------------------------------------------------------------------
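
The odds ratio for gender can be checked by hand from the 2x2 table of the example data. The sketch below assumes the example dataset above is in memory. The counts come from tallying that data: 7 of the 10 cases with gender = 1 were admitted, versus 3 of the 10 cases with gender = 0.

```stata
* cross-tabulate the predictor and the outcome
tabulate gender admit

* odds of admission within each group
display "odds, gender = 1: " 7/3
display "odds, gender = 0: " 3/7

* ratio of the two odds; should agree with the Odds Ratio column (5.444444)
display "odds ratio = " (7/3)/(3/7)

* exponentiating the reported logit coefficients gives the same quantities,
* up to rounding: exp(b_gender) is the odds ratio, exp(_cons) the odds
* for the gender = 0 group
display "exp(b_gender) = " exp(1.694596)
display "exp(_cons)    = " exp(-.8472978)
```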
    
    Example 2: Continuous Independent Variable
    
    logit admit apt
    
    Iteration 0:   log likelihood = -13.862944
    Iteration 1:   log likelihood = -9.6278718
    Iteration 2:   log likelihood = -9.3197603
    Iteration 3:   log likelihood = -9.3029734
    Iteration 4:   log likelihood = -9.3028914
    
    Logit estimates                                   Number of obs   =         20
                                                      LR chi2(1)      =       9.12
                                                      Prob > chi2     =     0.0025
    Log likelihood = -9.3028914                       Pseudo R2       =     0.3289
    
    ------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
         apt |   .9455112    .422872      2.236   0.025       .1166974    1.774325
       _cons |  -4.095248    1.83403     -2.233   0.026      -7.689881   -.5006154
    ------------------------------------------------------------------------------
    
    logit, or
    
    Logit estimates                                   Number of obs   =         20
                                                      LR chi2(1)      =       9.12
                                                      Prob > chi2     =     0.0025
    Log likelihood = -9.3028914                       Pseudo R2       =     0.3289
    
    ------------------------------------------------------------------------------
       admit | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
         apt |   2.574129   1.088527      2.236   0.025       1.123779      5.8963
    ------------------------------------------------------------------------------
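
To see what the Example 2 coefficients imply on the probability scale, the fitted logit can be converted by hand. The sketch below plugs the reported coefficients into Stata's invlogit() function. The apt values 3, 5, and 7 and the variable name phat are arbitrary choices for illustration, and the last three commands assume the example dataset is in memory.

```stata
* predicted probability of admission at selected apt values,
* using the coefficients reported above: logit = -4.095248 + .9455112*apt
foreach x of numlist 3 5 7 {
    display "apt = `x'   Pr(admit) = " invlogit(-4.095248 + .9455112*`x')
}

* each additional point of apt multiplies the odds by the odds ratio
display "exp(.9455112) = " exp(.9455112)

* after fitting the model, predict computes the same probabilities per case
logit admit apt
predict phat, pr
list apt admit phat
```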
    
    Example 3: Categorical & Continuous Independent Variables
    
    logit admit i.gender apt
    
    Iteration 0:   log likelihood = -13.862944  
    Iteration 1:   log likelihood = -9.3188454  
    Iteration 2:   log likelihood = -9.2822992  
    Iteration 3:   log likelihood = -9.2820991  
    Iteration 4:   log likelihood = -9.2820991  
    
    Logistic regression                               Number of obs   =         20
                                                      LR chi2(2)      =       9.16
                                                      Prob > chi2     =     0.0102
    Log likelihood = -9.2820991                       Pseudo R2       =     0.3304
    
    ------------------------------------------------------------------------------
           admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        1.gender |   .2671938   1.300911     0.21   0.837    -2.282545    2.816932
             apt |   .8982803   .4713918     1.91   0.057    -.0256307    1.822191
           _cons |  -4.028764   1.838393    -2.19   0.028    -7.631949   -.4255801
    ------------------------------------------------------------------------------
     
    logit, or
    
    Logistic regression                               Number of obs   =         20
                                                      LR chi2(2)      =       9.16
                                                      Prob > chi2     =     0.0102
    Log likelihood = -9.2820991                       Pseudo R2       =     0.3304
    
    ------------------------------------------------------------------------------
           admit | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        1.gender |   1.306294   1.699372     0.21   0.837     .1020242    16.72547
             apt |   2.455377   1.157445     1.91   0.057      .974695    6.185398
    ------------------------------------------------------------------------------
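
Whether gender adds anything once apt is in the model can be examined with a likelihood-ratio test of the two nested models. This is a sketch that assumes the example dataset is in memory; the names m_apt and m_both are arbitrary labels chosen here.

```stata
* fit the reduced model and store its results
quietly logit admit apt
estimates store m_apt

* fit the full model and store its results
quietly logit admit i.gender apt
estimates store m_both

* likelihood-ratio test of the full model against the reduced model
lrtest m_both m_apt

* the same statistic by hand from the two reported log likelihoods
display "LR chi2(1) = " 2*((-9.2820991) - (-9.3028914))
display "p-value    = " chi2tail(1, 2*((-9.2820991) - (-9.3028914)))
```

The hand-computed statistic is about 0.04, which is consistent with the small difference between the two reported LR chi2 values (9.16 versus 9.12).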

    Example 4: Honors Composition using HSB Dataset

    
    use http://www.philender.com/courses/data/hsbdemo, clear
    
    tabulate honors
    
        honcomp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        147       73.50       73.50
              1 |         53       26.50      100.00
    ------------+-----------------------------------
          Total |        200      100.00
      
    logit honors female i.ses read math
    
    Iteration 0:   log likelihood = -115.64441  
    Iteration 1:   log likelihood = -75.969526  
    Iteration 2:   log likelihood = -72.051616  
    Iteration 3:   log likelihood = -71.994777  
    Iteration 4:   log likelihood = -71.994756  
    Iteration 5:   log likelihood = -71.994756  
    
    Logistic regression                               Number of obs   =        200
                                                      LR chi2(5)      =      87.30
                                                      Prob > chi2     =     0.0000
    Log likelihood = -71.994756                       Pseudo R2       =     0.3774
    
    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          female |   1.145726   .4513589     2.54   0.011     .2610792    2.030374
                 |
             ses |
              2  |  -1.040402   .5791511    -1.80   0.072    -2.175517     .094713
              3  |   .0541296   .5945439     0.09   0.927    -1.111155    1.219414
                 |
            read |   .0687277   .0287044     2.39   0.017     .0124681    .1249873
            math |   .1358904   .0336875     4.03   0.000     .0698642    .2019166
           _cons |  -12.55332   1.838493    -6.83   0.000     -16.1567   -8.949939
    ------------------------------------------------------------------------------
    
    testparm i.ses
    
     ( 1)  [honors]2.ses = 0
     ( 2)  [honors]3.ses = 0
    
               chi2(  2) =    6.13
             Prob > chi2 =    0.0466
    
    logit, or
    
    Logistic regression                               Number of obs   =        200
                                                      LR chi2(5)      =      87.30
                                                      Prob > chi2     =     0.0000
    Log likelihood = -71.994756                       Pseudo R2       =     0.3774
    
    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          female |   1.145726   .4513589     2.54   0.011     .2610792    2.030374
                 |
             ses |
              2  |  -1.040402   .5791511    -1.80   0.072    -2.175517     .094713
              3  |   .0541296   .5945439     0.09   0.927    -1.111155    1.219414
                 |
            read |   .0687277   .0287044     2.39   0.017     .0124681    .1249873
            math |   .1358904   .0336875     4.03   0.000     .0698642    .2019166
           _cons |  -12.55332   1.838493    -6.83   0.000     -16.1567   -8.949939
    ------------------------------------------------------------------------------
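
The table shown after logit, or above repeats the coefficients (log odds) rather than odds ratios. Since an odds ratio is the exponentiated coefficient, approximate odds ratios can be recovered with display. The rounded values in the comments were worked out by hand here and are not Stata output from the original page.

```stata
* odds ratio = exp(coefficient), using the coefficients reported above

* female: about 3.14
display "female: " exp(1.145726)

* ses level 2 versus level 1: about 0.35
display "2.ses:  " exp(-1.040402)

* ses level 3 versus level 1: about 1.06
display "3.ses:  " exp(.0541296)

* read, per one-point increase: about 1.07
display "read:   " exp(.0687277)

* math, per one-point increase: about 1.15
display "math:   " exp(.1358904)
```

For example, a one-point increase in math multiplies the predicted odds of honors by roughly 1.15, with the other variables held constant.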
    
    
     
    fitstat  /* available from J. Scott Long via the Internet */
    
    Measures of Fit for logit of honors
    
    Log-Lik Intercept Only:       -115.644   Log-Lik Full Model:            -71.995
    D(193):                        143.990   LR(5):                          87.299
                                             Prob > LR:                       0.000
    McFadden's R2:                   0.377   McFadden's Adj R2:               0.317
    ML (Cox-Snell) R2:               0.354   Cragg-Uhler(Nagelkerke) R2:      0.516
    McKelvey & Zavoina's R2:         0.549   Efron's R2:                      0.404
    Variance of y*:                  7.296   Variance of error:               3.290
    Count R2:                        0.830   Adj Count R2:                    0.358
    AIC:                             0.790   AIC*n:                         157.990
    BIC:                          -878.586   BIC':                          -60.808
    BIC used by Stata:             175.779   AIC used by Stata:             155.990
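
Several of the fitstat measures can be reproduced by hand from the two log likelihoods, the number of observations (200), and the number of estimated parameters (6, counting the constant). The sketch below uses the standard formulas for McFadden's R2, the LR chi-square, AIC, and BIC; the local macro names are arbitrary.

```stata
* inputs taken from the output above
local ll0 = -115.64441
local ll  = -71.994756
local k   = 6
local n   = 200

* McFadden's R2 = 1 - LL(full)/LL(intercept only)
display "McFadden's R2     = " 1 - (`ll')/(`ll0')

* LR chi-square = 2*(LL(full) - LL(intercept only))
display "LR(5)             = " 2*((`ll') - (`ll0'))

* AIC = -2*LL + 2*k and BIC = -2*LL + k*ln(n), as labelled "used by Stata"
display "AIC used by Stata = " -2*(`ll') + 2*`k'
display "BIC used by Stata = " -2*(`ll') + `k'*ln(`n')
```

These should agree, up to rounding, with the 0.377, 87.299, 155.990, and 175.779 shown in the fitstat output.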
    
    lfit
    
    Logistic model for honors, goodness-of-fit test
    
           number of observations =       200
     number of covariate patterns =       189
                Pearson chi2(183) =       166.48
                      Prob > chi2 =         0.8040
    
    lfit, group(10)
    
    Logistic model for honors, goodness-of-fit test
    
      (Table collapsed on quantiles of estimated probabilities)
    
           number of observations =       200
                 number of groups =        10
          Hosmer-Lemeshow chi2(8) =        12.91
                      Prob > chi2 =         0.1151
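
The degrees of freedom and p-values in the two goodness-of-fit tests can be checked with chi2tail(). The Pearson test uses the number of covariate patterns minus the number of estimated parameters (189 - 6 = 183), and the Hosmer-Lemeshow test uses the number of groups minus 2 (10 - 2 = 8). In Stata 11 the same tests are also available through estat gof.

```stata
* upper-tail chi-square probabilities for the reported statistics;
* these should agree with 0.8040 and 0.1151 up to rounding
display "Pearson p-value         = " chi2tail(183, 166.48)
display "Hosmer-Lemeshow p-value = " chi2tail(8, 12.91)

* equivalent postestimation commands, run after the logit model is fit
estat gof
estat gof, group(10)
```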
    
    lstat
    
    Logistic model for honors
    
                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |        31            12  |         43
         -     |        22           135  |        157
    -----------+--------------------------+-----------
       Total   |        53           147  |        200
    
    Classified + if predicted Pr(D) >= .5
    True D defined as honors != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   58.49%
    Specificity                     Pr( -|~D)   91.84%
    Positive predictive value       Pr( D| +)   72.09%
    Negative predictive value       Pr(~D| -)   85.99%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)    8.16%
    False - rate for true D         Pr( -| D)   41.51%
    False + rate for classified +   Pr(~D| +)   27.91%
    False - rate for classified -   Pr( D| -)   14.01%
    --------------------------------------------------
    Correctly classified                        83.00%
    --------------------------------------------------
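
Each percentage in the classification table is a simple ratio of the four cell counts. The sketch below recomputes them with display; the local macro names are arbitrary. In Stata 11 the table itself is also available through estat classification.

```stata
* cell counts from the table above
* tp = classified + and true D,  fp = classified + and true ~D
* fn = classified - and true D,  tn = classified - and true ~D
local tp = 31
local fp = 12
local fn = 22
local tn = 135

display "Sensitivity          = " 100*`tp'/(`tp' + `fn')
display "Specificity          = " 100*`tn'/(`tn' + `fp')
display "Positive pred. value = " 100*`tp'/(`tp' + `fp')
display "Negative pred. value = " 100*`tn'/(`tn' + `fn')
display "Correctly classified = " 100*(`tp' + `tn')/(`tp' + `fp' + `fn' + `tn')

* equivalent postestimation command, run after the logit model is fit
estat classification
```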


    Linear Statistical Models Course

    Phil Ender, 17sep10, 20dec00