Classical Regression vs Logistic Regression
Different Assumptions
Logistic Regression Assumptions
Logit
Note: I would like to thank John Napier (1550-1617), lord of Merchiston (near Edinburgh), for developing the idea of logarithms.
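The logit is the natural log of the odds, ln(p/(1 - p)). The sketch below shows the logit and its inverse in Python. Python is used here only for illustration; the course itself uses Stata. The function names `logit` and `inv_logit` are our own and do not come from any particular library.

```python
import math

def logit(p: float) -> float:
    """Log odds of a probability p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    """Map a log odds value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# A probability of .5 is even odds (1:1), so its logit is 0.
print(logit(0.5))
# Probabilities below .5 have negative logits; those above .5 have positive logits.
print(round(logit(0.3), 4), round(logit(0.7), 4))
# inv_logit undoes logit.
print(round(inv_logit(logit(0.7)), 4))
```

logit(.3) is about -.8473. This is the `_cons` value in Example 1, where 3 of the 10 gender = 0 cases have admit = 1.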
About Logistic Regression
Interpreting Logistic Coefficients
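A logistic coefficient is the change in the predicted log odds for a one-unit increase in the predictor, with any other predictors held constant. The Python sketch below illustrates this. It uses the apt estimates from Example 2 (`_cons` = -4.095248, apt = .9455112), copied by hand from that output. It shows that a one-unit increase in apt always adds the same amount to the logit but a different amount to the probability. The helper names are hypothetical.

```python
import math

# Estimates copied from Example 2 (logit admit apt).
B0 = -4.095248   # _cons
B1 = 0.9455112   # apt

def log_odds(apt: float) -> float:
    """Predicted logit of admission for a given apt score."""
    return B0 + B1 * apt

def prob(apt: float) -> float:
    """Predicted probability of admission (inverse logit of the log odds)."""
    return 1.0 / (1.0 + math.exp(-log_odds(apt)))

for apt in range(2, 9):
    step_logit = log_odds(apt + 1) - log_odds(apt)  # always equals B1
    step_prob = prob(apt + 1) - prob(apt)           # depends on the starting apt
    print(f"apt {apt} -> {apt + 1}: logit change {step_logit:.4f}, "
          f"probability change {step_prob:.4f}")
```

The change in probability is largest near apt = 4.33. At that score the predicted logit is 0 and the predicted probability is .5.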
Interpreting Odds Ratios
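Exponentiating a coefficient gives an odds ratio: the factor by which the odds are multiplied for a one-unit increase in the predictor. The odds ratio's confidence limits are the exponentiated coefficient limits. Its standard error, as Stata reports it with the `or` option, is the odds ratio times the coefficient's standard error (delta method). The Python sketch below applies these rules to coefficients copied from the Coef. tables of Examples 1 and 2. The helper `odds_ratio_summary` is a hypothetical name.

```python
import math

def odds_ratio_summary(b, se, lo, hi):
    """Return (odds ratio, its SE, CI lower, CI upper, percent change in odds)."""
    odds_ratio = math.exp(b)             # multiplicative change in the odds
    or_se = odds_ratio * se              # delta-method SE of exp(b)
    ci_lo, ci_hi = math.exp(lo), math.exp(hi)  # exponentiated coefficient CI
    pct = 100.0 * (odds_ratio - 1.0)     # percent change in the odds
    return odds_ratio, or_se, ci_lo, ci_hi, pct

# (coef, std. err., lower 95%, upper 95%) copied from the Stata output.
estimates = {
    "1.gender (Example 1)": (1.694596, 0.9759001, -0.2181333, 3.607325),
    "apt (Example 2)": (0.9455112, 0.422872, 0.1166974, 1.774325),
}

for name, (b, se, lo, hi) in estimates.items():
    odds_ratio, or_se, ci_lo, ci_hi, pct = odds_ratio_summary(b, se, lo, hi)
    print(f"{name}: OR = {odds_ratio:.4f} (SE {or_se:.4f}), "
          f"95% CI {ci_lo:.4f} to {ci_hi:.4f}, odds change {pct:+.1f}%")
```

For example, an odds ratio of about 5.44 for gender means that the odds of admission when gender = 1 are about 5.44 times the odds when gender = 0.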
Example Dataset
input apt gender admit
8 1 1
7 1 0
5 1 1
3 1 0
3 1 0
5 1 1
7 1 1
8 1 1
5 1 1
5 1 1
4 0 0
7 0 1
3 0 1
2 0 0
4 0 0
2 0 0
3 0 0
4 0 1
3 0 0
2 0 0
end

Example 1: Categorical Independent Variable

logit admit i.gender

Iteration 0:   log likelihood = -13.862944
Iteration 1:   log likelihood = -12.222013
Iteration 2:   log likelihood = -12.217286
Iteration 3:   log likelihood = -12.217286

Logistic regression                               Number of obs   =         20
                                                  LR chi2(1)      =       3.29
                                                  Prob > chi2     =     0.0696
Log likelihood = -12.217286                       Pseudo R2       =     0.1187

------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.gender |   1.694596   .9759001     1.74   0.082    -.2181333    3.607325
       _cons |  -.8472978   .6900656    -1.23   0.220    -2.199801    .5052058
------------------------------------------------------------------------------

logit admit i.gender, or

Logistic regression                               Number of obs   =         20
                                                  LR chi2(1)      =       3.29
                                                  Prob > chi2     =     0.0696
Log likelihood = -12.217286                       Pseudo R2       =     0.1187

------------------------------------------------------------------------------
       admit | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.gender |   5.444444   5.313233     1.74   0.082     .8040182    36.86729
------------------------------------------------------------------------------

Example 2: Continuous Independent Variable

logit admit apt

Iteration 0:  log likelihood = -13.862944
Iteration 1:  log likelihood = -9.6278718
Iteration 2:  log likelihood = -9.3197603
Iteration 3:  log likelihood = -9.3029734
Iteration 4:  log likelihood = -9.3028914

Logit estimates                                   Number of obs   =         20
                                                  LR chi2(1)      =       9.12
                                                  Prob > chi2     =     0.0025
Log likelihood = -9.3028914                       Pseudo R2       =     0.3289

------------------------------------------------------------------------------
   admit |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     apt |   .9455112    .422872      2.236   0.025       .1166974    1.774325
   _cons |  -4.095248    1.83403     -2.233   0.026      -7.689881   -.5006154
------------------------------------------------------------------------------

logit, or

Logit estimates                                   Number of obs   =         20
                                                  LR chi2(1)      =       9.12
                                                  Prob > chi2     =     0.0025
Log likelihood = -9.3028914                       Pseudo R2       =     0.3289

------------------------------------------------------------------------------
   admit | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     apt |   2.574129   1.088527      2.236   0.025       1.123779      5.8963
------------------------------------------------------------------------------

Example 3: Categorical & Continuous Independent Variables

logit admit i.gender apt

Iteration 0:   log likelihood = -13.862944
Iteration 1:   log likelihood = -9.3188454
Iteration 2:   log likelihood = -9.2822992
Iteration 3:   log likelihood = -9.2820991
Iteration 4:   log likelihood = -9.2820991

Logistic regression                               Number of obs   =         20
                                                  LR chi2(2)      =       9.16
                                                  Prob > chi2     =     0.0102
Log likelihood = -9.2820991                       Pseudo R2       =     0.3304

------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.gender |   .2671938   1.300911     0.21   0.837    -2.282545    2.816932
         apt |   .8982803   .4713918     1.91   0.057    -.0256307    1.822191
       _cons |  -4.028764   1.838393    -2.19   0.028    -7.631949   -.4255801
------------------------------------------------------------------------------

logit, or

Logistic regression                               Number of obs   =         20
                                                  LR chi2(2)      =       9.16
                                                  Prob > chi2     =     0.0102
Log likelihood = -9.2820991                       Pseudo R2       =     0.3304

------------------------------------------------------------------------------
       admit | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.gender |   1.306294   1.699372     0.21   0.837     .1020242    16.72547
         apt |   2.455377   1.157445     1.91   0.057      .974695    6.185398
------------------------------------------------------------------------------
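Example 1 has a single binary predictor, so its estimates can be reproduced by hand from a 2 by 2 table. In the `input` listing, 7 of the 10 gender = 1 cases and 3 of the 10 gender = 0 cases have admit = 1. The Python sketch below rebuilds the odds ratio, both coefficients, the standard error of the gender coefficient, and the two log likelihoods from those counts. It then applies the LR chi2 and McFadden pseudo R2 formulas (Stata's "Pseudo R2") to the log likelihoods printed for Examples 2 and 3. The function names are hypothetical.

```python
import math

# Counts tallied from the 20 cases in the example dataset.
a, b = 7, 3   # gender = 1: admitted, not admitted
c, d = 3, 7   # gender = 0: admitted, not admitted

odds_g1 = a / b                 # odds of admission when gender = 1
odds_g0 = c / d                 # odds of admission when gender = 0
odds_ratio = odds_g1 / odds_g0  # equivalently a*d / (b*c)
coef = math.log(odds_ratio)     # the 1.gender coefficient
cons = math.log(odds_g0)        # _cons: log odds for the gender = 0 group
se_coef = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of a log odds ratio

def bernoulli_ll(successes: int, failures: int) -> float:
    """Log likelihood of a group fitted at its observed proportion."""
    p = successes / (successes + failures)
    return successes * math.log(p) + failures * math.log(1 - p)

ll_null = bernoulli_ll(a + c, b + d)                # intercept-only model
ll_model = bernoulli_ll(a, b) + bernoulli_ll(c, d)  # model with gender

def lr_chi2(ll0: float, ll1: float) -> float:
    """Likelihood-ratio chi-square against the intercept-only model."""
    return 2.0 * (ll1 - ll0)

def pseudo_r2(ll0: float, ll1: float) -> float:
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept only)."""
    return 1.0 - ll1 / ll0

print(f"OR = {odds_ratio:.6f}, coef = {coef:.6f}, _cons = {cons:.6f}, SE = {se_coef:.6f}")
print(f"LL0 = {ll_null:.6f}, LL = {ll_model:.6f}, "
      f"LR chi2 = {lr_chi2(ll_null, ll_model):.2f}, "
      f"Pseudo R2 = {pseudo_r2(ll_null, ll_model):.4f}")

# Apply the same formulas to the log likelihoods printed for Examples 2 and 3.
for label, ll1 in [("Example 2", -9.3028914), ("Example 3", -9.2820991)]:
    print(label, round(lr_chi2(-13.862944, ll1), 2), round(pseudo_r2(-13.862944, ll1), 4))
```

Here the model log likelihood equals the sum of per-group terms. With one binary predictor plus a constant, maximum likelihood fits each group's observed admission rate exactly.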
Example 4: Honors Composition using HSB Dataset
use http://www.philender.com/courses/data/hsbdemo, clear

tabulate honors

    honcomp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        147       73.50       73.50
          1 |         53       26.50      100.00
------------+-----------------------------------
      Total |        200      100.00

logit honors female i.ses read math

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -75.969526
Iteration 2:   log likelihood = -72.051616
Iteration 3:   log likelihood = -71.994777
Iteration 4:   log likelihood = -71.994756
Iteration 5:   log likelihood = -71.994756

Logistic regression                               Number of obs   =        200
                                                  LR chi2(5)      =      87.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -71.994756                       Pseudo R2       =     0.3774

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.145726   .4513589     2.54   0.011     .2610792    2.030374
             |
         ses |
          2  |  -1.040402   .5791511    -1.80   0.072    -2.175517     .094713
          3  |   .0541296   .5945439     0.09   0.927    -1.111155    1.219414
             |
        read |   .0687277   .0287044     2.39   0.017     .0124681    .1249873
        math |   .1358904   .0336875     4.03   0.000     .0698642    .2019166
       _cons |  -12.55332   1.838493    -6.83   0.000     -16.1567   -8.949939
------------------------------------------------------------------------------

testparm i.ses

 ( 1)  [honors]2.ses = 0
 ( 2)  [honors]3.ses = 0

           chi2(  2) =    6.13
         Prob > chi2 =    0.0466

fitstat /* available from J. Scott Long via the Internet */

Measures of Fit for logit of honors

Log-Lik Intercept Only:       -115.644   Log-Lik Full Model:            -71.995
D(193):                        143.990   LR(5):                          87.299
                                         Prob > LR:                       0.000
McFadden's R2:                   0.377   McFadden's Adj R2:               0.317
ML (Cox-Snell) R2:               0.354   Cragg-Uhler(Nagelkerke) R2:      0.516
McKelvey & Zavoina's R2:         0.549   Efron's R2:                      0.404
Variance of y*:                  7.296   Variance of error:               3.290
Count R2:                        0.830   Adj Count R2:                    0.358
AIC:                             0.790   AIC*n:                         157.990
BIC:                          -878.586   BIC':                          -60.808
BIC used by Stata:             175.779   AIC used by Stata:             155.990

lfit

Logistic model for honors, goodness-of-fit test

       number of observations =       200
 number of covariate patterns =       189
            Pearson chi2(183) =       166.48
                  Prob > chi2 =         0.8040

lfit, group(10)

Logistic model for honors, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)

       number of observations =       200
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =        12.91
                  Prob > chi2 =         0.1151

lstat

Logistic model for honors

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        31            12  |         43
     -     |        22           135  |        157
-----------+--------------------------+-----------
   Total   |        53           147  |        200

Classified + if predicted Pr(D) >= .5
True D defined as honors != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   58.49%
Specificity                     Pr( -|~D)   91.84%
Positive predictive value       Pr( D| +)   72.09%
Negative predictive value       Pr(~D| -)   85.99%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    8.16%
False - rate for true D         Pr( -| D)   41.51%
False + rate for classified +   Pr(~D| +)   27.91%
False - rate for classified -   Pr( D| -)   14.01%
--------------------------------------------------
Correctly classified                        83.00%
--------------------------------------------------
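Every percentage in the lstat table follows from its four cell counts. The Python sketch below recomputes them from counts copied from that table. It also recomputes fitstat's Count R2 and Adj Count R2, assuming Long's definition of the adjusted version. That definition credits the model only for correct classifications beyond those gained by always predicting the more common outcome (here honors = 0, with 147 cases). The variable names are our own.

```python
# Cell counts copied from the lstat classification table
# (a case is classified + when its predicted Pr(D) >= .5).
TP, FP = 31, 12    # classified +: true D, true ~D
FN, TN = 22, 135   # classified -: true D, true ~D
N = TP + FP + FN + TN

sensitivity = TP / (TP + FN)   # Pr( +| D)
specificity = TN / (TN + FP)   # Pr( -|~D)
ppv = TP / (TP + FP)           # Pr( D| +)
npv = TN / (TN + FN)           # Pr(~D| -)
correct = (TP + TN) / N        # correctly classified, same as fitstat's Count R2

# Adjusted count R2: improvement over always predicting the larger category.
n_max = max(TP + FN, FP + TN)
adj_count_r2 = (TP + TN - n_max) / (N - n_max)

for label, value in [("Sensitivity", sensitivity),
                     ("Specificity", specificity),
                     ("Positive predictive value", ppv),
                     ("Negative predictive value", npv),
                     ("Correctly classified", correct)]:
    print(f"{label:<28}{100 * value:6.2f}%")
print(f"Adj Count R2: {adj_count_r2:.3f}")
```

The adjusted measure is much lower than the raw 83% because always predicting honors = 0 would already classify 73.5% of the cases correctly.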
Linear Statistical Models Course
Phil Ender, 17sep10, 20dec00