As we discussed in the previous unit, probit analysis is based on the cumulative normal probability distribution. The coefficients of the probit model are effects on the z-score scale, that is, on the inverse cumulative normal transform of the probability that the response variable equals one. Here is a table of some z-scores and their associated probabilities:
    Z-score     Prob
     -2.0      .0228
     -1.0      .1587
     -0.5      .3085
      0.0      .5000
      0.5      .6915
      1.0      .8413
      2.0      .9772

Consider an intercept only model using the honors dataset which we encountered earlier.
    use http://www.gseis.ucla.edu/courses/data/honors, clear

    probit honors, nolog

    Probit estimates                                  Number of obs   =        200
                                                      LR chi2(0)      =      -0.00
                                                      Prob > chi2     =          .
    Log likelihood = -115.64441                       Pseudo R2       =    -0.0000

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   -.628006   .0952758    -6.59   0.000    -.8147431   -.4412689
    ------------------------------------------------------------------------------

    predict p0

The constant can be interpreted as a predicted z-score of -.628006. We could look this z-score up in a table, or we could use Stata's norm function (called normal in current versions of Stata) to find the probability associated with it. We can also find the empirical probability of being in honors using the summarize command, and then compare both of these to the predicted probability from the predict command.
    display norm(-.628006)
    .265

    summarize p0

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
              p0 |       200        .265           0       .265       .265

    tablist p0

      +-------------+
      |   p0   Freq |
      |-------------|
      | .265    200 |
      +-------------+

Next, we will add female to the model.
    probit honors female, nolog

    Probit estimates                                  Number of obs   =        200
                                                      LR chi2(1)      =       3.94
                                                      Prob > chi2     =     0.0473
    Log likelihood = -113.6769                        Pseudo R2       =     0.0170

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          female |   .3848753   .1952923     1.97   0.049     .0021095     .767641
           _cons |  -.8494977   .1501507    -5.66   0.000    -1.143788   -.5552078
    ------------------------------------------------------------------------------

    predict xb, xb
    tablist female xb

      +---------------------------+
      | female          xb   Freq |
      |---------------------------|
      | female   -.4646225    109 |
      |   male   -.8494977     91 |
      +---------------------------+

    predict p1
    tablist female p1

      +--------------------------+
      | female         p1   Freq |
      |--------------------------|
      | female   .3211009    109 |
      |   male   .1978022     91 |
      +--------------------------+

Thus, males would have the probability associated with a predicted z-score of -.8494977, and females would have a z-score .3848753 higher; that is, being female increases the predicted z-score by .3848753 (the coefficient for female), which yields a predicted probability of .3211009.
    display norm(-.8494977)
    .1978022

    display -.8494977+.3848753
    -.4646224

    display norm(-.4646224)
    .32110094

Finally, we will center math on 50 and use it as an interval predictor in the model.
    generate math50 = math - 50

    probit honors math50, nolog

    Probit estimates                                  Number of obs   =        200
                                                      LR chi2(1)      =      64.91
                                                      Prob > chi2     =     0.0000
    Log likelihood = -83.191478                       Pseudo R2       =     0.2806

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          math50 |   .0969714   .0138825     6.99   0.000     .0697622    .1241806
           _cons |  -1.107924   .1399694    -7.92   0.000    -1.382258   -.8335886
    ------------------------------------------------------------------------------

    predict p2
    sort math
    list math p2 if math==50 | math==51

         +-----------------+
         | math         p2 |
         |-----------------|
     81. |   50   .1339474 |
     82. |   50   .1339474 |
     83. |   50   .1339474 |
     84. |   50   .1339474 |
     85. |   50   .1339474 |
         |-----------------|
     86. |   50   .1339474 |
     87. |   50   .1339474 |
     88. |   51   .1560197 |
     89. |   51   .1560197 |
     90. |   51   .1560197 |
         |-----------------|
     91. |   51   .1560197 |
     92. |   51   .1560197 |
     93. |   51   .1560197 |
     94. |   51   .1560197 |
     95. |   51   .1560197 |
         +-----------------+

Now the constant is the predicted z-score when math equals 50 and the coefficient tells us how much the z-score will increase for each one-unit increase in the math score. Thus, a math score of 51 yields a predicted z-score of -1.0109526.
    display norm(-1.107924)
    .13394732

    display -1.107924+.0969714
    -1.0109526

    display norm(-1.0109526)
    .15601956

We can verify that these same predicted probabilities are found when using math untransformed.
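Before looking at the output, it helps to see why the probabilities should agree. Because math50 = math - 50, the centered linear predictor _cons + b*(math - 50) can be rewritten as (_cons - 50*b) + b*math, so centering leaves the slope alone and only shifts the constant. The short Python sketch below is an outside arithmetic check, not part of the Stata session; the variable and function names are illustrative, and the estimates are simply copied from the centered model above. It implies an uncentered constant of roughly -5.9565, which can be compared with the constant Stata reports in the output that follows.

```python
from statistics import NormalDist

# Estimates copied from the centered model (probit honors math50).
b_math = 0.0969714          # slope, unchanged by centering
cons_centered = -1.107924   # predicted z-score at math = 50

# Centering only shifts the constant:
#   cons_centered + b*(math - 50) = (cons_centered - 50*b) + b*math
cons_uncentered = cons_centered - 50 * b_math


def prob_centered(math):
    """Predicted Pr(honors = 1) using the centered parameterization."""
    return NormalDist().cdf(cons_centered + b_math * (math - 50))


def prob_uncentered(math):
    """Predicted Pr(honors = 1) using the implied uncentered constant."""
    return NormalDist().cdf(cons_uncentered + b_math * math)


print("implied uncentered constant:", round(cons_uncentered, 6))
for m in (50, 51):
    print(m, round(prob_centered(m), 7), round(prob_uncentered(m), 7))
```

Both parameterizations give the same probability at every math score, so any small difference from Stata's reported constant is rounding in the displayed coefficients.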
    probit honors math, nolog

    Probit estimates                                  Number of obs   =        200
                                                      LR chi2(1)      =      64.91
                                                      Prob > chi2     =     0.0000
    Log likelihood = -83.191478                       Pseudo R2       =     0.2806

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            math |   .0969714   .0138825     6.99   0.000     .0697622    .1241806
           _cons |  -5.956492    .787668    -7.56   0.000    -7.500293   -4.412692
    ------------------------------------------------------------------------------

    prvalue, x(math=50)    /* from Long and Freese */

    probit: Predictions for honors

      Pr(y=1|x):          0.1339   95% ci: (0.0834,0.2023)
      Pr(y=0|x):          0.8661   95% ci: (0.7977,0.9166)

             math
    x=         50

    prvalue, x(math=51)

    probit: Predictions for honors

      Pr(y=1|x):          0.1560   95% ci: (0.1021,0.2259)
      Pr(y=0|x):          0.8440   95% ci: (0.7741,0.8979)

             math
    x=         51

Note that although it is possible to interpret the probit coefficients as changes in z-scores, we end up converting the z-scores to probabilities. So, in the end, it's probably better to focus on the probabilities and/or the changes in probability when interpreting your probit model.
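One reason to focus on probabilities is that a fixed change in the z-score does not translate into a fixed change in probability. The Python sketch below is only an illustrative cross-check under stated assumptions: it reuses the coefficients reported for the centered math model, the math scores 40, 60 and 70 are arbitrary illustration points, and the function names are hypothetical. It computes the change in predicted probability for a one-point increase in math at several starting scores.

```python
from statistics import NormalDist

# Estimates copied from the centered model (probit honors math50).
CONS = -1.107924      # predicted z-score at math = 50
B_MATH = 0.0969714    # change in z-score per one-point increase in math


def prob_honors(math):
    """Predicted Pr(honors = 1) at a given math score."""
    z = CONS + B_MATH * (math - 50)
    return NormalDist().cdf(z)


def change_per_point(math):
    """Change in predicted probability when math rises by one point."""
    return prob_honors(math + 1) - prob_honors(math)


# Illustration points chosen arbitrarily; only 50 and 51 appear in the notes.
for m in (40, 50, 60, 70):
    print(f"math {m}: Pr = {prob_honors(m):.4f}, "
          f"change for +1 = {change_per_point(m):.4f}")
```

The z-score rises by .0969714 in every row, but the change in probability is smallest at low math scores and largest where the predicted z-score is near zero, because the normal curve is steepest there. This is why a single probit coefficient cannot be read as a single change in probability.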
Categorical Data Analysis Course
Phil Ender