Education 231C

Applied Categorical & Nonnormal Data Analysis

Probit Regression Models


An alternative to logistic regression analysis is probit analysis. The term "probit" was coined in the 1930s by Chester Bliss and stands for probability unit. These two analyses, logit and probit, are very similar to one another. As discussed in the previous unit, logit analysis is based on log odds, while probit analysis uses the cumulative normal probability distribution. Here is what a cumulative normal distribution looks like.
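The shape of the curve can also be checked numerically. The following is a minimal sketch in Python, assuming only the standard library; the helper name normal_cdf is introduced here for illustration and is not part of the course materials.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def normal_cdf(z):
    """Cumulative standard normal probability, Phi(z)."""
    return std_normal.cdf(z)

# Tabulate Phi(z) on a coarse grid: the values rise from near 0 to near 1,
# passing through 0.5 at z = 0, which traces out the S-shaped curve.
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"z = {z:+d}   Phi(z) = {normal_cdf(z):.4f}")
```

The values climb slowly in the tails and quickly near zero, which is what gives the curve its S shape.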

Notice the S-shaped curve that runs from zero to one. It is very similar to the graph of the logistic (inverse logit) function. The two procedures are so similar that they can easily be confused with one another. The bottom line is that logistic regression and probit analysis produce predicted probabilities that are very similar. An example of predicted probabilities for logit and probit is given below.

The probit model is defined as

    Pr(y = 1 | x) = Φ(xb)

where Φ is the standard normal cumulative distribution function and xb is called the probit score or index.
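To see how close the two sets of predicted probabilities can be, here is a small Python sketch. It is an illustration under stated assumptions, not output from the course datasets: the index values are arbitrary, and the factor 1.7 used to put the logit index on the probit scale is a common rule of thumb rather than an exact constant.

```python
import math
from statistics import NormalDist

def probit_prob(xb):
    """Pr(y = 1) under the probit model: Phi(xb)."""
    return NormalDist().cdf(xb)

def logit_prob(xb):
    """Pr(y = 1) under the logit model: 1 / (1 + exp(-xb))."""
    return 1.0 / (1.0 + math.exp(-xb))

# Assumed rescaling factor: logit coefficients tend to be roughly
# 1.6 to 1.8 times the size of the corresponding probit coefficients.
SCALE = 1.7

# Arbitrary probit index values chosen for illustration.
for xb in (-2.0, -1.0, 0.0, 1.0, 2.0):
    p_probit = probit_prob(xb)
    p_logit = logit_prob(SCALE * xb)
    print(f"xb = {xb:+.1f}   probit = {p_probit:.4f}   logit = {p_logit:.4f}")
```

After rescaling, the two columns differ only in the second or third decimal place, which is why the choice between the models rarely changes the substantive conclusions.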

Since xb is expressed in the metric of the standard normal distribution, interpreting probit coefficients requires thinking in the Z (normal quantile) metric. The interpretation of a probit coefficient, b, is that a one-unit increase in the predictor increases the probit score by b standard deviations. Learning to think and communicate in the Z metric takes practice, and results stated in this metric can be confusing to others. We will make use of a number of tools developed by Long and Freese to aid in the interpretation of the results.
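One reason the Z metric takes practice is that the same shift in the probit score translates into different changes in probability depending on where the shift starts. The Python sketch below illustrates this; the coefficient b = 0.5 and the starting index values are hypothetical, and the helper names are introduced only for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def prob_change(xb_start, b):
    """Change in Pr(y = 1) when the probit index rises from xb_start by b."""
    return std_normal.cdf(xb_start + b) - std_normal.cdf(xb_start)

def probit(p):
    """Probit score (normal quantile) corresponding to probability p."""
    return std_normal.inv_cdf(p)

b = 0.5  # hypothetical coefficient: a one-unit increase adds 0.5 to the index

# The same 0.5 shift in the Z metric moves the probability a lot near
# xb = 0 and very little out in the tails.
for xb_start in (-2.0, 0.0, 2.0):
    print(f"start xb = {xb_start:+.1f}   change in Pr(y = 1) = {prob_change(xb_start, b):.4f}")

# Going the other way: the probit score is the normal quantile of a probability.
print(f"probit(0.975) = {probit(0.975):.3f}")
```

This is why the coefficient itself is reported in standard deviations of the index, while statements about probabilities have to be made at specific values of the predictors.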

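The log-likelihood function for probit is

    ln L = Σ_j w_j [ y_j ln Φ(x_j b) + (1 − y_j) ln(1 − Φ(x_j b)) ]

where the sum runs over the observations j, y_j is the 0/1 response, x_j b is the probit index for observation j, and w_j denotes optional weights.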
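As a sketch of how this quantity is computed, the Python function below adds up, for each observation, the weighted log of Φ(xb) when the response is 1 and the weighted log of 1 − Φ(xb) when the response is 0. The function name, the toy data, and the trial coefficient values are all hypothetical; this illustrates the formula and is not the estimation routine of any particular package.

```python
import math
from statistics import NormalDist

def probit_loglik(b, X, y, w=None):
    """Weighted probit log-likelihood.

    b : coefficients, one per column of X
    X : rows of predictor values (include a leading 1.0 for the constant)
    y : 0/1 responses
    w : optional weights (defaults to 1.0 for every observation)
    """
    phi = NormalDist().cdf
    if w is None:
        w = [1.0] * len(y)
    total = 0.0
    for x_j, y_j, w_j in zip(X, y, w):
        xb = sum(b_k * x_k for b_k, x_k in zip(b, x_j))  # probit index
        p = phi(xb)                                      # Pr(y = 1)
        total += w_j * (math.log(p) if y_j else math.log(1.0 - p))
    return total

# Hypothetical toy data: a constant and one predictor.
X = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 1]

print(probit_loglik((0.0, 0.0), X, y))    # all probabilities equal 0.5
print(probit_loglik((-0.5, 1.0), X, y))   # a trial value that fits better
```

Maximum likelihood estimation searches for the coefficients that make this value as large as possible; the second trial vector gives a higher (less negative) log-likelihood than the all-zero vector.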

Currently, logit models are more popular than probit models for two reasons: 1) the exponentiated logistic coefficients can be interpreted as odds ratios, and 2) there are more diagnostic tools available for logistic regression. The second reason may be a chicken-and-egg issue, however: there might be more diagnostic tools because logistic regression is used more often.

We will demonstrate probit analysis using the same datasets that were used in the logistic regression analysis unit.

Example 1

A note on the interpretation of the probit coefficients: the coefficient for math is .07 to two decimal places. This indicates that a one-unit increase in the math score results in a .07 standard deviation increase in the predicted probit index. The coefficient for female means that the change from 0 to 1 increases the predicted probit index by .77 standard deviations.
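To connect these index changes to probabilities, the Python sketch below uses the two rounded coefficients quoted above. The intercept is not given in the text, so the value used here is purely hypothetical, and the resulting probabilities are for illustration only, not results from the example dataset.

```python
from statistics import NormalDist

B_MATH = 0.07     # coefficient for math, rounded to two decimals (from the text)
B_FEMALE = 0.77   # coefficient for female, rounded (from the text)
B_CONS = -4.0     # HYPOTHETICAL intercept; the text does not report it

def probit_index(math_score, female):
    """Predicted probit index xb for a given math score and sex (0/1)."""
    return B_CONS + B_MATH * math_score + B_FEMALE * female

def predicted_prob(math_score, female):
    """Predicted probability Phi(xb)."""
    return NormalDist().cdf(probit_index(math_score, female))

# A one-unit increase in math raises the index by 0.07;
# switching female from 0 to 1 raises it by 0.77.
print(probit_index(51, 0) - probit_index(50, 0))
print(probit_index(50, 1) - probit_index(50, 0))

# The corresponding probabilities depend on the assumed intercept.
print(predicted_prob(50, 0), predicted_prob(50, 1))
```

The index differences do not depend on the intercept, but the predicted probabilities do, which is why probabilities should be computed from the full set of estimated coefficients.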

Example 2

Example 3

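Example 3 involves the use of blocked data, i.e., each observation consists of the number of occurrences of the outcome (positive responses) and the number of observations in the population. The syntax for bprobit looks like this, where pos_var holds the number of positive responses and pop_var the population size:

    bprobit pos_var pop_var [indepvars] [if] [in] [, options]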
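Independently of the Stata syntax, the idea behind blocked data can be sketched in Python. Each block contributes its positive responses times the log of Φ(xb) plus its remaining observations times the log of 1 − Φ(xb), which equals the log-likelihood of the same data expanded to one record per individual (the binomial coefficient, which does not depend on the coefficients, is omitted). The data, coefficient values, and function names below are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def blocked_probit_loglik(b0, b1, blocks):
    """Log-likelihood for blocked data: rows of (x, positives, population)."""
    total = 0.0
    for x, positives, population in blocks:
        p = PHI(b0 + b1 * x)
        total += positives * math.log(p)
        total += (population - positives) * math.log(1.0 - p)
    return total

def expand_blocks(blocks):
    """Expand each block into individual (x, y) records with y in {0, 1}."""
    records = []
    for x, positives, population in blocks:
        records += [(x, 1)] * positives
        records += [(x, 0)] * (population - positives)
    return records

def individual_probit_loglik(b0, b1, records):
    """Log-likelihood for one-record-per-individual data."""
    total = 0.0
    for x, y in records:
        p = PHI(b0 + b1 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Hypothetical blocked data: (predictor value, positives, population).
blocks = [(1.0, 2, 10), (2.0, 5, 10), (3.0, 9, 10)]
b0, b1 = -2.0, 1.0  # hypothetical trial coefficients

print(blocked_probit_loglik(b0, b1, blocks))
print(individual_probit_loglik(b0, b1, expand_blocks(blocks)))
```

The two printed values agree, which shows that blocked data carry the same information about the coefficients as the individual-level records, in a more compact form.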



Phil Ender