Education 231C

Applied Categorical & Nonnormal Data Analysis

Poisson Models


Interest in the analysis of count data, while not new, has grown tremendously in the last 20 years. Along with this growth there have been numerous improvements in the methods and software for analyzing these types of data. In this section we will cover Poisson models and negative binomial models for analyzing count data.

Poisson Models

Poisson probabilities are used to model the number of occurrences (counts) of an event. One of the early recorded uses of the Poisson distribution was the 1898 study investigating the number of Prussian soldiers who were kicked to death by horses.

Here is the Poisson distribution function,

$$P(Y = y \mid \lambda) = \frac{e^{-\lambda}\,\lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$

with the single parameter λ. A Poisson distribution has a mean equal to λ and a variance equal to λ. Table 1 shows Poisson probabilities for λ = 1, 3 and 5. Table 1 is followed by the graph of the probabilities for the three lambdas.
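The probabilities for the three values of λ can be recomputed directly from the Poisson distribution function. The following is a minimal sketch in Python; the function name `poisson_pmf` and the cut-off at a count of 10 are choices made here for illustration and are not part of the original notes.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

lambdas = [1, 3, 5]      # the three values of lambda used in the text
counts = range(0, 11)    # stopping at 10 is an arbitrary choice

# Header row, then one row of probabilities per count.
print("y   " + "".join(f"lambda={lam:<5}" for lam in lambdas))
for y in counts:
    row = "".join(f"{poisson_pmf(y, lam):<12.4f}" for lam in lambdas)
    print(f"{y:<4}{row}")
```

Reading down a column shows the peak of the distribution moving to higher counts as λ grows.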

As lambda increases, the distribution shifts to the right. For large values of lambda the distribution is approximately normal.

Distributions in which the mean equals the variance exhibit equidispersion. When the variance is greater than the mean there is overdispersion. In practice, it is rare to find distributions with equidispersion.
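A quick informal check for overdispersion is to compare the sample variance of a count variable with its sample mean. The sketch below uses made-up counts, not the course data, and the helper name `dispersion_ratio` is hypothetical; a ratio well above 1 points toward overdispersion.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean (about 1 under equidispersion)."""
    return variance(counts) / mean(counts)

# Hypothetical counts, made up for illustration only.
counts = [0, 0, 1, 1, 2, 2, 3, 5, 9, 17]

ratio = dispersion_ratio(counts)
print(f"mean = {mean(counts):.2f}, variance = {variance(counts):.2f}, ratio = {ratio:.2f}")
```

This compares only the marginal mean and variance. In a regression setting the relevant comparison is conditional on the predictors, so the ratio is a screening device rather than a formal test.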

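The Poisson regression model can be estimated using maximum likelihood, with the following likelihood function and log-likelihood function, where $\mu_i = \exp(\mathbf{x}_i\boldsymbol{\beta})$ is the expected count for observation $i$.

$$L(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}) = \prod_{i=1}^{N} \frac{\exp(-\mu_i)\,\mu_i^{y_i}}{y_i!}$$

$$\ln L(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}) = \sum_{i=1}^{N} \left[\, y_i\,\mathbf{x}_i\boldsymbol{\beta} - \exp(\mathbf{x}_i\boldsymbol{\beta}) - \ln(y_i!) \,\right]$$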
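To make the estimation step concrete, here is a minimal sketch of maximum-likelihood fitting for a Poisson regression with an intercept and a single predictor, assuming the usual log link (expected count = exp(b0 + b1*x)). It uses Newton-Raphson written out by hand for the two-parameter case. The data are made up, the function names are hypothetical, and this is not the algorithm of any particular statistics package.

```python
import math

def poisson_loglik(beta, xs, ys):
    """ln L = sum_i [ y_i*eta_i - exp(eta_i) - ln(y_i!) ], with eta_i = b0 + b1*x_i."""
    b0, b1 = beta
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        total += y * eta - math.exp(eta) - math.lgamma(y + 1)  # lgamma(y+1) = ln(y!)
    return total

def fit_poisson(xs, ys, iterations=25):
    """Newton-Raphson for a Poisson regression with an intercept and one predictor."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        # Score vector (g0, g1) and negative Hessian [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data, made up for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [9, 7, 8, 5, 4, 4, 2, 1]

beta_hat = fit_poisson(xs, ys)
print("estimates (b0, b1):", beta_hat)
print("log-likelihood at the estimates:", poisson_loglik(beta_hat, xs, ys))
```

The fixed number of iterations keeps the sketch short. A production routine would add a convergence test and step-halving, and would handle any number of predictors.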

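In the Poisson regression model, the incidence rate or predicted count is given by

$$\mu = E(y \mid \mathbf{x}) = \exp(\mathbf{x}\boldsymbol{\beta}).$$

The incidence rate ratio is used to compare incidence rates. The incidence rate ratio for a one-unit change in the predictor $x_k$, with all of the other variables in the model held constant, is

$$\frac{E(y \mid \mathbf{x},\, x_k + 1)}{E(y \mid \mathbf{x},\, x_k)} = \exp(\beta_k).$$

That is, the incidence rate ratio is the expected count at $x_k + 1$ divided by the expected count at $x_k$.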
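The identity between the incidence rate ratio and the exponentiated coefficient can be checked numerically. The sketch below assumes the log link and uses hypothetical coefficients chosen only for illustration; they are not estimates from any data set.

```python
import math

def expected_count(beta, x):
    """Predicted count exp(b0 + b1*x1 + b2*x2 + ...) under the log link."""
    b0, *slopes = beta
    return math.exp(b0 + sum(b * v for b, v in zip(slopes, x)))

# Hypothetical coefficients: intercept, slope for x1, slope for x2.
beta = [2.0, -0.4, -0.02]

x = [0.0, 50.0]            # baseline values of x1 and x2
x_plus_one = [1.0, 50.0]   # x1 increased by one unit, x2 held constant

ratio = expected_count(beta, x_plus_one) / expected_count(beta, x)
irr = math.exp(beta[1])            # exponentiated coefficient of x1
percent_change = 100 * (irr - 1)   # percent change in the expected count

print(f"ratio of expected counts = {ratio:.4f}")
print(f"exp(b1)                  = {irr:.4f}")
print(f"percent change           = {percent_change:.1f}%")
```

The ratio does not depend on the value at which the other predictor is held, which is why a single number summarizes the effect of each variable.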

Poisson Regression Example

We will illustrate Poisson regression using the lahigh data set. In particular, we would like to know whether there is a gender difference in days absent and what the relation is between language NCE (normal curve equivalent) test scores and days absent. Note that for gender, 0 is female and 1 is male. Here is a histogram of days absent.

Interpretation

From the incidence rate ratios, being male decreases the expected number of days absent by a factor of about .66, or equivalently, it changes the expected number by 100*(IRR-1)%, a decrease of roughly 33%, when the other variables are held constant. And, for each one-point increase in the language normal curve equivalent score, the expected number of days absent decreases by a factor of .98 (or 100*(.98-1)% = -2%) when the other variables are held constant.

The listcoef command also reports the standardized factor change. For a one standard deviation increase (approximately 18 points) in the language NCE score, the expected number of days absent would decrease by a factor of .77 (100*(.77-1)% = -23%) with the other variables in the model held constant.
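The standardized factor change follows from the same logic as the incidence rate ratio, with a one standard deviation step in place of a one-unit step. In the notation assumed here (not taken from the listcoef output), $\beta_k$ is the coefficient of predictor $x_k$ and $s_k$ is its sample standard deviation:

```latex
\frac{E(y \mid \mathbf{x},\, x_k + s_k)}{E(y \mid \mathbf{x},\, x_k)} = \exp(\beta_k s_k),
\qquad
\text{percent change} = 100\,\bigl[\exp(\beta_k s_k) - 1\bigr]\%
```

Because the coefficient enters through the exponent, the standardized factor is the one-unit factor raised to the power $s_k$, so it should be computed from the unrounded coefficient.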

Another way of interpreting the model is to look at the marginal effects, also known as the partial change in the expected value.
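For a continuous predictor $x_k$, and assuming the log link $E(y \mid \mathbf{x}) = \exp(\mathbf{x}\boldsymbol{\beta})$, the marginal effect can be sketched as the derivative of the expected count:

```latex
\frac{\partial E(y \mid \mathbf{x})}{\partial x_k}
  = \beta_k \exp(\mathbf{x}\boldsymbol{\beta})
  = \beta_k \, E(y \mid \mathbf{x})
```

Unlike the incidence rate ratio, this quantity depends on the values of all of the predictors, so it is usually reported at specific values such as the sample means.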

Finally, we will look at the Poisson goodness of fit. We should have looked at it earlier, before trying to interpret the model, but we needed to take some time to discuss how one goes about interpreting a Poisson model. The large chi-square suggests that the Poisson regression model does not fit very well. This could be either because the explanatory variables are not very good or because the Poisson model is not appropriate. We saw earlier that the variance of daysabs was much greater than the mean, which suggests that there is overdispersion.

We will use nbvargr to compare the fit of the Poisson and negative binomial models. As we suspected, the Poisson model did not do a good job of approximating daysabs. Overdispersion is very common in "real" data; the Poisson distribution, which works well in theory, does not perform all that well in practice. The negative binomial model looks to be a much better fit.
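The chi-square statistics used in Poisson goodness-of-fit checks can be computed from the observed counts and the fitted means. The sketch below shows the Pearson statistic and the deviance for made-up values; it is not the output of any Stata command, and the function names and the number of parameters are assumptions. Under a well-fitting Poisson model, either statistic should be close to its degrees of freedom.

```python
import math

def pearson_chi2(observed, fitted):
    """Pearson statistic: sum of (y - mu)^2 / mu."""
    return sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))

def deviance(observed, fitted):
    """Poisson deviance: 2 * sum[ y*ln(y/mu) - (y - mu) ], with 0*ln(0) taken as 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2 * total

# Hypothetical observed counts and fitted means, made up for illustration only.
observed = [0, 2, 1, 7, 0, 12, 3, 0]
fitted = [2.5, 3.0, 2.0, 4.0, 3.5, 4.5, 3.0, 2.5]

n_params = 2                    # assumed number of estimated coefficients
df = len(observed) - n_params   # residual degrees of freedom

print(f"Pearson chi-square = {pearson_chi2(observed, fitted):.2f} on {df} df")
print(f"deviance           = {deviance(observed, fitted):.2f} on {df} df")
```

A p-value would require the chi-square distribution function, which is not in the Python standard library, so the sketch stops at comparing each statistic with its degrees of freedom.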


Categorical Data Analysis Course

Phil Ender