ED231C

Applied Categorical & Nonnormal Data Analysis

Syllabus


Analysis with dichotomous, ordinal and multinomial (polytomous) dependent variables. Topics include contingency table analysis, logistic (logit) models, probit models, Poisson models, negative binomial models, loglinear models, regression with censored data and regression with selection.

Prerequisites:

A course in linear models or equivalent knowledge

Textbooks

Long, J. S. (1997) Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: SAGE Publications, Inc.

Highly Recommended Readings

Long, J. S. & Freese, J. (2001) Regression Models for Categorical Dependent Variables using Stata. College Station, TX: Stata Press.

Recommended Readings

Agresti, A. (1996) An Introduction to Categorical Data Analysis. New York: Wiley.
Cleves, M.A., Gould, W.W. & Gutierrez, R.G. (2002) An introduction to survival analysis using Stata. College Station, TX: Stata Press.
Hardin, J. & Hilbe, J. (2001) Generalized Linear Models and Extensions. College Station, TX: Stata Press.
Hosmer, D. W. & Lemeshow, S. (2000) Applied Logistic Regression (2nd ed). New York: John Wiley & Sons, Inc.
Hosmer, D. W. & Lemeshow, S. (1999) Applied Survival Analysis. New York: John Wiley & Sons, Inc.
Singer, J. D. & Willett, J. B. (2003) Applied Longitudinal Data Analysis. Oxford University Press.
Stata Corp (2003) Stata Reference Manual. College Station, TX: Stata Press.
Stata Corp (1999) Stata Reference Manual Extract. College Station, TX: Stata Press.

Additional Readings

Agresti, A. (2001) Categorical Data Analysis (2nd ed). New York: Wiley.
Bishop, Y.M.M., Fienberg, S.E. & Holland, P.W. (1975) Discrete Multivariate Analysis. Cambridge, MA: MIT Press.
Cameron, A.C. & Trivedi, P.K. (1998) Regression analysis of count data. Cambridge University Press.
Fienberg, S.E. (1977) The analysis of cross-classified categorical data. Cambridge, MA: MIT Press.
Hardin, J. & Hilbe, J. (2002) Generalized Estimating Equations. Boca Raton, FL: Chapman & Hall/CRC.
Kleinbaum, D.G. (1994) Logistic Regression. New York: Springer.
McCullagh, P. & Nelder, J.A. (1989) Generalized linear models (2nd ed). London: Chapman & Hall.
Powers, D. & Xie, Y. (1999) Statistical methods for categorical data analysis. San Diego, CA: Academic Press.

Textbook Chapters

1. Introduction 
 1.1. Linear and Nonlinear Models
 1.2. Organization
 1.3. Orientation
 1.4. Bibliographic Notes

2. Continuous Outcomes: The Linear Regression Model 
 2.1. The Linear Regression Model
 2.2. Interpreting Regression Coefficients
 2.3. Estimation by Ordinary Least Squares
 2.4. Nonlinear Linear Regression Models
 2.5. Violations of the Assumptions
 2.6. Maximum Likelihood Estimation
 2.7. Conclusions
 2.8. Bibliographic Notes

3. Binary Outcomes: The Linear Probability, Probit, and Logit Models 
 3.1. The Linear Probability Model
 3.2. A Latent Variable Model for Binary Variables
 3.3. Identification
 3.4. A Nonlinear Probability Model
 3.5. ML Estimation
 3.6. Numerical Methods for ML Estimation
 3.7. Interpretation
 3.8. Interpretation Using Odds Ratios
 3.9. Conclusions
 3.10. Bibliographic Notes

4. Hypothesis Testing and Goodness of Fit 
 4.1. Hypothesis Testing
 4.2. Residuals and Influence
 4.3. Scalar Measures of Fit
 4.4. Conclusions
 4.5. Bibliographic Notes

5. Ordinal Outcomes: Ordered Logit and Ordered Probit Analysis 
 5.1. A Latent Variable Model for Ordinal Variables
 5.2. Identification
 5.3. Estimation
 5.4. Interpretation
 5.5. The Parallel Regression Assumption
 5.6. Related Models for Ordinal Data
 5.7. Conclusions
 5.8. Bibliographic Notes

6. Nominal Outcomes: Multinomial Logit and Related Models 
 6.1. Introduction to the Multinomial Logit Model
 6.2. The Multinomial Logit Model
 6.3. ML Estimation
 6.4. Computing and Testing Other Contrasts
 6.5. Two Useful Tests
 6.6. Interpretation
 6.7. The Conditional Logit Model
 6.8. Independence of Irrelevant Alternatives
 6.9. Related Models
 6.10. Conclusions
 6.11. Bibliographic Notes

7. Limited Outcomes: The Tobit Model 
 7.1. The Problem of Censoring
 7.2. Truncated and Censored Distributions
 7.3. The Tobit Model for Censored Outcomes
 7.4. Estimation
 7.5. Interpretation
 7.6. Extensions
 7.7. Conclusions
 7.8. Bibliographic Notes

8. Count Outcomes: Regression Models for Counts 
 8.1. The Poisson Distribution
 8.2. The Poisson Regression Model
 8.3. The Negative Binomial Regression Model
 8.4. Models for Truncated Counts
 8.5. Zero Modified Count Models
 8.6. Comparisons Among Count Models
 8.7. Conclusions
 8.8. Bibliographic Notes

9. Conclusions 
 9.1. Links Using Latent Variable Models
 9.2. The Generalized Linear Model
 9.3. Similarities Among Probability Models
 9.4. Event History Analysis
 9.5. Log-Linear Models
 
Long & Freese Chapters

 Part I  General Information

 1 Introduction
 2 Introduction to Stata 
 3 Estimation, Testing, Fit, and Interpretation 

 Part II Models for Specific Kinds of Outcomes 

 4 Models for Binary Outcomes 
 5 Models for Ordinal Outcomes 
 6 Models for Nominal Outcomes 
 7 Models for Count Outcomes 
 8 Additional Topics 

Homework and Tests:

There will be five computer assignments. Students are encouraged to work in small groups on the computer assignments.

There will be at least two quizzes during the quarter.

Internet Access:

Students will be required to have network access by one of the following means: 1) GSE&IS Computer Labs, 2) their own departmental computer labs with Internet access, or 3) their own personal computer at home with dial-up or broadband Internet access.

World Wide Web:

Students will need a web browser (Netscape or Internet Explorer) to access course information. All course material, including assignments, datasets, examples, helpsheets, computer printouts, class discussion forums, and lectures, is available on the World Wide Web at the following URL: www.philender.com/courses/categorical/categorical.html.

Lecture Notes:

Lecture notes are available on the Ed231C Web site. Even a short glance at the class notes will reveal that there are more units than can be covered in a ten-week course. Hopefully, the material that is not covered in class will prove useful enough that students will wish to review the material on their own.

Hypothetical Course Schedule

Week	Topic

1	Introduction
1	Contingency Tables
1	Review of OLS Models
2	Logistic (Logit) Models
2	Probit Models
3	Complementary Log-Log Models
3	Ordered Logit Models
4	Multinomial (Polytomous) Logit Models
4	Conditional Logit Models
4	Log-linear Regression
5	Poisson Models & Negative Binomial Models
5	Zero Inflated Poisson & Negative Binomial Models
6	Bivariate Probit Models
6	Generalized Linear Models
7	Generalized Estimating Equations
7	Regression Models with Censored Data or Truncated Data
8	Selection Models
8	Instrumental Variables Regression
9	Correspondence Analysis
9	Introduction to Survival Analysis
10	Introduction to Discrete Time Survival Analysis
10	Review and Consolidation


Phil Ender, Jun03