Applied Categorical & Nonnormal Data Analysis

Review of OLS Models


In OLS regression, we use linear combinations of predictor (independent) variables to compute expected values of the response (dependent) variable.

These expected values are conditional on the independent variables. The full model for OLS includes both the structural or systematic component, Σxβ, and a random component, ε.
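
Writing the model for a single observation i with k predictors makes the two components explicit (this is just an expanded form of the Σxβ + ε notation above):

\[
E(y_i \mid x_{i1}, \ldots, x_{ik}) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}
\]
\[
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i
\]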

The matrix formulation for OLS regression looks like this:

1)  b = (X'X)⁻¹X'y
2)  y = Xβ + ε
3)  ŷ = Xb
4)  e = y - ŷ
5)  ssresid = e'e

Equation 1) gives the formula for obtaining the least squares regression coefficients. Equation 2) is the regression equation in matrix form, while equation 3) is used to obtain the conditional expected (predicted) values. The residuals, 4), are the difference between the observed values and the predicted values. The values of the coefficients are such that the sum of squared residuals, ssresid in 5), is a minimum.
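
The coefficient formula in 1) comes from the minimization in 5): expanding the residual sum of squares, differentiating with respect to b, and setting the derivative to zero gives the normal equations.

\[
ss_{resid} = e'e = (y - Xb)'(y - Xb) = y'y - 2b'X'y + b'X'Xb
\]
\[
\frac{\partial\, ss_{resid}}{\partial b} = -2X'y + 2X'Xb = 0
\;\Rightarrow\;
X'Xb = X'y
\;\Rightarrow\;
b = (X'X)^{-1}X'y
\]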

Stata Program using Matrix Arithmetic

program define matreg2, eclass
  version 6.0
 
    syntax varlist(min=2 numeric) [if] [in] [, Level(integer $S_level)]
    marksample touse                       /* mark cases in the sample */
    tokenize "`varlist'"
  
    quietly matrix accum sscp = `varlist' if `touse'
    local nobs = r(N)
    local df = `nobs' - (rowsof(sscp) - 1) /* df residual */
  
    matrix XX = sscp[2...,2...]            /* X'X (x's plus the constant) */
    matrix Xy = sscp[1,2...]               /* y'X, the transpose of X'y */
  
    matrix b = Xy * syminv(XX)             /* (X'X)-1X'y, stored as a row vector */
    local k = colsof(b)                    /* number of coefs */
    matrix hat = Xy * b'                   /* y'Xb; y'y - y'Xb = residual SS */
    matrix V = syminv(XX) * (sscp[1,1] - hat[1,1])/`df'   /* (X'X)-1 times the MSE */
     
    estimates post b V, dof(`df') obs(`nobs') depname(`1') /*
      */ esample(`touse')
    est local depvar "`1'"
    est local cmd "matreg2"
       
    display
    estimates display, level(`level')
 
  matrix drop sscp XX Xy hat
end

Example using matreg2

use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
  
regress write read female
   
matreg2 write read female
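
Running both commands on the same data lets you confirm that the matrix arithmetic reproduces the results of regress. A minimal check (assuming matreg2 has already been defined in the current session; the matrix names b_ols and b_mat are arbitrary) is to compare the coefficient vectors each command posts in e(b):

regress write read female
matrix b_ols = e(b)                    /* coefficients posted by regress */

matreg2 write read female
matrix b_mat = e(b)                    /* coefficients posted by matreg2 */

matrix list b_ols
matrix list b_mat                      /* the two rows should agree */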

Assumptions in OLS Regression

Linearity - The expected value of y is linearly related to the x's through the β parameters. Specification errors result when there is a nonlinear relationship.

Independence - The independence of the x's and ε is necessary in order to identify the unknown β parameters, that is, in order to be able to solve for the β's.

ε are i.i.d. - The assumption is that the ε's are independent and identically distributed, which implies that there should be no heterogeneity of variance and no autocorrelation among the residuals.

All relevant variables are in the model - A specification error can occur when the model does not contain all of the relevant variables. Likewise, a specification error can occur when irrelevant variables are included in the model.

x's are measured without error - The independent variables are measured without error.

Normality* - If we wish to draw statistical inferences, we need to add the further assumption that the ε are normally distributed.
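
Several of these assumptions can be examined empirically after the model has been fit. The sketch below applies standard Stata regression diagnostics to the example above; estat ovtest and estat hettest require a reasonably recent version of Stata, and the residual variable name r is arbitrary.

regress write read female

rvfplot                       /* residual-versus-fitted plot: look for nonlinearity */
estat ovtest                  /* Ramsey RESET test for omitted variables */
estat hettest                 /* Breusch-Pagan test for nonconstant variance */

predict r, residuals          /* save the residuals */
qnorm r                       /* normal quantile plot of the residuals */
swilk r                       /* Shapiro-Wilk test of normality */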



Categorical Data Analysis Course

Phil Ender