By now you are familiar with OLS regression, a least squares criterion is not the only way to do regression. We could look at the absolute deviations from some point estimate, say the median. We would be trying to obtain the minimum absolute deviations (MAD).
According to Koenker (2000), quantile regression is a statistical technique intended to estimate and conduct inference about conditional quantile functions. Quantile regression methods offer a mechanism for estimationg the conditional median function in addtion to other conditional quantile functions. Ordinary least squares regression asks the question "How does the conditional mean of Y depend on the covariates X?" Quantile regression asks this question at each quantile of the conditional distribution giving a more complete description of how the conditional distribution of Y given X.
In Stata this can be done using the qreg command. Here are some quantile regressions using the hsb2 dataset.
use http://www.gseis.ucla.edu/courses/data/hsb2, clear tabstat write, by(female) stat(n p25 p50 p75) Summary for variables: write by categories of: female female | N p25 p50 p75 -------+---------------------------------------- male | 91 41 52 59 female | 109 50 57 62 -------+---------------------------------------- Total | 200 45.5 54 60 ------------------------------------------------ graph box write, over(female) qreg write female, quan(.25) nolog .25 Quantile regression Number of obs = 200 Raw sum of deviations 1333.5 (about 45) Min sum of deviations 1243 Pseudo R2 = 0.0679 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | 9 1.797523 5.01 0.000 5.455253 12.54475 _cons | 41 1.287262 31.85 0.000 38.4615 43.5385 ------------------------------------------------------------------------------ qreg write female, quan(.50) nolog Median regression Number of obs = 200 Raw sum of deviations 1571 (about 54) Min sum of deviations 1536 Pseudo R2 = 0.0223 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | 5 2.611711 1.91 0.057 -.1503393 10.15034 _cons | 52 1.927268 26.98 0.000 48.19939 55.80061 ------------------------------------------------------------------------------ qreg write female, quan(.75) nolog .75 Quantile regression Number of obs = 200 Raw sum of deviations 1084.5 (about 60) Min sum of deviations 1060 Pseudo R2 = 0.0226 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | 3 1.23163 2.44 0.016 .5712036 5.428796 _cons | 59 .9385943 62.86 0.000 57.14908 60.85092 ------------------------------------------------------------------------------ list write in 10/14 +-------+ | write | |-------| 10. | 55 | 11. | 46 | 12. | 65 | 13. | 60 | 14. | 63 | +-------+ replace write = 600 in 13 (1 real change made) qreg write female, quan(.5) nolog Median regression Number of obs = 200 Raw sum of deviations 2111 (about 54) Min sum of deviations 2076 Pseudo R2 = 0.0166 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | 5 2.611711 1.91 0.057 -.1503393 10.15034 _cons | 52 1.927268 26.98 0.000 48.19939 55.80061 ------------------------------------------------------------------------------ replace write = 6000 in 13 (1 real change made) qreg write female, quan(.5) nolog Median regression Number of obs = 200 Raw sum of deviations 7511 (about 54) Min sum of deviations 7476 Pseudo R2 = 0.0047 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | 5 2.611711 1.91 0.057 -.1503393 10.15034 _cons | 52 1.927268 26.98 0.000 48.19939 55.80061 ------------------------------------------------------------------------------ replace write =6000 if write>=60 (52 real changes made) qreg write female, quan(.5) nolog Median regression Number of obs = 200 Raw sum of deviations 316210 (about 54) Min sum of deviations 316175 Pseudo R2 = 0.0001 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | 5 2.611711 1.91 0.057 -.1503393 10.15034 _cons | 52 1.927268 26.98 0.000 48.19939 55.80061 ------------------------------------------------------------------------------ univar write -------------- Quantiles -------------- Variable n Mean S.D. Min .25 Mdn .75 Max ------------------------------------------------------------------------------- write 200 1625.97 2633.00 31.00 45.50 54.00 6000.00 6000.00 -------------------------------------------------------------------------------Note that increasing values greater than the median did not change the coefficients for the median regression.
We need to reload the data because of the changes that were made.
use http://www.gseis.ucla.edu/courses/data/hsb2, clear tabstat write, by(prog) stat(n p25 median p75) Summary for variables: write by categories of: prog (type of program) prog | N p25 p50 p75 ---------+---------------------------------------- general | 45 44 54 59 academic | 105 52 59 62 vocation | 50 40 46 54 ---------+---------------------------------------- Total | 200 45.5 54 60 -------------------------------------------------- sort prog graph write, box by(prog) xi: qreg write i.prog, quant(.50) nolog i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted) Median regression Number of obs = 200 Raw sum of deviations 1571 (about 54) Min sum of deviations 1364 Pseudo R2 = 0.1318 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _Iprog_2 | 5 1.955055 2.56 0.011 1.144477 8.855523 _Iprog_3 | -8 2.302537 -3.47 0.001 -12.54079 -3.459214 _cons | 54 1.646609 32.79 0.000 50.75276 57.24724 ------------------------------------------------------------------------------ test _Iprog_2 _Iprog_3 ( 1) _Iprog_2 = 0.0 ( 2) _Iprog_3 = 0.0 F( 2, 197) = 23.00 Prob > F = 0.0000 xi: qreg write i.prog, quant(.25) nolog i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted) .25 Quantile regression Number of obs = 200 Raw sum of deviations 1333.5 (about 45) Min sum of deviations 1159.5 Pseudo R2 = 0.1305 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _Iprog_2 | 8 2.717471 2.94 0.004 2.640933 13.35907 _Iprog_3 | -4 3.262362 -1.23 0.222 -10.43364 2.433635 _cons | 44 2.229953 19.73 0.000 39.60236 48.39764 ------------------------------------------------------------------------------ test _Iprog_2 _Iprog_3 ( 1) _Iprog_2 = 0.0 ( 2) _Iprog_3 = 0.0 F( 2, 197) = 10.37 Prob > F = 0.0000 xi: qreg write i.prog, quant(.75) nolog i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted) .75 Quantile regression Number of obs = 200 Raw sum of deviations 1084.5 (about 60) Min sum of deviations 993 Pseudo R2 = 0.0844 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _Iprog_2 | 3 1.576171 1.90 0.058 -.1083338 6.108334 _Iprog_3 | -5 1.888316 -2.65 0.009 -8.723908 -1.276092 _cons | 59 1.284961 45.92 0.000 56.46595 61.53405 ------------------------------------------------------------------------------ test _Iprog_2 _Iprog_3 ( 1) _Iprog_2 = 0.0 ( 2) _Iprog_3 = 0.0 F( 2, 197) = 11.72 Prob > F = 0.0000 We have been using dummy (indicator) coding for the categorical variable. There are other possible codings that we could use. For this example, I would like to use a coding that compares general with vocational and one that compares the average of general and vocational with academic. We can create the coding using variable characteristics in Stata and apply them to the model using the xi3 command available for ATS via the Internet.
findit xi3 char prog[user] (1 0 -1 \ -.5 1 -.5) xi3: qreg write u.prog, nolog u.prog _Iprog_1-3 (naturally coded; _Iprog_3 omitted) Median regression Number of obs = 200 Raw sum of deviations 1571 (about 54) Min sum of deviations 1364 Pseudo R2 = 0.1318 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _Iprog_1 | 8 2.302537 3.47 0.001 3.459214 12.54079 _Iprog_2 | 9 1.560877 5.77 0.000 5.921826 12.07817 _cons | 53 .8441035 62.79 0.000 51.33536 54.66464 ------------------------------------------------------------------------------In this next series of analyses we will look at models which include an interaction. We will use the variables female and socst and create an interaction fxs.
generate fxs = female*socst qreg write female socst fxs, quant(.50) nolog Median regression Number of obs = 200 Raw sum of deviations 1571 (about 54) Min sum of deviations 1170.167 Pseudo R2 = 0.2551 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | 13.5 11.8569 1.14 0.256 -9.883489 36.88349 socst | .6666667 .1567594 4.25 0.000 .357515 .9758183 fxs | -.1666667 .2210759 -0.75 0.452 -.6026596 .2693262 _cons | 15 8.357237 1.79 0.074 -1.481651 31.48165 ------------------------------------------------------------------------------ qreg write female socst fxs, quant(.25) nolog .25 Quantile regression Number of obs = 200 Raw sum of deviations 1333.5 (about 45) Min sum of deviations 895 Pseudo R2 = 0.3288 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | 9.1 5.512101 1.65 0.100 -1.770642 19.97064 socst | .7 .0696054 10.06 0.000 .5627283 .8372717 fxs | -.1 .1014278 -0.99 0.325 -.3000299 .1000299 _cons | 9.3 3.770638 2.47 0.015 1.863769 16.73623 ------------------------------------------------------------------------------ qreg write female socst fxs, quant(.75) nolog .75 Quantile regression Number of obs = 200 Raw sum of deviations 1084.5 (about 60) Min sum of deviations 866.3857 Pseudo R2 = 0.2011 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | 20.31428 5.465128 3.72 0.000 9.536281 31.09229 socst | .6 .0689976 8.70 0.000 .4639269 .7360731 fxs | -.3142857 .1016025 -3.09 0.002 -.5146602 -.1139111 _cons | 24.4 3.647607 6.69 0.000 17.2064 31.5936 ------------------------------------------------------------------------------Next, we will take a look at the same model using an alternative coding scheme involving the difference between the groups and the grand median.
xi3: qreg write e.female*socst, quan(.5) nolog d.female _Ifemale_0-1 (naturally coded; _Ifemale_0 omitted) d.female*socst _IfemXsocst_# (coded as above) Median regression Number of obs = 200 Raw sum of deviations 1571 (about 54) Min sum of deviations 1170.167 Pseudo R2 = 0.2551 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _Ifemale_1 | 6.75 5.928452 1.14 0.256 -4.941744 18.44174 socst | .5833333 .110538 5.28 0.000 .3653369 .8013298 _IfemXsocs~1 | -.0833333 .110538 -0.75 0.452 -.3013298 .1346631 _cons | 21.75 5.928452 3.67 0.000 10.05826 33.44174 ------------------------------------------------------------------------------ describe _Ifemale_1 storage display value variable name type format label variable label ------------------------------------------------------------------------------- _Ifemale_1 byte %8.0g female(1 vs. grand mean) tabulate _Ifemale_1 female(1 | vs. grand | mean) | Freq. Percent Cum. ------------+----------------------------------- -1 | 91 45.50 45.50 1 | 109 54.50 100.00 ------------+----------------------------------- Total | 200 100.00 xi3: qreg write e.female*socst, quan(.25) nolog d.female _Ifemale_0-1 (naturally coded; _Ifemale_0 omitted) d.female*socst _IfemXsocst_# (coded as above) .25 Quantile regression Number of obs = 200 Raw sum of deviations 1333.5 (about 45) Min sum of deviations 895 Pseudo R2 = 0.3288 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _Ifemale_1 | 4.55 2.756051 1.65 0.100 -.885321 9.985321 socst | .65 .0507139 12.82 0.000 .5499851 .7500149 _IfemXsocs~1 | -.05 .0507139 -0.99 0.325 -.1500149 .0500149 _cons | 13.85 2.756051 5.03 0.000 8.414679 19.28532 ------------------------------------------------------------------------------ xi3: qreg write 3.female*socst, quan(.75) nolog d.female _Ifemale_0-1 (naturally coded; _Ifemale_0 omitted) d.female*socst _IfemXsocst_# (coded as above) .75 Quantile regression Number of obs = 200 Raw sum of deviations 1084.5 (about 60) Min sum of deviations 866.3857 Pseudo R2 = 0.2011 ------------------------------------------------------------------------------ write | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _Ifemale_1 | 10.15714 2.732564 3.72 0.000 4.76814 15.54614 socst | .4428572 .0508013 8.72 0.000 .3426699 .5430444 _IfemXsocs~1 | -.1571428 .0508013 -3.09 0.002 -.2573301 -.0569556 _cons | 34.55714 2.732564 12.65 0.000 29.16814 39.94614 ------------------------------------------------------------------------------
Categorical Data Analysis Course
Phil Ender -- 5/15/04