Centering a variable involves subtracting the mean from each of the scores, that is, creating deviation scores. Centering can be done in two ways: 1) centering using the grand mean and 2) centering using group means, which is also known as context centering.
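In symbols, the two kinds of centering can be written as below. The notation is introduced here only for illustration and is not part of the original notes: $x_{ij}$ is the score of person $i$ in group $j$, $\bar{x}$ is the grand mean, and $\bar{x}_{j}$ is the mean of group $j$.

```latex
\begin{align*}
\text{grand-mean centering:} \quad & x^{c}_{ij} = x_{ij} - \bar{x} \\
\text{group-mean centering:} \quad & x^{g}_{ij} = x_{ij} - \bar{x}_{j}
\end{align*}
```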
Centering using the grand mean
We will illustrate issues surrounding centering using the hsb2 dataset. We will begin by interpreting the constant in simple linear regression.
use http://www.philender.com/courses/data/hsbdemo, clear

summarize socst

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       socst |       200      52.405    10.73579         26         71

generate csocst = socst - r(mean)

regress write socst

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =  114.19
       Model |   6539.6427     1   6539.6427           Prob > F      =  0.0000
    Residual |  11339.2323   198    57.26885           R-squared     =  0.3658
-------------+------------------------------           Adj R-squared =  0.3626
       Total |   17878.875   199   89.843593           Root MSE      =  7.5676

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       socst |   .5339693   .0499688    10.69   0.000     .4354301    .6325086
       _cons |   24.79234   2.672728     9.28   0.000     19.52167      30.063
------------------------------------------------------------------------------

regress write csocst

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =  114.19
       Model |  6539.64271     1  6539.64271           Prob > F      =  0.0000
    Residual |  11339.2323   198  57.2688499           R-squared     =  0.3658
-------------+------------------------------           Adj R-squared =  0.3626
       Total |   17878.875   199   89.843593           Root MSE      =  7.5676

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      csocst |   .5339693   .0499688    10.69   0.000     .4354301    .6325086
       _cons |     52.775   .5351114    98.62   0.000     51.71975    53.83025
------------------------------------------------------------------------------

summarize write

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       write |       200      52.775    9.478586         31         67

With socst uncentered, the constant (24.79) is the predicted write score when socst equals zero, a value outside the observed range of 26 to 71. With csocst, the constant (52.775) is the predicted write score at the mean of socst, and it equals the mean of write. The slope and the overall fit of the model are unchanged.

Now, let's examine a model that includes an interaction.
regress write i.female##c.socst

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.female |   15.00001    5.09795     2.94   0.004     4.946132    25.05389
       socst |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
             |
      female#|
     c.socst |
          1  |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
             |
       _cons |    17.7619   3.554993     5.00   0.000     10.75095    24.77284
------------------------------------------------------------------------------

regress write i.female##c.csocst

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43527     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073456           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.female |   4.271196   1.025448     4.17   0.000     2.248868    6.293523
      csocst |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
             |
      female#|
    c.csocst |
          1  |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
             |
       _cons |   50.50437   .7571024    66.71   0.000     49.01126    51.99749
------------------------------------------------------------------------------

generate fxss = female*socst
generate fxcs = female*csocst

collin female socst fxss

  Collinearity Diagnostics

                        SQRT                            Cond       R-
  Variable      VIF     VIF    Tolerance   Eigenval    Index    Squared
------------------------------------------------------------------------
    female    24.78    4.98     0.0403      2.0054     1.0000    0.9597
     socst     1.98    1.41     0.5041      0.9752     1.4340    0.4959
      fxss    26.27    5.13     0.0381      0.0194    10.1638    0.9619
------------------------------------------------------------------------
  Mean VIF    17.68
  Condition Number                     10.1638
  Determinant of correlation matrix     0.0380

                           Cond
        Eigenval          Index
---------------------------------
    1     3.4495          1.0000
    2     0.5122          2.5950
    3     0.0325         10.2981
    4     0.0057         24.5654
---------------------------------
 Condition Number        24.5654
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0380

collin female csocst fxcs

  Collinearity Diagnostics

                        SQRT                            Cond       R-
  Variable      VIF     VIF    Tolerance   Eigenval    Index    Squared
------------------------------------------------------------------------
    female     1.00    1.00     0.9972      1.7089     1.0000    0.0028
    csocst     1.98    1.41     0.5041      0.9950     1.3105    0.4959
      fxcs     1.98    1.41     0.5049      0.2961     2.4024    0.4951
------------------------------------------------------------------------
  Mean VIF     1.66
  Condition Number                      2.4024
  Determinant of correlation matrix     0.5035

                           Cond
        Eigenval          Index
---------------------------------
    1     1.7849          1.0000
    2     1.6574          1.0378
    3     0.2996          2.4410
    4     0.2581          2.6295
---------------------------------
 Condition Number         2.6295
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.5035

Next, let's examine a polynomial regression.
summarize write

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       write |       200      52.775    9.478586         31         67

generate cwrite = write - r(mean)
generate write2 = write^2
generate cwrite2 = cwrite^2

regress math write

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =  122.00
       Model |  6658.72246     1  6658.72246           Prob > F      =  0.0000
    Residual |  10807.0725   198  54.5811744           R-squared     =  0.3812
-------------+------------------------------           Adj R-squared =  0.3781
       Total |   17465.795   199  87.7678141           Root MSE      =  7.3879

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |   .6102747   .0552524    11.05   0.000      .501316    .7192334
       _cons |   20.43775   2.962373     6.90   0.000      14.5959     26.2796
------------------------------------------------------------------------------

regress math cwrite

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =  122.00
       Model |  6658.72254     1  6658.72254           Prob > F      =  0.0000
    Residual |  10807.0725   198   54.581174           R-squared     =  0.3812
-------------+------------------------------           Adj R-squared =  0.3781
       Total |   17465.795   199  87.7678141           Root MSE      =  7.3879

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cwrite |   .6102747   .0552524    11.05   0.000      .501316    .7192334
       _cons |     52.645   .5224039   100.77   0.000     51.61481    53.67519
------------------------------------------------------------------------------

regress math c.write##c.write

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   70.23
       Model |  7269.48676     2  3634.74338           Prob > F      =  0.0000
    Residual |  10196.3082   197  51.7579098           R-squared     =  0.4162
-------------+------------------------------           Adj R-squared =  0.4103
       Total |   17465.795   199  87.7678141           Root MSE      =  7.1943

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |   -1.35518   .5746805    -2.36   0.019    -2.488496    -.221865
             |
     c.write#|
     c.write |   .0194548   .0056634     3.44   0.001     .0082861    .0306235
             |
       _cons |   68.23992   14.21137     4.80   0.000     40.21397    96.26587
------------------------------------------------------------------------------

regress math c.cwrite##c.cwrite

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   70.23
       Model |  7269.48677     2  3634.74339           Prob > F      =  0.0000
    Residual |  10196.3082   197  51.7579098           R-squared     =  0.4162
-------------+------------------------------           Adj R-squared =  0.4103
       Total |   17465.795   199  87.7678141           Root MSE      =  7.1943

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cwrite |   .6982757   .0595918    11.72   0.000      .580756    .8157955
             |
    c.cwrite#|
    c.cwrite |   .0194548   .0056634     3.44   0.001     .0082861    .0306235
             |
       _cons |   50.90585   .7177094    70.93   0.000     49.49047    52.32123
------------------------------------------------------------------------------

collin write write2

  Collinearity Diagnostics

                        SQRT                            Cond       R-
  Variable      VIF     VIF    Tolerance   Eigenval    Index    Squared
------------------------------------------------------------------------
     write   114.08   10.68     0.0088      1.9956     1.0000    0.9912
    write2   114.08   10.68     0.0088      0.0044    21.3149    0.9912
------------------------------------------------------------------------
  Mean VIF   114.08
  Condition Number                     21.3149
  Determinant of correlation matrix     0.0088

                           Cond
        Eigenval          Index
---------------------------------
    1     2.9482          1.0000
    2     0.0516          7.5593
    3     0.0002        128.1167
---------------------------------
 Condition Number       128.1167
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0088

collin cwrite cwrite2

  Collinearity Diagnostics

                        SQRT                            Cond       R-
  Variable      VIF     VIF    Tolerance   Eigenval    Index    Squared
------------------------------------------------------------------------
    cwrite     1.23    1.11     0.8152      1.4299     1.0000    0.1848
   cwrite2     1.23    1.11     0.8152      0.5701     1.5837    0.1848
------------------------------------------------------------------------
  Mean VIF     1.23
  Condition Number                      1.5837
  Determinant of correlation matrix     0.8152

                           Cond
        Eigenval          Index
---------------------------------
    1     1.7409          1.0000
    2     1.0000          1.3194
    3     0.2591          2.5922
---------------------------------
 Condition Number         2.5922
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.8152

Centering scores is a technique that is recommended by some (Aiken & West, 1991; Bryk & Raudenbush, 1991) and viewed as unnecessary by others (Kromrey & Foster-Johnson, 1998; Pedhazur, 1997).
Katrichis (1992) views centering negatively and has argued that this technique produces systematically biased estimates of main effects.
The arguments in favor of centering revolve primarily around 1) the greater ease of interpreting the coefficients and 2) reducing collinearity. As to reducing collinearity, modern statistical packages have sufficient numerical accuracy to estimate parameters for uncentered product and power variables.
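The interpretive point can be made concrete with a little algebra. The following is a sketch rather than part of the original notes: the symbols $b_0$, $b_1$, $b_2$ (raw-score coefficients), $m$ (the mean used for centering) and $c = x - m$ (the centered score) are notation assumed here. Substituting $x = c + m$ into the raw-score models shows which coefficients change under grand-mean centering, and the numbers plugged in are taken from the output above.

```latex
\begin{align*}
% Simple regression: only the constant changes
b_0 + b_1 x &= (b_0 + b_1 m) + b_1 c \\
24.79234 + .5339693(52.405) &\approx 52.775 \\[6pt]
% Interaction model: the female coefficient is the group difference at x = 0 (raw) or at x = m (centered)
b_{\text{female}}^{\text{centered}} &= b_{\text{female}} + b_{\text{female} \times \text{socst}}\, m \\
15.00001 + (-.2047288)(52.405) &\approx 4.2712 \\[6pt]
% Quadratic model: the linear coefficient is the slope at x = 0 (raw) or at x = m (centered)
b_0 + b_1 x + b_2 x^2 &= (b_0 + b_1 m + b_2 m^2) + (b_1 + 2 b_2 m)\, c + b_2 c^2 \\
-1.35518 + 2(.0194548)(52.775) &\approx .6983
\end{align*}
```

In each case the highest-order coefficient (the slope, the interaction, or the squared term) is identical in the raw and centered models, as are the fit statistics. Only the lower-order terms are re-expressed at the mean instead of at zero.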
Centering using group means
In this section we will center the socst variable using the group means for males and females.
egen grmean = mean(socst), by(female)
generate grcss = socst - grmean

tabstat write, by(female) stat(n mean)

Summary for variables: write
     by categories of: female

  female |         N      mean
---------+--------------------
    male |        91  50.12088
  female |       109  54.99083
---------+--------------------
   Total |       200    52.775
------------------------------

regress write socst

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =  114.19
       Model |   6539.6427     1   6539.6427           Prob > F      =  0.0000
    Residual |  11339.2323   198    57.26885           R-squared     =  0.3658
-------------+------------------------------           Adj R-squared =  0.3626
       Total |   17878.875   199   89.843593           Root MSE      =  7.5676

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       socst |   .5339693   .0499688    10.69   0.000     .4354301    .6325086
       _cons |   24.79234   2.672728     9.28   0.000     19.52167      30.063
------------------------------------------------------------------------------

regress write grcss

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =  106.93
       Model |  6269.57313     1  6269.57313           Prob > F      =  0.0000
    Residual |  11609.3019   198  58.6328377           R-squared     =  0.3507
-------------+------------------------------           Adj R-squared =  0.3474
       Total |   17878.875   199   89.843593           Root MSE      =  7.6572

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grcss |   .5235458   .0506298    10.34   0.000      .423703    .6233886
       _cons |     52.775   .5414464    97.47   0.000     51.70726    53.84274
------------------------------------------------------------------------------
In this model, an individual who scores at their group mean on socst receives a predicted write score of 52.775. For each one-point increase in grcss, the predicted write score increases by .52.
regress write grcss female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   70.30
       Model |  7445.78654     2  3722.89327           Prob > F      =  0.0000
    Residual |  10433.0885   197  52.9598399           R-squared     =  0.4165
-------------+------------------------------           Adj R-squared =  0.4105
       Total |   17878.875   199   89.843593           Root MSE      =  7.2774

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grcss |   .5235458   .0481182    10.88   0.000      .428653    .6184386
      female |   4.869946   1.033367     4.71   0.000     2.832065    6.907826
       _cons |   50.12088   .7628737    65.70   0.000     48.61643    51.62533
------------------------------------------------------------------------------
Here the predicted score for males at their group mean is 50.12. For females at their group mean, the predicted score is 54.99 (50.12 + 4.87). The slope of the regression line is the same for males and females, .52.
regress write c.grcss##i.female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grcss |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
    1.female |   4.869946   1.024032     4.76   0.000      2.85041    6.889481
             |
      female#|
     c.grcss |
          1  |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
             |
       _cons |   50.12088   .7559823    66.30   0.000     48.62998    51.61178
------------------------------------------------------------------------------
Again, the predicted score for males at their group mean is 50.12, and for females at their group mean the predicted score is 54.99 (50.12 + 4.87). The slope for males is .62 and the slope for females is .42 (.62 - .20).
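The hand arithmetic above can also be left to Stata. The following is a minimal sketch, not part of the original session. It assumes the same hsbdemo data and the grcss variable defined as in the text, and it uses the standard postestimation commands margins and lincom.

```stata
* Sketch: group-specific predictions and slopes after the interaction model.
* Assumes hsbdemo and the group-mean-centered variable grcss built as in the text.
use http://www.philender.com/courses/data/hsbdemo, clear
egen grmean = mean(socst), by(female)
generate grcss = socst - grmean

regress write c.grcss##i.female

* Predicted write for males and females at their own group mean (grcss = 0)
margins female, at(grcss = 0)

* Slope of grcss within each group
margins female, dydx(grcss)

* Slope for females as an explicit linear combination of coefficients
lincom grcss + 1.female#c.grcss
```

If the model matches the output shown above, the first margins command should reproduce the predicted scores of 50.12 and 54.99. The second margins command and lincom should give the slopes of .62 and .42, with standard errors that the hand calculation does not provide.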
Linear Statistical Models Course
Phil Ender, 18feb02, 22dec00