Curvilinear Regression

Linear Statistical Models: Regression

Polynomial Regression

Updated for Stata 11

Polynomial regression fits a curve to a set of points that do not follow a straight line. Despite its name, curvilinear regression uses a linear model: the model is linear in its coefficients, and the curvature comes from transformations of the predictor variables, such as squares and cubes. An example of a curvilinear model is

Y' = b₀ + b₁X₁ + b₂X₂

where X₂ = X₁².
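
To see that this is an ordinary linear regression, the squared term can be built by hand and entered as a second predictor. The sketch below is an illustration only: it assumes Stata's shipped auto dataset (not part of these notes), and weight2 is a made-up variable name. Both commands should give the same fit; the c.weight##c.weight form is the factor-variable notation available from Stata 11 on.

```stata
* Illustration only: auto.dta stands in for real data
sysuse auto, clear

* Build X2 = X1 squared by hand (weight2 is a hypothetical name)
generate weight2 = weight^2

* Ordinary linear regression on X1 and X2
regress mpg weight weight2

* Same model written with factor-variable notation
regress mpg c.weight##c.weight
```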

Curvilinear regression should not be confused with nonlinear regression (NL). Nonlinear regression fits arbitrary nonlinear functions to the dependent variable. An example of a nonlinear model is

Y' = b₀(1 - e^(-b₁X))
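
For contrast, a model like this cannot be fit with regress, because it is not linear in its parameters; Stata's nl command handles it by iterative least squares. The following is a rough sketch on simulated data, not an analysis from these notes: the true values (10 and 0.5), the starting values, the seed, and the parameter names b0 and b1 are all assumptions chosen for illustration.

```stata
* Simulate 50 points from y = 10*(1 - exp(-0.5*x)) plus noise (made-up values)
clear
set obs 50
set seed 12345
generate x = _n/5
generate y = 10*(1 - exp(-0.5*x)) + rnormal(0, 0.5)

* Parameters go in braces; {b0=8} and {b1=0.3} supply starting values
nl (y = {b0=8}*(1 - exp(-1*{b1=0.3}*x)))
```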

Example 1

From Pedhazur (1997), a study looks at practice time (x) in minutes and the number of correct responses (y).

Stata Curvilinear Regression Program

use http://www.philender.com/courses/data/curve, clear

scatter y x

Remarks

Inspection of the y vs x plot reveals a degree of curvilinearity.

Based upon the scatterplot we will try three models:
model 1 -- y = b₀ + b₁x + e -- linear
model 2 -- y = b₀ + b₁x + b₂x² + e -- quadratic
model 3 -- y = b₀ + b₁x + b₂x² + b₃x³ + e -- cubic

regress y x   /* linear */

  Source |       SS       df       MS                  Number of obs =      18
---------+------------------------------               F(  1,    16) =   32.72
   Model |  380.112798     1  380.112798               Prob > F      =  0.0000
Residual |  185.887202    16  11.6179501               R-squared     =  0.6716
---------+------------------------------               Adj R-squared =  0.6511
   Total |      566.00    17  33.2941176               Root MSE      =  3.4085

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |   1.284165   .2245067      5.720   0.000       .8082319    1.760098
   _cons |    4.89154    1.73176      2.825   0.012       1.220372    8.562708
------------------------------------------------------------------------------

regress y c.x##c.x   /* linear and quadratic */

      Source |       SS       df       MS              Number of obs =      18
-------------+------------------------------           F(  2,    15) =   31.90
       Model |  458.245766     2  229.122883           Prob > F      =  0.0000
    Residual |  107.754234    15  7.18361562           R-squared     =  0.8096
-------------+------------------------------           Adj R-squared =  0.7842
       Total |         566    17  33.2941176           Root MSE      =  2.6802

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   4.151667   .8872181     4.68   0.000     2.260607    6.042728
             |
     c.x#c.x |   -.209529   .0635329    -3.30   0.005    -.3449462   -.0741119
             |
       _cons |  -2.236083    2.55445    -0.88   0.395    -7.680764    3.208598
------------------------------------------------------------------------------

regress y c.x##c.x##c.x   /* linear, quadratic and cubic */

      Source |       SS       df       MS              Number of obs =      18
-------------+------------------------------           F(  3,    14) =   20.30
       Model |  460.224174     3  153.408058           Prob > F      =  0.0000
    Residual |  105.775826    14  7.55541616           R-squared     =  0.8131
-------------+------------------------------           Adj R-squared =  0.7731
       Total |         566    17  33.2941176           Root MSE      =  2.7487

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   2.267499   3.792818     0.60   0.559    -5.867288    10.40229
             |
     c.x#c.x |   .0975798   .6036817     0.16   0.874    -1.197189    1.392348
             |
 c.x#c.x#c.x |  -.0144026   .0281457    -0.51   0.617    -.0747692     .045964
             |
       _cons |   .7460164     6.3894     0.12   0.909    -12.95788    14.44992
------------------------------------------------------------------------------

test x c.x#c.x

 ( 1)  x = 0
 ( 2)  c.x#c.x = 0

       F(  2,    14) =   15.21
            Prob > F =    0.0003
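
The joint test above asks whether the linear and quadratic terms can both be dropped from the cubic model. A more direct question for choosing between models 2 and 3 is whether the cubic term adds anything. One way to check, sketched here under the assumption that the cubic regression is still the most recent estimation, is to test that term alone and then reproduce the F ratio by hand from the sums of squares in the two tables above.

```stata
* Test the cubic term alone (equals the squared t for c.x#c.x#c.x)
test c.x#c.x#c.x

* Same F by hand: gain in model SS from adding the cubic term,
* divided by the residual mean square of the cubic model
display (460.224174 - 458.245766) / 7.55541616

* Upper-tail p-value for F with 1 and 14 degrees of freedom
display Ftail(1, 14, (460.224174 - 458.245766) / 7.55541616)
```

The ratio works out to about 0.26, far from significant, which agrees with the t test on the cubic coefficient (p = 0.617).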
            
/* rerun regression with linear and quadratic */

regress y c.x##c.x

[output omitted]
            
predict p

scatter y p x, msym(o i) con(. l) sort

Remarks

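From the above analysis, model 2 (quadratic) appears to be our best bet: the quadratic term is significant (p = 0.005), while adding the cubic term contributes nothing (p = 0.617) and lowers the adjusted R-squared from 0.7842 to 0.7731. The fitted quadratic model is
y = -2.236083 + 4.151667x - 0.209529x². A plot of y vs x with the predicted scores connected by a curved line is displayed above.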
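
A fitted quadratic b₀ + b₁x + b₂x² has its turning point at x = -b₁/(2b₂). As a sketch, assuming the quadratic regression is still the most recent estimation, nlcom will compute that point along with a delta-method standard error. The label turn is just a hypothetical name for the result.

```stata
* Turning point of the fitted parabola: -b1 / (2*b2)
nlcom (turn: -_b[x] / (2*_b[c.x#c.x]))
```

With the coefficients above this is 4.151667 / (2 × 0.209529), or about 9.9 minutes of practice. Whether that value lies inside the observed range of x should be checked against the data before reading much into it.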

Example 2

Here is another artificial example. This time we are looking at the relationship between test performance and anxiety.

clear
input anxiety perform
1  11  
1  13  
2  24  
2  20  
3  42  
3  36  
4  48  
4  42  
5  46  
5  38  
6  23  
6  19  
7   9  
7  11  
end

These data form an inverted-U shape when graphed. Let's run a second-degree polynomial regression.

scatter perform anxiety



regress perform c.anxiety##c.anxiety

      Source |       SS       df       MS              Number of obs =      14
-------------+------------------------------           F(  2,    11) =   44.51
       Model |  2334.38095     2  1167.19048           Prob > F      =  0.0000
    Residual |   288.47619    11  26.2251082           R-squared     =  0.8900
-------------+------------------------------           Adj R-squared =  0.8700
       Total |  2622.85714    13  201.758242           Root MSE      =   5.121

------------------------------------------------------------------------------
     perform |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     anxiety |   29.63095    3.23401     9.16   0.000     22.51294    36.74896
             |
   c.anxiety#|
   c.anxiety |   -3.72619   .3950972    -9.43   0.000    -4.595794   -2.856587
             |
       _cons |  -16.71429   5.643117    -2.96   0.013     -29.1347   -4.293868
------------------------------------------------------------------------------

predict p

scatter perform p anxiety, msym(o i) con(. l)

In social psychology, this inverted-U curve is called the Yerkes-Dodson curve.
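
Because the squared term was entered with factor-variable notation, margins knows that anxiety appears twice in the model. As a rough sketch, assuming the quadratic regression from this example is still the most recent estimation, the following lists the predicted performance and the slope of the curve at each anxiety level.

```stata
* Predicted performance at anxiety = 1, 2, ..., 7
margins, at(anxiety=(1(1)7))

* Slope of the fitted curve (d perform / d anxiety) at the same points
margins, dydx(anxiety) at(anxiety=(1(1)7))
```

From the coefficients above, the slope is 29.63095 - 2(3.72619)(anxiety), so it should change from positive to negative just below anxiety = 4.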

Example 3

Let's try this using the hsbdemo dataset, regressing math on write.

use http://www.philender.com/courses/data/hsbdemo, clear

scatter math write, jitter(2)



regress math write

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =  122.00
       Model |  6658.72246     1  6658.72246           Prob > F      =  0.0000
    Residual |  10807.0725   198  54.5811744           R-squared     =  0.3812
-------------+------------------------------           Adj R-squared =  0.3781
       Total |   17465.795   199  87.7678141           Root MSE      =  7.3879

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |   .6102747   .0552524    11.05   0.000      .501316    .7192334
       _cons |   20.43775   2.962373     6.90   0.000      14.5959     26.2796
------------------------------------------------------------------------------

linktest

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   70.23
       Model |  7269.48652     2  3634.74326           Prob > F      =  0.0000
    Residual |  10196.3085   197  51.7579111           R-squared     =  0.4162
-------------+------------------------------           Adj R-squared =  0.4103
       Total |   17465.795   199  87.7678141           Root MSE      =  7.1943

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |  -4.355811   1.561601    -2.79   0.006    -7.435413    -1.27621
      _hatsq |   .0522368   .0152065     3.44   0.001     .0222484    .0822251
       _cons |   135.4435   39.70397     3.41   0.001     57.14414    213.7429
------------------------------------------------------------------------------

ovtest

Ramsey RESET test using powers of the fitted values of math
       Ho:  model has no omitted variables
                 F(3, 195) =      4.03
                  Prob > F =      0.0083
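
The linktest above is easy to reproduce by hand, which helps show what it is testing. It refits the model using the prediction and the squared prediction as the only predictors; if the original specification is adequate, the squared prediction should not be significant. A minimal sketch, where yhat and yhatsq are hypothetical variable names:

```stata
* Refit the linear model and save its predictions
regress math write
predict yhat

* Square the predictions and use both as the only predictors
generate yhatsq = yhat^2
regress math yhat yhatsq
```

The coefficients on yhat and yhatsq should match those on _hat and _hatsq in the linktest table. Here the squared term is significant, which points to a missing curvilinear component.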

regress math c.write##c.write##c.write

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   46.63
       Model |  7273.64288     3  2424.54763           Prob > F      =  0.0000
    Residual |  10192.1521   196  52.0007761           R-squared     =  0.4165
-------------+------------------------------           Adj R-squared =  0.4075
       Total |   17465.795   199  87.7678141           Root MSE      =  7.2112

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |  -2.501262   4.094651    -0.61   0.542    -10.57649    5.573969
             |
     c.write#|
     c.write |   .0430674   .0837155     0.51   0.608    -.1220313    .2081662
             |
     c.write#|
     c.write#|
     c.write |  -.0001581   .0005592    -0.28   0.778     -.001261    .0009448
             |
       _cons |   86.26009   65.31345     1.32   0.188    -42.54725    215.0674
------------------------------------------------------------------------------

regress math c.write##c.write

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   70.23
       Model |  7269.48676     2  3634.74338           Prob > F      =  0.0000
    Residual |  10196.3082   197  51.7579098           R-squared     =  0.4162
-------------+------------------------------           Adj R-squared =  0.4103
       Total |   17465.795   199  87.7678141           Root MSE      =  7.1943

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |   -1.35518   .5746805    -2.36   0.019    -2.488496    -.221865
             |
     c.write#|
     c.write |   .0194548   .0056634     3.44   0.001     .0082861    .0306235
             |
       _cons |   68.23992   14.21137     4.80   0.000     40.21397    96.26587
------------------------------------------------------------------------------

linktest

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   70.27
       Model |  7272.18492     2  3636.09246           Prob > F      =  0.0000
    Residual |  10193.6101   197  51.7442136           R-squared     =  0.4164
-------------+------------------------------           Adj R-squared =  0.4104
       Total |   17465.795   199  87.7678141           Root MSE      =  7.1933

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   1.380268   1.667519     0.83   0.409    -1.908212    4.668748
      _hatsq |  -.0035515   .0155539    -0.23   0.820     -.034225     .027122
       _cons |  -10.04708   44.22772    -0.23   0.821    -97.26763    77.17347
------------------------------------------------------------------------------

ovtest

Ramsey RESET test using powers of the fitted values of math
       Ho:  model has no omitted variables
                 F(3, 194) =      0.15
                  Prob > F =      0.9303
                  
predict p

scatter math p write, msym(o i) con(. l) aspect(1) sort
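
For a quick look without running regress and predict first, twoway can overlay a quadratic fit directly. This is just an alternative sketch; qfit fits its own quadratic of math on write, so the curve should agree with the predicted values plotted above.

```stata
* Jittered scatterplot with a quadratic fit line drawn by twoway
twoway (scatter math write, jitter(2)) (qfit math write)

* Variant with a 95% confidence band around the fitted curve
twoway (qfitci math write) (scatter math write, jitter(2))
```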

Linear Statistical Models Course

Phil Ender, 21Jun99