Linear Statistical Models: Regression

Path Analysis


Path Analysis Background

Some Definitions

Path Analysis Assumptions

Consider the Model

Decomposition of correlations:

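Each correlation can be decomposed into one or more of the following four types of effects:

  • DE - direct effect
  • IE - indirect effect
  • S - spurious effect (due to a common cause)
  • U - unanalyzed effect (a component that passes through a correlation between exogenous variables)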

Path Tracing to Reproduce Correlations

  1. Begin with any endogenous variable, i. Trace back along a path that comes from variable j. This is a direct path, and the path coefficient Pij represents a direct effect (DE).

  2. If paths also come to i from one or more other variables, k, trace all paths between i and j that pass through k. Multiply the path coefficients along each such route to obtain the value of the compound path.

  3. While tracing paths do the following:

  4. When all direct and compound path values have been calculated, add them together to obtain the reproduced correlation between i and j.
Example 1

With the following variables:

  • 1 - SES
  • 2 - IQ
  • 3 - AM (achievement motivation)
  • 4 - GPA

    use http://www.philender.com/courses/data/ped788, clear
    
    corr ses iq am gpa
    (obs=300)
    
             |      ses       iq       am      gpa
    ---------+------------------------------------
         ses |   1.0000
          iq |   0.3000   1.0000
          am |   0.4100   0.1600   1.0000
         gpa |   0.3300   0.5700   0.5000   1.0000
    
    regress iq ses, beta
    
      Source |       SS       df       MS                  Number of obs =     300
    ---------+------------------------------               F(  1,   298) =   29.47
       Model |  26.9099995     1  26.9099995               Prob > F      =  0.0000
    Residual |  272.089997   298   .91305368               R-squared     =  0.0900
    ---------+------------------------------               Adj R-squared =  0.0869
       Total |  298.999996   299  .999999987               Root MSE      =  .95554
    
    ------------------------------------------------------------------------------
          iq |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
         ses |         .3   .0552602      5.429   0.000                         .3
       _cons |  -7.10e-09    .055168      0.000   1.000                          .
    ------------------------------------------------------------------------------
    
    display sqrt(1-.09)
    .9539392
    
    regress am ses iq, beta
    
      Source |       SS       df       MS                  Number of obs =     300
    ---------+------------------------------               F(  2,   297) =   30.33
       Model |  50.7117145     2  25.3558572               Prob > F      =  0.0000
    Residual |  248.288282   297  .835987481               R-squared     =  0.1696
    ---------+------------------------------               Adj R-squared =  0.1640
       Total |  298.999996   299  .999999988               Root MSE      =  .91432
    
    ------------------------------------------------------------------------------
          am |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
         ses |   .3978022   .0554298      7.177   0.000                   .3978022
          iq |   .0406593   .0554298      0.734   0.464                   .0406593
       _cons |  -8.73e-09   .0527885      0.000   1.000                          .
    ------------------------------------------------------------------------------
    
    display sqrt(1-.1696)
    .91126286
    
    regress gpa am ses iq, beta
    
      Source |       SS       df       MS                  Number of obs =     300
    ---------+------------------------------               F(  3,   296) =   97.28
       Model |  148.445584     3  49.4818613               Prob > F      =  0.0000
    Residual |  150.554414   296  .508629776               R-squared     =  0.4965
    ---------+------------------------------               Adj R-squared =  0.4914
       Total |  298.999998   299  .999999992               Root MSE      =  .71318
    
    ------------------------------------------------------------------------------
         gpa |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
          am |   .4161263   .0452609      9.194   0.000                   .4161263
         ses |   .0091893    .046835      0.196   0.845                   .0091893
          iq |    .500663   .0432751     11.569   0.000                    .500663
       _cons |   1.23e-09   .0411756      0.000   1.000                          .
    ------------------------------------------------------------------------------
    
    display sqrt(1-.4965)
    .70957734
    
    Estimated path coefficients from multiple regression analyses:

    P21 = .300
    P31 = .398
    P32 = .041
    P41 = .009
    P42 = .501
    P43 = .416
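
The regression output above shows unit variances and zero intercepts, so the variables are standardized and the same coefficients can be recovered from the correlation matrix alone. The sketch below is an illustration, not part of the original analysis. It assumes the standard result that standardized regression weights equal the inverse of the predictor correlation matrix times the vector of predictor-criterion correlations; the matrix names Rxx, rxy, b, and R2 are our own.

```stata
* Predictors of am are ses and iq (values taken from the corr output above)
matrix Rxx = (1, .30 \ .30, 1)
matrix rxy = (.41 \ .16)

* Standardized weights: b = inverse(Rxx) * rxy
matrix b = invsym(Rxx)*rxy
matrix list b

* R-squared and the residual path sqrt(1 - R-squared)
matrix R2 = b'*rxy
display "R2 = " R2[1,1] "   residual path = " sqrt(1 - R2[1,1])
```

The two elements of b should agree with P31 = .398 and P32 = .041 from regress am ses iq, beta.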

    Path Analysis: Example 1: Just Identified Model

    Compare actual and reproduced correlations: Model 1

    To test whether the model fits the data, compare actual correlations to reproduced correlations based on paths in the model. We denote actual correlations by r and reproduced correlations by r*. The actual correlations are in brackets below.

    r*12 = P21
           DE
         = .300 [.300]

    r*13 = P31 + P32P21
           DE    IE
         = .398 + (.041)(.3) = .410 [.410]

    r*14 = P41 + P42P21 + P43P31 + P43P32P21
           DE    IE       IE       IE
         = .009 + (.501)(.30) + (.416)(.398) + (.416)(.041)(.30) = .330 [.330]

    r*23 = P31P21 + P32
           S        DE
         = (.398)(.30) + .041 = .160 [.160]

    r*24 = P41P21 + P42 + P43P31P21 + P43P32
           S        DE    S           IE
         = (.009)(.30) + .501 + (.416)(.398)(.30) + (.416)(.041) = .570 [.570]

    r*34 = P41P31 + P41P21P32 + P42P21P31 + P42P32 + P43
           S        S           S           S        DE
         = (.009)(.398) + (.009)(.30)(.041) + (.501)(.30)(.398) + (.501)(.041) + .416 = .500 [.500]
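
The path-tracing sums for Model 1 are easy to check with Stata scalars. This is a minimal sketch that assumes the rounded coefficients listed under "Estimated path coefficients"; the scalar names are our own.

```stata
* Path coefficients from the three regressions (rounded to 3 places)
scalar p21 = .300
scalar p31 = .398
scalar p32 = .041
scalar p41 = .009
scalar p42 = .501
scalar p43 = .416

* r*14: direct effect of ses on gpa plus three indirect effects
scalar r14 = p41 + p42*p21 + p43*p31 + p43*p32*p21

* r*34: four spurious components plus the direct effect of am on gpa
scalar r34 = p41*p31 + p41*p21*p32 + p42*p21*p31 + p42*p32 + p43

display "r*14 = " %5.3f r14 "   r*34 = " %5.3f r34
```

Both values should round to the observed correlations, .330 and .500, which is what we expect from a model with no paths deleted.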
    

    Note:

    This is not a very interesting example: the model is just identified, with all possible paths among the variables included (i.e., no paths deleted), so the reproduced correlations necessarily equal the original ones.

    Path Analysis: Model 2

  • A more interesting overidentified model

    regress am ses, beta
    
      Source |       SS       df       MS                  Number of obs =     300
    ---------+------------------------------               F(  1,   298) =   60.22
       Model |  50.2619001     1  50.2619001               Prob > F      =  0.0000
    Residual |  248.738096   298  .834691598               R-squared     =  0.1681
    ---------+------------------------------               Adj R-squared =  0.1653
       Total |  298.999996   299  .999999988               Root MSE      =  .91361
    
    ------------------------------------------------------------------------------
          am |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
         ses |        .41   .0528357      7.760   0.000                        .41
       _cons |  -9.02e-09   .0527476      0.000   1.000                          .
    ------------------------------------------------------------------------------
    
    display sqrt(1-.1681)
    .91208552
    
    regress gpa iq am, beta
    
      Source |       SS       df       MS                  Number of obs =     300
    ---------+------------------------------               F(  2,   297) =  146.38
       Model |  148.426003     2  74.2130017               Prob > F      =  0.0000
    Residual |  150.573994   297  .506983146               R-squared     =  0.4964
    ---------+------------------------------               Adj R-squared =  0.4930
       Total |  298.999998   299  .999999992               Root MSE      =  .71203
    
    ------------------------------------------------------------------------------
         gpa |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
          iq |   .5028736    .041715     12.055   0.000                   .5028736
          am |   .4195402    .041715     10.057   0.000                   .4195402
       _cons |   1.16e-09   .0411089      0.000   1.000                          .
    ------------------------------------------------------------------------------
    
    display sqrt(1-.4964)
    .7096478
    

    Reproduced correlations: Model 2

    r*12 = r12
           U
         = .30 [.30]

    r*13 = P31
           DE
         = .410 [.410]

    r*14 = P42r12 + P43P31
           U        IE
         = (.503)(.30) + (.420)(.410) = .323 [.330]

    r*23 = P31r12
           U
         = (.410)(.30) = .123 [.160]

    r*24 = P42 + P43P31r12
           DE    U
         = .503 + (.420)(.410)(.30) = .555 [.570]

    r*34 = P42P31r12 + P43
           U           DE
         = (.503)(.410)(.30) + .420 = .482 [.500]
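
Compared with Model 1, Model 2 drops two paths (P32 and P41), so four of the six correlations are no longer reproduced exactly. The sketch below computes each discrepancy, observed minus reproduced, from the rounded Model 2 estimates. The scalar names are our own, and how large a discrepancy counts as a mismatch is a judgment call that the code does not make.

```stata
* Model 2 estimates (rounded) and the unanalyzed correlation r12
scalar r12 = .30
scalar p31 = .410
scalar p42 = .503
scalar p43 = .420

* Reproduced correlations that can differ from the observed ones
scalar rs14 = p42*r12 + p43*p31
scalar rs23 = p31*r12
scalar rs24 = p42 + p43*p31*r12
scalar rs34 = p42*p31*r12 + p43

* Discrepancy = observed correlation minus reproduced correlation
display "r14: " %6.3f (.33 - rs14)
display "r23: " %6.3f (.16 - rs23)
display "r24: " %6.3f (.57 - rs24)
display "r34: " %6.3f (.50 - rs34)
```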
    
    Terman Data Set

    Variables:

    1 - Parents' education
    2 - Father's occupation
    3 - Parents' attitude
    4 - IQ
    5 - Achievement
    6 - Education level
    7 - Occupation
    8 - Income

    Terman Model 1

    Terman Model 2

    Reproduced Correlations: Terman Model 2

    r*14 = P41 + P43r13
           DE    U

    r*15 = P54P41 + P54P43r13
           IE       U

    r*16 = P61
           DE

    r*17 = P75P54P41 + P75P54P43r13 + P76P61
           IE          U              IE

    r*18 = P87P75P54P41 + P87P75P54P43r13 + P87P76P61
           IE             U                 IE

    r*34 = P41r13 + P43
           U        DE

    r*35 = P54P41r13 + P54P43
           U           IE

    r*36 = P61r13
           U

    r*37 = P75P54P41r13 + P75P54P43 + P76P61r13
           U              IE          U

    r*38 = P87P75P54P41r13 + P87P75P54P43 + P87P76P61r13
           U                 IE             U

    r*45 = P54
           DE

    r*46 = P61P41 + P61P43r13
           S        U

    r*47 = P75P54 + P76P61P41 + P76P61P43r13
           IE       S           U

    r*48 = P87P75P54 + P87P76P61P41 + P87P76P61P43r13
           IE          S              U

    r*56 = P61P54P41 + P61P54P43r13
           S           U

    r*57 = P75 + P76P61P54P41 + P76P61P54P43r13
           DE    S              U

    r*58 = P87P75 + P87P76P61P54P41 + P87P76P61P54P43r13
           IE       S                 U

    r*67 = P75P61P54P41 + P75P61P54P43r13 + P76
           S              U                 DE

    r*68 = P87P75P61P54P41 + P87P75P61P54P43r13 + P87P76
           S                 U                    IE

    r*78 = P87
           DE
    

    Reproduced and actual correlations: Terman Model 2

    r*13  =  .03   [ .03]
    r*14  =  .16   [ .16]	
    r*15  = -.016  [ .07]  possible mismatch
    r*16  =  .31   [ .31]
    r*17  =  .10   [ .08]
    r*18  =  .04   [ .06]
    r*34  =  .08   [ .08]
    r*35  = -.008  [.003]		 
    r*36  =  .01   [ .14]  mismatch
    r*37  =  .003  [ .09]
    r*38  =  .001  [ .08]
    r*45  = -.10   [-.10]
    r*46  =  .05   [ .10]
    r*47  =  .02   [ .08]  possible mismatch
    r*48  =  .008  [ .09]  possible mismatch
    r*56  = -.005  [ .06]  possible mismatch
    r*57  = -.001  [ .02]
    r*58  =  .04   [-.01]
    r*67  =  .32   [ .32]
    r*68  =  .13   [ .20]  possible mismatch
    r*78  =  .41   [ .41]
    

    Terman Model 3

    Estimated Equations: Terman Model 3

    z'6 = P61 z1
    z'7 = P76 z6
    z'8 = P87 z7

    Reproduced and actual correlations: Terman Model 3

    r*16 = P61
           DE
         = .31 [.31]

    r*17 = P76P61
           IE
         = (.32)(.31) = .10 [.08]

    r*18 = P87P76P61
           IE
         = (.41)(.32)(.31) = .04 [.06]

    r*67 = P76
           DE
         = .32 [.32]

    r*68 = P87P76
           IE
         = (.41)(.32) = .13 [.20]  (possible mismatch)

    r*78 = P87
           DE
         = .41 [.41]
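
Terman Model 3 is a simple causal chain (1 to 6 to 7 to 8), so each reproduced correlation is just the product of the coefficients along the chain between the two variables. Here is a small sketch of that arithmetic, using the coefficients reported above; the scalar names are our own.

```stata
* Terman Model 3 path coefficients as reported above
scalar p61 = .31
scalar p76 = .32
scalar p87 = .41

* Indirect effects are products of the links in the chain
display "r*17 = " %4.2f (p76*p61)
display "r*18 = " %4.2f (p87*p76*p61)
display "r*68 = " %4.2f (p87*p76)
```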
    
    Example Using Stata

    We will use the hsbdemo dataset. For the purposes of this example, ses will be treated as continuous even though it is categorical; ses and female will be exogenous, while read and write will be endogenous. Here is our just identified model.

    
    use http://www.philender.com/courses/data/hsbdemo, clear
    
    corr ses female read write
    
    (obs=200)
    
             |      ses   female     read    write
    ---------+------------------------------------
         ses |   1.0000
      female |  -0.1250   1.0000
        read |   0.2933  -0.0531   1.0000
       write |   0.2075   0.2565   0.5968   1.0000
    
    
    regress read ses female, beta
    
      Source |       SS       df       MS                  Number of obs =     200
    ---------+------------------------------               F(  2,   197) =    9.30
       Model |  1805.58553     2  902.792765               Prob > F      =  0.0001
    Residual |  19113.8345   197  97.0245405               R-squared     =  0.0863
    ---------+------------------------------               Adj R-squared =  0.0770
       Total |    20919.42   199  105.122714               Root MSE      =  9.8501
    
    ------------------------------------------------------------------------------
        read |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
         ses |   4.122699   .9716753      4.243   0.000                   .2912371
      female |  -.3425006    1.40975     -0.243   0.808                  -.0166765
       _cons |    43.94452   2.333705    18.83    0.000                          .
    ------------------------------------------------------------------------------
    
    display sqrt(1 - .08632)
    
    .9558661
    
    regress write read ses female, beta
    
      Source |       SS       df       MS                  Number of obs =     200
    ---------+------------------------------               F(  3,   196) =   52.17
       Model |  7937.69723     3  2645.89908               Prob > F      =  0.0000
    Residual |  9941.17777   196  50.7202947               R-squared     =  0.4440
    ---------+------------------------------               Adj R-squared =  0.4355
       Total |   17878.875   199   89.843593               Root MSE      =  7.1218
    
    ------------------------------------------------------------------------------
       write |      Coef.   Std. Err.       t     P>|t|                       Beta
    ---------+--------------------------------------------------------------------
        read |   .5470064    .051513     10.619   0.000                    .591694
         ses |   .9296443   .7339381      1.267   0.207                   .0710373
      female |   5.634919    1.01943      5.528   0.000                   .2967813
       _cons |    19.2234    2.823373     6.81    0.000                          .
    ------------------------------------------------------------------------------
    
    display sqrt(1 - .444)
    
    .74565408
    

    Let's say that you are kind of lazy and don't want to run a separate regression for each endogenous variable and compute the residual path, sqrt(1 - R2), at each stage. Here is a convenience command that you can use if you have Stata 7.

    A Shortcut

    /* user written program -- findit pathreg */
    
    pathreg (read ses female)(write ses read female)
    
    
    ------------------------------------------------------------------------------
            read |      Coef.   Std. Err.      t    P>|t|                     Beta
    -------------+----------------------------------------------------------------
             ses |   4.122699   .9716753     4.24   0.000                 .2912371
          female |  -.3425006    1.40975    -0.24   0.808                -.0166765
           _cons |   43.94452   2.333705    18.83   0.000                        .
    ------------------------------------------------------------------------------
                     n = 200  R2 = 0.0863  sqrt(1 - R2) = 0.9559
    
    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|                     Beta
    -------------+----------------------------------------------------------------
             ses |   .9296443   .7339381     1.27   0.207                 .0710373
            read |   .5470064    .051513    10.62   0.000                  .591694
          female |   5.634919    1.01943     5.53   0.000                 .2967813
           _cons |    19.2234   2.823373     6.81   0.000                        .
    ------------------------------------------------------------------------------
                     n = 200  R2 = 0.4440  sqrt(1 - R2) = 0.7457
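
For readers on a newer release, Stata 12 and later include an official sem command that can fit the same just identified model in one step. The sketch below is an alternative to pathreg, not the method used in these notes. Its estimates are maximum likelihood, so the standard errors can differ a little from the OLS output, although the standardized coefficients for a recursive model like this one should agree closely.

```stata
use http://www.philender.com/courses/data/hsbdemo, clear

* Just identified model: read and write endogenous, ses and female exogenous
sem (read <- ses female) (write <- ses read female), standardized

* Direct, indirect, and total effects implied by the fitted model
estat teffects, standardized
```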

    Reproducing Correlations

    r*12 = r12
            U
         = -.125 [-.125]
    
    r*13 = P31  + P32r12
            DE     U      
         = .29 +  (-.02)(-.125) = .29 [.29]
    
    r*14 = P41  + P42r12      + P43P31 
            DE       U            IE       
         =  .07 + (.3)(-.125) + (.59)(.29) = .20 [.21]
    
    r*23 = P32  + P31r12     
            DE      U        
         = -.02 + (.29)(-.125) = -.06  [-.05]
    
    r*24 = P42 + P43P32    + P43P31r12       + P41r12
            DE     IE            U                U                
         =  .3 +(.59)(-.02)+(.59)(.29)(-.125)+(.07)(-.125) = .26 [.26]
    
    r*34 = P43 + P42P32   + P41P31   + P42r12P31      + P41r12P32
            DE      S         S            S                S
         = .59 +(.3)(-.02)+(.07)(.29)+(.3)(-.125)(.29)+(.07)(-.125)(-.02) = .59 [.6]
    

    Overidentified Model

    Now let's look at an overidentified model.

    pathreg (read ses)(write read female)
    
    ------------------------------------------------------------------------------
            read |      Coef.   Std. Err.      t    P>|t|                     Beta
    -------------+----------------------------------------------------------------
             ses |    4.15221   .9617596     4.32   0.000                 .2933218
           _cons |   43.69721   2.095003    20.86   0.000                        .
    ------------------------------------------------------------------------------
                     n = 200  R2 = 0.0860  sqrt(1 - R2) = 0.9560
    
    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|                     Beta
    -------------+----------------------------------------------------------------
            read |   .5658869   .0493849    11.46   0.000                 .6121169
          female |   5.486894   1.014261     5.41   0.000                 .2889851
           _cons |   20.22837   2.713756     7.45   0.000                        .
    ------------------------------------------------------------------------------
                     n = 200  R2 = 0.4394  sqrt(1 - R2) = 0.7487
    
    

    Reproducing Correlations

    r*12 = r12
            U
         = -.125 [-.125]
    
    r*13 = P31  
            DE        
         = .29  = .29 [.29]
    
    r*14 = P42r12      +  P43P31 
              U             IE       
         = (.29)(-.125) + (.61)(.29) = .14 [.21]
    
    r*23 = P31r12     
              U        
         = (.29)(-.125) = -.04  [-.05]
    
    r*24 = P42 + P43P31r12 
            DE      U                         
         =  .29 +(.61)(.29)(-.125) = .27 [.26]
    
    r*34 = P43 + P42r12P31    
            DE      S     
         = .61 + (.29)(-.125)(.29) = .6 [.6]
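
Comparing reproduced and actual correlations by eye can be supplemented with a formal test. The sketch below assumes Stata 12 or later and uses the official sem command rather than pathreg. It fits the same overidentified model, then asks for the chi-square test against the saturated (just identified) model and for the residuals, i.e., observed minus model-implied moments. Note that sem works with covariances by default, so the residuals are not on the correlation scale used above.

```stata
use http://www.philender.com/courses/data/hsbdemo, clear

* Overidentified model: the paths ses -> write and female -> read are omitted
sem (read <- ses) (write <- read female), standardized

* Likelihood-ratio chi-square of this model against the saturated model
estat gof

* Differences between observed and model-implied covariances
estat residuals
```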
    


    Linear Statistical Models Course

    Phil Ender, 29Jan98