Path Analysis Background
Some Definitions
Path Analysis Assumptions
Consider the Model
Decomposition of correlations:
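Each correlation can be decomposed into one or more of the following four types of effects:
1 - Direct effects (DE): a single path from one variable to another.
2 - Indirect effects (IE): a compound path from one variable to another through one or more intervening variables.
3 - Spurious effects (S): a compound path through a common cause of both variables.
4 - Unanalyzed effects (U): a compound path that passes through the correlation (double-headed arrow) between two exogenous variables.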
Effects relating variables 1 and 2:
Path Tracing to Reproduce Correlations
A. If variable j sends a path to variable k, which in turn sends a path to variable i, either in two steps or through other intervening variables, simply trace back from i, through k, to j, multiplying path coefficients as you go. If more than one distinct compound path links i back to j, trace each one separately.
B. A double-headed arrow, representing the correlation between two exogenous variables, can be traversed only once during any compound path. Traversing a double-headed correlation arrow always results in a change of direction. Tracing through a correlation multiplies the compound path by the correlation coefficient.
C. All the legitimate compound paths in the path diagram must be traced. Multiply the values along each compound path to get the magnitude and sign of that compound effect, then sum the compound effects to obtain the reproduced correlation.
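The three rules can be applied mechanically. The Python sketch below is one way to do that, assuming a recursive model in which variables are identified by number, each one-headed arrow k → j is stored as `paths[(j, k)] = P_jk` (effect first, matching the P21 notation), and each double-headed arrow between exogenous variables is stored in `corrs`. It walks backward from both variables, allows at most one double-headed arrow at the top (rule B), rejects compound paths that reuse a variable, multiplies coefficients along each path and labels it DE, IE, S, or U. The function and argument names are hypothetical; this is an illustration, not a published routine. The usage lines plug in the Model 1 path coefficients estimated from the ped788 data later in this section.

```python
from math import prod


def back_chains(v, parents):
    """All chains v <- p <- ... made by walking against the arrows,
    including the trivial chain [v]."""
    chains = [[v]]
    for p in parents.get(v, []):
        chains += [[v] + ch for ch in back_chains(p, parents)]
    return chains


def chain_value(chain, paths):
    """Product of the path coefficients along one backward chain (1 if empty)."""
    return prod(paths[(child, parent)] for child, parent in zip(chain, chain[1:]))


def trace(i, j, paths, corrs):
    """List (label, value) for every legitimate compound path linking i and j."""
    parents = {}
    for child, parent in paths:
        parents.setdefault(child, []).append(parent)
    terms = []
    for bi in back_chains(i, parents):
        for bj in back_chains(j, parents):
            top_i, top_j = bi[-1], bj[-1]
            shared = set(bi) & set(bj)
            if top_i == top_j and shared == {top_i}:
                # Back from i to a common source, then forward to j; no variable reused.
                value = chain_value(bi, paths) * chain_value(bj, paths)
                if len(bi) == 1 or len(bj) == 1:
                    # One variable is the source of the other: a causal chain.
                    label = "DE" if len(bi) + len(bj) == 3 else "IE"
                else:
                    label = "S"
            elif not shared and frozenset((top_i, top_j)) in corrs:
                # Cross exactly one double-headed arrow between the two sources.
                r = corrs[frozenset((top_i, top_j))]
                value = chain_value(bi, paths) * r * chain_value(bj, paths)
                label = "U"
            else:
                continue
            terms.append((label, value))
    return terms


# Model 1 (ped788): 1 = ses, 2 = iq, 3 = am, 4 = gpa; no double-headed arrows.
model1 = {(2, 1): .300, (3, 1): .398, (3, 2): .041,
          (4, 1): .009, (4, 2): .501, (4, 3): .416}
for label, value in trace(3, 4, model1, {}):
    print(label, round(value, 4))
print("r*14 =", round(sum(v for _, v in trace(1, 4, model1, {})), 3))
```

For a model with correlated exogenous variables, pass the correlation keyed by a `frozenset`, for example `{frozenset((1, 2)): .30}`; the paths through it come back labeled U.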
With the following variables from the ped788 data:
1 - ses
2 - iq
3 - am
4 - gpa
```
use http://www.philender.com/courses/data/ped788, clear

corr ses iq am gpa
(obs=300)

         |      ses       iq       am      gpa
---------+------------------------------------
     ses |   1.0000
      iq |   0.3000   1.0000
      am |   0.4100   0.1600   1.0000
     gpa |   0.3300   0.5700   0.5000   1.0000

regress iq ses, beta

  Source |       SS       df       MS                  Number of obs =     300
---------+------------------------------               F(  1,   298) =   29.47
   Model |  26.9099995     1  26.9099995               Prob > F      =  0.0000
Residual |  272.089997   298   .91305368               R-squared     =  0.0900
---------+------------------------------               Adj R-squared =  0.0869
   Total |  298.999996   299  .999999987               Root MSE      =  .95554

------------------------------------------------------------------------------
      iq |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
     ses |         .3   .0552602      5.429   0.000                         .3
   _cons |  -7.10e-09    .055168      0.000   1.000                          .
------------------------------------------------------------------------------

display sqrt(1-.09)
.9539392

regress am ses iq, beta

  Source |       SS       df       MS                  Number of obs =     300
---------+------------------------------               F(  2,   297) =   30.33
   Model |  50.7117145     2  25.3558572               Prob > F      =  0.0000
Residual |  248.288282   297  .835987481               R-squared     =  0.1696
---------+------------------------------               Adj R-squared =  0.1640
   Total |  298.999996   299  .999999988               Root MSE      =  .91432

------------------------------------------------------------------------------
      am |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
     ses |   .3978022   .0554298      7.177   0.000                   .3978022
      iq |   .0406593   .0554298      0.734   0.464                   .0406593
   _cons |  -8.73e-09   .0527885      0.000   1.000                          .
------------------------------------------------------------------------------

display sqrt(1-.1696)
.91126286

regress gpa am ses iq, beta

  Source |       SS       df       MS                  Number of obs =     300
---------+------------------------------               F(  3,   296) =   97.28
   Model |  148.445584     3  49.4818613               Prob > F      =  0.0000
Residual |  150.554414   296  .508629776               R-squared     =  0.4965
---------+------------------------------               Adj R-squared =  0.4914
   Total |  298.999998   299  .999999992               Root MSE      =  .71318

------------------------------------------------------------------------------
     gpa |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      am |   .4161263   .0452609      9.194   0.000                   .4161263
     ses |   .0091893    .046835      0.196   0.845                   .0091893
      iq |    .500663   .0432751     11.569   0.000                    .500663
   _cons |   1.23e-09   .0411756      0.000   1.000                          .
------------------------------------------------------------------------------

display sqrt(1-.4965)
.70957734
```

Estimated path coefficients from multiple regression analyses:
P21 = .300
P31 = .398
P32 = .041
P41 = .009
P42 = .501
P43 = .416
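Because every variable is standardized, each set of Beta values above is the solution of the normal equations R_xx β = r_xy built from the correlation matrix alone; R² is then Σ β·r_xy, and the residual path printed by `display` is sqrt(1 - R²). The pure-Python sketch below reproduces the three regressions from the `corr` output under exactly those assumptions. `solve`, `corr`, and `path_coefficients` are hypothetical helper names, not Stata commands.

```python
# Correlations from the corr output above (ped788, n = 300).
R = {("ses", "iq"): .30, ("ses", "am"): .41, ("iq", "am"): .16,
     ("ses", "gpa"): .33, ("iq", "gpa"): .57, ("am", "gpa"): .50}


def corr(a, b):
    """Look up r_ab in either order; a variable correlates 1 with itself."""
    return 1.0 if a == b else R.get((a, b), R.get((b, a)))


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def path_coefficients(y, causes):
    """Betas of y on its causes, R-squared, and the residual path sqrt(1 - R-squared)."""
    betas = solve([[corr(a, b) for b in causes] for a in causes],
                  [corr(a, y) for a in causes])
    r2 = sum(beta * corr(x, y) for beta, x in zip(betas, causes))
    return betas, r2, (1 - r2) ** 0.5


for y, causes in [("iq", ["ses"]), ("am", ["ses", "iq"]), ("gpa", ["am", "ses", "iq"])]:
    betas, r2, resid = path_coefficients(y, causes)
    print(y, dict(zip(causes, (round(beta, 3) for beta in betas))), round(r2, 4), round(resid, 4))
```

Working from the correlation matrix makes it clear that a path model on standardized variables depends on the data only through those correlations.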
Path Analysis Example 1: Just-Identified Model
Compare actual and reproduced correlations: Model 1
To test whether the model fits the data, compare actual correlations to reproduced correlations based on paths in the model. We denote actual correlations by r and reproduced correlations by r*. The actual correlations are in brackets below.
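$$r^*_{12} = \underbrace{P_{21}}_{\text{DE}} = .300 \quad [.300]$$

$$r^*_{13} = \underbrace{P_{31}}_{\text{DE}} + \underbrace{P_{32}P_{21}}_{\text{IE}} = .398 + (.041)(.3) = .410 \quad [.410]$$

$$r^*_{14} = \underbrace{P_{41}}_{\text{DE}} + \underbrace{P_{42}P_{21}}_{\text{IE}} + \underbrace{P_{43}P_{31}}_{\text{IE}} + \underbrace{P_{43}P_{32}P_{21}}_{\text{IE}} = .009 + (.501)(.30) + (.416)(.398) + (.416)(.041)(.30) = .330 \quad [.330]$$

$$r^*_{23} = \underbrace{P_{31}P_{21}}_{\text{S}} + \underbrace{P_{32}}_{\text{DE}} = (.398)(.30) + .041 = .160 \quad [.160]$$

$$r^*_{24} = \underbrace{P_{41}P_{21}}_{\text{S}} + \underbrace{P_{42}}_{\text{DE}} + \underbrace{P_{43}P_{31}P_{21}}_{\text{S}} + \underbrace{P_{43}P_{32}}_{\text{IE}} = (.009)(.30) + .501 + (.416)(.398)(.30) + (.416)(.041) = .570 \quad [.570]$$

$$r^*_{34} = \underbrace{P_{41}P_{31}}_{\text{S}} + \underbrace{P_{41}P_{21}P_{32}}_{\text{S}} + \underbrace{P_{42}P_{21}P_{31}}_{\text{S}} + \underbrace{P_{42}P_{32}}_{\text{S}} + \underbrace{P_{43}}_{\text{DE}} = (.009)(.398) + (.009)(.30)(.041) + (.501)(.30)(.398) + (.501)(.041) + .416 = .500 \quad [.500]$$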
Note:
This is not a very interesting example because the reproduced and original correlations will be the same: the model is just-identified, with every possible path among the variables included (no paths deleted).
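The note can be checked algebraically for r*13. With standardized variables, the two Beta weights from regressing variable 3 on variables 1 and 2 are given by the usual two-predictor formulas, and P21 = r12 because variable 1 is the only cause of variable 2. Substituting them into r*13 = P31 + P32P21 returns r13 exactly, whatever the data:

$$P_{31} = \frac{r_{13} - r_{12}r_{23}}{1 - r_{12}^2}, \qquad P_{32} = \frac{r_{23} - r_{12}r_{13}}{1 - r_{12}^2}, \qquad P_{21} = r_{12}$$

$$r^*_{13} = P_{31} + P_{32}P_{21} = \frac{r_{13} - r_{12}r_{23} + r_{12}r_{23} - r_{12}^2\, r_{13}}{1 - r_{12}^2} = \frac{r_{13}\,(1 - r_{12}^2)}{1 - r_{12}^2} = r_{13}$$

With the ped788 values, P31 = (.41 - (.30)(.16))/(1 - .09) = .362/.91 = .398 and P32 = (.16 - (.30)(.41))/.91 = .037/.91 = .041, matching the regress output. The same substitution in r*23 = P32 + P31P21 returns r23. In a just-identified recursive model every correlation is reproduced exactly in this way; once a path is deleted, as in Model 2, exact reproduction is no longer guaranteed.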
Path Analysis: Model 2
```
regress am ses, beta

  Source |       SS       df       MS                  Number of obs =     300
---------+------------------------------               F(  1,   298) =   60.22
   Model |  50.2619001     1  50.2619001               Prob > F      =  0.0000
Residual |  248.738096   298  .834691598               R-squared     =  0.1681
---------+------------------------------               Adj R-squared =  0.1653
   Total |  298.999996   299  .999999988               Root MSE      =  .91361

------------------------------------------------------------------------------
      am |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
     ses |        .41   .0528357      7.760   0.000                        .41
   _cons |  -9.02e-09   .0527476      0.000   1.000                          .
------------------------------------------------------------------------------

display sqrt(1-.1681)
.91208552

regress gpa iq am, beta

  Source |       SS       df       MS                  Number of obs =     300
---------+------------------------------               F(  2,   297) =  146.38
   Model |  148.426003     2  74.2130017               Prob > F      =  0.0000
Residual |  150.573994   297  .506983146               R-squared     =  0.4964
---------+------------------------------               Adj R-squared =  0.4930
   Total |  298.999998   299  .999999992               Root MSE      =  .71203

------------------------------------------------------------------------------
     gpa |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      iq |   .5028736    .041715     12.055   0.000                   .5028736
      am |   .4195402    .041715     10.057   0.000                   .4195402
   _cons |   1.16e-09   .0411089      0.000   1.000                          .
------------------------------------------------------------------------------

display sqrt(1-.4964)
.7096478
```
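With the Model 2 Beta values in hand (P31 = .410, P42 = .503, P43 = .420, rounded), the reproduced correlations can also be computed without listing paths, using the recursive relation r*_ij = Σ_k P_jk r*_ik, where k runs over the direct causes of the later variable j (sometimes called the first law of path analysis). It gives the same totals as the tracing rules when the variables are standardized, numbered in causal order, and each disturbance is uncorrelated with the variables that precede it; those are the assumptions of this sketch. The function name `reproduce` and its dictionary layout are hypothetical. Variables are 1 = ses, 2 = iq, 3 = am, 4 = gpa, and in Model 2 ses and iq are linked only by their correlation r12 = .30.

```python
def reproduce(n, paths, corrs):
    """Reproduced correlations r*_ij (i < j) for a recursive model.

    paths: {(j, k): P_jk} for each arrow k -> j, with k < j.
    corrs: {(i, j): r_ij} for correlated exogenous pairs; other pairs of
           variables without causes are treated as uncorrelated.
    """
    rstar = {}

    def get(a, b):
        return 1.0 if a == b else rstar[(min(a, b), max(a, b))]

    for j in range(1, n + 1):
        causes = [k for (jj, k) in paths if jj == j]
        for i in range(1, j):
            if causes:
                # r*_ij = sum over the direct causes k of j of P_jk * r*_ik
                rstar[(i, j)] = sum(paths[(j, k)] * get(i, k) for k in causes)
            else:
                # Both variables exogenous: use the observed correlation, if any.
                rstar[(i, j)] = corrs.get((i, j), 0.0)
    return rstar


model2 = {(3, 1): .410, (4, 2): .503, (4, 3): .420}
actual = {(1, 2): .30, (1, 3): .41, (1, 4): .33, (2, 3): .16, (2, 4): .57, (3, 4): .50}
for pair, value in reproduce(4, model2, {(1, 2): .30}).items():
    print(f"r*{pair[0]}{pair[1]} = {value:.3f}  [{actual[pair]:.2f}]  residual {actual[pair] - value:+.3f}")
```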
Reproduced correlations: Model 2
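$$r^*_{12} = \underbrace{r_{12}}_{\text{U}} = .30 \quad [.30]$$

$$r^*_{13} = \underbrace{P_{31}}_{\text{DE}} = .410 \quad [.410]$$

$$r^*_{14} = \underbrace{P_{42}r_{12}}_{\text{U}} + \underbrace{P_{43}P_{31}}_{\text{IE}} = (.503)(.30) + (.420)(.410) = .323 \quad [.330]$$

$$r^*_{23} = \underbrace{P_{31}r_{12}}_{\text{U}} = (.410)(.30) = .123 \quad [.160]$$

$$r^*_{24} = \underbrace{P_{42}}_{\text{DE}} + \underbrace{P_{43}P_{31}r_{12}}_{\text{U}} = .503 + (.420)(.410)(.30) = .555 \quad [.570]$$

$$r^*_{34} = \underbrace{P_{42}P_{31}r_{12}}_{\text{U}} + \underbrace{P_{43}}_{\text{DE}} = (.503)(.410)(.30) + .420 = .482 \quad [.50]$$

Terman Data Set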
Variables:
1 - Parents' education
2 - Father's occupation
3 - Parents' attitude
4 - IQ
5 - Achievement
6 - Education level
7 - Occupation
8 - Income
Terman Model 1
Terman Model 2
Reproduced Correlations: Terman Model 2
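$$r^*_{14} = \underbrace{P_{41}}_{\text{DE}} + \underbrace{P_{43}r_{13}}_{\text{U}}$$

$$r^*_{15} = \underbrace{P_{54}P_{41}}_{\text{IE}} + \underbrace{P_{54}P_{43}r_{13}}_{\text{U}}$$

$$r^*_{16} = \underbrace{P_{61}}_{\text{DE}}$$

$$r^*_{17} = \underbrace{P_{75}P_{54}P_{41}}_{\text{IE}} + \underbrace{P_{75}P_{54}P_{43}r_{13}}_{\text{U}} + \underbrace{P_{76}P_{61}}_{\text{IE}}$$

$$r^*_{18} = \underbrace{P_{87}P_{75}P_{54}P_{41}}_{\text{IE}} + \underbrace{P_{87}P_{75}P_{54}P_{43}r_{13}}_{\text{U}} + \underbrace{P_{87}P_{76}P_{61}}_{\text{IE}}$$

$$r^*_{34} = \underbrace{P_{41}r_{13}}_{\text{U}} + \underbrace{P_{43}}_{\text{DE}}$$

$$r^*_{35} = \underbrace{P_{54}P_{41}r_{13}}_{\text{U}} + \underbrace{P_{54}P_{43}}_{\text{IE}}$$

$$r^*_{36} = \underbrace{P_{61}r_{13}}_{\text{U}}$$

$$r^*_{37} = \underbrace{P_{75}P_{54}P_{41}r_{13}}_{\text{U}} + \underbrace{P_{75}P_{54}P_{43}}_{\text{IE}} + \underbrace{P_{76}P_{61}r_{13}}_{\text{U}}$$

$$r^*_{38} = \underbrace{P_{87}P_{75}P_{54}P_{41}r_{13}}_{\text{U}} + \underbrace{P_{87}P_{75}P_{54}P_{43}}_{\text{IE}} + \underbrace{P_{87}P_{76}P_{61}r_{13}}_{\text{U}}$$

$$r^*_{45} = \underbrace{P_{54}}_{\text{DE}}$$

$$r^*_{46} = \underbrace{P_{61}P_{41}}_{\text{S}} + \underbrace{P_{61}P_{43}r_{13}}_{\text{U}}$$

$$r^*_{47} = \underbrace{P_{75}P_{54}}_{\text{IE}} + \underbrace{P_{76}P_{61}P_{41}}_{\text{S}} + \underbrace{P_{76}P_{61}P_{43}r_{13}}_{\text{U}}$$

$$r^*_{48} = \underbrace{P_{87}P_{75}P_{54}}_{\text{IE}} + \underbrace{P_{87}P_{76}P_{61}P_{41}}_{\text{S}} + \underbrace{P_{87}P_{76}P_{61}P_{43}r_{13}}_{\text{U}}$$

$$r^*_{56} = \underbrace{P_{61}P_{54}P_{41}}_{\text{S}} + \underbrace{P_{61}P_{54}P_{43}r_{13}}_{\text{U}}$$

$$r^*_{57} = \underbrace{P_{75}}_{\text{DE}} + \underbrace{P_{76}P_{61}P_{54}P_{41}}_{\text{S}} + \underbrace{P_{76}P_{61}P_{54}P_{43}r_{13}}_{\text{U}}$$

$$r^*_{58} = \underbrace{P_{87}P_{75}}_{\text{IE}} + \underbrace{P_{87}P_{76}P_{61}P_{54}P_{41}}_{\text{S}} + \underbrace{P_{87}P_{76}P_{61}P_{54}P_{43}r_{13}}_{\text{U}}$$

$$r^*_{67} = \underbrace{P_{75}P_{61}P_{54}P_{41}}_{\text{S}} + \underbrace{P_{75}P_{61}P_{54}P_{43}r_{13}}_{\text{U}} + \underbrace{P_{76}}_{\text{DE}}$$

$$r^*_{68} = \underbrace{P_{87}P_{75}P_{61}P_{54}P_{41}}_{\text{S}} + \underbrace{P_{87}P_{75}P_{61}P_{54}P_{43}r_{13}}_{\text{U}} + \underbrace{P_{87}P_{76}}_{\text{IE}}$$

$$r^*_{78} = \underbrace{P_{87}}_{\text{DE}}$$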
Reproduced and actual correlations: Terman Model 2
| Correlation | Reproduced (r*) | Actual (r) | Note |
|---|---|---|---|
| r13 | .03 | .03 | |
| r14 | .16 | .16 | |
| r15 | -.016 | .07 | possible mismatch |
| r16 | .31 | .31 | |
| r17 | .10 | .08 | |
| r18 | .04 | .06 | |
| r34 | .08 | .08 | |
| r35 | -.008 | .003 | |
| r36 | .01 | .14 | mismatch |
| r37 | .003 | .09 | |
| r38 | .001 | .08 | |
| r45 | -.10 | -.10 | |
| r46 | .05 | .10 | |
| r47 | .02 | .08 | possible mismatch |
| r48 | .008 | .09 | possible mismatch |
| r56 | -.005 | .06 | possible mismatch |
| r57 | -.001 | .02 | |
| r58 | .04 | -.01 | |
| r67 | .32 | .32 | |
| r68 | .13 | .20 | possible mismatch |
| r78 | .41 | .41 | |
Terman Model 3
Estimated Equations: Terman Model 3
z'6 = P61 z1
z'7 = P76 z6
z'8 = P87 z7
Reproduced and actual correlations: Terman Model 3
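$$r^*_{16} = \underbrace{P_{61}}_{\text{DE}} = .31 \quad [.31]$$

$$r^*_{17} = \underbrace{P_{76}P_{61}}_{\text{IE}} = (.32)(.31) = .10 \quad [.08]$$

$$r^*_{18} = \underbrace{P_{87}P_{76}P_{61}}_{\text{IE}} = (.41)(.32)(.31) = .04 \quad [.06]$$

$$r^*_{67} = \underbrace{P_{76}}_{\text{DE}} = .32 \quad [.32]$$

$$r^*_{68} = \underbrace{P_{87}P_{76}}_{\text{IE}} = (.41)(.32) = .13 \quad [.20] \quad \text{(possible mismatch)}$$

$$r^*_{78} = \underbrace{P_{87}}_{\text{DE}} = .41 \quad [.41]$$

Example Using Stata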
We will use the hsbdemo dataset. For the purposes of this example, ses will be treated as continuous even though it is categorical. In this example, ses and female are exogenous, while read and write are endogenous; the variables are numbered 1 = ses, 2 = female, 3 = read, and 4 = write. Here is our just-identified model.
```
use http://www.philender.com/courses/data/hsbdemo, clear

corr ses female read write
(obs=200)

         |      ses   female     read    write
---------+------------------------------------
     ses |   1.0000
  female |  -0.1250   1.0000
    read |   0.2933  -0.0531   1.0000
   write |   0.2075   0.2565   0.5968   1.0000

regress read ses female, beta

  Source |       SS       df       MS                  Number of obs =     200
---------+------------------------------               F(  2,   197) =    9.30
   Model |  1805.58553     2  902.792765               Prob > F      =  0.0001
Residual |  19113.8345   197  97.0245405               R-squared     =  0.0863
---------+------------------------------               Adj R-squared =  0.0770
   Total |    20919.42   199  105.122714               Root MSE      =  9.8501

------------------------------------------------------------------------------
    read |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
     ses |   4.122699   .9716753      4.243   0.000                   .2912371
  female |  -.3425006    1.40975     -0.243   0.808                  -.0166765
   _cons |   43.94452   2.333705      18.83   0.000                          .
------------------------------------------------------------------------------

display sqrt(1 - .08632)
.9558661

regress write read ses female, beta

  Source |       SS       df       MS                  Number of obs =     200
---------+------------------------------               F(  3,   196) =   52.17
   Model |  7937.69723     3  2645.89908               Prob > F      =  0.0000
Residual |  9941.17777   196  50.7202947               R-squared     =  0.4440
---------+------------------------------               Adj R-squared =  0.4355
   Total |   17878.875   199   89.843593               Root MSE      =  7.1218

------------------------------------------------------------------------------
   write |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
    read |   .5470064    .051513     10.619   0.000                    .591694
     ses |   .9296443   .7339381      1.267   0.207                   .0710373
  female |   5.634919    1.01943      5.528   0.000                   .2967813
   _cons |    19.2234   2.823373       6.81   0.000                          .
------------------------------------------------------------------------------

display sqrt(1 - .444)
.74565408
```
A Shortcut
Let's say that you are kind of lazy and don't want to run a separate regression for each endogenous variable and compute the residual path, sqrt(1 - R2), at each stage. Here is a convenience command that you can use if you have Stata 7.
```
/* user-written program -- findit pathreg */
pathreg (read ses female)(write ses read female)

------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         ses |   4.122699   .9716753     4.24   0.000                 .2912371
      female |  -.3425006    1.40975    -0.24   0.808                -.0166765
       _cons |   43.94452   2.333705    18.83   0.000                        .
------------------------------------------------------------------------------
n = 200   R2 = 0.0863   sqrt(1 - R2) = 0.9559

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         ses |   .9296443   .7339381     1.27   0.207                 .0710373
        read |   .5470064    .051513    10.62   0.000                  .591694
      female |   5.634919    1.01943     5.53   0.000                 .2967813
       _cons |    19.2234   2.823373     6.81   0.000                        .
------------------------------------------------------------------------------
n = 200   R2 = 0.4440   sqrt(1 - R2) = 0.7457
```
Reproducing Correlations
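$$r^*_{12} = \underbrace{r_{12}}_{\text{U}} = -.125 \quad [-.125]$$

$$r^*_{13} = \underbrace{P_{31}}_{\text{DE}} + \underbrace{P_{32}r_{12}}_{\text{U}} = .29 + (-.02)(-.125) = .29 \quad [.29]$$

$$r^*_{14} = \underbrace{P_{41}}_{\text{DE}} + \underbrace{P_{42}r_{12}}_{\text{U}} + \underbrace{P_{43}P_{31}}_{\text{IE}} + \underbrace{P_{43}P_{32}r_{12}}_{\text{U}} = .07 + (.3)(-.125) + (.59)(.29) + (.59)(-.02)(-.125) = .21 \quad [.21]$$

$$r^*_{23} = \underbrace{P_{32}}_{\text{DE}} + \underbrace{P_{31}r_{12}}_{\text{U}} = -.02 + (.29)(-.125) = -.06 \quad [-.05]$$

$$r^*_{24} = \underbrace{P_{42}}_{\text{DE}} + \underbrace{P_{43}P_{32}}_{\text{IE}} + \underbrace{P_{43}P_{31}r_{12}}_{\text{U}} + \underbrace{P_{41}r_{12}}_{\text{U}} = .3 + (.59)(-.02) + (.59)(.29)(-.125) + (.07)(-.125) = .26 \quad [.26]$$

$$r^*_{34} = \underbrace{P_{43}}_{\text{DE}} + \underbrace{P_{42}P_{32}}_{\text{S}} + \underbrace{P_{41}P_{31}}_{\text{S}} + \underbrace{P_{42}r_{12}P_{31}}_{\text{U}} + \underbrace{P_{41}r_{12}P_{32}}_{\text{U}} = .59 + (.3)(-.02) + (.07)(.29) + (.3)(-.125)(.29) + (.07)(-.125)(-.02) = .59 \quad [.60]$$

The remaining discrepancies in r*23 and r*34 come from rounding the path coefficients to two decimals; with the full-precision Beta values, the just-identified model reproduces every correlation exactly.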
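To confirm that the small differences between r* and r above are rounding error, the sketch below evaluates the reproduced correlations term by term, one term per legitimate compound path, using the full-precision Beta values from the regress output and r12 from the corr output, and compares them with the observed correlations. The variable names are illustrative only.

```python
# Full-precision Beta values (hsbdemo): 1 = ses, 2 = female, 3 = read, 4 = write.
r12 = -0.1250                                    # corr(ses, female)
p31, p32 = 0.2912371, -0.0166765                 # read on ses, female
p41, p42, p43 = 0.0710373, 0.2967813, 0.591694   # write on ses, female, read

# One term per legitimate compound path in the just-identified model.
rstar = {
    "13": p31 + p32 * r12,
    "14": p41 + p42 * r12 + p43 * p31 + p43 * p32 * r12,
    "23": p32 + p31 * r12,
    "24": p42 + p43 * p32 + p43 * p31 * r12 + p41 * r12,
    "34": p43 + p42 * p32 + p41 * p31 + p42 * r12 * p31 + p41 * r12 * p32,
}
# Observed correlations from the corr output.
actual = {"13": 0.2933, "14": 0.2075, "23": -0.0531, "24": 0.2565, "34": 0.5968}

for key in rstar:
    print(f"r*{key} = {rstar[key]:.4f}  [{actual[key]:.4f}]")
```

At four decimals every reproduced value agrees with the observed correlation, as expected for a just-identified model.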
Overidentified Model
Now let's look at an overidentified model, which drops the path from female to read and the path from ses to write.
```
pathreg (read ses)(write read female)

------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         ses |    4.15221   .9617596     4.32   0.000                 .2933218
       _cons |   43.69721   2.095003    20.86   0.000                        .
------------------------------------------------------------------------------
n = 200   R2 = 0.0860   sqrt(1 - R2) = 0.9560

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
        read |   .5658869   .0493849    11.46   0.000                 .6121169
      female |   5.486894   1.014261     5.41   0.000                 .2889851
       _cons |   20.22837   2.713756     7.45   0.000                        .
------------------------------------------------------------------------------
n = 200   R2 = 0.4394   sqrt(1 - R2) = 0.7487
```
Reproducing Correlations
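$$r^*_{12} = \underbrace{r_{12}}_{\text{U}} = -.125 \quad [-.125]$$

$$r^*_{13} = \underbrace{P_{31}}_{\text{DE}} = .29 \quad [.29]$$

$$r^*_{14} = \underbrace{P_{42}r_{12}}_{\text{U}} + \underbrace{P_{43}P_{31}}_{\text{IE}} = (.29)(-.125) + (.61)(.29) = .14 \quad [.21]$$

$$r^*_{23} = \underbrace{P_{31}r_{12}}_{\text{U}} = (.29)(-.125) = -.04 \quad [-.05]$$

$$r^*_{24} = \underbrace{P_{42}}_{\text{DE}} + \underbrace{P_{43}P_{31}r_{12}}_{\text{U}} = .29 + (.61)(.29)(-.125) = .27 \quad [.26]$$

$$r^*_{34} = \underbrace{P_{43}}_{\text{DE}} + \underbrace{P_{42}r_{12}P_{31}}_{\text{U}} = .61 + (.29)(-.125)(.29) = .60 \quad [.60]$$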
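A residual check shows where the overidentified model misfits. The sketch below recomputes each reproduced correlation from the full-precision Beta values in the pathreg output (1 = ses, 2 = female, 3 = read, 4 = write), subtracts it from the observed correlation in the corr output, and sorts the residuals r - r* by absolute size. The variable names are illustrative only.

```python
# Full-precision Beta values from the pathreg output (overidentified model).
r12 = -0.1250                     # corr(ses, female)
p31 = 0.2933218                   # read on ses
p42, p43 = 0.2889851, 0.6121169   # write on female, read

# One term per legitimate compound path in the overidentified model.
rstar = {
    "13": p31,
    "14": p42 * r12 + p43 * p31,
    "23": p31 * r12,
    "24": p42 + p43 * p31 * r12,
    "34": p43 + p42 * r12 * p31,
}
# Observed correlations from the corr output.
actual = {"13": 0.2933, "14": 0.2075, "23": -0.0531, "24": 0.2565, "34": 0.5968}

residuals = {key: actual[key] - rstar[key] for key in rstar}
for key in sorted(residuals, key=lambda k: abs(residuals[k]), reverse=True):
    print(f"r{key} - r*{key} = {residuals[key]:+.3f}")
```

The two largest residuals, for ses and write (r14) and for female and read (r23), belong to the two pairs whose direct paths were dropped from the just-identified model.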