
Scatterplots are used to plot two variables simultaneously, i.e., to display the joint distribution (also called a bivariate distribution) of two variables. A scatterplot that appears as a random circular cluster of points indicates a low association between the variables, which are then said to have a low degree of correlation (a correlation close to zero). As the degree of association, and thus the correlation, increases, the circular cluster becomes more elliptical. The narrower the ellipse, the higher the correlation between the two variables and the greater the ability to predict one variable from the other.
If the main axis of the ellipse slopes up to the right, the association is positive; if it slopes down to the right, the association is negative. With positive correlations, higher values on one variable are associated with higher values on the other variable. When the correlation is negative, higher values on one variable are associated with lower values on the other.
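The link between the strength of association and the sample correlation can be sketched with simulated data. This is a minimal illustration, not part of the original notes: the helper names pearson_r and simulate_pairs, the seed, and the sample size are all made up for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pairs(rho, n, rng):
    """Draw n (x, y) pairs whose population correlation is rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Mixing x with independent noise gives corr(x, y) = rho.
        y = rho * x + math.sqrt(1 - rho ** 2) * noise
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(1998)  # arbitrary seed, for repeatability only
for rho in (0.0, 0.5, 0.9, -0.9):
    xs, ys = simulate_pairs(rho, 2000, rng)
    print(f"target rho = {rho:+.1f}   sample r = {pearson_r(xs, ys):+.3f}")
```

Plotting each set of pairs would show the cloud of points tightening from a round cluster toward a narrow ellipse as the correlation moves away from zero, sloping up for positive values and down for negative ones.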
Selected Scatter Plots

Pearson Product Moment Correlation Coefficient
Also known as the Pearson correlation coefficient, or simply the correlation coefficient.
Correlation coefficients can take on any value between -1 and +1, with ±1 representing a perfect correlation between the variables. A correlation of zero represents no linear relationship between the variables.
A rule of thumb for interpreting correlation coefficients:
Corr        Interpretation
0 to .1     trivial
.1 to .3    small
.3 to .5    moderate
.5 to .7    large
.7 to .9    very large
Correlations are interpreted by squaring the value of the correlation coefficient. The squared value represents the proportion of variance of one variable that is shared with the other variable; in other words, the proportion of the variance of one variable that can be predicted from the other variable.
The squared correlation coefficient, r², is known as the coefficient of determination. The proportion of variance that cannot be predicted or accounted for by the other variable is 1 − r², which is also known as the coefficient of alienation.
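A short sketch of the arithmetic shows how quickly the shared variance falls off as the correlation drops. The function names are made up for this illustration; the value 0.5968 is the read/write correlation reported in the Stata output later in these notes.

```python
def variance_shared(r):
    """Coefficient of determination: proportion of variance shared."""
    return r ** 2


def variance_unshared(r):
    """Proportion of variance not accounted for, 1 - r^2, as defined above."""
    return 1 - r ** 2


# 0.5968 is the read/write correlation from the hsb2 example.
for r in (0.10, 0.30, 0.50, 0.5968, 0.90):
    print(f"r = {r:.4f}   r^2 = {variance_shared(r):.3f}   "
          f"1 - r^2 = {variance_unshared(r):.3f}")
```

A correlation of .50, "large" by the rule of thumb, accounts for only 25 percent of the variance; the read/write correlation of .5968 accounts for about 36 percent.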
Percent of Variance Accounted For

Correlation and Sample Size
Correlation coefficients do not lend themselves to small sample sizes. The following table gives the recommended sample size for detecting various correlations with power = 0.80 and alpha = 0.05.
corr     n
.10    617
.20    153
.30     68
.40     37
.50     22
.60     15
.70     10
.80      7
.90      5
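The notes do not say how the table was computed. As a rough cross-check, the sketch below uses the Fisher z approximation, n ≈ ((z_alpha + z_power) / atanh(r))² + 3, and assumes a one-tailed test (an assumption, not something stated in the notes). Under that assumption the result lands within about one case of the table for correlations up to .40 and runs a little above the table for larger correlations. The function name approx_n_for_correlation is made up for this illustration.

```python
import math
from statistics import NormalDist


def approx_n_for_correlation(r, alpha=0.05, power=0.80, tails=1):
    """Approximate n needed to detect a population correlation r.

    Fisher z approximation: n = ((z_alpha + z_power) / atanh(r))^2 + 3.
    tails=1 (assumed here) gives a one-tailed test; tails=2 a two-tailed test.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / tails)
    z_power = std_normal.inv_cdf(power)
    return ((z_alpha + z_power) / math.atanh(r)) ** 2 + 3


for r in (0.10, 0.20, 0.30, 0.40, 0.50):
    print(f"r = {r:.2f}   approx n = {approx_n_for_correlation(r):.1f}")
```

Passing tails=2 gives the larger sample sizes required for a two-tailed test at the same alpha and power.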
Covariance
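The sample covariance of X and Y is conventionally written as follows. The n − 1 denominator is assumed here; it is the form Stata reports for correlate with the cov option.

```latex
s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
```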

Consider the variance as being the covariance of a variable with itself.
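A minimal sketch of that identity, assuming the usual n − 1 denominator. The covariance helper and the scores are made up for illustration; statistics.variance from the Python standard library serves as an independent check.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


scores = [52, 57, 63, 44, 68, 60]   # made-up illustrative scores
other = [50, 59, 65, 41, 62, 54]    # made-up illustrative scores

# Covariance of a variable with itself equals its variance.
print(f"cov(scores, scores) = {covariance(scores, scores):.4f}")
print(f"var(scores)         = {statistics.variance(scores):.4f}")
print(f"cov(scores, other)  = {covariance(scores, other):.4f}")
```

This is why the diagonal of a covariance matrix holds the variances.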

The Sample Correlation Coefficient
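In its standard textbook form, the sample correlation coefficient is the covariance of X and Y divided by the product of their standard deviations, where s_XY is the sample covariance and s_X and s_Y are the sample standard deviations:

```latex
r_{XY} = \frac{s_{XY}}{s_X \, s_Y}
```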

In deviation score form.
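Writing the deviation scores as x = X − X̄ and y = Y − Ȳ (lowercase letters for deviations is the usual convention, assumed here), the standard deviation score form is:

```latex
r_{XY} = \frac{\sum xy}{\sqrt{\left(\sum x^{2}\right)\left(\sum y^{2}\right)}}
```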

Calculator Computational Formula
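The usual raw-score form, suited to a calculator that accumulates sums, is r = (NΣXY − ΣXΣY) / sqrt[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)]. The sketch below computes r both this way and from deviation scores to show that the two agree. The function names and the data are made up for illustration.

```python
import math


def r_deviation(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def r_computational(xs, ys):
    """Pearson r from raw-score sums, as keyed into a calculator."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x = [1, 2, 3, 4, 5]   # made-up illustrative data
y = [2, 4, 5, 4, 5]

# Both forms give 6 / sqrt(60) = 30 / sqrt(1500), about 0.7746.
print(round(r_deviation(x, y), 4), round(r_computational(x, y), 4))
```

The raw-score form avoids computing the means first, at the cost of working with larger intermediate sums.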

Sources of Misleading Correlation Coefficients
Restriction of Range
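A simulated illustration of restriction of range, offered as a sketch rather than as the example originally shown here. The population correlation of 0.7, the cutoff of one standard deviation, the seed, and the helper pearson_r are all assumptions made for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # arbitrary seed
rho = 0.7                # assumed population correlation
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]

# Keep only cases more than one standard deviation above the mean on x,
# as when only high scorers are admitted to a program.
kept = [(x, y) for x, y in zip(xs, ys) if x > 1.0]
xs_kept = [x for x, _ in kept]
ys_kept = [y for _, y in kept]

r_full = pearson_r(xs, ys)
r_restricted = pearson_r(xs_kept, ys_kept)
print(f"full range r = {r_full:.3f}   restricted range r = {r_restricted:.3f}")
```

The correlation within the restricted group is noticeably lower than in the full sample, even though the underlying relationship is the same.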

Extreme Groups
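A simulated illustration of the extreme groups problem, again as a sketch under stated assumptions: a population correlation of 0.5, a selection rule that keeps only cases more than one standard deviation from the mean on x, an arbitrary seed, and a made-up helper pearson_r.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)   # arbitrary seed
rho = 0.5                # assumed population correlation
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]

# Keep only the extremes: more than one SD above or below the mean on x.
extreme = [(x, y) for x, y in zip(xs, ys) if abs(x) > 1.0]

r_all = pearson_r(xs, ys)
r_extreme = pearson_r([x for x, _ in extreme], [y for _, y in extreme])
print(f"all cases r = {r_all:.3f}   extreme groups only r = {r_extreme:.3f}")
```

Dropping the middle of the distribution inflates the correlation relative to the full sample.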

Combining Groups
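A small made-up example of how combining groups can mislead: within each group below the correlation is zero, but the groups differ in their means on both variables, so the pooled data show a strong correlation. The data and the helper pearson_r are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Two made-up groups with no within-group association.
group_a_x, group_a_y = [1, 2, 3], [1, 2, 1]
group_b_x, group_b_y = [11, 12, 13], [11, 12, 11]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)
print(f"group A r = {r_a:.3f}   group B r = {r_b:.3f}   pooled r = {r_pooled:.3f}")
```

The pooled correlation reflects the difference between the groups, not any relationship within them.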

Outliers
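A small made-up example of the effect of a single outlier. The data and the helper pearson_r are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]   # made-up data with a weak negative association
y = [3, 5, 1, 4, 2]

r_without = pearson_r(x, y)
# Add one case that lies far from the rest on both variables.
r_with = pearson_r(x + [20], y + [20])
print(f"without outlier r = {r_without:.3f}   with outlier r = {r_with:.3f}")
```

With these numbers a single added case moves r from -0.30 to about 0.95.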

Curvilinearity
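A small made-up example of curvilinearity: y is completely determined by x, yet the correlation coefficient, which measures only linear association, is zero. The data and the helper pearson_r are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]   # perfect U-shaped relationship

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

A scatterplot of these points would show the U shape immediately, which is one reason to plot the data before interpreting r.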

Discuss Correlation & Causation
Of course, just because two variables are correlated does not mean that they are causally related. Often a third variable that is not included in the analysis, a lurking variable, is responsible for (causes) both of the first two variables. A lurking variable is a variable that loiters in the background and affects both of the original variables.
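A simulated sketch of a lurking variable: x and y are each generated from z plus independent noise, so neither causes the other, yet they correlate. Holding z constant with the first-order partial correlation formula, (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)), removes the association. The data-generating model, the seed, and the helper pearson_r are assumptions made for this illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)   # arbitrary seed
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]    # lurking variable
x = [zi + rng.gauss(0, 1) for zi in z]     # x depends only on z and noise
y = [zi + rng.gauss(0, 1) for zi in z]     # y depends only on z and noise

r_xy = pearson_r(x, y)
r_xz = pearson_r(x, z)
r_yz = pearson_r(y, z)

# First-order partial correlation of x and y with z held constant.
r_xy_given_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(f"r(x, y) = {r_xy:.3f}   r(x, y | z) = {r_xy_given_z:.3f}")
```

The raw correlation is substantial, while the partial correlation is close to zero, because z accounts for everything x and y have in common.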
Stata Examples
use http://www.philender.com/courses/data/hsb2, clear
correlate female read write math science socst, cov /* covariance matrix */
(obs=200)

         |   female     read    write     math  science    socst
---------+------------------------------------------------------
  female |  .249221
    read | -.271709  105.123
   write |  1.21369  57.9967  89.8436
    math | -.137211  63.6147  54.8293  87.7678
 science | -.631407  63.9693  53.5339  58.5043  98.0276
   socst |  .280678  68.4089  61.5438  54.7626  49.4379  115.257

correlate female read write math science socst /* correlation matrix */
(obs=200)

         |   female     read    write     math  science    socst
---------+------------------------------------------------------
  female |   1.0000
    read |  -0.0531   1.0000
   write |   0.2565   0.5968   1.0000
    math |  -0.0293   0.6623   0.6174   1.0000
 science |  -0.1277   0.6302   0.5704   0.6307   1.0000
   socst |   0.0524   0.6215   0.6048   0.5445   0.4651   1.0000

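scatter write read /* basic scatterplot of write against read */
scatter write read, jitter(2) /* jitter separates overlapping points */
twoway scatter write read, jitter(2) || lfit write read /* overlay the linear fit line */
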
use http://www.philender.com/courses/data/hsb1, clear /* contains missing data */
pwcorr read write math science socst, obs star(.05) /* pairwise correlations; obs shows each n, star(.05) flags p < .05 */

          |    read    write     math  science    socst
----------+---------------------------------------------
     read |  1.0000
          |     200
          |
    write |  0.5968*  1.0000
          |     200      200
          |
     math |  0.6623*  0.6174*  1.0000
          |     200      200      200
          |
  science |  0.6171*  0.5671*  0.6166*  1.0000
          |     195      195      195      195
          |
    socst |  0.6215*  0.6048*  0.5445*  0.4529*  1.0000
          |     200      200      200      195      200

Phil Ender, 15Jan98