Statistical significance is certainly important, but it is not necessarily the most important consideration in evaluating research results. Statistical significance tells us only how likely it is that results like ours would arise by chance alone. Once we have determined statistical significance, our concern should be with the effect size. Effect size is an indicator of how strong or how important our results are.
One common method of indicating effect size is to express the difference in means in terms of standard deviations (not standard errors, as in the t-test). One approach is to use the standard deviation of the control group (Glass' delta), but more commonly the pooled standard deviation (Cohen's d) is used. We compute the pooled standard deviation as follows:
sp = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2))

Thus, the effect size can be calculated as
d = (mean1 - mean2)/sp

You can ignore the sign of d. (Technically, we are computing Hedges' g, but for all but very small sample sizes Hedges' g and Cohen's d are virtually equal.)
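The two formulas above can be checked outside Stata with a few lines of code. What follows is a minimal Python sketch, not part of the original Stata workflow. The function names `pooled_sd` and `cohens_d` are made up for illustration, and the summary statistics are hypothetical numbers chosen so the arithmetic is easy to follow by hand.

```python
import math

def pooled_sd(n1, var1, n2, var2):
    """Pooled standard deviation from group sizes and sample variances."""
    return math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))

def cohens_d(mean1, mean2, sp):
    """Difference in means expressed in pooled standard deviation units."""
    return (mean1 - mean2) / sp

# Hypothetical summary statistics (NOT from any dataset in this document).
# (10*8 + 20*20)/30 = 16, so the pooled standard deviation is 4.
sp = pooled_sd(n1=11, var1=8.0, n2=21, var2=20.0)
d = cohens_d(mean1=54.0, mean2=50.0, sp=sp)
print(sp, d)
```

Note that the functions take variances, not standard deviations, which mirrors the s1^2 and s2^2 terms in the formula.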
Cohen gives the following very rough guidelines for interpreting the effect size d: 0.2 is a small effect, 0.5 is a medium effect, and 0.8 is a large effect.
  female |         N      mean        sd  variance
---------+----------------------------------------
    male |        91  50.12088  10.30516  106.1963
  female |       109  54.99083  8.133715  66.15732
---------+----------------------------------------

The pooled standard deviation is sp = 9.18 and the effect size is d = -4.87/9.18 = -0.53, which is a medium effect.
Here are the details of the computation:
sp = sqrt(((91-1)*106.2 + (109-1)*66.16)/(91+109-2))
sp = sqrt((9558 + 7145.28)/198)
sp = sqrt(16703.28/198)
sp = sqrt(84.36)
sp = 9.18

d = (50.12 - 54.99)/9.18 = -4.87/9.18 = -0.53

Let's try again by comparing the mean of the academic group versus the others. We will use Stata to do much of the computation.
generate academic = prog==2

ttest read, by(academic)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      95    47.88421    .9429248    9.190494    46.01201    49.75641
       1 |     105     56.1619     .935769    9.588779    54.30624    58.01757
---------+--------------------------------------------------------------------
combined |     200       52.23    .7249921    10.25294    50.80035    53.65965
---------+--------------------------------------------------------------------
    diff |           -8.277694     1.33128                 -10.903   -5.652387
------------------------------------------------------------------------------
Degrees of freedom: 198

                  Ho: mean(0) - mean(1) = diff = 0

     Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
       t =  -6.2178                t =  -6.2178              t =  -6.2178
   P < t =   0.0000          P > |t| =   0.0000          P > t =   1.0000

tabstat read, stat(n mean sd var) by(academic)

Summary for variables: read
     by categories of: academic

academic |         N      mean        sd  variance
---------+----------------------------------------
       0 |        95  47.88421  9.190494  84.46517
       1 |       105   56.1619  9.588779  91.94469
---------+----------------------------------------
   Total |       200     52.23  10.25294  105.1227
--------------------------------------------------

display sqrt(((95-1)*84.47+(105-1)*91.94)/(95+105-2))
9.401789

display (47.88-56.16)/9.4
-.88085106

In this example, d = -0.88, which is a large effect size.
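To see how much the choice of denominator matters, the academic example can be recomputed with both the pooled standard deviation (Cohen's d) and a single group's standard deviation (Glass' delta). This is a rough Python sketch using the summary statistics from the tabstat output. Treating group 0 as the "control" group is purely an assumption made for illustration; the text does not designate a control group.

```python
import math

# Summary statistics copied from the tabstat output (read by academic).
n0, mean0, sd0 = 95, 47.88421, 9.190494    # academic == 0
n1, mean1, sd1 = 105, 56.1619, 9.588779    # academic == 1

# Pooled standard deviation, the denominator of Cohen's d.
sp = math.sqrt(((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / (n0 + n1 - 2))
cohen_d = (mean0 - mean1) / sp

# Glass' delta divides by one group's standard deviation instead.
# ASSUMPTION: group 0 plays the role of the control group here.
glass_delta = (mean0 - mean1) / sd0

print(f"sp = {sp:.2f}, d = {cohen_d:.2f}, delta = {glass_delta:.2f}")
```

Because the two groups have similar standard deviations, the two measures come out close to each other in this example.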
Here is some more detailed information on effect size.
Effect size can also be thought of as the average percentile standing of the average treatment (or experimental) participant relative to the average untreated (or control) participant. A d of 0.0 indicates that the mean of the treatment group is at the 50th percentile of the control group. A d of 0.8 indicates that the mean of the treatment group is at the 79th percentile of the control group. A d of 1.7 indicates that the mean of the treatment group is at the 95.5th percentile of the control group.
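The percentile standings quoted above are consistent with evaluating the standard normal cumulative distribution function at d. The following Python sketch shows that reading. It assumes both groups are normally distributed with the same standard deviation, which the text does not state explicitly, and the function name `percentile_standing` is ours.

```python
import math

def percentile_standing(d):
    """Percentile of the control distribution at which the treatment mean falls.

    ASSUMPTION: both groups are normal with equal standard deviations.
    """
    # Standard normal CDF written in terms of the error function.
    return 100 * 0.5 * (1 + math.erf(d / math.sqrt(2)))

for d in (0.0, 0.8, 1.7):
    print(d, round(percentile_standing(d), 1))
```

For d = 0.8 the unrounded value is a little under 79, which rounds to the 79th percentile given in the text.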
Effect sizes can also be interpreted in terms of the percent of nonoverlap of the treatment group's scores with those of the untreated group. A d of 0.0 indicates that the distribution of scores for the treatment group overlaps completely with the distribution of scores for the control group; that is, there is 0% nonoverlap. A d of 0.8 indicates a nonoverlap of 47.4% in the two distributions. A d of 1.7 indicates a nonoverlap of 75.4% in the two distributions.
Cohen's Standard | Effect Size d | Percentile Standing | Percent Nonoverlap |
                 | 2.0           | 97.7                | 81.1%              |
                 | 1.9           | 97.1                | 79.4%              |
                 | 1.8           | 96.4                | 77.4%              |
                 | 1.7           | 95.5                | 75.4%              |
                 | 1.6           | 94.5                | 73.1%              |
                 | 1.5           | 93.3                | 70.7%              |
                 | 1.4           | 91.9                | 68.1%              |
                 | 1.3           | 90                  | 65.3%              |
                 | 1.2           | 88                  | 62.2%              |
                 | 1.1           | 86                  | 58.9%              |
                 | 1.0           | 84                  | 55.4%              |
                 | 0.9           | 82                  | 51.6%              |
large            | 0.8           | 79                  | 47.4%              |
                 | 0.7           | 76                  | 43.0%              |
                 | 0.6           | 73                  | 38.2%              |
medium           | 0.5           | 69                  | 33.0%              |
                 | 0.4           | 66                  | 27.4%              |
                 | 0.3           | 62                  | 21.3%              |
small            | 0.2           | 58                  | 14.7%              |
                 | 0.1           | 54                  | 7.7%               |
                 | 0.0           | 50                  | 0%                 |
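The percent nonoverlap figures appear to match Cohen's U1 measure for two normal distributions with equal standard deviations. That identification is our reading of the numbers, not something the text states, so treat the sketch below as a plausibility check rather than the definitive formula. The function names are ours.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def percent_nonoverlap(d):
    """Cohen's U1: percent of the combined area not shared by the two groups.

    ASSUMPTION: both groups are normal with equal standard deviations.
    """
    p = normal_cdf(abs(d) / 2)
    return 100 * (2 * p - 1) / p

for d in (0.0, 0.8, 1.7):
    print(d, round(percent_nonoverlap(d), 1))
```

Values computed this way may differ from a published table in the last digit because of rounding.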
Phil Ender, 12Nov03