Alaska Airlines

Airport          No. On-time   No. Delayed   Pct Delayed
los angeles              497            62         11.1%
phoenix                  221            12          5.2%
san diego                212            20          8.6%
san francisco            503           102         16.9%
seattle                 1841           305         14.2%
--------------------------------------------------------
total                   3274           501         13.3%

America West

Airport          No. On-time   No. Delayed   Pct Delayed
los angeles              694           117         14.4%
phoenix                 4840           415          7.9%
san diego                383            65         14.5%
san francisco            320           129         28.7%
seattle                  201            61         23.3%
--------------------------------------------------------
total                   6438           787         10.9%

It is interesting that at each airport Alaska Airlines has a lower percent delayed than America West, but overall America West has a lower percent delayed.
What can explain this discrepancy?
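One way to see where the reversal comes from is to recompute the rates from the counts in the two tables. The Python sketch below is only an illustration, not part of the original analysis; the names `flights`, `pct_delayed`, `overall_pct_delayed`, and `share_of_flights` are invented for this example. Besides the delay rates, it prints what share of each airline's flights uses each airport, which is the piece the per-airport comparison hides.

```python
# Counts copied from the two tables: airport -> (on-time, delayed).
flights = {
    "Alaska Airlines": {
        "los angeles": (497, 62),
        "phoenix": (221, 12),
        "san diego": (212, 20),
        "san francisco": (503, 102),
        "seattle": (1841, 305),
    },
    "America West": {
        "los angeles": (694, 117),
        "phoenix": (4840, 415),
        "san diego": (383, 65),
        "san francisco": (320, 129),
        "seattle": (201, 61),
    },
}


def pct_delayed(on_time, delayed):
    """Percent of flights delayed (hypothetical helper for this sketch)."""
    return 100.0 * delayed / (on_time + delayed)


def overall_pct_delayed(airports):
    """Pool all airports first, then compute the percent delayed."""
    total_on_time = sum(on_time for on_time, _ in airports.values())
    total_delayed = sum(delayed for _, delayed in airports.values())
    return pct_delayed(total_on_time, total_delayed)


def share_of_flights(airports, airport):
    """Fraction of an airline's flights that use the given airport."""
    total = sum(on_time + delayed for on_time, delayed in airports.values())
    return sum(airports[airport]) / total


for airline, airports in flights.items():
    print(airline)
    for airport, (on_time, delayed) in airports.items():
        rate = pct_delayed(on_time, delayed)
        share = 100.0 * share_of_flights(airports, airport)
        print(f"  {airport:<14}{rate:5.1f}% delayed, {share:5.1f}% of flights")
    print(f"  {'overall':<14}{overall_pct_delayed(airports):5.1f}% delayed")
```

Most America West flights are at Phoenix, the airport with the lowest delay rate for both airlines, while more than half of the Alaska Airlines flights are at Seattle, where delays are more common. Each airline's overall rate is therefore dominated by a different airport.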
Let's look at an educational example.
----------+-----------------------------------------
    group |         sum          n    mean(1980)
----------+-----------------------------------------
        1 |  6500000.00     10,000        650.00
        2 |   640000.00      1,000        640.00
        3 |    60000.00        100        600.00
          |
    Total |  7200000.00     11,100        648.65
----------+-----------------------------------------
Note that the means of the three groups are 650, 640 and 600. The overall mean of the sample is equal to 648.65.
Why is the overall mean equal to 648.65 and not 630? Isn't the average of 650, 640 and 600 equal to 630?
The overall mean is much higher than 630 because far more students belong to the group with a mean of 650 than to the groups with means of 640 or 600. The overall mean must be computed as the weighted average of the sub-group means: multiply the mean of each sub-group by the number of students in that sub-group, add up the products, and then divide by the total number of students in the overall sample.
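The difference between the two averages can be checked directly. This is a minimal Python sketch using the group sizes and means from the 1980 table; the variable names are chosen for this illustration only.

```python
# Group sizes and means from the 1980 table.
group_n = [10_000, 1_000, 100]
group_mean = [650.0, 640.0, 600.0]

# Unweighted average: treats every group as equally important.
simple_mean = sum(group_mean) / len(group_mean)

# Weighted average: each group mean counts once per student in that group.
weighted_mean = sum(n * m for n, m in zip(group_n, group_mean)) / sum(group_n)

print(f"simple mean of group means: {simple_mean:.2f}")
print(f"weighted (overall) mean:    {weighted_mean:.2f}")
```

The weighted result matches the Total row of the table because the products `n * m` are exactly the group sums listed in the `sum` column.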
----------+-----------------------------------------
    group |          sum          n    mean(1990)
----------+-----------------------------------------
        1 |   6550000.00     10,000        655.00
        2 |   6450000.00     10,000        645.00
        3 |   1830000.00      3,000        610.00
          |
    Total |  14830000.00     23,000        644.78
----------+-----------------------------------------
Note that the mean of each of the three groups has gone up (to 655, 645, and 610). In fact, group three has improved by 10 points. However, the overall mean of the sample has gone down from 648.65 to 644.78.
Why does this happen?
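It happens because the mix of students changed: group 1, the highest-scoring group, made up about 90% of the sample in 1980 but only about 43% in 1990, so the lower-scoring groups carry far more weight in the 1990 overall mean. This is an example of Simpson's Paradox, which can occur whenever data are aggregated (combined). If data are collapsed across a sub-classification (like grades, race, or age), the overall change may not represent what is really happening within each subgroup.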
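The effect of the changing group sizes can be checked with a short Python sketch. It uses the sizes and means from the 1980 and 1990 tables; the names `years`, `overall_mean`, and `group_shares` are invented for this illustration.

```python
# (n, mean) per group, copied from the 1980 and 1990 tables.
years = {
    1980: {1: (10_000, 650.0), 2: (1_000, 640.0), 3: (100, 600.0)},
    1990: {1: (10_000, 655.0), 2: (10_000, 645.0), 3: (3_000, 610.0)},
}


def overall_mean(groups):
    """Weighted mean of the group means, using group sizes as weights."""
    total_n = sum(n for n, _ in groups.values())
    return sum(n * mean for n, mean in groups.values()) / total_n


def group_shares(groups):
    """Fraction of all students that falls in each group."""
    total_n = sum(n for n, _ in groups.values())
    return {g: n / total_n for g, (n, _) in groups.items()}


for year, groups in years.items():
    shares = group_shares(groups)
    print(year, f"overall mean = {overall_mean(groups):.2f}")
    for g, (n, mean) in groups.items():
        print(f"  group {g}: mean {mean:.0f}, {100 * shares[g]:.1f}% of students")
```

Every group mean rises between the two years, but the weights move toward the lower-scoring groups, and that shift outweighs the gains within each group.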
Total SAT Subpopulation Scores by Ethnic Group

Year | White | Black | Asian | American Indian | Mexican American | Puerto Rican
1976 |   944 |   686 |   932 |             808 |              781 |          765
1990 |   933 |   737 |   938 |             825 |              809 |          764
Berliner, D. (1993). Educational Reform in an Era of Disinformation. Education Policy Analysis Archives.
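The SAT table can be read in the same subgroup-by-subgroup way. As a small illustration (the names `sat` and `changes` are invented for this sketch), the Python code below computes the change from 1976 to 1990 for each group. The table gives no group sizes, so an overall mean cannot be computed from it; the sketch covers only the subgroup changes.

```python
# Scores copied from the SAT table: group -> (1976 score, 1990 score).
sat = {
    "White": (944, 933),
    "Black": (686, 737),
    "Asian": (932, 938),
    "American Indian": (808, 825),
    "Mexican American": (781, 809),
    "Puerto Rican": (765, 764),
}

# Change in score for each group between the two years.
changes = {group: s1990 - s1976 for group, (s1976, s1990) in sat.items()}

for group, change in changes.items():
    print(f"{group:<17}{change:+d}")
```

Four of the six groups score higher in 1990 than in 1976, so any decline in a combined average would have to come largely from a change in who took the test, as in the example above.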
Phil Ender, 30Jun98