Introduction to Research Design and Statistics

Simpson's Paradox


Airline Example

Consider these data on on-time performance for two airlines, Alaska Airlines and America West.
Alaska Airlines
Airport         No. On-time    No. Delayed   Pct Delayed
los angeles           497          62          11.1%
phoenix               221          12           5.2%
san diego             212          20           8.6%
san francisco         503         102          16.9%
seattle              1841         305          14.2%
----------------------------------------------------
total                3274         501          13.3%

America West
Airport         No. On-time    No. Delayed   Pct Delayed
los angeles           694         117          14.4%
phoenix              4840         415           7.9%
san diego             383          65          14.5%
san francisco         320         129          28.7%
seattle               201          61          23.3%
----------------------------------------------------
total                6438         787          10.9%
At every one of the five airports, Alaska Airlines has a lower percentage of delayed flights than America West. Overall, however, America West has the lower percentage delayed (10.9% versus 13.3%).

What can explain this discrepancy?
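One way to look for an answer is to recompute the percentages from the two tables and also check what share of each airline's flights goes through each airport. The short Python sketch below does this. The dictionary layout and the helper names `pct_delayed` and `summarize` are illustrative choices only. About 57% of Alaska Airlines flights are at Seattle, which has higher delay rates than Phoenix for both airlines. About 73% of America West flights are at Phoenix, the airport with the lowest delay rate for both airlines. Each airline's overall percentage is a weighted average of its airport percentages, and the two airlines weight the airports very differently.

```python
# Flight counts copied from the tables above: airport -> (on_time, delayed).
ALASKA = {
    "los angeles": (497, 62),
    "phoenix": (221, 12),
    "san diego": (212, 20),
    "san francisco": (503, 102),
    "seattle": (1841, 305),
}
AMERICA_WEST = {
    "los angeles": (694, 117),
    "phoenix": (4840, 415),
    "san diego": (383, 65),
    "san francisco": (320, 129),
    "seattle": (201, 61),
}


def pct_delayed(on_time, delayed):
    """Percent of all flights that were delayed."""
    return 100.0 * delayed / (on_time + delayed)


def summarize(flights):
    """Return per-airport (pct delayed, pct share of flights) and the overall pct delayed."""
    total_flights = sum(on + late for on, late in flights.values())
    per_airport = {
        airport: (pct_delayed(on, late), 100.0 * (on + late) / total_flights)
        for airport, (on, late) in flights.items()
    }
    total_on = sum(on for on, _ in flights.values())
    total_late = sum(late for _, late in flights.values())
    return per_airport, pct_delayed(total_on, total_late)


for name, flights in [("Alaska Airlines", ALASKA), ("America West", AMERICA_WEST)]:
    per_airport, overall = summarize(flights)
    print(name)
    for airport, (pct, share) in per_airport.items():
        print(f"  {airport:<14} {pct:5.1f}% delayed  {share:5.1f}% of flights")
    print(f"  {'overall':<14} {overall:5.1f}% delayed")
```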

Let's look at an educational example.

Test Scores in 1980

Consider the following fictitious set of test scores for three sub-groups from the year 1980.

----------+-----------------------------------------
    group |       sum              n      mean(1980)
----------+-----------------------------------------
        1 |   6500000.00        10,000        650.00
        2 |    640000.00         1,000        640.00
        3 |     60000.00           100        600.00
          | 
    Total |   7200000.00        11,100        648.65
----------+-----------------------------------------

Note that the means of the three groups are 650, 640 and 600. The overall mean of the sample is equal to 648.65.

Why is the overall mean equal to 648.65 and not 630? Isn't the average of 650, 640 and 600 equal to 630?

The overall mean is much higher than 630 because there were many more students in the group with a mean of 650 than in either the group with a mean of 640 or the group with a mean of 600. The overall mean must be computed as a weighted average of the sub-group means: multiply the mean of each sub-group by the number of students in that sub-group, add up these products, and then divide by the total number of students in the overall sample.
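The weighted-average rule can be written as overall mean = (n1*m1 + n2*m2 + n3*m3) / (n1 + n2 + n3), where n is the number of students and m is the mean in each sub-group. The minimal Python sketch below uses the group sizes and means from the 1980 table. It computes both the weighted mean and, for comparison, the simple unweighted average of the three group means. The function names are illustrative only.

```python
# 1980 sub-groups from the table above: (number of students, group mean).
groups_1980 = [(10_000, 650.0), (1_000, 640.0), (100, 600.0)]


def weighted_mean(groups):
    """Overall mean: sum of n * mean over the sub-groups, divided by the total n."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


def unweighted_mean(groups):
    """Simple average of the sub-group means, ignoring group sizes."""
    return sum(mean for _, mean in groups) / len(groups)


print(f"weighted:   {weighted_mean(groups_1980):.2f}")   # 648.65
print(f"unweighted: {unweighted_mean(groups_1980):.2f}")  # 630.00
```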

Test Scores in 1990

Next we look at the same three sub-groups ten years later, in 1990.

----------+-----------------------------------------
    group |       sum              n      mean(1990)
----------+-----------------------------------------
        1 |   6550000.00        10,000        655.00
        2 |   6450000.00        10,000        645.00
        3 |   1830000.00         3,000        610.00
          | 
    Total |  14830000.00        23,000        644.78
----------+-----------------------------------------

Note that the mean of each of the three groups has gone up (to 655, 645, and 610). Groups 1 and 2 have each improved by 5 points, and group 3 has improved by 10 points. However, the overall mean of the sample has gone down, from 648.65 to 644.78.

Why does this happen?
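One way to see what is going on is to compare the sizes of the groups in the two years. The Python sketch below uses data copied from the two tables. It prints each group's share of the sample in 1980 and 1990 and then recomputes the 1990 overall mean as if the group sizes had stayed at their 1980 values. That last figure is a hypothetical comparison and is not part of the original data. Group 1 falls from about 90% of the sample to about 43%, while the lower-scoring groups 2 and 3 together grow from about 10% to about 57%. With the 1980 group sizes, the higher 1990 group means would give an overall mean of about 653.69 rather than 644.78.

```python
# Sub-group data from the 1980 and 1990 tables: group -> (n, mean).
data_1980 = {1: (10_000, 650.0), 2: (1_000, 640.0), 3: (100, 600.0)}
data_1990 = {1: (10_000, 655.0), 2: (10_000, 645.0), 3: (3_000, 610.0)}


def weighted_mean(sizes, means):
    """Weighted average of the group means, weighted by the given group sizes."""
    total_n = sum(sizes.values())
    return sum(sizes[g] * means[g] for g in sizes) / total_n


def split(data):
    """Separate a {group: (n, mean)} table into a sizes dict and a means dict."""
    return {g: n for g, (n, _) in data.items()}, {g: m for g, (_, m) in data.items()}


sizes_80, means_80 = split(data_1980)
sizes_90, means_90 = split(data_1990)

for g in sorted(data_1980):
    share_80 = 100 * sizes_80[g] / sum(sizes_80.values())
    share_90 = 100 * sizes_90[g] / sum(sizes_90.values())
    print(f"group {g}: mean {means_80[g]:.0f} -> {means_90[g]:.0f}, "
          f"share of sample {share_80:.1f}% -> {share_90:.1f}%")

print(f"overall 1980: {weighted_mean(sizes_80, means_80):.2f}")
print(f"overall 1990: {weighted_mean(sizes_90, means_90):.2f}")
# Hypothetical comparison: 1990 group means combined with the 1980 group sizes.
print(f"1990 means with 1980 sizes: {weighted_mean(sizes_80, means_90):.2f}")
```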

Simpson's Paradox

This is an example of Simpson's Paradox. Simpson's Paradox refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group (Moore & McCabe). It is named after Edward Simpson, who described it in a 1951 paper, although the British statistician G. Udny Yule had described it in the early 1900s.

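Simpson's Paradox can occur whenever data are aggregated (combined). If data are collapsed across a sub-classification (such as grade, race, or age), the overall change may not represent what is really happening within the sub-groups. This is because the overall figure depends both on the sub-group results and on how the cases are distributed across the sub-groups.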

A Real-World Example

The table below shows average total SAT scores by ethnic group for 1976 and 1990. The average for every non-white group rose or held essentially the same (the Puerto Rican average went from 765 to 764), while the average for the population of test takers as a whole declined.

Total SAT Subpopulation Scores by Ethnic Group
Year     White    Black       Asian    American     Mexican    Puerto
                           American      Indian    American     Rican
---------------------------------------------------------------------
1976       944      686         932         808         781       765
1990       933      737         938         825         809       764
Source: Berliner, D. (1993). Educational Reform in an Era of Disinformation. Education Policy Analysis Archives.
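The table gives only the sub-group averages. It does not give the number of test takers in each group, so the overall decline cannot be recomputed from it the way the test-score example was. The short Python sketch below does only the part the table supports: it computes each group's change from 1976 to 1990. The dictionary layout is an illustrative choice.

```python
# Average total SAT scores from the table above: group -> (1976, 1990).
sat = {
    "White": (944, 933),
    "Black": (686, 737),
    "Asian American": (932, 938),
    "American Indian": (808, 825),
    "Mexican American": (781, 809),
    "Puerto Rican": (765, 764),
}

# Change in each group's average between 1976 and 1990.
changes = {group: y1990 - y1976 for group, (y1976, y1990) in sat.items()}

for group, diff in changes.items():
    print(f"{group:<17} {diff:+d}")
```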



Phil Ender, 30Jun98