Chi-Square Formula
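
In the notation of the worked tables on this page (O for an observed frequency, E for an expected frequency), the Pearson chi-square statistic for a table with r rows and c columns is usually written as follows. The subscripts i and j are introduced here only to index rows and columns.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad df = (r - 1)(c - 1)
```

The degrees of freedom agree with the Stata output on this page: chi2(1) for a 2x2 table, chi2(4) for a 3x3 table, and chi2(6) for a 4x3 table.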

Marginals
Observed frequencies

| Monday Night Football | Males | Females | Total |
|---|---|---|---|
| Watch | 65 | 30 | 95 |
| Don't Watch | 55 | 50 | 105 |
| Total | 120 | 80 | 200 |

Percentages of the grand total

| Monday Night Football | Males | Females | Total |
|---|---|---|---|
| Watch | 32.5% | 15% | 47.5% |
| Don't Watch | 27.5% | 25% | 52.5% |
| Total | 60% | 40% | 100% |

Row percentages

| Monday Night Football | Males | Females | Total |
|---|---|---|---|
| Watch | 68.4% | 31.6% | 100% |
| Don't Watch | 52.4% | 47.6% | 100% |

Column percentages

| Monday Night Football | Males | Females |
|---|---|---|
| Watch | 54.2% | 37.5% |
| Don't Watch | 45.8% | 62.5% |
| Total | 100% | 100% |
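
The total, row, and column percentages above can be reproduced from the four observed counts. The following is a minimal Python sketch, not part of the original Stata workflow; the variable names and the `pct` helper are illustrative.

```python
# Observed counts from the Monday Night Football example.
# Rows: Watch, Don't Watch. Columns: Males, Females.
observed = [[65, 30],
            [55, 50]]

row_totals = [sum(row) for row in observed]          # marginal totals for rows
col_totals = [sum(col) for col in zip(*observed)]    # marginal totals for columns
n = sum(row_totals)                                  # grand total

def pct(part, whole):
    """Percentage rounded to one decimal place (illustrative helper)."""
    return round(100 * part / whole, 1)

# Each cell as a percentage of the grand total, of its row, and of its column.
total_pct = [[pct(x, n) for x in row] for row in observed]
row_pct = [[pct(x, rt) for x in row] for row, rt in zip(observed, row_totals)]
col_pct = [[pct(x, ct) for x, ct in zip(row, col_totals)] for row in observed]

print(total_pct)
print(row_pct)
print(col_pct)
```

Row percentages answer "of those who watch, what share are male?", while column percentages answer "of the males, what share watch?"; the two are generally different.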
Expected Frequencies Under Independence

| Monday Night Football | Males | Females | Total |
|---|---|---|---|
| Watch | 57 | 38 | 95 |
| Don't Watch | 63 | 42 | 105 |
| Total | 120 | 80 | 200 |
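
Under independence, each expected frequency is (row total x column total) / grand total. A short Python sketch of that rule, using the observed counts from the football example (the names are illustrative, and Python is used here only as a cross-check on the table):

```python
# Observed counts. Rows: Watch, Don't Watch. Columns: Males, Females.
observed = [[65, 30],
            [55, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

print(expected)
```

The expected table has the same marginal totals as the observed table; only the cell counts change.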
Hypotheses
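
For a chi-square test of independence the hypotheses are usually stated along the following lines. The wording in terms of the football example is an illustration, not taken from the original page.

```latex
H_0:\ \text{watching Monday Night Football is independent of gender} \\
H_1:\ \text{watching Monday Night Football is not independent of gender}
```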
Example Continued
Observed frequencies, with expected frequencies in parentheses

| Monday Night Football | Males | Females | Total |
|---|---|---|---|
| Watch | 65 (57) | 30 (38) | 95 |
| Don't Watch | 55 (63) | 50 (42) | 105 |
| Total | 120 | 80 | 200 |
| O | E | O-E | (O-E)^2 | (O-E)^2/E |
|---|---|---|---|---|
| 65 | 57 | 8 | 64 | 1.12 |
| 55 | 63 | -8 | 64 | 1.02 |
| 30 | 38 | -8 | 64 | 1.68 |
| 50 | 42 | 8 | 64 | 1.52 |
| 200 | 200 | | Totals | 5.35 = Chi-square |
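
The hand calculation in this table can be checked with a few lines of Python. This is a sketch under stated assumptions: it uses only the standard library, and for the p-value it relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so the upper-tail probability is erfc(sqrt(x/2)). That shortcut applies to df = 1 only.

```python
import math

# Observed counts. Rows: Watch, Don't Watch. Columns: Males, Females.
observed = [[65, 30],
            [55, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum (O - E)^2 / E over all four cells.
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability, valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 4), df, round(p_value, 3))
```

The unrounded statistic is slightly larger than the sum of the rounded entries in the last column, which is why the table total is reported as 5.35.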
Using Stata
tabi 65 30 \ 55 50, exp row col chi2
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
| row percentage |
| column percentage |
+--------------------+
| col
row | 1 2 | Total
-----------+----------------------+----------
1 | 65 30 | 95
| 57.0 38.0 | 95.0
| 68.42 31.58 | 100.00
| 54.17 37.50 | 47.50
-----------+----------------------+----------
2 | 55 50 | 105
| 63.0 42.0 | 105.0
| 52.38 47.62 | 100.00
| 45.83 62.50 | 52.50
-----------+----------------------+----------
Total | 120 80 | 200
| 120.0 80.0 | 200.0
| 60.00 40.00 | 100.00
| 100.00 100.00 | 100.00
Pearson chi2(1) = 5.3467 Pr = 0.021
Example 2:
Observed frequencies (rows: smoking status; columns: SES)

| Smoking | High | Middle | Low | Total |
|---|---|---|---|---|
| Current | 51 | 22 | 43 | 116 |
| Former | 92 | 21 | 28 | 141 |
| Never | 68 | 9 | 22 | 99 |
| Total | 211 | 52 | 93 | 356 |

Column percentages

| Smoking | High | Middle | Low |
|---|---|---|---|
| Current | 24.2 | 42.3 | 46.2 |
| Former | 43.6 | 40.4 | 30.1 |
| Never | 32.2 | 17.3 | 23.7 |
| Total | 100 | 100 | 100 |

Row percentages

| Smoking | High | Middle | Low | Total |
|---|---|---|---|---|
| Current | 44.0 | 19.0 | 37.1 | 100 |
| Former | 65.2 | 14.9 | 19.9 | 100 |
| Never | 68.7 | 9.1 | 22.2 | 100 |

Expected frequencies

| Smoking | High | Middle | Low | Total |
|---|---|---|---|---|
| Current | 68.75 | 16.94 | 30.30 | 116 |
| Former | 83.57 | 20.60 | 36.83 | 141 |
| Never | 58.68 | 14.46 | 25.86 | 99 |
| Total | 211 | 52 | 93 | 356 |
| O | E | O-E | (O-E)^2 | (O-E)^2/E |
|---|---|---|---|---|
| 51 | 68.75 | -17.75 | 315.06 | 4.58 |
| 22 | 16.94 | 5.06 | 25.60 | 1.51 |
| 43 | 30.30 | 12.70 | 161.29 | 5.32 |
| 92 | 83.57 | 8.43 | 71.06 | 0.85 |
| 21 | 20.60 | 0.40 | 0.16 | 0.01 |
| 28 | 36.83 | -8.83 | 77.97 | 2.12 |
| 68 | 58.68 | 9.32 | 86.86 | 1.48 |
| 9 | 14.46 | -5.46 | 29.81 | 2.06 |
| 22 | 25.86 | -3.86 | 14.90 | 0.58 |
| 356 | 356.0 | | Totals | 18.51 = Chi-square |
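
The same calculation for the 3x3 smoking by SES table, as a Python sketch (standard library only; names are illustrative). For the p-value it assumes the closed form that holds when the degrees of freedom are even: P(X > x) = exp(-x/2) * sum over k < df/2 of (x/2)^k / k!. The helper `chi2_upper_tail_even_df` is hypothetical and rejects odd df rather than giving a wrong answer.

```python
import math

# Observed counts. Rows: Current, Former, Never. Columns: High, Middle, Low SES.
observed = [[51, 22, 43],
            [92, 21, 28],
            [68, 9, 22]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of df."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

p_value = chi2_upper_tail_even_df(chi2, df)

print(round(chi2, 4), df, round(p_value, 3))
```

Because the expected frequencies here are not rounded to two decimals, the result should agree with the Stata value more closely than the hand-computed column total does.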
Using Stata
tabi 51 22 43 \ 92 21 28 \ 68 9 22, exp row col chi2
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
| row percentage |
| column percentage |
+--------------------+
| col
row | 1 2 3 | Total
-----------+---------------------------------+----------
1 | 51 22 43 | 116
| 68.8 16.9 30.3 | 116.0
| 43.97 18.97 37.07 | 100.00
| 24.17 42.31 46.24 | 32.58
-----------+---------------------------------+----------
2 | 92 21 28 | 141
| 83.6 20.6 36.8 | 141.0
| 65.25 14.89 19.86 | 100.00
| 43.60 40.38 30.11 | 39.61
-----------+---------------------------------+----------
3 | 68 9 22 | 99
| 58.7 14.5 25.9 | 99.0
| 68.69 9.09 22.22 | 100.00
| 32.23 17.31 23.66 | 27.81
-----------+---------------------------------+----------
Total | 211 52 93 | 356
| 211.0 52.0 93.0 | 356.0
| 59.27 14.61 26.12 | 100.00
| 100.00 100.00 100.00 | 100.00
Pearson chi2(4) = 18.5097 Pr = 0.001
More Using Stata
use http://www.philender.com/courses/data/hsb2, clear
list female race ses prog, nolabel
female race ses prog
1. 0 4 1 1
2. 1 4 2 3
3. 0 4 3 1
4. 0 4 3 3
5. 0 4 2 2
6. 0 4 2 2
7. 0 3 2 1
8. 0 1 2 2
9. 0 4 2 1
10. 0 3 2 2
11. 0 4 2 3
12. 0 4 2 2
13. 0 4 3 2
14. 0 4 3 2
15. 0 3 1 2
16. 0 4 1 1
17. 0 4 3 2
18. 0 4 2 1
19. 0 4 3 2
20. 0 4 2 1
21. 0 4 2 1
22. 0 4 2 3
23. 0 3 2 2
24. 0 1 3 2
25. 0 1 2 3
26. 0 3 2 3
[remainder of output omitted]
tab1 female race ses prog
-> tabulation of female
female | Freq. Percent Cum.
------------+-----------------------------------
male | 91 45.50 45.50
female | 109 54.50 100.00
------------+-----------------------------------
Total | 200 100.00
-> tabulation of race
race | Freq. Percent Cum.
-------------+-----------------------------------
hispanic | 24 12.00 12.00
asian | 11 5.50 17.50
african-amer | 20 10.00 27.50
white | 145 72.50 100.00
-------------+-----------------------------------
Total | 200 100.00
-> tabulation of ses
ses | Freq. Percent Cum.
------------+-----------------------------------
low | 47 23.50 23.50
middle | 95 47.50 71.00
high | 58 29.00 100.00
------------+-----------------------------------
Total | 200 100.00
-> tabulation of prog
type of |
program | Freq. Percent Cum.
------------+-----------------------------------
general | 45 22.50 22.50
academic | 105 52.50 75.00
vocation | 50 25.00 100.00
------------+-----------------------------------
Total | 200 100.00
tabulate race ses, exp row col chi2
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
| row percentage |
| column percentage |
+--------------------+
| ses
race | low middle high | Total
-------------+---------------------------------+----------
hispanic | 9 11 4 | 24
| 5.6 11.4 7.0 | 24.0
| 37.50 45.83 16.67 | 100.00
| 19.15 11.58 6.90 | 12.00
-------------+---------------------------------+----------
asian | 3 5 3 | 11
| 2.6 5.2 3.2 | 11.0
| 27.27 45.45 27.27 | 100.00
| 6.38 5.26 5.17 | 5.50
-------------+---------------------------------+----------
african-amer | 11 6 3 | 20
| 4.7 9.5 5.8 | 20.0
| 55.00 30.00 15.00 | 100.00
| 23.40 6.32 5.17 | 10.00
-------------+---------------------------------+----------
white | 24 73 48 | 145
| 34.1 68.9 42.0 | 145.0
| 16.55 50.34 33.10 | 100.00
| 51.06 76.84 82.76 | 72.50
-------------+---------------------------------+----------
Total | 47 95 58 | 200
| 47.0 95.0 58.0 | 200.0
| 23.50 47.50 29.00 | 100.00
| 100.00 100.00 100.00 | 100.00
Pearson chi2(6) = 18.5160 Pr = 0.005
Note: This analysis has a problem with low expected frequencies: some cells
have small expected values (for example, 2.6 and 3.2 in the asian row), so the
computed statistic may not be distributed as a chi-squared with six degrees of
freedom. We can try collapsing the race categories to see whether this improves
the expected frequencies.
generate nonwhite = race~=4
tab nonwhite
nonwhite | Freq. Percent Cum.
------------+-----------------------------------
0 | 145 72.50 72.50
1 | 55 27.50 100.00
------------+-----------------------------------
Total | 200 100.00
tabulate nonwhite ses, exp row col chi2
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
| row percentage |
| column percentage |
+--------------------+
| ses
nonwhite | low middle high | Total
-----------+---------------------------------+----------
0 | 24 73 48 | 145
| 34.1 68.9 42.0 | 145.0
| 16.55 50.34 33.10 | 100.00
| 51.06 76.84 82.76 | 72.50
-----------+---------------------------------+----------
1 | 23 22 10 | 55
| 12.9 26.1 15.9 | 55.0
| 41.82 40.00 18.18 | 100.00
| 48.94 23.16 17.24 | 27.50
-----------+---------------------------------+----------
Total | 47 95 58 | 200
| 47.0 95.0 58.0 | 200.0
| 23.50 47.50 29.00 | 100.00
| 100.00 100.00 100.00 | 100.00
Pearson chi2(2) = 14.7922 Pr = 0.001
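
The effect of collapsing the race categories can be sketched in Python. Two assumptions are made here that are not in the original: cells are flagged with the common rule of thumb of an expected frequency below 5, and the p-value uses the closed form exp(-x/2), which is valid for 2 degrees of freedom only. The function names are illustrative.

```python
import math

# race by ses counts from the hsb2 output (columns: low, middle, high).
race_by_ses = {
    "hispanic":     [9, 11, 4],
    "asian":        [3, 5, 3],
    "african-amer": [11, 6, 3],
    "white":        [24, 73, 48],
}

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi2(table):
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, exp)
               for o, e in zip(o_row, e_row))

def count_low(table, threshold=5.0):
    """Number of cells whose expected count is below the (assumed) threshold."""
    return sum(e < threshold for row in expected_counts(table) for e in row)

full = list(race_by_ses.values())
low_full = count_low(full)

# Collapse: nonwhite is every category except white, summed column by column.
nonwhite = [sum(col) for col in zip(*(counts
                                      for race, counts in race_by_ses.items()
                                      if race != "white"))]
collapsed = [race_by_ses["white"], nonwhite]
low_collapsed = count_low(collapsed)

chi2 = pearson_chi2(collapsed)
p_value = math.exp(-chi2 / 2)   # upper tail, valid for df = 2 only

print(low_full, low_collapsed, nonwhite, round(chi2, 4), round(p_value, 3))
```

After collapsing, the smallest expected frequency is 12.9, so the low-expected-value concern no longer applies, at the cost of no longer distinguishing among the three collapsed groups.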
tabulate prog ses, exp row col chi2
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
| row percentage |
| column percentage |
+--------------------+
type of | ses
program | low middle high | Total
-----------+---------------------------------+----------
general | 16 20 9 | 45
| 10.6 21.4 13.1 | 45.0
| 35.56 44.44 20.00 | 100.00
| 34.04 21.05 15.52 | 22.50
-----------+---------------------------------+----------
academic | 19 44 42 | 105
| 24.7 49.9 30.4 | 105.0
| 18.10 41.90 40.00 | 100.00
| 40.43 46.32 72.41 | 52.50
-----------+---------------------------------+----------
vocation | 12 31 7 | 50
| 11.8 23.8 14.5 | 50.0
| 24.00 62.00 14.00 | 100.00
| 25.53 32.63 12.07 | 25.00
-----------+---------------------------------+----------
Total | 47 95 58 | 200
| 47.0 95.0 58.0 | 200.0
| 23.50 47.50 29.00 | 100.00
| 100.00 100.00 100.00 | 100.00
Pearson chi2(4) = 16.6044 Pr = 0.002
tab2 female ses prog, exp row col chi2
-> tabulation of female by ses
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
| row percentage |
| column percentage |
+--------------------+
| ses
female | low middle high | Total
-----------+---------------------------------+----------
male | 15 47 29 | 91
| 21.4 43.2 26.4 | 91.0
| 16.48 51.65 31.87 | 100.00
| 31.91 49.47 50.00 | 45.50
-----------+---------------------------------+----------
female | 32 48 29 | 109
| 25.6 51.8 31.6 | 109.0
| 29.36 44.04 26.61 | 100.00
| 68.09 50.53 50.00 | 54.50
-----------+---------------------------------+----------
Total | 47 95 58 | 200
| 47.0 95.0 58.0 | 200.0
| 23.50 47.50 29.00 | 100.00
| 100.00 100.00 100.00 | 100.00
Pearson chi2(2) = 4.5765 Pr = 0.101
-> tabulation of female by prog
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
| row percentage |
| column percentage |
+--------------------+
| type of program
female | general academic vocation | Total
-----------+---------------------------------+----------
male | 21 47 23 | 91
| 20.5 47.8 22.8 | 91.0
| 23.08 51.65 25.27 | 100.00
| 46.67 44.76 46.00 | 45.50
-----------+---------------------------------+----------
female | 24 58 27 | 109
| 24.5 57.2 27.2 | 109.0
| 22.02 53.21 24.77 | 100.00
| 53.33 55.24 54.00 | 54.50
-----------+---------------------------------+----------
Total | 45 105 50 | 200
| 45.0 105.0 50.0 | 200.0
| 22.50 52.50 25.00 | 100.00
| 100.00 100.00 100.00 | 100.00
Pearson chi2(2) = 0.0528 Pr = 0.974
-> tabulation of ses by prog
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
| row percentage |
| column percentage |
+--------------------+
| type of program
ses | general academic vocation | Total
-----------+---------------------------------+----------
low | 16 19 12 | 47
| 10.6 24.7 11.8 | 47.0
| 34.04 40.43 25.53 | 100.00
| 35.56 18.10 24.00 | 23.50
-----------+---------------------------------+----------
middle | 20 44 31 | 95
| 21.4 49.9 23.8 | 95.0
| 21.05 46.32 32.63 | 100.00
| 44.44 41.90 62.00 | 47.50
-----------+---------------------------------+----------
high | 9 42 7 | 58
| 13.1 30.4 14.5 | 58.0
| 15.52 72.41 12.07 | 100.00
| 20.00 40.00 14.00 | 29.00
-----------+---------------------------------+----------
Total | 45 105 50 | 200
| 45.0 105.0 50.0 | 200.0
| 22.50 52.50 25.00 | 100.00
| 100.00 100.00 100.00 | 100.00
Pearson chi2(4) = 16.6044 Pr = 0.002
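
The ses by prog table is the transpose of the earlier prog by ses table, and both report chi2(4) = 16.6044. A small Python sketch (illustrative names, standard library only) shows why: swapping rows and columns changes the layout but not the set of observed and expected pairs that enter the sum.

```python
# prog by ses counts (rows: general, academic, vocation; columns: low, middle, high).
prog_by_ses = [[16, 20, 9],
               [19, 44, 42],
               [12, 31, 7]]

# Transpose to get ses by prog.
ses_by_prog = [list(col) for col in zip(*prog_by_ses)]

def pearson_chi2(table):
    """Pearson chi-square: sum of (O - E)^2 / E over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for o, c in zip(row, col_totals))

chi2_prog_by_ses = pearson_chi2(prog_by_ses)
chi2_ses_by_prog = pearson_chi2(ses_by_prog)

print(round(chi2_prog_by_ses, 4), round(chi2_ses_by_prog, 4))
```

The row and column percentages do swap roles between the two layouts, so the choice of which variable goes in the rows still matters for reading the table.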
Small Sample Sizes
With small samples, Fisher's exact test can be used instead of the chi-square test. Fisher's exact test computes an exact p-value for the test of independence of the row and column variables. Contrary to the belief of some students, it does not compute an exact value of chi-squared. Here is a small example in Stata using the tabi command.
| | men | women |
|---|---|---|
| dieting | 1 | 9 |
| not dieting | 11 | 3 |
tabi 1 9 \ 11 3, expected exact
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
+--------------------+
| col
row | 1 2 | Total
-----------+----------------------+----------
1 | 1 9 | 10
| 5.0 5.0 | 10.0
-----------+----------------------+----------
2 | 11 3 | 14
| 7.0 7.0 | 14.0
-----------+----------------------+----------
Total | 12 12 | 24
| 12.0 12.0 | 24.0
Fisher's exact = 0.003
1-sided Fisher's exact = 0.001
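
The two p-values above can be reproduced by enumerating every 2x2 table that has the same marginal totals. The sketch below is Python with the standard library only; `fisher_exact_2x2` is a hypothetical helper, and it assumes the common convention that the two-sided p-value sums the probabilities of all tables no more likely than the observed one. Integer counts are compared rather than floating-point probabilities to avoid rounding ties.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (two_sided, lower_tail) p-values for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    # Possible values of the top-left cell with all margins held fixed.
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)

    # Number of ways to obtain each table (hypergeometric numerators).
    ways = {k: comb(row1, k) * comb(row2, col1 - k)
            for k in range(k_min, k_max + 1)}
    total = comb(n, col1)

    observed = ways[a]
    two_sided = sum(w for w in ways.values() if w <= observed) / total
    lower_tail = sum(w for k, w in ways.items() if k <= a) / total
    return two_sided, lower_tail

# Dieting example: rows dieting / not dieting, columns men / women.
two_sided, one_sided = fisher_exact_2x2(1, 9, 11, 3)

print(round(two_sided, 3), round(one_sided, 3))
```

The one-sided value here is the lower tail (one or fewer dieting men), which is the direction of the observed table.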
Other Issues
Phil Ender, 9dec05, 22nov00