Introduction to Research Design and Statistics

Contingency Tables/Crosstabulation


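Chi-Square Formula

    chi-square = Σ (O - E)^2 / E

    where O is an observed cell frequency, E is the corresponding expected frequency, and the sum runs over all cells of the table.

  • With df = (Rows - 1)(Columns - 1) = (r - 1)(c - 1)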

  • See the Chi-Square Table

    Marginals

  • Frequencies
                                    Males   Females   Total
    Monday Night     Watch             65        30      95
    Football         Don't Watch       55        50     105
                     Total            120        80     200

  • Cell Percentages
                                    Males   Females   Total
    Monday Night     Watch          32.5%     15.0%   47.5%
    Football         Don't Watch    27.5%     25.0%   52.5%
                     Total          60.0%     40.0%    100%

  • Row Percentages
                                    Males   Females   Total
    Monday Night     Watch          68.4%     31.6%    100%
    Football         Don't Watch    52.4%     47.6%    100%

  • Column Percentages
                                    Males   Females
    Monday Night     Watch          54.2%     37.5%
    Football         Don't Watch    45.8%     62.5%
                     Total           100%      100%
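
    The three kinds of percentages differ only in the base each count is divided by: the grand total, the row total, or the column total. The following is a minimal Python sketch of that arithmetic. Python is not used elsewhere in these notes, and the variable names are illustrative assumptions.

```python
# Observed counts from the Monday Night Football example.
# Rows: Watch, Don't Watch. Columns: Males, Females.
observed = [[65, 30],
            [55, 50]]

row_totals = [sum(row) for row in observed]        # marginal totals for rows
col_totals = [sum(col) for col in zip(*observed)]  # marginal totals for columns
n = sum(row_totals)                                # grand total


def pct(count, base):
    """Return count as a percentage of base, rounded to one decimal place."""
    return round(100 * count / base, 1)


# Cell percentages use the grand total as the base.
cell_pct = [[pct(x, n) for x in row] for row in observed]
# Row percentages use each row's own total as the base.
row_pct = [[pct(x, rt) for x in row] for row, rt in zip(observed, row_totals)]
# Column percentages use each column's own total as the base.
col_pct = [[pct(x, ct) for x, ct in zip(row, col_totals)] for row in observed]

print("cell:", cell_pct)
print("row: ", row_pct)
print("col: ", col_pct)
```

    Row percentages sum to 100 across each row and column percentages sum to 100 down each column, which is a quick check on which base was used.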

    Expected Frequencies Under Independence

  • Expected Frequencies
                                    Males   Females   Total
    Monday Night     Watch             57        38      95
    Football         Don't Watch       63        42     105
                     Total            120        80     200
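
    Each expected frequency under independence is the product of the cell's row total and column total, divided by the grand total. A short Python sketch of that rule follows. The function name expected_frequencies is a hypothetical helper, not something from Stata or from these notes.

```python
def expected_frequencies(observed):
    """Expected counts under independence: E = row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Monday Night Football example: rows Watch / Don't Watch, columns Males / Females.
observed = [[65, 30],
            [55, 50]]
expected = expected_frequencies(observed)
print(expected)  # [[57.0, 38.0], [63.0, 42.0]]
```

    The expected table has the same marginal totals as the observed table; only the distribution of counts within the cells changes.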

    Hypotheses

  • H0: The observed row frequencies are distributed independently of the column frequencies.
  • H1: The observed row frequencies are not distributed independently of the column frequencies.

    Example Continued

  • Observed & Expected Frequencies (expected in parentheses)
                                       Males     Females   Total
    Monday Night     Watch           65 (57)     30 (38)      95
    Football         Don't Watch     55 (63)     50 (42)     105
                     Total              120          80      200

      O      E    O-E   (O-E)^2   (O-E)^2/E
     65     57      8        64        1.12
     55     63     -8        64        1.02
     30     38     -8        64        1.68
     50     42      8        64        1.52
    200    200             Total       5.35 = Chi-square

  • df = (r-1)(c-1) = (2-1)(2-1) =1
  • Critical value of chi-square (df = 1, alpha = .05) = 3.84
  • Observed value of chi-square = 5.35
  • Decision: Reject H0.
  • The frequency of MNF viewing is not independent of gender.
  • Men & women do not have the same MNF viewing patterns.
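
    The hand computation above can be cross-checked with a few lines of code. This is a minimal Python sketch, not the method Stata uses internally. The function name pearson_chi2 is hypothetical, the critical value 3.84 is taken from the notes, and the p-value shortcut through math.erfc is valid only when df = 1.

```python
import math


def pearson_chi2(observed):
    """Return (chi-square, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected under independence
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


chi2, df = pearson_chi2([[65, 30], [55, 50]])

# Upper-tail probability of a chi-square variate with exactly 1 df.
# This identity does not hold for other degrees of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

critical_value = 3.84  # from the chi-square table, df = 1, alpha = .05
reject_h0 = chi2 > critical_value

print(f"chi-square = {chi2:.4f}, df = {df}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

    Comparing chi-square with the critical value and comparing the p-value with .05 are the same decision rule expressed two ways.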

    Using Stata

    tabi 65 30 \ 55 50, exp row col chi2
    
    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    |   row percentage   |
    | column percentage  |
    +--------------------+
    
               |          col
           row |         1          2 |     Total
    -----------+----------------------+----------
             1 |        65         30 |        95 
               |      57.0       38.0 |      95.0 
               |     68.42      31.58 |    100.00 
               |     54.17      37.50 |     47.50 
    -----------+----------------------+----------
             2 |        55         50 |       105 
               |      63.0       42.0 |     105.0 
               |     52.38      47.62 |    100.00 
               |     45.83      62.50 |     52.50 
    -----------+----------------------+----------
         Total |       120         80 |       200 
               |     120.0       80.0 |     200.0 
               |     60.00      40.00 |    100.00 
               |    100.00     100.00 |    100.00 
    
              Pearson chi2(1) =   5.3467   Pr = 0.021

    Example 2:

  • Observed Frequencies
                          SES
    Smoking     High   Middle    Low   Total
    Current       51       22     43     116
    Former        92       21     28     141
    Never         68        9     22      99
    Total        211       52     93     356

  • Column Percentages
                          SES
    Smoking     High   Middle    Low
    Current     24.2     42.3   46.2
    Former      43.6     40.4   30.1
    Never       32.2     17.3   23.7
    Total        100      100    100

  • Row Percentages
                          SES
    Smoking     High   Middle    Low   Total
    Current       44       19     37     100
    Former      65.3     14.9   19.9     100
    Never       68.7      9.1   22.2     100
  • Expected Frequencies
                          SES
    Smoking      High   Middle     Low   Total
    Current     68.75    16.94   30.30     116
    Former      83.57    20.60   36.83     141
    Never       58.68    14.46   25.86      99
    Total         211       52      93     356

      O        E      O-E   (O-E)^2   (O-E)^2/E
     51    68.75   -17.75    315.06        4.58
     22    16.94     5.06     25.60        1.51
     43    30.30    12.70    161.29        5.32
     92    83.57     8.43     71.06        0.85
     21    20.60     0.40      0.16        0.01
     28    36.83    -8.83     77.97        2.12
     68    58.68     9.32     86.86        1.48
      9    14.46    -5.46     29.81        2.06
     22    25.86    -3.86     14.90        0.58
    356    356.0               Total      18.51 = Chi-square

  • df = (r-1)(c-1) = (3-1)(3-1) = 4
  • Critical value of chi-square (df = 4, alpha = .05) = 9.49
  • Observed value of chi-square = 18.51
  • Decision: Reject H0.
  • The frequency of smoking is not independent of SES.
  • Assumptions for Example 2:
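
    The cell-by-cell table for Example 2 can be regenerated without intermediate rounding. The following minimal Python sketch should agree with the Pearson chi2(4) = 18.5097 that Stata reports for this table. The labels and variable names are illustrative assumptions, and the closed-form p-value used here applies only when df is even.

```python
import math

smoking = ["Current", "Former", "Never"]
ses = ["High", "Middle", "Low"]
observed = [[51, 22, 43],
            [92, 21, 28],
            [68, 9, 22]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected under independence
        contribution = (o - e) ** 2 / e        # this cell's share of chi-square
        chi2 += contribution
        print(f"{smoking[i]:8s}{ses[j]:8s}{o:5d}{e:8.2f}{o - e:8.2f}{contribution:7.2f}")

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability of a chi-square variate, closed form for even df only:
# p = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
half = chi2 / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(f"chi-square = {chi2:.4f}, df = {df}, p = {p_value:.3f}")
```

    The largest contributions come from the Current/High and Current/Low cells, which shows where the departure from independence is concentrated.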

    Using Stata

    tabi 51 22 43 \ 92 21 28 \ 68 9 22, exp row col chi2
    
    
    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    |   row percentage   |
    | column percentage  |
    +--------------------+
    
               |               col
           row |         1          2          3 |     Total
    -----------+---------------------------------+----------
             1 |        51         22         43 |       116 
               |      68.8       16.9       30.3 |     116.0 
               |     43.97      18.97      37.07 |    100.00 
               |     24.17      42.31      46.24 |     32.58 
    -----------+---------------------------------+----------
             2 |        92         21         28 |       141 
               |      83.6       20.6       36.8 |     141.0 
               |     65.25      14.89      19.86 |    100.00 
               |     43.60      40.38      30.11 |     39.61 
    -----------+---------------------------------+----------
             3 |        68          9         22 |        99 
               |      58.7       14.5       25.9 |      99.0 
               |     68.69       9.09      22.22 |    100.00 
               |     32.23      17.31      23.66 |     27.81 
    -----------+---------------------------------+----------
         Total |       211         52         93 |       356 
               |     211.0       52.0       93.0 |     356.0 
               |     59.27      14.61      26.12 |    100.00 
               |    100.00     100.00     100.00 |    100.00 
    
              Pearson chi2(4) =  18.5097   Pr = 0.001

    More Using Stata

  • Using one line for each observation

    use http://www.philender.com/courses/data/hsb2, clear
    
    list female race ses prog, nolabel
    
            female          race        ses       prog 
      1.         0             4          1          1  
      2.         1             4          2          3  
      3.         0             4          3          1  
      4.         0             4          3          3  
      5.         0             4          2          2  
      6.         0             4          2          2  
      7.         0             3          2          1  
      8.         0             1          2          2  
      9.         0             4          2          1  
     10.         0             3          2          2  
     11.         0             4          2          3  
     12.         0             4          2          2  
     13.         0             4          3          2  
     14.         0             4          3          2  
     15.         0             3          1          2  
     16.         0             4          1          1  
     17.         0             4          3          2  
     18.         0             4          2          1  
     19.         0             4          3          2  
     20.         0             4          2          1  
     21.         0             4          2          1  
     22.         0             4          2          3  
     23.         0             3          2          2  
     24.         0             1          3          2  
     25.         0             1          2          3  
     26.         0             3          2          3
    [remainder of output omitted]
    
    tab1 female race ses prog
    
    -> tabulation of female  
    
         female |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           male |         91       45.50       45.50
         female |        109       54.50      100.00
    ------------+-----------------------------------
          Total |        200      100.00
    
    -> tabulation of race  
    
            race |      Freq.     Percent        Cum.
    -------------+-----------------------------------
        hispanic |         24       12.00       12.00
           asian |         11        5.50       17.50
    african-amer |         20       10.00       27.50
           white |        145       72.50      100.00
    -------------+-----------------------------------
           Total |        200      100.00
    
    -> tabulation of ses  
    
            ses |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            low |         47       23.50       23.50
         middle |         95       47.50       71.00
           high |         58       29.00      100.00
    ------------+-----------------------------------
          Total |        200      100.00
    
    -> tabulation of prog  
    
        type of |
        program |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        general |         45       22.50       22.50
       academic |        105       52.50       75.00
       vocation |         50       25.00      100.00
    ------------+-----------------------------------
          Total |        200      100.00
    
    tabulate race ses, exp row col chi2
    
    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    |   row percentage   |
    | column percentage  |
    +--------------------+
    
                 |               ses
            race |       low     middle       high |     Total
    -------------+---------------------------------+----------
        hispanic |         9         11          4 |        24 
                 |       5.6       11.4        7.0 |      24.0 
                 |     37.50      45.83      16.67 |    100.00 
                 |     19.15      11.58       6.90 |     12.00 
    -------------+---------------------------------+----------
           asian |         3          5          3 |        11 
                 |       2.6        5.2        3.2 |      11.0 
                 |     27.27      45.45      27.27 |    100.00 
                 |      6.38       5.26       5.17 |      5.50 
    -------------+---------------------------------+----------
    african-amer |        11          6          3 |        20 
                 |       4.7        9.5        5.8 |      20.0 
                 |     55.00      30.00      15.00 |    100.00 
                 |     23.40       6.32       5.17 |     10.00 
    -------------+---------------------------------+----------
           white |        24         73         48 |       145 
                 |      34.1       68.9       42.0 |     145.0 
                 |     16.55      50.34      33.10 |    100.00 
                 |     51.06      76.84      82.76 |     72.50 
    -------------+---------------------------------+----------
           Total |        47         95         58 |       200 
                 |      47.0       95.0       58.0 |     200.0 
                 |     23.50      47.50      29.00 |    100.00 
                 |    100.00     100.00     100.00 |    100.00 
    
              Pearson chi2(6) =  18.5160   Pr = 0.005

    Note: There is a problem in this analysis concerning low expected frequencies. Three of the twelve cells have expected frequencies below 5 (2.6, 3.2, and 4.7), so the computed value of chi-squared may not be distributed as a chi-squared with six degrees of freedom. We can try collapsing the race categories to see if this helps the expected frequencies.

    generate nonwhite = race~=4
    
    tab nonwhite
    
       nonwhite |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        145       72.50       72.50
              1 |         55       27.50      100.00
    ------------+-----------------------------------
          Total |        200      100.00
    
    tabulate nonwhite ses, exp row col chi2
    
    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    |   row percentage   |
    | column percentage  |
    +--------------------+
    
               |               ses
      nonwhite |       low     middle       high |     Total
    -----------+---------------------------------+----------
             0 |        24         73         48 |       145 
               |      34.1       68.9       42.0 |     145.0 
               |     16.55      50.34      33.10 |    100.00 
               |     51.06      76.84      82.76 |     72.50 
    -----------+---------------------------------+----------
             1 |        23         22         10 |        55 
               |      12.9       26.1       15.9 |      55.0 
               |     41.82      40.00      18.18 |    100.00 
               |     48.94      23.16      17.24 |     27.50 
    -----------+---------------------------------+----------
         Total |        47         95         58 |       200 
               |      47.0       95.0       58.0 |     200.0 
               |     23.50      47.50      29.00 |    100.00 
               |    100.00     100.00     100.00 |    100.00 
    
              Pearson chi2(2) =  14.7922   Pr = 0.001
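
    The effect of collapsing on the expected frequencies can be checked directly. The Python sketch below counts cells with an expected frequency under 5 before and after combining the three nonwhite rows. The threshold of 5 is a common rule of thumb and is treated here as an assumption, and the helper names are hypothetical.

```python
def expected_frequencies(observed):
    """Expected counts under independence: E = row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def count_small_cells(observed, threshold=5):
    """Number of cells whose expected frequency falls below the threshold."""
    return sum(e < threshold for row in expected_frequencies(observed) for e in row)


# race by ses from the hsb2 output: rows hispanic, asian, african-amer, white
race_by_ses = [[9, 11, 4],
               [3, 5, 3],
               [11, 6, 3],
               [24, 73, 48]]

# Collapse to white (nonwhite = 0) versus the sum of the other three rows (nonwhite = 1).
white = race_by_ses[3]
nonwhite = [sum(col) for col in zip(*race_by_ses[:3])]
collapsed = [white, nonwhite]

print("small cells before collapsing:", count_small_cells(race_by_ses))
print("small cells after collapsing: ", count_small_cells(collapsed))
print("smallest expected after collapsing:",
      round(min(e for row in expected_frequencies(collapsed) for e in row), 1))
```

    Collapsing removes the small expected frequencies, at the cost of no longer distinguishing among the three combined groups.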

    tabulate prog ses, exp row col chi2
    
    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    |   row percentage   |
    | column percentage  |
    +--------------------+
    
       type of |               ses
       program |       low     middle       high |     Total
    -----------+---------------------------------+----------
       general |        16         20          9 |        45 
               |      10.6       21.4       13.1 |      45.0 
               |     35.56      44.44      20.00 |    100.00 
               |     34.04      21.05      15.52 |     22.50 
    -----------+---------------------------------+----------
      academic |        19         44         42 |       105 
               |      24.7       49.9       30.4 |     105.0 
               |     18.10      41.90      40.00 |    100.00 
               |     40.43      46.32      72.41 |     52.50 
    -----------+---------------------------------+----------
      vocation |        12         31          7 |        50 
               |      11.8       23.8       14.5 |      50.0 
               |     24.00      62.00      14.00 |    100.00 
               |     25.53      32.63      12.07 |     25.00 
    -----------+---------------------------------+----------
         Total |        47         95         58 |       200 
               |      47.0       95.0       58.0 |     200.0 
               |     23.50      47.50      29.00 |    100.00 
               |    100.00     100.00     100.00 |    100.00 
    
              Pearson chi2(4) =  16.6044   Pr = 0.002
    
    tab2 female ses prog, exp row col chi2
    
    -> tabulation of female by ses  
    
    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    |   row percentage   |
    | column percentage  |
    +--------------------+
    
               |               ses
        female |       low     middle       high |     Total
    -----------+---------------------------------+----------
          male |        15         47         29 |        91 
               |      21.4       43.2       26.4 |      91.0 
               |     16.48      51.65      31.87 |    100.00 
               |     31.91      49.47      50.00 |     45.50 
    -----------+---------------------------------+----------
        female |        32         48         29 |       109 
               |      25.6       51.8       31.6 |     109.0 
               |     29.36      44.04      26.61 |    100.00 
               |     68.09      50.53      50.00 |     54.50 
    -----------+---------------------------------+----------
         Total |        47         95         58 |       200 
               |      47.0       95.0       58.0 |     200.0 
               |     23.50      47.50      29.00 |    100.00 
               |    100.00     100.00     100.00 |    100.00 
    
              Pearson chi2(2) =   4.5765   Pr = 0.101
    
    -> tabulation of female by prog  
    
    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    |   row percentage   |
    | column percentage  |
    +--------------------+
    
               |         type of program
        female |   general   academic   vocation |     Total
    -----------+---------------------------------+----------
          male |        21         47         23 |        91 
               |      20.5       47.8       22.8 |      91.0 
               |     23.08      51.65      25.27 |    100.00 
               |     46.67      44.76      46.00 |     45.50 
    -----------+---------------------------------+----------
        female |        24         58         27 |       109 
               |      24.5       57.2       27.2 |     109.0 
               |     22.02      53.21      24.77 |    100.00 
               |     53.33      55.24      54.00 |     54.50 
    -----------+---------------------------------+----------
         Total |        45        105         50 |       200 
               |      45.0      105.0       50.0 |     200.0 
               |     22.50      52.50      25.00 |    100.00 
               |    100.00     100.00     100.00 |    100.00 
    
              Pearson chi2(2) =   0.0528   Pr = 0.974
    
    -> tabulation of ses by prog  
    
    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    |   row percentage   |
    | column percentage  |
    +--------------------+
    
               |         type of program
           ses |   general   academic   vocation |     Total
    -----------+---------------------------------+----------
           low |        16         19         12 |        47 
               |      10.6       24.7       11.8 |      47.0 
               |     34.04      40.43      25.53 |    100.00 
               |     35.56      18.10      24.00 |     23.50 
    -----------+---------------------------------+----------
        middle |        20         44         31 |        95 
               |      21.4       49.9       23.8 |      95.0 
               |     21.05      46.32      32.63 |    100.00 
               |     44.44      41.90      62.00 |     47.50 
    -----------+---------------------------------+----------
          high |         9         42          7 |        58 
               |      13.1       30.4       14.5 |      58.0 
               |     15.52      72.41      12.07 |    100.00 
               |     20.00      40.00      14.00 |     29.00 
    -----------+---------------------------------+----------
         Total |        45        105         50 |       200 
               |      45.0      105.0       50.0 |     200.0 
               |     22.50      52.50      25.00 |    100.00 
               |    100.00     100.00     100.00 |    100.00 
    
              Pearson chi2(4) =  16.6044   Pr = 0.002

    Small Sample Sizes

    With small samples, Fisher's exact test can be used in place of the chi-square test. It computes an exact p-value for the test of independence of the row and column variables. Contrary to the belief of some students, it does not compute an exact value of chi-squared. Here is a small example in Stata using the tabi command.

                    men   women
    dieting           1       9
    not dieting      11       3

    tabi 1 9 \ 11 3, expected exact
    
    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    +--------------------+
    
               |          col
           row |         1          2 |     Total
    -----------+----------------------+----------
             1 |         1          9 |        10 
               |       5.0        5.0 |      10.0 
    -----------+----------------------+----------
             2 |        11          3 |        14 
               |       7.0        7.0 |      14.0 
    -----------+----------------------+----------
         Total |        12         12 |        24 
               |      12.0       12.0 |      24.0 
    
               Fisher's exact =                 0.003
       1-sided Fisher's exact =                 0.001
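
    The p-values above come from counting tables, not from a chi-square statistic. With the marginal totals held fixed, each possible value of the upper-left cell has a hypergeometric probability. The two-sided p-value sums the probabilities of all tables no more likely than the observed one. The Python sketch below reproduces the calculation for the dieting example. The function name is hypothetical, and the one-sided value is computed as the lower tail, which is the direction of the observed count here.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (two_sided_p, lower_tail_p) for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d          # row totals
    c1 = a + c                     # first column total
    n = r1 + r2
    total = comb(n, c1)            # all ways to fill column 1 from n cases
    lo = max(0, c1 - r2)           # smallest feasible upper-left count
    hi = min(r1, c1)               # largest feasible upper-left count
    # Number of ways to obtain each feasible upper-left count with margins fixed.
    ways = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(lo, hi + 1)}
    observed_ways = ways[a]
    # Integer comparison avoids floating-point ties between equally likely tables.
    two_sided = sum(w for w in ways.values() if w <= observed_ways) / total
    lower_tail = sum(w for k, w in ways.items() if k <= a) / total
    return two_sided, lower_tail


two_sided, one_sided = fisher_exact_2x2(1, 9, 11, 3)
print(f"Fisher's exact (two-sided) = {two_sided:.3f}")
print(f"Fisher's exact (one-sided) = {one_sided:.3f}")
```

    Because no chi-square approximation is involved, the small expected frequencies in this table do not affect the validity of the p-value.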

    Other Issues

  • Collapsing levels
  • Correction for continuity
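
    As an illustration of the second item, the sketch below applies Yates' correction to the 2x2 Monday Night Football table. It assumes that "correction for continuity" refers to Yates' correction, which the notes do not state. The function name is hypothetical, and the erfc shortcut for the p-value holds only for df = 1.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    With yates=True, |ad - bc| is reduced by N/2 before squaring,
    which is equivalent to subtracting 0.5 from every |O - E|.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


uncorrected = chi2_2x2(65, 30, 55, 50)
corrected = chi2_2x2(65, 30, 55, 50, yates=True)

# Upper-tail p-values for df = 1 only.
p_uncorrected = math.erfc(math.sqrt(uncorrected / 2))
p_corrected = math.erfc(math.sqrt(corrected / 2))

print(f"uncorrected chi-square = {uncorrected:.4f}, p = {p_uncorrected:.3f}")
print(f"corrected chi-square   = {corrected:.4f}, p = {p_corrected:.3f}")
```

    The correction always lowers the chi-square value, so it makes the test more conservative. In this example both versions exceed the critical value of 3.84.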


    Phil Ender, 9dec05, 22nov00