Introduction to Research Design and Statistics

Testing Frequencies


Chi-Square Distribution

Chi-Square Statistic

  • df = Rows - 1
  • Use Table of Chi-Square
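
The statistic referred to here is the Pearson chi-square that appears in the Stata output throughout these notes. It can be written as follows, where the symbols are notation assumed for this sketch: O_i and E_i are the observed and expected frequencies in row i, and k is the number of rows.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad df = k - 1
```

Each row contributes its squared deviation from expectation, scaled by the expected frequency, so large values of the statistic indicate a poor fit between the observed and expected distributions.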

    Hypotheses

  • H0: The distribution of observed frequencies equals the distribution of expected frequencies.
  • H1: The distribution of observed frequencies does not equal the distribution of expected frequencies.

    Assumptions

  • Observations are independent (each subject can appear once and only once in a table)
  • Expected frequencies in each row are at least 5.

    Example 1: Pepsi Challenge

    Test whether cola preference among 220 college students in a simple random sample is equally distributed.

    Each individual tastes each of the three colas. Between tastes, subjects eat a soda cracker. Each subject receives the colas in a different order. Each subject then selects the cola he or she likes best.

    Results: Pepsi 85, Coke 57, RC 78.

    Using Stata

    The trick here is to create two variables: o for the observed frequencies and e for the expected frequencies. The chi-square computation is then done with the command chitest, written by Nick Cox, which can be downloaded over the Internet (type findit chitest, then click to install tab_chi).
    clear
    
    input o
    85
    57
    78
    end
    
    generate e = 220/3
    
    chitest o e
    
    observed frequencies from o; expected frequencies from e
    
             Pearson chi2(2) =   5.7909   Pr =  0.055
    likelihood-ratio chi2(2) =   5.9984   Pr =  0.050
    
      +-------------------------------------------+
      | observed   expected   obs - exp   Pearson |
      |-------------------------------------------|
      |       85     73.333      11.667     1.362 |
      |       57     73.333     -16.333    -1.907 |
      |       78     73.333       4.667     0.545 |
      +-------------------------------------------+
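
    The Pearson line of the chitest output can be checked by hand. The following is a minimal sketch that uses only built-in Stata commands; the variable name contrib and the scalar names x2 and dof are illustrative choices, not part of the original example.

```stata
clear

input o
85
57
78
end

* expected count under equal preference: total divided by number of categories
quietly summarize o
generate e = r(sum)/r(N)

* contribution of each cola to the statistic: (o - e)^2 / e
generate contrib = (o - e)^2/e

* the statistic is the sum of the contributions; df = categories - 1
quietly summarize contrib
scalar x2  = r(sum)
scalar dof = r(N) - 1

* upper-tail probability of the chi-square distribution
display "Pearson chi2(" dof ") = " %7.4f x2 "   Pr = " %5.3f chi2tail(dof, x2)
```

    The displayed values should agree with the Pearson chi2(2) line reported by chitest above.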

    Example 2: School Absences

    A school district in a large suburban area is trying to determine whether student absences are distributed equally throughout the school week. Choosing a week at random, a researcher surveys all the schools in the district and comes up with the following data: Monday -- 73; Tuesday -- 57; Wednesday -- 48; Thursday -- 59; and Friday -- 68.

    She wants to test whether absences are equally distributed, using the 0.1 significance level.

    Using Stata

    Same as the previous example.
    clear
    
    input o
    73
    57
    48
    59
    68
    end
    
    sum o
    
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
               o |       5          61    9.77241         48         73
    
    
    generate e = r(sum)/5
    
    chitest o e
    
    observed frequencies from o; expected frequencies from e
    
             Pearson chi2(4) =   6.2623   Pr =  0.180
    likelihood-ratio chi2(4) =   6.3196   Pr =  0.177
    
      +-------------------------------------------+
      | observed   expected   obs - exp   Pearson |
      |-------------------------------------------|
      |       73     61.000      12.000     1.536 |
      |       57     61.000      -4.000    -0.512 |
      |       48     61.000     -13.000    -1.664 |
      |       59     61.000      -2.000    -0.256 |
      |       68     61.000       7.000     0.896 |
      +-------------------------------------------+
    
     chitable 4
    
            Critical Values of Chi-square
     df     .50     .25     .10     .05    .025      .01     .001
      4    3.36    5.39    7.78    9.49   11.14    13.28    18.47
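
    chitable does not appear to be a built-in Stata command, so it may need to be installed separately. As a hedged alternative, the built-in functions invchi2tail() and chi2tail() give the critical value and the p-value directly. The sketch below plugs in the df and the Pearson statistic from the output above.

```stata
* critical value for df = 4 at the 0.10 level (upper tail)
display %6.2f invchi2tail(4, 0.10)

* p-value for the Pearson statistic reported by chitest
display %6.3f chi2tail(4, 6.2623)
```

    The observed statistic, 6.2623, is below the 0.10 critical value of 7.78 in the table (equivalently, Pr = 0.180 is greater than 0.1), so the hypothesis that absences are equally distributed is not rejected at the 0.1 level.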

    Example 3: Demographics

    In 1980, a Northern California county was found to be 52% Anglo, 28% Hispanic, 12% African-American and 8% Asian.

    An SRS of 1220 individuals in 1995 found 585 Anglo, 390 Hispanic, 109 African-American and 136 Asian.

    Have the demographics in the county changed more than would be expected by chance?

    Using Stata

    clear
    
    input o
    585
    390
    109
    136
    end
    
    input e
    634.4
    341.6
    146.4
    97.6
    end
    
    list
    
         +-------------+
         |   o       e |
         |-------------|
      1. | 585   634.4 |
      2. | 390   341.6 |
      3. | 109   146.4 |
      4. | 136    97.6 |
         +-------------+
    
    chitest o e
    
    observed frequencies from o; expected frequencies from e
    
             Pearson chi2(3) =  35.3669   Pr =  0.000
    likelihood-ratio chi2(3) =  34.4401   Pr =  0.000
    
      +-------------------------------------------+
      | observed   expected   obs - exp   Pearson |
      |-------------------------------------------|
      |      585    634.400     -49.400    -1.961 |
      |      390    341.600      48.400     2.619 |
      |      109    146.400     -37.400    -3.091 |
      |      136     97.600      38.400     3.887 |
      +-------------------------------------------+
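
    The expected frequencies typed in for this example are the 1980 percentages applied to the 1995 sample size, for example 0.52 x 1220 = 634.4. A sketch that lets Stata do this arithmetic instead of entering the values by hand follows; the variable name p for the 1980 proportions is an assumption made for illustration.

```stata
clear

* observed 1995 counts and 1980 proportions, one row per group
input o p
585 .52
390 .28
109 .12
136 .08
end

* expected count = 1980 proportion times the 1995 sample size
quietly summarize o
generate e = p*r(sum)

list o p e
```

    With o and e built this way, chitest o e can be run exactly as in the example above.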

    More Stata

    chitest will generate equal expected frequencies if you don't include the expected variable.

    Now let's use the hsb2 dataset and the variable race.

    use http://www.philender.com/courses/data/hsb2, clear
    
    chitest race, count  /* chitest was downloaded from the Internet */
    
    Chi-square test:
        observed frequencies from race
        expected frequencies equal
    
             Pearson chi2(3) = 242.4400   Pr =  0.000
    likelihood-ratio chi2(3) = 203.5732   Pr =  0.000
    
                                       residuals
          observed    expected     classic     Pearson 
      1.        24      50.000     -26.000      -3.677  
      2.        11      50.000     -39.000      -5.515  
      3.        20      50.000     -30.000      -4.243  
      4.       145      50.000      95.000      13.435 
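
    The same table can be rebuilt with built-in commands, which shows where the columns come from. This is a sketch rather than a description of how chitest works internally: contract collapses the data to one row per race category with the count in _freq, and the names e, pearson, contrib, x2 and dof are illustrative.

```stata
use http://www.philender.com/courses/data/hsb2, clear

* collapse to one row per race category; contract stores the counts in _freq
contract race

* equal expected frequencies: total count divided by number of categories
quietly summarize _freq
generate e = r(sum)/r(N)

* Pearson residual for each category: (observed - expected) / sqrt(expected)
generate pearson = (_freq - e)/sqrt(e)
list race _freq e pearson

* the squared Pearson residuals sum to the Pearson chi-square
generate contrib = pearson^2
quietly summarize contrib
scalar x2  = r(sum)
scalar dof = r(N) - 1
display "Pearson chi2(" dof ") = " %9.4f x2 "   Pr = " %5.3f chi2tail(dof, x2)
```

    The listed residuals and the displayed statistic should agree with the Pearson column and the Pearson chi2(3) line in the chitest output above.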
    



    Phil Ender, 28nov05, 22Nov00