A New View of Statistics
Generalizing to a Population:
SIMPLE MODELS AND TESTS continued

Contingency Table (Chi-Squared Test)

model: nominal <= nominal
example: sport <= sex

What effect does a kid's sex have on the kind of sport s/he likes? That's the sort of question we address with this model, as shown in the example of the sport preferences of a sample of boys and girls.

The word contingency in the name of the model refers, I guess, to the relationship between the two variables. Table speaks for itself. The test for whether there is any relationship at all is known as the chi-squared test, from the test statistic, chi squared (χ²). It's pretty obvious that there's a strong relationship in the example. Whether the relationship is significant would depend on the number of boys and girls.
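To make the point about sample size concrete, here is a minimal sketch of the chi-squared calculation in plain Python. The counts are made up for illustration (they are not the data from the example), and the function names `chi_squared` and `p_value_df2` are my own. The p value uses the closed form for exactly 2 degrees of freedom, which is what a 2 x 3 table gives; for other table sizes you would lean on a stats package.

```python
import math

def chi_squared(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df2(stat):
    # Upper-tail probability for chi squared with exactly 2 degrees of
    # freedom, which has the closed form exp(-x/2). Not valid for other df.
    return math.exp(-stat / 2)

# Hypothetical counts (assumed, not taken from the example).
# Rows: boys, girls. Columns: basketball, football, other.
large = [[30, 50, 20], [50, 30, 20]]
small = [[3, 5, 2], [5, 3, 2]]  # same proportions, one tenth the kids

for name, table in (("large", large), ("small", small)):
    stat, df = chi_squared(table)
    print(f"{name}: chi squared = {stat:.2f}, df = {df}, "
          f"p = {p_value_df2(stat):.4f}")
```

The two tables have identical proportions, but the statistic scales with the number of kids, so the same pattern of preferences can be significant in the big sample and nowhere near it in the small one.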

We don't normally think about parameters for this model, but they would be the probabilities of opting for each sport, for each sex. Goodness of fit is also not usually calculated, but various analogs of the correlation coefficient (e.g. the kappa coefficient) make their appearance occasionally. Those outcome measures that we have already met, the relative risk and odds ratio, make sense only for 2 x 2 tables or for comparing a 2 x 2 subset of cells in a bigger table. Most stats programs can calculate the confidence intervals for these outcome measures.
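If you want to see what a stats program is doing for a 2 x 2 table, the following sketch computes the relative risk and odds ratio with the usual large-sample confidence limits worked out on the log scale. The counts are hypothetical, the function name `relative_risk_and_odds_ratio` is mine, and z = 1.96 is assumed for approximately 95% limits; a real package may use a different or more exact method.

```python
import math

def relative_risk_and_odds_ratio(a, b, c, d, z=1.96):
    """2 x 2 table: group 1 has a 'yes' and b 'no'; group 2 has c and d.

    Returns (estimate, lower, upper) for the relative risk ("rr") and the
    odds ratio ("or"), using large-sample limits on the log scale.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def interval(estimate, se):
        # Limits are symmetric on the log scale, so multiply and divide
        return (estimate,
                estimate * math.exp(-z * se),
                estimate * math.exp(z * se))

    return {"rr": interval(rr, se_log_rr),
            "or": interval(odds_ratio, se_log_or)}

# Hypothetical counts: 50 of 100 boys and 30 of 100 girls choose football.
result = relative_risk_and_odds_ratio(50, 50, 30, 70)
for label, (estimate, lower, upper) in result.items():
    print(f"{label}: {estimate:.2f} ({lower:.2f} to {upper:.2f})")
```

For a bigger table, you would first collapse or pick out cells to get a 2 x 2 table (e.g. football vs everything else) and then apply the same calculation.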

When you have more than two rows or columns in the table (e.g. the three sports above), the chi-squared test tells you whether there is any relationship, but it doesn't tell you where the differences are. Now, just as we can do pairwise tests for the different levels of a grouping variable in an ANOVA, we can in principle test for differences between frequencies of males and females in pairs of sports, or between one sport and the rest, or whatever. In the above example, it's clear that the "other" category does not differ between sexes (which is actually a comparison of "other" with basketball and football combined, if you think about it), whereas every other pairwise comparison looks like it could be different. The funny thing is, there is no tradition for doing such pairwise tests in a contingency table, or for controlling the type I error, not that I know of anyway. All that people do is state whether there is an effect overall or not, then eyeball the frequencies in the table and comment on where the biggest differences are. Strange...
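Here is a sketch of how pairwise tests could be done in principle. It is not an established convention, as noted above. Each pair of sports gives a 2 x 2 sub-table, which gets its own chi-squared test with 1 degree of freedom (no continuity correction). To control the type I error I've swapped in a Bonferroni adjustment, multiplying each p value by the number of comparisons; that choice is my assumption, and so are the counts and the function name `chi_squared_2x2`.

```python
import math
from itertools import combinations

def chi_squared_2x2(a, b, c, d):
    """Chi squared (no continuity correction) and p value for a 2 x 2 table
    with rows (a, b) and (c, d)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi squared is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical counts (assumed, not taken from the example).
sports = ["basketball", "football", "other"]
boys = [30, 50, 20]
girls = [50, 30, 20]

pairs = list(combinations(range(len(sports)), 2))
results = []
for i, j in pairs:
    stat, p = chi_squared_2x2(boys[i], boys[j], girls[i], girls[j])
    # Bonferroni: multiply by the number of comparisons, cap at 1
    p_adjusted = min(1.0, p * len(pairs))
    results.append((sports[i], sports[j], stat, p, p_adjusted))
    print(f"{sports[i]} vs {sports[j]}: chi squared = {stat:.2f}, "
          f"p = {p:.4f}, adjusted p = {p_adjusted:.4f}")
```

A comparison of one sport with the rest works the same way: add up the other columns first, then pass the resulting 2 x 2 counts to `chi_squared_2x2`.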
