A New View of Statistics

© 2000 Will G Hopkins


Generalizing to a Population:

Ordinal Dependent Variables
Outcome variables with only a few possible values, such as 1, 2 or 3, need special treatment. Variables like this are called ordinal, because they indicate an ordering of responses. They crop up often in questionnaires, where people have to tick one response from a choice like less, the same, or more. The choices make up a so-called Likert scale. We use integers to number and record the responses, but the responses aren't integers. All the integers do is indicate order in the levels of the response. That's why ordinal variables are neither numeric nor nominal.

If we treat the ordinal variable as nominal, we lose the information about the ordering. But if we try to treat it as a numeric variable, we might violate one or more of the assumptions we make when we calculate confidence limits or p values. I used to think such violations were a frequent problem, but it turns out that they are rare. Most of the time you can use t tests for comparisons of groups, and if you are fitting lines or curves, you can use rank transformation to get rid of non-uniform residuals.
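To make rank transformation concrete, here is a minimal sketch in plain Python. The function name `average_ranks` and the sample responses are made up for illustration. I'm assuming that tied responses share the average of the ranks they span, which is the usual convention but not something spelled out above.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical Likert responses, invented for this example.
responses = [1, 3, 2, 3, 1, 2, 2, 5]
ranked = average_ranks(responses)
print(ranked)
```

You would then use the ranked values in place of the raw responses as the dependent variable when you fit the line or curve.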

As I pointed out earlier, the rare situations occur when the responses of a Likert-type variable are almost all stacked up on the top or bottom level of the scale. Rank transformation doesn't work in these situations either, because the rank-transformed variable has the same problems as the raw variable. In such situations, the only way forward is to model the probabilities of responses being at each level. It's called logistic regression (or, more specifically, ordinal logistic regression when the outcome has more than two ordered levels). You've met this before as categorical modeling. The only difference is that logistic regression takes into account the fact that the different levels of the outcome are ordered, whereas plain old categorical modeling treats the outcome as a nominal variable, without any implied order in the levels of the variable.
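Here is a rough sketch of what "modeling the probabilities of responses being at each level" can look like. I'm assuming the cumulative-logit (proportional-odds) form, which is one common way of doing logistic regression with an ordered outcome; the text above doesn't commit to a particular form. The cutpoints and slope are invented numbers, not estimates from any data, and the function names are hypothetical.

```python
import math


def logistic(z):
    """Inverse of the logit: maps any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def level_probabilities(x, cutpoints, slope):
    """Probability of each ordered level under an assumed cumulative-logit model.

    P(Y <= level k) = logistic(cutpoints[k] - slope * x), with cutpoints increasing.
    The probability of a single level is the difference of adjacent cumulative values.
    """
    cumulative = [logistic(c - slope * x) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


# Hypothetical parameters for a three-level response (not fitted to anything).
cutpoints = [2.0, 3.5]  # thresholds between levels 1|2 and 2|3
slope = 0.8             # effect of the predictor x on the log odds

p_low = level_probabilities(0.0, cutpoints, slope)   # responses stacked on level 1
p_high = level_probabilities(3.0, cutpoints, slope)  # larger x shifts responses upward
print(p_low, p_high)
```

With these made-up numbers most of the probability sits on the bottom level when x is zero, which is the stacked-up situation described above. A real analysis would estimate the cutpoints and slope from the data with a stats package.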

Logistic regression gives you a way out when you have a variable, like habitual intense physical activity, with almost everyone on zero and a smear of values for the rest. Split the values of your outcome variable into a number of ordered levels (the first being zero, of course), then do a logistic regression on those levels. You are actually transforming the ugly continuous variable into a more manageable Likert-scale (ordinal) variable.
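The splitting step might look something like the sketch below. The boundaries (2 and 5 hours per week), the sample values, and the function name `to_ordered_level` are all assumptions for the sake of the example; how many levels to use and where to put the boundaries is up to you.

```python
from bisect import bisect_left


def to_ordered_level(value, upper_bounds):
    """Map a non-negative value to an ordered level.

    Level 0 is reserved for exactly zero. A positive value goes to level 1 if it
    is at or below upper_bounds[0], level 2 if at or below upper_bounds[1], and
    so on; anything above the last bound goes to the top level.
    """
    if value < 0:
        raise ValueError("expected a non-negative value")
    if value == 0:
        return 0
    return 1 + bisect_left(upper_bounds, value)


# Hypothetical hours per week of intense activity: mostly zeros plus a smear.
hours = [0, 0, 0, 0.5, 0, 3, 0, 7.5, 0, 2]
upper_bounds = [2, 5]  # assumed boundaries: (0, 2], (2, 5], above 5

levels = [to_ordered_level(h, upper_bounds) for h in hours]
print(levels)
```

The resulting levels are what you would feed to the logistic regression in place of the raw values.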

Last updated 17 Dec 00