COMPLEX MODELS: More Than One Independent Variable continued

The figure shows data that lend themselves to fitting a polynomial. As you can see, there is a so-called

Often the different points come from the same subjects, especially when time is the independent variable. You can still fit polynomials to such data, but you have to use repeated-measures models. I deal with repeated-measures polynomials later, but the interpretation of the numbers describing the shape of the curve is the same, and I deal with that here.

**A Simple Polynomial**

**model: numeric <= numeric numeric^{2} numeric^{3}...**

example:
attitude <= experience experience^{2}

Notice the subtle difference from the model
for multiple linear regression on the previous page.
Here the numbers 2, 3... represent **powers** of the **same** variable.
It might be easier to see if I write: Y <= X X^{2} X^{3}... The
stats program fits the polynomial Y = a + bX + cX^{2} + dX^{3}...
to the data. Polynomials are a special case of the more general non-linear
models. Check that page out again right now!
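
To make the fitting step concrete, here is a minimal sketch of what a stats program does for Y <= X X^{2}: ordinary least squares on the columns 1, X and X^{2}. The function name `fit_polynomial` and the experience and attitude numbers are invented for illustration. The attitude values are generated from a known curve so the result can be checked; this is not data from the figure, and a real analysis would use your stats package.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [a, b, c, ...] for Y = a + bX + cX^2 + ..."""
    n = degree + 1
    # Normal equations: sum(x^(i+j)) * coef_j = sum(y * x^i)
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
            rhs[r] -= factor * rhs[col]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        known = sum(A[i][k] * coef[k] for k in range(i + 1, n))
        coef[i] = (rhs[i] - known) / A[i][i]
    return coef


# Made-up data: attitude generated exactly from 2 + 1.5*X - 0.15*X^2
experience = [0, 1, 2, 3, 4, 5, 6, 7, 8]
attitude = [2 + 1.5 * x - 0.15 * x ** 2 for x in experience]

a, b, c = fit_polynomial(experience, attitude, degree=2)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
```

Because the made-up data lie exactly on a parabola, the fit recovers the generating coefficients to within rounding error. With real, noisy data the same procedure returns the best-fitting a, b and c.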

For data that are shaped like a parabola, you probably won't need more than
a quadratic model (Y <= X X^{2}).
If the curve trends up again at one end, you'll need a cubic model. Curves
with multiple kinks need even higher-order terms. It's rare to go past a quadratic,
though.

When you fit a model like Y <= X X^{2},
the stats program finds the best quadratic curve to fit the data. In other words,
it will find the best values for the **coefficients** (or parameters) a,
b and c in the equation Y = a + bX + cX^{2}. The value of a represents the overall position of the
curve up and down the Y axis; for example, an increase of 1 unit in a shifts
the whole curve up the Y axis by 1 unit. The value of b represents the amount
of overall upward or downward linear (straight-line) trend in the values of
Y as you move along the X axis; in other words, if you draw a straight line
to fit all the points well, b is the slope of the line, which is the same thing
as the increase (or decrease, if b is negative) in Y for each 1-unit increase
in X. For the data in the figure, b would represent the change in attitude per
year of experience. The value of c represents the amount of curvature in the
data; in the present example, c would be negative, because the parabola is upside
down. I find it easier to interpret c visually if I transform the X values so
they range from -1 to +1. If I then fit a curve with this new independent variable,
the value of c that I get is about the amount that the values of Y sit above
(or fall below, if c is negative) a straight line at either end of the X range.
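
The rescaling trick can be sketched as follows. Rather than refitting, this example converts a quadratic's coefficients algebraically to the rescaled variable Z (running from -1 to +1), which gives the same coefficients a least-squares refit on Z would. The function name, the coefficients and the 0 to 8 year range are all hypothetical, and "a straight line" is read here as the line a + bZ, which is one interpretation of the text.

```python
def rescale_quadratic(a, b, c, x_min, x_max):
    """Re-express Y = a + bX + cX^2 in terms of Z, where Z runs from -1 to +1.

    X = mid + half * Z; substituting and collecting powers of Z gives
    Y = a2 + b2*Z + c2*Z^2.
    """
    mid = (x_min + x_max) / 2
    half = (x_max - x_min) / 2
    a2 = a + b * mid + c * mid ** 2
    b2 = (b + 2 * c * mid) * half
    c2 = c * half ** 2
    return a2, b2, c2


# Hypothetical coefficients for attitude vs experience (0 to 8 years)
a2, b2, c2 = rescale_quadratic(2.0, 1.5, -0.15, x_min=0, x_max=8)
print(f"a2 = {a2:.2f}, b2 = {b2:.2f}, c2 = {c2:.2f}")

for z in (-1, 1):
    curve = a2 + b2 * z + c2 * z ** 2
    line = a2 + b2 * z  # straight-line part only
    print(f"Z = {z:+d}: curve minus line = {curve - line:.2f}")
```

At both ends of the range the curve sits c2 = -2.40 units away from the line, that is, 2.4 units below it, which is the visual meaning of the quadratic coefficient after rescaling.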

Remember that you can derive these coefficients or parameters as raw values, as percents, and as normalized regression coefficients, just like the slope in a simple linear regression. Make sure you interpret their magnitudes and their confidence limits!

**Caution!** The linear term in a quadratic polynomial represents the overall
effect as you go from low to high values of the independent variable. The quadratic
term doesn't impact this overall effect--in fact, including the quadratic when
there is curvature in the trend will make the estimate of the linear term more
precise. But if you include a *cubic* term in the polynomial, the cubic
also contributes to the overall effect of going from low to high values of the
independent variable. **This extra contribution of the cubic makes it impossible
to interpret the linear term as representing the difference between low and
high values of the independent variable.** This problem is particularly important
when you are using polynomial contrasts in a
repeated-measures analysis, where the independent variable is time or trial
number. The easiest way to avoid the problem is to avoid including a cubic or
quintic in the polynomial. If you do include these higher order terms, and you
want an estimate of the difference between the effect of low and high values
of the independent variable (e.g., first test vs last test), you will have to
derive an estimate for the high minus the low values.
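
A small numeric check shows why the cubic term causes trouble. It assumes the independent variable has been rescaled to run from -1 to +1, so that even powers cancel in the high-minus-low difference; the coefficients and function names are hypothetical.

```python
def polynomial(coefs, x):
    """Evaluate a + b*x + c*x^2 + ... for coefs = [a, b, c, ...]."""
    return sum(k * x ** power for power, k in enumerate(coefs))


def high_minus_low(coefs, x_low=-1.0, x_high=1.0):
    """Difference in predicted Y between the top and bottom of the X range."""
    return polynomial(coefs, x_high) - polynomial(coefs, x_low)


a, b, c, d = 5.0, 1.2, -2.4, 0.5  # hypothetical coefficients

quadratic_diff = high_minus_low([a, b, c])   # quadratic cancels: 2*b
cubic_diff = high_minus_low([a, b, c, d])    # cubic adds to it: 2*b + 2*d

print(f"quadratic model: high - low = {quadratic_diff:.2f} (2b = {2 * b:.2f})")
print(f"cubic model:     high - low = {cubic_diff:.2f} (2b + 2d = {2 * b + 2 * d:.2f})")
```

With only a quadratic, the difference depends on b alone. Once a cubic is present, the difference is 2b + 2d, so b by itself no longer summarizes the change from low to high.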

Don't forget that you can assess the contribution of each term of the polynomial
to the variance explained (R^{2}) by the model. If your stats program doesn't give you
the R^{2} for each term, find the total sum of
squares and the sums of squares for each effect in the output, then calculate
the R^{2} for the quadratic term by dividing
its sum of squares by the total sum of squares, multiplied by 100 to convert
it to a percent. Phew! Interpret the R^{2} by
taking its square root and working out the confidence limits of the resulting
correlation, as described earlier.
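
The arithmetic is short enough to sketch. The sums of squares below are invented for illustration; in practice you would read them from your stats program's output.

```python
import math


def percent_r2(ss_effect, ss_total):
    """Percent of total variance explained by one term of the model."""
    return 100 * ss_effect / ss_total


# Hypothetical sums of squares from a quadratic fit
ss_total = 200.0
ss_terms = {"experience": 98.0, "experience^2": 18.0}

for term, ss in ss_terms.items():
    r2 = percent_r2(ss, ss_total)
    r = math.sqrt(r2 / 100)  # equivalent correlation for interpreting magnitude
    print(f"{term}: R^2 = {r2:.0f}%, correlation = {r:.2f}")
```

Here the quadratic term explains 9% of the variance, equivalent to a correlation of 0.30; the confidence limits for that correlation still have to be worked out separately.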

**A Polynomial With a Nominal Effect**

The next figure shows an extension of
the above model to test for differences between two sports. Let's build up the
model term by term. We'll need sport as a main effect, to see how much overall
difference there is in the mean attitude for the two sports:

attitude <= sport

The main trend with experience is linear, and we want to know about the differences in the slopes, so we need a full ANCOVA model:

attitude <= sport experience sport*experience

And finally, there is curvature for at least one sport, so we need to fit a quadratic term overall, and a quadratic term that might differ between the two sports. The way to do that is to include the quadratic term as a main effect and as an interaction with sport. So here's the full model:

attitude <= sport experience sport*experience experience^{2} sport*experience^{2}

The p value for sport*experience^{2} tells you whether any difference in the
curvature for the two sports is statistically significant. Once again,
you can express this difference as a contribution to the overall
R^{2} for the model, as described for
the simpler example above.
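
To see what the six terms of the full model do, the sketch below builds the predictor columns for one athlete and collapses the coefficients into a separate quadratic for each sport. It assumes sport is dummy-coded 0 or 1 (stats programs may code it differently), and the coefficient values and function names are hypothetical.

```python
def design_row(sport, experience):
    """Predictor columns of the full model for one athlete (sport coded 0 or 1)."""
    return [
        1.0,                      # intercept
        sport,                    # sport
        experience,               # experience
        sport * experience,       # sport*experience
        experience ** 2,          # experience^2
        sport * experience ** 2,  # sport*experience^2
    ]


def predict(coefs, sport, experience):
    """Predicted attitude: sum of coefficient times column."""
    return sum(k * x for k, x in zip(coefs, design_row(sport, experience)))


def curve_for_sport(coefs, sport):
    """Collapse the six coefficients into (a, b, c) for one sport."""
    k0, k_sport, k_exp, k_sport_exp, k_exp2, k_sport_exp2 = coefs
    return (
        k0 + k_sport * sport,
        k_exp + k_sport_exp * sport,
        k_exp2 + k_sport_exp2 * sport,
    )


coefs = [2.0, 1.0, 1.5, -0.5, -0.15, 0.10]  # hypothetical fitted values

curve0 = curve_for_sport(coefs, 0)
curve1 = curve_for_sport(coefs, 1)
curvature_difference = curve1[2] - curve0[2]  # the sport*experience^2 coefficient

print("sport 0 (a, b, c):", [round(v, 2) for v in curve0])
print("sport 1 (a, b, c):", [round(v, 2) for v in curve1])
print(f"difference in curvature = {curvature_difference:.2f}")
```

Under this coding, the sport*experience^{2} coefficient is exactly the difference in curvature between the two sports, which is the quantity its p value and confidence limits refer to.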

webmaster=AT=newstats.org

Last updated 10 Dec 00