A New View of Statistics

© 2000 Will G Hopkins


Summarizing Data:

The standard deviation (SD) represents variation in the values of a variable, whereas the standard error of the mean (SEM) represents the spread that the mean of a sample of the values would have if you kept taking samples. So the SEM gives you an idea of the accuracy of the mean, and the SD gives you an idea of the variability of single observations. The two are related: SEM = SD/(square root of sample size).
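As a quick numeric sketch of that relationship (the jump-height values here are invented purely for illustration):

```python
import math
import statistics

# Hypothetical sample of jump heights (cm) for 9 subjects.
heights = [42.0, 45.5, 39.0, 48.0, 44.0, 41.5, 46.0, 43.0, 40.0]

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)   # SD: variability of single observations
sem = sd / math.sqrt(n)          # SEM: spread of the mean over repeated samples

print(f"n = {n}, mean = {mean:.1f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

Note that the SD stays roughly the same as you add subjects, while the SEM shrinks in proportion to the square root of the sample size.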

Some people think you should show SEMs with means, because they think it's important to indicate how accurate the estimate of the mean is. And when you compare two means, they argue that showing the SEMs gives you an idea of whether there is a statistically significant difference between the means. All very well, but here's why they're heading down the wrong track:

  • For descriptive statistics of your subjects, you need the SD to give the reader an idea of the spread between subjects. Showing an SEM with the mean is silly.
  • When you compare group means, showing SDs conveys an idea of the magnitude of the difference between the means, because you can see how big the difference is relative to the SDs. In other words, you can see how big the effect size is.
  • It's important to visualize the SDs when there are several groups, because if the SDs differ too much, you may have to use log transformation or rank transformation before you compute confidence limits or p values. If the number of subjects differs between groups, the SEMs won't give you a direct visual impression of whether the SDs differ.
  • If you think it's important to indicate statistical significance, show p values or confidence limits of the outcome statistic. That's more accurate than showing SEMs. Besides, does anyone know how much SEMs have to overlap or not overlap before you can say the difference is significant? And does anyone know that the amount of overlap or non-overlap depends on the relative sample sizes?
  • Most importantly, when you have means for pre and post scores in a repeated-measures experiment, the SEMs of these means do NOT give an impression of statistical significance of the change--a subtle point that challenges many statisticians. So if the SEMs don't show statistical significance in experiments, what's the point of having them anywhere else?

    Here's a figure to illustrate why SEMs don't convey statistical significance. It's for imaginary data in an experiment to increase jump height. The change in height is significant (p=0.03) when the measurement of jump height has high reliability, but not significant (p=0.2) when the reliability is low. But the SEMs are the same in both cases.
  • The SEMs of the post-pre change scores in a treatment and control group would indicate statistical significance. But if you show the change scores, you should show the confidence interval for the change, not the SEM. You should also show the SD of the change scores for the treatment and control groups, because a substantial increase in the SD of the change scores in a treatment group relative to a control group indicates individual responses to the treatment. SEMs of the change scores would alert you to the possibility of individual responses only if the sample size was the same in both groups.
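The point about individual responses can be sketched with hypothetical change scores (all numbers below are invented; a real analysis would use your own data):

```python
import statistics

# Hypothetical post-pre change scores (cm) in a jump-height experiment.
control = [0.5, -0.3, 0.2, -0.6, 0.1, 0.4, -0.2, 0.0]
treatment = [3.0, 0.5, 4.5, 1.0, 2.5, 5.5, 0.0, 3.5]

sd_control = statistics.stdev(control)
sd_treatment = statistics.stdev(treatment)

# A substantially larger SD of change scores in the treatment group
# suggests subjects responded to the treatment by different amounts.
print(f"SD of change scores: control {sd_control:.2f}, "
      f"treatment {sd_treatment:.2f}")
```

Here the treatment group's change scores spread out far more than the control group's, which is the signature of individual responses the bullet above describes.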

So when you see SEMs in a publication, smile, then mentally convert them into SDs to see how big the differences are between the groups. For example, if there are 25 subjects in a group, increase the size of the SEM by a factor of 5 (= square root of 25) to turn it into an SD.
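That mental conversion is just the SD/SEM formula rearranged: SD = SEM × √n. A minimal sketch (the helper name and example numbers are mine, not from the article):

```python
import math

def sem_to_sd(sem: float, n: int) -> float:
    """Recover the SD from a published SEM and sample size."""
    return sem * math.sqrt(n)

# The article's example: 25 subjects, so multiply the SEM by 5.
print(sem_to_sd(2.0, 25))  # prints 10.0
```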

The bottom line: never show SEMs. Never. Trust me.

Here endeth precision of measurement and summarizing data. On the next page we start generalizing to a population.

Last updated 25 June 03