A New View of Statistics
THE FLY: MISCELLANEOUS
On this last page devoted to sample size on the fly, I explain how to use it for any design and any outcome statistic. I then suggest what to say to the ethical committee when you apply for approval. I also warn you not to use statistical significance for sampling on the fly.
THE FLY FOR OTHER DESIGNS
Whatever the design and whatever the outcome statistic, if your stats program can produce a confidence interval for the outcome statistic, you can sample on the fly. Here is the procedure. First I explain how to do it for outcome statistics whose confidence interval has a width inversely proportional to the square root of the sample size.
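As a minimal sketch of the scaling this first case relies on (not the step-by-step procedure itself): if the width of the confidence interval falls as one over the square root of the sample size, the width you have now and the width you want imply a projected final sample size. The function name and the target width are hypothetical, chosen only for illustration.

```python
import math


def projected_sample_size(n_current, width_current, width_target):
    """Project the sample size needed to shrink a confidence interval.

    Assumes width is inversely proportional to sqrt(n), so
    n_needed = n_current * (width_current / width_target) ** 2.
    """
    if width_target <= 0 or width_current <= 0:
        raise ValueError("widths must be positive")
    ratio = width_current / width_target
    # Round up: a fraction of a subject cannot be tested.
    return math.ceil(n_current * ratio ** 2)


# Halving the width requires four times as many subjects.
print(projected_sample_size(20, 1.0, 0.5))  # 80
```

The projection is only as good as the current interval, so it makes sense to recompute it after each new batch of subjects.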
If the confidence interval of your outcome statistic is not inversely proportional to the square root of the sample size, replace Step 9 with the following elegant procedure (which allows you to work out the relationship between sample size and the width of the confidence interval):
ON THE FLY FOR THE ETHICAL COMMITTEE
You need to convince the ethical committee that you have the resources to go to the usual large number of subjects, if the effect turns out to be small. So you will have to provide an estimate of the worst-case sample size. You'll have to justify it using my approach with confidence intervals (which requires half the usual number), because you can't let statistical significance get anywhere near sample size on the fly. The two do not mix, as we'll see shortly.
To do a cross-sectional study properly, you must have the resources to test hundreds of subjects, if necessary. Don't forget to take into account known or guessed validities, which could push the number up by a factor of two or three.
For a longitudinal study, reliability is crucial for calculating how many subjects you might need. If you don't know or can't guess the reliability, you have to tell the committee that you simply don't know how many subjects you might end up with. So tell them that testing 10 or so subjects per group will be enough to detect large effects if the reliability is almost perfect, and it will give you enough data to estimate roughly the final sample size otherwise. Indicate the total number you will be able to test, and admit that this number may not be enough if the reliability turns out to be low. You will end up with a confidence interval that is wider than optimum, but the result may still be publishable. There's nothing you can do about it, and there's no ethical justification for your application to be refused, if you've got everything else right. After all, if no-one knows the reliability, someone has to start testing to find out how many subjects are needed. And it makes sense to do it during the experiment itself rather than to waste resources on a reliability study. But if you already have data from a reliability study, point out that uncertainty in the reliability makes a big difference to the estimate of the worst-case final sample size, so you might still be wrong with your estimate.
DO NOT FLY WITH STATISTICAL SIGNIFICANCE
If statistical significance is your goal, you would presumably start with a sample big enough to give statistical significance for large effects. For example, you might start searching for a correlation of 0.6, which you would want to find statistically significant (p<0.05) 80% of the time. From the formulae, the number of subjects is 13, so let's say you start with this number. If you get statistical significance, you stop. If not, you test more subjects.
Seems OK, but there are two things wrong. If the correlation does turn out to be statistically significant on the first go, it has such a wide confidence interval that the correlation in the population is likely to be anything from practically perfect down to trivial. In other words, there's an effect, yes, but you end up with little idea of how big it is.
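To see how wide that interval is, here is a small sketch that computes approximate 95% confidence limits for a correlation. It assumes the Fisher z transformation, which is an illustrative choice and may differ in detail from the method your stats program uses; the function name is hypothetical.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence limits for a sample correlation r from n subjects.

    Uses the Fisher z transformation: atanh(r) is treated as roughly normal
    with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# A sample correlation of 0.6 from only 13 subjects.
low, high = correlation_ci(0.6, 13)
print(f"95% limits: {low:.2f} to {high:.2f}")
```

With 13 subjects the lower limit lands below 0.1 and the upper limit above 0.85, which is the "practically perfect down to trivial" range described above.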
The other problem is more serious: bias! With a true correlation of 0.6, a starting sample size of 13, and up to three rounds of extra sampling, the sample correlation ends up at 0.65 on average. For a true correlation of 0.40, the sample correlation averages 0.50. This amount of bias is unacceptable. Starting with a bigger sample helps, but as long as you make stopping contingent upon statistical significance, you will have substantial bias for most values of correlation. For example, a true correlation of 0.20 and a starting sample of 45 produce a correlation of 0.25 on average in the final sample. You could start with hundreds of subjects, I suppose, but by then you'd have defeated the purpose of sample sizing on the fly!
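The bias is easy to demonstrate by simulation. The sketch below is not the simulation behind the figures quoted above, so don't expect it to reproduce them exactly. It rests on several assumptions: subjects are drawn from a bivariate normal population, each extra round adds as many subjects as the starting sample, and significance is judged with the Fisher z approximation rather than an exact test. All function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, z_crit=1.96):
    """Two-sided test of r against zero (Fisher z approximation, p<0.05)."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def draw_pairs(rho, n, rng):
    """Draw n (x, y) pairs from a population with true correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1 - rho ** 2) * noise)
    return xs, ys


def final_r(rho, n_start, extra_rounds, rng):
    """Sample until significant, or until the extra rounds run out."""
    xs, ys = draw_pairs(rho, n_start, rng)
    r = pearson_r(xs, ys)
    for _ in range(extra_rounds):
        if is_significant(r, len(xs)):
            break  # stop as soon as p<0.05
        more_x, more_y = draw_pairs(rho, n_start, rng)  # assumed batch size
        xs += more_x
        ys += more_y
        r = pearson_r(xs, ys)
    return r


def mean_final_r(rho, n_start, extra_rounds, trials=4000, seed=1):
    """Average final sample correlation over many simulated studies."""
    rng = random.Random(seed)
    total = sum(final_r(rho, n_start, extra_rounds, rng) for _ in range(trials))
    return total / trials


chasing = mean_final_r(0.4, 13, extra_rounds=3)  # stop on significance
fixed = mean_final_r(0.4, 13, extra_rounds=0)    # no optional stopping
print(f"stopping on significance: {chasing:.2f}, fixed sample: {fixed:.2f}")
```

Setting `extra_rounds=0` gives the fixed-sample comparison: its average stays close to the true correlation, while the average from stopping on significance sits noticeably higher.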
I wonder if sampling on the fly using statistical significance is a widespread practice, without people realizing it. By people I mean everyone, including the experimenters themselves. It's all too easy to start a study with a small sample, stop if you get statistical significance, or do a few more subjects to bring a promising p value below the 0.05 threshold!
A FINAL WARNING. Opting for sample size on the fly, then sky diving as soon as you get statistical significance, is forbidden. If your paper comes to me for review, I will reject it on the grounds that the result is biased and that the confidence interval is too wide.