RESEARCHING WORTHWHILE PERFORMANCE ENHANCEMENTS
Will G Hopkins (a) PhD, John A Hawley (b) PhD, Louise M Burke (c) PhD
(a) Department of Physiology, University of Otago, Dunedin 9001, New Zealand; (b) Department of Human Biology & Movement Science, RMIT University, Bundoora 3083, Australia; (c) Department of Sports Nutrition, Australian Institute of Sport, Belconnen 2616, Australia.
Sportscience 3(1), sportsci.org/jour/9901/wghnews.html, 1999 (1101 words)
Reviewed by Stephen Seiler PhD, Institute for Sport, Agder College, 4604 Kristiansand, Norway
What magnitude of performance enhancement makes a difference to an elite athlete's chance of winning the gold? What is the best way for sport scientists to study training, ergogenic aids, or other treatments that produce enhancements of this magnitude? What is the best way to present the findings for non-academics and academics to understand? We have attempted to answer these important questions in a recently published paper (Hopkins et al., 1999). The paper grew out of a mini symposium we presented at the annual meeting of the American College of Sports Medicine in Orlando last year. Here is a plain-language account of some of the main points.
We first tackled the problem of the smallest worthwhile performance enhancement by considering an event where a few top equally matched athletes vie for first place (Figure 1). If the athletes re-run the event a large number of times, the normal random variation in the individual athletes' performance between events will ensure that each athlete gets an equal share of wins. But if one of the athletes gets an enhancement, obviously s/he will win more often. The magnitude of the enhancement has to be about as big as the normal variation in the athlete's performance between events to make a difference: much smaller and the athlete won't perform any differently; much larger and s/he will always win. In fact, when we simulated many events in a computer, we showed that an enhancement of about half the size of the normal variation in performance caused a real effect on the chance of winning. Even smaller enhancements would still make a difference to the medal tally of a country like the US.
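A simulation of this kind is easy to reproduce. Here is a minimal sketch (not the paper's actual program; the athlete count, race count, and units are illustrative assumptions):

```python
import random

def win_fraction(n_athletes=4, enhancement=0.0, sd=1.0,
                 n_races=20000, seed=1):
    """Fraction of simulated races won by athlete 0, whose race time
    is improved (reduced) by `enhancement`; every athlete's time varies
    between races with standard deviation `sd`."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_races):
        times = [rng.gauss(0.0, sd) for _ in range(n_athletes)]
        times[0] -= enhancement  # a lower time wins
        if times[0] == min(times):
            wins += 1
    return wins / n_races

baseline = win_fraction(enhancement=0.0)  # equally matched athletes
boosted = win_fraction(enhancement=0.5)   # half the normal variation
```

With four equally matched athletes, each wins about a quarter of the races; giving one athlete an enhancement of half the between-race variation lifts that athlete's share of wins well above the others'.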
Enhancements of this magnitude are small. To put them into perspective, the normal variation for track runners in the top half of the field at international competitions may be as low as ~0.6% (WGH, unpublished observations). That means an enhancement of about 0.3% would make a difference to one of these athletes. In the best lab tests with the best athletes researchers can get, variation in performance between tests is typically 2-3%, and seldom better than 1.5%. We show in the paper that researchers would need to test hundreds or even thousands of athletes to measure an enhancement of 0.3% with adequate precision. For example, if you observed an enhancement of 0.3%, you would want to be able to say that the true value of the enhancement is most likely to fall between 0.0% and 0.6%. (These two values are the so-called 95% confidence limits; in our paper we explain why they need to be about ±0.3% when the smallest worthwhile enhancement is 0.3%.) Now suppose that the researcher used a reasonably good performance test, one for which the subjects had a typical variation in performance of 2.0% between tests. The resulting sample size would be 350 for a crossover study or 1400 for a study with a separate control group. The usual sample size in studies of performance enhancement is 10! If the researcher observed an enhancement of 0.3% with the same test in a crossover study of 10 subjects, the true value of the enhancement could be anything between 2.3% and -1.7%--in other words, a massive positive or a massive negative effect on performance for a top athlete.
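Sample sizes of roughly this size follow from the standard confidence-interval formula. The sketch below assumes the typical variation acts as a within-subject standard deviation (so a change score has SD equal to the typical error times the square root of 2) and uses a normal (z) critical value; the quoted figures of 350 and 1400 also allow for the t distribution and rounding, so the sketch comes out slightly lower:

```python
import math

def crossover_n(typical_error, half_width, z=1.96):
    """Subjects needed in a crossover for the 95% confidence interval
    of the mean change to have the given half-width.
    Change-score SD = typical_error * sqrt(2);
    half_width = z * sd_change / sqrt(n), solved for n."""
    sd_change = typical_error * math.sqrt(2)
    return math.ceil((z * sd_change / half_width) ** 2)

def parallel_total_n(typical_error, half_width, z=1.96):
    """Total subjects for a study with a separate control group.
    SE of the difference in mean changes between two equal groups
    = 2 * typical_error / sqrt(n_per_group)."""
    n_per_group = math.ceil((z * 2 * typical_error / half_width) ** 2)
    return 2 * n_per_group

crossover = crossover_n(2.0, 0.3)       # ~340, quoted as 350
parallel = parallel_total_n(2.0, 0.3)   # ~1370, quoted as 1400
```

The parallel-groups design needs about four times as many subjects as the crossover, because each of the two groups contributes its own change-score variance to the comparison.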
Many researchers are unaware of the need for large sample sizes when they investigate small changes in performance. Furthermore, they report results using the concept of statistical significance and so-called p values, which few scientists and no lay people understand properly. When they study a treatment that has only a small (but worthwhile) effect on performance, the small sample size almost invariably produces a result that is not statistically significant (p > 0.05). In some studies with particularly small sample sizes or particularly unreliable tests, even large effects can turn up as not significant. Regardless of the magnitude of the effect, some researchers conclude incorrectly that a non-significant result means the treatment is ineffective. The way to overcome this confusion is to publish the observed change in performance and the likely range of the true value of the change (the 95% confidence limits). The researcher should then use plain language to explain the magnitude of the observed change and of the upper and lower limits of the likely range, as in the above example (see also Hopkins, 1999: Interpreting Effects). In this way there can be no confusion about the possible magnitude of the enhancement. Statistical significance, or lack of it, need not be mentioned.
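The confidence limits quoted earlier for the 10-subject crossover follow from the same reasoning. A sketch, assuming the standard error of the mean change is the typical error times the square root of 2 divided by the square root of n, with the t critical value for 9 degrees of freedom (about 2.262):

```python
import math

def crossover_ci(observed, typical_error, n, t_crit):
    """95% confidence limits for the mean enhancement in a crossover.
    SE of the mean change = typical_error * sqrt(2) / sqrt(n);
    limits = observed change +/- t_crit * SE."""
    half = t_crit * typical_error * math.sqrt(2) / math.sqrt(n)
    return observed - half, observed + half

# 0.3% observed enhancement, 2.0% typical variation, 10 subjects
lo, hi = crossover_ci(0.3, 2.0, 10, t_crit=2.262)
```

The limits come out at about -1.7% and 2.3%, matching the range quoted above: the study cannot distinguish a massively positive from a massively negative effect.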
In our paper we discuss other aspects of the design of experiments aimed at measuring performance enhancement, including new ways to assess the reliability and validity of tests, the need to recruit the best possible athletes for a study, the need to mimic conditions of real training and real events in a study, the impact and measurement of individual differences in enhancement, and the impact and measurement of placebo effects in unblinded studies. Time and space do not permit us to explain these aspects here. Interested readers can read the full article in the March issue of Medicine and Science in Sports and Exercise. We welcome feedback, but please do not request reprints from us--we have not ordered any.
Hopkins WG (1999). How to write a literature review. Sportscience 3, sportsci.org/jour/9901/wghreview.html (2618 words).
Hopkins WG, Hawley JA, Burke LM (1999). Design and analysis of research on sport performance enhancement. Medicine and Science in Sports and Exercise 31, 472-485.