Linear Models and Effect Magnitudes for Research, Clinical and Practical Applications
Will G Hopkins
Sportscience 14, 49-58, 2010 (sportsci.org/2010/wghlinmod.htm)
Sport and Recreation, AUT University, Auckland 0627, New Zealand. Reviewer: Alan M Batterham, School of Health and Social Care, Teesside University, Middlesbrough TS1 3BA, UK.
Update 4 Jan 2017. I improved the slides on number needed to treat, odds ratio in case-control studies, and relationships between risk, hazard and odds ratios.
Update 6 June 2016. I now advise using the p value for the odds ratio in a logistic regression to get confidence limits for the corresponding proportion ratio. (I checked with simulations that this approach is accurate. Converting the confidence limits for the odds ratio to those for the proportion ratio does not work well, so I removed that advice.) I also made a few other minor edits.
Update 23 Oct 2013. Rank transformation is now dismissed as an approach to non-uniformity, and the analysis of win-draw-lose outcomes in matches is now mentioned as an example for cumulative logistic regression.
Update 16 Sept 2012. There is now a single set of magnitude thresholds for ratios of proportions, hazards and counts. The thresholds are the same as those I proposed previously for counts and rare events. I have removed the shorter version of the slideshow.
Update 26 Sept 2011. Simplification of the introductory slides; coding of nominal predictors with dummy variables; models for controlled trials; improvements to slides dealing with nominal and count dependents, and a shorter version of the slideshow with less detail on nominal and count dependents.
Update 9 Sept 2010. Slide showing residuals vs predicteds for a dependent requiring log transformation. More information on multinomial regression (e.g., for a Likert scale with few items or skewed responses). Other minor improvements.
Update 28 Aug 2010. Odds-ratio thresholds of 1.5, 3.4, 9.0, 32 and 360 now included as an adjunct to proportion-difference thresholds of 10, 30, 50, 70 and 90 percent when modeling and interpreting common time-independent classifications. These odds-ratio thresholds, which I computed directly from the proportion differences centered on 50% (55 vs 45, 65 vs 35, etc.), agree well with a formula devised by Chinn (2000) to convert an odds ratio to a standardized difference in means (ln(odds ratio)/1.81).
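The arithmetic behind the odds-ratio thresholds in the 28 Aug 2010 update can be checked with a short script. The sketch below is only an illustration: the function names are my own, and it assumes each proportion difference is split evenly around 50% (55 vs 45, 65 vs 35, and so on), as described above. It also applies the Chinn (2000) conversion, ln(odds ratio)/1.81, to each odds ratio.

```python
import math

def odds(p):
    """Odds corresponding to a proportion p (0 < p < 1)."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds ratio comparing proportion p1 with proportion p2."""
    return odds(p1) / odds(p2)

def chinn_standardized_difference(or_value):
    """Chinn (2000) conversion of an odds ratio to a standardized difference."""
    return math.log(or_value) / 1.81

def thresholds(differences_percent=(10, 30, 50, 70, 90)):
    """Odds ratios for proportion differences centered on 50%."""
    rows = []
    for d in differences_percent:
        p1 = (50 + d / 2) / 100   # e.g. 55% for a 10% difference
        p2 = (50 - d / 2) / 100   # e.g. 45% for a 10% difference
        or_value = odds_ratio(p1, p2)
        rows.append((d, p1, p2, or_value, chinn_standardized_difference(or_value)))
    return rows

if __name__ == "__main__":
    for d, p1, p2, or_value, smd in thresholds():
        print(f"difference {d}%: {p1:.2f} vs {p2:.2f}, "
              f"odds ratio {or_value:.3g}, ln(OR)/1.81 = {smd:.2f}")
```

The computed odds ratios round to the threshold values quoted in the update; the last column is the standardized difference in means implied by the Chinn formula.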
After presenting the Magnitude Matters slideshow recently in several workshops, I realized that it needed more on the role played by linear modeling in estimation of effects. The additive nature of the linear model is the basis of adjustment for the effects of other factors to get pure or un-confounded effects and to identify potential mediators or mechanisms of an effect. The additive nature of linear models also explains why we should use the log of the dependent variable to estimate uniform percent or factor effects. A consideration of the error term in a linear model provides further justification for the use of log transformation, along with the use of the unequal-variances t statistic or mixed modeling in analyses where the error term differs between or within subjects. Finally, the analyses for counts and binary dependent variables make little sense without understanding how the underlying linear models require such strange dependent variables as the log of the odds of a classification or the log of the hazard of a time-dependent event. The new slideshow addresses all these issues and more, using material from the recent progressive statistics article (Hopkins et al., 2009) and a book chapter on injury statistics (Hopkins, 2009). The slideshow hopefully represents a useful combination of theory and practical advice for anyone who wants to understand and estimate effects in their research.
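The point about log transformation and uniform percent effects can be illustrated with a small numerical sketch. The data and function names below are invented for illustration and are not taken from the slideshow: each "treated" value is simply the matching "control" value multiplied by 1.10. The raw differences then vary with the size of the value, whereas the differences in the logs are all identical, so an additive model on the log scale captures the effect as a single factor.

```python
import math

def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

def percent_effect(control, treated):
    """Percent effect estimated from the difference in the means of the logs.

    The difference in log means is an additive effect on the log scale;
    back-transforming with exp() turns it into a factor, and
    100 * (factor - 1) expresses it as a percent.
    """
    log_difference = mean([math.log(x) for x in treated]) \
        - mean([math.log(x) for x in control])
    return 100 * (math.exp(log_difference) - 1)

# Hypothetical data: a uniform 10% increase applied to each control value.
control = [50.0, 100.0, 200.0]
treated = [x * 1.10 for x in control]

# Raw differences depend on the size of the value (non-uniform).
raw_differences = [t - c for c, t in zip(control, treated)]

# Log differences are the same for every pair (uniform).
log_differences = [math.log(t) - math.log(c) for c, t in zip(control, treated)]

if __name__ == "__main__":
    print("raw differences:", [round(d, 2) for d in raw_differences])
    print("log differences:", [round(d, 4) for d in log_differences])
    print("percent effect:", round(percent_effect(control, treated), 2))
```

In real data the effect would not be exactly uniform, and the residuals on the log scale would need checking, but the sketch shows why a factor effect becomes additive after log transformation.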
For more on the way we infer causality, deal with confounders, and account for mechanisms in the relationships between variables, see the slideshow/article on research designs (Hopkins, 2008). My article and spreadsheets on understanding stats via simulations (Hopkins, 2007a) are useful for learning more about log transformation, straightforward analyses, and inferential statistics. A separate slideshow details the various approaches to repeated measures and random effects; I presented it at a conference in 2003, but it is still up to date.
When it comes to actual data analysis, you will need extra help with the practicalities of the use of a spreadsheet or stats package. Peruse the article on comparing two group means and play with the associated spreadsheet to come to terms with simple comparisons of means and adjustment for a covariate (Hopkins, 2007b). The article on the various controlled trials and the associated spreadsheets are a little more advanced and also full of useful material (Hopkins, 2006). See my item on Sad Stats for an overview of some of the stats packages and for a set of files that are useful for SPSS users. If you already have some experience with the SAS package but need specific advice on Proc Mixed, Genmod or Glimmix, contact me.
Chinn S (2000). A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine 19, 3127-3131
Hopkins WG (2006). Spreadsheets for analysis of controlled trials, with adjustment for a subject characteristic. Sportscience 10, 46-50
Hopkins WG (2007a). Understanding statistics by using spreadsheets to generate and analyze samples. Sportscience 11, 23-36
Hopkins WG (2007b). A spreadsheet to compare means in two groups. Sportscience 11, 22-23
Hopkins WG (2008). Research designs: choosing and fine-tuning a design for your study. Sportscience 12, 12-21
Hopkins WG (2009). Statistics in observational studies. In: Verhagen E, van Mechelen W (editors) Methodology in Sports Injury Research. OUP: Oxford. 69-81
Hopkins WG, Marshall SW, Batterham AM, Hanin J (2009). Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise 41, 3-12
Published July 2010