The New Statistics for applied linguistics
The New Statistics is an approach to scholarly research that offers an alternative to the problematic overreliance on
significance testing that currently plagues the research literature. This paper describes the problems associated with significance testing and
introduces the key concepts of the data analysis that best fits the goals of the New Statistics: the estimation of effect sizes and
confidence intervals. These concepts are applied in a reanalysis of the summary data from an article recently published in this
journal. The reanalysis makes it possible to compare the estimation approach advocated by the New Statistics with standard significance tests and to
discuss potential drawbacks of this approach as a means of gathering quantitative evidence in support of our substantive hypotheses.
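As a rough illustration of what estimation from summary data can look like, the sketch below computes a mean difference, an approximate 95% confidence interval, and a standardized effect size (Cohen's d) for two independent groups. It is not the reanalysis reported in the paper: the function name `mean_difference_ci` and all numbers are hypothetical, equal population variances are assumed, and a normal quantile is used as a large-sample approximation (a t quantile would give a somewhat wider interval for small samples).

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_ci(m1, sd1, n1, m2, sd2, n2, level=0.95):
    """Return (difference, lower, upper, cohens_d) from group summary statistics.

    Hypothetical helper for illustration only; assumes two independent
    groups with equal population variances.
    """
    diff = m1 - m2
    # Pooled standard deviation across the two groups.
    sd_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    # Standard error of the difference between the two means.
    se = sd_pooled * sqrt(1 / n1 + 1 / n2)
    # Normal quantile: a large-sample approximation to the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se, diff / sd_pooled


# Made-up summary data (means, standard deviations, group sizes).
diff, lower, upper, d = mean_difference_ci(5.2, 1.0, 50, 4.7, 1.0, 50)
print(f"difference = {diff:.2f}, 95% CI [{lower:.2f}, {upper:.2f}], d = {d:.2f}")
```

Reporting the interval alongside the point estimate shows both how large the effect is estimated to be and how uncertain that estimate is, rather than reducing the result to a significant or non-significant verdict.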
Article outline
- 1. Introduction
- 2. Does it feel non-native?
- 3. The New Statistics
- 3.1 An estimate of the population effect size and the confidence interval
- 4. Interpreting confidence intervals
- 5. Conclusion
- Acknowledgements
- Notes
- References