Item Details

Adjusting Regression Models for Overfitting in Second Language Research

Issue: Vol 5 No. 1-2 (2018)

Journal: Journal of Research Design and Statistics in Linguistics and Communication Science

Subject Areas: Linguistics

DOI: 10.1558/jrds.38374

Abstract:

Regression modeling is an increasingly important quantitative tool for second language (L2) research. While superior in many ways to more traditional methods, such as ANOVA, regression modeling, like all procedures, still has limitations, ranging from small sample sizes to a lack of screening for outliers and influential data points (Plonsky and Ghanbar, 2018). Since these limitations are common features in L2 research, this raises concerns that existing studies using regression may overfit the data, perhaps inflating effect size estimates. These issues can be partially alleviated via robust statistics, such as validation. This paper provides L2 researchers with an overview of these issues and an instructive look at one robust validation method: bootstrapping.
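
As a rough illustration of the kind of bootstrap validation the abstract refers to, the sketch below applies one common variant, an optimism correction for R², to simulated data with a single predictor. It is a minimal sketch under stated assumptions and not necessarily the procedure used in the paper: the function names (`fit_line`, `r_squared`, `optimism_corrected_r2`), the simulated data, and the number of resamples are all hypothetical choices made for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(model, xs, ys):
    """R-squared of a fitted (intercept, slope) model evaluated on xs, ys."""
    intercept, slope = model
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def optimism_corrected_r2(xs, ys, n_boot=1000, seed=1):
    """Return (apparent R2, optimism-corrected R2) using the bootstrap."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    optimism = []
    while len(optimism) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # skip degenerate resamples with no variance
        model = fit_line(bx, by)
        # Optimism: fit on the resample minus fit of the same model
        # on the original data.
        optimism.append(r_squared(model, bx, by) - r_squared(model, xs, ys))
    return apparent, apparent - sum(optimism) / n_boot


# Simulated small-sample data (hypothetical): n = 20, true slope 0.5.
data_rng = random.Random(42)
xs = [data_rng.gauss(0, 1) for _ in range(20)]
ys = [0.5 * x + data_rng.gauss(0, 1) for x in xs]

apparent_r2, corrected_r2 = optimism_corrected_r2(xs, ys)
print(f"apparent R2:  {apparent_r2:.3f}")
print(f"corrected R2: {corrected_r2:.3f}")
```

With small samples the corrected value will typically fall below the apparent R², which is the sense in which unadjusted estimates may be inflated.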

Author: Phillip Hamrick

References:

Abrahamsson, N. and Hyltenstam, K. (2009). Age of onset and nativelikeness in a second language: Listener perception versus linguistic scrutiny. Language Learning 59 (2), 249-306. https://doi.org/10.1111/j.1467-9922.2009.00507.x

DeKeyser, R. M. (2000). The robustness of critical period effects in second language acquisition. Studies in Second Language Acquisition 22 (4), 499-533.

Egbert, J. and Plonsky, L. (in press). Bootstrapping techniques. In S. T. Gries and M. Paquot (Eds.), A Practical Handbook of Corpus Linguistics. New York: Springer.

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer-Verlag. Retrieved from //www.springer.com/us/book/9781461471370 https://doi.org/10.1007/978-1-4614-7138-7

Laflair, G. T., Egbert, J., and Plonsky, L. (2015). A practical guide to bootstrapping descriptive statistics, correlations, t tests, and ANOVAs. In L. Plonsky (Ed.), Advancing Quantitative Methods in Second Language Research, 46-77. New York: Routledge. https://doi.org/10.4324/9781315870908-4

Larson-Hall, J. and Herrington, R. (2010). Improving data analysis in second language acquisition by utilizing modern developments in applied statistics. Applied Linguistics 31 (3), 368-390. https://doi.org/10.1093/applin/amp038

Nikitina, L. and Furuoka, F. (2018). Expanding the methodological arsenal of applied linguistics with a robust statistical procedure. Applied Linguistics. https://doi.org/10.1093/applin/amx026

Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition 35 (4), 655-687. https://doi.org/10.1017/S0272263113000399

Plonsky, L. (2014). Study quality in quantitative L2 research (1990-2010): A methodological synthesis and call for reform. The Modern Language Journal 98 (1), 450-470. https://doi.org/10.1111/j.1540-4781.2014.12058.x

Plonsky, L. (2015). Statistical power, p values, descriptive statistics, and effect sizes: A 'back-to-basics' approach to advancing quantitative methods in L2 research. In L. Plonsky (Ed.), Advancing Quantitative Methods in Second Language Research, 23-45. New York: Routledge. https://doi.org/10.4324/9781315870908-3

Plonsky, L., Egbert, J., and Laflair, G. T. (2015). Bootstrapping in applied linguistics: Assessing its potential using shared data. Applied Linguistics 36 (5), 591-610. https://doi.org/10.1093/applin/amu001

Plonsky, L. and Ghanbar, H. (2018). Multiple regression in L2 research: A methodological synthesis and guide to interpreting R² values. The Modern Language Journal 102 (4), 713-731. https://doi.org/10.1111/modl.12509

Plonsky, L. and Oswald, F. L. (2017). Multiple regression as a flexible alternative to ANOVA in L2 research. Studies in Second Language Acquisition 39 (3), 579-592. https://doi.org/10.1017/S0272263116000231

R Core Team. (2018). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.