Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of WriteToLearn
Issue: Vol. 33, No. 1 (2016): Automated Writing Evaluation
Journal: CALICO Journal
Abstract:
This study investigated the application of WriteToLearn to Chinese undergraduate English majors’ essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university in Sichuan province, who wrote 326 essays in response to two writing prompts. Each essay was marked by four human raters as well as by WriteToLearn. Many-facet Rasch measurement (MFRM) was used to calibrate WriteToLearn’s rating performance on the whole set of essays against that of the four trained human raters. The accuracy of WriteToLearn’s feedback on 60 randomly selected essays was then compared with the feedback provided by human raters. The two main findings on scoring were that WriteToLearn was more consistent but considerably more stringent than the four trained human raters and that it failed to score 7 essays. In terms of error feedback, WriteToLearn achieved an overall precision of 49% and recall of 18.7%. These figures fall short of the 90% precision threshold that Burstein, Chodorow, and Leacock (2003) set for a reliable error-detection tool. Furthermore, WriteToLearn had difficulty identifying the errors Chinese undergraduate English majors made in the use of articles, prepositions, word choice, and expression.
Authors: Sha Liu, Antony John Kunnan
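For readers unfamiliar with the error-feedback metrics cited in the abstract, precision and recall follow their standard definitions: the share of flagged errors that are genuine, and the share of genuine errors that get flagged. A minimal sketch in Python, using hypothetical counts chosen only to reproduce the reported percentages (these are not the study's actual data):

```python
def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Standard error-detection metrics:
    precision = TP / (TP + FP) -- of all errors the system flagged,
                                  how many were genuine;
    recall    = TP / (TP + FN) -- of all genuine errors, how many
                                  the system flagged.
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical illustration: suppose the system flags 100 errors,
# 49 of which are genuine, while raters identify 262 genuine errors
# in total. This reproduces the 49% / 18.7% figures in the abstract.
p, r = precision_recall(true_positives=49, false_positives=51,
                        false_negatives=213)
print(f"precision={p:.1%}, recall={r:.1%}")  # precision=49.0%, recall=18.7%
```

The 90% threshold cited from Burstein, Chodorow, and Leacock (2003) applies to precision: for a feedback tool, flagging correct text as erroneous is costlier than missing an error.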
References:
Aryadoust, V., & Liu, S. (2015). Predicting EFL writing ability from levels of mental representation measured by Coh-Metrix: A structural equation modeling study. Assessing Writing, 24, 35–58. http://dx.doi.org/10.1016/j.asw.2015.03.001
Attali, Y. (2013). Validity and reliability of automated essay scoring. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 181–199). New York, NY: Routledge.
Attali, Y., & Burstein, J. (2005). Automated essay scoring with e-rater® V.2.0 (ETS research report number RR-04-45). Retrieved from http://www.ets.org/Media/Research/pdf/RR-04-45.pdf
Bridgeman, B. (2013). Human ratings and automated essay evaluation. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 221–232). New York, NY: Routledge.
Burstein, J. (2003). The e-rater® scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 113–121). Mahwah, NJ: Lawrence Erlbaum Associates.
Burstein, J., Chodorow, M., & Leacock, C. (2003). Criterion online essay evaluation: An application for automated evaluation of student essays. Proceedings of the Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco, Mexico.
Burstein, J., Kukich, K., Wolff, S., Lu, C., Chodorow, M., Braden-Harder, L., & Harris, M. D. (1998, August). Automated scoring using a hybrid feature identification technique. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Montreal. Retrieved from http://www.ets.org/Media/Research/pdf/erater_acl98.pdf
Chen, C. F., & Cheng, W. Y. (2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning & Technology, 12 (2), 94–112.
Dikli, S., & Bleyle, S. (2014). Automated Essay Scoring feedback for second language writers: How does it compare to instructor feedback? Assessing Writing, 22, 1–17. http://dx.doi.org/10.1016/j.asw.2014.03.006
Ferris, D. R., Liu, H., Sinha, A., & Senna, M. (2013). Written corrective feedback for individual L2 Writers. Journal of Second Language Writing, 22, 307–329. http://dx.doi.org/10.1016/j.jslw.2012.09.009
Foltz, P. W., Laham, D., & Landauer, T. K. (1999). The intelligent essay assessor: Applications to educational technology. Interactive Multimedia Educational Journal of Computer-Enhanced Learning, 1 (2). Retrieved from http://imej.wfu.edu/articles/1999/2/04/printver.asp
Foltz, P. W., Lochbaum, K. E., & Rosenstein, M. R. (2011, April). Analysis of student ELA writing performance for a large scale implementation of formative assessment. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, Louisiana.
Foltz, P. W., Streeter, L. A., Lochbaum, K. E., & Landauer, T. (2013). Implementation and application of the Intelligent Essay Assessor. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 66–88). New York, NY: Routledge.
Galletta, D. F., Durcikova, A., Everard, A., & Jones, B. (2005). Does spell-checking software need a warning label? Communications of the ACM, 48 (7), 82–85. http://dx.doi.org/10.1145/1070838.1070841
Han, N., Chodorow, M., & Leacock, C. (2006). Detecting errors in English article usage by non-native speakers. Natural Language Engineering, 12 (2), 115–129. http://dx.doi.org/10.1017/S1351324906004190
Hoang, G. (2011). Validating My Access as an automated writing instructional tool for English language learners (Unpublished Master's thesis). California State University, Los Angeles.
Koskey, K., & Shermis, M. D. (2013). Scaling and norming for automated essay scoring. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 200–220). New York, NY: Routledge.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automatic essay assessment. Assessment in Education, 10, 295–308. http://dx.doi.org/10.1080/0969594032000148154
Leacock, C., Chodorow, M., Gamon, M., & Tetreault, J. (2010). Automated grammatical error detection for language learners. Synthesis Lectures on Human Language Technologies, 3, 1–34. http://dx.doi.org/10.2200/S00275ED1V01Y201006HLT009
Li, J., Link, S., & Hegelheimer, V. (2015). Rethinking the role of automated writing evaluation (AWE) feedback in ESL writing instruction. Journal of Second Language Writing, 27, 1–18. http://dx.doi.org/10.1016/j.jslw.2014.10.004
Li, Z., Link, S., Ma, H., Yang, H., & Hegelheimer, V. (2014). The role of automated writing evaluation holistic scores in the ESL classroom. System, 44, 66–78. http://dx.doi.org/10.1016/j.system.2014.02.007
Linacre, J. M. (2013a). A user guide to Facets, Rasch-model computer programs. Chicago, IL: Winsteps.com.
Linacre, J. M. (2013b). Facets Rasch measurement [computer program]. Chicago, IL: Winsteps.com.
McGee, T. (2006). Taking a spin on the Intelligent Essay Assessor. In P. F. Ericsson & R. H. Haswell (Eds.), Machine scoring of student essays: Truth and consequences (pp. 79–92). Logan, UT: Utah State University Press.
McNamara, D. S., Crossley, S. A., Roscoe, R. D., Allen, L. K., & Dai, J. (2015). A hierarchical classification approach to automated essay scoring. Assessing Writing, 23, 35–39. http://dx.doi.org/10.1016/j.asw.2014.09.002
Pearson Education Inc. (2010). Intelligent Essay Assessor (IEA) fact sheet. Retrieved from http://kt.pearsonassessments.com/download/IEA-FactSheet-20100401.pdf
Perelman, L. (2014). When ‘the state of the art’ is counting words. Assessing Writing, 21, 104–111. http://dx.doi.org/10.1016/j.asw.2014.05.001
Powers, D. E. (2000). Computing reader agreement for the GRE Writing Assessment (ETS research memorandum, RM-00-08). Princeton, NJ: Educational Testing Service.
Powers, D. E., Burstein, J. C., Chodorow, M., Fowles, M. E., & Kukich, K. (2002). Stumping e-rater: Challenging the validity of automated essay scoring. Computers in Human Behavior, 18 (2), 103–134. http://dx.doi.org/10.1016/S0747-5632(01)00052-8
Shermis, M. D. (2014). State-of-the-art automated essay scoring: Competition, results, and future directions from a United States demonstration. Assessing Writing, 20, 53–76. http://dx.doi.org/10.1016/j.asw.2013.04.001
Shermis, M. D., & Burstein, J. C. (2003). Introduction. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. xiii–xvi). Mahwah, NJ: Lawrence Erlbaum Associates.
Stevenson, M., & Phakiti, A. (2014). The effects of computer-generated feedback on the quality of writing. Assessing Writing, 19, 51–65. http://dx.doi.org/10.1016/j.asw.2013.11.007
Tetreault, J., & Chodorow, M. (2008a, August). The ups and downs of preposition error detection in ESL writing. Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK. http://dx.doi.org/10.3115/1599081.1599190
Tetreault, J., & Chodorow, M. (2008b, August). Native judgments of non-native usage: Experiments in preposition error detection. Proceedings of the Workshop on Human Judgments in Computational Linguistics at the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK. http://dx.doi.org/10.3115/1611628.1611633
Vantage Learning. (2003a). Assessing the accuracy of IntelliMetric for scoring a district-wide writing assessment (RB-806). Newtown, PA: Vantage Learning.
Vantage Learning. (2003b). How does IntelliMetric score essay responses? (RB-929). Newtown, PA: Vantage Learning.
Vantage Learning. (2006). Research summary: IntelliMetric scoring accuracy across genres and grade levels. Retrieved from http://www.vantagelearning.com/docs/intellimetric/IM_ReseachSummary_InteliMetric_Accuracy_Across_Genre_and_Grade_Levels.pdf
Warschauer, M., & Grimes, D. (2008). Automated writing in the classroom. Pedagogies: An International Journal, 3 (1), 22–26. http://dx.doi.org/10.1080/15544800701771580
Warschauer, M., & Ware, P. (2006). Automated writing evaluation: Defining the classroom research agenda. Language Teaching Research, 10 (2), 157–180. http://dx.doi.org/10.1191/1362168806lr190oa
Weigle, S. C. (2013a). English language learners and automated scoring of essays: Critical considerations. Assessing Writing, 18, 85–99. http://dx.doi.org/10.1016/j.asw.2012.10.006
Weigle, S. C. (2013b). English as a second language writing and automated essay evaluation. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 36–54). New York, NY: Routledge.