Exploring the effects of individualized feedback on raters’ severity in second language writing assessment

Year: 2019

Author: HUANG, Jing

Type of paper: Abstract refereed

Abstract:
Performance-based language assessment commonly requires human raters to assign scores to language learners’ performances. The subjectivity of human ratings inevitably introduces rater variability, which has been identified as a main source of construct-irrelevant variance. This study explored the immediate and retention effects of individualized feedback on raters’ rating severity in the context of Chinese as a second language writing assessment. The participants were 93 native Chinese speakers without previous rating experience, who were randomly assigned to one of three treatment groups. The three groups differed in how often they received individualized feedback within a given time period: (a) a control group receiving no feedback, (b) a single-feedback group receiving the feedback once, and (c) a double-feedback group receiving the feedback twice. Each participant rated 100 writing scripts on Day 1 as the pre-feedback ratings and received one of the feedback treatments on Day 2. The post-feedback ratings were collected immediately after the feedback session on Day 2, when each participant rated 100 new writing scripts. Raters’ retention of the feedback was measured one week later, when each participant rated another 100 new writing scripts as the delayed post-feedback ratings. Based on FACETS output, raters’ severity estimates were produced separately for the pre-feedback, post-feedback, and delayed post-feedback rating phases. A one-way ANCOVA and one-way repeated measures ANOVAs were conducted to investigate the immediate and retention effects of individualized feedback on raters’ rating severity.

The immediate post-feedback results showed that the rating severities of both experimental groups receiving individualized feedback were significantly lower than those of the control group receiving no feedback. In other words, the double-feedback and single-feedback groups outperformed the control group in terms of rating severity. However, the rating severity of the double-feedback group, which received the individualized feedback twice, was not significantly lower than that of the single-feedback group, which received it once.

Furthermore, the results showed that pre-feedback, immediate post-feedback, and delayed post-feedback rating severities were not significantly different in the single-feedback group. For the double-feedback group, on the other hand, rating severities across the three phases were found to be significantly different. This means that the raters in the double-feedback group could retain their improvements in rating severity one week after receiving the individualized feedback twice within a given time period.
