The effectiveness of VSTEP.3-5 speaking rater training

Subject heading index

Research field

Specialized education

Document type

Authors

Nguyễn Thị Ngọc Quỳnh, Nguyễn Thị Quỳnh Yến, Trần Thị Thu Hiền, Nguyễn Thị Phương Thảo, Bùi Thiện Sao, Nguyễn Thị Chi, Nguyễn Quỳnh Hoa

Title

Hiệu quả của hoạt động tập huấn giám khảo chấm nói VSTEP.3-5

English title

The effectiveness of VSTEP.3-5 speaking rater training

Source

VNU Journal of Foreign Studies (Tạp chí Nghiên cứu Nước ngoài), Vietnam National University, Hanoi

Year of publication

2020

Issue

04

Pages

99-112

ISSN

2525-2445

Keywords

Raters, Training, Speaking skills, VSTEP.3-5

English keywords

Rater training, Speaking skills, VSTEP.3-5

Abstract

Because it plays an important role in ensuring the reliability of assessments of productive language skills, rater training is a topic of considerable interest in research on large-scale tests. Likewise, for the VSTEP test, the effectiveness of the rater training program has received much attention. A study was therefore conducted to examine the effect of the training session on the use of the VSTEP.3-5 speaking rating scale on raters in the training program organized by the University of Languages and International Studies, Vietnam National University, Hanoi. Data were collected from 37 trainees in the course in order to compare their ratings before and after the training session on the speaking rating scale. Specifically, the analysis covered score reliability, criterion difficulty, rater severity, rater fit, rater bias, and the separation of the score bands. The results were positive: the scores that raters gave after the training session were more reliable, more consistent, and better separated. The clearest improvement was found in how well the score bands in the rating scale were distinguished. Some implications for rater training practice, as well as for the methodology of research on rater training, were drawn from the findings.

English abstract

Playing a vital role in assuring the reliability of language performance assessment, rater training has been a topic of interest in research on large-scale testing. Similarly, in the context of VSTEP, the effectiveness of the rater training program has been of great concern. Thus, this research was conducted to investigate the impact of the VSTEP speaking rating scale training session in the rater training program provided by the University of Languages and International Studies, Vietnam National University, Hanoi. Data were collected from 37 rater trainees of the program. Their ratings before and after the training session on the VSTEP.3-5 speaking rating scales were then compared. Specifically, the dimensions of score reliability, criterion difficulty, rater severity, rater fit, rater bias, and score band separation were analyzed. The results were positive: the post-training ratings were shown to be more reliable, consistent, and distinguishable. Improvements were most noticeable in score band separation and smaller in the other aspects. Meaningful implications for both the future practice of rater training and the methodology of rater training research could be drawn from the study.

Shelf mark

TTKHCNQG, CTv 183

