The burgeoning integration of Large Language Models (LLMs) such as ChatGPT into the fabric of Massive Open Online Courses (MOOCs) has highlighted a promising new direction for enhancing automated essay assessment processes. This research delves into the practical implementation of LLMs for evaluating student essays within MOOC frameworks, focusing primarily on exploring advanced prompt engineering strategies.
We investigate a spectrum of methodologies, including few-shot learning, Chain-of-Thought (CoT) prompting, and fine-tuning techniques, to discern the most effective strategies for harnessing the capabilities of LLMs in this educational domain. Drawing from the latest advancements in natural language processing (NLP), our study examines the ability of LLMs to deliver accurate, efficient, and scalable assessments of student submissions.
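To make the prompting strategies concrete, the sketch below shows one way a few-shot, chain-of-thought grading prompt might be assembled. The rubric text, worked example, and score are illustrative placeholders, not material from the study or from the course described later.

```python
# Hypothetical sketch: composing a few-shot, chain-of-thought (CoT) grading
# prompt. The example essay, reasoning, and score are invented placeholders.

FEW_SHOT_EXAMPLES = [
    {
        "essay": "After this week's lesson I realized how often I suppress anger...",
        "reasoning": "Reflects on personal experience and connects it to course "
                     "concepts, but offers little synthesis.",
        "score": 8,
    },
]

def build_grading_prompt(essay: str, rubric: str) -> str:
    """Assemble a prompt that shows worked grading examples (few-shot)
    and asks the model to reason step by step before scoring (CoT)."""
    parts = [
        "You are a teaching assistant. Grade the essay on a 0-10 scale.\n"
        f"Rubric: {rubric}\n"
    ]
    # Few-shot demonstrations: each pairs an essay with reasoning and a score.
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Essay: {ex['essay']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Score: {ex['score']}\n"
        )
    # Leave the answer slot open; the trailing cue elicits step-by-step reasoning.
    parts.append(f"Essay: {essay}\nReasoning: Let's think step by step.")
    return "\n".join(parts)
```

In practice the returned string would be sent to an LLM API; the point here is only the prompt structure that the few-shot and CoT strategies share.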
MOOCs typically host hundreds to thousands of students per course, presenting significant logistical challenges for assignment evaluation. The volume of essays requiring assessment can overwhelm instructors, making it virtually impossible to provide detailed, timely feedback without technological assistance. Deploying LLMs promises not only to improve grading efficiency but also to maintain consistent evaluation standards across large cohorts.
The primary objective of this study is to explore the application of generative AI (GAI) in assisting with essay grading, using an open course hosted on ewant, the largest MOOC platform in Taiwan, operated by National Yang Ming Chiao Tung University (NYCU). The course, "Required Credits for University Students - Emotional Education," is taught by Professor Chen Fei-Chuan at National Yunlin University of Science and Technology, Taiwan. Since its first delivery in 2015, the course has been offered 137 times, with nearly 20,000 students enrolled. From both qualitative and quantitative perspectives, it is an optimal choice for this study, offering substantial potential for further research and development. Assignments in the course predominantly involve open-ended questions without standard answers, encouraging students to reflect on, discuss, share, and synthesize their personal experiences in light of the knowledge acquired during the course. Such unstructured assignments are better suited to introducing GAI than the structured assignments, with definitive answers, typical of science and engineering courses.
This research aims to leverage a data-driven approach to develop a GAI system that replicates the grading standards and performance of the instructors or teaching assistants (graders), thereby helping future educators grade large volumes of written assignments efficiently. By analyzing the strengths and drawbacks of multiple prompt engineering and fine-tuning methods for automating essay evaluation, the study seeks to establish a dataflow pipeline for AI-assisted essay grading, with the expectation of generalizing this process to other courses of a similar nature. Additionally, this research proposes recommendations for designing more effective and scalable automated essay assessment systems tailored to contemporary online education platforms.
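One way to picture the data-driven calibration step in such a pipeline is an agreement check between model scores and the human graders' scores. The function and the length-based stub grader below are assumptions for illustration only, not the study's implementation; in a real pipeline the stub would be replaced by an LLM-backed grader.

```python
# Illustrative sketch of one pipeline stage: measure how closely an automated
# grader tracks human graders on already-graded essays. Names and the
# agreement metric are assumptions, not the study's actual method.

from typing import Callable

def grader_agreement(
    essays: list[str],
    human_scores: list[int],
    model_grade: Callable[[str], int],
    tolerance: int = 1,
) -> float:
    """Fraction of essays where the model's score falls within
    `tolerance` points of the human grader's score."""
    hits = sum(
        abs(model_grade(e) - h) <= tolerance
        for e, h in zip(essays, human_scores)
    )
    return hits / len(essays)

# Placeholder standing in for an LLM-backed grader: a crude
# length-based heuristic, capped at 10 points.
def stub_grade(essay: str) -> int:
    return min(10, len(essay.split()) // 10)
```

Running this check on a held-out set of human-graded essays is what would let the pipeline's prompts be tuned until the automated scores match the graders' standards closely enough to deploy.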
Overall, this study aims to provide a comprehensive analysis of the potential of LLMs in transforming the landscape of essay assessment in MOOCs, thereby contributing valuable insights into the optimization of educational technologies in a GAI era.
Included in
[Session 11A]: Artificial Intelligence
Author Keywords
Artificial Intelligence, Large Language Models, Prompt Engineering, Assessment, MOOCs