
AI-assisted Formative Feedback

Formative feedback
AI-assisted grading
Personalized feedback

Goals

Use AI to deliver detailed feedback on open-ended questions, tailored to individual student needs
Provide immediate, around-the-clock AI-generated feedback to minimize interruptions in the learning process
Encourage self-regulated learning by offering iterative feedback that helps students refine responses and understand complex concepts
Reduce educator workload while ensuring consistent, high-quality feedback for large groups
Guide students toward deeper inquiry and understanding by using AI feedback to promote critical thinking and problem-solving skills

To assess students' comprehension of course content, lecturers have traditionally employed exercises and multiple-choice questions in formative self-assessments. Open-ended questions embody a more sophisticated pedagogical approach than multiple-choice questions: when thoughtfully designed and coupled with corresponding feedback, they significantly enhance critical thinking and conceptual understanding. They are particularly valuable because they challenge students to articulate complex ideas and to demonstrate deeper learning beyond simple factual recall.

Historically, in large lectures, lecturers were limited to providing sample solutions for open-ended questions in formative self-assessments, offering minimal individualized learning support. Individual feedback was possible only through human grading, which is resource-intensive and costly. The emergence of Large Language Models has fundamentally transformed this educational paradigm by enabling immediate, personalized formative feedback for open-ended questions as well.

This technological advancement supports self-regulated learning and helps students refine responses and understand complex concepts. The AI-generated feedback highlights strengths and suggests areas for improvement, helping students to deepen their understanding of course concepts. This feedback guides students toward refining their answers without directly providing the correct solution. Moreover, the system can dynamically assess response quality, potentially awarding points based on predefined performance thresholds.

Background

AI-generated feedback has been shown to be a viable alternative or complement to traditional human feedback. Escalante's study on AI-generated feedback for English as a New Language (ENL) students revealed that there is a nearly equal preference for AI-generated and human-generated feedback among learners, suggesting that AI can effectively support educational practices without compromising learning outcomes (Escalante, 2023). This aligns with Schultze's findings, which emphasize that using large language models (LLMs) to augment human feedback can improve perceived feedback quality, addressing the common dissatisfaction students express regarding the quality of feedback they receive (Schultze, 2024). The implication here is that a blended approach, incorporating both AI and human feedback, can leverage the strengths of each to enhance the educational experience.

However, AI-generated feedback has also been associated with a set of challenges. Bai and Stede's survey on machine learning approaches to free-text evaluation underscores the complexities involved in developing effective AI systems for educational feedback (Bai & Stede, 2022). They argue that while AI can automate certain aspects of feedback delivery, it must be carefully designed to align with educational goals and learner needs. Similarly, Deeva et al. discuss the limitations of automated feedback systems, noting that while they can provide immediate responses, they may lack the nuanced understanding that human feedback can offer (Deeva et al., 2021). This highlights the importance of integrating human oversight in AI feedback systems to ensure that the feedback is not only timely but also contextually relevant and constructive.

The effectiveness of AI in essay evaluation is further supported by Kostic's case study, which demonstrates the capabilities of LLMs in assessing various text attributes through natural language processing (NLP) algorithms (Kostic, 2024). These systems can evaluate writing style and content quality, thus providing a comprehensive analysis that can inform students about their writing strengths and weaknesses. However, the reliance on pre-graded corpora for training these models raises questions about the generalizability and fairness of AI evaluations, necessitating further research to refine these systems.

The integration of human feedback within AI systems is a critical area of exploration. Wang et al. emphasize the need for a human-in-the-loop approach in natural language processing, which allows for continuous improvement of AI systems through human feedback (Wang et al., 2021). This approach not only enhances the accuracy of AI evaluations but also ensures that the feedback provided is aligned with educational objectives and learner expectations. The combination of human insights and AI efficiency can lead to more personalized and effective feedback mechanisms.

The impact of feedback timing on learning outcomes is another important consideration. Research suggests that the timing of feedback delivery can significantly influence its effectiveness, with immediate feedback often being more beneficial for learning than delayed responses (Deeva et al., 2021). This is particularly relevant in the context of AI feedback systems, which can provide instantaneous responses, thereby facilitating a more dynamic learning environment. However, educators must remain aware of the potential pitfalls of over-reliance on automated systems, ensuring that feedback remains focused and constructive.

Scenario Description with KlickerUZH

By using AI-driven formative feedback in KlickerUZH, you create an engaging and supportive learning environment that allows students to practice open-ended tasks in a non-assessment setting. When compared to the assessment setting, the application of AI in the practice scenario reduces the impact of potential mistakes made by the AI.

1. Preparing Questions for Formative Feedback

As a lecturer, you aim to provide your students with a flexible and interactive way to engage with course materials and practice their skills. To achieve this, you start by creating a course in KlickerUZH and developing a set of open-ended questions that align with your course objectives. For each of the questions, you provide model solutions that outline the key elements of ideal responses. Additionally, AI-generated grading rubrics are created to offer criteria for evaluating student responses. You have the option to review and modify these rubrics to ensure alignment with course goals and standards. After testing, you can embed the questions into any of the learning activities supported in KlickerUZH.
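As an illustration of how such a rubric might be generated, the sketch below asks an LLM to derive weighted grading criteria from a question and its model solution. It assumes an OpenAI-compatible API; the model name, prompt wording, and point scale are illustrative assumptions and do not describe the rubric pipeline actually used by KlickerUZH.

```python
# Sketch: generating a draft grading rubric from a question and its model solution.
# The provider, model name, and prompt wording are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_rubric(question: str, model_solution: str) -> str:
    """Ask the LLM for weighted criteria that the lecturer can review and edit."""
    prompt = (
        "You are assisting a university lecturer.\n"
        f"Question: {question}\n"
        f"Model solution: {model_solution}\n\n"
        "Derive 3-5 grading criteria with point weights (10 points in total) "
        "that capture the key elements of the model solution."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content


draft_rubric = generate_rubric(
    "Explain why diversification reduces portfolio risk.",
    "Diversification reduces unsystematic risk because imperfectly correlated "
    "asset returns partially offset each other, while systematic risk remains.",
)
print(draft_rubric)  # the lecturer reviews and adjusts the rubric before publishing
```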

Once configured, KlickerUZH allows you to integrate these learning activities directly into your learning management system (LMS) via LTI (e.g., OLAT). This integration ensures seamless access for students through familiar platforms. A student log-in and course participation are required to get formative feedback from the AI, allowing for both cost control and moderation of access.

2. Practicing Questions with Formative Feedback

Students access the learning activities through OLAT or the KlickerUZH app, selecting those that correspond to their current learning modules. This setup encourages self-paced learning and allows students to focus on areas where they need more practice. When students attempt free-text questions, the AI analyzes their responses and provides formative feedback that highlights strengths and suggests areas for improvement. This feedback guides students toward refining their answers without directly providing the correct solution.

Students can revise their responses based on the feedback received and resubmit them for further evaluation. This iterative process continues until the AI deems the response sufficient according to the established criteria in the grading rubric. Once a student's response meets the required standards, points are awarded as part of a gamification strategy to enhance motivation and engagement. This system encourages students to view the learning process as an enjoyable challenge rather than a high-stakes assessment.
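The iterative loop described above could be approximated as in the following sketch: the student's answer is scored against the rubric, formative feedback is returned without revealing the solution, and points are awarded only once a predefined threshold is reached. The JSON schema, threshold value, and function names are illustrative assumptions rather than KlickerUZH internals.

```python
# Sketch of the feedback loop: score an answer against the rubric, return
# formative feedback without the solution, and award points above a threshold.
# Schema, threshold, and model choice are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()
PASS_THRESHOLD = 7  # out of 10; assumed value for the performance threshold

question = "Explain why diversification reduces portfolio risk."
rubric = "1. Distinguishes unsystematic from systematic risk (4 pts); 2. ..."


def evaluate(answer: str) -> dict:
    prompt = (
        f"Question: {question}\nRubric: {rubric}\nStudent answer: {answer}\n\n"
        "Grade the answer against the rubric (0-10) and give formative feedback "
        "naming strengths and concrete hints for improvement. Do NOT reveal the "
        'model solution. Reply as JSON: {"score": <int>, "feedback": <str>}.'
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)


result = evaluate("Spreading money over several assets lowers the overall risk.")
if result["score"] >= PASS_THRESHOLD:
    print("Points awarded!")      # gamified reward once the criteria are met
else:
    print(result["feedback"])     # the student revises and resubmits
```

In practice, each submission would trigger one such evaluation, with the student revising their answer between calls until the threshold is met.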

Our Learnings

At the University of Zurich's Department of Finance, we are currently exploring the potential of Large Language Models (LLMs) to provide immediate, personalized formative feedback to students during their learning journey. This initiative builds upon our successful experiments with AI-assisted grading in examinations and aims to extend these capabilities to support continuous learning throughout the semester.

To systematically validate and further extend these findings, we will conduct comprehensive pilot studies during the spring term of 2025. Should you be interested in participating, please fill out the form at https://forms.office.com/e/K8CXM2pKhJ so that we may contact you. The results of the piloting will be evaluated and summarized as part of this use case.

Our initial assessment of this use case has also provided several significant insights and preliminary learnings about the general use of AI that are relevant for lecturers implementing AI use cases. Information about the associated challenges, limitations, and remediation strategies for IT can be found here.

Some of our most important preliminary findings include:

  • Didactic challenges: A naive implementation in which the AI provides direct answers as feedback may hinder learning by discouraging critical thinking. It is therefore advisable to use a tutoring approach for content-specific feedback that guides students toward solutions through hints or counter-questions without giving direct answers. Furthermore, it is important to focus on formative feedback that allows students to identify and improve on their weaknesses. This corresponds to the way a conversational interface (e.g., a chatbot) would be designed and encourages students to try again with another answer (a minimal prompt sketch illustrating this approach follows this list).
  • Accuracy and contextual relevance: AI-generated feedback systems often struggle with accuracy and contextual relevance (e.g., the nuanced understanding that human feedback provides), leading to generic or misaligned responses. Additionally, language models tend to "hallucinate", inventing information or providing overly complex answers that are not grounded in the relevant knowledge base. To address these accuracy and reliability challenges, integrating a human-in-the-loop approach is essential: a human should regularly review AI-generated feedback for accuracy and contextual relevance, allowing for continuous improvement of the feedback through human input, for example by adjusting the rubrics. Compared to an assessment setting, applying AI in this practice scenario also reduces the impact of potential mistakes made by the AI.
  • Ethical considerations and data privacy: The collection and use of student data for generating personalized feedback raises concerns about consent, transparency, and potential misuse. At institutions like the University of Zurich (UZH), there are no clear guidelines on data privacy concerning AI applications at the time of writing, which complicates compliance efforts. To address these considerations, it is important to obtain informed consent from students regarding how their data is collected, processed, stored, and used (see the privacy policy implemented in KlickerUZH). The provider of AI services has to be carefully selected, and it should be ensured that the data provided by students is not used for further training by the provider. Locally hosted models might provide a suitable alternative for small-scale, privacy-sensitive use cases. Additionally, strict anonymization protocols can help protect personally identifiable information (PII) that students might embed in prompts. Lecturers are not permitted to use AI-assisted feedback on free-text questions for assessment purposes without double-checking the scoring. Furthermore, if you, as a lecturer, wish to conduct your own research using the collected data (e.g., free-text responses), this intention must be communicated to the students in advance.
  • Operational cost: Implementing AI-driven formative feedback systems involves operational costs related to AI use. To manage these costs effectively, institutions should implement cost-control mechanisms such as limiting the number of student queries per time period. Exploring open-source models hosted locally (e.g., using Ollama) or by trusted providers can also lower expenses compared with proprietary solutions. Additionally, optimizing resource use by deploying lightweight models for basic queries while reserving more resource-intensive models for complex queries can help balance costs against educational benefits. The cost of all requests will be billed directly by your chosen API providers.
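As an illustration of the tutoring approach recommended in the first point above, the following sketch combines a system prompt that restricts the model to hints and counter-questions with a simple per-student query cap as a basic cost control. The prompt wording, limit, and model choice are assumptions for illustration, not the configuration used in KlickerUZH.

```python
# Sketch: tutoring-style feedback without direct answers, plus a per-student
# query cap as a simple cost control. All values are illustrative assumptions.
from collections import defaultdict
from openai import OpenAI

client = OpenAI()
MAX_QUERIES_PER_DAY = 20                      # assumed cost-control limit
query_count: dict[str, int] = defaultdict(int)

TUTOR_SYSTEM_PROMPT = (
    "You are a tutor for a university course. Never state the correct answer. "
    "Acknowledge what the student's response already gets right, then guide the "
    "student toward the missing elements with hints or counter-questions."
)


def tutor_feedback(student_id: str, question: str, answer: str) -> str:
    if query_count[student_id] >= MAX_QUERIES_PER_DAY:
        return "Daily feedback limit reached - please try again tomorrow."
    query_count[student_id] += 1
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # lightweight model for routine feedback requests
        messages=[
            {"role": "system", "content": TUTOR_SYSTEM_PROMPT},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```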

Acknowledgements

We sincerely thank our collaborators and sponsors on this use case: Swissuniversities for funding the development of this use case as part of the P-8/DISK4U project; the University of Zurich (ULF) and the Department of Finance / Teaching Center for sponsoring the development of KlickerUZH and functionalities related to this use case.