Assessment and Feedback

Day 3: Reliability and validity

Yesterday we looked at how to prepare students for peer assessment and feedback, and we’d like to wrap up this section with a short video featuring Dr Tony Curran, Sessional Lecturer at the School of Art and Design at ANU, which showcases peer feedback in teaching practice.

The peer “crit” sessions in Art are a great example of embedding authentic peer feedback into the course, and also creating a supportive environment for this potentially confronting experience.

Activity:

What did you think of the approach used in the School of Art and Design? Is this similar to your experiences with peer assessment in other disciplines?

Reliability and validity

Lecturers and tutors can be concerned about the accuracy and consistency of the peer assessment process compared to other forms of assessment, so today we will look at some techniques that can boost reliability and validity when designing and implementing peer assessment.

1. Make it authentic and realistic

Authentic assessment can flag for students which activities, practices and values are essential in their future discipline or profession, such as:

  • 360 degree appraisal (in business)
  • Audition (in performing arts)
  • Case audit (in health professions)

Check out UNSW’s guide here and this authentic assessment toolbox for more.

2. Make the process explicit, fair and transparent

The tasks and associated assessment criteria need to be clearly described and explained to students (see Day 2 for more), and the marking process (if applicable) needs to be fair, equitable and transparent.

Want to know more? See here.

It can also be helpful to involve students in developing the assessment criteria.

 

3. Consider rubrics

Rubrics – as a marking tool – can play an important role in creating a high level of validity and reliability, as they provide a standard set of rules and criteria against which to measure a task or performance, as well as information for peer feedback. However, students may need to develop a sense of what quality looks like before they can assess it accurately, and to be given multiple practice opportunities to apply the marking criteria.

In some teaching contexts, there is also a case for using an alternative approach to peer assessment (without using assessment criteria), which can also have high validity and inter-rater reliability. Read more here.

4. Set up a moderation process

This can help to improve the reliability of grades and foster greater student confidence in the accuracy and consistency of the peer assessment process.
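
For illustration, a moderation step like this can be partly automated: flag any submission where the peer markers disagree widely among themselves, or where their average diverges from a staff spot-check, and review only the flagged cases. The sketch below is a minimal, hypothetical example – the function name, marking scale and thresholds are all assumptions to adapt to your own context:

```python
# Minimal sketch of a moderation flag (hypothetical names and thresholds;
# assumes marks out of 100 and several peer assessors per submission).
from statistics import mean, stdev

def needs_moderation(peer_marks, staff_mark=None,
                     spread_limit=15, divergence_limit=10):
    """Return True if a submission's peer marks warrant moderator review."""
    if len(peer_marks) > 1 and stdev(peer_marks) > spread_limit:
        return True  # peers disagree widely among themselves
    if staff_mark is not None and abs(mean(peer_marks) - staff_mark) > divergence_limit:
        return True  # peer average diverges from the staff spot-check
    return False

# Example: peers broadly agree with each other but not with the spot-check
print(needs_moderation([82, 78, 85], staff_mark=60))  # True
```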

Discussion questions

  1. What forms of peer-evaluation or peer review are authentic in your profession or discipline?
  2. What are some techniques you would use or have used to enhance validity and reliability of a peer assessment task?

Optional Reading

Peer assessment and professional behaviours:

Joanna Tai, Chie Adachi (2017) Peer assessment and professional behaviours: what should we be assessing, how, and why?, Medical Education, 51:4, 346-347. DOI: 10.1111/medu.13254.

Reliability and validity:

Nancy Falchikov, Judy Goldfinch (2000) Student Peer Assessment in Higher Education: A Meta-Analysis Comparing Peer and Teacher Marks, Review of Educational Research, 70:3, 287-322. DOI: 10.3102/00346543070003287.

Hongli Li, Yao Xiong, Xiaojiao Zang, Mindy L. Kornhaber, Youngsun Lyu, Kyung Sun Chung, Hoi K. Suen (2015) Peer assessment in the digital age: a meta-analysis comparing peer and teacher ratings, Assessment & Evaluation in Higher Education, 41:2, 245-264. DOI: 10.1080/02602938.2014.999746.

Peer assessment supporting self-assessment:

Daniel Reinholz (2015) The assessment cycle: a model for learning through peer assessment, Assessment & Evaluation in Higher Education, 41:2, 301-315. DOI: 10.1080/02602938.2015.1008982

Rubrics:

Ernesto Panadero, Anders Jonsson (2013) The use of scoring rubrics for formative assessment purposes revisited: A review, Educational Research Review, 9, 129-144. DOI: 10.1016/j.edurev.2013.01.002.

26 thoughts on “Day 3: Reliability and validity”

  1. Authentic peer-evaluation and peer review in my discipline includes role-play simulations, real-world tasks (such as writing policy briefs), and presentations. These are regular tasks that students will encounter in professional settings, where they will both assess, and be assessed by, their colleagues.

    In addition to using authentic evaluation processes, I will probably implement moderation and transparency to maintain reliability and validity. However, I am still on the fence regarding rubrics. While I appreciate that rubrics foster standardisation by reducing subjectivity, increasingly, research, including the article by Jones and Alcock, shows that students create their own grading criteria in the absence of a rubric. Moreover, the research also suggests that these criteria are fairly consistent, both across cohorts and with experts’ internal criteria.

  2. What did you think of the approach used in the School of Art and Design? Is this similar to your experiences with peer assessment in other disciplines?
    I really liked the approach described and found it completely different from anything I’ve been involved in! I especially appreciated how much it considered students’ emotions and well-being during the process.

    What are some techniques you would use or have used to enhance validity and reliability of a peer assessment task?
    I would definitely use rubrics, as I remember them being very helpful when I was a student (i.e. to tailor the assessment), and I do think they could help achieve a degree of validity and reliability, as outlined.
    I think I would take the students through some example text and get them to grade it together as a class (maybe think-pair-share) in order to demonstrate what adequate assessment looks like. I also think I would try to spend some time explaining the value of learning to assess their peers well, and reminding the students to be respectful of each other and to recognise that there could be negative impacts on each other (perhaps again, going through examples of ‘good’ and ‘bad’ critique).

    1. Hi Angela, I’m glad you appreciated the approach taken in the School of Art! One of the things I appreciate about it is that the process is embedded into the routine of the degree, where students are introduced to it in their foundational subjects and it continues throughout the degree. One of the things Tony discussed with us (that didn’t make it into the video unfortunately!) was how the peer crit sessions help to build strong and supportive relationships during the degree that then carry forward into professional networks. 🙂

  3. What forms of peer-evaluation or peer review are authentic in your profession or discipline?
    As mentioned above, in my field (business), 360 degree feedback is often used and is an authentic form of peer assessment. One thing worth noting is that 360 degree feedback, like all forms of feedback, can be subjective. While those with real-world experience are used to this and understand how it works, some students are not, as they think that marks should always be 100% objective.

    What are some techniques you would use or have used to enhance validity and reliability of a peer assessment task?
    For me, the more peer assessors, the better the reliability and validity. Having said that, I think it’s also good to randomly check some of the assessments myself to ensure that the peer assessments are reasonably accurate.
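
    (For what it’s worth, a trivial sketch of the random spot-checking I mean, with made-up submission IDs, might look like this in Python:)

    ```python
    # Minimal sketch: randomly pick a few peer-assessed submissions to re-mark
    # myself (hypothetical submission IDs).
    import random

    submission_ids = [f"sub-{i:03d}" for i in range(1, 41)]  # e.g. 40 submissions
    random.seed(1)  # fixed only so the example is repeatable
    to_recheck = random.sample(submission_ids, k=5)
    print(to_recheck)  # the instructor re-marks these five
    ```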

  4. From the video, the ANU School of Art and Design appears to be doing a slightly more academic version of what is traditionally done in design schools, where work is reviewed collectively. I was a student in one foundation unit in design at the University of Canberra and the criticism was relatively gentle. It was a little more robust when I presented a seminar to the “New Bauhaus” architecture students and staff from Germany, who were on a study tour of Australia. And when I once gave a seminar at Cambridge University’s Computing Laboratory, the criticism from the students, of me and of each other, was scarifying (Oxford’s computer department was not as harsh). At ANU computer science, things are much more gentle.

  5. The audits done in the ANU TechLauncher program are very similar to the reviews I used to conduct of IT projects at the Defence Department. In that sense these are very authentic forms of peer-evaluation and review. The process is more polite and gentle in the ANU version.

    The ANU TechLauncher program collects input not only from students and tutors but also from project clients and mentors to enhance validity and reliability. Also, the peer feedback is itself peer assessed.

    For the ANU ICT Sustainability course I ran some statistical tests to check that the peer assessment correlated with my own grading and with students’ final marks. Also, of course, I run my eye over the students’ work each week and check that the peer grades look reasonable. Very occasionally I have to suggest to a student that their feedback could be worded more positively than “This is the worst work I have ever seen …”. 😉
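
    (For anyone wanting to run a similar check, here is a minimal sketch of correlating peer marks with instructor marks, assuming the two lists are paired by student; the numbers are invented.)

    ```python
    # Minimal sketch: correlate peer-assessed marks with instructor marks for
    # the same submissions (invented data; pair the lists by student).
    from scipy.stats import pearsonr, spearmanr

    peer_marks       = [72, 65, 88, 54, 91, 77, 60]
    instructor_marks = [70, 68, 85, 50, 94, 75, 58]

    r, p = pearsonr(peer_marks, instructor_marks)
    rho, p_rho = spearmanr(peer_marks, instructor_marks)
    print(f"Pearson r = {r:.2f} (p = {p:.3f})")           # linear agreement
    print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")  # rank-order agreement
    ```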

  6. Bhavani, I am a big fan of rubrics. This is a useful way to convey to the students, assessors and evaluators what learning is expected. It is too much to ask a student (or most instructors) to each construct their own grading criteria for each assessment task. Also I find that students (and many instructors) tend to overdo rubrics with excessive numbers of criteria and levels. Producing a rubric should be part of formal training for educators.

    1. Hi Tom,

      I definitely agree that there should be formal training on producing and understanding rubrics. I also agree that rubrics can help convey expectations, although they can also be overdone. However, I have repeatedly seen rubrics limit students’ potential. More importantly, I have also seen how well learning and teaching function without rubrics. The particular authors escape me, but I remember looking at this in Principles of Tutoring and Demonstrating. In situations where half the class were given rubrics and half were not, the groups without rubrics consistently outperformed, or at least matched, the groups with marking rubrics. Like most things in education (and, indeed, life), there are pros and cons to using rubrics.

      1. Hi Bhavani & Tom, great discussion about rubrics here. As a marker I have found them very constructive to my ability to mark consistently (and quickly), but I have also found that they sometimes can contribute to inflation (or deflation) of student marks if a student ranks very well in one area but poorly in another. I am very interested in the paper you mention Bhavani about the effect of rubrics – I might try to find it! We will have a coffee course coming up later this year about marking and feedback to students which will discuss some of these issues so stay tuned for that as well.

      2. Bhavani, I haven’t done “Principles of Tutoring and Demonstrating”. Was there a particular paper cited in this course, or did the class do the experiment with half using rubrics? I did a quick search and found one paper where this experiment was done using two hundred trainee teachers (Panadero & Romero, 2014). The rubric group reported better results, but also higher stress and more narrowly task-focused behaviour.

        Reference

        Ernesto Panadero & Margarida Romero (2014) To rubric or not to rubric? The effects of self-assessment on self-regulation, performance and self-efficacy, Assessment in Education: Principles, Policy & Practice, 21:2, 133-148, DOI: 10.1080/0969594X.2013.877872

        1. Hi Tom,

          Panadero and Romero, Glasgow and Hicks, and ____ are among the authors who note the challenges of using rubrics. Jonsson and Svingby, in assessing multiple studies on rubrics, conclude that rubrics work better in certain disciplines, and that rubrics do not enhance validity/reliability in and of themselves. As mentioned before, rubrics have pros and cons, and in my discipline, the pros do not seem to strongly outweigh the cons.

          While I can’t lay my hands (or memory) on materials from PTD, the class did actually do an experiment, and everyone with rubrics did significantly worse than those without. Perhaps that particular cohort was stacked with students from certain fields?

    2. Hi Tom, I wonder where some of the resistance to rubrics comes from, and what the rationale for that resistance might be? I know many academics are very opposed to them!

      1. Katie, I can see Bhavani’s point on rubrics limiting students’ potential. But that will apply to the top few students, and I am worried about the bottom ones who need all the support they can get.

        The resistance to rubrics from academics might come from a difference in aims of education. If you are trying to help your top students to excel, so they can go on to advanced research, then you want to give them as much flexibility as possible. But I am trying to train the bulk of students for a professional career in industry. They need to be able to demonstrate specified skills and knowledge to carry out a defined role. If I certify they can do the job and they really can’t, then money, or lives, may be lost as a result.

  7. I think the rubric is a great way to enhance the validity of the assessment, especially combined with going over example submissions as a class. By working through a few examples with everyone and the rubric, students can see what qualifies for great, good, and poor marks and thus have a similar level of baseline training in order to better assess their peers, in line with the course expectations, and the expectations of the instructor.

    1. Hi Danny, I think rubrics are perhaps even more important for students assessing each other than for the more experienced markers on the teaching staff, as they provide some structured guidelines for the students to apply in their peer assessment. Interesting stuff!

      An approach that I have seen work really well was having the students develop the criteria and weightings for the rubric as a class activity. This was really illuminating, as it made clearer what the students thought was important, and gave the teacher a chance to explain why the assignment was going to be marked a certain way, and then discuss it.

  8. The School of Art and Design crit sessions sound great to me – I imagine they are really useful for students over time and help them to develop their work. I’m not sure what an equivalent form of authentic learning in my subject area would be, as we tend to get students from different disciplines. The students I teach are most likely to go into the public service or academia (many are already lecturers at overseas universities). Presentations and policy briefs are likely to be the most useful in terms of peer assessment.

    I would use a rubric because I find a rubric really helpful in grading essays, but I like the idea of developing a rubric with the class at the beginning of the peer assessment process. That way students are more likely to understand the different criteria and have the opportunity to say what they think is important in assessing work. This would also be a good way to raise the issue of how to judge the work of students for whom English is a second language, i.e. discuss the relative weights of different criteria. Like Angela, I’d do a couple of run-throughs first till I’m confident that everyone has got the idea. I think I’d go for having everyone assess everyone else’s work, and would check and moderate to deal with issues of reliability and respectful communication.

  9. It’s interesting to hear that some academics are still anti-rubrics, but perhaps this is because HE educators are not really taught how to build or use them. I have used rubrics of various kinds for more than 20 years so I’m fairly used to them and their foibles, but having a rubric is absolutely no guarantee of consistency, reliability or validity. I don’t use peer assessment towards final marks because different assessors (peers or teachers) interpret rubrics/tasks differently. Even with major standardised tests, they don’t expect more than 80% agreement and this is on short, controlled tasks where raters have been repeatedly trained and checked in their application of rubrics with benchmarked samples. Rubrics, or at least their use across the assessments for a course, also need to be linked clearly to a standard (in our context, the ANU assessment standards) and this needs to be explicit for students.
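
    (To make the 80% figure concrete, here is a minimal sketch of raw agreement versus Cohen’s kappa, which corrects for chance agreement, for two raters applying the same rubric; the grades are invented.)

    ```python
    # Minimal sketch: raw agreement vs. Cohen's kappa for two raters applying
    # the same rubric to the same ten submissions (invented grades).
    from collections import Counter

    rater_a = ["HD", "D", "C", "C", "P", "D", "HD", "C", "P", "D"]
    rater_b = ["HD", "C", "C", "C", "P", "D", "D",  "C", "P", "D"]

    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: the probability both raters assign the same grade at
    # random, given each rater's own grade frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[g] / n * freq_b[g] / n
                   for g in set(rater_a) | set(rater_b))

    kappa = (observed - expected) / (1 - expected)
    print(f"Raw agreement: {observed:.0%}, Cohen's kappa: {kappa:.2f}")
    # -> Raw agreement: 80%, Cohen's kappa: 0.73
    ```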

  10. When I think about the different ways to increase the validity and reliability of student peer assessment, the same techniques we use in training tutors come to mind. Rubrics are one such technique. Another way to train tutors and increase inter-marker reliability is to have a norming session where HD, D, C and P papers are marked and compared. This can easily be adapted for training students. I’ve used papers from the previous year (with students’ permission, of course) for this purpose.

  11. For me, the most inspiring technique in today’s material is involving the students in developing the criteria. We all realise the importance of clear marking criteria. I have learned in several training sessions that we could ask the students to explain the criteria to check whether they have mastered them. If we invite the students to develop the criteria, they will gain a deeper understanding through the discussions, and be better engaged.

    1. Hi Sunny,

      I agree – involving students in developing the criteria is a great way for them to be more engaged and understand the assessment process more!

      Cheers

      Karlene

  12. In my discipline, authentic peer review or evaluation could take a number of forms, including critique of ideas via group discussion; review of oral presentations either by discussion or written feedback; or copy editing and giving feedback on a piece of written work.

    To increase the validity and reliability of a peer assessment task, I would use a rubric alongside some model answers, and walk the students through both, encouraging discussion to clarify any queries they may have. I would also explicitly walk through the difference between critique and criticism, and encourage the students to use empathy when constructing their responses.

    I really love the idea of involving students in the development of marking criteria – it would be a fabulous way of uncovering what students think is important in the context of their discipline.

    1. Hi Rebekka,

      Thanks for sharing ideas on how peer review could look in your discipline. I also really like how you would emphasise the difference between critique and criticism with your students – I think this is a great way to encourage students to focus on being constructive and empathetic in their responses!

      Cheers

      Karlene

  13. Personally, I am a fan of marking rubrics, provided they are well made and allow for enough variation between each category. I also liked the idea of getting students to participate in making the rubrics or marking criteria; I think that would give students real insight into how their work is marked in other courses, and how they should be focussing their work.

    The point that I thought was interesting, though, was giving students practice runs with the criteria and example assessments. A few people in the comments seem to have used, or would like to use, something similar. It seems to me that this would substantially improve students’ ability to participate in peer assessment; however, it also seems that it would take a lot of time in classes. How could that be resolved? In addition, would that training be something that was run in every first-year course and then not again, or is it something that would need to be done in every course? My intuition tells me that it would need to be in every course to be effective, which would be difficult, regardless of how effective it might be.

    I am very interested in this idea of authentic assessment, even though here it has only been flagged as authentic peer assessment. I would be interested in learning more about this topic.

  14. In the creative profession, we are often asked to present our creative portfolio.

    I could totally relate to the School of Art’s approach because I did a degree in Fine Arts and peer assessment was a huge part of our program. Using rubrics, we assessed each other’s work (called plates) and gave feedback. Some sessions were brutal, especially when we came to advertising and editorial campaigns. When people started to get emotional, our teachers would just say – we are just preparing you for the brutal world of advertising.

    … which of course was just very poor assessment design.

    My main takeaways from Dr Curran, which I would apply in my teaching in the future, are:
    • To provide a supportive environment
    • To explain the assessment’s value and outcomes, because it can be intimidating and scary
    • To learn how to manage students’ emotions
    • To allow alternatives in case there are learners who are not comfortable with peer assessment

  15. On the topic of peer review and assessment, all I can offer from my teaching experience is a small example, but one which did in fact take a fascinating turn. Not sure if this is helpful, but it’s fascinating! For one of my courses, the participation mark was a pass or fail grade, based on each student taking over our class Twitter handle for one day. The mark itself was a done deal. Along the way, students did a whole bunch of fun and engaging things, and by the end of the course students were requesting to have another go and producing excellent and thought-provoking tweets that even attracted a “follow” from Kevin Rudd! We had determined that there would be an ANU voucher awarded to our Social Media Champion, plus the title itself, again as a way of encouraging students to participate.

    The way of determining the winner is where this is going: the students themselves would vote for each other, and thus it was a question of impressing your cohort. To guard against everyone simply voting for themselves, while keeping the process anonymous, each student had two votes to allocate (on one ballot paper), with the instruction that no one could vote twice for the same person. Guess what? At least half the class did not get any votes at all, i.e. at least 50% of students did not vote for themselves. And what surprised me, in discussions with students afterwards, was that some people gave one of their votes to a student who was a particular fan of ANU merchandise. So these precious votes, where you would expect students to vote for themselves or otherwise for the best candidate for champion, were actually used in a very non-Eurovision way.

    This to me is a cautionary tale about how students approach assessment. In this case, there was something at stake – an accolade that could feasibly be used even on a CV. And yet the exercise was deeply personal and quite touching. I should mention that the most talented and logical person did win regardless, so the “nice” votes did not sway against the natural winner. All of this to say: when talking assessment, where we value objectivity and merit over other considerations, human behaviour is not always rational or impartial.
