Using LLMs for Effective Rubric Design and Grading
Today's exercise is to use an LLM as a partner in the grading process. This involves two distinct, high-level skills:
Before you can grade, you need a standard. The first step is to use an LLM as a collaborator to brainstorm and draft a comprehensive rubric for a research proposal.
Ask the LLM to help you define what makes a "good" project proposal.
Act as an experienced professor. What are the key criteria you use to evaluate a graduate-level research project proposal? List the criteria and briefly explain what you look for in each one.
Use the brainstormed list to have the LLM generate a structured, point-based rubric. Save this as "Rubric v1.0" in a new Google Doc in your class folder.
Based on the criteria we just discussed (e.g., Novelty, Feasibility, Methodology, Clarity, Impact), generate a formal grading rubric.
The rubric MUST be out of 100 points total. Assign a point value to each criterion, ensuring they all sum up to 100.
For each criterion, create a 5-point scoring system (or similar) with clear descriptors for what low, medium, and high scores mean.
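LLM-generated rubrics do not always add up correctly, so it is worth checking the arithmetic before saving "Rubric v1.0". The sketch below is one minimal way to do that. The criterion names come from the prompt's examples; the point values and the helper name `check_rubric_total` are assumptions made for illustration.

```python
# Hypothetical rubric: criterion names follow the prompt's examples,
# the point values are illustrative assumptions.
RUBRIC_POINTS = {
    "Novelty": 20,
    "Feasibility": 25,
    "Methodology": 25,
    "Clarity": 15,
    "Impact": 15,
}


def check_rubric_total(points, expected_total=100):
    """Return (ok, total); ok is True when the points sum to expected_total."""
    total = sum(points.values())
    return total == expected_total, total


ok, total = check_rubric_total(RUBRIC_POINTS)
print(f"Rubric total: {total} ({'OK' if ok else 'does not match the required total'})")
```

If the check fails, ask the LLM to correct the point values rather than patching them by hand, so the descriptors and the points stay in sync.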
A rubric is a "prompt" for a grader. Now, you will test how well the AI performs as the grader when given your rubric as its instructions.
Go to the class folder and select two project proposals (not your own). For each proposal, start a fresh LLM session (so that context from one grading run cannot leak into the next) and use the prompt below.
Act as a diligent teaching assistant. Your task is to grade the attached project proposal using the *exact* rubric I am providing.
[Attach your "Rubric v1.0" Google Doc here]
---
Now, grade the attached proposal. For each criterion in the rubric:
1. Provide a specific score (e.g., "Feasibility: 20/25").
2. Write a 2-3 sentence justification for that score, citing specific evidence or quotes from the proposal.
After scoring every criterion:
3. Conclude with a summary of the proposal's main strengths and weaknesses.
4. Provide a "Total Score" out of 100.
[Attach the project proposal here]
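Because the prompt asks for scores in a fixed "Criterion: earned/possible" form, the AI's arithmetic can be checked mechanically. The following is a rough sketch, assuming each score sits on its own line in exactly that form (real responses may add bold markers or bullets, which this pattern would miss). The sample response and the function names are hypothetical.

```python
import re

# Assumed line format: "Criterion name: earned/possible" alone on a line.
SCORE_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*):\s*(\d+)\s*/\s*(\d+)\s*$")


def parse_scores(response):
    """Map each 'Name: x/y' line to the tuple (x, y)."""
    scores = {}
    for line in response.splitlines():
        match = SCORE_LINE.match(line)
        if match:
            name = match.group(1).strip()
            scores[name] = (int(match.group(2)), int(match.group(3)))
    return scores


def check_arithmetic(response):
    """Compare the reported Total Score with the sum of the criterion scores."""
    scores = parse_scores(response)
    reported = scores.pop("Total Score", None)
    computed = sum(earned for earned, _possible in scores.values())
    return {
        "computed_total": computed,
        "reported_total": reported[0] if reported else None,
        "consistent": reported is not None and reported[0] == computed,
    }


# Made-up response used only to exercise the parser.
sample_response = """\
Novelty: 15/20
The idea extends prior work in a modest way.
Feasibility: 20/25
The timeline is realistic but the budget section is thin.
Methodology: 18/25
Impact: 11/15
Clarity: 12/15
Total Score: 76/100
"""

print(check_arithmetic(sample_response))
```

A mismatch between the computed and reported totals is a useful signal to note when you evaluate the AI as a grader.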
This is the most critical step. You must now act as the "professor" again, evaluating the "TA" (the AI) and the tool you gave it (the rubric).
Read the AI's graded outputs and compare them to the proposals. Ask yourself:
Based on your analysis, your rubric has flaws. Use the LLM (e.g., in the Canvas) to fix them. Save the result as "Rubric v2.0" in your Google Doc.
[Attach your "Rubric v1.0" Google Doc]
---
You are a prompt engineer. I have attached a grading rubric I designed, but it has flaws. When I tested it, the AI was too lenient on 'Methodology'. Also, the points don't add up to 100.
Please modify the rubric to create 'Rubric v2.0'. Specifically:
1. In the 'Methodology' section, add a sub-point that explicitly checks for "discussion of limitations."
2. Adjust the point values for all criteria (e.g., Methodology: 30, Feasibility: 20, etc.) so that they logically add up to exactly 100 points.
3. Add a new criterion called "Clarity of Writing" (worth 10 points) and adjust the other points accordingly.
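The "adjust the other points accordingly" step is simple arithmetic that you can verify yourself. As a sketch of one reasonable approach (proportional scaling with largest-remainder rounding, which is an assumption and not something the exercise prescribes), the code below adds a 10-point criterion and rescales the rest to reach exactly 100. The starting point values are hypothetical and deliberately sum to 95.

```python
def rebalance(points, new_name, new_points, total=100):
    """Add a criterion and scale the existing ones so everything sums to `total`."""
    remaining = total - new_points
    old_sum = sum(points.values())
    # Exact proportional share of the remaining points for each old criterion.
    exact = {name: value * remaining / old_sum for name, value in points.items()}
    scaled = {name: int(share) for name, share in exact.items()}
    # Hand back the points lost to rounding down, largest fractional part first.
    leftover = remaining - sum(scaled.values())
    by_fraction = sorted(exact, key=lambda n: exact[n] - scaled[n], reverse=True)
    for name in by_fraction[:leftover]:
        scaled[name] += 1
    scaled[new_name] = new_points
    return scaled


# Hypothetical flawed v1.0 rubric: the values sum to 95, not 100.
v1 = {"Novelty": 20, "Feasibility": 25, "Methodology": 30, "Impact": 20}
v2 = rebalance(v1, "Clarity of Writing", 10)
print(v2, "total =", sum(v2.values()))
```

Comparing the LLM's revised point values against a calculation like this makes it easy to spot a "Rubric v2.0" that still does not sum to 100.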
Your rubric is now refined and tested. The final step is to apply it to a larger set of proposals to evaluate its performance at scale and record the grades.
Go to the class proposals folder and select six proposals: your own submitted proposal plus five others (not the ones you used for testing). For each proposal, use your refined "Rubric v2.0" to generate a grade, starting a fresh LLM session for each one. Not sure which ones to grade? Try asking your LLM to pick five at random for you.
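If you would rather make the random selection yourself than ask the LLM, a local random draw works just as well. This is a minimal sketch; the file names, the exclusion list, and the function name `pick_proposals` are all hypothetical.

```python
import random


def pick_proposals(all_proposals, exclude, k=5, seed=None):
    """Randomly choose k proposals, skipping anything listed in `exclude`."""
    candidates = [p for p in all_proposals if p not in exclude]
    if len(candidates) < k:
        raise ValueError(f"Only {len(candidates)} eligible proposals, need {k}.")
    # A seeded generator makes the draw repeatable; seed=None gives a fresh draw.
    return random.Random(seed).sample(candidates, k)


# Hypothetical folder listing.
folder = [f"proposal_{i:02d}.pdf" for i in range(1, 13)]
mine = "proposal_04.pdf"
used_for_testing = ["proposal_02.pdf", "proposal_09.pdf"]

chosen = pick_proposals(folder, exclude=[mine, *used_for_testing], seed=7)
print([mine, *chosen])  # your own proposal plus five others
```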
Act as a diligent teaching assistant. Your task is to grade the attached project proposal using the *exact* rubric I am providing.
[Attach your final "Rubric v2.0" Google Doc here]
---
Now, grade the attached proposal. For each criterion in the rubric:
1. Provide a specific score.
2. Write a 2-3 sentence justification for that score, citing evidence.
After scoring every criterion:
3. Conclude with a summary of strengths and weaknesses.
4. Provide a "Total Score" out of 100.
[Attach the project proposal here]
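When grading several proposals, it helps to assemble the prompt the same way every time so that only the proposal changes between sessions. The sketch below builds one self-contained prompt per proposal from plain text. It makes no LLM call, the instruction text is condensed from the prompt above, and the rubric and proposal strings are placeholders.

```python
ROLE = (
    "Act as a diligent teaching assistant. Your task is to grade the attached "
    "project proposal using the *exact* rubric I am providing."
)

# Condensed version of the grading instructions (an assumption for illustration).
INSTRUCTIONS = (
    "Now, grade the attached proposal. For each criterion in the rubric, provide "
    "a specific score and a 2-3 sentence justification citing evidence. Then "
    "summarize strengths and weaknesses and provide a \"Total Score\" out of 100."
)


def build_grading_prompt(rubric_text, proposal_text):
    """Assemble role, rubric, separator, instructions, and proposal in order."""
    return "\n".join(
        [ROLE, rubric_text.strip(), "---", INSTRUCTIONS, proposal_text.strip()]
    )


# Placeholder inputs; in practice these would be your Rubric v2.0 and the proposals.
rubric = "Methodology (30 points): ...\nFeasibility (20 points): ..."
proposals = {
    "proposal_A": "We propose to study ...",
    "proposal_B": "This project investigates ...",
}

# One independent prompt per proposal; nothing is shared between them,
# mirroring the rule of using a fresh session for each one.
prompts = {name: build_grading_prompt(rubric, text) for name, text in proposals.items()}
print(prompts["proposal_A"])
```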
As you generate each grade, open the class spreadsheet and record the AI's final score for each proposal you graded. This will help us analyze consistency as a class.
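Once several students have graded the same proposal, consistency can be summarized with a mean and a range per proposal. The following sketch assumes the spreadsheet can be read as rows of (grader, proposal, score). That layout and every value shown are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (grader, proposal, AI total score out of 100).
rows = [
    ("alice", "proposal_03", 78),
    ("bob", "proposal_03", 85),
    ("carol", "proposal_03", 81),
    ("alice", "proposal_07", 62),
    ("bob", "proposal_07", 90),
]


def summarize(rows):
    """Group scores by proposal and report count, mean, and range."""
    by_proposal = defaultdict(list)
    for _grader, proposal, score in rows:
        by_proposal[proposal].append(score)
    return {
        proposal: {
            "n": len(scores),
            "mean": mean(scores),
            "range": max(scores) - min(scores),
        }
        for proposal, scores in by_proposal.items()
    }


for proposal, stats in summarize(rows).items():
    print(f"{proposal}: n={stats['n']}, mean={stats['mean']:.1f}, range={stats['range']}")
```

A wide range for the same proposal suggests that the rubrics (or the AI's reading of them) disagree, which is a good starting point for discussion.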
Think about the following questions. We will use them as a basis for our class discussion.