Sep 18 2007
Fair, Honest, Accurate Assessment
Pissed Off has an interesting article about partial credit:
My AP is against giving too much partial credit on exams. At Friday’s welcome meeting he passed out copies of a final exam from summer school and asked us what was wrong with it. Not one person in the department could come up with an answer. The questions were well written in mathematically precise language. They were typed. They covered the entire range of the course and they were not too easy or too hard. He was in shock that we could find nothing to complain about.
After a few minutes he yelled “Don’t you see it? The multiple choice section is worth 24 points (4 points each) and the rest of the exam consists of 4 and 6 point questions where partial credit is given. This is ridiculous. An exam should have no more than 30 or 40 points of questions with partial credit. Besides, don’t you have better things to do than to mark papers?”
I thought when I first read this that the AP was being stupid, but upon reflection (and it may well be that I’m giving the AP too much credit), I can understand where this came from: Partial credit not only can be abused, but frequently is, in my experience, and I don’t mean by students. But the question of partial credit, what is abuse and what is appropriate, led to assessment in general, and here we are.
I’d like to talk about assessment issues, and remembering that a commenter on an earlier article stated that his administration to a large extent controlled the weighting of graded items, address this both to faculty and administrators. If you’re into assigning macaroni art projects, you’re probably wasting your time here. Just sayin.
Categories of assessment
Not all graded items are created equal. The distinctions between different types of assessment must be taken into account if we are to give students the fairest, most honest and accurate assessment possible.
Knowledge assessment v. still-warm-still-breathing assessment. Knowledge assessment obviously assesses a student’s knowledge, preferably of the topic at hand (and not something else). Still-warm-still-breathing assessment has nothing to do with knowledge, but may assess attitude or some other tangential property. Giving points for attendance is one example of still-warm-still-breathing assessment. After all, a student may not be in class because he already knows what a differential equation is. I probably don’t need to tell you that I’m not a big fan of the latter. I also don’t understand it. In primary and secondary school, attendance is mandated by law, so why base part of the grade on it? In the university, most faculty don’t grade on attendance because 1) it’s tedious and takes up too much valuable time, and 2) it treats students like children.
The only thing I have to say about still-warm-still-breathing assessment is this: If you absolutely must grade on something that has no direct relationship to knowledge, find some way to tie knowledge to it. So if you feel you have to grade on attendance, instead of wasting time calling roll at the beginning of class, give pop quizzes. You get a partial attendance check that bears some relationship to the topic, even if it is what I call the “Were you awake through class” type of quiz.
In-class v. out-of-class assessment. The crucial distinction between the two is that the former is given in a controlled environment, whereas the latter is not. This points up a weakness of out-of-class assessment: You can never be sure if the assignment reflects the student’s knowledge, or that of some other person (or source), be it another student, well-meaning but seriously misguided and unethical parents, or Wikipedia. Of course, students can cheat on in-class assignments, but that’s a different topic (see here).
Another important distinction is time. I suspect that if I had had days in which to do an assignment, I might eventually have come up with an answer that was more or less correct, but if it took me days to think of it, then I didn’t know it very well. Out-of-class assignments therefore tend to inflate the picture of the student’s knowledge, which in-class assignments, such as exams, represent more accurately.
I do not mean that out-of-class assignments are bad. But out-of-class assignments should be designed so that they assess knowledge and skills that are not easily assessed in a timed environment, or they should be designed less as assessment than reinforcement.
Contextual v. discrete assessment. Because “integrated” has been <s>perverted</s> redefined to mean “anything that doesn’t assess knowledge related to this course,” I’m reverting to the older term, contextual. A final paper is an example of a contextual assessment, because it assesses the student’s overall knowledge of the course material. Case-based assignments, those in which the assessments are embedded within the context of a case, or assignments that assess multiple, interrelated knowledge domains are also contextual.
Assessments in which the items bear no necessary relationship to one another except that they all relate to the general topic are discrete assessments. There is a grey area between the two, where the individual items may be discrete, but the items themselves are contextualized. Story problems on a math or science exam would be an example.
Also at issue are individual and group assessments, which need no definition.
In my experience, assessment is too often given very little thought. I have, on too many occasions, sat by while a colleague said something like, “We’ll have two exams at 45 points each and 10 assignments at 10 points each, that adds up to 100 so it will be easy to calculate final grades,” and left it at that. In my humble opinion, one should invest a great deal of care and thought when designing an assessment system.
Ideally, an assessment system will use a combination of assessment types to best evaluate the student’s knowledge across the spectrum. Exams are an excellent assessment for detailed knowledge, and quizzes can be excellent checks along the way (both for the instructor and the student). Every class should implement some kind of in-class assessment, if only to guard against academic misconduct.
Group projects (and group work in class) certainly have their advantages, but they also have distinct disadvantages. Before any instructor implements group projects, he should decide how he will guard against unfair work distribution within the group. Too often, students who like group work do so because they can sit back and let everybody else do their work for them.
However, group projects are excellent for contextualized work that is too complex to be assessed on an exam. Otherwise, there is little point in assigning group work. I am aware of the maxim about teaching to others being the best way to learn, but that doesn’t imply that those others learn anything.
Controlling for fairness when assessing group work is extremely difficult. I found that assigning students to one group which they worked with all semester long helped a great deal; students are willing to let others slack on the first project, but are much less so inclined on succeeding projects. I also used a system of contracts. When the project was ready to turn in, the students as a group had to allot to each member a percentage that everyone in the group agreed accurately reflected the work that member had done. Each student had to sign the contract, consenting to the percentage given him, and I assigned no grade to anyone in the group until the contract was signed and submitted. Once signed, a percentage score could not be appealed (if there was a conflict, we resolved it before any grades were assigned). So if the project was worth 50 points, the project grade was 45 points, and Johnny got a 90% on the contract, he would get 90% of the 45 points, or 40.5 points for that project.
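The contract arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not the tool I actually used; the function name contract_points and Mary's 100% share are made up for the example.

```python
def contract_points(project_grade, shares):
    """Scale the group's project grade by each student's signed contract share.

    project_grade: points the group earned on the project (e.g. 45 out of 50).
    shares: mapping of student name -> agreed share, as a fraction from 0 to 1.
    """
    for name, share in shares.items():
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"share for {name} must be between 0 and 1")
    # Each student receives his agreed fraction of the group's grade.
    return {name: round(project_grade * share, 2) for name, share in shares.items()}


# Johnny's figures come from the example above; Mary's share is hypothetical.
scores = contract_points(45, {"Johnny": 0.90, "Mary": 1.00})
print(scores["Johnny"])  # 90% of 45 points
```

Rounding to two decimal places is an arbitrary choice made here to keep the printed scores tidy.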
Group projects were highly complex, far more so than what could be assessed in class. They were also case-based, and therefore contextual, which leads us to the next issue.
Contextual assignments, individual or group, out-of-class or in-class, are often designed so that the output or answers from one section feed into following questions. There is nothing wrong with this, except that it raises the issue of cascading error. In other words, if Mary gets the first question wrong and her answer for succeeding questions depends on getting the first question correct, should she be docked for those succeeding questions?
My position is that it depends on the purpose of the assessment. If the purpose of the assessment is to judge Mary’s knowledge of each of those tasks covered, then Mary should only be docked for what she did not learn. If, however, Mary has already been assessed on those tasks, and the project is analogous to, say, a comprehensive final exam, then I would be more comfortable with cascading errors — provided that she is given partial credit.
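To make the two policies concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the function grade_chain, the toy three-step problem, and Mary's answers. With follow_through=True each answer is checked against what the step would produce from Mary's own previous answer, so she is docked only for the step she actually got wrong; with follow_through=False every answer is checked against the reference chain, so the first error cascades.

```python
def grade_chain(steps, start, student_answers, follow_through=True, tol=1e-9):
    """Mark each answer in a chain of dependent questions as right or wrong.

    steps: list of functions; each maps the previous result to the next one.
    start: the given starting value for the first question.
    student_answers: the student's answer to each question, in order.
    """
    marks = []
    reference_input = start   # what the previous answer should have been
    student_input = start     # what the student actually carried forward
    for step, answer in zip(steps, student_answers):
        basis = student_input if follow_through else reference_input
        expected = step(basis)
        marks.append(abs(answer - expected) <= tol)
        reference_input = step(reference_input)
        student_input = answer
    return marks


# Toy problem: add 3, then double, then subtract 1, starting from 4.
steps = [lambda x: x + 3, lambda x: x * 2, lambda x: x - 1]
mary = [8, 16, 15]  # first step wrong (should be 7); later steps done correctly

lenient = grade_chain(steps, 4, mary, follow_through=True)
strict = grade_chain(steps, 4, mary, follow_through=False)
print(lenient)  # [False, True, True]
print(strict)   # [False, False, False]
```

Which mode fits depends, as I said, on what the assessment is for; the sketch only shows that the choice can be made explicit rather than left to whoever happens to be marking.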
And that brings us back around to the article that started this. My first reaction, as I said, was that this AP was spouting nonsense. Partial credit may not be applicable in the real world, but it’s a necessity for teaching, even in a math class. Carl may not have gotten the right answer: he started out right but veered off in the wrong direction. Still, he started out right, and he should get credit for that. Partial credit reflects partial learning, and as such, partial credit is more accurate assessment.
But partial credit should never be given unless it reflects partial knowledge. “You get five points just for writing something down for the question” isn’t partial credit. It’s educational welfare. Such practices should be forbidden, and the assessments of teachers who employ such practices cannot be trusted.
The final question, of course, is weighting assessments. How an instructor weights assessments depends on what type of course he teaches and what types of assessments he uses. But the weights assigned should be designed so that the advantages and disadvantages balance one another. So for example, if I am teaching the same course with others and assessments must be decided by the group, I insist on a 60-75% range for in-class exams so that at least that much of the grade is an assessment of the student and not of someone else.
But I do not give still-warm-still-breathing assessments. If such work is assigned, it should never form enough of the total score to raise a student’s grade more than a grade sign (a plus or a minus). That’s no more than 3% of the total grade. Anything more distorts the assessment to the point that it no longer reflects how well the student has learned.
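These two rules of thumb are easy to check mechanically. The Python sketch below is one way to do it, under assumptions of my own: the function check_weights, the category names, and the sample weights are all invented for the example; only the 60-75% range and the 3% cap come from the text above.

```python
def check_weights(weights, in_class, non_knowledge,
                  in_class_range=(0.60, 0.75), max_non_knowledge=0.03):
    """Return a list of problems with a grading scheme (empty if none found).

    weights: mapping of category -> fraction of the final grade.
    in_class: categories assessed in a controlled, in-class setting.
    non_knowledge: categories not tied to knowledge (attendance and the like).
    """
    eps = 1e-9  # tolerance for floating-point sums
    problems = []
    total = sum(weights.values())
    if abs(total - 1.0) > eps:
        problems.append(f"weights sum to {total:.2f}, not 1.00")
    in_class_share = sum(weights[c] for c in in_class)
    low, high = in_class_range
    if in_class_share < low - eps or in_class_share > high + eps:
        problems.append(f"in-class share {in_class_share:.0%} is outside {low:.0%}-{high:.0%}")
    nk_share = sum(weights[c] for c in non_knowledge)
    if nk_share > max_non_knowledge + eps:
        problems.append(f"non-knowledge share {nk_share:.0%} exceeds {max_non_knowledge:.0%}")
    return problems


# Hypothetical schemes, for illustration only.
good = {"exams": 0.65, "group_projects": 0.25, "homework": 0.07, "attendance": 0.03}
bad = {"exams": 0.50, "group_projects": 0.25, "homework": 0.15, "attendance": 0.10}

good_problems = check_weights(good, in_class=["exams"], non_knowledge=["attendance"])
bad_problems = check_weights(bad, in_class=["exams"], non_knowledge=["attendance"])
print(good_problems)
print(bad_problems)
```

The first scheme passes both checks; the second is flagged twice, once for too little in-class weight and once for too much attendance weight.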
For more on fair, performance-based assessment, see here.
2 Responses to “Fair, Honest, Accurate Assessment”

I’m interested in your thoughts on a cascading problem where, to use the student name from your example, Carl did not start out right, nor do many right steps, but somehow magically arrived at the right final answer. For example, Carl worked math magic in the intervening steps to make his answer be the right answer, but completely ignored the math concept being tested. Not that Carl cheated.
I only ask this because I have found myself in this situation only recently and thought it timely to ask others for their thoughts on the matter.
By the way, I call students writing things down for questions to which they don’t know the answer, “Core dumps” and promise my students negative points for them. Almost all of my students knew what a “core dump” was even before I got to the definition and they hung their heads in shame. No one has challenged me on that policy since I pointed out that not only would their score be much improved for leaving a question blank when they did not know the answer, but their hands would hurt less.
I’m a bit confused, maybe by your use of the word “magically,” since you say there was no cheating. If so, Carl found a route to the solution. I’m afraid that if the test did not specify the route, I would have to give him full credit.
We ran into this when developing VBA grading modules. You don’t foresee one possible method for solving the problem, a student uses it, and gets no credit. We gave credit, and built the method into the grading module.
But that’s not magic. That’s an unforeseen method. Is that what you’re talking about?