Professor W. Stephen Wilson, a math professor at Johns Hopkins, did an interesting, if cursory, study (the PDF is here). He obtained the SATM scores for the students in his Calculus I for the Biological and Social Sciences course in 1989 and 2006, and gave his 2006 students the same final exam he gave his 1989 students. The percentage of the students taking the class was essentially the same in both semesters, as were the SATM scores (although the number of students applying and accepted to Johns Hopkins had increased from 1989 to 2006 by roughly 146%).

The final exam scores were significantly lower in 2006 than in 1989:

The 2006 Calculus I class took the same 77-point final exam as the 1989 class. The content of the Calculus I course has not changed, and, mathematically, using the old exam was completely appropriate.

The scores on the final exam were markedly different. The average of the 1989 scores was 48.4, with a standard deviation of 14.4, while the 2006 class average was 42.5, with a standard deviation of 11.3. The 5.9 point decrease in the average is a 12.2% decline. Daniel Naiman also ran the Wilcoxon test of significance on these two distributions and found a p-value of .001 for the two-sided test. [A p-value below 0.05 tells us that the difference is statistically significant, that is, that it is unlikely to be due to random variation alone.]
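
For anyone who wants to check the arithmetic, here is a rough Python sketch that reproduces the 12.2% figure from the quoted means and also expresses the drop in pooled standard deviations. The pooled-SD step assumes the two classes were about the same size, which the excerpt does not say, and the standardized difference is not something Professor Wilson reports, so treat that second number as a rough guide only.

```python
import math

# Summary statistics quoted from the study.
mean_1989, sd_1989 = 48.4, 14.4
mean_2006, sd_2006 = 42.5, 11.3

# Raw and relative drop in the class average.
point_drop = mean_1989 - mean_2006
percent_drop = 100 * point_drop / mean_1989

# ASSUMPTION: equal class sizes, so the pooled SD is the square root
# of the mean of the two variances. The excerpt gives no class sizes.
pooled_sd = math.sqrt((sd_1989 ** 2 + sd_2006 ** 2) / 2)
standardized_drop = point_drop / pooled_sd

print(f"drop: {point_drop:.1f} points ({percent_drop:.1f}%)")
print(f"standardized drop: {standardized_drop:.2f} pooled SDs")
```

Note that this works only from the summary numbers. The Wilcoxon test mentioned above needs the individual exam scores, which are not given here.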

Here is a histogram of both semester scores showing the distribution:

He compares this difference to a similar difference in SATM scores to make his point:

How significant is this change educationally? Contemplate a similar drop in SATM scores. SATM scores range from 200 to 800. If there had been a 12.2% drop over the 17 years from the recentered SATM score of 662.3, the 2006 class would have an average SATM score of 605.9 (= 662.3 - .122 x (662.3 - 200)).
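
The formula in that passage applies the percentage drop to the distance above the bottom of the scale (200), not to the raw score, since nobody can score below 200. A small Python sketch of the same calculation follows; the function name and its parameters are mine, for illustration only.

```python
def equivalent_score(score, fraction_drop, scale_floor=200.0):
    """Apply a fractional drop to the part of the score above the scale floor.

    Hypothetical helper mirroring the quoted formula:
    score - fraction_drop * (score - scale_floor)
    """
    return score - fraction_drop * (score - scale_floor)


# Recentered 1989 SATM average and the 12.2% exam decline, both from the study.
satm_1989_recentered = 662.3
equivalent_2006 = equivalent_score(satm_1989_recentered, 0.122)

print(f"{equivalent_2006:.1f}")
```

Rounded to one decimal place, this matches the 605.9 figure in the quoted text.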

But the interesting thing (to me, at any rate) is that the SATM scores did not drop from 1989 to 2006:

The average SATM score for the 1989 Calculus I class was 662.6 with a standard deviation of 6.8. For the 2006 Calculus I class it was 664.9, with a standard deviation of 6.3. In the mid-1990s, SATM scores were “recentered,” [Rec07]. After recentering the 1989 class’s SATM scores, the new average was 662.3, with a standard deviation of 6.5.

Professor Wilson discusses the possible causes for this discrepancy, but what interests me is that the calculus final exam scores dropped significantly while the SATM scores did not. The distinction between the two exams Professor Wilson focuses on is that the SATM allows the use of calculators while the course final exam does not. I’m not sure this factor all by itself can account for this difference, because using calculators on exams is like looking things up on an open book exam: You lose what you gain because of wasted time.

How similar is the calculus content on the SATM and the content of the final exam? The SATM is a more comprehensive exam, and cannot devote as many questions to calculus. A fairer comparison (if it were possible) would be to somehow score the calculus questions on the SATM and compare those scores to the final exam scores.

Yet this decrease in final exam scores should have been reflected to some extent in the SATM scores. I have to wonder if this “recentering” of scores is somehow responsible for this. Of this topic, Diane Ravitch says:

For many years the College Board insisted that the Scholastic Assessment Test was “an unchanging standard.” But no more. The latest SAT scores, released last week, are the first to be graded on a new curve–one that destroys the test’s “unchanging standard.”

Two years ago, the College Board decided to “recenter” the scores by arbitrarily declaring that the 1990 scores on both the verbal and mathematical portions of the test would serve as the new average. The fairly robust math score of 475 was transformed overnight to a 500, and the anemic verbal score of 424 also was lifted to 500. With the stroke of a pen, extremely poor performance on the verbal portion of the test was turned into the new norm.

So “recentering” the scores was inflating them. This would at least partially explain why the SATM scores did not decrease while the final exam scores did.

Still, university faculty, secondary school faculty, parents, and yes, the College Board should be concerned about this difference. Somebody should research this, find out if it is a national trend, and if so, try to correct it.

10 Responses to “Test Discrepancies”
  1. Unfortunately he gave the exam after the class was over, so the test measures previous knowledge, plus what he taught.

    Even if the content of his class was the same, do we know if the textbook was as well?

    I wonder if there is some unmeasured background knowledge that the SAT doesn’t measure, but his test did?

    My math SAT score was 600 in 1988, but I was also a C- student in High School. Yeah, I am an idiot.

  2. rory @ parentalcation on March 29, 2007 at 4:32 pm said:

    Unfortunately he gave the exam after the class was over, so the test measures previous knowledge, plus what he taught.

    Even if the content of his class was the same, do we know if the textbook was as well?

    I wonder if there is some unmeasured background knowledge that the SAT doesn’t measure, but his test did?

    My math SAT score was 600 in 1988, but I was also a C- student in High School. Yeah, I am an idiot.

    He said nothing about the textbook, but interestingly, the grading curve had been changed (he calculates what grades the 2006 students would have gotten had they been graded on the 1989 curve, and vice versa), though I didn’t mention it because it wasn’t really relevant to what I focused on.

    And the discrepancy between the two scores rules out blaming this on pedagogy (were that the problem, the discrepancy would be much smaller).

    But somebody needs to research this.

  3. I’m dense.

    The Calculus class in which the students’ scores decreased was a college course? And the implication is that this is the fault of the high school curriculum? If that is what you are saying I guess I can imagine how that might be possible.

    I wonder if there is some unmeasured background knowledge that the SAT doesn’t measure, but his test did?

    Isn’t the SAT limited to the kinds of questions that can be puzzled out in about 60 seconds? A Calculus final at a university might test “higher order thinking skills” on problems that require a bit more thought. If high school math class were designed to “test to the SAT” then there isn’t any pressure to get students involved in more profound problems.

    Off topic, I’ve been trying to find online articles that address my idle speculation about this.

  4. I certainly agree…

    I had a quick look at disaggregated SAT score trends, but I realized that the participation rates would skew these.

    The horrible thought that occurred to me: Johns Hopkins probably became even more selective over the years, and more students take AP calculus.

  5. skh.pcola says:

    I saw a conversion table for SATs taken in different years/time frames once. I think it was on MENSA’s website. I was trying to find out what my 1983 1420 score was worth in today’s terms. Needless to say, it was much higher.

    The CB has been screwing around with the scores forever, trying to make successive class years look like they are getting smarter…there’s nothing new here.

    rightwingprof, one reason the SATM scores might’ve remained the same, but the actual calculus test scores went down is because more kids study “to the test” today than they used to.

  6. Another prof says:

    Really not that interesting a study. *One* professor. It’s odd indeed to assume that the test changed its validity, rather than that one professor’s teaching changed.

  7. Really not that interesting a study. *One* professor. It’s odd indeed to assume that the test changed its validity, rather than that one professor’s teaching changed.

    Well yes, it is interesting precisely because it’s unexpected, and because it reflects the experience of so many faculty over the last ten or fifteen years. And if you think it odd that the test changed its validity, as you put it, then you have little experience with College Board or ETS.

  8. The Calculus class in which the students’ scores decreased was a college course? And the implication is that this is the fault of the high school curriculum?

    Whoa, back up. Where did I say anything about the high school, much less the curriculum?

    I hope I didn’t give the impression that I was responding directly to you. I was responding to the original post. The focus of the discussion seemed to be on the College Board recentering scores, meaning the students are getting out of high school less educated with further evidence provided by this decline in end of Calculus test scores. I am not trying to assert this so much as figure it out.

  9. Myrtle on March 30, 2007 at 10:07 am said:

    The Calculus class in which the students’ scores decreased was a college course? And the implication is that this is the fault of the high school curriculum?

    Whoa, back up. Where did I say anything about the high school, much less the curriculum?

    I hope I didn’t give the impression that I was responding directly to you. I was responding to the original post. The focus of the discussion seemed to be on the College Board recentering scores, meaning the students are getting out of high school less educated with further evidence provided by this decline in end of Calculus test scores. I am not trying to assert this so much as figure it out.

    I read a couple of other things he’d written. He seems to focus on calculators.

  10. Hey, prof.

    I just found a long post by a math prof at Georgia speculating that it’s the AP courses contributing to declining math abilities in college….it’s at my blog. Also, I’m linking to you in my side bar.