Statistics are an invaluable tool for improving your teaching and making your class fairer for your students. With statistics, you can identify bad test questions and throw them out. You can identify questions many students should have gotten right but did not, and determine what went wrong. You can determine how well the assignments you give your students work, and you can determine how well you are preparing students for those state exams.

Let’s start with that 100-point test you just gave. Here are your descriptive stats:

Exam Score
Mean 73.88
SE 2.49
Median 73.60
Mode 100.00
Stdev 24.94
Sample Variance 621.80
Kurtosis -1.66
Skewness -0.17
Range 69.30
Minimum 30.70
Maximum 100.00
Sum 7388.10
Count 100
95% CL 4.95

Your mean is just a little low (ideally, it should be in the mid 70s), but not low enough for concern. Your mode (most frequently occurring score) is 100, and that’s always good, but your standard deviation is large: scores typically fall about 24.94 points from the mean, and that’s a lot of spread. Your kurtosis is negative, too, which points to a flat distribution rather than a peak at the mean; along with the large standard deviation, it looks like many of your scores sit well away from the mean. It’s not a bad exam from just looking at the descriptive stats, though you would have liked to have had more students clustering around the mean.
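If you don’t have a spreadsheet handy, here is a minimal sketch of how the same descriptive stats could be computed in Python using only the standard library. The `describe` function and the score list are made up for illustration. The skewness and kurtosis here use simple moment formulas, so they may differ slightly from what a spreadsheet reports, since spreadsheets typically apply small-sample adjustments.

```python
import math
import statistics

def describe(scores):
    """Return basic descriptive statistics for a list of exam scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)          # sample standard deviation (n - 1)
    # Central moments (dividing by n), used for simple shape measures.
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return {
        "count": n,
        "mean": mean,
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "stdev": stdev,
        "variance": statistics.variance(scores),  # sample variance
        "se": stdev / math.sqrt(n),               # standard error of the mean
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,      # 0 for a normal distribution
        "range": max(scores) - min(scores),
    }

# Hypothetical scores, for illustration only (not the data behind the table).
scores = [30.7, 45.0, 52.5, 61.0, 73.6, 73.6, 88.0, 95.5, 100.0, 100.0, 100.0]
for name, value in describe(scores).items():
    print(f"{name:16s} {value:8.2f}")
```

As a sanity check on the table above, the standard error is just the standard deviation divided by the square root of the count: 24.94 / 10 is about 2.49.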

Next, look at the correlations between the individual questions and the total score (yes, I know all about collinearity, but this is justified). Let’s pick two questions to look at, Question 11 and Question 35, and run Spearman correlations between the questions and the Exam Score:

 
  Question 11 Question 35 Exam Score
Question 11 1
Question 35 0 1
Exam Score 0.02 0.94 1

Because every question contributes to the total, each question should correlate at least somewhat with the Exam Score; when we don’t see that effect of collinearity, there’s a problem. Note that the correlation coefficient between Question 11 and the Exam Score is 0.02! There is something badly wrong with that question. First, pull the exam and read the question; usually, when this happens, it’s pretty obvious what went wrong: often a typo or a badly worded question, but sometimes a question that goes beyond the scope of what you covered. Once in a great while there will be nothing wrong with the question. If that is the case, leave it in, but otherwise, delete it from the exam and the exam results. Note that Question 35 correlates highly with the Exam Score. Leave it in. Do this for all the questions: read any question with a suspiciously low correlation, determine whether anything is wrong with it, and delete it if so.
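The check described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample data, and the 0.2 cutoff for "suspiciously low" are all assumptions of mine, not fixed rules. Spearman correlation is computed here as the Pearson correlation of the ranks, with tied values sharing an average rank.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def flag_questions(item_scores, totals, threshold=0.2):
    """Return {question: rho} for questions that correlate below threshold with the total."""
    flagged = {}
    for question, scores in item_scores.items():
        rho = spearman(scores, totals)
        if rho < threshold:            # threshold is an assumed cutoff
            flagged[question] = rho
    return flagged

# Hypothetical data: 1 = correct, 0 = incorrect, one entry per student.
totals = [95, 88, 76, 70, 64, 55, 48, 40]
item_scores = {
    "Question 11": [1, 0, 0, 1, 0, 1, 1, 0],
    "Question 35": [1, 1, 1, 1, 0, 0, 0, 0],
}
for question, rho in flag_questions(item_scores, totals).items():
    print(f"Read {question} again: rho = {rho:.2f}")
```

A flagged question is only a candidate for deletion; you still have to read it and decide whether anything is actually wrong with it.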

As you’re doing this, you will usually notice that there are questions on topics you covered in class that students should have gotten right, but did not. This is a pedagogical red flag. (When this happens only with questions drawn from the reading assignments, it indicates that students didn’t do the reading, and I find that these are the questions most students miss.) How did you cover those topics? How can you change your presentation to make it clearer to your students? Go through all the questions that fewer than half the students got correct, and run them through the same process. Compare the questions on similar topics. If students missed many of the questions on the same topic, that’s a sign that there’s a problem with the way you present the topic.
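Here is a rough sketch of that pass in Python: compute the fraction of students who got each question right, keep the ones under one half, and group them by topic. The question labels, topic labels, and data are invented for illustration, and the sketch assumes each question is scored 1 (correct) or 0 (incorrect).

```python
from collections import defaultdict

def proportion_correct(item_scores):
    """Fraction of students who answered each question correctly (scores are 0 or 1)."""
    return {q: sum(s) / len(s) for q, s in item_scores.items()}

def missed_by_topic(item_scores, topics, cutoff=0.5):
    """Group questions that fewer than `cutoff` of the students got right, by topic."""
    by_topic = defaultdict(list)
    for question, p in proportion_correct(item_scores).items():
        if p < cutoff:
            by_topic[topics[question]].append(question)
    return dict(by_topic)

# Hypothetical data: one 0/1 entry per student for each question.
item_scores = {
    "Q4":  [1, 1, 0, 1, 1, 1],
    "Q9":  [0, 1, 0, 0, 0, 1],
    "Q10": [0, 0, 1, 0, 0, 0],
    "Q17": [1, 0, 0, 0, 1, 0],
}
topics = {"Q4": "topic A", "Q9": "topic B", "Q10": "topic B", "Q17": "reading"}

for topic, questions in missed_by_topic(item_scores, topics).items():
    print(topic, questions)
```

Several missed questions piling up under one topic is the pattern to look for.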

Use statistics to tell you how well you’re presenting the material.

You can also use statistics to determine how effective those assignments you give your students are. Let’s say you’ve just given your first 200-point exam, and before that, you had given several assignments (we’ll look at three). Your data look like this (the table represents part, not all, of your data):

Assignment 1 Assignment 2 Assignment 3 Exam Score
3.50 3.50 8.36 78.40
34.30 34.30 6.40 200.00
34.80 34.80 9.42 183.20
12.80 12.80 22.95 149.60
29.30 29.30 14.33 200.00
27.20 27.20 24.51 133.20
6.35 6.35 6.59 117.80
0.20 0.20 3.27 89.20
7.25 7.25 21.55 109.40
31.60 31.60 4.29 200.00
17.30 17.30 1.54 88.80
26.15 26.15 19.34 109.80
33.70 33.70 4.65 200.00
6.10 6.10 9.68 77.60
39.50 39.50 17.78 200.00
25.70 25.70 6.33 112.40
13.05 13.05 0.89 82.60
7.15 7.15 0.45 79.40
18.25 18.25 17.45 105.80
19.60 19.60 16.11 96.40
26.75 26.75 2.18 187.40
22.60 22.60 3.95 120.80
42.90 42.90 17.68 200.00
29.70 29.70 13.34 200.00

Run Pearson correlations on the assignments and exam:

  Assignment 1 Assignment 2 Assignment 3 Exam Score
Assignment 1 1
Assignment 2 0.99 1
Assignment 3 -0.07 -0.05 1
Exam Score 0.86 0.86 0.02 1

If your assignments are effective (and if they cover the same skills covered on the exam), you should get at least a 0.5 Pearson correlation coefficient between the assignments and the exam score. Assignments 1 and 2 correlate pretty highly with the exam score (0.86 each), but note the third assignment. There is nearly no correlation between it and the exam score (0.02). This is a great big red flag, so compare the three assignments. It’s not enough just to ditch the third assignment and replace it with something else; you need to figure out what is wrong with the third assignment. What is different about the third one? How are the first two similar, and how is the third different from the first two? Whatever it is, it’s not working.
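As a minimal sketch of this check, the Python below computes the Pearson correlation between each assignment and the exam score and lists the assignments that fall under the 0.5 guideline. It uses only the first eight rows of the excerpt shown above, so the coefficients will not match the table, which was computed on the full data. The function names are my own.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weak_assignments(assignments, exam, minimum=0.5):
    """Names of assignments whose correlation with the exam is below `minimum`."""
    return [name for name, scores in assignments.items()
            if pearson(scores, exam) < minimum]

# First eight rows of the excerpt only; the full data set is not reproduced here.
assignments = {
    "Assignment 1": [3.50, 34.30, 34.80, 12.80, 29.30, 27.20, 6.35, 0.20],
    "Assignment 2": [3.50, 34.30, 34.80, 12.80, 29.30, 27.20, 6.35, 0.20],
    "Assignment 3": [8.36, 6.40, 9.42, 22.95, 14.33, 24.51, 6.59, 3.27],
}
exam = [78.40, 200.00, 183.20, 149.60, 200.00, 133.20, 117.80, 89.20]

for name, scores in assignments.items():
    print(f"{name}: r = {pearson(scores, exam):.2f}")
print("Below 0.5:", weak_assignments(assignments, exam))
```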

Note that you can use exactly the same method to determine how well your assignments and exams are teaching students what they need to know by running correlations on your students’ class scores and their standardized exam scores. You can also determine which teachers are better preparing their students. Here are two teachers’ 100-point final exam scores and the standardized exam scores (only part of the data are represented):

T1 Exam Score T2 Exam Score Standardized Exam Score
50.80 100.00 93.32
44.60 17.00 67.77
46.70 51.00 93.64
54.00 100.00 95.86
49.00 100.00 64.67
100.00 99.00 100.00
39.70 33.00 86.63
73.80 57.00 100.00
44.00 100.00 68.95
43.30 100.00 72.85
100.00 10.00 100.00
90.60 100.00 100.00
100.00 96.00 100.00
100.00 51.00 100.00
54.10 100.00 96.49
37.30 37.00 64.80
100.00 30.00 100.00
46.20 15.00 63.13
100.00 100.00 100.00
40.90 100.00 56.34
68.70 20.00 99.02
100.00 100.00 100.00
100.00 10.00 100.00
100.00 72.00 100.00

First, let’s look at the descriptive stats:

  T1 Exam Score T2 Exam Score Standardized Exam Score
Mean 73.88 64.93 88.83
SE 2.49 3.58 1.65
Median 73.60 81.00 100.00
Mode 100.00 100.00 100.00
Stdev 24.94 35.84 16.53
Sample Variance 621.80 1284.39 273.26
Kurtosis -1.66 -1.53 0.36
Skewness -0.17 -0.38 -1.31
Range 69.30 97.00 58.94
Minimum 30.70 3.00 41.06
Maximum 100.00 100.00 100.00
Sum 7388.10 6493.00 8882.69
Count 100 100 100
95% CL 4.95 7.11 3.28

Both teachers’ scores are lower than the standardized exam scores, and this can be a good thing, provided that the class exams are covering the right material and preparing students for the standardized exam. Both have fairly high standard deviations, though the second teacher’s is higher than the first. Both have a negative kurtosis, usually indicating a flat distribution with scores spread away from the mean rather than clustered around it, and both are slightly left skewed, indicating more data in the left (low) tail than the right. Note that the second teacher’s minimum score is 3/100! From only looking at the descriptive stats, it looks like the second teacher probably has a more difficult class than the first. But difficulty isn’t the issue; how well the teacher’s class matches the state curriculum is the issue. To check that, we run correlations:

 
  T1 Exam Score T2 Exam Score Standardized Exam Score
T1 Exam Score 1
T2 Exam Score 0.06 1
Standardized Exam Score 0.75 0.17 1

We see a vast difference between the two teachers. The first teacher’s scores correlate highly with the standardized exam score, at 0.75. This suggests that his curriculum fairly closely matches what the state prescribes. But the second teacher’s scores barely correlate with the standardized exam scores at all, at only 0.17, which suggests that his curriculum does not match the state curriculum well. The second teacher should sit down with the first and compare what they do, to see where he is going astray from the curriculum.

Universities often give departmental exams to large undergraduate classes. The same method can be used if you teach one of those classes to see how well you are teaching what you’re supposed to be teaching.

The point I’m trying to make is that statistics are more than just a tool for research. Statistics are an important tool that tells you how well you’re teaching, how well your curriculum matches the state’s, and how fair your tests are, all by doing nothing more complicated than running descriptive stats and correlations. Statistics are the laser grips that allow you to shoot in the dark.

7 Comments

  1. Peggy U says:

    That was very helpful! Thanks!

  2. Darren says:

    But, but, can you run these statistics on a portfolio assessment???

  3. Using Statistics To Improve Teaching says:

    […] Published by Dapp RSS: EducationContinue reading: Using Statistics To Improve Teaching […]

  4. Mike says:

    Darren on April 1, 2007 at 5:48 pm said:

    But, but, can you run these statistics on a portfolio assessment???

    Heh. Just don’t be snarky about essay exams (I teach history).

  5. Highlights From the 113th Carnival of Education at matthewktabor.com says:

    […] Right Wing Nation educates teachers on how they can use basic statistics to evaluate and improve their teaching. Please, teachers: read this post. I’m begging you. […]

  6. jack says:

    great post, but either I’m confused/overthinking or there’s a problem with the last example. I think this represents a dataset in which there are two classes of students, let’s say X each, taking the same subject as taught by different teachers, and then all taking the standardized test at the end. But in that scenario there would be 2X standardized exam scores, not just X. Just curious.

  7. rightwingprof says:

    jack on April 4, 2007 at 6:57 pm said:

    great post, but either I’m confused/overthinking or there’s a problem with the last example. I think this represents a dataset in which there are two classes of students, let’s say X each, taking the same subject as taught by different teachers, and then all taking the standardized test at the end. But in that scenario there would be 2X standardized exam scores, not just X. Just curious.

    You’re correct. It’s not real data. I was just trying to demonstrate.