Archive for the ‘Math’ Category.

Then And Now, Part Two

Looking through my high school algebra text is really quite fascinating. The last time I saw this book, I was a high school freshman, and I don’t recall having any reaction to the textbook, other than reading it and working through the problems. My perspective now is very different, after having taught for over two decades and reviewed God knows how many textbooks.

This is New Math. Not New New Math, but Cold War space race New Math, the only math pedagogy that has ever been designed by mathematicians. Yes, it had its weaknesses, which I will get to sometime in the future, but none in common with the weaknesses of today’s math pedagogy. Whereas today, students cover far too many topics and only shallowly, New Math covered fewer topics to great depth (perhaps in some cases, too much depth, but that’s for another article). Whereas math today is self-consciously non-linear and non-logical, New Math was coldly linear and logical. Whereas math today has no methodology and allows Janie to figure out her own way to solve the problem, New Math drilled a strict, step-by-step methodology. But let’s turn to the textbook.

First, the text is accurately named: Modern Algebra: A Logical Approach. This is not just a math textbook. Formal logic is built into each chapter. I’ll address the logic-specific problems in the future. Now, I want to address the problems, and the method we were taught to use when solving them. By the way, not only did we not use calculators, but they didn’t exist (we learned to use slide rules in high school).

Let’s look at that example problem I scanned yesterday. Here it is:

Let’s look at the methodology, which was in general the same as what we had been taught to do all the way back in elementary school. One of the characteristics of New Math is that even in elementary school, we learned the basic principles of algebra (even though that’s not what they were called) because we tackled each problem with the same step-by-step methodology. By the time we got to freshman algebra, we knew what you did to one side of the equation you had to do to the other, and so forth. We just didn’t know what “algebra” was until we got to high school.

First, we declared the variable. This may seem needlessly formalistic, but it isn’t. What, say, if you have a problem with two variables? You have to solve one in terms of the other, then substitute. If you declare the variables, it’s then much easier to avoid confusing which one represents which value in the problem. In other words, declaring the variable clarifies the goal of our solution, and focuses us on that goal.

Next, we set up the problem directly from the text.

Next, we multiply both sides by 35.

Let’s stop here for a minute. Why did we multiply both sides by 35–or more to the point, how would we know to multiply both sides by 35? Because we had thoroughly learned fractions and the principle of the LCD long before we got to freshman algebra. I mention this because both fractions and LCD are topics that have fallen out of favor in current math curricula. Our teacher would have, if working through this, asked us how we would have gotten rid of the fractions. We would have known that the way to do that was multiply both sides by 35, because we had already worked with fractions a great deal, and to a relatively high level of complexity. So to us, this step would not have presented a cognitive leap.
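To make the LCD step concrete, here is a minimal sketch in Python. The scanned problem isn’t reproduced in the text, so the equation below (x/5 + x/7 = 12) is a made-up stand-in, chosen only because its denominators also have an LCD of 35. The variable names are illustrative as well.

```python
from fractions import Fraction
from math import gcd

# Hypothetical equation (NOT the one from the textbook scan):
#     x/5 + x/7 = 12
denominators = [5, 7]
rhs = 12

# Least common denominator of the fractions in the equation.
lcd = 1
for d in denominators:
    lcd = lcd * d // gcd(lcd, d)          # ends up as 35

# Multiply both sides by the LCD: each x/d term becomes (lcd // d) * x.
x_coefficient = sum(lcd // d for d in denominators)   # 7 + 5 = 12
cleared_rhs = rhs * lcd                                  # 12 * 35 = 420

x = Fraction(cleared_rhs, x_coefficient)                # 420 / 12 = 35

# Check the way we were taught: substitute the solution back in.
check = x / 5 + x / 7
print(lcd, x, check == rhs)
```

Multiplying through by 35 turns the stand-in equation into 7x + 5x = 420, so x = 35, and substituting back gives 7 + 5 = 12.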

Again, the remaining steps were the same general methodology we had been using all through elementary school. There were no surprises. The only thing “new” about methodology in algebra was the use of variables. We had already learned the basic principles–without calling them algebra.

Focus on Part 2. From what I can determine, students today “check” problems by guesstimating the answer. We didn’t. This problem exemplifies one of the two ways in which we had been taught to check our solution: We substituted the solution and worked it to see if we got the correct answer. Again, we had been doing this since elementary school. It wasn’t new.

The other way we checked a problem was to work it backward, but by a different route (sometimes, there is more than one way to work a problem). For example, here is problem 12 from the problems I scanned:

The music department of Eastern High School bought 12 band uniforms and 3 hats at a total cost of $615. Later the department bought 3 uniforms and 2 hats at a total cost of $165. If the two orders were for identical items, what was the cost of each uniform and each hat?

To solve this problem, we would have first solved one variable (the cost of a hat, say) in terms of the other (the cost of a uniform), substituted it, solved for the cost of a uniform, and then repeated the process to solve for the other variable. This reinforces the cold, linear, logical methodology and thought process.

Note, however, that if you know the value of one variable and the total cost, you can calculate the value of the other variable without substitution (in other words, if you know how much you spent total and how much you spent on hats, then you can use simple subtraction to figure out how much you spent on uniforms). This is an example of a problem that has more than one route to a solution. When checking a problem like this, we would have checked it by taking another route backwards, instead of the same route we used when solving it.

Why? Because it reinforces that there is more than one logical progression of steps to the answer. Note that there is nothing “fuzzy” about this. Many problems can be solved in more than one way. New Math focused on the logical process and the mathematics behind the problem. Our teachers wanted us not only to understand the mathematics, but they also wanted us to see the different routes to the solution–because that’s when you really understand the mathematics.
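Here is one way the uniform-and-hat problem works out, sketched in Python. The two equations, 12u + 3h = 615 and 3u + 2h = 165, come straight from the problem text. The variable names and the use of exact fractions are illustrative choices, not anything from the textbook.

```python
from fractions import Fraction

# Declare the variables: u = cost of one uniform, h = cost of one hat.
# First order:  12u + 3h = 615
# Second order:  3u + 2h = 165
a1, b1, c1 = 12, 3, 615
a2, b2, c2 = 3, 2, 165

# Solve the first equation for h in terms of u: h = (c1 - a1*u) / b1.
# Substitute into the second: a2*u + b2*(c1 - a1*u)/b1 = c2, then collect u.
u_coefficient = a2 - Fraction(b2 * a1, b1)   # 3 - 8 = -5
constant = c2 - Fraction(b2 * c1, b1)        # 165 - 410 = -245
uniform = constant / u_coefficient           # 49
hat = (c1 - a1 * uniform) / b1               # (615 - 588) / 3 = 9

# Check by a different route: take what was spent on hats out of the
# second order's total and divide what is left among the uniforms.
uniform_check = (c2 - b2 * hat) / a2         # (165 - 18) / 3 = 49

print(uniform, hat, uniform_check == uniform)
```

Each uniform comes to $49 and each hat to $9, and the subtraction route through the second order lands on the same $49.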

My point here is the methodology and its consistency throughout the math curriculum. Going from elementary to junior high math, then junior high math to algebra was a smooth, continuous process because we focused on the linear, logical progression of steps, and we did this all the way throughout the curriculum. New Math was designed as a linear continuum, where new material was not some kind of a leap, but logically progressed from what we had already learned. We knew all about fractions and finding the LCD, so that was nothing new. We knew that what you did to one side of the “equals sign” you had to do to the other, so that was nothing new. We had already spent eight years of math classes in school extracting information from text and turning it into numbers (equations), so that was nothing new.

We probably covered more in less time in freshman algebra than we had before, though here, I’m working on very old memories, and have no elementary New Math textbook from the same era to check. But we didn’t do calculus in high school then. The college math track was algebra I, geometry, algebra II, and “senior math,” which was trigonometry and pre-calculus math. The closest we got to statistics was calculating averages, and the first I learned about probability was as an undergraduate.

Also, the cognitive, logical process we had been using since elementary school immediately transferred to other courses without our having to figure it out, even in high school. When I took my first year of chemistry, nobody had to explain the step-by-step logical process of balancing equations. The variables, constants, and symbols were different, but the process was exactly identical. The logic transferred. The same was true with physics. Nobody had to make that connection for us, because we had learned it so thoroughly.

And this leads me to what is, perhaps, my greatest objection to those who sneer at “drill and kill.” Forget being able to recall what 12*11 is, though that is important. It was precisely this “drill and kill” that taught us this logical, step-by-step process which we could take from math to chemistry and physics and yes, even writing. We learned an analytical way of looking at the world, disassembling a problem into its component parts, and working our way to a solution. And so-called “fuzzy math” can never accomplish that.

Then And Now

I bought a copy of the freshman algebra book we used in high school (Pearson and Allen, Modern Algebra: A Logical Approach, 1961). Here’s a problem example dealing with fractions (and before you complain about the quality of the scan, you try scanning a book bound in 1961–they don’t make ‘em like that anymore):

and here are some of the problems we did:

Look at 13, which says:

Why is it that you cannot find the cost of one ball and one bat in this case although you could find the cost of one uniform and one hat in Exercise 12?

This is excellent. You’re not asked to solve the problem here. Instead, you’re asked to explain why you cannot solve this problem, but could solve a similar problem. In other words, you’re being asked to think analytically about the problem–about the mathematics behind the problem. If “higher-level thinking” meant anything that had any pedagogical usefulness, this is what it would mean.

Also note that the problems are literate–that is, they’re written in educated Standard English:

Were he to reverse the amounts, the yield would be only $330 per year.

That speaks less to the math and more to the place of literacy in education and the way educators treated students (then) seriously, rather than condescending to them with “hip” nonsense and pathetically trying to be “relevant.”

Now, contrast with these current examples of 8th grade math:

In a couple of paragraphs, explain how you would estimate the square root of 170.

x² + 2y = 10

What do you notice about the expression?

In the first, the student is asked to do no math. Instead, he’s asked to talk about it. Note that he’s not asked to think about it–just write a narrative about how he would approach the problem. Also note that the embedded problem he doesn’t have to solve is to estimate, and not find, the square root of 170. The second problem isn’t a problem, just as the first isn’t a problem, and it’s ambiguous. What is the answer supposed to be? That the expression contains an x and a y? That it’s a quadratic equation?

And you wonder why we geezers can make change and these kids can’t?

How Our Brains Confuse Us

Bad Science has an excellent article up about how we are hardwired to find patterns:

But we have an innate human ability to make something out of nothing. We see shapes in the clouds and a man in the moon; gamblers are convinced that they have “runs of luck”; we can take a perfectly cheerful heavy metal record, play it backwards, and hear hidden messages about satan. Our ability to spot patterns is what allows us to make sense of the world but sometimes, in our eagerness, we can mistakenly spot patterns where none exist.

Back in grad school, this was a frequent topic of conversation among the cogsci crowd (specifically, the connectionists). Here’s an experiment you can do to demonstrate it to your class (particularly useful in a stats class, by the way).

Open up Excel. Type “rand” in cell A1 and “Toss 1” in cell B1. Click on B1, and drag the little rectangle in the lower right-hand corner through cell G1 (or type “Toss 2” and so forth in the cells). In cell A2, under the “rand” label, enter =rand(). Grab the little rectangle in the lower-right corner of A2 and pull it down through A31. This will give you 30 random numbers.

In cell B2, under the “Toss 1” label, type the following function:

=if($A2>0.5,"H","T")

Grab the little rectangle in the lower right-hand corner of B2 and double-click it (it will fill in the function all the way down to the last random number). Grab the little rectangle in the lower right-hand corner of the selection, and drag it across to column G. Your results will look like this (I hid some rows to make it smaller and easier to see):

rand Toss 1 Toss 2 Toss 3 Toss 4 Toss 5 Toss 6
0.162217 T T T T T T
0.050143 T T T T T T
0.905826 H H H H H H
0.739022 H H H H H H
0.261197 T T T T T T

Now, select the range B2:G31, click Edit, Copy. Go to the next worksheet, and click in cell A1. Click Edit, Paste Special. Select the Values radio button, and press OK. You now have six series of thirty random coin tosses, more specifically, Bernoulli trials.
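If you would rather build the handout outside Excel, the same idea can be sketched in Python. This is only an approximation of the spreadsheet, under a few assumptions: the seed is arbitrary, the names toss_series and longest_run are made up for illustration, and each series draws its own random numbers. That last point departs from the formula as written, where the $A2 reference makes every Toss column read the same column A value.

```python
import random


def toss_series(n_tosses, rng):
    """Return n_tosses coin tosses as a list of 'H' and 'T'."""
    return ["H" if rng.random() > 0.5 else "T" for _ in range(n_tosses)]


def longest_run(series):
    """Length of the longest streak of identical results in a series."""
    if not series:
        return 0
    best = current = 1
    for prev, cur in zip(series, series[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best


rng = random.Random(2006)  # arbitrary seed, only so the run is repeatable

# Six independent series of thirty tosses, labeled like the worksheet columns.
sheet = {f"Toss {i}": toss_series(30, rng) for i in range(1, 7)}

for label, series in sheet.items():
    print(label, "".join(series), "longest run:", longest_run(series))
```

Printing the longest run next to each series is handy in class: streaks of four or five identical tosses are ordinary in thirty fair tosses, and those streaks are usually what students point to when they say a column is not random.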

Here’s how the experiment works. Show students the trials on the second sheet, one column at a time. Ask students to look at the coin tosses and decide whether they are random or not. Give them a couple of minutes, then ask for a show of hands. At least nine-tenths of your students will say they are not random. Do this for each of the columns, then go back to the first worksheet and show the students how the data were created, and that they are, in fact, random.

Two related things are happening. First, the students’ brains are extracting patterns where there are none. Second, because we do this automatically, we don’t see “random” unless we see H T H T H T, where there is, ironically, a pattern.

If there is one fundamental concept about probability most people not only don’t understand, but have difficulty accepting, it is the independence of trials. This is a good way to get your students to get past their brains and understand that all coin tosses are independent of one another. You can appeal to their intellect by saying that the coin doesn’t “remember” how it came up the time before, but in my experience, that doesn’t overcome their natural tendency to see patterns where none exist.
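Independence can also be shown with numbers rather than argued. The sketch below is a hypothetical classroom aid, not part of the Excel exercise: it simulates a long run of fair tosses and compares how often heads follows heads with how often heads follows tails. The seed and the number of tosses are arbitrary choices.

```python
import random

rng = random.Random(17)          # arbitrary seed so the run is repeatable
n_tosses = 100_000               # far more than 30, so the frequencies settle down
tosses = [rng.random() > 0.5 for _ in range(n_tosses)]   # True means heads

# Pair each toss with the one before it and split on the earlier result.
pairs = list(zip(tosses, tosses[1:]))
after_heads = [cur for prev, cur in pairs if prev]
after_tails = [cur for prev, cur in pairs if not prev]

p_heads = sum(tosses) / n_tosses
p_heads_after_heads = sum(after_heads) / len(after_heads)
p_heads_after_tails = sum(after_tails) / len(after_tails)

print(f"P(heads)               = {p_heads:.3f}")
print(f"P(heads | prior heads) = {p_heads_after_heads:.3f}")
print(f"P(heads | prior tails) = {p_heads_after_tails:.3f}")
```

All three frequencies should come out close to one half, which is the coin not “remembering” in numerical form.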

Unanswerable Questions

There are really good questions, good questions, not so good questions, somewhat clueless questions, and questions that indicate brain death. We get enough of the last two that surely, the four questions you most frequently want to ask (but never do) are, "Did you read the problem?" "What planet do you live on?" "Where have you been for the last six weeks?" and "How did you graduate from high school?"

Imagine that you are teaching a class at the university, and we’ll say you’re about six weeks into the semester. Most of your students have been attending class regularly (as most of mine always have). After having modeled this sort of problem countless times in class, and after students have worked this sort of problem quite a few times, you give them the following problem to do on their own to prepare for the coming exam.

Leary Chemical manufactures three chemicals: A, B, and C. These chemicals are produced via two production processes: 1 and 2. Running process 1 for an hour costs $4 and yields 3 units of A, 1 unit of B, and 1 unit of C. Running process 2 for an hour costs $1 and yields 1 unit of A and 1 unit of B. To meet customer demands, at least 10 units of A, 5 units of B, and 3 units of C must be produced daily. Determine what daily production will minimize costs.

Immediately, hands shoot up in the air, and you, along with your peer tutors, start running around the room to help students–and you get questions like this:

"How are we supposed to know how to solve this?"
"How do we know how much of each product to make?"
"What does ‘yields’ mean?"
"How do we know what customer demands are?"

And those are, of course, in addition to the ever present, "I don’t understand."

Most students at this stage are tackling the problem and at least have some clue about how to solve it ("How do we know what the constraints are?" would be a somewhat clueless question, since it does indicate that the student at least has an idea what kind of problem it is). But there are always those students who have been sitting right there in the classroom along with the other students who are still lost. You have to wonder what has been happening between their ears all those classes they sat through when you covered the topic, modeled the topic, and they worked on the topic on their own. Were they thinking about a new pair of shoes they wanted to buy, last night’s kegger, or the hot babe in their economics class?
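For readers who want to see where the Leary Chemical problem is headed, it reads as a small linear program: minimize 4·x1 + 1·x2 subject to 3·x1 + x2 ≥ 10 (chemical A), x1 + x2 ≥ 5 (chemical B), and x1 ≥ 3 (chemical C), where x1 and x2 are the hours each process runs per day. The sketch below is one way to solve it, by checking the corner points of the feasible region. It assumes fractional hours are allowed, and the function names are made up for illustration. The text does not say which tool the class used.

```python
from fractions import Fraction
from itertools import combinations

# x1 = hours of process 1 per day, x2 = hours of process 2 per day.
cost = (4, 1)                     # dollars per hour for process 1 and process 2

# Each constraint is (a, b, r), meaning a*x1 + b*x2 >= r.
constraints = [
    (3, 1, 10),   # at least 10 units of A
    (1, 1, 5),    # at least 5 units of B
    (1, 0, 3),    # at least 3 units of C (only process 1 makes C)
    (1, 0, 0),    # x1 >= 0
    (0, 1, 0),    # x2 >= 0
]


def intersection(c1, c2):
    """Point where two constraint boundaries cross, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return (Fraction(r1 * b2 - r2 * b1, det), Fraction(a1 * r2 - a2 * r1, det))


def feasible(point):
    x1, x2 = point
    return all(a * x1 + b * x2 >= r for a, b, r in constraints)


def total_cost(point):
    return cost[0] * point[0] + cost[1] * point[1]


# With positive costs and nonnegative hours, the minimum sits at a corner.
corners = []
for c1, c2 in combinations(constraints, 2):
    point = intersection(c1, c2)
    if point is not None and feasible(point):
        corners.append(point)

best = min(corners, key=total_cost)
print("hours:", best[0], best[1], "daily cost:", total_cost(best))
```

Under these assumptions the cheapest plan is 3 hours of process 1 and 2 hours of process 2, for $14 a day, which yields 11 units of A, 5 of B, and 3 of C.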

Or consider this one:

Lessen Waist, Inc. produces low-fat cereals, which they sell in 12-ounce (weight) boxes. Because of settling and production scheduling, Lessen Waist cannot weigh every box of cereal, and 0.35 ounces (weight) is considered to be an acceptable variance from the advertised weight. Lessen Waist weighs a subset of boxes because the filling machines must be adjusted periodically. Use the 100 sample weights below and the appropriate statistical tests to determine if the boxes of cereal are within the acceptable weight. If they are not, use the appropriate statistical tests to determine how much the filling machines need to be adjusted.

Case Weight
1 11.7273
2 12.1073
3 12.7418
4 13.0993
5 12.2189
6 12.5931
7 11.7320
8 11.7906
9 13.0690
100 12.6230

And you get questions like this:

"How many calories are in each box of cereal?"
"Why does it say weight after 12-ounce and 0.35 ounces?"
"What do you mean by settling?"
"How do we know what the acceptable weight is?"
"What do you mean by appropriate statistical tests?"

Frustrating enough, though not nearly so frustrating as the above questions, is that invariably, some students will turn in the following:

Weight
Mean 11.59893
SE 0.092277
Median 11.72967
Mode #N/A
Stdev 0.922774
Sample Variance 0.851512
Kurtosis 0.097091
Skewness -0.19386
Range 4.841527
Minimum 9.140988
Maximum 13.98252
Sum 1159.893
Count 100
95% CL 0.183098

then:

“12 - 11.59893 = 0.4010657, and 0.4010657 is greater than 0.35, so the machines need to be adjusted by 0.4010657 - 0.35, or 0.05106.”

So no, they didn’t even address either of the problems, much less answer them, but they want partial credit. Sure, they made a stab at it, but they demonstrate that they’ve been daydreaming for the last six weeks and have completely missed the whole point and purpose of statistics. These are the students who at least understand enough about statistics that they can find the data analysis toolpak in Excel and run it on the data, even if that doesn’t address the problem. That’s rather like saying of a drunk driver that at least he knew enough to turn the key in the ignition, but the ones who ask the questions above don’t get even this far. And even the students who merely did the descriptive stats (as above) demonstrate their cluelessness when you try to help them and say, “Well, that would work fine for those hundred boxes, but we need to know about all of the boxes.” Deer in the headlights time.
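To show the difference between describing the hundred boxes and saying something about all of the boxes, here is a minimal sketch of one inferential step, using only the summary figures in the output above. It assumes the “95% CL” row is the half-width of a 95% confidence interval for the mean, and it treats a one-sample t test of the mean against the advertised 12 ounces as one plausible reading of “appropriate statistical tests.” The problem itself does not name the tests, and a full answer would also have to deal with the 0.35-ounce tolerance.

```python
# Summary figures copied from the descriptive-statistics output above.
advertised = 12.0        # ounces printed on the box
mean = 11.59893          # sample mean of the 100 weights
se = 0.092277            # standard error of the mean
half_width = 0.183098    # "95% CL" row, assumed to be the CI half-width

# Test statistic for H0: the true mean fill weight is 12 ounces.
t_stat = (mean - advertised) / se

# Critical value implied by the output (half-width = t_crit * SE).
t_crit = half_width / se

# 95% confidence interval for the mean weight of ALL boxes, not just these 100.
ci_low, ci_high = mean - half_width, mean + half_width

reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, reject H0: {reject_h0}")
print(f"95% CI for the mean: {ci_low:.4f} to {ci_high:.4f} ounces")
```

On these figures the interval sits entirely below 12 ounces, so the shortfall is a statement about the filling process and not only about the hundred boxes that happened to be weighed.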

Again, this isn’t all or most students. But it is enough students to be worrying, and that number of students has been slowly growing over the years. Do students have no experience with reading and solving problems in class–by themselves, and without the teacher or a classmate telling them how to do everything? Is that what’s going on? Are students asked to read problems, extract the data, and solve them using their brains? Are students unused to reading a problem critically and analytically?

And what has been going on in their heads all those classes they’ve sat through? How can you sit through six or eight weeks of statistics class and fail to understand the point and purpose of statistics, much less everything covered in class?

I’m just asking. I have no answers to these questions.

Just Because I Feel Like Ranting

Originally published September 17, 2006:

Some eight years ago, I attended a series of presentations (not by choice) given by the ed school diversity police. At one, we got the party line on “learning styles/modalities,” presented with no evidence to back it up because like contrastive rhetoric, there is no evidence to back it up.

A particularly grumpy faculty member–who also happened to be a Dean at the time–asked the presenter what I, and no doubt many others, were thinking. He said, “Other than the fact that you have no evidence to support this, so what? We have material to cover. We barely have enough time as it is. We certainly don’t have time to present the material in each style just to make it easier for some of the students. So what do you want us to do with this information?”

Another one of these presentations was given by the feminonsense police, and covered how men are “goal-oriented,” and women are “process-oriented.” She and her co-feminuts, along with a few cooperative feminized males, presented a “role play” that began with a normal, goal-oriented meeting (of men) where the problem was addressed, a solution was agreed upon, and men were assigned to implement the solution. The next “role play” was feminuts having a meeting with no goal or purpose, other than to make each other feel good, and even though it was ostensibly to address the same problem as the first meeting “role play,” the feminuts ended the meeting without ever addressing a solution. Finally, there was the final, two-part “role play,” in which both sexes took part. In the first of the two-parter, the feminuts chose to shut up and sit there like lumps when the men insisted on having a meeting with a goal and purpose, and tackling the problem. In the second part of the two-parter, the men acquiesced to the “process oriented” meeting, the “issues” were discussed, feelings about the “issues” were shared, no solution was ever mentioned much less discussed, and nothing was accomplished (of course). The second of the two-parter was presented as how men could be more “sensitive” to women in meetings. When confronted with the fact that the “sensitive” meeting was unproductive, the feminuts accused the questioner of being patriarchal, and avoided the issue.

Ignore the man behind the curtain!

Both of these presentations illustrate why “being sensitive to our differences” (codename: diversity) is destructive to education.

As far as “learning styles” go, the unnamed Dean said it best at the time. Unless you have very little material to cover, in which case you shouldn’t be teaching the class in the first place, you don’t have the time to screw with, or worry about, such nonsense–especially when it is motivated by no evidence at all.

As far as “goal-oriented” v. “process-oriented” goes, education is, by definition, goal-oriented. “Process-oriented” approaches rarely produce a result. (I’m hedging, since I do not know of one single case in which a “process-oriented” approach has resulted in a solution.) They are, by definition, unproductive–given that solving a problem of some kind is the goal of education, and if women truly are “process oriented” (and I’m not accepting that, given that there are so many logical women in the world, and have always been), then it is one purpose of education to teach them to be goal-oriented thinkers.

This “diversity” obsession is particularly destructive when it rears its inefficient, navel-gazing, narcissistic head in math education.

As knowledge systems go, math is the prototypical, linear system. Each skill builds upon others, so mastering a skill requires that one has already mastered previous skills. Math is essentially Aristotelian in nature, however patriarchal and serial raping and penis waving that may be.

Fifty percent of the reason for teaching any math skill, then, is because mastery of that skill will be required for the mastery of other skills down the road. While little Johnny may be a macaroni art learner or little Michelle may be a crayon and poster board project learner, allowing (worse, encouraging) little Johnny to solve the math problem by gluing macaroni to a toilet paper tube is counter-productive to fifty percent of the reason for covering the skill in class (and presenting Johnny with the problem). While Michelle’s crayon and poster board project may be very cute and creative, she learns no useful skill from doing it, and her failure to master the skill will handicap her later down the road. Educrats will then point to evil patriarchal traditionalist math teachers, Michelle’s sex, Michelle’s parents, Michelle’s socioeconomic status, the lack of technology in the classroom, or conservatives in general and blame them for “disadvantaging” poor little Michelle–when their own nutty educration methods are responsible. (For the latest example of fuzzy-headed, illogical educrat whining, see here.)

Repeat after me: There is no such thing as “mindless” drilling, or “mindless” rote memorization. Nothing about memorization or drilling is “mindless.” Rote memorization gives us domain knowledge, with which we can build other skills. Drilling is learning. Both teach discipline, both strengthen connections (there’s your neuroscience reference), and both build the skills necessary to solve problems.

When you can point to anyone in the real world solving a real-world problem by creating macaroni art, then by all means, object. I have a hard time trying to think of an example of anyone taking a complex problem and solving it “holistically,” or by sitting around in a matriarchal, vagina monologues-emulating, “process oriented” meeting, much less by making a cute, creative, crayon and poster board project. But please, let me know if you can think of any examples.

Educrats are fond of throwing around the phrase, “problem-solving skills,” yet seem to believe that every problem is unique, and unrelated to every other problem–as, indeed, you must believe if you think that macaroni art is, or ever can be, a problem-solving skill. We can see an example of this in this nonsense from the NEA:

A student well versed in algebra might do the following to set up the problem: p = pigs, c = chickens. p + c = 70 (heads) 4p + 2c = 200 (pigs have 4 legs and chickens have 2 legs). These two equations may be used to solve the problem. Students might solve this problem by “guessing and checking,” or drawing pictures. Some methods of solving problems might be considered more “efficient.” That may be true, but the correct answer can be found using multiple methods. Children think about mathematics in different ways depending on their prior experiences at home and school. By allowing students to think flexibly about numbers, we encourage them to “own” the math forever, instead of “borrowing” until class is over.

Allowing multiple methods encourages failure–because, again, math is wholly linear, and skills build upon other skills. Allowing students to “own” math means not teaching them math at all.

The linearity of math means that there is exactly one method, and only one method, for any given skill: that symbol manipulation which must be mastered not only to solve the current problem, but to master other skills down the road. (Yes, I realize that one may approach a conditional bottom-up or top-down, or that one may calculate a problem with different series of steps, or put steps in different orders.) It makes no difference if little Johnny would rather glue macaroni on toilet paper tubes. It makes no difference if little Michelle is a crayon project-oriented learner. Only one method accomplishes the entire reason for teaching the skill in the first place.

But teaching math has an even more basic function than math itself, and always has: Learning math is learning that step-by-step, logical approach to problem-solving, an approach whose applications far exceed the scope of mathematics. Problem-solving is its own knowledge system, and math is the best way to learn that knowledge system. Math teaches us to take a complex problem and simplify it by disassembling it. Math teaches us to take a complex problem and, by writing equivalent statements, clarify it and the path to its solution. Math teaches us the progression of logical steps (remember all those proofs in geometry?). Math is coldly and unforgivingly logical–“close to the right answer” is an absurdity in math, where there is the right answer and there is every other, equally wrong, answer–and gives us problem-solving skills we will use throughout our lives.

Mathematics has, for this reason, been a cornerstone of education since the Greeks. Crayon and poster board projects accomplish nothing other than allowing Michelle to get an A without having mastered the content.

And doing all those cute projects ensures that little Johnny and little Michelle will go through life devoid of those invaluable problem-solving skills, that Aristotelian logic, and that they will be crippled for the rest of their lives. Is making them feel more comfortable by letting them glue macaroni to cardboard tubes really worth that?

Using Statistics To Improve Teaching

Statistics are an invaluable tool for improving your teaching and making your class fairer for your students. With statistics, you can identify bad test questions and throw them out. You can identify questions many students should have gotten right but did not, and determine what went wrong. You can determine how well the assignments you give your students work, and you can determine how well you are preparing students for those state exams.

Let’s start with that 100-point test you just gave. Here are your descriptive stats:

Exam Score
Mean 73.88
SE 2.49
Median 73.60
Mode 100.00
Stdev 24.94
Sample Variance 621.80
Kurtosis -1.66
Skewness -0.17
Range 69.30
Minimum 30.70
Maximum 100.00
Sum 7388.10
Count 100
95% CL 4.95

Your mean is just a little low (ideally, it should be in the mid 70s), but not low enough for concern. Your mode (most frequently occurring score) is 100, and that’s always good, but your standard deviation is large: Each score on average varied 24.94 points from the mean, and that’s a lot of spread. Your kurtosis is a bit low, too, and along with the large standard deviation, it looks like you have a lot of scores in the tails. It’s not a bad exam from just looking at the descriptive stats, though you would have liked to have had more students clustering around the mean.

Next, look at the correlations between the individual questions and the total score (yes, I know all about collinearity, but this is justified). Let’s pick two questions to look at, Question 11 and Question 35, and run Spearman correlations between the questions and the Exam Score:

 
Question 11 Question 35 Exam Score
Question 11 1
Question 35 0 1
Exam Score 0.02 0.94 1

When we don’t see the effects of collinearity, there’s a problem. Note that the correlation coefficient between Question 11 and the Exam Score is 0.02! There is something bad wrong with that question. First, pull the exam and read the question; usually, when this happens, it’s pretty obvious what went wrong, often a typo or a badly worded question, but sometimes a question that goes beyond the scope of what you covered. Once in a great while there will be nothing wrong with the question. If that is the case, leave it in, but otherwise, delete it from the exam and the exam results. Note that Question 35 highly correlates with the Exam Score. Leave it in. Do this for all the questions, deleting any that have suspiciously low correlations after you read the question and determine if there is anything wrong with it.
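If you want to run the question-versus-total check outside a spreadsheet, a Spearman correlation is just a Pearson correlation computed on ranks. The sketch below shows that calculation. The scores in it are invented purely for illustration (eight students, one question that tracks the total and one that does not), and the function names are made up as well.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined if either list has no variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented data: exam totals and two questions scored 1 (right) or 0 (wrong).
totals = [95, 88, 82, 75, 70, 64, 55, 41]
tracks_total = [1, 1, 1, 1, 0, 0, 0, 0]     # strong students got it right
unrelated = [1, 0, 0, 1, 0, 1, 1, 0]        # no relation to the total

print("tracks the total:", round(spearman(tracks_total, totals), 2))
print("unrelated:       ", round(spearman(unrelated, totals), 2))
```

A question that everyone answered the same way has no variation, so the correlation is undefined and the pearson function here would fail on a division by zero. Those questions need a separate look anyway.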

As you’re doing this, you will usually notice that there are questions on topics you covered in class that students should have gotten right, but did not. This is a pedagogical red flag (when this happens with questions only from reading assignments, it indicates that students didn’t do the reading, and I find that these are the questions most students miss). How did you cover those topics? How can you change your presentation to make it clearer to your students? Go through all the questions fewer than half the students got correct, and run them through the same process. Compare the questions on similar topics. If students missed many of the questions on the same topic, that’s a sign that there’s a problem with the way you present the topic.

Use statistics to tell you how well you’re presenting the material.

You can also use statistics to determine how effective those assignments you give your students are. Let’s say you’ve just given your first 200-point exam, and before that, you had given several assignments (we’ll look at three). Your data look like this (the table represents part, not all, of your data):

Assignment 1 Assignment 2 Assignment 3 Exam Score
3.50 3.50 8.36 78.40
34.30 34.30 6.40 200.00
34.80 34.80 9.42 183.20
12.80 12.80 22.95 149.60
29.30 29.30 14.33 200.00
27.20 27.20 24.51 133.20
6.35 6.35 6.59 117.80
0.20 0.20 3.27 89.20
7.25 7.25 21.55 109.40
31.60 31.60 4.29 200.00
17.30 17.30 1.54 88.80
26.15 26.15 19.34 109.80
33.70 33.70 4.65 200.00
6.10 6.10 9.68 77.60
39.50 39.50 17.78 200.00
25.70 25.70 6.33 112.40
13.05 13.05 0.89 82.60
7.15 7.15 0.45 79.40
18.25 18.25 17.45 105.80
19.60 19.60 16.11 96.40
26.75 26.75 2.18 187.40
22.60 22.60 3.95 120.80
42.90 42.90 17.68 200.00
29.70 29.70 13.34 200.00

Run Pearson correlations on the assignments and exam:

  Assignment 1 Assignment 2 Assignment 3 Exam Score
Assignment 1 1
Assignment 2 0.99 1
Assignment 3 -0.07 -0.05 1
Exam Score 0.86 0.86 0.02 1

If your assignments are effective (and if they cover the same skills covered on the exam), you should get at least a 0.5 Pearson correlation coefficient between the assignments and the exam score. Assignments 1 and 2 correlate pretty highly, but note the third assignment. There is nearly no correlation between it and the exam score. This is a great big red flag, so compare the three assignments. It’s not enough just to ditch the third assignment and replace it with something else; you need to figure out what is wrong with the third assignment. What is different about the third one? How are the first two similar–and how is the third different from the first two? Whatever it is, it’s not working.
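Here is a rough sketch of that check using only the rows printed in the table above. Because those rows are an excerpt, the coefficients it produces will not match the matrix computed from the full data; the 0.5 cutoff is the rule of thumb from the text, and the function and variable names are my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# (Assignment 1, Assignment 2, Assignment 3, Exam Score), excerpt only.
rows = [
    (3.50, 3.50, 8.36, 78.40),
    (34.30, 34.30, 6.40, 200.00),
    (34.80, 34.80, 9.42, 183.20),
    (12.80, 12.80, 22.95, 149.60),
    (29.30, 29.30, 14.33, 200.00),
    (27.20, 27.20, 24.51, 133.20),
    (6.35, 6.35, 6.59, 117.80),
    (0.20, 0.20, 3.27, 89.20),
    (7.25, 7.25, 21.55, 109.40),
    (31.60, 31.60, 4.29, 200.00),
    (17.30, 17.30, 1.54, 88.80),
    (26.15, 26.15, 19.34, 109.80),
    (33.70, 33.70, 4.65, 200.00),
    (6.10, 6.10, 9.68, 77.60),
    (39.50, 39.50, 17.78, 200.00),
    (25.70, 25.70, 6.33, 112.40),
    (13.05, 13.05, 0.89, 82.60),
    (7.15, 7.15, 0.45, 79.40),
    (18.25, 18.25, 17.45, 105.80),
    (19.60, 19.60, 16.11, 96.40),
    (26.75, 26.75, 2.18, 187.40),
    (22.60, 22.60, 3.95, 120.80),
    (42.90, 42.90, 17.68, 200.00),
    (29.70, 29.70, 13.34, 200.00),
]

names = ["Assignment 1", "Assignment 2", "Assignment 3", "Exam Score"]
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}
exam = columns["Exam Score"]

# Correlate each assignment with the exam and flag anything under 0.5.
correlations = {name: pearson(columns[name], exam) for name in names[:-1]}
weak = [name for name, r in correlations.items() if r < 0.5]

for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
print("Assignments to investigate:", weak)
```

The same few lines work for the class-score versus standardized-score comparison: swap in those columns and read the coefficients the same way.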

Note that you can use exactly the same method to determine how well your assignments and exams are teaching students what they need to know by running correlations on your students’ class scores and their standardized exam scores. You can also determine which teachers are better preparing their students. Here are two teachers’ 100-point final exam scores and the standardized exam scores (only part of the data are represented):

T1 Exam Score T2 Exam Score Standardized Exam Score
50.80 100.00 93.32
44.60 17.00 67.77
46.70 51.00 93.64
54.00 100.00 95.86
49.00 100.00 64.67
100.00 99.00 100.00
39.70 33.00 86.63
73.80 57.00 100.00
44.00 100.00 68.95
43.30 100.00 72.85
100.00 10.00 100.00
90.60 100.00 100.00
100.00 96.00 100.00
100.00 51.00 100.00
54.10 100.00 96.49
37.30 37.00 64.80
100.00 30.00 100.00
46.20 15.00 63.13
100.00 100.00 100.00
40.90 100.00 56.34
68.70 20.00 99.02
100.00 100.00 100.00
100.00 10.00 100.00
100.00 72.00 100.00

First, let’s look at the descriptive stats:

T1 Exam Score T2 Exam Score Standardized Exam Score
Mean 73.88 Mean 64.93 Mean 88.83
SE 2.49 SE 3.58 SE 1.65
Median 73.60 Median 81.00 Median 100.00
Mode 100.00 Mode 100.00 Mode 100.00
Stdev 24.94 Stdev 35.84 Stdev 16.53
Sample Variance 621.80 Sample Variance 1284.39 Sample Variance 273.26
Kurtosis -1.66 Kurtosis -1.53 Kurtosis 0.36
Skewness -0.17 Skewness -0.38 Skewness -1.31
Range 69.30 Range 97.00 Range 58.94
Minimum 30.70 Minimum 3.00 Minimum 41.06
Maximum 100.00 Maximum 100.00 Maximum 100.00
Sum 7388.10 Sum 6493.00 Sum 8882.69
Count 100.00 Count 100.00 Count 100.00
95% CL 4.95   95% CL 7.11   95% CL 3.28
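Two rows of this table can be rebuilt from the others, which is a handy sanity check on any stats output. The sketch below assumes the usual definitions: the standard error is the standard deviation divided by the square root of the count, and the 95% CL is the standard error times the two-sided critical value of Student's t. The critical value of about 1.984 for 99 degrees of freedom is hard-coded here rather than computed.

```python
from math import sqrt

# Two-sided 95% critical value of Student's t with 99 degrees of freedom
# (count - 1), hard-coded as an assumption instead of looked up in code.
T_CRITICAL_DF99 = 1.984

def standard_error(stdev, count):
    """Standard error of the mean."""
    return stdev / sqrt(count)

def confidence_limit(stdev, count, t_critical=T_CRITICAL_DF99):
    """Half-width of the confidence interval around the mean."""
    return t_critical * standard_error(stdev, count)

# (Stdev, Count) copied from the table above.
table = {
    "T1 Exam Score": (24.94, 100),
    "T2 Exam Score": (35.84, 100),
    "Standardized Exam Score": (16.53, 100),
}

for name, (stdev, count) in table.items():
    se = standard_error(stdev, count)
    cl = confidence_limit(stdev, count)
    print(f"{name}: SE = {se:.2f}, 95% CL = {cl:.2f}")
```

The results agree with the SE and 95% CL rows to two decimal places, so the first teacher's mean, for example, is 73.88 plus or minus 4.95.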

Both teachers’ scores are lower than the standardized exam scores, and this can be a good thing, provided that the class exams are covering the right material and preparing students for the standardized exam. Both have fairly high standard deviations, though the second teacher’s is higher than the first’s. Both have negative kurtosis, indicating distributions that are flatter than a normal curve rather than peaked around the mean, and both are slightly left skewed, indicating a longer tail on the left (low) side than on the right. Note that the second teacher’s minimum score is 3/100! From only looking at the descriptive stats, it looks like the second teacher probably has a more difficult class than the first. But difficulty isn’t the issue; how well the teacher’s class matches the state curriculum is the issue. To check that, we run correlations:

 
  T1 Exam Score T2 Exam Score Standardized Exam Score
T1 Exam Score 1
T2 Exam Score 0.06 1
Standardized Exam Score 0.75 0.17 1

We see a vast difference between the two teachers. The first teacher’s scores correlate highly with the standardized exam score, at 0.75. This means his curriculum fairly closely matches what the state prescribes. But the second teacher’s curriculum doesn’t correlate highly with the state curriculum at all, at only 0.17. The second teacher should sit down with the first and compare what they do, to see where he is going astray from the curriculum.

Universities often give departmental exams to large undergraduate classes. The same method can be used if you teach one of those classes to see how well you are teaching what you’re supposed to be teaching.

The point I’m trying to make is that statistics are more than just a tool for research. Statistics are an important tool that tells you how well you’re teaching, how well your curriculum matches the state’s, and how fair your tests are, all by doing nothing more complicated than running descriptive stats and correlations. Statistics are the laser grips that allow you to shoot in the dark.

Test Discrepancies

Professor W. Stephen Wilson did an interesting cursory study (the PDF is here). Professor Wilson is a math professor at Johns Hopkins. He obtained the SATM scores for his Calculus I for the Biological and Social Sciences students in 1989 and 2006, and gave his 2006 students the same final exam he gave his 1989 students. The percentage of the students taking the class was essentially the same in both semesters, as were the SATM scores (although the number of students applying and accepted to Johns Hopkins had increased from 1989 to 2006 by roughly 146%).

The final exam scores were significantly lower in 2006 than 1989:

The 2006 Calculus I class took the same 77-point final exam as the 1989 class. The content of the Calculus I course has not changed, and, mathematically, using the old exam was completely appropriate.

The scores on the final exam were markedly different. The average of the 1989 scores was 48.4, with a standard deviation of 14.4, while the 2006 class average was 42.5, with a standard deviation of 11.3. The 5.9 point decrease in the average is a 12.2% decline. Daniel Naiman also ran the Wilcoxon test of significance on these two distributions and found a p-value of .001 for the two-sided test. [A p-value of less than 0.05 tells us that the difference is statistically significant, that is, that it is unlikely to be due to random variation alone.]

Here is a histogram of both semester scores showing the distribution:

He compares this difference to a similar difference in SATM scores to make his point:

How significant is this change educationally? Contemplate a similar drop in SATM scores. SATM scores range from 200 to 800. If there had been a 12.2% drop over the 17 years from the recentered SATM score of 662.3, the 2006 class would have an average SATM score of 605.9 (= 662.3 - .122 x (662.3 - 200)).
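For anyone who wants to check the arithmetic in that comparison, here it is spelled out. The numbers are the ones quoted above; the variable names are mine.

```python
# Final exam averages quoted from Professor Wilson's study.
mean_1989 = 48.4
mean_2006 = 42.5

# Relative decline in the final exam average.
decline = (mean_1989 - mean_2006) / mean_1989
print(f"Decline: {decline * 100:.1f}%")

# Apply the same (rounded) decline to the part of the SATM scale above
# its 200-point floor, as the quoted passage does.
satm_1989 = 662.3
satm_floor = 200
equivalent_satm = satm_1989 - round(decline, 3) * (satm_1989 - satm_floor)
print(f"Equivalent 2006 SATM average: {equivalent_satm:.1f}")
```

The drop is measured against the distance above 200, not against the raw score, because nobody can score below 200 on the SATM.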

But the interesting thing (to me, at any rate) is that the SATM scores did not drop from 1989 to 2006:

The average SATM score for the 1989 Calculus I class was 662.6 with a standard deviation of 6.8. For the 2006 Calculus I class it was 664.9, with a standard deviation of 6.3. In the mid-1990s, SATM scores were “recentered,” [Rec07]. After recentering the 1989 class’s SATM scores, the new average was 662.3, with a standard deviation of 6.5.

Professor Wilson discusses the possible causes for this discrepancy, but what interests me is that the calculus final exam scores dropped significantly while the SATM scores did not. The distinction between the two exams Professor Wilson focuses on is that the SATM allows the use of calculators while the course final exam does not. I’m not sure this factor all by itself can account for this difference, because using calculators on exams is like looking things up on an open book exam: You lose what you gain because of wasted time.

How similar is the calculus content on the SATM and the content of the final exam? The SATM is a more comprehensive exam, and cannot devote as many questions to calculus. A fairer comparison (if it were possible) would be to somehow score the calculus questions on the SATM and compare those scores to the final exam scores.

Yet this decrease in final exam scores should have been reflected to some extent in the SATM scores. I have to wonder if this “recentering” of scores is somehow responsible for this. Of this topic, Diane Ravitch says:

For many years the College Board insisted that the Scholastic Assessment Test was “an unchanging standard.” But no more. The latest SAT scores, released last week, are the first to be graded on a new curve–one that destroys the test’s “unchanging standard.”

Two years ago, the College Board–decided to “recenter” the scores by arbitrarily declaring that the 1990 scores on both the verbal and mathematical portions of the test would serve as the new average. The fairly robust math score of 475 was transformed overnight to a 500, and the anemic verbal score of 424 also was lifted to 500. With the stroke of a pen, extremely poor performance on the verbal portion of the test was turned into the new norm.

So “recentering” the scores was inflating them. This would at least partially explain why the SATM scores did not decrease while the final exam scores did.

Still, university faculty, secondary school faculty, parents, and yes, the College Board should be concerned about this difference. Somebody should research this, find out if it is a national trend, and if so, try to correct it.

More Common Sense, Please

Periodically, the topic of teaching statistics in the primary and secondary schools comes up at Kitchen Table Math. I’m torn on the issue. If you’re going to teach something, then do it — that’s the general way I feel about teaching anything. And that’s exactly what they’re not doing in the primary and secondary schools (statistics is more than means, medians, modes, and graphs). On the other hand, kids are coming out of schools without basic arithmetic knowledge, so why waste time on statistics, whether you really are teaching it or not?

However, there are times when something makes me feel nobody should be allowed to graduate from high school without statistics. The new Colgate ad campaign is one of those things.

If you haven’t seen it, they claim on their ads that dental health has been linked to cardiac health (and something else), implying that if you buy Colgate and brush your teeth with it, you won’t keel over dead from a heart attack at age fifty.

That’s crap — and let me show you why.

Let’s say we’ve got this study of 100 test subjects, and two variables for each: dental health index and cardiac health index. The first problem is that both variables are actually groups of related variables. How many times you brush your teeth each day and for how long, whether you floss or not, how many cavities you’ve had, these and other things comprise the dental health index; family history of heart disease, what you eat, how much you exercise and how, these and other things comprise the cardiac health index.

So our data look something like this:

Subject Dental Index Cardiac Health Index
1 63 56
2 31 25
3 35 27
97 52 47
98 15 7
99 19 18
100 64 57

We calculate Pearson correlations, and see this:

  Dental Index Cardiac Health Index
Dental Index 1
Cardiac Health Index 0.99347832 1

Wow, look at that correlation coefficient! There has to be a relationship! Obviously, if you brush your teeth a lot you won’t get a heart attack!

Er, wrong. Because researchers are often academics, and because that old adage about academics’ lacking common sense is more than just a little true, and here because Colgate wants to sell you toothpaste, research sometimes draws bizarre and unwarranted conclusions. Forget statistics. Step back for a minute and ask yourself this: If there is a relationship between dental health and cardiac health, does it make sense to say that brushing your teeth will stop heart attacks?

Of course not, unless there is some dental health gene and some cardiac health gene and the two are somehow linked. So what common sense reason is there for this correlation?

Well, there’s a third variable: How many loads of laundry you do a month. Here is the correlation matrix:

  Loads Laundry per Month Dental Index Cardiac Health Index
Loads Laundry per Month 1
Dental Index 0.982919963 1
Cardiac Health Index 0.985634829 0.99347832 1

And fancy that! The correlations between how many loads of laundry you do a month and the dental and cardiac health indices are almost as high as the correlation between the two health indices! Do more laundry and you won’t die of a heart attack!

Ask yourself this: What do people with good dental health, good cardiac health, and who wear clean clothes have in common?

They take care of themselves.
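If you want to see how a hidden common cause manufactures correlations like these, here is a small simulation. Everything in it is invented for illustration: a made-up "self-care" score drives all three measures, each with its own random noise, and none of the three has any effect on the others.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing what z explains."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden variable: how well each of 100 subjects takes
# care of themselves, on a 0-100 scale.
self_care = [rng.uniform(0, 100) for _ in range(100)]

# Each observed measure is the hidden variable plus independent noise.
dental = [s + rng.gauss(0, 5) for s in self_care]
cardiac = [s + rng.gauss(0, 5) for s in self_care]
laundry = [s + rng.gauss(0, 5) for s in self_care]

r_dental_cardiac = pearson(dental, cardiac)
r_laundry_cardiac = pearson(laundry, cardiac)
r_dental_given_care = partial_correlation(
    r_dental_cardiac,
    pearson(dental, self_care),
    pearson(cardiac, self_care),
)

print(f"dental vs cardiac:  {r_dental_cardiac:.2f}")
print(f"laundry vs cardiac: {r_laundry_cardiac:.2f}")
print(f"dental vs cardiac, controlling for self-care: {r_dental_given_care:.2f}")
```

The raw correlations come out very high, but once the hidden variable is accounted for, the link between teeth and heart all but disappears. In a real study nobody hands you the hidden variable, which is why common sense has to do the work.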

So the next time you see some article or commercial about a study, before you swallow it undigested, step back and apply a little common sense.

I can hear you now. That’s ridiculous. Nobody would make such a claim. Really? How about this, then?

Chew on this next time you’re idling in the drive-thru line: Cars on U.S. roads must burn nearly one billion additional gallons of gas a year because of overweight drivers and passengers. That was the conclusion of University of Illinois computer science professor Sheldon Jacobson, who, with colleague Laura McLay, used a mathematical model to combine federal data on gas consumption and weight gain from 1960 to 2002. They found that the average American’s weight jumped by more than 24 pounds over the period and that as a group, we now pump at least 938 million more gallons a year than we did in 1960. A relative drop in the gas bucket (about three days’ worth of passenger car consumption), but it’s unnecessary. Want to do something today to boost fuel economy? Eat fewer cookies.

Apply common sense liberally. Thanks.

Reading First In Madison

Ken DeRosa took a journalist to task for inaccuracies in her article about four Reading First schools in Madison, Wisconsin (go here for all the relevant information) then pointed me to the reading proficiency data the state of Wisconsin reported for all schools (the Wisconsin data are here). I downloaded the data, cleaned them up in Excel, and ran the stats, comparing the 98-99 and 04-05 school years. They reported four proficiency levels: minimal, basic, proficient, and advanced. We are interested in the percentages testing proficient or above (proficient+ in the tables below), so I added the percent proficient and percent advanced, and analyzed those data.

Before I go on, let me quickly address why we must analyze the data statistically, and cannot just report means. If we gave the same kids the same proficiency exams on two different days, say only a week apart, their scores would be different. Anytime we see a difference between scores, without statistics, we do not know if those differences are due to random variation or not. We cannot without statistics point to two different scores or means and say, "See? The scores increased!"

Also, let me mention a few crucial points.

  • The more data we have, the more reliable our statistical analysis will be (this will become an issue later on).
  • Means (averages) alone do not give us a complete picture, particularly when they are means of aggregated data, as these are (this is why I look at other descriptive statistics).
  • Statistics always deals with probability (uncertainty), and we calculate our statistics to a specific confidence level, 95% here (sometimes statistics are calculated to a 99% confidence level). The corresponding level of significance (alpha) is 0.05, or 5%.
  • We are assuming here either that the proficiency exam standards did not change between the two years or that the proficiency reports for the two years are comparable (if they are not, then Wisconsin cannot make any statement about their proficiency levels over time — and we will address this later).

Celebrate!

Happy Pi Day! (it’s March 14 — 3.14 — get it?)

Good Article

From the Economist (which I’ll reproduce, because it will become inaccessible soon). Why so much medical research is rot:

PEOPLE born under the astrological sign of Leo are 15% more likely to be admitted to hospital with gastric bleeding than those born under the other 11 signs. Sagittarians are 38% more likely than others to land up there because of a broken arm. Those are the conclusions that many medical researchers would be forced to make from a set of data presented to the American Association for the Advancement of Science by Peter Austin of the Institute for Clinical Evaluative Sciences in Toronto. At least, they would be forced to draw them if they applied the lax statistical methods of their own work to the records of hospital admissions in Ontario, Canada, used by Dr Austin.

Dr Austin, of course, does not draw those conclusions. His point was to shock medical researchers into using better statistics, because the ones they routinely employ today run the risk of identifying relationships when, in fact, there are none. He also wanted to explain why so many health claims that look important when they are first made are not substantiated in later studies.

The confusion arises because each result is tested separately to see how likely, in statistical terms, it was to have happened by chance. If that likelihood is below a certain threshold, typically 5%, then the convention is that an effect is “real”. And that is fine if only one hypothesis is being tested. But if, say, 20 are being tested at the same time, then on average one of them will be accepted as provisionally true, even though it is not.

In his own study, Dr Austin tested 24 hypotheses, two for each astrological sign. He was looking for instances in which a certain sign “caused” an increased risk of a particular ailment. The hypotheses about Leos’ intestines and Sagittarians’ arms were less than 5% likely to have come about by chance, satisfying the usual standards of proof of a relationship. However, when he modified his statistical methods to take into account the fact that he was testing 24 hypotheses, not one, the boundary of significance dropped dramatically. At that point, none of the astrological associations remained.

Unfortunately, many researchers looking for risk factors for diseases are not aware that they need to modify their statistics when they test multiple hypotheses. The consequence of that mistake, as John Ioannidis of the University of Ioannina School of Medicine, in Greece, explained to the meeting, is that a lot of observational health studies—those that go trawling through databases, rather than relying on controlled experiments—cannot be reproduced by other researchers. Previous work by Dr Ioannidis, on six highly cited observational studies, showed that conclusions from five of them were later refuted. In the new work he presented to the meeting, he looked systematically at the causes of bias in such research and confirmed that the results of observational studies are likely to be completely correct only 20% of the time. If such a study tests many hypotheses, the likelihood its conclusions are correct may drop as low as one in 1,000—and studies that appear to find larger effects are likely, in fact, simply to have more bias.

So, the next time a newspaper headline declares that something is bad for you, read the small print. If the scientists used the wrong statistical method, you may do just as well believing your horoscope.
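The arithmetic behind the article's point is short enough to sketch. The article does not say which correction Dr Austin used; the Bonferroni threshold below is one common choice and is my assumption, as is treating the tests as independent.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Average number of true-null hypotheses that pass by chance."""
    return num_tests * alpha

def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction
    (an assumed choice of correction, not named in the article)."""
    return alpha / num_tests

for num_tests in (1, 20, 24):
    print(
        f"{num_tests:2d} tests: "
        f"expected false positives = {expected_false_positives(num_tests):.2f}, "
        f"P(at least one) = {familywise_error_rate(num_tests):.3f}, "
        f"corrected threshold = {bonferroni_threshold(num_tests):.4f}"
    )
```

With 20 tests you expect one false positive on average, and with 24 the chance of at least one is about 71%. Under this correction each of the 24 astrological hypotheses would have to clear roughly 0.002 rather than 0.05.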

Need A Good Laugh?

The Anchoress has a collection of creative solutions to math problems. I’m still laughing.

Burger King Math (long)

When you teach a course and you are responsible for creating the materials, assignments, and exams, you look at problems differently: more analytically, and from perspectives one wouldn’t normally take. One of the ways in which you analyze problems is in terms of difficulty or complexity. And there’s more to it than most realize.

Let’s look at this whole thing cognitively and take as our first example what most would consider to be a relatively simple statistics problem: