Update: Thanks to a couple of sharp-eyed readers, I corrected a couple of decimal place errors. That’s what I get for even thinking about math on pain meds, I guess.
My doctor did a stint as an epidemiologist, so he’s a statistician (most doctors are not, by the way), and we always end up talking statistics. He turned me on to a great term today: The NPR number.
He was telling me how he was listening to NPR, and they reported, of course in a hysterical manner, that the occurrence of negative side-effects in women who were on hormone replacement treatments had increased by 4%. He yelled at the radio that fewer than 1 in 10000 women had those negative side-effects in the first place, so the “increase” was ridiculously small, and said the 4% was the “NPR number.”
That’s a great phrase. It reminds me of something we did at the beginning of the semester. I do this when introducing Bayes’ Theorem of conditional probability. It doesn’t have much to do with business (the example, not Bayes), but it is an easily understood context for which most people will immediately believe they know the answer.
We have a test for a disease (we’ll call it Jones Syndrome), and the test is 99% accurate, but it returns a false positive in 1% of those tested (that is, 1% of the time the test returns a positive, the disease is not present). If I test positive, what is the probability that I have Jones Syndrome?
Like I said, I’ve done this many times, and at least half the hands in the class will fly up, and all of those people will say, “99%.” The thing is, they’re all very, very wrong.
First, what, exactly, do we mean when we say that the test is “99% accurate”? We mean that out of 100 people who have Jones Syndrome, the test will give 99 a positive result. That relative clause in bold is crucial, because it leads us to the next point, that we are missing a very important piece of information.
How prevalent is Jones Syndrome, that is, what is the probability of my having it, irrespective of any test result? We’ll say that 1 in 10000 have Jones Syndrome, so the probability of having Jones Syndrome is 0.01%, or 0.0001. Note that we do not know the probability that the test will return a positive, regardless of whether the person has the disease or not (we’ll calculate it), but we’ll call the probability that the test returns a positive P(B). (By the way, I’m on pain meds and I did all of the math in my head — if I screwed anything up, let me know.)
We’ll use the following shorthand.
P(A) — The probability that I have the disease
P(B) — The probability that the test will return a positive result
What we eventually want to calculate is:
P(A|B) — The probability of A given B, that is, the probability that I have the disease if I got a positive test result
In order to calculate this, we have to know P(B), the probability that the test will return a positive result. In order to calculate it, we have to do some quick calculations.
P(A) = 0.0001 (the probability that I have the disease)
P(~A) = 1 - P(A) = 0.9999 (the probability that I don’t have the disease)
P(B|A) = 0.99 (the success rate of the test, or the probability that it will return a positive if I have the disease)
P(~B|A) = 1 - P(B|A) = 0.01 (the probability that the test will return a negative result if I have the disease)
P(B|~A) = 0.1 (false positive rate)
P(~B|~A) = 1 - P(B|~A) = 0.9 (the probability that the test will return a negative if I do not have the disease)
What we need to calculate is P(A|B), the probability that I have the disease if I have gotten a positive test result. Here’s the formula:
P(B|A) * P(A) / P(B)
Oops! We need to calculate P(B), the probability that the test will return a positive, whether I have the disease or not:
P(B) = P(B|A) * P(A) + P(B|~A) * P(~A)
P(B) = 0.99 * 0.0001 + 0.1 * 0.9999
P(B) = 0.100089
So now let’s calculate the probability that I have the disease if I have gotten a positive test result, P(A|B).
P(A|B) = P(B|A) * P(A) / P(B)
P(A|B) = 0.99 * 0.0001 / 0.100089
P(A|B) = 0.00098912, or 0.0989%
So the probability that I have the disease given that I got a positive test result is 0.00098912, or 0.0989%.
That’s wildly different from 99%, is it not? I think the reason so many people leap to the conclusion that I have a 99% probability of having the disease if I got a positive test result for five reasons: 1) We tend to make snap judgments; 2) We think in terms of generalities and not specifics; 3) We don’t really think much about the situation when it is presented to us, so we don’t realize that we’re missing vital information; 4) Even if we do realize we’re missing that information, we don’t know how to integrate it, and arrive at a result (nobody did until the Reverend Bayes came along); and finally, 5) Our society enables fuzzy thinking about — even utter incompetence in — math.
Bayes isn’t arcane. It isn’t even calculus. It’s basic arithmetic operations: addition, subtraction, multiplication, and division. Yet, introduce it in class, and eyes glaze over almost as soon as you start.
I’m not suggesting anyone should be able to do Bayes in his head. I am saying, however, that in this high tech age when we are being bombarded with the results of studies every day, and base so many of our own decisions on those studies, that reporters on NPR or Fox should be knowledgable enough that they don’t report “NPR numbers.” While I’m at it, it would be nice if more listeners yelled that fewer than 1 in 10000 women had those negative side-effects in the first place. I mean really, there is just no excuse for hearing the same junk statistics over and over again for years, you know, like women are paid 75 cents for every dollar a man makes, or those ridiculous stats about sexual assault (One out of every three women will be raped in her lifetime! Every second, over 1500 women are raped!)
Maybe I’ll write to Santa Claus.