I had a paper on predicting a more frequent event (death by suicide) rejected a week or so ago. A quite expert reviewer noted that there had been … five reviews recently, all of which make the same point.
If some investigative reporter shoved a microphone and camera in my face and asked “how is it mental health professionals can’t predict this stuff better?” my answer would be:
“I will bet a bazillion dollars, right now, that the last patient I saw in my office today will not go rent a hotel room outside a country music concert and start shooting everyone in the crowd from his window tonight.”
Do you understand?
Let’s unpack this a little more. The variables that contribute to an incident like the Las Vegas concert shooting over the weekend (or the Gabrielle Giffords shooting, or Aurora, Colorado, or Newtown, or…) are so complex, so unweighted, so numerous, and so difficult to tease apart in every interaction that even if you had the most sophisticated supercomputer and plugged in every piece of data we have on every individual in the US, you could not predict who will be the next Jared Loughner.
It is a matter of specificity vs sensitivity. I could, with the help of some super smart colleagues, develop an algorithm that would be very sensitive at detecting those who are at an elevated risk for behaving this way. But how elevated? Imminent? Probably in the next 5 days? Maybe in the next 5 years? And if you employed such an algorithm, you would detain an enormous number of people who are simply pinging my test as risky above some level higher than the baseline population. But that brings us to the next point.
I work in a higher risk area, so I don’t gamble. The problem is that our tools cannot be that precise, because of Bayes’ Theorem. To expand, here is a classic example.
The probability that a patient has HIV is 0.001 and the diagnostic test for HIV can detect the virus with a probability of 0.98. Given that the chance of a false positive is 6%, what is the probability that a patient who has already tested positive really has HIV?
Let’s first formulate the problem in formal terms. Let $D$ be the event that the person has the disease; then $D^c$ denotes the event that the person doesn’t have the disease. Let $Y$ be the event that the test gives a positive result (the person has the disease according to the test) and $N$ be the event that the test gives a negative result.
Now let’s write down the given information.
$$P(D) = 0.001, \qquad P(Y \mid D) = 0.98, \qquad P(Y \mid D^c) = 0.06$$
We have to find $P(D \mid Y)$.
Now we’ll use Bayes’ theorem to find the required probability.
$$P(D \mid Y) = \frac{P(Y \mid D)\,P(D)}{P(Y \mid D)\,P(D) + P(Y \mid D^c)\,P(D^c)}$$
as $D$ and $D^c$ are mutually exclusive events and together form a partition of the sample space. Using the given values, we have
$$P(D \mid Y) = \frac{0.98 \times 0.001}{0.98 \times 0.001 + 0.06 \times 0.999} = \frac{0.00098}{0.06092} \approx 0.016$$
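The HIV example can be checked numerically. This is a minimal sketch using only the three numbers given in the problem (a prior of 0.001, a detection probability of 0.98, and a false positive rate of 6%); the function name `posterior` and its parameter names are my own labels, not anything from the original example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Probability of having the disease AND testing positive.
    true_pos = sensitivity * prior
    # Probability of not having the disease AND testing positive anyway.
    false_pos = false_positive_rate * (1.0 - prior)
    # Share of all positive tests that are true positives.
    return true_pos / (true_pos + false_pos)


# Numbers taken from the example: P(D), P(Y|D), P(Y|D^c).
p = posterior(prior=0.001, sensitivity=0.98, false_positive_rate=0.06)
print(f"P(D | Y) = {p:.4f}")  # prints "P(D | Y) = 0.0161"
```

In other words, a patient who has tested positive has only about a 1.6% chance of actually having the disease, because the false positives from the large healthy group swamp the true positives from the tiny infected group.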
The prevalence of suicide in the general population is about 1:10 000. The prevalence of murder is a tenth of that. The prevalence of mass murder… is lower, and I am unsure if anyone has done the math. With events this rare, it is almost certain that any given person considered high risk will not commit such an act. But there is also a false negative rate: some people you think would never commit such an act will do so, and in most societies there are more of them than there are people considered “high risk” and detained under various laws. Scott’s perfect test, at this low rate, will be… useless.
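To put rough numbers on this, here is a small sketch that applies the 1:10 000 base rate quoted above to a hypothetical screening test. The 99% sensitivity and 99% specificity are assumptions chosen purely for illustration (far better than any real risk tool I know of), as are the population size and the function name `screen`.

```python
def screen(population, base_rate, sensitivity, specificity):
    """Expected counts when a screening test is applied to a whole population."""
    cases = population * base_rate
    non_cases = population - cases
    true_pos = cases * sensitivity             # real cases the test flags
    false_neg = cases - true_pos               # real cases the test misses
    false_pos = non_cases * (1 - specificity)  # non-cases flagged anyway
    flagged = true_pos + false_pos
    return {
        "flagged": flagged,
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "ppv": true_pos / flagged,  # P(real case | flagged)
    }


# Base rate from the text; test accuracy is a made-up, optimistic assumption.
result = screen(population=1_000_000, base_rate=1 / 10_000,
                sensitivity=0.99, specificity=0.99)
print(f"flagged: {result['flagged']:,.0f}")
print(f"of whom real cases: {result['true_pos']:,.0f}")
print(f"chance a flagged person is a real case: {result['ppv']:.2%}")
```

Under these assumptions roughly ten thousand people are flagged to find about a hundred real cases, so a flagged person has just under a 1% chance of being one. At a base rate ten or a hundred times lower, the same test does correspondingly worse.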