Machine learning over suicide risk assessment?

This is a proof-of-concept study that data-mined discharge letters from two Harvard-affiliated hospitals, scoring words for positive ("happy") and negative ("sad") valence and predicting risk of suicide by quartile.
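The study's actual NLP pipeline isn't reproduced in the post, but the basic idea of valence-based scoring can be sketched in a few lines. The word lists and quartile logic below are my own illustrative assumptions, not the authors' lexicon:

```python
# Minimal sketch of valence scoring of discharge notes.
# POSITIVE/NEGATIVE word lists are illustrative placeholders,
# not the study's actual lexicon.
POSITIVE = {"happy", "improved", "stable", "pleasant", "cooperative"}
NEGATIVE = {"sad", "hopeless", "agitated", "withdrawn", "refused"}

def valence_score(note: str) -> int:
    """Positive-minus-negative word count for one note."""
    words = note.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def risk_quartiles(scores):
    """Assign each score to a quartile (1 = lowest valence, 4 = highest)."""
    ranked = sorted(scores)
    n = len(ranked)
    cutoffs = [ranked[n // 4], ranked[n // 2], ranked[3 * n // 4]]
    return [1 + sum(s > c for c in cutoffs) for s in scores]
```

Patients would then be binned by quartile of note valence and followed for the outcome; the real study used far richer coded and narrative features.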

Their outcome was death by suicide, and they did find they could identify a high-risk group.

[Figure 1 from the study]

In this cohort, which spans approximately 2.4 million patient-years, we developed a model based on coded clinical data that predicts suicide and accidental death among patients discharged from academic medical centers at a rate substantially exceeding chance, with an area under the curve of approximately 0.73. To our knowledge, postdischarge risk for suicide death in large nonpsychiatric cohorts has not previously been modeled. We further found that addition of uncoded clinical data reflecting positive and negative valence available in general hospital discharge notes modestly improved prediction of these outcomes, suggesting more generally the potential usefulness of augmenting models using coded data only with concepts extracted from narrative clinical notes.
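An AUC of 0.73 means that a randomly chosen patient who died by suicide would receive a higher risk score than a randomly chosen patient who did not about 73% of the time. That pairwise interpretation can be computed directly (the scores below are toy numbers, not the study's data):

```python
def auc(pos_scores, neg_scores):
    """AUC = P(score of a random positive case > score of a random negative),
    counting ties as half a win. Equivalent to the Mann-Whitney U statistic
    scaled to [0, 1]; practical only for small samples (O(n*m) pairs)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy example: 5 of 6 case/non-case pairs are ranked correctly.
example = auc([0.9, 0.8, 0.4], [0.7, 0.3])  # 5/6 ≈ 0.83
```

Note that AUC measures ranking, not calibration: a model can have AUC 0.73 while the absolute risk in the top quartile remains very low.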

Among the coded data, we confirmed multiple clinical features previously associated with risk, such as male sex and white race, illustrating assay sensitivity and consistency with prior epidemiological investigations of suicide. Likewise, as anticipated, any psychiatric visit and prior psychiatric treatment were individually associated with substantial increase in risk. Because we sought to predict suicide death (and not unsuccessful attempt, as in most past efforts), our results are difficult to compare directly with prior studies….

Notably, 115 of 235 (48.9%) suicide deaths in the present study occurred among individuals with no coded data reflecting psychiatric International Classification of Diseases, Ninth Revision diagnostic codes in this health system. This finding is consistent with prior reports that, while individuals who die by suicide often have contact with a health professional, the clinician is likely to not be a psychiatrist or therapist,14 which underscores the importance of psychiatric expertise in the general hospital setting.

[Table 5 from the study]

A couple of caveats. Correlation is not causation: negative words don't cause suicide. And given that there were 235 deaths by suicide (0.1% period prevalence) over roughly 2.4 million patient-years of follow-up, the absolute rate of suicide even in the high-risk group is still low. To get decent statistical power, the authors combined suicide and accidental death into one outcome.
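The base-rate problem is worth making concrete. With a Bayes' rule calculation (sensitivity and specificity below are hypothetical numbers for illustration; only the ~0.1% base rate comes from the study), the positive predictive value of a "high risk" flag stays tiny:

```python
def ppv(base_rate, sensitivity, specificity):
    """Positive predictive value via Bayes' rule:
    P(outcome | flagged) = TP / (TP + FP)."""
    tp = sensitivity * base_rate          # true positives per patient
    fp = (1 - specificity) * (1 - base_rate)  # false positives per patient
    return tp / (tp + fp)

# Hypothetical operating point: base rate 0.1%, sensitivity 70%,
# specificity 75% (e.g., flagging roughly the top quartile).
flagged_risk = ppv(0.001, 0.70, 0.75)  # ≈ 0.003, i.e. about 0.3%
```

So even under generous assumptions, well over 99% of flagged patients would not die by suicide, which is why the rarity of the outcome, not the AUC, dominates the clinical usefulness question.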

Watch this space: standard risk assessment helps little. Before changing one's policy on machine learning, I would want to see replicated clinical trials, with decent control arms, showing that interventions in the high-risk group decrease deaths, not merely attempts or distress.