How to distort statistics, again.

This study is reported as a non-inferiority study demonstrating that Cognitive Therapy and Dynamic Therapy over 12 sessions have roughly the same effect. The face validity of this finding in the USA is high: many clinicians practise dynamic rather than cognitive therapy, as a consequence of the training they underwent.
They care about theoretical models. And when I say that a shared formulation of the situation — a shared understanding of why this is happening to this person now — should drive treatment choice, they look at me as if I’m odd.

This is from the protocol:

The goal of this study is to conduct a randomized, comparative, non-inferiority clinical trial that tests the hypothesis that a widely used form of manualized dynamic psychotherapy (supportive-expressive psychodynamic therapy) is not inferior to cognitive therapy when implemented in community mental health settings for the treatment of major depressive disorder (MDD). The specific aims are (1) to conduct a randomized non-inferiority trial to compare supportive-expressive psychodynamic therapy and cognitive therapy for patients with MDD and (2) to assess the comparative effectiveness of supportive-expressive psychodynamic therapy and cognitive therapy on secondary measures of symptoms, patient functioning, and quality of life.

This is from the abstract:

Among the 237 patients (59 men [24.9%]; 178 women [75.1%]; mean [SD] age, 36.2 [12.1] years) treated by 20 therapists (19 women and 1 man; mean [SD] age, 40.0 [14.6] years), 118 were randomized to DT and 119 to CT. A mean (SD) difference between treatments was found in the change on the Hamilton Rating Scale for Depression of 0.86 (7.73) scale points (95% CI, −0.70 to 2.42; Cohen d, 0.11), indicating that DT was statistically not inferior to CT. A statistically significant main effect was found for time (F(1,198) = 75.92; P = .001).
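To make the arithmetic concrete, here is a minimal sketch of the non-inferiority check behind that headline claim, using only the figures quoted above (the 2.5 HAM-D-point margin comes from the power-calculation passage quoted further down):

```python
# Non-inferiority check on the primary outcome, using the abstract's figures.
mean_diff = 0.86                  # HAM-D change, DT minus CT, in scale points
sd_diff = 7.73                    # SD of that difference
ci_lower, ci_upper = -0.70, 2.42  # 95% CI from the abstract
margin = 2.5                      # a priori non-inferiority bound (HAM-D points)

cohen_d = mean_diff / sd_diff     # 0.86 / 7.73 ~= 0.11, as reported
# DT is declared non-inferior because even the worst plausible
# disadvantage (the CI upper bound) stays below the 2.5-point margin.
print(f"d = {cohen_d:.2f}; non-inferior: {ci_upper < margin}")
```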

But they then report a significant effect size for quality of life in favour of CT.

[Screenshot from the paper showing the quality-of-life result.]

The text around the result is contradictory. You would have to invoke corrections for multiple comparisons to disavow a statistically significant difference at the usual cutoff, but to then call a similar result significant is nonsense. I wonder how this got through peer review. Please note I’ve removed some numbers relating to degrees of freedom: the paper is open access, and is correctly formatted there.

Despite small observed effect size differences between the treatments, we cannot conclude that DT was statistically noninferior to CT on change on the BASIS-24 (Cohen d = 0.14; 95% CI upper bound, 0.35), the QOLI total score (Cohen d = 0.22; 95% CI upper bound, 0.43), or the SF-36 Mental Component score (MCS) (Cohen d = 0.15; 95% CI upper bound, 0.36). We found a statistically significant main effect for time on the BASIS-24 (F = 133.32; P = .001), the QOLI (F = 44.55; P = .001), and the SF-36 MCS (F = 60.52; P = .001). Superiority of CT over DT was not demonstrated for change on the BASIS-24 (F = 1.07; P = .30), the QOLI (F = 4.18; P = .04), or the SF-36 MCS (F = 0.49; P = .48). Dynamic psychotherapy was significantly noninferior to CT on the SF-36 Physical Component score (PCS) (Cohen d = 0.07; 95% CI upper bound, 0.14; P = .03); however, both treatments demonstrated significant (but slight) deterioration across time (F = 5.19; P = .02).

This is disappointing at best. We need, as clinicians, to know that we are offering the best and most appropriate treatment. The null hypothesis of non-inferiority is challenged on some outcomes. If you are not interested in outcomes such as quality of life, don’t measure them.

The authors are enamoured with their power calculations, and are using effect size inappropriately. They have skewed the numbers by pushing the alpha (P value threshold) to 0.025 and the effect-size margin to 0.29, when they should know that many trials with good control-group equivalence have effect sizes around 0.20.

We followed Hirotsu’s unifying approach to include a test of noninferiority followed by a subsequent test for treatment superiority only in the case for which noninferiority is not obtained. For this multiple decision process, the α level was set a priori at .025 to account for the 2 decisions. The noninferiority of the secondary measures was evaluated using an a priori defined margin of Cohen d effect size of 0.29, which represents a small to moderate effect.
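Taken at face value, that two-step procedure is easy to express. Here is a minimal sketch using the paper’s own margin and α; the decide() helper is my illustration, not the authors’ code, and it makes the inconsistency plain:

```python
# Sketch of the two-step decision process quoted above.
ALPHA = 0.025        # a priori level for the two-decision procedure
D_MARGIN = 0.29      # a priori non-inferiority margin in Cohen d units

def decide(ci_upper_d, superiority_p):
    """Test non-inferiority first; test superiority only if that fails."""
    if ci_upper_d < D_MARGIN:
        return "non-inferior"
    if superiority_p is not None and superiority_p < ALPHA:
        return "CT superior"
    return "inconclusive"

# QOLI: the CI upper bound 0.43 exceeds the margin and P = .04 exceeds .025,
# so under the paper's own rules nothing can be claimed either way.
print(decide(0.43, 0.04))   # -> inconclusive
# SF-36 PCS: upper bound 0.14 clears the margin, so non-inferiority holds;
# yet the paper also calls its P = .03 significant, despite .03 > .025.
print(decide(0.14, None))   # -> non-inferior
```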

Power calculations used the formula of Julious to guarantee a power of 80% for assessing noninferiority and superiority while accommodating the repeated-measures design.62,63 Included in the formula were the noninferiority bound of 2.5 HAM-D points defined a priori, a pooled SD set at 8.5, α set at .025, an attrition rate of 10%, repeated assessments, and an estimated within-subject correlation of 0.40. Sample size was determined to be 230 subjects.
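For what it’s worth, the reported sample size can be roughly reproduced. This is a back-of-envelope sketch, not the exact Julious formula: the repeated-measures variance-reduction factor and the choice of k = 4 post-baseline assessments are my assumptions, used only to show the moving parts:

```python
from math import ceil
from statistics import NormalDist

# Rough reconstruction of the sample-size arithmetic (NOT the exact
# Julious formula). Inputs are those quoted from the paper; the number
# of post-baseline assessments (k = 4) is an assumption of mine.
alpha, power = 0.025, 0.80
margin, sd = 2.5, 8.5          # non-inferiority bound and pooled SD (HAM-D points)
rho, k = 0.40, 4               # within-subject correlation; k is assumed
attrition = 0.10

z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
n_single = 2 * z**2 * sd**2 / margin**2        # per arm, single assessment
n_repeat = n_single * (1 + (k - 1) * rho) / k  # repeated-measures reduction
n_total = ceil(2 * n_repeat / (1 - attrition)) # both arms, inflated for dropout
print(n_total)  # ~222 under these assumptions, in the region of the reported 230
```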

Another paper with fundamental flaws. My irritation with the current editors of high impact journals is growing.

One thought on “How to distort statistics, again.”

  1. Ugh. I’ve been commenting on the limitations of peer review for a long time–sad to say it seems I’ve got something of a point. Love the abuse of stats, too. We can’t possibly use standard statistical power and significance values, can we?

    (everyone, repeat after me; nobody gets their doctorate, or tenure, or that big promotion by retaining the null hypothesis unless the study sponsor wants it that way…..understand that bias and you’ll understand a LOT)
