So, this is the moment you’ve all been waiting for. A while ago I wrote a comment on an article that was published in Homeopathy. This article, among other things, purported to show that the authors of a Lancet meta-analysis (Aijin Shang and co-workers) that had negative results for homeopathy had engaged in post-hoc hypothesising and data dredging. That was an outrageous slur on what is a perfectly reasonable paper, if you understand it properly. My comment has now been published, along with a response from the authors. If anyone needs a copy of my comment and doesn’t fancy paying for it, drop me a line and I’ll bung you a PDF. In any case, the original version appears on my blog here.
Meanwhile, the reply by original authors Rutten and Stolper is an exercise in evasion and obfuscation, and doesn’t really address most of the points that I made. This seems to be fairly typical (and to be fair isn’t only restricted to non-science like homeopathy). In their original paper, Rutten and Stolper claimed that “Cut-off values for sample size [i.e. the number of subjects in a trial, above which the trial was defined as “large”] were not mentioned or explained in Shang el al's [sic] analysis”. This is simply not true. So what do Rutten and Stolper have to say about this embarrassing error?
“Wilson states that larger trials were defined by Shang as “Trials with SE [standard error] in the lowest quartile were defined as larger trials”. According to Wilson this was done to predefine 'larger trials'. We agree with Wilson that this is indeed a strange way of defining 'larger trials', but it is perfectly possible to simply define larger studies a priori according to sample size in terms like 'above median' as we suggested in our paper. Shang et al did not mention the sensitivity of the result to this choice of cut-off value: if median sample size (including 14 trials) is chosen homeopathy has the best (significantly positive) result, if 8 trials are selected homeopathy has the worst result. In the post-publication data they mentioned sample sizes but not Standard Errors. Isn't it odd that the authors did not mention the fact that homeopathy is effective based on a fully plausible definition of 'larger' trials, but stated that it is not effective based on a strange definition of 'larger', but that this was not apparent because of missing data?”
So, nothing there about how they failed to properly read the paper to check what Shang et al.’s definition of larger trials was, while essentially accusing them of research misconduct. Instead, they shift the goalposts and decide that they don’t like the definition that was provided. Now, it certainly would be possible to define larger studies as being “above median” sample size. By doing this you would be including studies of smaller size than would be included using Shang’s definition. As is well understood, and as Shang et al. clearly showed, including studies with smaller sample size will give you more positive but, crucially, less reliable results. So I don’t think it was particularly odd that Shang et al. failed to abandon their definition of larger trials in favour of someone else’s definition, published three years later, that would inevitably lead to less reliable results. Rutten and Stolper state that using 8 larger, high quality trials gives the worst results for homeopathy: but to get a positive result, you would have to include at least 14 trials, as Ludtke and Rutten show in another paper in the Journal of Clinical Epidemiology. And, again, it was perfectly apparent what definition Shang et al. used to define larger trials: it is clearly stated in their paper.
OK, so why use standard error rather than simply using sample size directly, as Rutten and Stolper want to do? In meta-analyses, a commonly used tool is a funnel plot. This plots, for each study included in the analysis, standard error against odds ratio. The odds ratio is a measure of the size of the effect of the intervention being studied. If the value is 1, there is no effect. If it is less than 1, there is a positive effect (the intervention outperformed placebo); if it is greater than 1, there is a negative effect (placebo outperformed the intervention). The plot is typically used to identify publication bias (and other biases) in the set of trials: to simplify, if the plot is asymmetric, then biases exist. Using their funnel plot of 110 trials of homeopathy (Figure 2 in the Lancet paper), Shang et al. were able to show, to a high degree of statistical significance (p<0.0001), that trials with higher standard error show more positive results. It then makes perfect sense to screen the trials by standard error rather than sample size, because it has been demonstrated that standard error correlates with odds ratio. Of course, you could plot sample size against odds ratio, but that is not the recommended approach.
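For anyone who wants to see the arithmetic behind a funnel plot, here is a minimal sketch in Python. It uses the standard textbook formula for the standard error of a log odds ratio from a two-by-two table. The trial counts are invented purely for illustration and are not taken from Shang et al.; the function name and the simple "sort and take the first quarter" rule are likewise assumptions of this sketch, not a description of how the Lancet authors computed their quartile.

```python
import math

def odds_ratio_and_se(events_trt, n_trt, events_ctl, n_ctl):
    """Odds ratio and standard error of its log for one two-arm trial.

    'Events' are bad outcomes (e.g. patient not improved), so an odds
    ratio below 1 favours the treatment.
    """
    a, b = events_trt, n_trt - events_trt
    c, d = events_ctl, n_ctl - events_ctl
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se_log_or

# Hypothetical trials: (label, events_trt, n_trt, events_ctl, n_ctl)
trials = [
    ("A", 10, 50, 20, 50),
    ("B", 100, 500, 200, 500),
    ("C", 2, 200, 4, 200),
    ("D", 30, 100, 40, 100),
]

results = {label: odds_ratio_and_se(*counts) for label, *counts in trials}

# Rank by standard error, then keep the lowest quartile as "larger" trials.
ranked = sorted(results, key=lambda label: results[label][1])
larger = ranked[: max(1, len(ranked) // 4)]

for label in ranked:
    odds_ratio, se = results[label]
    print(f"trial {label}: OR = {odds_ratio:.3f}, SE(log OR) = {se:.3f}")
print("lowest-SE quartile:", larger)
```

In this made-up data, trial C enrols more patients than trial A but records very few events, so its standard error is larger. That is one reason why ranking trials by standard error and ranking them by sample size need not give the same answer.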
Rutten and Stolper also claim to be "surprised" that one apparently positive trial of homeopathy was excluded from Shang's analysis. Since it was excluded based on the clearly stated exclusion criteria, I didn't find that surprising myself. How do Rutten and Stolper respond?
"We were indeed amazed that no matching trial could be found for a homeopathic trial on chronic polyarthritis by Wiesenauer. Shang did not specify criteria for matching of trials. We would expect the authors to explain this exclusion because Wiesenauer's trial would have made a difference in meta-regression analysis and possibly also in the selection of the eight larger good quality trials".
This routine is now wearily familiar. Someone makes a claim that Shang et al. didn’t do something, in this case specify criteria for matching of trials; I check the Lancet paper, and find that claim to be false. What did Shang have to say about matching of trials? On page 727, they say “For each homoeopathy trial, we identified matching trials of conventional medicine that enrolled patients with similar disorders and assessed similar outcomes. We used computer-generated random numbers to select one from several eligible trials of conventional medicine”. And, of course, the authors did explain why the trial was excluded; it met one of the pre-defined exclusion criteria. To me, that seems clear enough. As it stands, Rutten and Stolper’s point is nothing more than an argument from incredulity. They are amazed! Amazed that no matching trial could be found. But they haven’t actually found one to prove their point. It’s possible that this Wiesenauer trial might have made a difference to the selection of 8 large, high quality trials. But I doubt it would have made any significant difference to the meta-regression analysis, which was based on 110 trials.
Having wrongly accused Shang et al. of doing a bad thing by defining sub-groups post-hoc, Rutten and Stolper applied all kinds of post-hoc rationalisations for excluding trials they don’t like. For example, they decided to throw out all the (resoundingly negative) trials of homeopathic arnica for muscle soreness in marathon runners, on the basis that homeopathy is not normally used to treat healthy people, and these trials therefore have low external validity. I argued that Shang et al. had to include those studies, since they met the inclusion criteria and did not meet the exclusion criteria. On what basis could they exclude them? From Rutten and Stolper, answer came there none:
"Wilson's remark about prominent homeopaths choosing muscle soreness as indication is not relevant. Using a marathon as starting point for a trial is understandable from a organisational point of view, although doubt is possible about external validity. Publishing negative trials in alternative medicine journals is correct behaviour. There is, however, strong evidence that homeopathic Arnica is not effective after long distance running and homeopathy as a method should not be judged by that outcome".
Yes, publish the negative trials. But why shouldn’t the negative trials be included in a meta-analysis? Because they’re negative, and that just can’t be right? I don’t see any rationale here for excluding these trials.
Rutten and Stolper also take the time-honoured approach of arguing about statistics:
“…the asymmetry of funnel-plots is not necessarily a result of bias. It can also occur when smaller studies show larger effect just because they were done in a condition with high treatment effects, and thus requiring smaller patient numbers”.
I think this is nonsense, but anyone with more statistical knowledge should feel free to correct me. If the high treatment effects are real, then the larger studies will show them as well, and there will be no asymmetry in the funnel plot. The smaller studies are always going to be less reliable than the larger ones.
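A toy simulation can make this argument concrete. The sketch below is only an illustration under loudly stated assumptions: every trial in a scenario shares one true effect, the standard error is approximated as if the event rate were about 50% in both arms, and "publication bias" is a crude filter that suppresses small trials unless they are significantly positive. None of the numbers come from the Lancet paper. Because it assumes a single common effect, it says nothing about the case where small and large trials are studying genuinely different conditions.

```python
import math
import random

def simulate(true_log_or, n_trials, publication_bias, rng):
    """Mean observed log odds ratio for small and large published trials."""
    small, large = [], []
    for i in range(n_trials):
        n_per_arm = 20 if i % 2 == 0 else 500
        # Approximate SE of the log odds ratio at a ~50% event rate.
        se = math.sqrt(8 / n_per_arm)
        observed = rng.gauss(true_log_or, se)
        is_small = n_per_arm == 20
        # Crude bias: small trials appear only if significantly positive
        # (log OR clearly below 0, i.e. odds ratio below 1).
        if publication_bias and is_small and observed > -1.96 * se:
            continue
        (small if is_small else large).append(observed)
    return sum(small) / len(small), sum(large) / len(large)

rng = random.Random(0)

# Scenario 1: a real effect, every trial published.
real_small, real_large = simulate(-0.5, 4000, False, rng)

# Scenario 2: no effect at all, but small negative trials go unpublished.
bias_small, bias_large = simulate(0.0, 4000, True, rng)

print(f"real effect, no bias: small {real_small:.2f}, large {real_large:.2f}")
print(f"no effect, with bias: small {bias_small:.2f}, large {bias_large:.2f}")
```

In the first scenario the small and large trials agree on average, so a funnel plot would be roughly symmetric. In the second, the published small trials look strongly positive while the large ones sit near no effect, which is the asymmetric pattern a funnel plot is designed to expose.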
Finally, Rutten and Stolper conclude that:
"The conclusion that homeopathy is a placebo effect and that conventional medicine is not was not based on a comparative analysis of carefully matched trials, as stated by the authors".
Homeopaths do want this to be true, but no matter how many times they repeat it, it continues to be false. I think the problem is that they have become fixated on the analysis of the subgroup of larger, higher quality trials, which was only one part of the analysis. The meta-regression analysis for all 110 vs 110 trials gave the same results; the analysis of the “larger, higher quality” subgroup merely lends support to those results. So after all that palaver, there’s still no reason to think that there is anything particularly wrong with the Shang et al. Lancet paper, and there is certainly no excuse for accusing its authors of research misconduct.