Rutten and Stolper [1] have conducted a re-analysis of the data used in the landmark Lancet meta-analysis (Shang et al.) [2] of trials of homeopathy and conventional medicine. However, their approach to this work seems to have been influenced by a belief that the Shang analysis was deliberately skewed against homeopathy, and in favour of conventional medicine. I argue here that the evidence does not support that contention, and that the re-analysis by Rutten and Stolper does not show that the Shang et al. study was invalid.
Rationale for the re-analysis
In the abstract of their paper, Rutten and Stolper state “There is a discrepancy between the outcome of a meta-analysis published in 1997 of 89 trials of homeopathy by Linde et al and an analysis of 110 trials by Shang et al published in 2005, these reached opposite conclusions”, and on page 170 they write “The contradiction between Linde's conclusion based on 89 trials, and Shang et al's conclusion, based on 110 trials seems odd”. But there is nothing particularly surprising about this discrepancy. The Linde paper referred to was published in the Lancet in 1997 [3]. The same team re-analysed the data in a paper published in 1999 [4]. They concluded that, because trials of higher methodological quality had smaller effect sizes and because a number of newly published high-quality trials showed negative results for homeopathy, their earlier meta-analysis had over-estimated the effectiveness of homeopathy. Hence there is no reason to see the discrepancy between Shang et al. and Linde et al. (1997) as being particularly “odd”.
Rutten and Stolper make statements about the “pre-specified hypotheses” of the Shang et al. study, but these are not consistent throughout the paper. In the introduction, they state:
“The hypotheses predefined mentioned in the introduction of Shang et al's paper were: ‘Bias in conduct and reporting of trials is a possible explanation for positive findings of placebo-controlled trials of both homeopathy and allopathy (conventional medicine)’; and: ‘These biases are more likely to affect small than large studies; the smaller a study, the larger the treatment effect necessary for the results to be statistically significant, whereas large studies are more likely to be of high methodological quality and published even if their results are negative’.”
Yet, in Rutten and Stolper’s section on “Pre-specified hypotheses” they include “quality in homeopathy is worse than in conventional medicine” as a hypothesis of Shang et al., and say that this hypothesis was falsified in the Shang et al. study. This is a straw man: it is not a hypothesis that was discussed in the Shang et al. paper, and Rutten and Stolper have missed the point of including a matched set of trials of conventional medicine. As Rutten and Stolper state (p. 170) “Pooling of results is…questionable if homeopathy works for some conditions and not for others”. This is a reasonable point. However, it is clear that some experimental conventional treatments work and some do not. The results of the analysis of conventional medicine were not consistent with the placebo hypothesis, showing that it is possible to obtain a positive result using the methods of Shang et al., even if there is considerable heterogeneity in the results [5].
Rutten and Stolper make the claim that the sub-sets of larger, higher quality studies were chosen post-hoc, presumably to make homeopathy appear less effective than it really is. In their paper, Rutten and Stolper state (pp. 172-173):
“Cut-off values for sample size were not mentioned or explained in Shang el al's [sic] analysis. Why were eight homeopathy trials compared with six conventional trials? Was this choice predefined or post-hoc? Post-publication data showed that cut-off values for larger higher quality studies differed between the two groups. In the homeopathy group the cut-off value was n = 98, including eight trials (38% of the higher quality trials). The cut-off value for larger conventional studies in this analysis was n = 146, including six trials (66% of the higher quality trials). These cut-off values were considerably above the median sample size of 65. There were 31 homeopathy trials larger than the homeopathy cut-off value and 24 conventional trials larger than the conventional cut-off value. We can think of no criterion that could be common to the two cut-off values. This suggests that this choice was post-hoc.”
The first thing to note is that it is not true that cut-off values for sample size were not mentioned or explained in the Shang et al. analysis. In the original Shang paper, on page 728, it is stated that “Trials with SE [standard error] in the lowest quartile were defined as larger trials”. In other words, the cut-off was not defined in terms of numbers of subjects, but in terms of standard error. It might be argued that this is a strange way of defining “larger” trials (and perhaps it should have been phrased as “lower standard error”). But it makes sense when criteria must be stated a priori. If a number of subjects were stated as a cut-off value, there would be no way of knowing how many studies would meet that criterion before looking at the data. You might find that a very large or very small number of studies met the criterion, making further analysis difficult. So, there is no mystery as to why the “cut-off values” were different between trials of homeopathy and trials of conventional medicine: it is because the distribution of standard errors is different between the two populations. This could be discovered simply by reading the original paper, and the conclusion that the groups were chosen post-hoc cannot be sustained.
A further point here is that the group of “larger” homeopathy trials contains smaller trials that would not have made the cut for “larger” trials in the conventional medicine group. Those smaller trials are more likely to show spurious positive results. It follows that, had the authors engineered the groups to get the result they wanted, they would have engineered them in favour of homeopathy.
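The lowest-quartile rule discussed above can be illustrated with a short sketch. It is not the procedure or the data of Shang et al.: the two sets of sample sizes are invented, the function names are hypothetical, and the link between sample size and standard error (SE = 2/√n) is an assumption made only so that the example is self-contained. The point is simply that one pre-specified rule, applied to two sets of trials with different distributions of standard error, implies two different cut-offs in terms of sample size.

```python
import math
import statistics


def standard_error(n):
    # Illustrative assumption only: SE shrinks with the square root of n.
    return 2.0 / math.sqrt(n)


def larger_trials(sample_sizes):
    """Return the sample sizes of trials whose SE is in the lowest quartile."""
    ses = [standard_error(n) for n in sample_sizes]
    # First quartile of the SE distribution. The rule itself can be fixed
    # before any trial results are seen.
    q1 = statistics.quantiles(ses, n=4, method="inclusive")[0]
    return [n for n, se in zip(sample_sizes, ses) if se <= q1]


# Hypothetical sample sizes, invented for illustration (not the real trials).
set_a = [20, 30, 40, 50, 60, 80, 100, 150]
set_b = [40, 60, 90, 120, 150, 200, 300, 500]

larger_a = larger_trials(set_a)
larger_b = larger_trials(set_b)

# The same rule implies a different smallest "larger" trial in each set.
print(min(larger_a), min(larger_b))
```

With these invented numbers the smallest “larger” trial has 100 participants in the first set and 300 in the second, although the selection rule is identical for both.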
Another paragraph in Rutten and Stolper states “We did not further investigate possible selection bias by excluding trials, but we were surprised by the exclusion of Wiesenauer's trial on chronic polyarthritis. This was a larger trial (n = 176), of good quality according to Linde, with positive results. This trial would have contributed positively to the outcome of the larger higher quality trials. Shang excluded this trial because no matching trial could be found” (page 171). Since the trial was excluded on the basis of the clearly stated, pre-specified exclusion criteria, what is surprising about it having been excluded? Including it would have made a nonsense of the design of the study and violated the pre-specified exclusion criteria, and would have been a gross error.
Another possible outcome?
Rutten and Stolper conduct a sensitivity analysis, but, as they note, the decisions they make in this analysis are highly subjective. They decide to exclude all trials of homeopathy for muscle soreness [6-9], on the grounds that “treatment of healthy individuals is very rare in homeopathic practice [and] this outcome has low external validity to judge the effect of homeopathy as a method” (page 173). Yet, the trials were conducted with the participation of prominent homeopaths, and some were published in homeopathic or alternative medicine journals [8, 9], so at least some homeopaths seem to be of the opinion that there is enough external validity for it to be worth conducting a trial. So how can the external validity of the trials be judged in a transparent way? In a meta-analysis based on clear, pre-specified criteria, there could be no justification for omitting the studies.
It is also notable that one of the authors was a co-author of another re-analysis published in the Journal of Clinical Epidemiology [10]. That analysis showed that if random-effects meta-analysis is used, it is possible to add smaller trials to Shang’s set of “larger, higher quality” trials of homeopathy, and get a statistically significant (although clinically unimpressive) benefit for homeopathy. All this really shows is that a finding in favour of homeopathy is not robust, and as Shang et al. showed, including smaller trials also decreases the reliability of the findings. The re-analysis also showed that the benefit for homeopathy was not statistically significant when a meta-regression analysis was used: this negative finding was strangely not mentioned in the Homeopathy paper. Because the results differed between meta-regression and random-effects analyses, and because Shang et al. showed highly significant evidence of funnel-plot asymmetry in their complete dataset of 110 trials of homeopathy, it is arguable that meta-regression analysis is a more appropriate choice.
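Why the two methods can disagree when a funnel plot is asymmetric can be sketched in a few lines. What follows is not the analysis of Shang et al. or of Lüdtke and Rutten: the effect sizes are synthetic log odds ratios (negative values favouring treatment) constructed so that smaller trials show larger effects, the DerSimonian-Laird estimator is assumed as one common way of fitting a random-effects model, and the meta-regression is reduced to a simple inverse-variance weighted regression of effect on standard error. All function names are hypothetical.

```python
import math


def dersimonian_laird(effects, ses):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Adding tau^2 to every variance flattens the weights, so small trials
    # count for relatively more than under a fixed-effect model.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


def regression_on_se(effects, ses):
    """Inverse-variance weighted regression of effect on standard error."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, ses))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxx = sum(wi * x * x for wi, x in zip(w, ses))
    swxy = sum(wi * x * y for wi, x, y in zip(w, ses, effects))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    # The intercept is the predicted effect as the standard error tends to 0.
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Synthetic data, invented for illustration: effects grow with the SE.
log_or = [-0.10, -0.21, -0.18, -0.34, -0.37, -0.66, -0.81]
se = [0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.40]

pooled, pooled_se, tau2 = dersimonian_laird(log_or, se)
intercept, slope = regression_on_se(log_or, se)
print(f"random-effects pooled log OR: {pooled:.3f}")
print(f"regression intercept (SE -> 0): {intercept:.3f}, slope: {slope:.3f}")
```

In this synthetic example the random-effects estimate is pulled away from zero by the smaller trials, while the regression intercept, which predicts the effect in an arbitrarily large trial, lies close to zero. A real analysis would also need confidence intervals for the regression coefficients, which this sketch omits.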
Overall, it is clear that “another outcome” (i.e. one favourable to homeopathy) is possible, as long as negative studies are excluded without good reason, smaller and less reliable studies are included, and a particular method of statistical analysis is used. In a paper that (wrongly) criticises a study for analysing data based on criteria established post-hoc, this seems like an odd point to make.
The analysis by Rutten and Stolper misreads Shang et al., contains some important errors, and does not show that the Shang et al. study was invalid. In particular, there is no evidence that the Shang et al. study involved post-hoc choice of subgroups. The results of meta-analyses can be debated, but scientists should not be accused of research misconduct on the basis of no evidence, or because their critics have failed to read their work properly.
1. Rutten ALB and Stolper CF. The 2005 meta-analysis of homeopathy: the importance of post-publication data. Homeopathy 2008; 97: 169–177.
2. Shang A, Huwiler-Müntener K, Nartey L et al. Are the clinical effects of homeopathy placebo effects? Comparative study of placebo-controlled trials of homeopathy and allopathy. Lancet 2005; 366: 726–732.
3. Linde K, Clausius N, Ramirez G et al. Are the clinical effects of homeopathy placebo effects? A meta-analysis of placebo-controlled trials. Lancet 1997; 350: 834–843.
4. Linde K, Scholz M, Ramirez G, Clausius N, Melchart D, Jonas WB. Impact of study quality on outcome in placebo-controlled trials of homeopathy. J Clin Epidemiol 1999; 52: 631–636.
5. Shang A, Jüni P, Sterne JAC, Huwiler-Müntener K, Egger M. Are the clinical effects of homeopathy placebo effects? A meta-analysis of placebo-controlled trials: Author’s reply. Lancet 2005; 366: 2083–2084.
6. Vickers AJ, Fisher P, Wyllie SE, Rees R. Homeopathic Arnica 30X is ineffective for muscle soreness after long-distance running – A randomized, double-blind, placebo-controlled trial. Clin J Pain 1998; 14: 227–231.
7. Vickers AJ, Fisher P, Smith C, Wyllie SE, Lewith GT. Homoeopathy for delayed onset muscle soreness - A randomised double blind placebo controlled trial. Brit J Sports Med 1997; 31: 304–307.
8. Jawara N, Lewith GT, Vickers AJ, Mullee MA, Smith C. Homoeopathic Arnica and Rhus toxicodendron for delayed onset muscle soreness - A pilot for a randomized, double-blind, placebo-controlled trial. Brit Hom J 1997; 86: 10–15.
9. Tveiten D, Bruset S, Borchgrevink CF, Norseth J. Effects of the homeopathic remedy Arnica D30 on marathon runners: A randomized, double-blind study during the 1995 Oslo Marathon. Complement Ther Med 1998; 6(2): 71–74.
10. Lüdtke R, Rutten ALB. The conclusions on the effectiveness of homeopathy highly depend on the set of analyzed trials. J Clin Epidemiol 2008; 61: 1197–1204.