Wednesday 19 November 2008

I know I said life was too short...

I wrote a little about a paper by Rutten and Stolper recently published in the amusing pseudo-journal Homeopathy. The paper performed the usual homeopath party trick of throwing incorrect allegations of research misconduct at the Shang et al. meta-analysis of homeopathy that was published in the Lancet, while also engaging in dubious statistical analysis. I've now had a little time to put together something a bit more meaty, with proper references and everything, and send it off as a letter to the editor of Homeopathy. I reproduce the text below.

Rutten and Stolper [1] have conducted a re-analysis of the data used in the landmark Lancet meta-analysis (Shang et al.) [2] of trials of homeopathy and conventional medicine. However, their approach to this work seems to have been influenced by a belief that the Shang analysis was deliberately skewed against homeopathy, and in favour of conventional medicine. I argue here that the evidence does not support that contention, and that the re-analysis by Rutten and Stolper does not show that the Shang et al. study was invalid.

Rationale for the re-analysis

In the abstract of their paper, Rutten and Stolper state “There is a discrepancy between the outcome of a meta-analysis published in 1997 of 89 trials of homeopathy by Linde et al and an analysis of 110 trials by Shang et al published in 2005, these reached opposite conclusions”, and on page 170 they write “The contradiction between Linde's conclusion based on 89 trials, and Shang et al's conclusion, based on 110 trials seems odd”. But there is nothing particularly surprising about this discrepancy. The Linde paper referred to was published in the Lancet in 1997 [3]. The same team re-analysed the data in a paper published in 1999 [4]. They concluded that, because trials of higher methodological quality had smaller effect sizes, and because a number of newly published high-quality trials showed negative results for homeopathy, their earlier meta-analysis had over-estimated the effectiveness of homeopathy. Hence there is no reason to see the discrepancy between Shang et al. and Linde et al. (1997) as being particularly “odd”.

Trial quality

Rutten and Stolper make statements about the “pre-specified hypotheses” of the Shang et al. study, but these are not consistent through the paper. In the introduction, they state:

“The hypotheses predefined mentioned in the introduction of Shang et al's paper were: ‘Bias in conduct and reporting of trials is a possible explanation for positive findings of placebo-controlled trials of both homeopathy and allopathy (conventional medicine)’; and: ‘These biases are more likely to affect small than large studies; the smaller a study, the larger the treatment effect necessary for the results to be statistically significant, whereas large studies are more likely to be of high methodological quality and published even if their results are negative’.”

Yet, in Rutten and Stolper’s section on “Pre-specified hypotheses” they include “quality in homeopathy is worse than in conventional medicine” as a hypothesis of Shang et al., and say that this hypothesis was falsified in the Shang et al. study. This is a straw man: it is not a hypothesis that was discussed in the Shang et al. paper, and Rutten and Stolper have missed the point of including a matched set of trials of conventional medicine. As Rutten and Stolper state (p. 170) “Pooling of results is…questionable if homeopathy works for some conditions and not for others”. This is a reasonable point. However, it is clear that some experimental conventional treatments work and some do not. The results of the analysis of conventional medicine were not consistent with the placebo hypothesis, showing that it is possible to obtain a positive result using the methods of Shang et al., even if there is considerable heterogeneity in the results [5].

Post-hoc analysis?

Rutten and Stolper make the claim that the sub-sets of larger, higher quality studies were chosen post-hoc, presumably to make homeopathy appear less effective than it really is. In their paper, Rutten and Stolper state [p. 172-173]:

“Cut-off values for sample size were not mentioned or explained in Shang el al's [sic] analysis. Why were eight homeopathy trials compared with six conventional trials? Was this choice predefined or post-hoc? Post-publication data showed that cut-off values for larger higher quality studies differed between the two groups. In the homeopathy group the cut-off value was n = 98, including eight trials (38% of the higher quality trials). The cut-off value for larger conventional studies in this analysis was n = 146, including six trials (66% of the higher quality trials). These cut-off values were considerably above the median sample size of 65. There were 31 homeopathy trials larger than the homeopathy cut-off value and 24 conventional trials larger than the conventional cut-off value. We can think of no criterion that could be common to the two cut-off values. This suggests that this choice was post-hoc.”

The first thing to note is that it is not true that cut-off values for sample size were not mentioned or explained in the Shang et al. analysis. In the original Shang paper, on page 728, it is stated that “Trials with SE [standard error] in the lowest quartile were defined as larger trials”. In other words, the cut-off was not defined in terms of numbers of subjects, but in terms of standard error. It might be argued that this is a strange way of defining “larger” trials (and perhaps it should have been phrased as “lower standard error”). But it makes sense when criteria must be stated a priori. If a number of subjects were stated as a cut-off value, there would be no way of knowing how many studies would meet that criterion before looking at the data. You might find that a very large or very small number of studies met the criterion, making further analysis difficult. So, there is no mystery as to why the “cut-off values” were different between trials of homeopathy and trials of conventional medicine: it is because the distribution of standard errors is different between the two populations. This could be discovered simply by reading the original paper, and the conclusion that the groups were chosen post-hoc cannot be sustained.

A further point here is that the group of “larger” homeopathy trials contains smaller trials that would not have made the cut for “larger” trials in the conventional medicine group. Those smaller trials are more likely to show spurious positive results. It follows that, had the authors engineered the groups to get the result they wanted, they would have engineered them in favour of homeopathy.
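The lowest-quartile rule discussed in the two paragraphs above can be illustrated with a minimal sketch. The standard errors below are made up for illustration and are not the Shang et al. data; the function name larger_trials is likewise hypothetical. The sketch only shows that a single pre-specified rule ("SE in the lowest quartile"), applied separately to two sets of trials with different SE distributions, will generally give two different cut-offs.

```python
from statistics import quantiles


def larger_trials(standard_errors):
    """Apply a 'lowest quartile of SE' rule to one set of trials.

    Returns the first-quartile cut-off and the standard errors at or
    below it. The rule is fixed in advance; the cut-off it produces
    depends on the data it is applied to.
    """
    cutoff = quantiles(standard_errors, n=4, method="inclusive")[0]
    selected = [se for se in standard_errors if se <= cutoff]
    return cutoff, selected


# Hypothetical standard errors, NOT taken from Shang et al.
homeopathy_se = [0.20, 0.25, 0.30, 0.35, 0.40, 0.50, 0.60, 0.80]
conventional_se = [0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50]

hom_cutoff, hom_selected = larger_trials(homeopathy_se)
conv_cutoff, conv_selected = larger_trials(conventional_se)

# "Larger" homeopathy trials that would not count as "larger" if the
# conventional-medicine cut-off were applied to them instead.
would_miss_conventional_cut = [se for se in hom_selected if se > conv_cutoff]

print(f"homeopathy cut-off (SE):   {hom_cutoff:.4f}")
print(f"conventional cut-off (SE): {conv_cutoff:.4f}")
print(f"homeopathy trials above the conventional cut-off: "
      f"{len(would_miss_conventional_cut)} of {len(hom_selected)}")
```

With these invented numbers the homeopathy cut-off is the less strict of the two, so every "larger" homeopathy trial has a higher standard error than the conventional cut-off would allow. That mirrors the direction of the argument above, but says nothing about the size of the effect in the real data.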

Another paragraph in Rutten and Stolper states “We did not further investigate possible selection bias by excluding trials, but we were surprised by the exclusion of Wiesenauer's trial on chronic polyarthritis. This was a larger trial (n = 176), of good quality according to Linde, with positive results. This trial would have contributed positively to the outcome of the larger higher quality trials. Shang excluded this trial because no matching trial could be found” (page 171). Since the trial was excluded on the basis of the clearly stated, pre-specified exclusion criteria, what is surprising about it having been excluded? Including it would have made a nonsense of the design of the study and violated the pre-specified exclusion criteria, and would have been a gross error.

Another possible outcome?

Rutten and Stolper conduct a sensitivity analysis, but, as they note, the decisions they make in this analysis are highly subjective. They decide to exclude all trials of homeopathy for muscle soreness [6-9], on the grounds that “treatment of healthy individuals is very rare in homeopathic practice [and] this outcome has low external validity to judge the effect of homeopathy as a method” (page 173). Yet, the trials were conducted with the participation of prominent homeopaths, and some were published in homeopathic or alternative medicine journals [8, 9], so at least some homeopaths seem to be of the opinion that there is enough external validity for it to be worth conducting a trial. So how can the external validity of the trials be judged in a transparent way? In a meta-analysis based on clear, pre-specified criteria, there could be no justification for omitting the studies.

It is also notable that one of the authors (Rutten) was a co-author of another re-analysis published in the Journal of Clinical Epidemiology [10]. That analysis showed that if random-effects meta-analysis is used, it is possible to add smaller trials to Shang’s set of “larger, higher quality” trials of homeopathy, and get a statistically significant (although clinically unimpressive) benefit for homeopathy. All this really shows is that a finding in favour of homeopathy is not robust, and as Shang et al. showed, including smaller trials also decreases the reliability of the findings. The re-analysis also showed that the benefit for homeopathy was not statistically significant when a meta-regression analysis was used: this negative finding was strangely not mentioned in the Homeopathy paper. Because the results differed between meta-regression and random-effects analyses, and because Shang et al. showed highly significant evidence of funnel-plot asymmetry in their complete dataset of 110 trials of homeopathy, it is arguable that meta-regression analysis is a more appropriate choice.

Overall, it is clear that “another outcome” (i.e. one favourable to homeopathy) is possible, as long as negative studies are excluded without good reason, smaller and less reliable studies are included, and a particular method of statistical analysis is used. In a paper that (wrongly) criticises a study for analysing data based on criteria established post-hoc, this seems like an odd point to make.


The analysis by Rutten and Stolper contains misconceptions about Shang et al. and some important errors, and does not show that the Shang et al. study was an invalid analysis. In particular, there is no evidence that the Shang et al. study involved post-hoc choice of subgroups. The results of meta-analyses can be debated, but scientists should not be accused of research misconduct on the basis of no evidence, or on the basis of having failed to read their work properly.


1. Rutten ALB and Stolper CF. The 2005 meta-analysis of homeopathy: the importance of post-publication data. Homeopathy 2008; 97: 169–177.

2. Shang A, Huwiler-Müntener K, Nartey L et al. Are the clinical effects of homeopathy placebo effects? Comparative study of placebo-controlled trials of homeopathy and allopathy. Lancet 2005; 366: 726–732.

3. Linde K, Clausius N, Ramirez G et al. Are the clinical effects of homeopathy placebo effects? A meta-analysis of placebo-controlled trials. Lancet 1997; 350: 834–843.

4. Linde K, Scholz M, Ramirez G, Clausius N, Melchart D, Jonas WB. Impact of study quality on outcome in placebo-controlled trials of homeopathy. J Clin Epidemiol 1999; 52: 631–636.

5. Shang A, Jüni P, Sterne JAC, Huwiler-Müntener K, Egger M. Are the clinical effects of homeopathy placebo effects? A meta-analysis of placebo-controlled trials: Authors’ reply. Lancet 2005; 366: 2083–2084.

6. Vickers AJ, Fisher P, Wyllie SE, Rees R. Homeopathic Arnica 30X is ineffective for muscle soreness after long-distance running – A randomized, double-blind, placebo-controlled trial. Clin J Pain 1998; 14: 227–231.

7. Vickers AJ, Fisher P, Smith C, Wyllie SE, Lewith GT. Homoeopathy for delayed onset muscle soreness - A randomised double blind placebo controlled trial. Brit J Sports Med 1997; 31: 304–307.

8. Jawara N, Lewith GT, Vickers AJ, Mullee MA, Smith C. Homoeopathic Arnica and Rhus toxicodendron for delayed onset muscle soreness - A pilot for a randomized, double-blind, placebo-controlled trial. Brit Hom J 1997; 86: 10–15.

9. Tveiten D, Bruset S, Borchgrevink CF, Norseth J. Effects of the homeopathic remedy Arnica D30 on marathon runners: A randomized, double-blind study during the 1995 Oslo Marathon. Complement Ther Med 1998; 6(2): 71–74.

10. Lüdtke R, Rutten ALB. The conclusions on the effectiveness of homeopathy highly depend on the set of analyzed trials. J Clin Epidemiol 2008; 61: 1197–1204.


Anonymous said...

Well put. A small typo you may have already noticed, "This could be discovered simply by reading the original paper, and the conclusion that that the groups were chosen post-hoc cannot be sustained."

Peter Fisher, who co-authored one of the controversial muscle soreness papers, discussed this point in the 2006 debate with Ben Goldacre at the Natural History Museum. It might be worth checking the excuses.

But, as you say, it wasn't Shang et al.'s fault that the homeopathy research community put good quality trials of something they say they never do in the literature. Having set their selection criteria they had to use them. To do otherwise would have been dishonest.

Anyway, if you listen to some homeopaths there isn't anything it can't do. It seems that the definition of lacking internal validity is getting a negative result.

Paul Wilson said...

Heh, thanks for spotting the typo. I've gone and sent the thing now, but I'd be able to fix that at the proof stage, assuming it gets that far. And thanks for the link: I may get around to checking that out.

As you say, Shang and colleagues are now being criticised for following the exclusion and inclusion criteria of their study. My favourite bit is when Rutten and Stolper affect to be surprised by the exclusion of a study (the one Dana Ullman liked on polyarthritis), and then go on to point out which of the exclusion criteria it met. Myself, I didn't find it that surprising that the authors failed to abandon the design of their entire study so they could include that trial.

And obviously, it was quite wrong of them to include those negative studies on muscle soreness, because then the results would be negative.

Neuroskeptic said...

Excellent work - although if this gets printed, you'll have the dubious "honour" of a publication in Homeopathy on your record!

Anonymous said...

So, put a little more crudely, they're arguing that the Shang et al. paper is flawed because it didn't fiddle the results?

Anonymous said...

Very nice. Do let us know whether they publish it.

Paul Wilson said...

Excellent work - although if this gets printed, you'll have the dubious "honour" of a publication in Homeopathy on your record!

Well, after the famous Memory of Water fiasco, I already have two publications in Homeopathy. This one, and, as third author, this one. (apgaylard has a couple as well).

I don't put these on my academic CV when I'm applying for jobs, but I do put them on my Manchester Uni webpage.

Thank you all for your comments: I'll be posting what happens to the submission on this blog, but it's likely to be a long time before I hear anything.

Anonymous said...

According to the Faculty of Homeopathy's document We Answer The Critics, Peter Fisher "has written a detailed commentary" on the Lüdtke/Rutten and Rutten/Stolper papers. They don't give an actual citation, and the link just takes me to a Journal of Clinical Epidemiology login page. I've tried browsing the journal's contents for the last few issues, but I can't find Fisher's commentary there.

Paul Wilson said...


I can't be sure, but I think the Fisher document might be this one, from the issue of Homeopathy that included the original Rutten and Stolper paper. Naturally, it's almost completely wrong about everything, but it might amuse you for a bit.

Anonymous said...

I don't think it would provide $31.50's worth of amusement...

Paul Wilson said...


Yes, clearly not $31.50 of amusement...I always forget that I only get these things for free because I work for a university.

If you want a copy, drop me an e-mail (details here), and I should be able to sort something out.