Wednesday, 24 June 2009

What do bibliometrics actually add to research evaluation?

First, the reason I haven't posted in an age is that I've been in Norway, interpreting seismic data for the new project I'm working on. Hopefully I can now post a bit more regularly, as I should be in Manchester for a few consecutive weeks, for the first time this year.

Regular readers will know that I like to whinge about the increasing use of statistical indicators (bibliometrics) to evaluate research performance. Previously in England, research performance has been evaluated by the Research Assessment Exercise, a cumbersome and involved system based around expert peer review of research. Currently, HEFCE (the body that decides how scarce research funding is allocated to English universities) is looking into replacing this with a cumbersome and involved system based around bibliometrics and "light-touch" peer review. To this end, a pilot exercise using bibliometrics and including 22 universities has been underway. An interim report on the pilot is now available.

Essentially, three approaches have been evaluated:

i) Based on institutional addresses: here papers are assigned to a university based on the addresses of the authors, as stated in the paper. This would be cheap to do, as it would need no input from the universities.

ii) Based on all papers published by authors. In this approach, all papers written by staff selected for the 2008 RAE were identified. This requires a lot of data to be collected.

iii) Based on selected papers published by authors. Again, this approach used all staff selected for the 2008 RAE, but only used the most cited papers.

For each approach, the exercise was conducted twice: once using the Web of Science (WoS) database, and once using Scopus. The results were then compared with those from the 2008 RAE.

Well, the results are interesting, if you like this sort of thing. It is clear that the results can be very different from those provided by the RAE, whichever method is used, although the "selected papers" method tends to give the closest results. It is also notable that the two databases give different results, sometimes radically so; Scopus seems to give consistently higher values than WoS. Workers in some fields complained that they made more use of other databases, such as the arXiv or Google Scholar (it's worth noting that the favoured databases are proprietary, while the arXiv and Google Scholar are publicly accessible).

In general, the institutions involved in the pilot preferred the "selected papers" method, but it seems that none of the methods produced particularly convincing results. According to the report (paras 66 and 67):

In many disciplines (particularly in medicine, biological and physical sciences and psychology), members reported that the ‘top 6’ model (which looked at the most highly cited papers only) generally produced reasonable results, but with a number of significant discrepancies. In other disciplines (particularly in the social sciences and mathematics) the results were less credible, and in some disciplines (such as health sciences, engineering and computer science) there was a more mixed picture. Members generally reported that the other two models (which looked at ‘all papers’) did not generally produce credible results or provide sufficient differentiation.

One question here is what is meant by "reasonable" or "credible" results. The institutions involved in the pilot seem to assume that the best results are the ones that most closely match those of the RAE. I suspect this is because the large universities that currently receive the lion's share of research funding are not going to support any system that significantly changes the status quo.

The institutions involved in the pilot seem to think that bibliometrics would be most useful when used in conjunction with expert peer review. From the report:

Members discussed whether the benefits of using bibliometrics would outweigh the costs. Some found this difficult to answer given limited knowledge about the costs. Nevertheless there was broad agreement that overall the benefits would outweigh the costs – assuming a selective approach. For institutions this would involve a similar level of burden to the RAE and any additional cost of using bibliometrics would be largely absorbed by internal management within institutions. For panels, some members felt that bibliometrics might involve additional work (for example in resolving differences between panel judgements and citation scores); others felt that they could be used to increase sampling and reduce panels’ workloads.

According to the interim report, the "best" results (i.e. those most closely matching the results of the RAE) were obtained using a methodology that will have an administrative burden similar to that of the RAE. Even then the results had "significant discrepancies". So, if the aim of the pilot was to get similar results to the RAE with a lesser administrative burden, it seems that the pilot exercise has failed on both counts. And if bibliometrics don't seem to add much to the process, it's worth considering what they might take away. For which, see my previous post...


David Colquhoun said...

Thanks for keeping up the pressure on this important question.

The problem with the HEFCE study, and with almost all 'bibliometrics', is, it seems to me, that they never have any proper standard to judge their metrics against. Almost everything they write is circular because they never consider individuals.

I took a rather different approach and looked at the performance of someone who commands universal respect in my field (Nobel prize too). (There is a reprint here and blog version here.) He failed the requirements of at least one university for 'productivity' and didn't score particularly well on many metrics. His most highly cited paper was, predictably, a methods paper in a journal with an impact factor that promotions committee zombies would now tend to regard as unacceptable (the huge number of citations came later of course).

Another problem is that 'bibliometrics' has become a speciality in its own right, with its own set of 'experts'. None of them has ever done research in the ordinary sense of the word, and they know little about its unpredictability and vicissitudes.

The result, if real researchers don't manage to stop it, will be a culture which rewards 'productivity' above quality, which rewards short term shallow work more than real novelty, and which rewards marginal dishonesty and guest authorships (already a major plague). If we can't escape from the thrall of HR zombies the prospect is for a continual decline in the quality and honesty of research. That at least is a view that is expressed in a letter in the Financial Times today.

My favourite simple solution is to restrict everyone to an average of one original research paper per year. Anything that takes much less time than that is not likely to be very original (and/or not likely to have much contribution from the nominal author).

Paul Wilson said...

Thanks for the comment, David.

I remember reading the Physiology News article on your blog. The Imperial College publication score struck me then as plain daft, and not based on what actually happens in research.

It's impossible to account for all the strange things that happen with citations (people citing their mates, citing themselves, publications that get cited frequently because they are wrong, guest authorships, publications in languages other than English, inconsistencies between databases, errors in databases, and on and on) by a simple citation count. The proposed scheme for the REF seems to be a slightly more sophisticated version of citation counting.