Wednesday, 23 September 2009
REF consultation document published
For anyone interested in how research funding is allocated (fascinating stuff, I know), a consultation document on the Research Excellence Framework (REF) is now available here. REF is the mooted replacement for the old Research Assessment Exercise (RAE), the last one of which was conducted in 2008. Enjoy...
Friday, 18 September 2009
Playing the game: the impact of research assessment
Yesterday I was sent this report, produced by the Research Information Network, in conjunction with the Joint Information Systems Committee, and entitled "Communicating knowledge: How and why UK researchers publish and disseminate their findings". The report used a literature review, bibliometric analysis, an online survey of UK researchers, and focus groups or interviews with researchers to look at how and why researchers put information into the public domain. Being an early-career researcher, I'm interested in this sort of thing: I know why I'm publishing and disseminating information, but it's interesting to see why everyone else is doing it. It's also interesting to see the extent to which research assessment in the UK - until recently the Research Assessment Exercise (RAE) and in future the mysterious Research Excellence Framework (REF) - influences the decisions that researchers make. What particularly struck me about the report was the number of times researchers talked about "playing games": the framework of research assessment is seen as a game to be played, with the needs of research being subordinated to the need to put in a good performance. This has important implications for the REF, in which bibliometric indicators are likely to play an important role.
The key point of the report is that there is some confusion among researchers about what exactly it is they're supposed to be doing. There are conflicting and unclear messages from different bodies about what sort of research contributions are valued. The perception is that the only thing that really counts in terms of research assessment is peer-reviewed journal articles. Other contributions, such as conference proceedings, books, book chapters, monographs, government reports and so on are not valued. As a result, the proportion of journal articles relative to other outputs increased significantly between 2003 and 2008. A couple of comments by researchers quoted in the report (p.15):
[There is] much more emphasis on peer reviewed journals …Conferences, working papers and book chapters are pretty much a waste of time … Books and monographs are worth concentrating on if they help one demarcate a particular piece of intellectual territory.
There is a strong disincentive to publish edited works and chapters in edited works, even though these are actually widely used by researchers and educators in my field, and by our students.
This is certainly the impression I get from my own field. In fact, I have been advised by senior colleagues to target high-impact journals, rather than, for example, special publications. I have never received any formal guidance on what research outputs are expected of me, but the prevailing atmosphere gives the impression that it's all about journal articles. After publishing a couple of things from my PhD, it took another three years to publish anything from my first post-doc. I worried about that: it seemed that the numerous conferences and internal company reports and presentations I produced over that time counted for nothing career-wise.
The report makes it clear that, in the case of the RAE, it is perception rather than reality that is causing the problem: the RAE rules meant that most outputs were admissible, and all would be treated equally. But it's perceptions that drive the way researchers respond to research assessment. Clearer guidance is needed.
An interesting point brought up by the report is how, when there is more than one author for a journal article, the list of authors is arranged. In my field, authors are typically listed in order of contribution, so I was surprised to find that this is by no means always the case. In some fields, especially in the humanities and social sciences, authors are commonly listed alphabetically. In some cases, the leader of the research group is listed first, in other cases last. And there are various mixtures of listing by contribution, grant-holding and alphabetic order. There is even a significant minority where papers based on work done by students have the student's supervisor as first author! This means that there is no straightforward way of apportioning credit to multiple authors of a paper, something that David Colquhoun has already pointed out. This is a huge problem for any system of assessment based on bibliometrics.
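To make the credit problem concrete, here is a minimal sketch (my own illustration, not something from the report) of three standard bibliometric counting conventions applied to the same five-author paper. The share of credit assigned to any individual author depends entirely on which convention is assumed, and a position-weighted scheme is meaningless if the author order doesn't actually reflect contribution.

```python
# Illustrative only: three common counting schemes for apportioning credit
# among the authors of a single paper.

def full_counting(n_authors):
    """Every author receives full credit for the paper."""
    return [1.0] * n_authors

def fractional_counting(n_authors):
    """Credit is split equally among all authors."""
    return [1.0 / n_authors] * n_authors

def harmonic_counting(n_authors):
    """Credit declines with position in the author list (1, 1/2, 1/3, ...),
    normalised to sum to 1 -- only sensible if the list is ordered by
    contribution, which, as the report shows, often it is not."""
    weights = [1.0 / (i + 1) for i in range(n_authors)]
    total = sum(weights)
    return [w / total for w in weights]

for scheme in (full_counting, fractional_counting, harmonic_counting):
    shares = [round(s, 2) for s in scheme(5)]
    print(f"{scheme.__name__:20s} {shares}")
```

The third author of a five-author paper ends up with a credit of 1.0, 0.2 or roughly 0.15 depending on the convention, and the citation databases record nothing about which convention the author list actually followed.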
The report also examines how researchers cite the work of other people. Other researchers' work should be cited because it forms part of the background of the new research, because it supports a statement made in the new paper, or as part of a discussion of how the new paper fits into the context of previous research. Crucially, this includes citing work with which the authors disagree, or that is refuted or cast into doubt in the light of the new work (p.30):
Citing somebody often indicates opposition / disagreement, rather than esteem and I am as likely to cite and critique work that I do not rate highly as work I value.
So any system that relies on bibliometric indicators is likely to reward controversial science as much as good science (not that those categories are mutually exclusive, but they don't completely overlap either).
Researchers are perfectly clear that a system based on bibliometrics will cause them to change their publication behaviour: 22% will try to produce more publications, 33% will submit more work to high-status journals, 38% will cite their collaborators' work more often, while 6% will cite their competitors' work less often. This will lead to more journal articles of poorer quality, the decline of perfectly good journals that have low "impact", and corrupted citation behaviour. In general, researchers aren't daft, and they've clearly identified the incentives that would be created by such a system.
The report presents a worrying picture of research, and scientific literature, distorted by the perverse incentives created by poorly thought-out and opaque forms of research assessment. It can be argued that scientists who allow their behaviour to be distorted by these incentives are acting unprofessionally: I wouldn't disagree. But for individuals playing the game, the stakes are high. Perhaps we ought to be thinking about whether research is the place for playing games. It surely can't lead to good science.
Labels: bad science, RAE, REF, research, research assessment
Wednesday, 24 June 2009
What do bibliometrics actually add to research evaluation?
Firstly, the reason that I haven't posted in an age is that I've been in Norway, interpreting seismic data for the new project I'm working on. Hopefully I can now post a bit more regularly, as I should actually be in Manchester for a few consecutive weeks, for the first time this year.
Regular readers will know that I like to whinge about the increasing use of statistical indicators (bibliometrics) to evaluate research performance. Previously, research performance in England was evaluated by the Research Assessment Exercise, a cumbersome and involved system based around expert peer review of research. Currently, HEFCE (the body that decides how scarce research funding is allocated to English universities) is looking into replacing this with a cumbersome and involved system based around bibliometrics and "light-touch" peer review. To this end, a pilot exercise using bibliometrics, involving 22 universities, has been under way. An interim report on the pilot is now available.
Essentially, three approaches have been evaluated:
i) Based on institutional addresses: here papers are assigned to a university based on the addresses of the authors, as stated in the paper. This would be cheap to do, as it would need no input from the universities.
ii) Based on all papers published by authors. In this approach, all papers written by staff selected for the 2008 RAE were identified. This requires a lot of data to be collected.
iii) Based on selected papers published by authors. Again, this approach used all staff selected for the 2008 RAE, but only used the most cited papers.
For each approach, the exercise was conducted twice: once using the Web Of Science (WoS) database, and once using Scopus. The results were then compared with those from the 2008 RAE.
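As a rough illustration of why the choice of model matters, here is a minimal sketch using made-up citation counts (the data and function names are mine, not HEFCE's). The "all papers" model is pulled down by the long tail of little-cited output that almost every department produces, while a "selected papers" model of the kind used in the pilot (the "top 6" variant) only sees the best-cited work.

```python
# Hypothetical citation counts for ten papers by a department's RAE-selected staff.
papers = [45, 30, 28, 12, 9, 7, 3, 2, 1, 0]

def all_papers_score(citations):
    """Model (ii): mean citations across every paper by selected staff."""
    return sum(citations) / len(citations)

def selected_papers_score(citations, top_n=6):
    """Model (iii): mean citations across only the most-cited papers."""
    best = sorted(citations, reverse=True)[:top_n]
    return sum(best) / len(best)

print(f"all papers:     {all_papers_score(papers):.1f}")       # 13.7
print(f"top 6 selected: {selected_papers_score(papers):.1f}")   # 21.8
```

The particular numbers mean nothing; the point is that the two models can rate the same department quite differently, which is consistent with the pilot's finding that the "selected papers" model behaves differently from the "all papers" models.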
Well, the results are interesting, if you like this sort of thing. It is clear that the results can be very different from those provided by the RAE, whichever method is used, although the "selected papers" method tends to give the closest results. It is also notable that the two different databases give different results, sometimes radically so; Scopus seems to consistently give higher values than WoS. Workers in some fields complained that they made more use of other databases, such as the arXiv or Google Scholar (it's worth noting that the favoured databases are proprietary, while the arXiv and Google Scholar are publicly accessible).
In general, the institutions involved in the pilot preferred the "selected papers" method, but it seems that none of the methods produced particularly convincing results. According to the report (paras 66 and 67):
In many disciplines (particularly in medicine, biological and physical sciences and psychology), members reported that the ‘top 6’ model (which looked at the most highly cited papers only) generally produced reasonable results, but with a number of significant discrepancies. In other disciplines (particularly in the social sciences and mathematics) the results were less credible, and in some disciplines (such as health sciences, engineering and computer science) there was a more mixed picture. Members generally reported that the other two models (which looked at ‘all papers’) did not generally produce credible results or provide sufficient differentiation.
One of the questions here is what is meant by "reasonable" or "credible" results? The institutions involved in the pilot seem to assume that the best results are the ones that most closely match those of the RAE. I suspect this is because the large universities that currently receive the lion's share of research funding are not going to support any system that significantly changes the status quo.
The institutions involved in the pilot seem to think that bibliometrics would be most useful when used in conjunction with expert peer review. From the report:
Members discussed whether the benefits of using bibliometrics would outweigh the costs. Some found this difficult to answer given limited knowledge about the costs. Nevertheless there was broad agreement that overall the benefits would outweigh the costs – assuming a selective approach. For institutions this would involve a similar level of burden to the RAE and any additional cost of using bibliometrics would be largely absorbed by internal management within institutions. For panels, some members felt that bibliometrics might involve additional work (for example in resolving differences between panel judgements and citation scores); others felt that they could be used to increase sampling and reduce panels’ workloads.
According to the interim report, the "best" results (i.e. those most closely matching the results of the RAE) were obtained using a methodology that will have a similar administrative burden to the RAE. Even then the results had "significant discrepancies". So, if the aim of the pilot was to get similar results to the RAE with a lesser administrative burden, it seems that the pilot exercise has failed on both counts. And if bibliometrics don't seem to add much to the process, it's worth considering what they might take away. For which, see my previous post...
Friday, 9 January 2009
Does the REF add up to good science?
The results of the 2008 RAE (Research Assessment Exercise) were published back in December. You might have noticed this from the number of university websites that could be found frantically spinning the results. My very own University of Manchester, for example, is claiming that Manchester has broken into the “golden triangle” of UK research, that is, Oxford, Cambridge and institutions based in London. It seems that depending on the measure you pick, we’re anywhere between third and sixth place in the UK. Clearly these are excellent results, but whether we’re really up there with the Oxfords, Cambridges, Imperials and UCLs of the world I’m not sure.
In any case, that was the last ever RAE. It has been a fairly cumbersome process, involving expert peer review of the research contribution of research institutions, that has been a real burden on the academics who have had to administer it. I’m sure there are few who will mourn its passing. Now the world of English academia is waiting, like so many rats in an experimental maze, to find out what will replace the RAE. The replacement will be a thing called the Research Excellence Framework, or REF, and at this stage exactly what it will involve is fairly sketchy. However, it will be based on the use of bibliometrics (statistical indicators that are usually based on how much published work is cited in other publications) and “light-touch peer review”.
What kind of bibliometric indicators are we talking about? Last year HEFCE (the Higher Education Funding Council for England, the body that evaluates research and decides who gets scarce research funding) published a “Scoping study on the use of bibliometric analysis to measure the quality of research in UK higher education institutions” produced by the Centre for Science and Technology Studies at the University of Leiden, Netherlands. I’ve spent a fair amount of time reading through this, and in some ways I was encouraged. It’s clear that some thought has gone into creating bibliometric indicators that are as sensible as possible: I was dreading a crude approach based around impact factors, which have already done so much damage to the pursuit of good science. The authors of the “scoping study” came up with an “internationally standardised impact indicator”: I will abbreviate this as ISII for concision. The ISII takes the average number of citations per publication for the academic unit you are interested in (this might be a research group, an academic department or an entire university), and divides it by a weighted, field-specific international reference level. The reference level is calculated by taking the average number of citations for all publications in a specific field: if the publication falls under more than one field (as many will in practice), the reference level can be calculated as a weighted average of the number of citations generated by publications in all the fields in question. So, if the ISII for your research group comes out as 1, you’re average; if above 1, better than the average; and if below 1, worse than the average. The authors of the scoping study say that they regard the ISII as being “the most appropriate research performance indicator”, and suggest that a value of >1.5 indicates a scientifically strong institution. They also suggest a threshold of 3.0 to identify research excellence. It seems that HEFCE is expecting to adopt the ISII as the main research performance indicator, according to their FAQs, where they say “We propose to measure the number of citations received by each paper in a defined period, relative to worldwide norms. The number of citations received by a paper will be 'normalised' for the particular field in which it was published, for the year in which it was published, and for the type of output”. However, they are still deciding what thresholds they will use to decide which institutions are producing high-quality research.
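Restating that calculation in my own notation (the scoping study uses different symbols, so treat this as a paraphrase rather than the official definition): for a unit whose papers are indexed $i = 1, \dots, n$,

$$
\mathrm{ISII} \;=\; \frac{\tfrac{1}{n}\sum_{i=1}^{n} c_i}{\sum_{f} w_f\, \bar{c}_f},
$$

where $c_i$ is the number of citations received by paper $i$, $\bar{c}_f$ is the worldwide average number of citations per publication in field $f$, and the weights $w_f$ reflect how the unit's output is spread across fields. An ISII of 1 means the unit is cited at exactly the world average for its particular mix of fields; the suggested thresholds of 1.5 and 3.0 then sit on top of this normalised scale.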
All well and good. If you insist that bibliometric indicators are necessary, this is probably as good a way as any of generating those data. However, there are some problems here, as well as philosophical difficulties with the entire approach.
Firstly, what is it we are trying to measure? In theory, what HEFCE wants to do is evaluate research quality. But the ISII does not directly measure research quality. Like any indicator based on citation rates, it is measuring the “impact” of the research: how many other researchers published papers that cited the research. It ought to be clear that while this should reflect quality to some degree, there are significant confounding factors. For example, research that is done in a highly active topic is likely to be cited more than research in which fewer groups are working. This does not mean that work in less active topics is of intrinsically lower quality, or even that it is less useful.
Secondly, there is an assumption that the be-all and end-all of scientific research is publication in peer-reviewed journals that are indexed in the Web of Science citation database published by Thomson Scientific. This is a proprietary database that lists articles in the journals that it indexes, and also tracks citations. Criteria for journals to be included are not in the public domain (although the scoping report suggests these are picked based on their citation impact, p. 43). A number of journals that I would not consider to be scientifically reputable are included. For example, under the heading of Integrative and Complementary Medicine, the 2007 Journal Citation Reports (a database that compiles bibliometric statistics for journals in the citation database) includes 12 journals, including Evidence Based Complementary and Alternative Medicine (impact factor 2.535!) and the Journal of Alternative and Complementary Medicine (impact factor 1.526). This reinforces the point made above: it would be possible to publish outright quackery in either of these journals, have it cited by other quacks in the quackery that they publish, and get a respectable rating on the ISII. The ISII can’t tell you that this is a vortex of nonsense: it only sees that other authors have cited the work. It is also true that not all journals are included in the citation index: for example, in my own field the Bulletin of Canadian Petroleum Geology fails to make the cut, although it has always published good quality research. Although the authors of the scoping report make clear that it is possible to expand bibliometrics beyond the citation database, this will take much more effort and it seems that HEFCE will not take this route. So we will be relying on a proprietary and opaque database to make decisions on future research funding. A further point is that it is not clear how open access publications will be incorporated in the citation index: in principle there is no reason that this can’t happen, but can we be sure it will?
Thirdly, there is the assumption that research output can only be evaluated in terms of published articles in peer-reviewed journals. I’m not sure that this accurately reflects the actual research output of many scientists. For example, most of us put a lot of effort into presentations at scientific conferences, chapters in books, or government reports that will never make it into a citation database. This has become a problem for things like, in my own field, the special publications of the Geological Society of London. These are volumes that collect recent research on specific topics, and they generally contain excellent research. But they aren’t included in citation databases and they have no impact factor. This has led to a lack of interest in publishing results in these special publications, because they don’t tick the right boxes in terms of publication metrics. This is surely a bad thing. A similar problem occurs with things like government open-file reports. These are not, in general, pieces of world-class, cutting edge research. But that does not mean that they are useless or that they have no value. For example, good regional geological work can allow mineral exploration to be better targeted, benefiting the local economy. Yet that kind of work is ignored in a framework that only considers journal articles: HEFCE says only that “We accept that citation impact provides only a limited reflection of the quality of applied research, or its value to users. We invite proposals for additional indicators that could capture this”. To me, research quality and value cannot be measured by bibliometric indicators. It can only be evaluated by reading the research, understanding its context within the totality of pre-existing research, and understanding how it contributes to new understanding. That is, it can only be evaluated through peer review.
Which brings me to my fourth point; there are some questions about the role of peer review within the REF. HEFCE says that “the scoping study recommends that experts with subject knowledge should be involved in interpreting the data. It does not recommend that primary peer review (reading papers) is needed in order to produce robust indicators that are suitable for the purposes of the REF”. However, I’m not convinced that this accurately summarises what is written in the scoping report, which says “In the application of indicators, no matter how advanced, it remains of the utmost importance to know the limitations of the method and to guard against misuse, exaggerated expectations of non-expert users, and undesired manipulations by scientists themselves…Therefore, as a general principle we state that optimal research evaluation is realised through a combination of metrics and peer review. Metrics, particularly advanced analysis, provides the tools to keep the peer review process objective and transparent. Metrics and peer review both have their strengths and limits. The challenge is to combine the two methodologies in such a way that the strengths of one compensates for the limitations of the other”.
Finally, there is a hint of a conflict of interest in the preparation of the scoping report by the Centre for Science and Technology Studies: according to their website, the centre is involved in selling "products" based on its research and development in the area of bibliometric indicators. Their report in favour of bibliometric indicators might allow them to drum up significant business from HEFCE.
At present, the proposals for the REF are at a fairly early stage, but the use of bibliometric indicators seems to be entrenched, and there will be a pilot exercise on bibliometric indicators this year. However, this is based on “expert advice” that consists of a single report from an organisation that makes money by creating bibliometric indicators. While academia in general might welcome the proposals on the grounds that they will be less burdensome than the RAE and give everyone more time to do research, I don’t think many academics will be kidding themselves that the bibliometric indicators involved actually tell us much about research quality and usefulness.