Evaluating Investments and Performance in UK Science
In this chapter, I will discuss the key features of the UK research system and some of the essential elements of science policy under the Thatcher Administration and the Major Administration. (There have been few changes so far under the Blair Administration.) I will then focus on experiences with evaluation in three sets of agencies: the Higher Education Funding Councils and the university research assessment exercises (which have been carried out since 1986), the research councils, and government department research. I will end with four general conclusions.
Key Characteristics of the UK Research System
The UK system is relatively decentralized, with approximately half a dozen government departments holding major R&D budgets. Each determines its own R&D spending as part of that individual ministry's budget. In this respect, the United Kingdom is similar to the United States because there is no single government R&D budget as there is, for example, in France and Germany. We differ, though, in that we have six main research councils: engineering and physical sciences, particle physics and astronomy, biosciences, medical research, natural environment research, and economic and social research.
We also differ in relation to universities. We essentially have only public universities and those are all funded by the national government, although we do now have separate Higher Education Funding Councils for England, Scotland, Wales, and Northern Ireland. These provide core funding for both teaching and research.
There has been some rather limited coordination and oversight across ministries, which has been resisted in some cases by ministries that exhibit the traditional "territorial imperative." That imperative is still rather pronounced, although the new Labour Government has been trying to address this issue.
Over the last 10 years, we have seen a modest increase in the level of coordination and oversight through a number of mechanisms. First, and perhaps most importantly, the Technology Foresight Program is attempting to construct a long-term overview of all government-funded science and technology. Other mechanisms are the Office of Science and Technology (OST), created in 1992; the government Chief Scientist; and various committees, such as the Committee of Departmental Chief Scientists, an inter-ministerial committee of ministers from departments with R&D responsibilities, and an advisory Council for Science and Technology. However, in the 5 years that several of these mechanisms have been in existence, I don't think they have achieved as much as they might have. Again, the new government is looking closely at how to strengthen that coordination function.
Historical Emergence of Certain Key Features of the UK System
The UK equivalent of the Vannevar Bush report was a report by the Haldane Committee at the end of World War I, which set out the so-called Haldane Principle. This Principle led to independent research councils being established to fund basic research. (Applied research, in contrast, was funded by departments.) Initially, we had a Medical Research Council, which was followed in the 1930s by an Agricultural Research Council. Others were set up from the 1960s onward.
The end of World War I also saw the creation of the University Grants Committee and the establishment of the so-called "dual support system," with two streams of funding for university research. The Committee provided funding for teaching and also for some aspects of research. It provided core funding for academic salaries (typically, UK academics spend about 40 percent of their time on research) as well as funding for the so-called "well-found laboratory," that is, general infrastructural support for research. At the same time, the research council grant system (the second stream) was set up to cover the costs of specific research expenditures.
The next big change was the 1972 Rothschild report, which argued that all government-funded applied research should be subject to the customer-contractor principle. This led to part of the research council funds from the Agricultural, Natural Environment, and, initially, the Medical Research Councils being transferred to government departments, which then contracted back with those research councils to have a certain amount of research carried out on their behalf. The transfer was fiercely resisted by the Medical Research Council, and in that particular case the decision was reversed a few years later.
During the second half of the 1970s, we saw the beginning of level government funding for science (and even cuts in some years). This had several consequences. The Science Research Council (as it was then called), which at that stage was by far the largest research council, engaged progressively in more selectivity and concentration. It also put more emphasis on the economic and social benefits of research. At the same time, the money from the University Grants Committee was gradually being squeezed, particularly the support for general research infrastructure. As many of you who work in universities are aware, it is easier to cut infrastructural support than academic salaries! That is what happened in many British universities.
Then, under the Thatcher Administration, we saw tight monetary policy. The key features of her Administration were rolling back the state, subjecting the public sector to the discipline of the marketplace, and ensuring value for money. She stressed the three Es: economy, efficiency, and effectiveness.
Publicly funded science was now encouraged to move closer to industry and to attract more funding from industry and other sources, so it would rely less on the State. We saw a focus on the economic returns from research, as opposed to contributions to scientific knowledge or even to quality of life. We saw a much greater emphasis on accountability. Monitoring and evaluation consequently became increasingly important during the 1980s.
There were also new arrangements for closer coordination of government-funded research. The Science and Technology Assessment Office was established in the Cabinet Office to develop assessment methods and to encourage and oversee research assessment by research councils, ministries, and others. The Office was also responsible for ensuring that government-funded R&D contributed to the efficiency, competitiveness, and innovative capacity of the UK economy. A procedure of annual reviews of government R&D spending by the Cabinet Office was also established, and increasingly detailed policy analyses were included in these published reviews.
During this time, the research councils were putting much greater emphasis on assessment. In 1986, the University Grants Committee launched the first of its research assessment exercises. That body was replaced a couple of years later by the Universities Funding Council. The big change there was that users, particularly industrialists, began to be represented.
Under Mr. Major's Administration, further changes included the creation of the Office of Science and Technology in the Cabinet Office in 1992 and the appointment of a cabinet minister for science and technology, the first in 30 years. (Some of these ideas were lifted from the Labour Party Manifesto!) Around that time, the House of Commons Select Committee on Science and Technology was also set up. Prior to that, all we had was a House of Lords Committee trying to provide some oversight of Conservative government policy since, during the 1980s, there was little or no interest among Members of Parliament in science and technology.
In 1993, the Cabinet Minister for science and technology, William Waldegrave, produced the first British Government White Paper on science and technology in 20 years. It was called Realising Our Potential, an important title because it was concerned with how better to exploit the potential of the UK science and technology base. What we saw in this White Paper was the first explicit expression of a new social contract. Basically, it said that if you researchers receive money from the public purse, you have a responsibility to try to identify the ultimate beneficiaries of your research, to work with them in identifying their longer-term research needs, and to see whether your research can address those needs more effectively.
This paper led to the research councils being reorganized: they went from five to six, with a new council created for biotechnology and biological sciences. They were also given new missions, to contribute to wealth creation and to improving the quality of life, and target audiences to address. The paper also established the Technology Foresight Program, with two aims: (1) to link science and technology more closely to national needs in relation to wealth creation and improving the quality of life and (2) to build partnerships between the science base and users in industry and elsewhere.
In 1995, the Office of Science and Technology moved to the Department of Trade and Industry, a slightly curious move for which no clear rationale was ever given. I think it was linked with negotiations going on at the time between John Major and one of his key ministers, although the official rationale was to try to link research more closely to wealth creation.
Although located in that single department, the Office of Science and Technology is still responsible for the coordination of science and technology across the whole of government. That has not been easy to do since the Office became embedded in a single ministry, and the other ministries have been rather less interested in listening to messages from OST in the last 3 or 4 years.
Also, in 1995, the Universities Funding Council was replaced by the separate Higher Education Funding Councils for England, Scotland, Wales, and Northern Ireland, a move that also saw a partial transfer of research overhead funds from those councils to the research councils.
University Research Assessment Exercises
In 1984, the University Grants Committee argued that there needed to be a more selective approach to allocating this stream of money for supporting the university research infrastructure, and that universities needed to develop more explicit research strategies and improved methods of research management. In 1986, the first research assessment exercise (RAE) was completed. In this exercise, departments had to categorize themselves under 37 headings, which were basically disciplines (although they were called units of assessment). Each of these was reviewed by a panel of typically half a dozen or so peers. Each department outlined what its research achievements had been and listed its five best research publications. On the basis of these various pieces of paper that came in from all the departments, the peer-review panels graded each department on a four-point scale: below average, average, above average, or internationally outstanding.
How they went from those pieces of paper to those four grades was all rather mysterious. There was an absolute minimum of information about the process. As was pointed out subsequently, the requirement that each department list just its five best publications produced a systematic bias in favor of large departments: on average, their five best papers were better than those of smaller departments.
The exercise was repeated in 1989 and in 1992. The fourth one was completed at the end of 1996, by which time there were 67 units of assessment, with a panel overseeing each. This time, the panels did include a few users. There were now seven grades, with criteria based on the proportion of a department's research that fell into the categories of nationally excellent or internationally excellent. The assessment process in relation to publications also changed: each active researcher could now list up to four publications or other public outputs. (These could be concerts if you were a musician, for example.)
There were no bibliometric data. The funding councils had experimented with such data in 1992, when they asked each department or unit to list all its publications, classified into different categories: articles in international refereed journals, books, conference proceedings, and so on. This was judged not to have been very successful and was dropped in the 1996 exercise.
In addition, they introduced a census date: you could include all the research of academic staff who were employed on March 31, 1996, which meant you could include research they had done at previous institutions. You can imagine the outcome. A transfer market developed, with researchers being bought up by universities that wanted to strengthen their reputation in a particular subject.
The other important thing to stress is that, over time, these grades have had bigger and bigger financial consequences. In the most recent exercise, grades one and two received no research infrastructure money; grade three (b) received one unit of funding, grade three (a) one and a half units, and each subsequent grade an extra 50 percent. The result is that the research assessment exercise now influences 95 percent of the research infrastructure money coming from the Higher Education Funding Councils, compared with just 15 percent when it started in 1986.
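To make the arithmetic concrete, here is a rough sketch of how those funding weights compound, assuming "an extra 50 percent" is applied multiplicatively at each step up the scale (my reading of the rule; the funding councils' actual allocation formula also factored in the volume of research, which I do not show here):

$$w_{3b} = 1, \qquad w_{3a} = 1.5, \qquad w_{4} = 1.5^{2} = 2.25, \qquad w_{5} = 1.5^{3} \approx 3.38, \qquad w_{5^{*}} = 1.5^{4} \approx 5.06$$

Under this reading, a department graded 5* attracts roughly five times the infrastructure funding of a grade 3(b) department of comparable size, while departments graded 1 or 2 receive nothing from this stream.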
We have a parallel stream of assessment activities to deal with teaching quality, where there are no direct financial consequences. You do not get more money from the Higher Education Funding Council if you get a better score on teaching quality. You don't have to be an economist to figure out that universities have inevitably given greater emphasis to research compared with teaching.
What is my assessment of the research assessment exercises? I think they have improved the quality of research, particularly in some of the lower-ranked universities. There are now clearer and, arguably, more effective university research strategies. The research assessment exercise has also become reasonably well accepted in the academic community, certainly in comparison with the 1986 exercise, which was rightly criticized as lacking in credibility.
On the negative side, however, these exercises result in a lot of game playing. Academics are very good, and very ingenious, at playing to indicators. There is a lot of emphasis on presentation as opposed to improved quality of research, although it is fair to say there is even more of that in the teaching quality assessment, where, in my view, the costs are huge; the cost of a teaching quality assessment is arguably greater than any conceivable benefit. With the research assessment exercise, the cost-benefit equation is just about favorable. A fundamental principle in any evaluation is that its costs should be much less than the costs of whatever it is you are assessing, and they should also be less than the likely benefits. I fear that, in some cases, we may have lost sight of these boundary conditions.
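Stated compactly (the notation here is mine, not drawn from any official guidance), those two boundary conditions are

$$C_{\text{evaluation}} \ll C_{\text{activity assessed}} \qquad \text{and} \qquad C_{\text{evaluation}} < B_{\text{likely}}$$

On this test, the teaching quality assessment arguably fails the second inequality, while the research assessment exercise only just satisfies it.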
Another consequence of these exercises is increasing resource concentration. After the 1996 exercise, our top five universities received no less than one-third of the research funds from the Higher Education Funding Council. You can debate whether that is a good or a bad thing.
A final concern centers on a contradiction between the research assessment exercises and government science policy. The research councils emphasize getting closer to users, which inevitably means more interdisciplinary and, perhaps, more applied research. The research assessment exercise, with its disciplinary focus, emphasizes basic and mainstream research. It may therefore penalize researchers engaged in interdisciplinary work relating to the needs of users. There are some interesting analyses in economics, for example, showing that it is very tough to get a high rating unless most of your work falls within mainstream economics of a relatively "pure" nature.
Research Council Evaluation
From the mid-1980s, there has been increasing emphasis within the research councils on directed programs, as opposed to traditional responsive-mode grants, and on evaluations. (In responsive mode, the research councils respond to individual research proposals on their merits; with directed programs, by contrast, they invite researchers to submit proposals addressing identified priority topics.) Several research councils set up their own evaluation units.
For research council institutes, there are now quite elaborate evaluation mechanisms, including site visits by peers, extensive reports, and the use of bibliometric data and other performance indicators (such as how much income an institute generates from users). These evaluations can have serious consequences, including restructuring, merger, or even closure.
The grants that the research councils give out are still assessed primarily through peer review, but that has become a much more systematic process, and its consequences are now more significant. There is also more involvement of users, particularly for the directed programs.
The various experiments with bibliometric indicators at the Natural Environment, Agricultural, and Economic and Social Research Councils have found only limited applications, for example, in relation to institutes, large research programs, and big science. There are other evaluation experiments going on in individual councils, but I will skip over those.
Government Department Research
A 1988 report led to some government research establishments becoming what were called "Next Steps" agencies, accountable to the respective minister but financially self-sufficient; in other words, they were partially exposed to market forces. In 1994, that process continued with the first full privatizations of a number of laboratories. In addition, there was the so-called "prior options" review of other government research establishments and research institutes to see whether some of them should become agencies or even be privatized.
How has evaluation been dealt with here in relation to government department research? During the 1980s, a fairly systematic and comprehensive evaluation system was developed, which goes under the acronym ROAME: rationale, objectives, appraisal, monitoring, and evaluation. Each of those initials is important. Any proposed department research program must have a well-supported rationale. Its objectives must be clear and explicit, with the merit of those objectives being appraised before the program is funded. Monitoring should be carried out while the research is in progress. And there should be evaluation at the end of each program, with an independent assessment involving industry and academics as well as policymakers.
By the 1990s, ROAME had become a required part of all government department R&D programs. (It should be pointed out that these tend to be slightly more applied in orientation; evaluation, perhaps, is somewhat easier with applied research.) However, there are some doubts about how effectively ROAME has been implemented, or whether it has just become a bit of a ritual.
General Conclusions
First, I think we now have in the United Kingdom a revised social contract. We have moved beyond the Vannevar Bush model: researchers receiving public funds now recognize that they have a duty to identify users and to help address their research needs. This applies across the full range of research. In a study I carried out last year for the Institute of Physics, for example, I found that even particle physicists were beginning to address this issue. They were getting quite excited about the possibilities of working with users and recognized that doing so did not necessarily infringe their autonomy.
Second, financial and political pressures have led to a very heavy emphasis on accountability and value for money and, therefore, on evaluating performance and results from government-funded research.
Third, as a result, the United Kingdom has been at the forefront of the development of performance assessment in research. We have developed bibliometric and other indicators, and our research assessment exercises are among the largest and most comprehensive in the world. In addition, we have the increasingly systematic evaluation carried out by the research councils and the ROAME system for government department research.
Last, we have seen increasing involvement of users from industry in making policy, identifying priorities, and making funding decisions. A key mechanism for achieving this is the Technology Foresight Program, which brings together the industrial community, the scientific community, and the government to look at the long-term future and to identify the areas of strategic research and emerging generic technologies that are likely to lead to the greatest economic and social benefits to the Nation. The views of users are also sought in the evaluation of research projects and programs.
In conclusion, I hope that this analysis of the United Kingdom's experiences with the evaluation of research performance is of some relevance to discussions here in the United States about GPRA and how you assess the performance and results of government-funded research.
Ben Martin is director of Science and Technology Policy Research (SPRU) at the University of Sussex, Brighton, United Kingdom. This chapter draws extensively on a report that has been prepared by Jacqueline Senker, who is leading a European Union-funded program on changing public sector research systems in different European countries. This article is based on remarks delivered at the 23rd Annual AAAS Colloquium on Science and Technology Policy, held April 29-May 1, 1998, in Washington, DC.