Philosophy and Science Policy: A Report from the Field I

I’m actually going to give a series of reports from the field, including a chapter in a book on Field Philosophy that I’m revising now in light of editor/reviewer comments. In the chapter, I discuss our Comparative Assessment of Peer Review project. For a brief account of Field Philosophy, see the preprint of a manuscript I co-authored with Diana Hicks. That’s also being revised now.

Today, however, I will be focusing on more pressing current events having to do with Plan S. I will be giving a talk at the NJIT Department of Humanities Fall Colloquium Series to let my colleagues know what I've been up to recently. Here are the slides.

As a philosopher, my approach is not simply to offer an objective ‘report’. Instead, I will be offering an argument.

If we want to encourage academic flourishing, then we need new ways of evaluating academic research. We want to encourage academic flourishing. Therefore, we need new ways of evaluating academic research.

Of course, the argument also refers to my own activities. I want my department to understand and value my forays into the field of science policy. But that will mean revaluing the way I am currently evaluated (which is along fairly standard lines).

How journals like Nature, Cell and Science are damaging science | Randy Schekman | Comment is free | The Guardian

These journals aggressively curate their brands, in ways more conducive to selling subscriptions than to stimulating the most important research. Like fashion designers who create limited-edition handbags or suits, they know scarcity stokes demand, so they artificially restrict the number of papers they accept. The exclusive brands are then marketed with a gimmick called “impact factor” – a score for each journal, measuring the number of times its papers are cited by subsequent research. Better papers, the theory goes, are cited more often, so better journals boast higher scores. Yet it is a deeply flawed measure, pursuing which has become an end in itself – and is as damaging to science as the bonus culture is to banking.

via How journals like Nature, Cell and Science are damaging science | Randy Schekman | Comment is free | The Guardian.

Thanks to my colleague Diana Hicks for pointing this out to me.
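Since the critique turns on what the Journal Impact Factor actually measures, it is worth spelling out the arithmetic: a journal's impact factor for year Y is the number of citations received in Y by the items it published in the two preceding years, divided by the number of citable items it published in those years. Here is a minimal sketch of that calculation in Python, with made-up numbers for a hypothetical journal:

```python
# Illustrative arithmetic only: all numbers are made up. Real impact factors
# are computed by Thomson Reuters from Web of Science citation data.

def journal_impact_factor(citations_in_year, citable_items_prev_two_years):
    """JIF for year Y: citations received in Y to items published in Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    return citations_in_year / citable_items_prev_two_years

# Hypothetical journal: 21,000 citations in 2013 to papers from 2011 and 2012,
# which together comprised 700 citable items.
print(journal_impact_factor(21000, 700))  # 30.0
```

Because citation distributions are highly skewed, this average says very little about any particular paper in the journal, which is part of what makes it so damaging when it is used to judge individual papers or researchers.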

The last line of the quotation strikes me as the most interesting point, one that deserves further development. The steering effect of metrics is well known (Weingart 2005). There’s growing resistance to the Journal Impact Factor. Although the persuasive comparison between researchers and bankers is itself over the top, the last line suggests — at least to me — a better way to critique the reliance on the Journal Impact Factor, as well as other attempts to measure research. It’s a sort of reverse Kant with an Illichian flavor, which I will formulate as a principle here, provided that everyone promises to keep in mind my attitude toward principles.

Here is one formulation of the principle: Measure researchers only in ways that recognize them as autonomous agents, never merely as means to other ends.

Here is another: Never treat measures as ends in themselves.

Once measures, which are instruments to the core, take on a life of their own, we have crossed the line that Illich calls the second watershed. That the Journal Impact Factor has in fact crossed that line is the claim made in the quotation above, though not in Illich's language. The question we should be asking is how researchers can manage measures, rather than how we can measure researchers in order to manage them.
_______________________________________________________

Peter Weingart (2005). “Impact of bibliometrics upon the science system: Inadvertent consequences?” Scientometrics 62(1): 117-131.

What a difference a day makes: How social media is transforming scientific debate (with tweets) · deevybee · Storify

This is definitely worth a look, whether you’re into the idea of post-publication peer review or not.

What a difference a day makes: How social media is transforming scientific debate (with tweets) · deevybee · Storify.

PLOS Biology: Expert Failure: Re-evaluating Research Assessment

Do what you can today; help disrupt and redesign the scientific norms around how we assess, search, and filter science.

via PLOS Biology: Expert Failure: Re-evaluating Research Assessment.

You know, I’m generally in favor of this idea — at least of the idea that we ought to redesign our assessment of research (science in the broad sense). But, as one might expect when speaking of design, the devil is in the details. It would be disastrous, for instance, to throw the baby of peer review out with the bathwater of bias.

I touch on the issue of bias in peer review in this article (coauthored with Steven Hrotic). I suggest here (coauthored with Robert Frodeman) that attacks on peer review are attacks on one of the biggest safeguards of academic autonomy. On the relation between peer review and the values of autonomy and accountability, see: J. Britt Holbrook (2010). “Peer Review,” in The Oxford Handbook of Interdisciplinarity, Robert Frodeman, Julie Thompson Klein, and Carl Mitcham, eds. Oxford: Oxford University Press: 321-32; and J. Britt Holbrook (2012). “Re-assessing the science–society relation: The case of the US National Science Foundation’s broader impacts merit review criterion (1997–2011),” in Peer Review, Research Integrity, and the Governance of Science – Practice, Theory, and Current Discussions, Robert Frodeman, J. Britt Holbrook, Carl Mitcham, and Hong Xiaonan, eds. Beijing: People’s Publishing House: 328-62.

Developing Metrics for the Evaluation of Individual Researchers – Should Bibliometricians Be Left to Their Own Devices?

So, I am sorry to have missed most of the Atlanta Conference on Science and Innovation Policy. On the other hand, I wouldn’t trade my involvement with the AAAS Committee on Scientific Freedom and Responsibility for any other academic opportunity. I love the CSFR meetings, and I think we may even be able to make a difference occasionally. I always leave the meetings energized and thinking about what I can do next.

That said, I am really happy to be on my way back to the ATL to participate in the last day of the Atlanta Conference. Ismael Rafols asked me to participate in a roundtable discussion with Cassidy Sugimoto and him (to be chaired by Diana Hicks). Like I’d say ‘no’ to that invitation!

The topic will be the recent discussions among bibliometricians of the development of metrics for individual researchers. That sounds like a great conversation to me! Of course, when I indicated to Ismael that I was basically against the idea of bibliometricians coming up with standards for individual-level metrics, Ismael laughed and said the conversation should be interesting.

I’m not going to present a paper; just some thoughts. But I did start writing on the plane. Here’s what I have so far:

Bibliometrics are increasingly being used in ways that go beyond their design, and bibliometricians are increasingly asking how they should react to such unintended uses of the tools they developed. The issue of unintended consequences – especially of technologies designed with one purpose in mind, but which can be repurposed – is not new, of course. And bibliometricians have been asking such questions – ethical questions, but also policy questions – essentially since the beginning of the development of bibliometrics. If anyone is sensitive to the fact that numbers are not neutral, it is surely the bibliometricians.

This sensitivity to numbers, however, especially when combined with great technical skill and large data sets, can also be a weakness. Bibliometricians are also aware of this phenomenon, though perhaps to a lesser degree than one might like. There are exceptions. The discussion by Paul Wouters, Wolfgang Glänzel, Jochen Gläser, and Ismael Rafols regarding this “urgent debate in bibliometrics” is one indication of such awareness. Recent sessions at ISSI in Vienna and STI2013 in Berlin, on which Wouters et al. report, are other indicators that the bibliometrics community feels a sense of urgency, especially with regard to the question of measuring the performance of individual researchers.

That such questions are being raised and discussed by bibliometricians is certainly a positive development. One cannot fault bibliometricians for wanting to take responsibility for the unintended consequences of their own inventions. But one – I would prefer to say ‘we’ – cannot allow this responsibility to be assumed only by members of the bibliometrics community.

It’s not so much that I want to blame them for not having thought through possible other uses of their metrics — holding them to what Carl Mitcham calls a duty plus respicere: to take more into account than the purpose for which something was initially designed. It’s that I don’t want to leave it to them to try to fix things. Bibliometricians, after all, are a disciplinary community. They have standards; but I worry they also think their standards ought to be the standards. That’s the same sort of naivety that got us into this mess in the first place.

Look, if you’re going to invent a device someone else can command (deans and provosts with research evaluation metrics are like teenagers driving cars), you ought at least to have thought about how those others might use it in ways you didn’t intend. But since you didn’t, don’t try to come in now with your standards as if you know best.

Bibliometrics are not the province of bibliometricians anymore. They’re part of academe. And we academics need to take ownership of them. We shouldn’t let administrators drive in our neighborhoods without some sort of oversight. We should learn to drive ourselves so we can determine the rules of the road. If the bibliometricians want to help, that’s cool. But I am not going to let the Fordists figure out academe for me.

With the development of individual-level bibliometrics, we now have the ability — and the interest — to own our own metrics. What we want to avoid at all costs is having metrics take over our world so that they end up steering us rather than us driving them. We don’t want what’s happened with the car to happen with bibliometrics. What we want is to stop at the level at which bibliometrics maximize the power and creativity of individual researchers. Once we standardize metrics, it becomes that much easier to institutionalize them.
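To make ‘owning our own metrics’ slightly more concrete, here is a minimal sketch of an individual-level indicator a researcher could compute for herself from her own citation counts: the h-index. The counts below are made up, and the h-index appears here only as a familiar example of an individual-level metric, not as an endorsement of it.

```python
# A self-service individual-level metric: the h-index.
# A researcher has h-index h if h of her papers have at least h citations each.
# All citation counts below are made up.

def h_index(citations_per_paper):
    """Return the largest h such that h papers have at least h citations each."""
    counts = sorted(citations_per_paper, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

my_citations = [42, 17, 9, 7, 6, 3, 1, 0]  # hypothetical counts for eight papers
print(h_index(my_citations))  # 5: five papers with at least 5 citations each
```

Computing something like this for oneself, and deciding what weight (if any) it deserves, is a rather different activity from having the same number compiled about us and handed to a dean.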

It’s not metrics themselves that we academics should resist. ‘Impact’ is a great opportunity, if we own it. But by all means, we should resist the institutionalization of standardized metrics. A first step is to resist their standardization.

Snowflake Indicators | Postmodern Research Evaluation | Part 5 of ?

No two snowflakes are alike. No two people are the same.

— Horshack

[Image: Snowflakes by Juliancolton2 on flickr]

Earlier posts in this series attempted to lay out the ways in which Snowball Metrics present themselves as a totalizing grand narrative of research evaluation. Along with attempting to establish a “recipe” that anyone can follow — or that everyone must follow? — in order to evaluate research, this grand narrative appeals to the fact that it is based on a consensus as evidence that it is actually fair.

The contrast is between ‘us’ deciding on such a recipe ourselves and having such a recipe imposed on ‘us’ from the outside. ‘We’ decided on the Snowball Metrics recipe based on a consultative method. Everything is on the up and up. Something similar seems to be in the works regarding the use of altmetrics. Personally, I have my doubts about the advisability of standardizing altmetrics.

— But what’s the alternative to using a consultative method to arrive at agreed upon standards for measuring research impact? I mean, it’s either that, or anarchy, or imposition from outside — right?! We don’t want to have standards imposed on us, and we can’t allow anarchy, so ….

Yes, yes, QED. I get it — really, I do. And I don’t have a problem with getting together to talk about things. But must that conversation be methodized? And do we have to reach a consensus?

— Without consensus, it’ll be anarchy!

I don’t think so. I think there’s another alternative we’re not considering. And no, it’s not imposition of standards on us from the ‘outside’ that I’m advocating, either. I think there’s a fourth alternative.

SNOWFLAKE INDICATORS

In contrast to Snowball Metrics, Snowflake Indicators are a delicate combination of science and art (as is cooking, for that matter — something that need not involve simply following a recipe, either! Just a hint for some of those chefs in The Scholarly Kitchen, which sometimes has a tendency to resemble America’s Test Kitchen — a show I watch, along with others, but not so I can simply follow the recipes). Snowflake Indicators also respect individuality. The point is not to mash the snowflakes together — following the 6-step recipe, of course — to form the perfect snowball. Instead, the point is to let the individual researcher appear as such. In this sense, Snowflake Indicators provide answers to the question of researcher identity. ORCID gets this point, I think.

To say that Snowflake Indicators answer the question of researcher identity is not to suggest that researchers ought to be seen as isolated individuals, however. Who we are is revealed in communication with each other. I really like that Andy Miah’s CV includes a section that lists places in which his work is cited as “an indication of my peer community.” This would count as a Snowflake Indicator.

Altmetrics might also do the trick, depending on how they’re used. Personally, I find it useful to see who is paying attention to what I write or say. The sort of information provided by Altmetric.com at the article level is great. It gives some indication of the buzz surrounding an article, and provides another sort of indicator of one’s peer community. That helps an individual researcher learn more about her audience — something that helps communication, and thus helps a researcher establish her identity. Being able to use ImpactStory.org to craft a narrative of one’s impact — and it’s especially useful not to be tied down to a DOI sometimes — is also incredibly revealing. Used by an individual researcher to craft a narrative of her research, altmetrics also count as Snowflake Indicators.
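For what it’s worth, pulling that kind of article-level data for one’s own papers is a small enough task to do oneself. Here is a rough sketch that queries what I take to be Altmetric’s public v1 DOI endpoint (free and rate-limited, as I understand it; the endpoint or its terms may have changed). The DOI is a placeholder rather than a real paper, and the response is printed as-is rather than assuming particular field names.

```python
# Sketch: fetch article-level attention data for one DOI.
# Assumes Altmetric's public v1 API; the DOI below is a placeholder.
import json
import urllib.request

doi = "10.1234/example-doi"  # hypothetical; substitute a DOI of your own paper
url = "https://api.altmetric.com/v1/doi/" + doi

try:
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    # Print whatever the service returns rather than guessing at field names.
    print(json.dumps(data, indent=2))
except Exception as exc:  # a 404 simply means no attention has been recorded
    print("No altmetric data retrieved:", exc)
```

Used that way, the data stay in the researcher’s hands, which is exactly the spirit of a Snowflake Indicator.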

So, what distinguishes a Snowflake Indicator from a Snowball Metric? It’s tempting to say that it’s the level of measurement. Snowball Metrics are intended for evaluation at a department or university-wide level, or perhaps even at a higher level of aggregation, rather than for the evaluation of individual researchers. Snowflake Indicators, at least in the way I’ve described them above, seem to be aimed at the level of the individual researcher, or even at individual articles. I think there’s something to that, though I also think it might be possible to aggregate Snowflake Indicators in ways that respect idiosyncrasies but that would still allow for meaningful evaluation (more on that in a future post — but for a hint, contrast this advice on making snowballs, where humor and fun make a real difference, with the 6-step process linked above).

But I think that difference in scale misses the really important difference. Where Snowball Metrics aim to make us all comparable, Snowflake Indicators aim to point out the ways in which we are unique — or at least special. Research evaluation, in part, should be about making researchers aware of their own impacts. Research evaluation shouldn’t be punitive; it should be instructive — or at least an opportunity to learn. Research evaluation shouldn’t so much seek to steer research as empower researchers to drive their research along the road to impact. Although everyone likes big changes (as long as they’re positive), local impacts should be valued as world-changing, too. Diversity of approaches should also be valued. Any approach to research evaluation that insists we all need to do the same thing is way off track, in my opinion.

I apologize to anyone who was expecting a slick account that lays out the recipe for Snowflake Indicators. I’m not trying to establish rules here. Nor am I insisting that anything goes (there are no rules).  If anything, I am engaged in rule-seeking — something as difficult to grasp and hold on to as a snowflake.

NISO to Develop Standards and Recommended Practices for Altmetrics – National Information Standards Organization

Can we talk about this? Or if I suggest standards are a double-edged sword, will no one listen?

“For altmetrics to move out of its current pilot and proof-of-concept phase, the community must begin coalescing around a suite of commonly understood definitions, calculations, and data sharing practices,” states Todd Carpenter, NISO Executive Director. “Organizations and researchers wanting to apply these metrics need to adequately understand them, ensure their consistent application and meaning across the community, and have methods for auditing their accuracy. We must agree on what gets measured, what the criteria are for assessing the quality of the measures, at what granularity these metrics are compiled and analyzed, how long a period the altmetrics should cover, the role of social media in altmetrics, the technical infrastructure necessary to exchange this data, and which new altmetrics will prove most valuable. The creation of altmetrics standards and best practices will facilitate the community trust in altmetrics, which will be a requirement for any broad-based acceptance, and will ensure that these altmetrics can be accurately compared and exchanged across publishers and platforms.”

“Sensible, community-informed, discipline-sensitive standards and practices are essential if altmetrics are to play a serious role in the evaluation of research,” says Joshua M. Greenberg, Director of the Alfred P. Sloan Foundation’s Digital Information Technology program. “With its long history of crafting just such standards, NISO is uniquely positioned to help take altmetrics to the next level.”

NISO to Develop Standards and Recommended Practices for Altmetrics – National Information Standards Organization.

The post on Snowflake Indicators is coming …

New record: 66 journals banned for boosting impact factor with self-citations : Nature News Blog

More on the Journal Impact Factor from Richard Van Noorden.

Since the journal’s publisher, PLoS, is a signatory of DORA, it probably does not mind [the fall of its Journal Impact Factor].

via New record: 66 journals banned for boosting impact factor with self-citations : Nature News Blog.

My earlier post on DORA is also relevant.

The evidence on the Journal Impact Factor

With the release of the new Journal Impact Factors, everyone should read this post by Paul Wouters at “The Citation Culture.”

The Citation Culture

The San Francisco Declaration on Research Assessment (DORA), see our most recent blogpost, focuses on the Journal Impact Factor, published in the Web of Science by Thomson Reuters. It is a strong plea to base research assessments of individual researchers, research groups and submitted grant proposals not on journal metrics but on article-based metrics combined with peer review. DORA cites a few scientometric studies to bolster this argument. So what is the evidence we have about the JIF?

In the 1990s, the Norwegian researcher Per Seglen, based at our sister institute the Institute for Studies in Higher Education and Research (NIFU) in Oslo and a number of CWTS researchers (in particular Henk Moed and Thed van Leeuwen) developed a systematic critique of the JIF, its validity as well as the way it is calculated (Moed & Van Leeuwen, 1996; Moed & Leeuwen, 1995; Seglen, 1997). This line of research…


Evaluating Research beyond Scientific Impact: How to Include Criteria for Productive Interactions and Impact on Practice and Society

New, Open Access article just published.

Authors: Wolf, Birge; Lindenthal, Thomas; Szerencsits, Manfred; Holbrook, J. Britt; Heß, Jürgen

Source: GAIA – Ecological Perspectives for Science and Society, Volume 22, Number 2, June 2013, pp. 104-114(11)

Abstract:

Currently, established research evaluation focuses on scientific impact – that is, the impact of research on science itself. We discuss extending research evaluation to cover productive interactions and the impact of research on practice and society. The results are based on interviews with scientists from (organic) agriculture and a review of the literature on broader/social/societal impact assessment and the evaluation of interdisciplinary and transdisciplinary research. There is broad agreement about what activities and impacts of research are relevant for such an evaluation. However, the extension of research evaluation is hampered by a lack of easily usable data. To reduce the effort involved in data collection, the usability of existing documentation procedures (e.g., proposals and reports for research funding) needs to be increased. We propose a structured database for the evaluation of scientists, projects, programmes and institutions, one that will require little additional effort beyond existing reporting requirements.

Peer Evaluation: Evaluating Research beyond Scientific Impact: How to Include Criteria for Productive Interactions and Impact on Practice and Society.
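The article spells out the authors’ actual proposal; purely as an illustration of the general shape a database like the one the abstract describes might take, here is a minimal sketch. Every class, field, and example value below is my own invention for illustration, not something drawn from the article.

```python
# Illustrative sketch only: a possible minimal data model for a structured
# database covering researchers, projects, productive interactions, and
# societal impacts. All names and example values are invented.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProductiveInteraction:
    partner: str            # e.g., an advisory body, farmers' network, or NGO
    kind: str               # e.g., "consultation", "joint project", "training"
    description: str = ""

@dataclass
class SocietalImpact:
    domain: str             # e.g., "practice", "policy", "environment"
    description: str = ""

@dataclass
class Project:
    title: str
    funder: str = ""
    interactions: List[ProductiveInteraction] = field(default_factory=list)
    impacts: List[SocietalImpact] = field(default_factory=list)

@dataclass
class Researcher:
    name: str
    institution: str
    projects: List[Project] = field(default_factory=list)

# Invented example: one researcher, one project, one interaction, one impact.
example = Researcher(
    name="A. Researcher",
    institution="Example University",
    projects=[Project(
        title="Crop rotation in organic farming",
        interactions=[ProductiveInteraction(partner="Regional farmers' association",
                                            kind="consultation")],
        impacts=[SocietalImpact(domain="practice",
                                description="Revised rotation guidance for member farms")],
    )],
)
print(example)
```

The point of such a structure, as the abstract suggests, would be that most of these entries could be filled in from documentation researchers already produce for proposals and reports.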