Developing Metrics for the Evaluation of Individual Researchers – Should Bibliometricians Be Left to Their Own Devices?

So, I am sorry to have missed most of the Atlanta Conference on Science and Innovation Policy. On the other hand, I wouldn’t trade my involvement with the AAAS Committee on Scientific Freedom and Responsibility for any other academic opportunity. I love the CSFR meetings, and I think we may even be able to make a difference occasionally. I always leave the meetings energized and thinking about what I can do next.

That said, I am really happy to be on my way back to the ATL to participate in the last day of the Atlanta Conference. Ismael Rafols asked me to participate in a roundtable discussion with Cassidy Sugimoto and him (to be chaired by Diana Hicks). Like I’d say ‘no’ to that invitation!

The topic will be the recent discussions among bibliometricians of the development of metrics for individual researchers. That sounds like a great conversation to me! Of course, when I indicated to Ismael that I was basically against the idea of bibliometricians coming up with standards for individual-level metrics, Ismael laughed and said the conversation should be interesting.

I’m not going to present a paper; just some thoughts. But I did start writing on the plane. Here’s what I have so far:

Bibliometrics are increasingly being used in ways that go beyond their design, and bibliometricians are increasingly asking how they should react to such unintended uses of the tools they developed. The issue of unintended consequences – especially of technologies designed with one purpose in mind, but which can be repurposed – is not new, of course. And bibliometricians have been asking questions – ethical questions, but also policy questions – essentially since the beginning of the development of bibliometrics. If anyone is sensitive to the fact that numbers are not neutral, it is surely the bibliometricians.

This sensitivity to numbers, however, especially when combined with great technical skill and large data sets, can also be a weakness. Bibliometricians are aware of this phenomenon, too, though perhaps to a lesser degree than one might like. There are exceptions. The discussion by Paul Wouters, Wolfgang Glänzel, Jochen Gläser, and Ismael Rafols regarding this “urgent debate in bibliometrics” is one indication of such awareness. The recent sessions at ISSI in Vienna and STI2013 in Berlin on which Wouters et al. report are further indications that the bibliometrics community feels a sense of urgency, especially with regard to the question of measuring the performance of individual researchers.

That such questions are being raised and discussed by bibliometricians is certainly a positive development. One cannot fault bibliometricians for wanting to take responsibility for the unintended consequences of their own inventions. But one – I would prefer to say ‘we’ – cannot allow this responsibility to be assumed only by members of the bibliometrics community.

It’s not so much that I want to blame them for not having thought through possible other uses of their metrics — though one might hold them to what Carl Mitcham calls a duty plus respicere: to take into account more than the purpose for which something was initially designed. It’s more that I don’t want to leave it to them to try to fix things. Bibliometricians, after all, are a disciplinary community. They have standards; but I worry they also think their standards ought to be the standards. That’s the same sort of naivety that got us into this mess in the first place.

Look, if you’re going to invent a device someone else can command (deans and provosts with research evaluation metrics are like teenagers driving cars), you ought at least to have thought about how those others might use it in ways you didn’t intend. But since you didn’t, don’t try to come in now with your standards as if you know best.

Bibliometrics are not the province of bibliometricians anymore. They’re part of academe. And we academics need to take ownership of them. We shouldn’t let administrators drive in our neighborhoods without some sort of oversight. We should learn to drive ourselves so we can determine the rules of the road. If the bibliometricians want to help, that’s cool. But I am not going to let the Fordists figure out academe for me.

With the development of individual-level bibliometrics, we now have the ability — and the interest — to own our own metrics. What we want to avoid at all costs is having metrics take over our world, so that they end up steering us rather than us driving them. We don’t want what’s happened with the car to happen with bibliometrics. What we want is to stop at the level at which individual-level bibliometrics maximize the power and creativity of individual researchers. Once we standardize metrics, it becomes that much easier to institutionalize them.

It’s not metrics themselves that we academics should resist. ‘Impact’ is a great opportunity, if we own it. But by all means, we should resist the institutionalization of standardized metrics. A first step is to resist their standardization.

Altmetrics — Meretricious or Meritorious?

I think Jeffrey Beall has got this wrong. He claims that altmetrics are an “Ill-conceived and Meretricious Idea.”

On the other hand, I think Euan Adie has got this right. Here is his measured response (sorry, couldn’t resist) to Beall.

So, I come down on the meritorious side. Of course, none of this is to say that altmetrics are without flaws. But one thing they are decidedly good for is connecting academic researchers to those who read their research. That’s what scholarly communication is all about, in my book (sorry again).

On being tweeted to the top

Our results show that scientists who interacted more frequently with journalists had higher h-indices, as did scientists whose work was mentioned on Twitter. Interestingly, however, our data also showed an amplification effect: interactions with journalists had a significantly higher impact on h-index for those scientists who were also mentioned on the micro-blogging platform than for those who were not, suggesting that social media can further amplify the impact of more traditional outlets.

via Opinion: Tweeting to the Top | The Scientist Magazine®.

I’m surprised there’s no mention of altmetrics in this piece. The results are clearly relevant for altmetrics, though. I’m hoping they’ve published the study in more detail somewhere so I can check it out.

In any case, this is a quick and thought-provoking read — though it appears that it’s more about being tweeted to the top than tweeting to the top (the two are interestingly related, however, I suspect).

Snowflake Indicators | Postmodern Research Evaluation | Part 5 of ?

No two snowflakes are alike. No two people are the same.

— Horshack

[Image: Snowflakes by Juliancolton2 on flickr]

Earlier posts in this series attempted to lay out the ways in which Snowball Metrics present themselves as a totalizing grand narrative of research evaluation. Along with attempting to establish a “recipe” that anyone can follow — or that everyone must follow? — in order to evaluate research, this grand narrative appeals to the fact that it is based on consensus to suggest that it is fair.

The contrast is between ‘us’ deciding on such a recipe ourselves and having one imposed on ‘us’ from the outside. ‘We’ decided on the Snowball Metrics recipe based on a consultative method. Everything is on the up and up. Something similar seems to be in the works regarding the use of altmetrics. Personally, I have my doubts about the advisability of standardizing altmetrics.

— But what’s the alternative to using a consultative method to arrive at agreed upon standards for measuring research impact? I mean, it’s either that, or anarchy, or imposition from outside — right?! We don’t want to have standards imposed on us, and we can’t allow anarchy, so ….

Yes, yes, QED. I get it — really, I do. And I don’t have a problem with getting together to talk about things. But must that conversation be methodized? And do we have to reach a consensus?

— Without consensus, it’ll be anarchy!

I don’t think so. I think there’s another alternative we’re not considering. And no, it’s not imposition of standards on us from the ‘outside’ that I’m advocating, either. I think there’s a fourth alternative.

SNOWFLAKE INDICATORS

In contrast to Snowball Metrics, Snowflake Indicators are a delicate combination of science and art (as is cooking, for that matter — something that ought not necessarily involve following a recipe, either! Just a hint for some of those chefs in The Scholarly Kitchen, which sometimes has a tendency to resemble America’s Test Kitchen — a show I watch, along with others, but not so I can simply follow the recipes.). Snowflake Indicators also respect individuality. The point is not to mash the snowflakes together — following the 6-step recipe, of course — to form the perfect snowball. Instead, the point is to let the individual researcher appear as such. In this sense, Snowflake Indicators provide answers to the question of researcher identity. ORCID gets this point, I think.

To say that Snowflake Indicators answer the question of researcher identity is not to suggest that researchers ought to be seen as isolated individuals, however. Who we are is revealed in communication with each other. I really like that Andy Miah’s CV includes a section that lists places in which his work is cited as “an indication of my peer community.” This would count as a Snowflake Indicator.

Altmetrics might also do the trick, depending on how they’re used. Personally, I find it useful to see who is paying attention to what I write or say. The sort of information provided by Altmetric.com at the article level is great. It gives some indication of the buzz surrounding an article, and provides another sort of indicator of one’s peer community. That helps an individual researcher learn more about her audience — something that helps communication, and thus helps a researcher establish her identity. Being able to use ImpactStory.org to craft a narrative of one’s impact — and it’s especially useful not to be tied down to a DOI sometimes — is also incredibly revealing. Used by an individual researcher to craft a narrative of her research, altmetrics also count as Snowflake Indicators.

So, what distinguishes a Snowflake Indicator from a Snowball Metric? It’s tempting to say that it’s the level of measurement. Snowball Metrics are intended for evaluation at a department or university-wide level, or perhaps even at a higher level of aggregation, rather than for the evaluation of individual researchers. Snowflake Indicators, at least in the way I’ve described them above, seem to be aimed at the level of the individual researcher, or even at individual articles. I think there’s something to that, though I also think it might be possible to aggregate Snowflake Indicators in ways that respect idiosyncrasies but that would still allow for meaningful evaluation (more on that in a future post — but for a hint, contrast this advice on making snowballs, where humor and fun make a real difference, with the 6-step process linked above).

But I think that difference in scale misses the really important difference. Where Snowball Metrics aim to make us all comparable, Snowflake Indicators aim to point out the ways in which we are unique — or at least special. Research evaluation, in part, should be about making researchers aware of their own impacts. Research evaluation shouldn’t be punitive; it should be instructive — or at least an opportunity to learn. Research evaluation shouldn’t so much seek to steer research as empower researchers to drive their research along the road to impact. Although everyone likes big changes (as long as they’re positive), local impacts should be valued as world-changing, too. Diversity of approaches should also be valued. Any approach to research evaluation that insists we all need to do the same thing is way off track, in my opinion.

I apologize to anyone who was expecting a slick account that lays out the recipe for Snowflake Indicators. I’m not trying to establish rules here. Nor am I insisting that anything goes (there are no rules).  If anything, I am engaged in rule-seeking — something as difficult to grasp and hold on to as a snowflake.

NISO to Develop Standards and Recommended Practices for Altmetrics – National Information Standards Organization

Can we talk about this? Or if I suggest standards are a double-edged sword, will no one listen?

“For altmetrics to move out of its current pilot and proof-of-concept phase, the community must begin coalescing around a suite of commonly understood definitions, calculations, and data sharing practices,” states Todd Carpenter, NISO Executive Director. “Organizations and researchers wanting to apply these metrics need to adequately understand them, ensure their consistent application and meaning across the community, and have methods for auditing their accuracy. We must agree on what gets measured, what the criteria are for assessing the quality of the measures, at what granularity these metrics are compiled and analyzed, how long a period the altmetrics should cover, the role of social media in altmetrics, the technical infrastructure necessary to exchange this data, and which new altmetrics will prove most valuable. The creation of altmetrics standards and best practices will facilitate the community trust in altmetrics, which will be a requirement for any broad-based acceptance, and will ensure that these altmetrics can be accurately compared and exchanged across publishers and platforms.”

“Sensible, community-informed, discipline-sensitive standards and practices are essential if altmetrics are to play a serious role in the evaluation of research,” says Joshua M. Greenberg, Director of the Alfred P. Sloan Foundation’s Digital Information Technology program. “With its long history of crafting just such standards, NISO is uniquely positioned to help take altmetrics to the next level.”

NISO to Develop Standards and Recommended Practices for Altmetrics – National Information Standards Organization.

The post on Snowflake Indicators is coming …

Should we develop an alt-H-index? | Postmodern Research Evaluation | Part 4 of ?

In the last post in this series, I promised to present an alternative to Snowball Metrics — something I playfully referred to as ‘Snowflake Indicators’ in an effort to distinguish what I am proposing from the grand narrative presented by Snowball Metrics. But two recent developments have sparked a related thought that I want to pursue here first.

This morning, a post on the BMJ blog asks the question: Who will be the Google of altmetrics? The suggestion that we should have such an entity comes from Jason Priem, of course. He’s part of the altmetrics avant garde, and I always find what he has to say on the topic provocative. The BMJ blog post is also worth reading to get the lay of the land regarding the leaders of the altmetrics push.

Last Friday, the editors of the LSE Impact of Social Sciences blog contacted me and asked whether they might replace our messy ’56 indicators of impact’ with a cleaned-up and clarified version. I asked them to add it in, without simply replacing our messy version with their clean version, and they agreed. You can see the updated post here. I’ll come back to this later in more detail. For now, I want to ask a different, though related, question.

COULD WE DEVELOP AN ALT-H-INDEX?

The H-index is meant to be a measure of the productivity and impact of an individual scholar’s research on other researchers, though recently I’ve seen it applied to journals as well. The original idea is to find the largest number h such that a researcher has h publications each cited at least h times. Of course, one’s actual H-index will vary depending on the citation database one is using. According to Scopus, for instance, my H-index is 4. A quick look at my ResearcherID shows that my H-index there would be 1. And if we look at Google Scholar, we see that my H-index is 6. Differences such as these — and the related question of the value of metrics such as the H-index — are the subject of research currently being performed by Kelli Barr (one of our excellent UNT/CSID graduate students).

Now, if it’s clear enough how the H-index is generated … well, let’s move on for the moment.

How would an alt-H-index be generated?

There are several alternatives here. But let’s pursue the one that most closely parallels the way the H-index is generated. So, let’s substitute products for articles and mentions for citations. One’s alt-H-index would then be the largest number P such that one has P products with at least P mentions each across the sources tracked by altmetricians.
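Just to make the parallel concrete, here is a minimal sketch, in Python, of how such an h-style index might be computed. It is purely my own illustration: the function name, and the idea of feeding it mention counts rather than citation counts, are my assumptions rather than anyone’s official method.

```python
# Minimal sketch of an h-style index. The same calculation works for the
# H-index (citation counts per publication) or a hypothetical alt-H-index
# (mention counts per product). The counts themselves come from whatever
# source you trust (Scopus, Google Scholar, Impact Story, Altmetric, etc.),
# which is exactly where the ambiguity creeps in.

def h_style_index(counts):
    """Return the largest h such that at least h items have >= h counts each."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h
```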

I don’t have time at the moment to calculate my full alt-H-index. But let’s go with some things I have been tracking: my recent correspondence piece in Nature, the most recent LSE Impact of Social Sciences blog post (linked above), and my recently published article in Synthese on “What Is Interdisciplinary Communication?” [Of course, limiting myself to 3 products would mean that my alt-H-index couldn’t go above 3 for the purposes of this illustration.]

According to Impact Story, the correspondence piece in Nature has received 41 mentions (26 tweets, 6 Mendeley readers, and 9 CiteULike bookmarks). The LSE blog post has received 114 mentions (113 tweets and 1 bookmark). And the Synthese paper has received 5 mentions (5 tweets). So, my alt-H-index would be 3, according to Impact Story.

According to Altmetric, the Nature correspondence has received 125 mentions (96 tweets, 9 Facebook posts/shares, 3 Google+ shares, blogged by 11, and 6 CiteULike bookmarks), the LSE Blog post cannot be measured, and the Synthese article has 11 mentions (3 tweets, 3 blogs, 1 Google+, 2 Mendeley, and 2 CiteULike). So, my alt-H-index would be 2, according to Altmetric data.
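Plugging the mention counts reported above into that little h_style_index sketch confirms the arithmetic (treating the unmeasurable LSE post as zero for Altmetric, which is itself a judgment call):

```python
# Mention counts per product: Nature correspondence, LSE blog post, Synthese article.
impact_story_counts = [41, 114, 5]
altmetric_counts = [125, 0, 11]  # the LSE post has no DOI, so Altmetric reports nothing for it

print(h_style_index(impact_story_counts))  # -> 3
print(h_style_index(altmetric_counts))     # -> 2
```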

Comparing H-index and alt-H-index

So, as I note above, I’ve limited the calculations of my alt-H-index to three products. I have little doubt that my alt-H-index is considerably higher than my H-index — and would be so for most researchers who are active on social media and who publish in alt-academic venues, such as scholarly blogs (or, if you’re really cool like my colleague Adam Briggle, in Slate), or for fringe academics, such as my colleague Keith Brown, who typically publishes almost exclusively in non-scholarly venues.

This illustrates a key difference between altmetrics and traditional bibliometrics. Altmetrics are considerably faster than traditional bibliometrics. It takes a long time for one’s H-index to go up. ‘Older’ researchers typically have higher H-indices than ‘younger’ researchers. I suspect that ‘younger’ researchers may well have higher alt-H-indices, since ‘younger’ researchers tend to be more active on social media and more prone to publish in the sorts of alt-academic venues mentioned above.

But there are also some interesting similarities. First, it makes a difference where you get your data. My H-index is 4, 1, or 6, depending on whether we use data from Scopus, Web of Science, or Google Scholar. My incomplete alt-H-index is either 3 or 2, depending on whether we use data from Impact Story or Altmetric. An interesting side note, which ties in with the question of the Google of altmetrics: the reason my alt-H-index differs between Impact Story and Altmetric is that Altmetric requires a DOI. With Impact Story, you can import URLs, which makes it considerably more flexible for certain products. In that respect, at least, Impact Story is more like Google Scholar — it covers more — whereas Altmetric is more like Scopus. That’s a sweeping generalization, but I think it’s basically right, in this one respect.

But these differences raise the more fundamental question, and one that serves as the beginning of a response to the update of my LSE Impact of Social Sciences blog piece:

SHOULD WE DEVELOP AN ALT-H-INDEX?

It’s easy enough to do it. But should we? Asking this question means exploring some of the larger ramifications of metrics in general — the point of my LSE Impact post. If we return to that post now, I think it becomes obvious why I wanted to keep our messy list of indicators alongside the ‘clarified’ list. The LSE-modified list divides our 56 indicators into two lists: one of ‘50 indicators of positive impact’ and another of ‘6 more ambiguous indicators of impact’. Note that the H-index is included on the ‘indicators of positive impact’ list. The idea that there is a clear boundary between ‘indicators of positive impact’ and ‘more ambiguous indicators of impact’ — or ‘negative metrics’, as the Nature editors suggested — is precisely the sort of thinking our messy list of 56 indicators is meant to undermine.

H-index is ambiguous. It embodies all sorts of value judgments. It’s not a simple matter of working out the formula. The numbers that go into the formula will differ, depending on the data source used (Scopus, Web of Science, or Google Scholar), and these data also depend on value judgments. Metrics tend to be interpreted as objective. But we really need to reexamine what we mean by this. Altmetrics are the same as traditional bibliometrics in this sense — all metrics rest on prior value judgments.

As we note at the beginning of our Nature piece, articles may be cited for ‘positive’ or ‘negative’ reasons. More citations do not always mean a more ‘positive’ reception for one’s research. Similarly, a higher H-index does not always mean that one’s research has been more ‘positively’ received by peers. The simplest thing it means is that one has been at it longer. But even that is not necessarily the case. Similarly, a higher alt-H-index probably means that one has more social media influence — which, we must realize, is ambiguous. It’s not difficult to imagine that quite a few ‘more established’ or more traditional researchers could interpret a higher alt-H-index as indicating a lack of serious scholarly impact.

Here, then, is the bottom line: there are no unambiguously positive indicators of impact!

I will, I promise, propose my Snowflake Indicators framework as soon as possible.

New Kids on the Bibliometrics Block | Altmetric.com

New Kids on the Bibliometrics Block | Altmetric.com.

Other infrequently asked questions about impact

Here are some other infrequently asked questions about impact that didn’t make it into the final cut of my piece at the LSE Impact of Social Sciences Blog.

Why conflate impact with benefit?

Put differently, why assume that all impacts are positive or benefits to society? Obviously, no one wants publicly supported research not to benefit the public. It’s even less palatable to consider that some publicly supported research may actually harm the public. But it’s wishful thinking to assume that all impacts are beneficial. Some impacts that initially appear beneficial may have negative consequences. And seemingly negative indicators might actually show that one is having an impact – even a positive one. I discuss this point with reference to Jeffrey Beall, recently threatened with a $1 billion lawsuit, here.

The question of impact is an opportunity to discuss such issues, rather than retreating into the shelter of imagined value-neutrality or objectivity. It was to spark this discussion that we generated a CSID-specific list – it is purposely idiosyncratic.

How can we maximize our impact?

I grant that ‘How can we maximize our impact?’ is a logistical question; but it incorporates a healthy dose of logos. Asking how to maximize our impacts should appeal to academics. We may be choosy about the sort of impact we desire and on whom; but no one wants to have minimal impact. We all desire to have as much impact as possible. Or, if we don’t, please get another job and let some of us who do want to make a difference have yours.

Wherefore impact?

For what reason are we concerned with the impact of scholarly communication? It’s the most fundamental question we should be asking and answering. We need to be mindful that whatever metrics we devise will have a steering effect on the course of scholarly communications. If we are going to steer scholarly communications, then we should discuss where we plan to go – and where others might steer us.

Developing indicators of the impact of scholarly communication is a massive technical challenge – but it’s also much simpler than that | Impact of Social Sciences

Developing indicators of the impact of scholarly communication is a massive technical challenge – but it’s also much simpler than that | Impact of Social Sciences.

In which I expand on ideas presented here and here.

Postmodern Research Evaluation? | Part 1 of ?

This will be the first in a series of posts tagged ‘postmodern research evaluation’ — a series meant to be critical and normative, expressing my own, subjective opinions on the question.

Before I launch into any definitions, take a look at this on ‘Snowball Metrics’. Reading just the first few pages should help orient you to where I am coming from. It’s a place from which I hope to prevent such an approach to metrics from snowballing — a good place, I think, for a snowball fight.

Read the opening pages of the snowball report. If you cannot see this as totalizing — in a very bad way — then we see things very differently. Still, I hope you read on, my friend. Perhaps I still have a chance to prevent the avalanche.