Monday, November 8, 2010

Cite as you like - I

Like I mentioned but six months ago, I'm baa-aack. So here I am. Not in the least tardy.

[Warning: Serious post.]

It is quite fashionable to rail against the peer review system and all associated evils these days, and with good reason. There are not just chinks in its armour but gaping holes, even if perhaps not big enough to be on the next installment of Jon Stewart & Anderson Cooper Look at Gaping Holes. (Yes yes, I am mixing metaphors but I had to segue that in somehow.)

Daniel Lemire, for one, does a great job of highlighting the issues involved. (His September post, "How Reliable is Science", is recommended reading for context.) He also offers some nice solutions, such as trusting unimpressive results more, being suspicious of popular trends, and running your own experiments. While these are good researcher-level solutions (and one can argue that ‘good’ researchers already follow most of them), there are perhaps systemic changes that could be tried to make things better.

Stepping back a little, it's useful to ask how success is measured in scientific research. Sure, there are prestigious awards and scholarships, and then there is success in simply obtaining funding for your research. But fundamental to success is publication, preferably the peer-reviewed kind. So one could say that a fairly basic measure of success in research is the number of publications you have in hand. As with anything that basic, the measure is very fallible. Anyone can (and lots of people have) publish a large number of fairly banal, hardly original papers, accumulate a long list of publications, and then claim to be good at what they do.

The next measure, citations, was in part intended to be a better measure of success. A citation can be loosely defined as a reference to a previously published work. To the best of my knowledge, citations came into more frequent (and perhaps more formal) use in the 20th century, as the sheer body of science grew in girth at an alarming rate, such that few (if any) achievements in science could stand on their own without being connected to past work. Citations as a measure of success rely on the fairly simple idea that the "better" your work and the more "important" your findings, the more citations you will receive. This in turn led to many wonderful things like the h-index and the g-index, which combine the number of papers one has published and the number of citations they have received in a rather clever manner. (I'm partial to the former.)
Thus, while the numbers of publications and citations are fairly good measures of success and impact, there is little to support the notion that they are good measures of scientific integrity or of the reliability of someone's research. Most people simply expect the peer-review system and other (fairly marginal) mechanisms to take care of that.
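For readers who haven't met the two indices mentioned above, here is a minimal sketch of how they are usually computed from a list of per-paper citation counts. The h-index is the largest h such that h papers each have at least h citations; the g-index is the largest g such that the top g papers together have at least g² citations. The function names are illustrative, and this version of the g-index does not pad the list with zero-citation papers, so g never exceeds the number of papers.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only fall from here on, so no later rank can qualify
    return h


def g_index(citations: list[int]) -> int:
    """Largest g such that the top g papers together have at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    total = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites  # running sum of citations over the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g


if __name__ == "__main__":
    papers = [10, 8, 5, 4, 3]  # made-up citation counts for five papers
    print(h_index(papers), g_index(papers))
```

With those five made-up papers the h-index is 4 (the fifth paper has only 3 citations), while the g-index is 5, because the heavily cited top papers "lend" citations to the weaker ones. That difference is exactly why the g-index rewards a few blockbuster papers more than the h-index does.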

Among Lemire's complaints about peer review are that reviewers do not reproduce results and, separately, that "[c]itations are not validations" and that impressive results are more likely to be cited. Given that citations are but references, certainly not all of them are validations, but I hope no one takes issue when I say that some are. So if I were to broadly classify the types of citations, there would be:

A) Citations in reviews, and in the introduction or literature-review sections of papers. These citations serve mainly a chronicling function: the more often a work is cited, the more people consider it a significant development in the field. It is also unlikely that even so much as a developed opinion accompanies the mention of said work.
These are the citations most susceptible to intellectual reach-arounds. Because of the overwhelming notion that citations are paramount amongst all directives of humankind, besides making babbies, there’s every incentive for researchers to cite each other’s work to no end. This clustershag is perhaps the biggest problem with citations.
B) Citations that help authors build a story. Here the results of a published work relate in some way to the authors' own, and the published work helps them tell their story of how the world works. These citations usually appear in discussion sections: work that agrees with the authors' worldview gets cited, as does work that goes against it (when the authors think themselves capable of explaining the disagreement).
These citations are the most likely to feed growing popular trends and big-picture analyses. Trends and big-picture analyses are essential to science, pulling provincial minds back from the minutiae of their specializations, but the simplest advice for researchers is: maintain your own big picture. The extra-large ego for which scientists are stereotypically infamous can no doubt help in mistrusting someone else’s grand narrative.
C) Citations where a technique or an experiment from the cited work is used. The rarest of them all, and by far the most meaningful. Consider all that this entails: it means that the citer (or, more likely, his little-paid bonded labourer) has read the (no doubt fascinating) Methods section, and most likely the unkempt wards of the Supplementary Information documents as well. Note, too, that the citer has used this technique or experiment (with reasonable success, let’s assume), which means he has read all the relevant information, identified the usual holes in the description of the technique (present in any paper that isn't a methodology paper), and tinkered with the system until something worked well enough to produce results. Being true to the spirit of our cynical selves, we can conclude from such a citation only that the cited work is good enough to be ripped off by another researcher.

It is now, good reader, that I make my small recommendation. As things stand today, we already distinguish between self- and non-self-citations in research. What if we were to go beyond that and classify citations in the aforementioned way? Citations of type A would act as a measure of apparent significance and reach, type B as a measure of general agreement with the cited work’s findings, and type C as a measure of true repeatability and scientific rigour. It might also be useful to divide type B into positive and negative citations, to get a better sense of how validated the work is. The measures would all be flawed and riddled with caveats, of course, but perhaps less so than a simple count of total citations. We would thus be better placed than we are right now to observe both the impact of research and its validity and genuineness.


The resources currently devoted to exposing scientific fraud are next to nothing. This system of classifying citations could perhaps generate selection criteria for investigating scientific fraud. If a piece of work has few citations in total, it is either in a niche area or not considered important by the scientific community. Such work could effectively be ignored by any watchdogs; low citation counts would also mean a lower impact of any potential fraud. Among well-cited papers, those whose type A citations vastly outnumber their type B and C citations are likely over-hyped and have a greater potential for scientific fraud.
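To make the screening idea concrete, here is a minimal sketch of how a watchdog might apply it, assuming each paper's incoming citations have already been labelled with the types A, B (split into positive and negative) and C described above. The `CitationType` labels, the function names, and both thresholds (`MIN_TOTAL_CITATIONS`, `A_DOMINANCE_RATIO`) are hypothetical and chosen only for illustration; the post proposes the criteria, not any particular numbers.

```python
from collections import Counter
from enum import Enum


class CitationType(Enum):
    A = "chronicling"             # reviews, introductions, literature sections
    B_POS = "story, agreeing"     # discussion-section citations that agree
    B_NEG = "story, disagreeing"  # discussion-section citations that disagree
    C = "method reused"           # technique or experiment actually used


# Hypothetical thresholds, for illustration only.
MIN_TOTAL_CITATIONS = 50  # below this, a paper is treated as niche or low-impact
A_DOMINANCE_RATIO = 10.0  # type A must outnumber B + C by this factor


def citation_profile(citations: list[CitationType]) -> Counter:
    """Count how many citations of each type a paper has received."""
    return Counter(citations)


def flag_for_review(citations: list[CitationType]) -> bool:
    """True if a paper is well cited but its citations are overwhelmingly type A."""
    if len(citations) < MIN_TOTAL_CITATIONS:
        return False  # low impact: watchdogs can skip it
    profile = citation_profile(citations)
    substantive = (profile[CitationType.B_POS] + profile[CitationType.B_NEG]
                   + profile[CitationType.C])
    # max(..., 1) avoids dividing by zero when nobody has built on the work at all
    return profile[CitationType.A] / max(substantive, 1) >= A_DOMINANCE_RATIO


if __name__ == "__main__":
    hyped = [CitationType.A] * 60 + [CitationType.B_POS] * 3 + [CitationType.C]
    solid = ([CitationType.A] * 30 + [CitationType.B_POS] * 15
             + [CitationType.B_NEG] * 5 + [CitationType.C] * 10)
    print(flag_for_review(hyped), flag_for_review(solid))
```

A ratio, rather than a raw count of type A citations, is used here so that a paper cited heavily in introductions is not flagged as long as a reasonable share of its citers also engaged with its results or reused its methods.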

Lemire also mentions the rise of Open Scholarship, which allows greater outsider participation in research. Complementary to this, I think, is the rise of PubMed in the world of medicine. Not only is it a very search-friendly database of publications, but since 2008 the National Institutes of Health has required that peer-reviewed papers resulting from its funding be made freely available online through PubMed Central within twelve months of publication.

PS. Please don't kill me for the terrible title. I'm really bad at them. 


Pixie said...

I have successfully published only one paper, and had one that failed utterly to become a paper, and I am still pretty clueless about how to use citations in a paper. Even in the papers I've read, citations mostly fall into category A) and come with all the problems you've listed. Makes me want to believe the meme that people who cite works in their papers haven't even read them.

On a side note, how cool is PubMed? Mom occasionally types in "Khosla A H" just to get an ego boost. Sigh.

PS said...

Well, a couple of points. The lack of type B and C citations could be read as a commentary on the fragmented nature of research in your field (assuming that your published paper is representative of the field :)).

I'm far from an expert on what happens in electrical engineering research, but I think the emphasis on complete referencing is higher in the sciences. I suspect this is partly because engineering research may draw on more "products" and fewer publications, and partly because of a pervasive conference culture that I want to comment on at length in another post.

PS said...

Also, I can't really claim familiarity with your paper, but I find it hard to believe that you can do much research today without significant type C citations.