Monday, November 15, 2010

Cite as you like - II

Staying on the topic of the previous post, I was thinking about how citations work in different fields of research.

The poor quality of citations in a field appears to be closely linked to how search-friendly the literature in that field is. If you have to expend as much effort collecting all the papers relevant to your work as you would in actually doing the work itself, you are unlikely to find many papers with excellent referencing in that field. The problem with Google being your main search tool is that you can never claim to have the definitive collection of literature on any topic. For example, I've spent a good amount of time these past few months looking for literature on climate change in India. Every week I find something new in some obscure corner of the internet. What I do find usually lacks good references; the few references that do occur are either self-citations or incestuous in nature; and the papers end up containing more opinion and leaps of logic than real, hard science.

Who can blame them? No one really knows what the state of the art is, and the few who claim to are almost always liars and charlatans. 'Expert opinion' replaces good research. The collective big picture takes a back seat to an individual's myopic worldview.

As I've mentioned before, the bio and medical sciences have PubMed. The sciences and some of the social sciences have the ISI Web of Knowledge (not free), which is excellent for advanced searching and for checking citations. Then there are sites like arXiv and SciFinder, to name a few.

India-specific research has nothing. India-centric social sciences, even less. Compared to the cost of funding research, the money required to develop a good search tool is negligible. Consider the idea of creating a new repository website for India-specific research. Suppose all governmental and foreign funding agencies mandate that at least the abstract of every piece of funded work be put up on the search website, at least until it becomes a habit. We (hopefully) start building a one-stop shop for the field.

Following the train of thought is not hard: increase the ease of access to your work > more people read it > more people cite it > your work has higher impact > more people want to sleep with you. Flawless logic. Augment this approach with funds and a system to mine the interwebs for a good fraction of the prior work as well, and we're in business.

Google Scholar is not quite the scholar's Google. Something else can be, though.

PS. I did a lousy job of citing my own sources in the last post. Mea culpa. Many thanks to KVM and Vatsa for sharing Lemire's posts on the peer review system and academic fraud on Google Reader.

PPS. I'm not very sure how things work in engineering. I remember hearing that a journal with an impact factor of around 3 in chemical engineering is usually a great one to publish in, while the good bioscience journals seem to have impact factors higher than 10. The research output by itself is far greater in the biosciences than in any field of engineering, but I wonder if a poor search system and dubious conference submissions are also factors in poor citations and hence poor impact factors.

Monday, November 8, 2010

Cite as you like - I

Like I mentioned but six months ago, I'm baa-aack. So here I am. Not in the least tardy.

[Warning: Serious post.]

It is quite fashionable to rail against the peer review system and all associated evils these days, and with good reason. There are not just chinks in its armour but gaping holes, even if perhaps not big enough to be on the next installment of Jon Stewart & Anderson Cooper Look at Gaping Holes. (Yes yes, I am mixing metaphors but I had to segue that in somehow.)

Daniel Lemire, for one, does a great job of highlighting the issues involved. (His post from September titled "How Reliable is Science" is recommended reading for the context of this article.) He also goes on to provide some nice solutions, like trusting unimpressive results more, being suspicious of popular trends, and running your own experiments. While these are good researcher-level solutions (and one could argue that ‘good’ researchers already follow most of them), there are perhaps systemic changes that could be tried to make things better.

Stepping back a little, it's useful to ask how success is measured in scientific research. Sure, there are prestigious awards and scholarships, and there is success in simply obtaining funding for your research. But fundamental to success is publication, preferably the peer-reviewed kind. So one could say that a fairly basic measure of success in research is the number of publications you have to your name. As with anything that basic, the measure is very fallible. Anyone can (and lots of people have) publish a large volume of banal, hardly original research, accumulate a long list of publications, and then claim to be good at what they do.

The next measure, citations, was in part aimed at being a better measure of success. A citation can be loosely defined as a reference to a previously published work. To the best of my knowledge, citations started being used more often (and perhaps more formally) in the 20th century, as the sheer body of science grew at an alarming rate, to the point that few (or no) achievements in science could stand on their own, unconnected to past work. Citations as a measure of success rely on the fairly simple idea that the "better" your work and the more "important" your findings, the more citations you will receive. This in turn led to many wonderful things like the h-index and the g-index, which combine the number of papers one has published and the number of citations received in a rather clever manner. (I'm partial to the former.)
Thus, while the number of publications and citations are fairly good measures of success and impact, there is little to support the notion that they are good measures of scientific integrity or of the reliability of someone's research. Most people just expect the peer-review system and other (fairly marginal) mechanisms to take care of that.
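To make that "rather clever manner" concrete, here is a minimal sketch of the two indices, computed from a list of per-paper citation counts. (Python is my choice here; the definitions are the standard ones for the h-index and g-index, but the example numbers are made up.)

def h_index(citations):
    """h-index: the largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """g-index: the largest g such that the top g papers together have at least g**2 citations."""
    counts = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(counts, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


# Ten (made-up) papers and their citation counts.
papers = [50, 18, 12, 9, 7, 7, 4, 2, 1, 0]
print(h_index(papers))  # 6: six papers have at least 6 citations each
print(g_index(papers))  # 10: the top 10 papers have 110 >= 10*10 citations in total

(The g-index gives more weight to a handful of very highly cited papers than the h-index does.)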

With peer review, one complaint that Lemire voices is that reviewers do not reproduce results; separately, he notes that "[c]itations are not validations" and that impressive results are more likely to be cited. Given that citations are just references, certainly not all of them are validations, but I hope no one takes issue when I say that some of them are. So if I were to broadly classify the types of citations, there would be:

A) Citations in reviews, and citations in the introduction or literature sections of papers. These citations serve more of a chronicling function than anything else: the more often something gets cited, the more people consider the work a significant development in the field. It is also unlikely that anything so much as a developed opinion is presented alongside the mention of said work.
These are the citations most susceptible to intellectual reach-arounds. Because of the overwhelming notion that citations are paramount amongst all directives of humankind, besides making babbies, there’s every incentive for researchers to cite each other’s work to no end. This clustershag is perhaps the biggest problem with citations.
B) Citations that help an author build a story. These are citations where the results of a published work and the author’s own are related in some way: published work that helps them tell their story of how the world works. They occur most often in discussion sections, where things that agree with the authors’ worldview get cited, as do things that go against it (when the authors think themselves capable of explaining the disagreement).
These are the citations most likely to contribute to growing popular trends and to more big-picture analyses. Such trends and big-picture analyses are essential to science, to pull provincial minds back from being lost in the minutiae of their specialization, but the simplest suggestion that can be given to researchers is: maintain your own big picture. The extra-large ego that scientists are stereotypically infamous for can no doubt help in mistrusting someone else’s grand narrative.
C) Citations where a technique or an experiment from the cited work is used. The rarest of them all, and by far the most meaningful. ‘Ware all that this entails: it means that the citer (or, more likely, his little-paid bonded labourer) has read the (no doubt fascinating) Methods section, and most likely the unkempt wards of the Supplementary Information documents as well. Note that the citer has used this technique or experiment (with reasonable success, let’s assume), which means that he has read all the relevant information, identified the usual holes in the description of the technique (present in any paper that isn’t a methodology paper), and tinkered with the system until something worked well enough to produce results. Being true to the spirit of our cynical selves, such a citation can only lead us to conclude that the cited work is good enough to be ripped off by another researcher.


It is now, good reader, that I make my small recommendation. As things stand today, we already make the distinction between self- and non-self citations in research. What if we were to go beyond that and classify citations in the aforementioned way? Citations of type A would act as a measure of apparent significance and reach, type B of general agreement with the cited work’s findings, and type C of true repeatability and scientific rigour. It might also be useful to divide type B into positive and negative citations, to get a better sense of how well-validated the work is. The measures would all be flawed and riddled with caveats, of course, but perhaps less so than a simple total citation count. We would thus be in a better position than we are right now to observe both the impact of research and its validity and genuineness.
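As a rough illustration of what such type-aware measures could look like, here is a small sketch. The tags ('A', 'B+', 'B-', 'C') and the profile fields are my own hypothetical encoding of the classification above, not an existing standard.

from collections import Counter

# Hypothetical tags for one paper's incoming citations:
# 'A'  = chronicling mention (reviews, introductions)
# 'B+' = cited in agreement, 'B-' = cited in disagreement
# 'C'  = the cited technique or experiment was actually used
incoming = ['A', 'A', 'A', 'B+', 'A', 'C', 'B-', 'A', 'B+', 'A']

def type_profile(citations):
    """Break a paper's citations down by type instead of a single total."""
    counts = Counter(citations)
    return {
        'total': len(citations),
        'reach': counts['A'],                          # apparent significance
        'story': counts['B+'] + counts['B-'],          # cited while building an argument
        'net_agreement': counts['B+'] - counts['B-'],  # rough sense of validation
        'reuse': counts['C'],                          # repeatability and rigour
    }

print(type_profile(incoming))
# {'total': 10, 'reach': 6, 'story': 3, 'net_agreement': 1, 'reuse': 1}

A paper would then carry a small profile instead of one number, and measures like the h-index could be recomputed per citation type.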


Afterthoughts.


The amount of resources currently devoted to exposing scientific fraud is next to nothing. This system of classifying citations is perhaps capable of generating selection criteria for investigating scientific fraud. If a piece of work is low on total citations, it is either in a niche area or not considered important enough by the scientific community; these could effectively be ignored by any watchdogs, since low citations also mean a lower impact of any potential fraud. Among well-cited work, papers whose type A citations vastly outnumber types B and C are likely over-hyped and have a greater potential for scientific fraud.
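A hedged sketch of that selection criterion, reusing the hypothetical profile dictionary from the earlier sketch; the thresholds (minimum total citations, the A-to-B-and-C ratio) are placeholders I have made up for illustration, not recommended values.

def flag_for_scrutiny(profile, min_total=50, hype_ratio=10.0):
    """Flag well-cited papers whose type A citations vastly outnumber types B and C."""
    if profile['total'] < min_total:
        # Niche or low-impact work: ignore it; potential fraud here has low impact anyway.
        return False
    substantive = profile['story'] + profile['reuse']
    if substantive == 0:
        # All reach, no agreement or reuse at all: maximally suspicious.
        return True
    return profile['reach'] / substantive >= hype_ratio

# A paper mentioned in every review and introduction, but that almost no one builds on or reuses:
hyped = {'total': 120, 'reach': 110, 'story': 8, 'net_agreement': 5, 'reuse': 2}
print(flag_for_scrutiny(hyped))  # True: 110 / (8 + 2) = 11 >= 10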


Lemire also mentions the rise of Open Scholarship, which allows greater outsider participation in research. Complementary to this, I think, is the rise of PubMed in the world of medicine. Not only is it a very search-friendly database of publications; since last year, the National Institutes of Health has also mandated that all peer-reviewed work published as a result of its funding be made freely available online through PubMed Central.


PS. Please don't kill me for the terrible title. I'm really bad at them.