There is bad news. Viruses and worms evolve constantly, and we are far from reaching a steady state. New strains of influenza, an infectious disease caused by RNA viruses, are constantly produced by mutation and reassortment (the mixing of genetic material from two similar viruses). In the olden days of computing, when we gazed at EGA graphics, computer users contended with boot sector viruses and other malicious code affecting their programs. These kinds of viruses became less common in later generations, when virus developers focused on exploiting the rich scripting functionality provided by modern office application suites, and macro viruses became more widespread. Nowadays, one has to cope with security exploits in hosted software (e.g. phpBB), security leaks in web 2.0 applications (e.g. Facebook applications), phishing, and so on.
These are well-known facts. I presented them to illustrate that viruses evolve and infect new hosts. The bad news is that research has been infected by a new virus called scholastica googlensis, as Alois Potton highlights in the 3/2008 issue of the PIK journal. Scholastica googlensis causes a linearisation of humans, aiming towards a perfect alignment that makes researchers comparable. Reputation is reduced to a single number, the Google Scholar index, which expresses the number of papers written by the author in question that are indexed in Google’s database. Only the number counts: publish or perish! Research is scaled down to a single metric. The higher the index, the higher the reputation, and the better the chances before an appointment board filling a vacancy for a full professor. In his column, Alois Potton mentioned the idea of reducing the review process at Dagstuhl seminars to a single one-dimensional number: the Google Scholar index of the author. Life can be pretty simple.
The consequence is that a single company using the PageRank algorithm not only controls the available knowledge (a fact is known if and only if it is presented within the first n search results) but also influences the way knowledge is created, by impairing the selection process in research.
According to Einstein, everything should be made as simple as possible, but no simpler. Is this metric already far too simple?

The above image shows Internet density as a logarithmic visualisation of the Information Please® database, which holds Internet usage data from 2005, combined with a population grid.
As I was discussing citations in academic papers over lunch today, I thought it was time to write about an older paper dealing with the citation process. In 2002, Simkin and Roychowdhury published a paper entitled Read Before You Cite, in which they claimed that only about 20 % of citers read the paper they were citing. They studied the distribution of misprints in bibliographic references and assumed a correlation between misprints and whether the citing author had read the paper. At first glance, this assumption seems quite logical, as an alert reader will find the errors in the bibliographic record. Simkin and Roychowdhury present a nice analytical evaluation in which they also show that the misprint distribution follows a Zipf law. However, the correctness of the result depends entirely on the correctness of the basic assumptions. And this is, I believe, the problem with this paper: at least in computer science, the citation process might have some more properties that were neglected in the paper.
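To make the misprint analysis a bit more tangible, here is a minimal sketch of how a rank-frequency table of misprints could be tabulated. The citation strings are invented toy data (not figures from the paper), and the function name `misprint_rank_frequency` is my own. Under a Zipf law, the count at rank r would fall off roughly as a power of r.

```python
from collections import Counter

def misprint_rank_frequency(cited, correct):
    """Return (rank, misprint, count) tuples, most frequent misprint first."""
    # Every citation string that differs from the correct one counts as a misprint.
    counts = Counter(c for c in cited if c != correct)
    ranked = counts.most_common()
    return [(rank, text, n) for rank, (text, n) in enumerate(ranked, start=1)]

# Hypothetical data: one correct reference and three distinct misprints of it.
correct = "J. Toy Stud. 12, 345 (1999)"
cited = (
    [correct] * 4
    + ["J. Toy Stud. 12, 354 (1999)"] * 4   # transposed page digits
    + ["J. Toy Stud. 21, 345 (1999)"] * 2   # transposed volume digits
    + ["J. Toy Stud. 12, 345 (1998)"] * 1   # wrong year
)

table = misprint_rank_frequency(cited, correct)
for rank, text, n in table:
    print(rank, n, text)
```

Note that a misprint repeated several times only tells us that the citers shared a source for the reference; the table alone cannot say whether they read the paper.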
My citation process is decoupled from my reading process. When I discover an interesting paper, I mostly print it out, as this is more comfortable for taking notes and allows “offline” reading on the suburban train or bus. I’m too lazy to take my (heavy) laptop with me all the time, but I usually have some papers in my bag. After reading the paper, I might file it away. Some time may elapse before I grab the paper again to cite it when working on a publication or writing a mail. However, as some time has passed since I got the paper, I might have forgotten some of the details needed for citing it (maybe the volume of the journal). Mostly, I write a short note at the top of the first page to remind me of the most important bibliographic data, but sometimes I just forget. When citing the paper, I mostly use public databases (such as the one provided by the ACM) or visit the author’s web page to obtain everything I need in BibTeX format, ready to cut and paste into my bibliographic database. Nowadays, this technique is very convenient and fast. But what if the record I just copied was erroneous? (Sometimes even the bibliographic records on the author’s own page are erroneous!) Well, then I might spread another misprint of the kind measured by Simkin and Roychowdhury.
All I want to say is that there is not necessarily a correlation, let alone a causal connection, between a misprinted bibliographic record and whether an author actually read the paper. Moreover, a colleague came up with a metric that may be more reliable: simply compute the number of papers an author has to read per day (works only for authors writing tons of papers). However, as such an author will likely be a full professor or the head of a department who puts his name on all the work of his Ph.D. students, the most interesting question would be: did they read what they wrote?
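As a back-of-the-envelope sketch of my colleague's metric: the function name, the parameters and the numbers below are purely hypothetical, and the calculation assumes that every cited reference is a distinct paper that gets read exactly once.

```python
def papers_to_read_per_day(papers_per_year, refs_per_paper, days_per_year=365):
    """Average number of cited papers an author would have to read each day."""
    # Assumption: no reference is shared between two papers of the same author.
    return papers_per_year * refs_per_paper / days_per_year

# Hypothetical prolific author: 40 papers a year with 30 references each.
load = papers_to_read_per_day(papers_per_year=40, refs_per_paper=30)
print(f"{load:.1f} papers per day")
```

With these made-up numbers, the author would have to read a bit more than three papers every single day of the year, weekends and holidays included.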
Related information: