The Right Coast

July 13, 2005
 
A voice, crying in the wilderness, and then just crying
By Tom Smith

Please go to the bottom of this post; I have new numbers . . .

I would delete the old post, but that seems to violate some norm of the blogosphere, so instead I am retracting the numbers in the post between the brackets and substituting the new numbers below. My blog, my rules!

[Do you sometimes feel, Professor, that no one is listening to you, that your articles are ignored? IS ANYBODY LISTENING?!! Well, the bad news is, you are probably right. That is, you probably are being ignored. I will try to make this point in forthcoming article(s), but probably no one will pay any attention to me, so this is your chance . . .

I just got some new data back from Lexis, with whom I am engaged in a massive citation study, but that's another story. This data concerns law review articles that are in their Shepard's database and how much they get cited. This data covers about 385,000 law review articles, notes, comments, etc. etc. that appear in 726 law reviews and journals, and looks at how often they are cited. Cited by other law reviews, or cases.

First of all, 43 percent of the articles are not cited . . . at all. Zero, nada, zilch. Almost 80 percent (i.e. 79 percent) of law review articles get ten or fewer citations. So where are all the citations going? Well, let's look at articles that get more than 100 citations. These are the elite. They make up less than 1 percent of all articles, .898 percent to be precise. They get, is anybody listening out there? 96 percent of all citations to law review articles. That's all. Only 96 percent. Talk about concentration of wealth.

Why, you ask, is it like this? You should read my paper here, into which this new lawrev data will be incorporated, though I think it may justify a little article on its own. Similar dynamics are probably at work. Possible titles: Why this article (and yours) is a waste of time. Or, Stop that law professor before he writes again. The distributions of cites to law review articles and to cases look the same. Your basic stretched exponential with a long tail, or some would say a power law distribution. On a log-log chart, close to a 45-degree line.
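For readers wondering what that 45-degree line claim amounts to, here is a tiny sketch of my own (the numbers are made up, not from the Lexis data): for a pure power law p(k) proportional to k^-alpha, the log-log plot is a straight line of slope -alpha, and alpha = 1 gives the 45-degree line.

```python
import math

# Hypothetical illustration (not the actual citation data): a pure power
# law p(k) ~ k**(-alpha). On a log-log chart, log p(k) = -alpha*log k + const,
# a straight line; alpha = 1 gives the "45-degree line" mentioned above.
alpha = 1.0
ks = [1, 10, 100, 1000]                       # citation counts
logp = [-alpha * math.log10(k) for k in ks]   # log p(k), up to a constant

# Slope between successive points on the log-log chart:
slopes = [(logp[i + 1] - logp[i]) / (math.log10(ks[i + 1]) - math.log10(ks[i]))
          for i in range(len(ks) - 1)]
print(slopes)  # every slope is -1.0: a straight 45-degree (downward) line
```

Real citation data only approximates this, of course; the fights among physicists are over exactly which curve it approximates.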

So stop that blogging, professors, and get back to writing those law review articles!

Some other thoughts:

If I got rid of self-citations, which is impractical for me to do, the numbers would be even more pathetic.

You can always tell yourself, maybe someday, someone will cite me. Maybe, but don't count on it. After four years or so, your odds of getting cited get lower and lower. But it never hurts to hope.

Just because your article has been cited a total of, oh, four times, does not mean it hasn't changed lives. It changed your life, didn't it? Maybe you got tenure out of it! Who cares if no one ever cites it!

It's just the way things are. Citation distributions are similar in physics. The distribution is in the same family as income distribution. It's a rough old world.

Well, I've got to get back to work on that pathbreaking article . .]

WHAT about Canadian and European journals? My data does not include those. But I am pretty certain it would not make any difference in the overall distribution.

I suppose you could also look at the bright side. If you manage to get cited 20 or 30 times, you are in fact doing very well. And if you get cited zero times, well, that's the fate of more than 40 percent.

SOMEONE made a good point in the comments to Todd Zywicki's link at VC. Lawyers (and law professors too) use law review articles, especially for background information and to plunder cites, and don't necessarily cite them. So an article could be significant and not cited. Just as some articles are no doubt cited much more often than they are read. This would seem almost impossible to measure, however.

AS LONG AS I HAVE YOUR ATTENTION, I would like to point out that these results do not seem very consistent with Glenn Reynolds's hopeful assertion that Lexis and Westlaw-type legal search engines are flattening the hierarchy of legal scholarship. His point is more about law journals, not individual articles, but I would be willing to bet that the elite articles are very likely to appear in the elite journals. But I have another, I hope more provocative point. The effect Glenn points to, that Lexis and Westlaw searches mix in articles from obscure journals along with those in top journals, is in fact a consequence of the fact that they use search engines that are primitive compared to what we are now used to from Google. It is not, as he suggests in his article, a sign of things to come, but a symptom of technology that has now been superseded. A sign of things one hopes will soon be improved upon. The obscure journal articles are mostly the equivalent of the "junk results", the elimination of which by Google's PageRank algorithm made so many Google billionaires and millionaires, and for the rest of us makes the WWW so much more useful. Legal search engines, as I explain in my article, ought to exploit the structure in the network of legal citations, just as Google exploits the hypertextual structure of the Web! That's why I call it The Web of Law. Then we would get result rankings that correspond (more) to the hierarchy that is inevitably woven into the citation network. It would negate the (not very desirable in my view) hierarchy-flattening effect that comes from mixing so many less relevant results from less prestigious journals (along with, I concede, perhaps some overlooked diamonds in the mud) into search results, and it would save lawyers and scholars a lot of time.
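To make the PageRank analogy concrete, here is a toy sketch of the idea. The five cases and their citations are entirely made up, and this is just the standard PageRank power iteration, not anything from my paper:

```python
# Toy sketch of a PageRank-style ranking on a made-up citation graph.
# Cases A..E are hypothetical; an edge means "cites".
cites = {
    "A": ["C"],
    "B": ["C"],
    "C": ["E"],
    "D": ["C", "E"],
    "E": ["C"],
}
cases = sorted(cites)
d = 0.85                                   # standard damping factor
rank = {c: 1.0 / len(cases) for c in cases}

for _ in range(100):                       # power iteration to convergence
    new = {c: (1 - d) / len(cases) for c in cases}
    for src, outs in cites.items():
        for dst in outs:
            new[dst] += d * rank[src] / len(outs)
    rank = new

# The heavily cited case C ends up on top, just as PageRank surfaces
# authoritative web pages instead of mixing in junk results.
print(max(rank, key=rank.get))  # prints "C"
```

A real Web of Law ranker would run this (or something like it) over the millions of actual case-to-case citations, so that search results reflect the hierarchy woven into the network.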

DAVE HOFFMAN at the Conglomerate worries that the Lexis database is too small. My larger point is that these stretched-exponential-with-power-law-tail or power law distributions or whatever they are (physicists, mathematicians, and others are still fighting about these, and boy, lawyers have nothing on them in the fighting department, I have learned) are deeply embedded in the enterprise. The lawrev distribution is very like the distribution for Second Circuit cases, which is very like that for the Alabama Supreme Court, which is very like the distribution for papers in physics, and so on. So it is very unlikely anything would change by increasing the number of journals.

Dave also asked some interesting questions I would love to answer, but for which I do not have the data. Someday, maybe. All I have right now is what I call a citation frequency distribution: how many law review articles are cited 0 times, 1 time, 2 times, etc. Now, I also have that data for every U.S. jurisdiction on virtually all U.S. state and federal cases, so I have a lot of data. But there is much, much more data one could have, and yes, I really hope someday it will all be in an analyzable form. The big thing would be to have, in some readily analyzable form, the whole citation network. Imagine a giant matrix, 4 million cells on a side, with the cite of every case on the X and on the Y axis. X is the case citing, Y is the case cited. If case X cites case Y, there would be a 1 in the (x, y) cell, otherwise a 0. Something like that. You would also need the date of the cite. If you had that, you could discover an enormous, and I mean enormous, amount about how the legal system is structured, evolves, and so on.
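For the curious, here is a minimal sketch of that matrix idea with made-up cases. Note that a dense 4-million-by-4-million matrix would have roughly 16 trillion cells, so in practice you would store only the 1s, as a sparse set of (citing, cited, date) triples:

```python
from collections import Counter

# Made-up citation network: each triple is (citing case, cited case, year).
# This sparse form stores only the 1s of the conceptual giant matrix.
edges = {
    ("Case1", "Case3", 1990),
    ("Case2", "Case3", 1995),
    ("Case4", "Case3", 2001),
    ("Case4", "Case2", 2001),
}

def cell(x, y):
    """Entry (x, y) of the conceptual matrix: 1 if case x cites case y."""
    return int(any(c == x and d == y for c, d, _ in edges))

# Citation counts fall out as column sums, i.e. the in-degree of each case:
counts = Counter(cited for _, cited, _ in edges)
print(cell("Case1", "Case3"), counts["Case3"])  # prints "1 3"
```

With the dates attached, you could also watch the network grow year by year, which is where the really interesting questions about how the legal system evolves would live.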

OK, IT TURNS OUT I have some good news and some bad news, and the good news is also bad news and the bad news is also good news. In a sense. Anyway, it turns out that I made some miscalculations and the numbers are not nearly as bad as I thought. However, they are still really bad. So the good news is that they are not as bad as I thought, and the bad news is that I made a miscalculation. Which, by the way, is why I said the numbers were preliminary. And they still are! And I will keep making corrections until I get it right! Or until I give up! Anyway, here are the revised, but still very preliminary numbers.

So, what I have now is that the top .5% of law review articles gets 18% of all citations (yes, I know that is very different from what I said before, but it is still a very skewed distribution); the top 5.2% gets about 50% of all citations; and the top 17% of articles gets 79% of all citations. And about 40% of articles never get cited at all.
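In case it is not obvious how such cumulative shares are computed, here is a tiny sketch with a made-up list of per-article citation counts (nothing here comes from the actual Lexis data):

```python
# Made-up citation counts for ten hypothetical articles, most-cited first.
counts = sorted([120, 40, 15, 8, 5, 3, 2, 1, 1, 0], reverse=True)
total = sum(counts)  # 195 citations in all

def top_share(frac):
    """Share of all citations held by the top `frac` of articles."""
    n = max(1, round(frac * len(counts)))
    return sum(counts[:n]) / total

# With this toy data, the top 10% (one article) holds 120/195 of the cites:
print(round(top_share(0.10), 2))  # prints "0.62"
```

Run against the real data, the same calculation produces figures like "the top .5% gets 18%" above; the skew in the toy list is exaggerated, but the arithmetic is the same.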

I like these numbers better. They are much closer to a classic "scale free" distribution, even though I am not saying they are a scale free distribution. They are also much closer to the distribution in physics articles, which is kind of interesting in itself.

These numbers are still very preliminary. Not only might they be changed, they almost certainly will be changed. For one thing, there is a big chunk of citations unaccounted for, and while I doubt they will end up on the low citation end of the distribution, when they are tracked down, they are bound to change the numbers somewhat. But at any rate, these numbers are much closer to being sound than my first try. This just goes to show you, you should never do any permanent harm to yourself on the basis of a blog post.