AI Legal Research and Thoreau's Warning

During my Federal Courts class earlier this week, a student asked me a question about a point I had made that appeared to contradict a sentence in the casebook I use for the course. I said that I was pretty confident I was right and the casebook was wrong but that I would do some legal research and get back to him after class. The upshot of that research is that I was indeed right but that the relevant sentence in the casebook was ambiguous, not necessarily wrong. It appeared to describe the law in a way that contradicted what I said, but read in context, it could also be understood as making a statement about a reform proposal advanced by various scholars. (I subsequently confirmed with one of the casebook editors that the language was indeed intended as the latter; he graciously agreed that the statement was ambiguous.)

How did I determine that I was right? After class, I took to my computer to look into the issue. As I sometimes do these days, I decided to begin my legal research by posing the issue via a simplified hypothetical question to a chatbot, Claude. After a few seconds, it confidently spit out an answer. According to Claude, my view was mistaken. Claude cited five cases and built an argument based on them. I recognized two of the cases and was pretty sure they did not stand for the propositions Claude said they stood for. I read them and the other cases and verified that Claude was completely wrong. I said so. Here's what Claude wrote in response:

"You are absolutely right to push back on that, and I apologize for the sloppiness. You correctly identified the flaw in each of those citations . . . . Honest Answer[:] I am not confident I can identify a Supreme Court case that squarely holds" what the prior answer confidently asserted.

But even that was misleading, because there is authority that pretty squarely holds the opposite of what Claude asserted, i.e., authority that supports the view I had communicated to the students. I found that contrary authority by posing the same question to Westlaw AI Deep Research, Gemini, and ChatGPT, each of which told me I was right and led me to the correct authority. Of course, I only knew that this was the correct authority because I spent some more time reading the various cases and checking to see whether there was any relevant subsequent negative history (the Westlaw equivalent of Shepardizing).

I've described the interaction in general terms above because the underlying legal issues involve a somewhat subtle point involving the Eleventh Amendment and state sovereign immunity, which is itself a fairly technical subject. For interested readers, I've reproduced the exchange in this Google Doc. The key point here is simply that Claude got it wrong. I want to use that fact as the springboard for a broader discussion of the use of AI in legal research.

Most of the high-profile instances of lawyers screwing up via AI involve hallucination. Damien Charlotin, who teaches legal data analysis in France, maintains a useful global database of reported cases in which filings contained cases that were completely made up, misrepresented, or quoted for language that did not actually appear in them. The database currently has over 850 entries for the United States--and those are only reported cases. Undoubtedly, there have been many others. Most, but hardly all, of the offenders in the U.S. cases in Charlotin's database are pro se litigants. An alarming number of lawyers seem to be submitting legal documents written by AI without bothering to check whether the material the AI cites is represented accurately, quoted accurately, or even exists.

That is obviously extremely problematic. Perhaps there will come a day when AI is good enough to be trusted to replace lawyers, but it is nowhere near there now. I mentioned above that despite Claude's fumble of my sovereign immunity question, the other chatbots got it right, but even Westlaw--which is the least likely to hallucinate because it operates in a closed universe of legal materials--somewhat overstated the import of several key cases and also devoted about half of its answer to irrelevant material (discussing abrogation, not just waiver, for interested experts).

To be clear, I am not saying that AI can't be used responsibly in legal practice. What I am saying is that, unless and until it becomes more reliable, it might not be worth the investment. To explain why, I want to lean on a point that Henry David Thoreau made in Walden: that when one takes account of all the labor that ostensibly labor-saving devices require, they may end up requiring more labor, not less. Writing in the mid-19th century, Thoreau was of course not discussing AI, but the point generalizes. Here's his argument about the then-relatively-new technology of the passenger railroad:

I have learned that the swiftest traveller is he that goes afoot. I say to my friend, Suppose we try who will get there first. The distance is thirty miles; the fare ninety cents. That is almost a day’s wages. I remember when wages were sixty cents a day for laborers on this very road. Well, I start now on foot, and get there before night; I have travelled at that rate by the week together. You will in the mean while have earned your fare, and arrive there some time to-morrow, or possibly this evening, if you are lucky enough to get a job in season. Instead of going to Fitchburg, you will be working here the greater part of the day. And so, if the railroad reached round the world, I think that I should keep ahead of you; and as for seeing the country and getting experience of that kind, I should have to cut your acquaintance altogether. 

Whether Thoreau was right about the railroad even in his day is debatable. Walking thirty miles per day is no mean feat. And while it's true that one can see the country and have interesting experiences while walking, one might meet interesting people or read a good book or take a much-needed nap while riding the train. But whatever Thoreau thought about this particular example, surely there are instances in which his point holds--especially with respect to marginal technological improvements.

I ask readers who work in large organizations (like a corporation, law firm, government agency, or university) to reflect on the last time the organization "upgraded" its accounting software or switched from one platform to another. Think of all the additional time spent learning the new system, waiting to have questions answered, and re-doing whatever submission you made thinking you had finally gotten it right only to have the system reject it for missing something you don't understand. Those hours count on the cost side, as do the time spent by IT professionals learning the system and the money spent by the organization to license the new system. If time is money, Thoreau's point is that equally, money is time. So a full accounting of the time saved could end up being negative. I wouldn't say that such changes are never justified. I would say that there's never a guarantee they will be justified.

Now suppose you've got a brief due and want to turn to generative AI to cut down on the time it takes to write it. It will certainly take less time for the AI to do the relevant legal research and write a first draft of the brief than it would take you. But now let's add up the costs in dollars and time on the other side.

Start with the cost of using the AI. You might use a free general purpose chatbot, but it will be more error-prone than one for which you pay, so you or your firm will probably want to pay for a subscription. Even the free AI chatbots aren't really free. There's the negative externality of their power usage, although that's admittedly not a cost that the individual lawyer will bear, except as a member of the general public. Likewise for the harm AI causes the authors and artists whose work it exploits without compensation.

But even if we put the monetary cost and negative externalities aside, I wonder whether the time one needs to spend checking a chatbot's work and editing its prose doesn't eat up much or even all of the time-savings that turning to it in the first place supposedly afforded. It's true that a competent lawyer would cite-check a brief written without AI assistance too, but that's a shorter process. If you've written a brief, you might have inadvertently gotten some pin cites wrong, but you won't often have gotten the holdings of cases wrong, much less made up cases.

The point isn't that hallucinated cases or quotations are difficult to spot. If you actually look up every citation, hallucinations are incredibly easy to spot. The point is rather that because you can't trust the chatbot to have cited real authority for the actual points that authority supports, you can't trust the argument the chatbot constructs. And that means that you can't trust the brief. So have you really saved any time? Or have you wasted time on a false start?

Walden is sometimes read as prescribing asceticism. It is true that some passages lend themselves to that understanding, but read sympathetically, Thoreau is better understood as advocating deliberation about what constitutes genuine progress and improvement. His promotion of simplicity, frugality, and connection to nature as antidotes to unthinking consumerism is, if anything, more timely now than in his day. So too is his skepticism about new technologies, which, he argued, should be adopted only after an honest and thorough assessment of their costs and benefits.

One need not be an AI doomer to worry that AI's adoption in any field could be more costly than beneficial. That worry strikes me as warranted for at least some of the uses to which it is being put by lawyers. There is no excuse for a lawyer ending up in Charlotin's database, but even much responsible AI use may not be worth the effort--at least not yet.

--Michael C. Dorf