Law-Specific Large Language Model Generative AI Interim Report: Lexis+AI Versus GPT-4

It has been about a year since OpenAI released ChatGPT, which is no longer the only "large language model" (LLM) widely available. OpenAI runs ChatGPT on GPT-3.5, but for a subscription fee, it also makes available its more powerful GPT-4. I don't pay to use GPT-4, because GPT-4 also powers the chat mode of Bing. Although the Bing version of GPT-4 is somewhat different from, and presumably not as good as, the subscription service, it is certainly good enough for me; hence, I've been using Bing chat for all of my LLM needs for the last six months or so.

Okay, okay, I don't really have any LLM needs. Mostly I've been just goofing around, but last week I started as a volunteer tester for the soon-to-be-released Lexis+AI. As instructed by the good people at Lexis, I've been trying to stump it so that Lexis can use my feedback to improve the product. Here I'll give an interim report and some illustrations of the strengths and limits. I should note that I'm one of a great many testers providing feedback. Lexis is also testing the product at some other law schools and at major law firms but has not released the identities of such firms; I asked the Lexis representative in charge of the Cornell group of testers whether our participation was also supposed to be secret and he said no. Hence, this report, which I intend as constructive criticism of the sort I was asked by Lexis to provide.

Overview: Lexis+AI and the competing LLM product that Westlaw will soon release differ from general-purpose LLMs in that they are trained on and operate in a closed environment consisting mostly of materials in their respective databases. Those databases are huge, containing constitutions, statutes, case law, regulations, articles from scholarly journals, and more, but they don't simply pick up whatever happens to be floating on the Internet. They focus on law-related materials, which should make answers more useful. In addition, when you seek an AI-generated answer in Lexis+AI, you are prompted to provide a jurisdiction, which further narrows the sources it consults.

Meanwhile, Lexis is well aware of the tendency of LLMs to "hallucinate," i.e., to make things up. Lexis+AI combats that tendency partly by limiting the number of follow-up queries permitted in a given conversation because longer conversations tend to increase the risk of hallucination. There are additional checks on hallucination as well.

After playing around with Lexis+AI for a few days now, I think it's already pretty good at some kinds of tasks that many lawyers might find quite useful. It can produce fluid natural language answers to at least some questions to which determinate answers exist. For example, I asked:

Query 1: Does federal public accommodations law forbid sex discrimination?

I intended this question as a layup, and it pretty much was for Lexis+AI, which said:

Federal public accommodations law does not explicitly forbid sex discrimination. The federal Civil Rights Act of 1964 and the Americans with Disabilities Act of 1990 prohibit discrimination by places of public accommodation on the basis of race, color, religion, national origin, or disability, but not sex. However, some state laws and local laws do prohibit sex discrimination in public accommodations 303 Creative LLC v. Elenis. For example, a provision of a state’s human rights law, which prohibits sex discrimination in places of public accommodation, resort or amusement, was applied to a nonprofit, tax-exempt foreign corporation. USCS Const. Amend. 1, Religious and political freedom. Furthermore, a self-employed scientist failed to state a viable claim of sex discrimination against a scientific journal publisher, as 42 USCS § 2000a(a) prohibits discrimination only on grounds of race, color, religion, or national origin, not sex. § 2000a. Prohibition against discrimination or segregation in places of public accommodation. Therefore, while federal law does not explicitly prohibit sex discrimination in public accommodations, some state and local laws do.

That's a little verbose, and the citation of 303 Creative is an odd choice, but it's essentially correct. A more succinct answer would have been: "No. The federal civil rights statute barring discrimination in public accommodations does not forbid sex discrimination, although many state and local provisions do."

I then asked the same question to GPT-4. It said:

Federal public accommodations law prohibits discrimination on the basis of race, color, religion, and national origin, but it does not explicitly prohibit discrimination on the basis of sex (Notes 1,2,3). However, the U.S. Supreme Court has interpreted Title VII of the Civil Rights Act of 1964, which prohibits employment discrimination on the basis of sex, to also prohibit discrimination on the basis of sex in public accommodations (Note 4). In Bostock v. Clayton County, the Court held that discrimination on the basis of sexual orientation or gender identity is also prohibited under Title VII.
Therefore, while federal public accommodations law does not explicitly prohibit sex discrimination, the Supreme Court has interpreted Title VII to prohibit such discrimination in public accommodations. It is worth noting that some states and localities have enacted laws that explicitly prohibit sex discrimination in public accommodations.
I hope this answers your question!

I replied: "well, you answered my question but you're wrong. There are no Supreme Court cases interpreting Title VII's prohibition on sex discrimination in employment to also cover public accommodations. You are hallucinating."

GPT-4 apologized "for the confusion" and after another back-and-forth admitted that I was right.

Thus, round 1 goes decisively to Lexis+AI. On to round 2, in which I gave each LLM the following prompt, which is loosely based on an exam question I once put to my students:

Query 2: Suppose that someone were to dig up the corpse of Woodrow Wilson and use a sample of his DNA to create a clone of him. Once the clone grows up to be 35 years old, is he eligible to be President or barred by the 22nd Amendment?

This time I let GPT-4 go first. It answered:

Interesting question! The 22nd Amendment to the United States Constitution limits the number of times a person can be elected to the office of President of the United States. It states that “No person shall be elected to the office of the President more than twice.” However, the amendment does not specify whether the term limits apply to the same person or to the same individual.
In the scenario you described, the clone would be a different individual from Woodrow Wilson, even though they would share the same genetic material. Therefore, the 22nd Amendment would not apply to the clone, and they would be eligible to run for President if they met the other eligibility requirements outlined in the Constitution.
It’s worth noting that the scenario you described is purely hypothetical and raises a number of ethical and legal questions. In general, cloning humans is illegal in the United States and many other countries.
I hope this answers your question!

Not bad! In addition to the friendly tone, the answer exhibits common sense: A clone is a genetic twin but not the same person. If Joaquin Castro were to serve two terms as President, the 22nd Amendment would not preclude his identical twin brother Julian from serving his own two terms. It's an easy question. Good work, GPT-4. Let's see how Lexis+AI did. I've reproduced below the heart of its answer, editing out only those portions that (accurately) restated the facts.

Introduction:
This memorandum focuses on two main issues: (1) whether a clone would be considered a "natural born Citizen" as required by Article II, Section 1, Clause 5 of the Constitution, and (2) whether the 22nd Amendment, which limits a person to two terms as President, would apply to a clone of a former president.
Analysis:
The first issue is whether a clone of Woodrow Wilson would be considered a "natural born Citizen" under the Constitution. The Constitution does not define the term "natural born Citizen," and the Supreme Court has not directly addressed the question of whether a clone would qualify. However, the Court has upheld the distinction between natural-born and naturalized citizens' eligibility to be President. Uddin v. Biden. Given the lack of direct precedent, it is unclear whether a clone, which would not be born in the traditional sense, would be considered a "natural born Citizen."
The second issue is whether the 22nd Amendment, which limits a person to two terms as President, would apply to a clone of a former president. The Amendment states, "No person shall be elected to the office of the President more than twice." The language of the Amendment focuses on the person being elected, not the individual's genetic makeup. Therefore, it could be argued that a clone, being a distinct individual, would not be subject to the term limits of the original person. However, this is a novel legal question that has not been addressed by the courts.
Conclusion:
In conclusion, it is unclear whether a clone of a former president would be eligible to serve as President. The question hinges on the interpretation of the term "natural born Citizen" in the Constitution and the application of the 22nd Amendment's term limits. Both issues present novel legal questions that have not been definitively resolved by the courts.

That answer is too indecisive with respect to pretty clear-cut questions. Even though Lexis+AI recognizes that a clone is "a distinct individual" from the person from whom they're cloned, it nonetheless says that the 22nd Amendment issue is open. The Natural Born Citizen analysis is also odd in suggesting that a person who is a clone might not be a natural born citizen, which implies that "natural" here has to do with the mode of production of a human being rather than where they were born or who their parents were. Are people who come into existence via IVF not capable of being natural born citizens? What about people delivered by Caesarean section and thus, like Macduff, not "of woman born"? Overall, a better answer would have identified the constitutional objections as far-fetched.

The citation of Uddin v. Biden is also problematic. The case was a lawsuit by a pro se Maryland doctor who claimed that the description of the eligibility requirements for the Presidency on the White House website was inaccurate. The district judge curtly dismissed the complaint for lack of standing and failure to state a claim. Lexis+AI cites the case for the proposition that "the Court has upheld the distinction between natural-born and naturalized citizens' eligibility to be President." In context, "the Court" means the Supreme Court, but Uddin was a district court decision. To be sure, Uddin does contain that exact language and even cites a Supreme Court case for the proposition: Schneider v. Rusk. However, that case did not involve eligibility for the Presidency. In passing, the Schneider Court states that such eligibility is the only distinction between natural-born and naturalized citizens. So even though Uddin v. Biden says that the Supreme "Court has upheld the distinction between natural-born and naturalized citizens' eligibility to be President," in fact the Supreme Court has never faced the question. If the issue were presented, it seems clear that the Supreme Court would uphold the distinction, which, after all, is set forth expressly in the Constitution. (Minor v. Happersett discusses the meaning of natural-born citizen, albeit in a different context.)

Round 2 goes to GPT-4. On to the third round.

Query 3: Applying the historical analogy methodology of the U.S. Supreme Court in New York State Rifle & Pistol v. Bruen, make your best argument in 200 words or fewer for upholding against a federal constitutional challenge the New York State law limiting the number of rounds that may be loaded into a handgun to 7.

I'll set out both answers before providing my commentary. Here's what GPT-4 had to say:

In New York State Rifle & Pistol v. Bruen, the Supreme Court held that the constitutionality of modern gun laws must be evaluated by direct analogy to history, unmediated by familiar doctrinal tests. The Court’s approach, which we call originalism-by-analogy, requires judges to reason analogically directly from the historical record rather than, for example, using historical sources to identify the original public meaning of a constitutional provision.
The New York State law limiting the number of rounds that may be loaded into a handgun to 7 is consistent with the Nation’s historical tradition of firearm regulation. The Second Amendment does not grant an unlimited right to bear arms, and the government has the power to regulate firearms in the interest of public safety. Limiting the number of rounds that may be loaded into a handgun is a reasonable regulation that serves the government’s interest in reducing gun violence.
Moreover, the New York State law does not impose an undue burden on the right to bear arms. The law does not prohibit the possession of handguns or the use of handguns for self-defense, but merely limits the number of rounds that may be loaded into a handgun. The law does not prevent law-abiding citizens from owning or using handguns for lawful purposes, and it does not significantly impair the ability of citizens to defend themselves.
Therefore, the New York State law limiting the number of rounds that may be loaded into a handgun to 7 is constitutional under the Second Amendment.

And here's Lexis+AI:

The New York law limiting handgun magazine capacity to 7 rounds does not violate the Second Amendment under the historical analogy test set forth in NYSRPA v. Bruen. The law is substantially related to the important government interest of public safety and preventing gun violence.
Courts generally apply intermediate scrutiny to Second Amendment challenges, upholding laws that are substantially related to important government interests. The government has an important interest in public safety and preventing gun violence. Limiting magazine capacity is substantially related to this interest because it reduces the number of shots that can be fired without reloading, giving potential victims a chance to flee or subdue the shooter.
The historical record also supports restrictions on magazine capacity. Laws banning concealed carry were upheld when open carry was allowed. This suggests limits on particular types of weapons are permissible. Restricting magazine capacity does not prohibit all handguns, but only limits one aspect of their operation.
Under Bruen's framework, the New York law is constitutional. It is substantially related to the important government interest in public safety and does not severely burden Second Amendment rights . The historical record does not demonstrate a tradition of unrestricted magazine capacity. Therefore, the law should be upheld.

Both answers are problematic in roughly the same way. Each correctly states that Bruen announced a test that requires historical analogies but each also goes on to state other, inconsistent tests as though they're the law. GPT-4 suggests that "a reasonable regulation that serves the government's interest in reducing gun violence" is ipso facto valid under the Second Amendment. I wish that were true, but that's not an accurate account of Bruen. Similarly, Lexis+AI simply asserts that "Courts generally apply intermediate scrutiny to Second Amendment challenges, upholding laws that are substantially related to important government interests." Courts indeed generally applied intermediate scrutiny before Bruen, but Bruen expressly rejected that approach.

Meanwhile, neither LLM did what I would have expected of a human lawyer: find some analogous Founding-era firearms regulations or, if there were none to be found, make some sort of argument why no strict analogy needs to be made. Here's one: multi-chamber firearms were not in common use until the 1830s, so there would have been no occasion to limit magazine capacity at the Founding.

Before giving up, however, I remembered that the Lexis representative had emphasized that we should take advantage of the iterative nature of conversations with the AI, so I reformulated my question this way:

Bonus Round -- Refined Query 3: Identify laws in the American colonies or laws enacted by states of the United States before 1820 that are analogous to a modern-day law limiting the capacity of a firearm magazine to seven or fewer bullets.

Lexis+AI correctly stated that there were no magazine capacity limits in the relevant period. It concluded:

the historical record does not provide any laws from the Founding Era that are directly analogous to a modern-day law limiting the capacity of a firearm magazine to ten or fewer bullets. While there were laws restricting firearm possession and use, these laws do not specifically address the capacity of a firearm magazine.

That's very helpful for confirming that there's nothing directly analogous and potentially a starting point for trying to generate analogies to those other kinds of restrictions.

GPT-4 gave me roughly the same answer but less decisively and thus less helpfully. It stated:

I searched the web for laws in the American colonies or laws enacted by states of the United States before 1820 that are analogous to a modern-day law limiting the capacity of a firearm magazine to seven or fewer bullets. However, I could not find any relevant information. I apologize for not being able to help you with your query.

In answer to my follow-up query, GPT-4 confirmed that it couldn't find relevant information, not that there were no analogous laws. Accordingly, I judge Lexis+AI, which successfully leveraged its large database of laws from the colonial era and early Republic, the clear winner of this bonus round.

Bottom Line: Neither GPT-4 nor Lexis+AI is intended to be used as a substitute for thoughtful research or analysis. As my testing indicates, they should not be used that way, nor even as a uniformly reliable means of accurately identifying and characterizing the law. As the bonus round indicates, however, there are already other very productive uses for Lexis+AI, especially with some careful prompt engineering.

What about more prosaic uses, analogous to high school students using AI to cheat on their book reports? Current versions of LLMs could clearly be helpful in creating generic text that a lawyer then modifies based on their extant knowledge and further research. However, that doesn't sound very different from using a template as the starting point for customizing a draft contract, will, or motion, which is something for which one does not need AI. Lawyers have been using such templates for centuries.

Postscript. I asked each LLM when lawyers first started using templates or standard forms for contracts, wills, and other legal documents. Lexis+AI said "[t]he use of standard forms or templates in the production of wills, contracts, or other legal documents has a long history, although the exact time when this practice began is not clearly defined. These tools have been used by lawyers to save time and ensure consistency in frequently used documents." It then provided some contemporary examples. GPT-4 told me that the practice dates back to the Roman Empire but linked a page that does not mention the Roman Empire or describe any events before the 19th century. I await enlightenment on this subject from a human.

Dorf on Law
