Hacker News

by Ryan Harman

AI outperforms law professors in Stanford Law study (law.stanford.edu)

368 points by berlianta 17 hours ago

https://law.stanford.edu/wp-content/uploads/2026/06/salinas_...

godelski 15 hours ago

I find this study quite suspect. I'd have to dive deeper, but there are definitely significant alarm bells that should be going off for anyone reading.

Figure 2 (page 6) screams problems. There are only 16 professors (3k comparisons each?!?!) and the professors are all over the place. That's very high variance, suggesting the study has no meaningful statistical power. Poor instructor 16 can't catch a break lol

There's also really clear bias given that the main results only feature Google models. Other models show up elsewhere, why not there?

I'm no lawyer, but I'm a pretty competent statistician and can confidently say this paper has a smell to it. I can't call it bullshit, but there are red flags all over

NuclearPM 2 minutes ago

> confidently say this paper has a smell to it. I can't call it bullshit, but there are red flags all over

You can confidently say that you are unsure?

volkercraig 2 hours ago

More than that, the entire structure of the study is pointless. They set it up as a question/response and then had humans rate the response. That's literally what LLMs are trained to do, which ultimately is convincing a human to click the "I like this one better" button on its response.

dcre an hour ago

They're only good at it because that's what they're good at? Come on.

gguncth 8 hours ago

Sure, but in two years AI has gone from “impressive tool, but not a replacement for knowledge workers” to “the study where it beats our highest caliber of knowledge workers may have some methodological deficits.” In another two years it’s going to be curtains.

wouldbecouldbe 7 hours ago

The issue is, it almost always outperforms knowledge workers.

IF the right questions are asked, and IF steered into and corrected at a few crucial points. IF not it goes off in the wrong direction really quick and that's a problem that's still mostly unsolved in the last 2 years.

And that can be catastrophic in high risk environments, like legal, medical or high risk software products where being wrong in the wrong place can mean bankruptcy or even cost a life.

I help run a few marketing websites where I let the CEOs run crazy with Claude cowork; they are making PRs like madmen, but they are not allowed to touch any of the APIs & platforms where there is real user data & sensitive information.

ambicapter 2 hours ago

Autopilots have been able to land planes for years (decades?), and yet they still don't land passenger planes at any increased rate.

amelius 7 hours ago

> Sure, but in two years AI has gone from “impressive tool, but not a replacement for knowledge workers” to “the study where it beats our highest caliber of knowledge workers may have some methodological deficits.”

With that kind of logic ... anything is possible.

AlecSchueler 7 hours ago

> the study where it beats our highest caliber of knowledge workers may have some methodological deficits

The point is that if the study can't validate the claims being made then we can't actually extrapolate from that claim. What you're predicting may or may not come true, but the study (which is the topic at hand) isn't useful for supporting the assertion.

bobro 3 hours ago

>the study where it beats our highest caliber of knowledge workers may have some methodological deficits.

That isn’t even remotely what this study is looking at.

taco_emoji 3 hours ago

I will never trust an AI as much as a person

Forgeties79 8 hours ago

Assuming it keeps improving at the same rate, which I think we are already seeing not play out. If you compare the first six months when GPT truly hit the mainstream to the previous six months, the improvements are not nearly as evident. That isn’t to say they aren’t noticeable, I could definitely tell it’s improving, but not nearly at the pace it once was.

There’s also the fact that they can’t possibly keep improving frontier models at the same rate (I.e. training investment) when investment starts slowing down. The amount of cash being burned is completely unsustainable and you’re already seeing some pullback.

skywhopper 6 hours ago

Your “some methodological deficits” is doing a lot of work.

internet_points 6 hours ago

"the study that claims it beats our highest caliber of knowledge workers has methodological deficits" ftfy

so extrapolating from that, in another two years it will continue to bamboozle

Paracompact 11 hours ago

Independent of whether it has any meaning (because the entire paper might be a bit iffy), I find it curious that Instructors 3 and 8 have the lowest harmfulness rates, quite a bit lower than even the LLMs, but not the highest preference rates. Harmfulness anticorrelates with preference, but not perfectly. Some amount of charisma appears to be a factor even in selections by professionals?

godelski an hour ago

Yeah it's difficult to interpret.

One possible interpretation, the statements were very bland. These would be very low harm but also not very informative

RataNova 7 hours ago

This is exactly why I'd be cautious about interpreting the preference metric too strongly

esquivalience 10 hours ago

I think your 3k figure comes from here - It is explained:

> As judges, the professors then completed 2,918 blinded, forced-choice comparisons (median per judge: 200), each time indicating which of the two anonymized responses, from the instructor or the LLM, they would rather give to a student
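For a rough sense of how much noise 200 comparisons leaves per judge, here is a minimal sketch of a Wilson 95% interval on a single judge's preference rate. The 100-out-of-200 split is a made-up illustrative value, not a figure from the paper, and the calculation assumes the comparisons are independent, which may not hold here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical judge: prefers the LLM answer in 100 of 200 comparisons.
low, high = wilson_interval(100, 200)
print(f"per-judge 95% interval: {low:.3f} to {high:.3f}")
```

Under those assumptions a single judge at the median of 200 comparisons is pinned down to roughly 0.43 to 0.57, so per-judge rates that differ by a few points are hard to tell apart, while the pooled 2,918 comparisons give a much narrower interval.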

IshKebab 9 hours ago

So were the answers fact-checked? If not, that seems like a pretty obvious flaw!

saidnooneever an hour ago

more and more i see papers. interview 8 ppl, draw conclusions based on their expert opinions. AI and Cybersecurity are full of this.

Even saw some where they just slapped interviews + protocol into chatgpt as 'methodology' to extract the results -_-. Peer reviewed and published.

hungryhobbit an hour ago

People don't always have the resources to conduct massive "proper" studies. We live in the real world, and have to settle for what studies people can conduct.

Not saying we should take such studies as the "gospel truth" ... but if you ignore them and only consider "proper" studies, you'll be waiting a very long time to learn anything new.

dragonwriter 2 hours ago

> There's also really clear bias given that the main results only feature Google models.

The main results also don’t seem to know what a “model” is, as the two “models” it refers to are “stock Gemini 2.5 Pro” and “a retrieval-augmented version of NotebookLM”.

One of which is a model, and the other of which is an interface backed by different models depending on exactly when the analysis was performed.

giancarlostoro 2 hours ago

I never get the same answer from any two lawyers. I hate law as a result. With developers you might get disagreements based on experience, but there's usually a strong consensus on specific things; with lawyers and courts it's all over the flipping place. I wouldn't be surprised if LLMs can "pass" on paper (i.e. college exams) but in practice, they might 'struggle' in different courts.

...On the other hand, if an LLM has access to every transcript of every case a Judge has overseen, they might have an unfair advantage in any case... Hmmm...

This all assuming the AI lawyer doesn't hallucinate and start referencing cases that don't exist.

vlan121 3 hours ago

Viewed in reverse, one should ask with what intent the study was designed like this. And for obvious reasons it sounds like the motive is monetary.

skywhopper 6 hours ago

I find it entirely likely that the preference for the AI generated answers is entirely due to the confidence of its assertions. Given the numbers of evaluations each prof had to do, there’s no way they researched the answers thoroughly. But if there’s one thing we all know LLMs can do well, it’s to generate text that sounds extremely confident. And that signal is appealing in choosing which of two statements you’d give to students.

ALittleLight 13 hours ago

The paper says the professors have a median of 200 comparisons each. It also says they only used 2 models because using more models would require more comparisons and they selected Google models because Google was branded/advertised as being education focused. When you see other models show up elsewhere, that's because they extended the main idea to other models but using LLMs to judge instead of human professors.

godelski 13 hours ago

Sure, but the biggest problem is they have no statistical significance. Variance is too high. How do you distinguish the signal from the noise? Confidence intervals aren't enough.

But is it a surprise law professors aren't great statisticians?

runarberg 37 minutes ago

I think it is more likely that they selected Gemini because the lead author is a fellow at an institute which receives a lot of their funding from Google.

RataNova 7 hours ago

Agreed. The study might show something useful, but the headline is doing a lot of work.

jstummbillig 9 hours ago

But does it really matter? It seems fairly obvious that AI is going to outperform professors. While the studies run, there are three more model releases that change the calculus entirely. I wonder how much we are learning with these studies about what is going on.

greggoB 9 hours ago

> I wonder how much we are learning with these studies about what is going on.

So your alternative is to not have any studies and everyone can just stump up anecdata as "evidence" for the capabilities of these models?

master-lincoln 8 hours ago

it sounds like you are saying science doesn't matter but your feelings do

suddenlybananas 9 hours ago

Does it matter if a study is fraudulent or incompetent? Yes.

zeristor 9 hours ago

That is the assumed narrative; however it shouldn’t bias any evidence.

runarberg 13 hours ago

The study was conducted by Stanford’s HAI institute, which receives heavy funding from Google (how much I couldn’t find because they don’t publish their donations in a place I could find; but I suspect it is a lot). And the authors did not declare a non-conflict of interest at the end of the paper.

keeda 11 hours ago

Wait, where are you seeing the link to HAI? TFA mentions something called "liftlab" which seems to be something under Stanford Law School and separate from HAI. The study has more than a dozen authors from as many different universities but HAI is not mentioned.

pezgrande 6 hours ago

Do papers need a "non-conflict of interest" disclosure nowadays to not be considered just ads?

net01 5 hours ago

The HAI is also funded with money from OpenAI, Anthropic, and other big tech corporations. I don't know what you are trying to prove.

scotty79 4 hours ago

> That's very high variance

Do you doubt that educational value of a law professor can vary from 0 to somewhat reasonable? You are not studying screws here.

philipwhiuk 6 hours ago

This is the bit I'm suspicious of:

> They calibrated AI responses to match the length and structure of human answers

which I would guess removes AI's hallucinations and errors somewhat.

causal 16 hours ago

As a software engineer I have some intuition for what the risks are of letting agents do some tasks vs others.

I don't have a similar intuition calibrated for what could go wrong when asking AI to draft a legal document. Some things seem harmless, i.e. drafting a will, but I don't really know- our legal system is notoriously rife with footguns.

qingcharles 11 hours ago

I've used general purpose LLM AI (e.g. run-of-the-mill Claude, GPT etc) heavily to draft legal documents. The biggest trap is the hallucinated citation. It will easily insert an absolutely authentic sounding quotation from another case that perfectly proves the point you are trying to make, then it'll make up an authentic name for it, e.g. United States v. Shenzhou Electronics Inc or whatever. You can get really comfortable after checking its output a few times and getting no false citations, and then BAM, it'll put three in the next motion it writes.

Any lawyer who isn't using LLMs for research is behind the curve, though. They are unbelievable at finding niche cases you would never have found on your own. Previously it was a lot of exact search term matching, which is inherently useless for a lot of legal research. I need something that can search on vaguer terms, which AI can do incredibly well. Just check the results. I'm sure the LLMs from Lexis Nexis/Westlaw are probably better than the general purpose ones.

LLMs make fantastic paralegals. If you're doing any legal work, you should be using it, even if it's just to shoot ideas at. Have it play devil's advocate. My friend always has it play the other party's lawyer to see what all the counter-arguments are going to be.

Just like you would with software development. If you care about what you are creating, CHECK THE OUTPUT.

em500 10 hours ago

> The biggest trap is the hallucinated citation. It will easily insert an absolutely authentic sounding quotation from another case that perfectly proves the point you are trying to make, then it'll make up an authentic name for it, e.g. United States v. Shenzhou Electronics Inc or whatever.

Naive question from an outsider: aren't there searchable databases of cases (with complete text) so that citations could be checked automatically, either by the same or an independent agent?

thenickdude 7 hours ago

>The biggest trap is the hallucinated citation

The "biggest problem" being the one thing that is trivial to verify against concrete databases is a bit convenient don't you think?

I think it's more likely that it makes mistakes evenly but the one thing that you are able to check with certainty is the only place you discover the errors.

lawtalkinghuman 6 hours ago

Just because the citation exists, what the LLM says it stands for and what it actually stands for are not the same.

For testing, I've asked (admittedly last-gen) LLMs to generate legal opinions regarding issues in commercial English civil litigation, and I received back cases where the citation is real, but the area of law (family law) is not relevant as family courts apply a very different set of procedural rules.

(If you squint a bit, they sometimes might be relevant... and could be useful for a particularly creative litigator to make a novel argument on behalf of a very risk tolerant client. But you would very much want to go read those cases and think quite hard about them.)

eunos 10 hours ago

It seems companies like Thomson Reuters or other legal services have an incentive to build LLMs with RAG over legal case texts and robust hallucination detection on references.

skinfaxi 4 hours ago

ChatGPT regularly hallucinates entire cases out of whole cloth or fabricates an entirely different fact pattern for a given case. Perplexity does much better at citing its sources and providing accurate quotes, at least in my experience.

RataNova 7 hours ago

I think the paralegal analogy is right, but with one important difference: a human paralegal usually knows when they are unsure, or at least can be trained to flag uncertainty

BartjeD 10 hours ago

A legal professional can be personally liable for not finding the most recent case-law.

The knowledge cut off gap means the models sometimes don't know about the most recent case-law, in a given situation.

I've seen this happen multiple times now. Accountants and legal professionals advising clients based on outdated information assembled through ChatGPT, Claude and Copilot.

Professionals drafting letters and missing recent case-law which handles their exact case. It's unreliable. So it can save you some work, but it can't save you all of the work. And in some cases its mistakes really force you to redo all the work, and more, to be thorough and have confidence in the result.

thewebguyd 16 hours ago

I think this is probably true for most skilled professions. AI is best used in the hands of folks already knowledgeable in the skills/professions they are using it for.

I liken it to me googling things as a sysadmin vs. Jane from accounting doing it. The non-tech end user is far more likely to make the problem worse, or install something sketchy from the ad-riddled results, than I am or one of my help desk employees is.

I wouldn't trust myself to draft an important legal document using AI without the advice of a lawyer, much like I wouldn't really want to rely on my lawyer to use AI to write code for me.

godelski 15 hours ago

  > I think this is probably true for most skilled professions.

I agree, BUT I also find that it's easy for experts to atrophy quickly. When the AI is right 80-90% of the time it lulls you into overconfidence.

I find those that are best and make the greatest use are the ones who remain skeptical but also use the tool. The same people who were already nuanced and picky before AI. The same people who already doubted and questioned their own work, and used that suspicion to help prevent them from having overconfidence in their own work. If you weren't willing to just "lgtm" your own code, it's difficult to do that with AI.

(To be clear, I'm not saying perfectionists. Some might call them that because the picky people have higher standards, but a good expert has to also understand that perfection doesn't exist. That's often a driving force in the suspicion! This also tends to cause them to continually improve)

vonunov 19 minutes ago

> sysadmin

Another domain where LLMs are very effective at confidently leading people down a messy path. I have a roommate using LLMs to guide him through setting up some ollama stuff in my WSL (I happen to have the half-decent GPU here) and after multiple rounds of the bot trying to get him to do things that were redundant if not in the wrong direction entirely (and vaguely insulting as a matter of course), I had to write "ground truths" along these lines, and probably more as I find them:

  We are using systemd. ~/.bashrc or similar dotfiles should not be used to start services/processes automatically. Do not "sudo" anything in ~/.bashrc.

[Yes, it did that]

  A systemd service should be created for any processes/services that need to run automatically and persistently. The current output of `systemctl list-unit-files | grep enabled` is available at [ . . . ]  

  sshd is already enabled + running and listening on 0.0.0.0:22 and [::]:22. ~/.ssh perms are already 700 and ~/.ssh/authorized_keys perms are already 600. Public key authentication is already enabled in sshd and ~/.ssh/authorized_keys already contains pubkeys ENDING as follows: . . . 

  tailscaled is already enabled + running; the tailscale address for [host] is [addr]

  It is not necessary to fix connectivity to any 192.168.0.0/16 ; tailscale interface should be used for any traffic to [host] or other hosts involved in the project; hosts/nodes lacking tailscale interface should be assigned one

[roommate + bot spent 45 minutes on trying to configure their way through NAT when not having to do that is almost the entire point of tailscale. It was just (essentially) like, "You're absolutely right. We have tailscale set up, so we don't need to be able to ssh to that other interface at all. Not troubleshooting that would have saved 45 whole minutes. Oh well, now what?"]

Maybe it's just me, but I'm not inclined to trust the judgment of something that can't keep this kind of thing straight, which I know is to some degree a matter of having all the needed info in the context window. But maybe it would be able to do that if it didn't waste tokens telling me to cd into the same directory that I'm already in every 2 minutes, or chmod .ssh/ again, or (when it really needs to burn some tokens) blow away the .venv and pull a bunch of modules again just to "start clean".

ChrisMarshallNY 16 hours ago

> I wouldn't really want to rely on my lawyer to use AI to write code for me.

Yet that is exactly what a lot of C-Suiters (many of whom are lawyers), are doing.

zuzululu 16 hours ago

im not so sure

i think devs overestimate their own role and underestimate others

i am seeing lawyers and doctors roll out their own software with AI

but we dont have their training and experience

stackghost 16 hours ago

It's like that in engineering, for sure. My background is in aerospace and there are lots of things that a reasonably technically-inclined random can probably do passably. It takes an engineer to know which tasks those are, though.

I would imagine it's similar in law, in that it takes a lawyer or judge to know where the foot guns lie.

stult 15 hours ago

IME so far (as both a lawyer and a software engineer), LLM error rates when drafting code and legal documents are reasonably comparable, but it's more problematic in the legal context because legal documents do not benefit from many of the structural safeguards available for code. For legal documents, there are no automated tests, no static typing, no test environments, no logging/observability instrumentation, no sandboxing.

The time lag between drafting and "deployment" also makes for much less effective, much more expensive debugging loops. You can deploy your code to prod in seconds, see an error pop up in the logs, and immediately start debugging. But it will take at a minimum days and frequently as long as several years before an error in a contract or a court filing will be detected, and often the error is beyond correction at that point. Thus, the errors are both more difficult to detect and to resolve.

And the consequences of error are often much greater, both because they are not correctable and because a legal error may risk someone's life, liberty, or substantial property. Although that's not categorically the case, obviously bugs in certain safety critical systems can be as bad or even worse than legal mistakes. But in general, most software is lower stakes than most legal writing.

On the flip side, LLMs do seem to do a better job with basic style and structure for legal documents compared to code. Things like following IRAC format, citing assertions of law (although hallucination remains an issue), and writing comprehensible sentences. These would be the equivalents in code to best practices like good comments, cohesion, consistent use of design patterns, test coverage, clear variable names, DRY, etc. Although the better performance on those more qualitative metrics may just be because even the longest legal documents are typically simpler in structure and have fewer lines of text than a large, complex codebase. Or maybe it's because LLMs are trained on natural language text more than on code. Or because natural language is more forgiving than code, in that minor variation in diction or grammar is unlikely to have any significant effect on how the document is interpreted, whereas even single character errors in code can have enormous effects.

Otterly99 4 hours ago

There is also one thing I would like to add, and you can correct me if you disagree: coding benefits much more from thorough planning. Now, I exclusively work by first writing a plan that has well-defined steps and goals, which can of course change over time.

It seems to me like it would be more difficult to achieve with legal documents and, in my experience at least, writing a concrete plan has been the decisive factor that makes my AI coding robust (plus all that you mentioned).

Hfuffzehn 5 hours ago

This is a very good comment. But notice how even in software engineering there is still disagreement about these structural safeguards.

So yes, we can say the LLM created bad code when it does not compile or fails prewritten tests.

But experts might disagree what good comments, good cohesion, appropriate use of design patterns, appropriate test coverage or clear variable names are.

So what are we supposed to train the LLMs towards? Somebody still has to decide what "good" is.

causal 5 hours ago

Hidden gem of a comment, thanks for writing

calvinmorrison 15 hours ago

Well, this is largely the fault of law itself, especially English-style law. Without a legal, parseable code, and with every single tiny municipality (some less than 1 square mile) having its own set of rules and laws, not all published or available, but which citizens are expected to abide by of course, how could we expect AI to do well rather than some typical TV southern lawyer who knows the judge?

Merad 14 hours ago

> Some things seem harmless, i.e. drafting a will

Absolutely not harmless if you're the executor of an estate forced to deal with a screwed up AI will. I just handled my dad's estate this spring. It's a frustrating and confusing process even with the simplest of estates.

hparadiz 5 hours ago

I recently had to file to become an estate admin with no will at all. And it was literally cheaper for me to fly 3000 miles to do it in person than it was to pay a lawyer. Because lawyers are frankly greedy scumbags half the time. They don't offer an appropriate cost for the service. Instead the conversation immediately goes to "how much" money is in the accounts and suddenly they want a percentage of your father's estate for filing two pieces of paper.

And in my experience if you do actually pay a lawyer for something they will act like you're not worth their time and will literally roll their eyes at you when you're trying to explain the minor details of a case because they are too lazy to listen and zone in like I would when doing my job.

b40d-48b2-979e 14 hours ago

Most people don't have anything that could even be called an "estate".

_heimdall 15 hours ago

I wouldn't consider drafting a will to be harmless. If it's done poorly the next of kin could have to deal with a huge headache and potentially months or years of probate proceedings.

grogenaut 13 hours ago

I had a very well crafted will from my parents, one of whom was a very good lawyer hiring other good lawyers. It was still a pain in the ass for many of the reasons they were trying to make it easy for us.

One thing I learned, just bite the bullet and re-write the whole fucking will instead of making riders.

Piecing the will together from riders was terrible. All the clauses fell away as everyone got older. The final will could have been 8 pretty clear pages.

The other part that is hard is just knowing all of the things that happen with assets and a passing. Luckily we had another lawyer and financial folks to advise us. It was still a lot and not that easy to find details. This was pre-AI, which would have helped walk through this shit.

onlyrealcuzzo 6 hours ago

As someone who's been sued frivolously...

Believe it or not...

A lot can go wrong if you have real life human lawyers draft a legal document.

rayiner 16 hours ago

I would think that LLMs would be better at avoiding foot-guns. That’s a situation where you have a list of well known rules and potential pit falls, and the work of the lawyer is to apply those to a fact pattern. That’s something that has been hard to automate programmatically, because the fact patterns are similar but different. LLMs, however, seem to excel at applying general principles to differing fact patterns.

atmavatar 15 hours ago

Instead, the LLMs create entirely new foot guns like citing non-existent cases. You can't go more than a week without encountering another news report of a lawyer submitting an AI-generated legal brief rife with bogus case citations, which even includes briefs submitted to state supreme courts.

e.g., https://www.npr.org/2026/04/03/nx-s1-5761454/penalties-stack...

HappMacDonald 16 hours ago

I would categorize this in the "expertise that people internalize but never figure out how to verbalize" department, and that is a department we have no way to teach an LLM because if nobody is writing out those unspoken, subconscious rules then the LLM has nothing to read about them in its training data.

goodmythical 15 hours ago

I don't know the source off hand, but I've seen llms hallucinating case citations in order to "prove" their premises.

can't get more foot gun than "well according to [fiction] it is a well established practice (that the defendant is guilty)"

dylan604 16 hours ago

But can an LLM come up with questions like what the definition of "is" is? Seems to me there's a lot of "depends on how you read it" type of stuff where lawyers excel at finding novel interpretations. So what coders think of as rules are much less straightforward to understand when it comes to laws.

xmcp123 14 hours ago

I think that's actually a perfect analogy to AI writing code. Drafting a will seems like not a big deal, until that will is accepted as "good enough" and is then in court and under fire.

teiferer 11 hours ago

> drafting a will

Such a document may not make a difference to the person that eventually will have died, but it can make or break the life of generations to come in countries that are so heavily optimized for dynasty building like the US.

RataNova 7 hours ago

I think that's the right intuition. Legal AI feels especially dangerous because the output can look competent while hiding jurisdiction-specific footguns

conception 13 hours ago

This is why I can’t see how college grads are going to survive the AI apocalypse. domain experts driving LLMs are super powerful because they can spot where they make mistakes. Juniors don’t have that insight and the LLMs then cost them productivity.

geraneum 12 hours ago

> domain experts driving LLMs are super powerful because they can spot where they make mistakes

I don’t know if that’ll be true for long. I just had my colleague who’s a very competent engineer IMO hand me a frontier model vibed PR to review (after reviewing it himself, he claims) which contained random variable assignments, conditionals that do nothing, etc. He’d never do such a thing before. People become too comfortable and get confirmation bias as well.

knollimar 16 hours ago

I'm afraid since claude cheats in benches, what will it do with law?

datsci_est_2015 15 hours ago

Hmm, what’s the law equivalent of using docker to bypass sudo?

godelski 15 hours ago

Cheat.

Or worse, use historical data to determine the laws of today.

dgellow 10 hours ago

The same in every other domain. It’s happening now, not in a future tense

15155 6 hours ago

> drafting a will

Tell me you've never been the executor of an estate in the United States without telling me.

hparadiz 5 hours ago

I think going through this process has made me uniquely qualified to write one.

prpl 16 hours ago

there’s really no limit to how many times and ways you can review something with AI, except dollars.

Boss0565 16 hours ago

cannot IMAGINE letting ai write my will rn.

jay_kyburz 16 hours ago

I imagine it's really hard to spot a comma in the wrong place, or a missing sentence in a 10 page contract unless you wrote it yourself, or you assembled it from some battle tested templates.

pojzon 11 hours ago

To give you an example of what can happen if you use AI in a legal battle, look at the Valve vs Rothschild case [1].

TL;DR It's never a good idea and it will bite you.

1. https://finance.yahoo.com/news/valve-wins-trial-against-pate...

aristofun 6 hours ago

In general it is not surprising. Even if this particular study is bad.

There are certain areas of law work that are about analyzing large amounts of texts, drawing conclusions and writing other texts based on that and nothing more. That is literally the bread of LLMs.

Those types of lawyers should be the first in line for unemployment, not programmers, not even close.

alansaber 4 hours ago

"That is literally the bread of LLMs." correct. However, programming has a large number of advantages RE LLM use compared to law:

You can execute the logic, and set up loops from the output. You can set up more useful RL. It's easier to generate synthetic training data. It naturally supports tool use and agent parallelism. It's easier to integrate with APIs (with what few APIs the court systems provide). Programming explicitly encodes abstractions at the function, module levels etc that are easier to KG/reason/build upon than text chunks.

iterance an hour ago

Just because it is theoretically the bread and butter of LLMs does not mean LLMs are capable of doing the job. It still needs to be proven, setting prior beliefs aside. Law is a life-critical system and deserves our highest level of scrutiny.

nickburns 4 hours ago

'Bread *and butter'. The English expression requires the second part—but otherwise fits perfectly in your well-stated point, with which I wholeheartedly agree.

Source: AAL.

aristofun 3 hours ago

Thank you! As a non native speaker I was not sure if “and butter” is a mandatory part but didn’t want (nor had time) to llm the comment for the sake of authenticity :) TIL

nickburns 3 hours ago

conartist6 5 hours ago

I see the same problem with AI in both programming and law though.

AI is like a scab on a wound: it's a temporary filler, it rushes in to fill a void, but it's not going to be the final solution.

Models showed us that there was huuuge unmet demand for literacy, both in software and in law. But now we have a choice to either address the systemic causes of the unmet demand, or just try to paper over them with layers and layers of AI scab.

bluefirebrand 2 hours ago

> But now we have a choice to either address the systemic causes of the unmet demand, or just try to paper over them with layers and layers of AI scab.

Yeah, but in my experience it won't come down to "which is the better solution" but "which is cheaper/easier"

So I look forward to lots of layers of papered over AI scabs in the future. It won't be cheaper in the long run, but it will pump someone's quarterly numbers enough that they get a promotion before the problem they introduce comes back to them

NoboruWataya 4 hours ago

These are academics. Not to disparage them or their work at all but it is very different to the transaction or litigation work that is done in BigLaw. It is a lot more focused on analysing and summarising existing texts, which are themselves more easily available for LLMs to train on (statutes, case law, legal journals, textbooks). As such it is probably the easiest legal work to LLM-ify but also the least valuable, because I assume law professors aren't getting paid nearly as much as BigLaw lawyers. So this approach won't scale. Not to say AI won't crack BigLaw but it will be a different challenge.

scotty79 4 hours ago

LLMs answered student questions off the top of their heads, without any refresher look into the case law. And systems that were primed with the case law like NotebookLM underperformed when compared to baseline LLMs that you'd ask anything about anything.

It's not about what LLMs can or are suited to do. This study shows strengths of what's already in them, innately.

epolanski 5 hours ago

The more I see the evolution, the more it looks to me that every knowledge worker is going to be impacted.

streetfighter64 5 hours ago

> analyzing large amounts of texts, drawing conclusions and writing other texts based on that and nothing more

The same could be said about programming. Or if you want to be even more reductive, looking at a screen and pressing buttons to make the correct lights light up https://xkcd.com/722/

aristofun 3 hours ago

Philosophically or metaphorically speaking - yes.

But in my comment it is literally what some subset of lawyers do.

Literally is much more tangible and risky in terms of real impact on employment etc.

finnborge 14 hours ago

I understand why the conversation on this article looks like it does, but the study is specifically focused on the potential for LLMs to operate as tutors for law students. I enjoy the extrapolation out to whether LLMs will replace lawyers, but did not find that to be discussed in the study itself.

In the framing of using LLMs as legal tutors, with the implication of lowering the cost of legal training, this seems like a socially-positive outcome. Furthermore, it feels kind of intuitive to me that any contemporary system operating with an LLM and access to legal reference material will be prepared to answer _student-originated questions_ comprehensively and with breadcrumbs or direct references to educational/source materials, as seems to have been found in the study.

The authors explicitly and intentionally emphasize that many legal questions require contextualization, as opposed to some discrete calculated answer. The result of the study implies that the LLM-based systems were capable of using what many of us here understand to be the "stochastic best-fit algorithmic generation" of a contemporary language model to adequately contextualize a student's question, providing insight into the trade-offs or complications implicit in the question, while then, critically, _meeting the professional standards of legal educators in explaining that complexity to a student_.

Realistically, I would hope this provides some confidence to readers of HN that they can actually ask a legal question to an LLM and expect the response will explain the complexity of the law in relation to the question. This is great news, and is likely the minimal pre-work any of us should do before actually consulting a lawyer, if time permits.

On the other hand, I do _not_ think that this study provides any indication that an LLM is prepared to actually provide direct legal counsel. Possibly in the same way that a legal textbook does not replace legal counsel, or perhaps more accurately, the same way that stumbling upon a legal case study for approximately the same situation you're in doesn't guarantee you'll have the same result.

scotty79 4 hours ago

> On the other hand, I do _not_ think that this study provides any indication that an LLM is prepared to actually provide direct legal counsel

I think it indicates that LLMs are smart enough to be used in the context of law education.

quantisan 15 hours ago

I'm surprised Stanford Law would go along with this over-reaching press release title. How about "For common first-year contracts-law questions, law professors preferred AI-generated answers to professor-generated answers"

mchl-mumo 11 hours ago

The revised title is spot on. It's odd to me how academics are trying to sound like top research labs' CEOs trying to pump valuations by overreaching claims.

goodcanadian 5 hours ago

It is rarely the academics writing the press release. It is even rarer that the author of the press release chooses the title.

chewbacha 16 hours ago

My best guess is that Gemini was trained on the textbooks that the questions are meant to test against, thus they are probably better at explicit recall of those questions or related questions.

This is a pretty limited introductory course based on what it says in the methods of the paper itself.

runarberg 15 hours ago

That and the research is done by Stanford’s HAI institute with an obvious bias and the paper is curiously missing a conflict of interest statement.

EDIT: just found out that Google is a major donor to HAI. So this research is at least partially funded by Google. Which is probably the reason the authors fail to declare no conflict of interest.

ulrischa 7 hours ago

By its very nature, the field of law is ideally suited for AI language models. Fundamentally, everything is based on interconnected texts. I believe that even larger waves of layoffs could loom here than in the IT sector. However, it is likely that a more powerful lobby will be at work here—one that will grossly inflate the perceived value of their work and shield it from outside intrusion.

grosswait 6 hours ago

He who makes the rules…..makes the rules.

tiahura 5 hours ago

As a lawyer, I think your intuition is right re llms. Law is the wordplay that llms thrive at.

However the waves are starting and they ARE going to be huge. Corporate clients are insisting on AI. They don’t want to pay an associate hours to draft anything to be reviewed by a partner. They want top partner to use AI and just proofread.

applicative 15 hours ago

What the LLM cannot do is explain why it said what it said, when cross-examined. It simply hallucinates the best account of why someone would have said such a thing as it said, same as it can give a probable account of why someone else said something different. The question 'But why did you say this not that ...?' does not lead it to make explicit its grounds for what it said, but just to make a new more complicated statement.

U4E4 14 hours ago

This is true in the naive case.

There are however LLM context building techniques that anchor completions in data structures that persist the structure of claims that support the conclusion contained in a completion. Lots of different patterns exist —organizing logic in language is a rich domain— but the one I’ve liked the most is something called a Claim Dependency Graph that models the relationships between atomic claims as graph edges.

There’s a whole suite of operations you can perform on these structures, and “reconstruct how you came to this conclusion” is absolutely one of them.
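For readers wondering what that could look like, here is a minimal sketch of the idea as I read it from the comment, not a reference implementation of any published "Claim Dependency Graph". The `Claim` class, the `depends_on` field, the `reconstruct` function and the contract-law example claims are all hypothetical names made up for illustration. Each atomic claim records which other claims it rests on, and "reconstruct how you came to this conclusion" becomes a depth-first walk that lists premises before the claims built on them.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One atomic claim plus the ids of the claims it depends on (graph edges)."""
    claim_id: str
    text: str
    depends_on: list[str] = field(default_factory=list)


def reconstruct(claims: dict[str, Claim], conclusion_id: str) -> list[str]:
    """Return the ids of every claim supporting the conclusion, premises first."""
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(claim_id: str) -> None:
        if claim_id in seen:
            return
        seen.add(claim_id)
        for dependency in claims[claim_id].depends_on:
            visit(dependency)
        # Append only after all dependencies, so premises precede conclusions.
        ordered.append(claim_id)

    visit(conclusion_id)
    return ordered


# Hypothetical example graph: a breach-of-contract conclusion and its support.
claims = {
    "c1": Claim("c1", "An offer was made."),
    "c2": Claim("c2", "The offer was accepted."),
    "c3": Claim("c3", "Consideration was exchanged."),
    "c4": Claim("c4", "A contract was formed.", ["c1", "c2", "c3"]),
    "c5": Claim("c5", "The defendant did not pay."),
    "c6": Claim("c6", "The defendant breached the contract.", ["c4", "c5"]),
}

justification = reconstruct(claims, "c6")
```

Because the graph is persisted outside the model, the explanation is read back from stored edges rather than regenerated, which is the property the parent comment is pointing at. This sketch assumes the graph has no missing ids; a real system would also need to handle cycles and contested claims.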

mdlman 13 hours ago

I’d love to read more about these types of patterns. Do you have any recommendations?

snk 3 hours ago

xattt 15 hours ago

A human has a motive that exists that frames the thought being expressed. An LLM is going to be creating a “de novo” thought in response to a line of questioning.

Paradigma11 5 hours ago

Psychology has shown that a lot of those motives are just post hoc narratives, similar to LLMs.

ashdksnndck 15 hours ago

Same is probably true of humans. In a conversation, we often respond from instinct, then work backwards to a rationalization only when asked. For more considered thoughts, if we’re lucky, we can remember our “reasoning traces” but that’s as deep as our introspection goes. Unless we’re neuroscientists, we don’t even know how many neurons we have, let alone have any understanding of how they generate our thoughts. Motivated reasoning impairs our introspection further, and then dishonesty and communication errors prevent us from relaying the limited remaining information to each other.

Model interpretability work has advanced a lot. Arguably we already can explain AI decision-making better than human brains.

applicative 15 hours ago

No, it happens in the immediate context, where e.g. we say 'No I meant Meredith Jones, not Meredith Smith' - and the possibility of this elaboration is actually part of ordinary communication. I did mean Meredith Jones, not Meredith Smith - thus the use of the past tense. The LLM will just give the best answer for what one might have meant, completely reopening calculation.

The point is familiar but there are good illustrations in the Atlantic article by a book editor. At first it seems abstract AI hate, but then she gets to the details. AI text cannot be edited. https://www.theatlantic.com/technology/2026/05/how-to-tell-a... or https://archive.ph/YJsGK

BDPW 14 hours ago

Nonsense, some of my friends are lawyers and they're able to give you consistent interpretations on why they think about a certain aspect of a law a certain way. The whole thing is that they work with this the entire time, so they have a really consistent 'head model' of how things work and why and how considerations should be weighted/ordered/whatever. LLMs just do not have this, there's no consistent underlying reasoning (the 'reasoning' traces in LLMs are really inconsistent)

j45 15 hours ago

LLMs hallucinate, because humans hallucinate.

Asking the LLM in a way that makes it annotate its sources can greatly increase the pattern matching to closely simulate logic, just like in humans.

I understand the question of why did you say this, not that. I have seen other ways of asking that which do not seem to trigger the LLM's over-response in the other direction.

latentsea 15 hours ago

Humans hallucinate because they take shrooms or have schizophrenia.

applicative 15 hours ago

No, the hallucination of its reasons follows immediately from the technique of probabilistic inference. You can see this in real time, just ask 'why did you use this word, not that word?' It is in the position of a desperate liar. All its responses are essentially 'rationalizations'

damnesian 2 hours ago

Does the "outperforming" conclusion incorporate the appropriateness of decisions? Or just if things are technically correct. Without human eyes on cases, things could easily get very off track. AI can do a lot of data wrangling, but there is no conscience.

TrackerFF 7 hours ago

In many (most?) countries you can defend yourself, waive your court appointed attorney. You are of course highly discouraged to do so. But sometimes people do it, mostly for smaller claims where they don't want to rack up legal bills for things which might cost more than what is at stake.

But, it makes me wonder, will clients be able to use these AI-attorney systems in the future, in the court. Where they basically either just parrot what the model is instructing them to do, or - I dunno - give the model permission to speak for them (while waiving liabilities).

I have no doubt that some complex AI system can perform better than a bottom-tier, overworked lawyer.

bonesss 7 hours ago

Pro se litigants are hyper vulnerable to LLM hallucinations.

One wrong advice clump and, like a step onto the wrong path while hiking, all subsequent steps go in the wrong direction. And sycophancy tuning means marginal one-sided takes get presented as sure-fire things.

I’m of the opinion that the big wins aren’t in using the LLMs to do the work (legal, in this case), but rather to refine and improve the dialog and presentation from all parties. A court-centric LLM could give likely procedural needs to a litigant, and a law-firm-centric LLM could help a pro se litigant create a meaningful and refined set of questions for lawyer consideration, condensed and targeted, saving all parties time and confusion while meeting the client’s linguistic needs ‘where they are’.

All the lawyers know things LLMs never will, the law is interpreted, and the written part isn’t engineering grade facts but suggestions interpreted in context. Arguably this is a racket and a thin veneer of plausible deniability for authoritarian rule. But as the law stands even with federal statutes and citations from the court’s website, practicing lawyers will frequently end up explaining that in this county/country/court/jurisdiction The Way of Things is different.

TrackerFF 6 hours ago

I think it could work for some things. Years before LLMs became capable of doing anything substantial, people were selling "legal services" via websites where people could dispute trivial stuff like parking tickets, and what have you in the small courts.

Those services were usually just based on NLP + simple decision trees, and people actually won their cases.

Of course, doing huge corporate contract disputes, IP disputes, M&A, and whatever will probably be out of question for a good while. Same with more serious criminal cases where the stakes are very high.

But I think there's potential for automating away less serious cases, especially where there's good structure.

And of course, it all depends on what kind of legal system one is situated in. Immediately I'd think that Civil Law would be easier for AI lawyers, as its inherent structure is a better fit for machine reasoning. So I'd expect to see more AI products start in Civil Law countries.

15155 6 hours ago

> Arguably this is a racket and a thin veneer of plausible deniability for authoritarian rule.

The fact that Lexis and WestLaw have such an iron grip on the entirety of the US legal system is exactly why general LLMs are completely unequipped to be useful in this domain.

rockskon 14 hours ago

I do question at what point AI could be useful as a teaching aid.

The quality of LLMs depends heavily on, among other things, how you word your questions.

Knowing the correct questions to ask is not something most students know how to do given that it tends to require a fair bit of pre-existing domain knowledge.

Danox 2 hours ago

Sure it does AI multiple IPOs incoming...

piker 7 hours ago

Having been a law student and practicing lawyer, it's clear to me that law professors aren't really representative of much if any part of private practice. Most of the things they think and reason about are quite theoretical and academic, and it doesn't surprise me that the models would regurgitate a more average response which most human graders would prefer.

That's the entire point, though!

The legal academy is supposed to have outlying opinions on things and present novel philosophical answers to questions. (And questions to answers!) So in addition to the statistical arguments against this paper made elsewhere, to me it doesn't reveal much new information.

mchl-mumo 11 hours ago

16 is such a small number for what they phrase as an important finding. It really couldn't be much harder to coordinate with 100+ professors.

galaxyLogic 15 hours ago

I'm going to need some legal help for my startup. But I can't pay much. So I figured I will ask AI all relevant questions, as well as have it fill in forms etc. Perhaps even create a patent application for me.

THEN I find a human lawyer and give AI's answers to them and say "Can you find any errors in this? Can you improve it?" .

That way I think my legal bills should be smaller because the AI has already done most of the work. What do you think? Which LLM is best for legal work?

apparent 13 hours ago

I think that within a few years, most lawyers will expect that clients will have run contracts through an LLM prior to sending them to outside counsel. Emails will be along the lines of:

Please see attached contract we received from [counterparty]. ChatGPT says blah, blah and blah should be revised. What do you think? Is there anything else that we should change?

galaxyLogic 12 hours ago

Right. That will reduce workload for the lawyers. But will their fees then go down? I'm kinda worried that if I don't give them the LLM produced legal docs for review they will just use the LLM themselves and then charge me for the work the LLM did :-)

It's a bit like with doctors, you'll want a second opinion, if you can afford it.

apparent 11 hours ago

dlahoda 15 hours ago

i use codex to do initial research and draft texts (in typst). i use files-output skill so that all research contexts are rendered into md files.

i do second phase on codex, by asking to download all pdfs and extract all text of laws it references. can repeat fully local research step.

after i ask gemini to find issues and criticize.

UPDATE: there are many legal skills on github to try, have not used any yet

galaxyLogic 14 hours ago

Are you a lawyer yourself?

KnuthIsGod 15 hours ago

In the hands of a domain expert, AI is useful. In the hands of the naive, it is a foot gun.

I killed my Arch installation and was stuck at the GRUB prompt. Unwilling to brush up my rusty knowledge of GRUB syntax, I asked Gemini for help. The commands Gemini suggested would have wiped my hd...

Once Gemini was told that I was using BTRFS, the suggestion from Gemini looked a bit more sane, but still looked incorrect to me.

It was only after I informed Gemini that I was using an NVMe with BTRFS that it finally produced a sane command.

throw7 16 hours ago

Oh, a "Human-Centered" study by AI lover:

Julian Nyarko

    Professor of Law
    Co-Chair Stanford Law AI Initiative
    Senior Fellow, Stanford Institute for Human-Centered AI (HAI)

LOL!

songting591 4 hours ago

The interesting shift isn't whether AI beats law professors on tests; it's what happens to the value chain after that threshold is crossed.

When AI clears the knowledge bar in a domain, the remaining moat becomes trust, accountability, and local regulatory context. That's actually good news for niche SaaS builders targeting specific jurisdictions: the generic AI layer commoditizes, but the "AI + local compliance + human accountability" bundle still has real pricing power.

Curious whether anyone has seen this play out already in contract review or compliance tooling outside the US.

weatherlite 9 hours ago

It is important for society to understand it is not merely programmers and customer support who are at risk of losing their jobs. Clearly A.I can do much more than just program.

epicureanideal 14 hours ago

One way to make legal services more affordable and accessible would be to put the burden of ensuring the AI legal services are accurate on a private-public partnership with the government.

If a person using the service is given inaccurate legal advice and acts on that advice, the person can't be charged with a crime, can't be given any civil penalties, etc., as long as the law in question is non-obvious.

Obviously if by some exploit, some fundamentally obvious crime (murder, theft, obvious fraud, etc.) is said to be legal, that wouldn't apply, but of course the service should try to prevent those kinds of exploits anyway.

Could limit this to something like business regulations to begin with, or even specifically for small businesses, or contracts within some time limit and dollar amount that would otherwise be coverable by small claims court, etc.

motbus3 7 hours ago

As others pointed out, it kind of implies it surpasses professors, but reading more carefully it seems more like the mythos situation. There was a single professor or test that it surpassed.

Reading it makes me extremely suspicious of how cherry-picked this was

aitchnyu 7 hours ago

Tangential, is there a "test suite/CI" for AI writing legal documents? Long back in terms of AI progress, a lawyer filed something with hallucinated sources. Do new tools prevent this?

RataNova 8 hours ago

I'd read this less as "AI replaces law professors" and more as "AI may be a surprisingly strong first-pass tutor, especially when the student knows enough to question it"

elnatro 11 hours ago

When I see news pieces like this I wonder about the failures. Maybe the failure percentage is low but what happens if a bot gives bad counseling? Who is responsible then?

Attorneys will be using LLMs for convenience but they will not disappear, because there ultimately needs to be a human responsible for the decisions.

francisdavey 4 hours ago

I'm not a law lecturer. I spend most of my time wrangling contracts and advising about data law. But I did a stint of part-time work teaching a masters in law.

My experience then (this was back before "Attention Is All You Need", I hadn't met the output of generative models) was that students tended to produce work that did not have a proper thread of reasoning in it. There was a tendency to repeat things they had read but rehashed in various ways.

Reviewing some of their texts it was clear that much of the writing - by law tutors - was of the same kind. Much was incorrect. The fact that someone at some time had said a particular case was a proposition for something, meant that got repeated from book to book. Many authors simply didn't read their sources or check their references. Students repeated what they had been told incuriously.

Note: this was a graduate level course. Not wet behind the ears undergraduates.

The worst material was little potted notes produced for law students. Utterly awful material in most cases.

Anyway, when LLMs became a thing, a lot of what did not feel right about their output, and many of their error patterns, reminded me of the experience of teaching masters' students.

One of the saving graces of English court room practice (when I did that sort of thing) was that judges would say to you "where does it say that?" in a case you cited. You had better have them all at your fingertips and know exactly where you had cited. That avoided a lot of hallucination.

Just a random remark which might be of interest.

scotty79 4 hours ago

I'm curious what would be your take on the productions of this year's models.

eichi_uehara 15 hours ago

I beat lawyers twice before generative AI even existed. Recently I asked Gemini a few questions about personal conflicts in everyday life. It's often too conservative, with views too shallow for the problem. So I still handle human conflicts myself. I only outsource the templated stuff like routine chat replies or marketing copy, though it saves me a huge amount of time. People who quote AI in serious conflicts are too weak to handle them on their own.

IFC_LLC 5 hours ago

This is exactly what LLMs are designed to do. Double up a lot of data and find connections and patterns in it.

So no wonder on this point.

One thing I want to mention: Law != Justice.

So while LLMs are awesome at the law study they will suck at justice. Just because one has to solve very emotional problems with it at times. And LLMs are not that good at finding the correct emotion.

coldtea 5 hours ago

Also because their reasoning is just a statistical model of whatever they've been fed. No experience of pain, humility, human connection, etc in this.

dguest 7 hours ago

I'm not a lawyer, I program.

My understanding is that Civil Law (most of the world excluding UK, US, AU) is like a program: you feed it a situation, it outputs a decision, every once in a while you edit it.

Common Law (UK, US) isn't really a program, but you could stretch and say it's a state machine that has been running since the country started. Every interaction sets a new precedent and changes the state. But the programming analogy falls apart because no one in their right mind would design such a program.

LLMs might actually be the best example of such a program though: Common Law is basically one long chat with an LLM, hundreds of years long.

Before LLMs came along, a Common Law system seemed to have a finite time limit before it's co-opted by wealthy people with the resources to read the whole history. Now I think maybe we can push it a bit further.

But it's still a terrible program.

airstrike 16 hours ago

Yes, LLMs are great at search. That's not news.

gaiagraphia 6 hours ago

Isn't "getting greater" the more accurate representation, though?

In 'critical' industries, the error rate is massively important, and if the quality of search is reaching an acceptable error rate, that's quite big news.

Aperocky 15 hours ago

> rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups.

That's the problem, you never know when the 25% deliver a true stink bomb, and that's not considering prompting - while a fair prompt/question may be considered objective, it's very easy to stray.

Esophagus4 16 hours ago

Yeah this could be interesting. A lot of the spotlight has been on “law firm stuff” like demand letters and writing contracts…

But imagine if a dev team didn’t have to go engineer -> product manager -> legal team to get a question answered on local data retention requirements. You could ship that much faster.

ares623 16 hours ago

Would you take responsibility for missing details about local data retention requirements?

zuzululu 16 hours ago

honestly if you just avoid EU and China

you can get away with anything

jedberg 16 hours ago

Esophagus4 16 hours ago

Yes.

If the only purpose of asking a lawyer is transferring risk (aka cover your ass) while getting the same advice as an LLM, that’s slowing down delivery for purely bureaucratic reasons.

I’ve seen that mentality at big companies where everyone is scared to stick their neck out and be accountable for a decision. And nothing gets done. Drives me crazy.

But the people who move up are the people who take ownership and get shit done (and are right a lot).

(BTW, I have been at companies that were sued by regulators. They never really punish the individual(s) who were in the room when the decision is made. So your worry is kind of misplaced.)

tipsytoad 9 hours ago

Curious how they do a “blind” preference test. To any evaluator I’m sure it’s quite clear which answer is AI vs human.

himata4113 10 hours ago

There is quite a simple solution for many of the problems described in the comments: Make drafting legal papers a defined interface.

If you think about it and extract the semantics of any law you get something that looks familiar, sort of like code. Of course there are some complexities where certain phrases can mean different things, but legal papers in a way are written like they're programming languages already, especially when it comes to law.

First we would have to define a language that can handle ambiguous operations, and we already have this with programmatic proofs where n should land in x. So in the end I'd assume it would look something like this in a two-party dispute:

This is very simplified and pseudo like language, writing out a full contract would be as long as a real contract.

     DEFINE DEFENDANT "A Corp"
     DEFINE PLAINTIFF "B Corp"
     DEFINE CONTRACT  CONTRACT(PLAINTIFF, DEFENDANT, 3054-41-95)

     // attaching extracted requirements, definitions and obligations of contract

     FACT   PLAINTIFF delivered(goods) ON 7054-34-99
     FACT   DEFENDANT paid(0) OF CONTRACT.amount

     CLAIM  breach WHEN obligation(DEFENDANT, "pay") IS NOT satisfied

     PROVE breach:
         REQUIRE  PLAINTIFF performed
         REQUIRE  DEFENDANT.paid < CONTRACT.amount
         ASSERT   delay WITHIN reasonable(time)

     IF PROVE(breach):
         AWARD PLAINTIFF (CONTRACT.amount - DEFENDANT.paid) + interest()
     ELSE:
         DISMISS

Then you would run a proof based LLM to generate it into target language and since we already had an example of this from one of the AI labs we know it works. Automatic citations and supporting proof would be automatically populated from reviewed legal -> DSL extracted papers as supporting evidence.

I am sure that many AI labs are working on something similar already and we will see something like that in the near future as proof based llms evolve.
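To make the pseudo-language above a bit more concrete, here is a rough Python sketch of what evaluating that one `PROVE breach` block could look like once the facts have been extracted. Everything here is an assumption for illustration: the `Contract` and `Facts` classes, the `prove_breach` and `decide` functions, and especially the choice to treat "delay WITHIN reasonable(time)" as a yes/no input that a human or some other process has already decided. The interest amount is likewise passed in rather than computed, since the original does not say how `interest()` would work.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    amount: float  # total owed under the contract (CONTRACT.amount)


@dataclass
class Facts:
    plaintiff_performed: bool  # FACT: PLAINTIFF delivered(goods)
    defendant_paid: float      # FACT: DEFENDANT paid(...)
    delay_reasonable: bool     # assumed pre-decided input for the ASSERT line


def prove_breach(contract: Contract, facts: Facts) -> bool:
    """All REQUIRE / ASSERT lines of the PROVE block must hold."""
    return (
        facts.plaintiff_performed
        and facts.defendant_paid < contract.amount
        and facts.delay_reasonable
    )


def decide(contract: Contract, facts: Facts, interest: float = 0.0) -> tuple[str, float]:
    """Mirror the IF PROVE(breach) ... ELSE DISMISS branch."""
    if prove_breach(contract, facts):
        return ("AWARD", contract.amount - facts.defendant_paid + interest)
    return ("DISMISS", 0.0)


# Hypothetical inputs: a 1000.0 contract, nothing paid, 50.0 interest.
outcome = decide(Contract(amount=1000.0), Facts(True, 0.0, True), interest=50.0)
```

The mechanical part is trivial, which is sort of the point: the hard part is deciding what goes into `Facts`, and that is exactly where "reasonable" and other judgment-laden terms live.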

vessenes 13 hours ago

* Gemini 2.5 Pro (no outside resources), and
* NotebookLM (not versioned -- with added legal resources).

NotebookLM was considered slightly better than 2.5 Pro by the evaluators.

u1hcw9nx 5 hours ago

After a quick look at the study details and statistics, it does not look very definitive one way or another.

I mean, LLMs do OK with tutoring, but it depends more on how unique the questions are, not how difficult they are.

wilg 16 hours ago

> In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups.

75% win rate seems pretty good!

Paper link: https://law.stanford.edu/wp-content/uploads/2026/06/salinas_...

causal 16 hours ago

I wonder to what degree the AI was just better at communicating. My experience with attorneys is that they are often some of the worst writers.

applicative 15 hours ago

The writing is always fluid and grammatically flawless. This carries much more weight with us than we believe. I know the illusion well from decades of grading college papers. Many of the highest quality students use English as a second language, and I know this, but an American well trained in writing, grammar, spelling always gives an impression of superiority. (Being well trained in writing, grammar, spelling etc is of course high merit, which is how the illusion forms - it is basically an illusion of global 'intelligence')

falcor84 16 hours ago

Yeah, 75% win rate is a ~200 points Elo difference, which is quite massive.
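For anyone who wants to check that figure, here is a minimal sketch of the arithmetic, assuming the standard Elo logistic curve (expected score = 1 / (1 + 10^(-d/400))) and treating the 75% win rate as the expected score with ties ignored. The function names are made up for this example.

```python
import math


def elo_gap_from_win_rate(p: float) -> float:
    """Rating gap implied by expected score p under the standard Elo curve."""
    if not 0.0 < p < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def win_rate_from_elo_gap(d: float) -> float:
    """Expected score for the stronger side given a rating gap d."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))


print(round(elo_gap_from_win_rate(0.75)))      # 191
print(round(win_rate_from_elo_gap(200.0), 3))  # 0.76
```

So 75% works out to roughly 191 points, and a full 200-point gap would correspond to about a 76% expected score, which is consistent with the "~200" above.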

jshier 16 hours ago

I do wish they'd used some more objective criteria. Simply being preferable is one of the things LLMs have been trained for since the beginning, hence their sycophantic nature.

adornKey 8 hours ago

Maybe sycophantic nature is a good fit for the legal system. A successful lawyer once told me that the most important thing is to know your judge. Objectivity isn't a big thing in court. They'll cite random newspaper articles as evidence and throw out expert opinions - if they like. There might be a way to appeal - but that road often is not functional.

wilg 16 hours ago

What criteria would you use for judging legal arguments?

mitkebes 16 hours ago

mylifeandtimes 16 hours ago

teiferer 11 hours ago

Question is: if a legal question is answered incorrectly by an LLM, who is going to be held responsible?

king_zee 16 hours ago

I think there will be a market for firms that aggressively market themselves as non-AI, and then as more people turn towards that human connection we'll go full circle

rayiner 16 hours ago

Nobody wants to pay their lawyers more than they have to. There will be a huge market for firms that can use AI to avoid charging clients for $1,000/hour junior associates.

zuzululu 16 hours ago

That worked out for artists and translators, right?

citizenpaul 16 hours ago

If you want human connection the legal system is not where you are going to find it, period.

I don't think there will be any such market for "non-AI" law. If I'm involved with the legal system I just want out as quickly and as cheaply as possible.

applfanboysbgon 16 hours ago

Bad legal advice will keep you dealing with the legal system for much longer and at much greater cost. Something being cheap and quick upfront doesn't mean it will be cheap and quick by the end of the process.

citizenpaul 13 minutes ago

Esophagus4 16 hours ago

zuzululu 14 hours ago

atleastoptimal 11 hours ago

And this was done with Gemini 2.5

By the time any research study on AI is published, the models are already 0.5-1 generation ahead. Even this bullish outcome for AI models and their ability to perform useful work does not reflect how good they are now.

iLoveOncall 7 hours ago

The title of the study "Law Professors Prefer AI Over Peer Answers" is VERY different from the title on HackerNews. This is completely clickbait at this point.

lp4v4n 8 hours ago

Honestly it's not surprising that AI provided answers that were flagged less often as "pedagogically harmful" if we take into account that LLMs somehow create an "average" of all the knowledge they ingested.

gaiagraphia 16 hours ago

Incredible that the common people will be able to wrestle the right to rule of law away from the bloated legal caste, who have built themselves quite the moat.

The inaccessibility of justice is a huge driver of inequality. Any tools which bridge this gap will help make a more just society.

hparadiz 5 hours ago

The profession is walking into a courtroom 90 minutes late because you know the judge's work pattern, then going "hey Mike, how are the kids" after 22 years in the same jurisdiction. Then the old boys haggle based on how much the lawyer is charging. You are basically paying for access to the social club. Better outcomes when part of the in-group, of course.

gaiagraphia 4 hours ago

Would like to plot attitudes to AI against parental incomes or inheritance. If your value derives from having contacts and access to gatekept materials, rather than pure technical expertise, you've got a lot to lose as the walls come crumbling down.

There was another thread about the impact of AI on maths, and one of the arguments was about peer review... Made me wonder whether the writer was more concerned about the established order and gates being upset, or whether there's actually a valid technical criticism.

homeonthemtn 16 hours ago

Personally I think this is very good. One of the hardest things out there is maintaining a society in the face of changing times and it's because law is dense and slow.

I think, in the right hands, this could be huge.

wholinator2 16 hours ago

It turns out everybody has at least one right hand, even the people we trust the least.

gamblor956 12 hours ago

While they provided the questions that professors and LLMs were asked to respond to, they don't include any of the answers from either the humans or the LLMs, so there's no way to independently verify that the LLMs actually returned "better" answers.

Given the number of responses the professors were asked to rate (200 each), they probably graded them the same way that bar exam responses are graded: quickly and superficially. Not surprising that LLMs achieved higher scores in this scenario, since they excel at producing superficially nice answers that don't hold up under scrutiny.

Also...unless statistics has changed in the past 2 decades, the math in the charts doesn't math. That's probably why they're leaving out the actual numerical data. I also wouldn't be surprised if we learn in the coming days that the charts were AI generated.

Eufrat 9 hours ago

What is the point of this conclusion? That law professors like the tone and verbosity of AI slop? Okay?

Leptonmaniac 7 hours ago

I had a similar thought. What if the result, statistical and significance critique aside, mostly means that when it comes to first-year tutoring of law students, the vibe, tone and overall presentation of arguments weighs a lot, maybe even more than the factual arguments themselves?

In such a framing I don't find it surprising at all that teachers prefer the more polished answers generated by AI, because if LLMs are good at one thing, it is being confident in whatever they generate and presenting it convincingly.

tj_hustler_1966 3 hours ago

interesting

Thaxll 16 hours ago

AI will never convince a jury though.

jojobas 16 hours ago

A couple of acting classes might be cheaper than a lawyer, then you can go all out representing yourself.

xyzal 9 hours ago

This contradicts my anecdata.

Recently, I tasked Opus 4.6 with studying a new Czech building permit law in conjunction with some waste disposal regulations, and the result was disappointing. The model could not stop drawing conclusions from obsolete regulations in its training dataset, even when given the full text of the new law. The usual "you are totally right" also applied, and its conclusions were most of the time obviously wrong even to a human with cursory knowledge of the subject.

I ended up studying the relevant regulations myself over the weekend.

cess11 10 hours ago

I skimmed portions of the study but didn't manage to figure out whether this actually measures a preference for confident mediocrity.

t0lo 16 hours ago

Library outperforms student... more news at 9

lern_too_spel 12 hours ago

This was an open book test. The real problem with this study is that winning the most head-to-head preference tests is not the right metric. It doesn't much matter if two answers are right, and one is written a little better than the other. It matters quite a lot if one answer is right and another is wrong.

The authors point out that this other metric was computed in prior work and incorrectly dismiss it as being not as good as winning percentage in head to head competitions. The cited prior work shows that the models fare poorly on that metric. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5166938

apparent 13 hours ago

Except the library outperformed the professors, which is quite a bit more impressive.

34981t 17 hours ago

He is basically an AI professor for law. This study just confirms his existence:

https://juliannyarko.com/

Stanford and its donors of course want to replace anyone but its administrators, so they cheer on such anti-intellectual nonsense.

signatoremo 16 hours ago

This is the state of HN. Created new account. Accused without evidence. Emotional clickbait.

vessenes 13 hours ago

I vibe coded hn10k earlier this year. You could choose to see pages with comments only started by 1k+, 10k+ or 100k+ karma contributors. I'm too lazy to keep it up, but I found 1k and 10k both to be better experiences than "vanilla".

flanked-evergl 11 hours ago

...

frwrfwrfeefwf 11 hours ago

they'll embed it in the weights so it can't be jailbroken

rimliu 11 hours ago

Yes yes, the IPO is near.

t0lo 15 hours ago

More great news from the prestigious university where 40% of students claim they are disabled

https://fortune.com/article/rise-in-elite-students-seeking-a...

and where they wanted to ban words such as "chief", "stupid", "karen" and "American"

https://reason.com/2022/12/21/stanford-elimination-harmful-l...

bko 16 hours ago

Marc Andreessen argued that we've already reached AGI. He says that the top AI models give better answers than 99% of people he has access to, and he has access to some of the best people in their field.

I'm getting more convinced. I mean, sure, it makes dumb mistakes sometimes, but it's a particular set of self-serving mistakes, like commenting out tests in order to pass. We obviously don't want this behavior, but I wouldn't say it's dumb.

It'll be like the Turing test, which we just blew past years ago and no one cared. After all the hand-wringing about sentience and rights of the AI if it passes the Turing test, and now we just have AI bots running 24/7 writing slop.

How does everyone else feel?

acdha 16 hours ago

> Marc Andreessen argued that we've already reached AGI. He says that the top AI models give better answers than 99% of people he has access to, and he has access to some of the best people in their field.

He stands to make billions if enough people believe him — unless you also do, consider that you’re the mark. For example, if that was true, it would have to mean that AI companies either aren’t letting customers use the good models or are instructing them to frequently make errors which reveal a fundamental lack of reasoning ability.

Consider also that his wealth means he hasn’t had to defend an idea stringently since the 90s. I wouldn’t be surprised if he does think LLMs give deep answers because it often looks that way until you critically review the response and ask questions like what’s missing which require you to have a decent understanding of the problem domain.

threethirtytwo 3 hours ago

And you stand to lose your job and your identity as a programmer.

He makes billions but he already is a billionaire. Gaining billions more doesn't mean shit. The guy really has nothing to lose, and the utility of what he gains contributes little to his lifestyle.

I will tell you this. HN has been comically wrong about everything related to AI. They said driverless cars have no chance of becoming useable. Now Tesla FSD is almost there and I sleep in waymo cars. HN said AI will never code, now everyone uses it to code.

It's fucking stupid. This is one of the smartest forums on the internet but HN becomes next to stupid when predicting AI. Why? Because humans can't face the truth. When the victim of attack is yourself, it doesn't matter how smart you are... you have to scaffold a rationalization to spare yourself as the victim. You have to lie to yourself and tell yourself that you matter.

The truth of it is, while LLMs are not the end game, AI in general is on a trajectory to take over. It shows us how meaningless our skills are... not only as programmers but as artists. That beautiful song you felt had greater meaning? It's all reproducible via an algorithm because it never really had a greater meaning. It was just a pattern.

acdha an hour ago

coldtea 5 hours ago

>Marc Andreessen argued that we've already reached AGI. He says that the top AI models give better answers than 99% of people he has access to, and he has access to some of the best people in their field.

He has access to employees and yes-men. What he actually needs to hear, nobody will tell him, AI even less so. Every shit idea he has, would be "what a bright idea"-ed by both everyone around him and AI.

And of course there's the little matter that he makes money and increases his power by selling AI. What seller doesn't promote their stuff as the greatest ever?

moregrist 16 hours ago

Marc Andreessen has a strong financial incentive to feel this way and to convince others to feel this way.

I also think it’s easy to think that AI gives good answers if you don’t know the field well. In fields where I know the material, the answers are pretty variable and can be quite bad.

threethirtytwo 3 hours ago

HNers have strong incentive to feel the opposite. Humanity in general has strong incentive to feel the opposite.

AI is not only replacing programmers, but art and the meaning of being human itself. It's showing us how trivial all of human creation is as it's just patterns from an algorithm.

paulmist 16 hours ago

Knowing the question is half of the answer. LLMs are great at scoping your context and answering precisely what you asked; it's also why they go off the rails when they misunderstand a part of your question. Incidentally, they're great at "knowing" and reaching for knowledge.

Humans have the advantage of perspective. We always lack some knowledge and answer broadly. This is bad if you have a particular goal in mind, but better if you're just generally learning, because you see more and learn to discriminate the correct from the wrong. And most importantly, being wrong is part of human ingenuity - because sometimes we turn something "obviously" wrong into something right.

foolserrandboy 16 hours ago

He would tell you NFTs were AGIs if it might get you to buy them.

scottfalconer 16 hours ago

Getting the right answers is just half of it, you need to know the right questions to ask. I haven't yet seen AI crack that one.

rvz 16 hours ago

Investor with vested interest in AI companies makes claim of reaching "AGI".

He is one of the last people to listen to about AGI. Unless the term "AGI" means something entirely different to him vs to independent researchers vs to CEOs, since the term has become entirely meaningless.
