AI assistance when contributing to the Linux kernel (github.com)

465 points by hmokiguess a day ago

qsort a day ago

Basically the rules are that you can use AI, but you take full responsibility for your commits and code must satisfy the license.

That's... refreshingly normal? Surely something most people acting in good faith can get behind.

pibaker 17 hours ago

I agree this is very sane and boring. What is insane is that they have to state this in the first place.

I am not against AI coding in general. But there are too many people "contributing" AI-generated code to open source projects, even when they can't understand what's going on in their own code, just so they can say on their resumes that they contributed to a big open source project once. And when the maintainers call them out, they just blame it on the AI coding tools they are using, as if they are not opening PRs under their own names. I can't blame any open source maintainer for being at least a little sceptical when it comes to AI-generated contributions.

theptip 14 hours ago

I think them stating this very simple policy should also be read as them explicitly not making a more restrictive policy, as some kernel maintainers were proposing.

Applejinx 8 hours ago

matheusmoreira 4 hours ago

On the other hand, it seriously sucks to spend time learning a big codebase and modifying it with care, only to not be given the time of day when you send the patches to the maintainers. Sometimes the reward for this human labor isn't a sincere peer review of the work and a productive back-and-forth to iron out issues before merging; it's watching one's work languish unnoticed for a long time, only for the maintainer to show up after the fact and write his own fix or implementation, giving you a shout-out in the commit message if you're lucky.

Can't really blame people for reducing their level of effort. It's very easy to put in a lot of effort and end up with absolutely nothing to show for it. Before AI came along, my realization was that begging the maintainers to implement the features I wanted was the right move. They have all the context and can do it better than us in a fraction of the time it'd take us to do it. Actually cloning someone else's repository and working on it should only be attempted if one is willing to literally fork it and own the project should things go south. Now that we have AI, it's actually possible to easily understand and modify complex codebases, and I simply cannot find the will to blame people for using it to the fullest extent. Getting the AI to maintain the fork is really easy too.

jlarocco 2 hours ago

> I agree this is very sane and boring. What is insane is that they have to state this in the first place.

I don't think it's insane. It seems reasonable that people could disagree about how much attribution and disclosure there should be about AI assistance, or if it's even allowed, etc.

Every document in that `process` directory explains stuff that could be obvious to some people but not others.

cat_plus_plus 2 hours ago

That's a dim view; people also contribute to make projects work for their own needs, with the hope of sharing fixes with others. For example, if I make a fix to vLLM so that a model loads on particular hardware, I can verify functionality (the LLM no longer strays off topic) and local plausibility (global scales are being applied to attention layers), but I can't pretend to understand the full math of the overall process and will never have enough time to do so. So I can be upfront about the AI assist, and the maintainer can choose to double-check; or, if they don't have time, I guess I can just post a PR link on the model's huggingface page and tell others with the same hardware they can try to cherry-pick it.

What's missed is that neither contributors nor maintainers are usually paid for their effort, and nobody has standing to demand that they do anything they are not doing already. Don't like a messy vibe-coded PR but need the functionality? Then clean it up yourself and send an improved version for review. Or let it go unmerged. But don't assign work to others you don't employ.

On the other hand, companies like NVIDIA should be publicly taken to task for changing their minds about the instruction set for every new GPU and then not supporting those GPUs properly in popular inference engines; they certainly have enough money to hire people who will learn vLLM inside out and ensure high-quality patches.

lrvick 13 hours ago

It cannot be overstated how religiously opposed many in the Linux community are to even a single AI-assisted commit landing in the kernel, no matter how well reviewed.

Plenty see Torvalds as a traitor for this policy and will never contribute again if any clearly labeled AI generated code is actually allowed to merge.

cinntaile 12 hours ago

Some people are just against change, that's nothing new. If Linus were like them, he would never have started Linux in the first place.

sdevonoes 11 hours ago

goatlover 12 hours ago

beAbU 3 hours ago

It cannot be overstated how religiously opposed many in the woodworking community are to even a single table-saw-assisted cut making its way into a piece of furniture, no matter how well designed.

Plenty see {{some_woodworker}} as a traitor for this policy and will never contribute again if any clearly labeled table saw cut is actually allowed to be used in furniture making.

agentultra 2 hours ago

Luker88 11 hours ago

Just remember that "reviewed" is not enough to keep the output from being considered public domain.

It needs to be modified by a human. No amount of prompting counts, and you can only copyright the modified parts.

Any license on "100% vibecoded" projects can be safely ignored.

I expect litigation in a few years where people argue about how much they can steal and relicense "since it was vibecoded anyway".

shakna 11 hours ago

lrvick 11 hours ago

VorpalWay 10 hours ago

OtomotO 9 hours ago

alfiedotwtf 6 hours ago

martin-t 11 hours ago

oompydoompy74 3 hours ago

I find the strong anti AI sentiment just as annoying as the strong pro AI sentiment. I hope that the extremes can go scream in their own echo chamber soon, so that the rest of us can get back to building and talking about how to make technology useful.

Klonoar 4 hours ago

Reads like a “fuck you and I’ll see you tomorrow” threat.

dxdm 12 hours ago

Sounds dramatic, but it entirely depends on what "many" and "plenty" mean in your comment, and who exactly is included. So far, what you wrote can be read as the expected level of drama surrounding such projects.

ebbi 13 hours ago

True - on Mastodon there is a very vocal crowd that is against AI in general, and is identifying Linux distros containing AI-generated code with a view to boycotting them.

lrvick 11 hours ago

positron26 3 hours ago

What these hardliners are standing for, I have no idea. If the code passes review, we're just arguing about hues of zeros and ones. "AI" is an attribute that type-erases entirely once an engineer pulls out the useful expressions and whips them into shape.

The worst part about all reactionary scares is that, because the behaviors are driven by emotion and feeling as opposed to any intentional course of action, the outcomes are usually counter productive. The current AI scare is exactly what you would want if you are OpenAI. Convince OSS, not to mention "free" software people, to run around dooming and ant milling each other about "AI bad" and pretty soon OSS is a poisonous minefield for any actual open AI, so OSS as a whole just sabotages itself and is mostly out of the fight.

I'm currently in the middle of trying to blow straight past this gatekeepy outer layer of the online discourse. What is a bit frustrating is knowing that while the seed will find the niches and begin spreading through invisible channels, in the visible channels, there's going to be all kinds of knee-jerk pushback from these anti-AI hardliners who can't distinguish between local AI and paying Anthropic for a license to use a computer. Worse, they don't care. The social psychosis of being empowered against some "others" is more important. Either that or they are bots.

And all of this is on top of what I've been saying for over a year. VRAM efficiency will kill the datacenter overspend. Local, online training will make it so that skilled users get better models over time, on their own data. Consultative AI is the future.

I have to remind myself that this entire misstep is a result of a broken information space, late-stage traditional social, filled with people (and "people") who have been programmed for years on performative clap-backs and middling ideas.

So fortunate to have some life before internet perspective to lean back on. My instinct and old-world common sense can see a way out, but it is nonetheless frustrating to watch the online discourse essentially blinding itself while doubling down on all this hand wringing to no end, accomplishing nothing more than burning a few witches and salting their own lands. You couldn't want it any better if you were busy entrenching.

abc123abc123 10 hours ago

Doesn't matter. Linux today is a toy of corporations and stopped being community oriented a long time ago. Community orientation, I think, these days only exists among the BSDs and some fringe Linux distributions.

The Linux Foundation itself is just one big, woke, leftist mess, with CV-stuffers from corporations in every significant position.

simonask 9 hours ago

oompydoompy74 3 hours ago

I wish everyone could be so rational, well reasoned, and balanced on this subject.

galaxyLogic a day ago

But then if AI output is not under the GNU General Public License, how can it become so just because a Linux developer adds it to the codebase?

jillesvangurp a day ago

AIs are not human, and therefore their output is not a human-authored contribution; only human-authored things are covered by copyright. The work might hypothetically infringe on other people's copyright. But such an infringement does not happen until a human decides to create and distribute a work that somehow integrates that generated code or text.

The solution documented here seems very pragmatic. You as a contributor simply state that you are making the contribution and that you are not infringing on other people's work with that contribution under the GPLv2. And you document the fact that you used AI for transparency reasons.

There is a lot of legal murkiness around how training data is handled, and the output of the models. Or even the models themselves. Is something that in no way or shape resembles a copyrighted work (i.e. a model) actually distributing that work? The legal arguments here will probably take a long time to settle but it seems the fair use concept offers a way out here. You might create potentially infringing work with a model that may or may not be covered by fair use. But that would be your decision.

For small contributions to the Linux kernel it would be hard to argue that a passing resemblance of say a for loop in the contribution to some for loop in somebody else's code base would be anything else than coincidence or fair use.

heavyset_go 12 hours ago

nitwit005 21 hours ago

friendzis 14 hours ago

ninjagoo a day ago

Lerc 21 hours ago

mcv 21 hours ago

afro88 a day ago

Same as if a regular person did it. They are responsible for it. If you're using AI, check that the code doesn't violate licenses.

rzmmm a day ago

martin-t a day ago

sarchertech a day ago

noosphr a day ago

Tab complete does not produce copyrightable material either. Yet we don't require software to be written in nano.

rpdillon 17 hours ago

Tomte 13 hours ago

There is already lots and lots of non-GPL code in the kernel, under dozens of licenses, see https://raw.githubusercontent.com/Open-Source-Compliance/pac...

As long as everything is GPLv2-compatible it's okay.
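For example, a dual-licensed file in the tree carries an SPDX header along these lines (illustrative; the exact license expression varies per file):

  /* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */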

panzi a day ago

If the output is public domain it's fine as I understand it.

galaxyLogic a day ago

martin-t a day ago

shevy-java a day ago

But why should AI then be attributed if it is merely a tool that is used?

lonelyasacloud 21 hours ago

Having an honesty-based tag could be the only way to monitor impact, or to go after a fix in codebases if things go south.

That is, at the moment:

- Nobody knows for sure what agents might add, or their long-term effects on codebases.

- It's at best unclear that AI content in a codebase can be reliably determined automatically.

- Even if it's not malicious, at least some of its contributions are likely to be deleterious and pass undetected by human review.

plmpsu a day ago

It makes sense to keep track of what model wrote what code, to look for patterns, behaviors, etc.
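The trailer format makes that queryable after the fact. A rough sketch (assumes a git recent enough to support the trailers format specifier) that tallies which agents show up in a tree's history:

  git log --format='%(trailers:key=Assisted-by,valueonly)' \
      | grep -v '^$' | sort | uniq -c | sort -rn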

yrds96 13 hours ago

AI tools can do the entire job, from finding the problem to implementing and testing a fix.

That's different from the regular single-purpose static tools.

hgoel 18 hours ago

This is a good point, but I'd take it in the opposite direction from the implication: we should document which tools were used in general; it'd be a neat indicator of what people use.

streetfighter64 a day ago

It isn't?

> AI agents MUST NOT add Signed-off-by tags. Only humans can legally certify the Developer Certificate of Origin (DCO).

They mention an Assisted-by tag, but that also contains stuff like "clang-tidy". Surely you're not interpreting that as people "attributing" the work to the linter?

ninjagoo a day ago

  > Signed-Off ...
  > The human submitter is responsible for:
    > Reviewing all AI-generated code
    > Ensuring compliance with licensing requirements
    > Adding their own Signed-off-by tag to certify the DCO
    > Taking full responsibility for the contribution

  > Attribution: ... Contributions should include an Assisted-by tag in the following format:

Responsibility assigned where it should lie. Expected no less from Torvalds, the progenitor of Linux and Git. No demagoguery, no b*.
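Concretely, a commit under this scheme would end with trailers along these lines (hypothetical names, following the quoted format):

  Assisted-by: SomeAgent:model-1.2 [sparse] [clang-tidy]
  Signed-off-by: Jane Developer <jane@example.org>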

I am sure that this was reviewed by attorneys before being published as policy, because of the copyright implications.

Hopefully this will set a trend and provide definitive guidance for the many devs who saw not only the utility of AI assistance but also the acrimony from some quarters, and were left fence-sitting.

senko 12 hours ago

> Expected no less from Torvalds

This was written by Sasha Levin referencing a Linux maintainers’ discussion.

sourcegrift 10 hours ago

Of all the documents, this one needed proper attribution, with a link to the meeting minutes.

corbet an hour ago

maxboone 8 hours ago

bsimpson 16 hours ago

Signed-off-by is already a custom/formality that is surely cargo-culted by many first-time/infrequent contributors. It has an air of "the plans were on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard.'" There's no way to assert that every contributor has read a random document declaring what that line means in kernel parlance.

I recently made a kernel contribution. Another contributor took issue with my patch and used it as the impetus for a larger refactor. The refactor was primarily done by a third contributor, but the original objector was strangely insistent on getting the "author" credit. They added our names at the bottom in "Co-developed-by" and "Signed-off-by" tags. The final submission included bits I hadn't seen before. I would have polished it more if I had.

I'm not raising a stink about it because I want the feature to land - it's the whole reason I submitted the first patch. And since it's a refactor of a patch I initially submitted (and "Signed-off-by,") you can make the argument that I signed off on the parts of my code that were incorporated.

But so far as I can tell, there's nothing keeping you from adding "Co-developed-by" and "Signed-off-by Jim-Bob Someguy" to the bottom of your submission. Maybe a lawyer would eventually be mad at you if Jim-Bob said he didn't sign off.

There's no magic pixie dust that gives those incantations legal standing, and nothing that keeps LLMs from adding them unless the LLMs internalize the new AI guidance.
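(The mechanics really are that thin: `git commit -s` appends a Signed-off-by line from whatever user.name/user.email is configured, and any other trailer is just text you type:

  git -c user.name='Jim-Bob Someguy' \
      -c user.email='jimbob@example.org' \
      commit -s -m 'my patch'
  # adds: Signed-off-by: Jim-Bob Someguy <jimbob@example.org>

Nothing verifies that identity at commit time.)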

rwmj 13 hours ago

The way you describe it, the developers all did the right thing. You contributed something to the patch, and even if it wasn't in your preferred final form (and it's basically never going to be for a kernel contribution of any significance), you were correctly credited.

If you didn't want to be credited you should have said.

Signed-off-by probably has some legal weight. When you add that to code you are making a clear statement about the origins of the code and that you have legal authority to contribute it - for example, that you asked your company for permission if needed. As far as I know none of this has been tested in court, but it seems reasonable to assume it might be one day.

bsimpson 5 hours ago

zahlman 9 hours ago

sheepscreek 6 hours ago

This is the right way forward for open-source. Correct attribution - by tightening the connection between agents and the humans behind them, and putting the onus on the human to vet the agent output. Thank you Linus.

oytis 10 hours ago

How is one supposed to ensure license compliance while using LLMs, which do not (and cannot) attribute the sources that contributed to a specific response?

Lapel2742 10 hours ago

> How is one supposed to ensure license compliance while using LLMs, which do not (and cannot) attribute the sources that contributed to a specific response?

Additionally, there seems to be a general problem with LLM output and copyright[1], at least in Germany: LLM output cannot be copyrighted, and the whole legal field seems under-explored.

> This immediately raises the question of who is the author of this work and who owns the rights to it. Various solutions are possible here. It could be the user of the AI alone, or it could be a joint work between the user and the AI programmer. This question will certainly keep copyright experts in the various legal systems busy for some time to come.

It seems that in the long run the kernel license might become unenforceable if LLM output is used?!

[1] https://kpmg-law.de/en/ai-and-copyright-what-is-permitted-wh...

theshrike79 2 hours ago

Either you allow LLM generated + human reviewed code or people start hiding AI use.

...and then people start going "that's AI" on every single piece of code, seeing AI generated code left and right - like normal people claim every other picture, video or piece of text is "AI".

IMO it's a lot better to let people just openly say "this code was generated with AI assistance", but still sign off on it. Because "Your job is to deliver code you have proven to work": https://simonwillison.net/2025/Dec/18/code-proven-to-work/

ipython a day ago

Glad to see the common-sense rule that only humans can be held accountable for code generated by AI agents.

agentultra 2 hours ago

How do the reviewers feel about this? Hopefully it won't result in them being overwhelmed with PRs. There used to be a kind of "natural limit" to error rates in our code given how much we could produce at once and our risk tolerance for approving changes. Given empirical studies on informal code review which demonstrate how ineffective it is at preventing errors... it seems like we're gearing up to aim a fire-hose of code at people who are ill-prepared to review code at these new volumes.

How long until people get exhausted with the new volume of code review and start "trusting" the LLMs more without sufficient review, I wonder?

I don't envy Linus in his position... hopefully this approach will work out well for the team.

rao-v 2 hours ago

A phenomenon I cannot explain is that this simple, clean statement of a fairly obvious approach to AI assistance somehow took this long, and Linus himself, to state so cleanly.

Are there other popular repos with effectively this policy stated as neatly that I’ve missed?

phillipcarter 36 minutes ago

bonzini 2 hours ago

The wording might be more or less lawyerly but the idea is fairly common, e.g. https://openinfra.org/legal/ai-policy (OpenStack).

sarchertech a day ago

This does nothing to shield Linux from responsibility for infringing code.

This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.

It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.

zarzavat 11 hours ago

Shield from what exactly? The Linux kernel is not a legal entity. It's a collection of contributions from various contributors. There is the Linux Foundation but they do not own Linux.

If Linux were to contain 3rd-party copyrighted code, the legal entity at risk of being sued would be... Linux users, which, given how widely deployed Linux is, means basically everyone on Earth, and all large companies.

Linux development is funded by large companies with big legal departments. It's safe to say that nobody is going to be picking this legal fight any time soon.

sarchertech 2 hours ago

The Linux DCO system was designed to shield Linus and the Linux Foundation from copyright and patent infringement liability, so they were certainly worried that it was a possibility.

However, there is no legal precedent that says that because contributors sign a DCO and retain copyright, the Linux Foundation is not liable. The entire concept is unproven.

Large company legal departments aren’t a shield against this kind of thing. Patent trolls routinely go after huge companies and smaller companies routinely sue much larger ones over copyright infringement.

lukeify 10 hours ago

An open-source project receiving open-source contributions from (often anonymous) volunteers is not even close to analogous to a storefront selling products with a consumer guarantee they are backing on the basis of their supply chain.

SirHumphrey a day ago

Quite a lot of companies use and release AI written code, are they all liable?

sarchertech a day ago

1. Almost definitely if discovered

2. Infringement in closed source code isn’t as likely to be discovered

3. OpenAI and Anthropic enterprise agreements agree to indemnify (pay for damages essentially) companies for copyright issues.

theshrike79 2 hours ago

nitwit005 21 hours ago

Yep, and honestly it's going to come up with things other than lawsuits.

I've worked at a company that was asked as part of a merger to scan for code copied from open source. That ended up being a major issue for the merger. People had copied various C headers around in odd places, and indeed stolen an odd bit of telnet code. We had to go clean it up.

LtWorf 9 hours ago

testing22321 17 hours ago

> This does nothing to shield Linux from responsibility for infringing code.

It’s no worse than non-AI assisted code.

I could easily copy-paste proprietary code, sign my name that it’s not and that it complies with the GPL and submit it.

At the end of the day, it just comes down to a lying human.

sarchertech an hour ago

That’s the difference. In practice a human has to commit fraud to do this.

But a human just using an LLM to generate code will do it accidentally. The difference is that regurgitation of training text is a documented failure mode of LLMs.

And there’s no way for the human using it to be aware it’s happening.

newsoftheday a day ago

> All code must be compatible with GPL-2.0-only

How can you guarantee that when AI has been trained on a world full of code under multiple licenses, and even on closed-source material used without the permission of the copyright owners? ...I confirmed that with several AIs just now.

philipov a day ago

You take responsibility. That means if the AI messes up, you get punished. No pushing blame onto the stupid computer. If you're not comfortable with that, don't use the AI.

sarchertech a day ago

There’s no reasonable way for you to use AI generated code and guarantee it doesn’t infringe.

The whole "use it, but if it doesn't behave as expected, it's your fault" is a ridiculous stance.

philipov a day ago

adikso a day ago

newsoftheday a day ago

> That means if the AI messes up

I'm not talking about maintainability or reliability. I'm talking about legal culpability.

benatkin 14 hours ago

If they merge it in despite it having the model version in the commit, then they're arguably taking a position on it too - that it's fine to use code from an AI that was trained like that.

XYen0n 11 hours ago

Even human developers are unlikely to have only ever seen GPL-2.0-only code.

tmalsburg2 9 hours ago

Humans will not regurgitate longer segments of code verbatim. Even if we wanted to, we couldn't, because our memory doesn't work that way. LLMs, on the other hand, can totally do that, and there's nothing you can do to prevent it.

johanyc 5 hours ago

tmp10423288442 a day ago

Wait for court cases I suppose - not really Linus Torvalds' job to guess how they'll rule on the copyright of mere training. Presumably having your AI actually consult codebases with incompatible licenses at runtime is more risky.

Luker88 11 hours ago

NIT: All AI code satisfies the GPL license.

Anything generated by an AI is public domain. You can include public domain in your GPL code.

I would urge some stronger requirement with the help of a lawyer. You only need a comment like "completely coded by AI, but 100% reviewed by me" to make that code's license worthless.

The only AI-generated parts that are copyrightable are the ones modified by a human.

I am afraid that this "waters down" the actual licensed code.

...We should start opening issues on "100% vibecoded" projects for relicensing to public domain to raise some awareness to the issue.

manquer 5 hours ago

> Anything new generated by an AI is public domain[1]

Language models do generate, character for character, existing code they were trained on. The training corpus usually contains code that is source-available but not FOSS-licensed.

"Generated" does not automatically mean novel or new, the bar needed for IP.

[1] Even this has not been definitively ruled on in courts or codified in IP law and treaties yet.

MyUltiDev 3 hours ago

Reading this right after the Sashiko endorsement is a bit jarring. Greg KH greenlit an AI reviewer running on every patch a couple weeks back, and that direction actually seems to be helping, while here the conversation is still about whether contributors will take responsibility for AI code they submit. That feels like the harder side to police. The bugs that land kernel teams in trouble are race conditions, locking, lifetimes, the things models are most confidently wrong about. I have seen agents produce code that compiles cleanly, reads fine on a Friday review, then deadlocks under contention three weeks later. Is this contributor policy supposed to be the long term answer, or a placeholder until something Sashiko-shaped does the heavy filtering on the maintainer side too?

HarHarVeryFunny 6 hours ago

It's a sane policy - human is responsible for what they contribute, regardless of what tools they use in the development process.

However, the gotcha here seems to be that the developer has to say that the code is compatible with the GPL, which seems an impossible ask, since the AI models have presumably been trained on all the code they can find on the internet regardless of licensing, and we know they are capable of "regenerating" (regurgitating) stuff they were trained on with high fidelity.

theshrike79 2 hours ago

Then we get to the Code of Theseus argument: if you take a piece of code and replace every piece of it with code that looks the same, is it still the original code?

Is an AI reimplementation a "clean room" implementation? What if the AI only generates pseudocode and a human implements the final code based on that? Etc etc ad infinitum.

Lawyers will be having fun with this philosophical question for a good decade.

dataviz1000 a day ago

This is discussed in the Linus vs Linus interview, "Building the PERFECT Linux PC with Linus Torvalds". [0]

[0] https://youtu.be/mfv0V1SxbNA?si=CBnnesr4nCJLuB9D&t=2003

globular-toast 10 hours ago

Hardly "discussed", perhaps "mentioned". Sebastian is basically an entertainer who can plug things in to sockets.

WhyNotHugo 4 hours ago

Weird that they're co-opting the "Assisted-by:" trailer to tag the software and model being used. This trailer was previously used to tag someone else who assisted with the commit in some way. Now it has two distinct usages.

The typical trailer for this is "AI-assistant:".

aprentic 4 hours ago

I like this. It's an inversion of the old adage, "a poor craftsman blames his tools", and the corollary, "use the right tool for the job" (because a good craftsman chooses the appropriate tool).

You don't get to bang on a screw and blame the hammer.

KronisLV 6 hours ago

This is actually a pretty nice idea:

  Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
I feel like a lot of people will have an ideological opposition to AI, but that would lead to people sometimes submitting AI generated code with no attribution and just lying about it.

At the same time, I feel bad for all the people that have to deal with low quality AI slop submissions, in any project out there.

The rules for projects that allow AI submissions might as well state: "You need to spend at least ~10 iterations of model X review agents and 10 USD of tokens on reviewing AI changes before they are allowed to be considered for inclusion."

(I realize that sounds insane, but in my experience iterated review, even by the same Opus model, can help catch bugs in the code. I feel like next-token prediction in and of itself is quite error-prone; in other words, even Opus "writes" code with bugs that its own review iterations catch.)
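A crude sketch of what that iterated review could look like mechanically, with a hypothetical `review-agent` CLI standing in for whatever tool is actually used:

  # hypothetical reviewer CLI; run N independent passes over the branch diff
  for i in $(seq 1 10); do
      review-agent --diff origin/main...HEAD >> review-findings.txt
  done
  sort -u review-findings.txt   # collapse findings repeated across passes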

KaiLetov 14 hours ago

The policy makes sense as a liability shield, but it doesn't address the actual problem, which is review bandwidth. A human signs off on AI-generated code they don't fully understand, the patch looks fine, it gets merged. Six months later someone finds a subtle bug in an edge case no reviewer would've caught because the code was "too clean."

ugh123 13 hours ago

> they don't fully understand, the patch looks fine

I don't get this part. Why is the reviewer signing off on it? AI code should be fully documented (probably more so than a human could) and require new tests. Code review gates should not change

altmanaltman 12 hours ago

I mean the same can happen with human-written code no? Reviewer signs off on it and subtle bug in edge case no one saw?

Or you mean the velocity of commits will be so much that reviewers will start making more mistakes?

dec0dedab0de a day ago

All code must be compatible with GPL-2.0-only

Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?

compyman a day ago

You might be being too pedantic :)

https://spdx.org/licenses/GPL-2.0-only.html It's a specific GPL license (as opposed to GPL-2.0-or-later).

philipov a day ago

GPL-2.0-only is the name of a license. One word. It is an alternative to GPL-2.0-or-later.
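Both are standard SPDX identifiers, and kernel source files spell out which one applies at the top of each file, e.g.:

  // SPDX-License-Identifier: GPL-2.0-only
  // SPDX-License-Identifier: GPL-2.0-or-later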

kbelder 20 hours ago

Right, the final hyphen changes the meaning of the sentence.

"GPL-2.0-only" "GPL-2.0 only"

feverzsj 15 hours ago

Linux is funded by all these big companies. Linus couldn't block AI pushes from them forever.

becquerel 8 hours ago

He's been vibecoding some stuff himself personally, on one of his scuba projects. You could take people as actually believing in the things they do and say.

simianwords 37 minutes ago

This is some ridiculous cope.

paganel 8 hours ago

Correct, in the end big money talks.

themafia a day ago

> All contributions must comply with the kernel's licensing requirements:

I just don't think that's realistically achievable. Unless the models themselves can introspect on the code and detect any potential license violations.

If you get hit with a copyright violation in this scheme I'd be afraid that they're going to hammer you for negligence of this obvious issue.

Joel_Mckay 16 hours ago

US legal consensus has set the precedent that "AI" output can't be copyrighted. Thus, technically no one can really own or re-license prompt output.

Re-licensing public domain uncopyrightable work as GPL/LGPL is almost certainly a copyright violation, and no different than people violating GPL/LGPL in commercial works.

Linus is 100% wrong on this choice, and has introduced a serious liability into the foundation upstream code. =3

https://en.wikipedia.org/wiki/Founder%27s_syndrome

https://www.youtube.com/watch?v=X6WHBO_Qc-Q

kam 15 hours ago

> Being in the public domain is not a license; rather, it means the material is not copyrighted and no license is needed. Practically speaking, though, if a work is in the public domain, it might as well have an all-permissive non-copyleft free software license. Public domain material is compatible with the GNU GPL.

https://www.gnu.org/licenses/license-list.html#PublicDomain

Joel_Mckay 15 hours ago

noosphr 15 hours ago

>Re-licensing public domain work as GPL/LGPL is almost certainly a copyright violation

Remember kids never get your legal advice from hn comments.

Joel_Mckay 15 hours ago

KhayaliY 21 hours ago

We've seen in the past, for instance in the world of compliance, that if companies/governments want something done or make a mistake, they just have a designated person act as scapegoat.

So what's preventing lawyers/companies from having a batch of people they use as scapegoats, should something go wrong?

zxexz 14 hours ago

I like this. It's just saying you have responsibility for the tools you wield. It's concise.

Side note, I'm not sure why I feel weird about having the string "Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]" in the kernel docs source :D. Mostly joking. But if the Linux kernel has it now, I guess it's the inflection point for... something.

deadbabe 7 hours ago

How can we automate the disclosure of what AI agent was used in a PR, and the extent of the code it produced? It would be nice to also have an audit of the prompts used, as those could also be considered "code".
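One low-tech sketch for the agent part: a prepare-commit-msg hook that appends the trailer whenever an environment variable (a made-up convention here) says an agent was involved. Auditing the prompts themselves would need logging on the agent side; git won't do that for you.

  #!/bin/sh
  # .git/hooks/prepare-commit-msg -- sketch only
  # AI_AGENT is a hypothetical convention, e.g. "SomeAgent:model-1.2"
  MSG_FILE="$1"
  if [ -n "$AI_AGENT" ]; then
      git interpret-trailers --in-place \
          --trailer "Assisted-by: $AI_AGENT" "$MSG_FILE"
  fi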

bharat1010 16 hours ago

Honestly kind of surprised they went this route -- just 'you own it, you're responsible for it' is such a clean answer to what feels like an endlessly complicated debate.

lowsong a day ago

At least it'll make it easy to audit and replace it all in a few years.

martin-t a day ago

This feels like the OSS community is giving up.

LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent) which have licenses almost always requiring attribution and very often imposing other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].

The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which

1) reproduces patterns and interpolates between patterns of training data while not always being verbatim copies

2) serves as a heuristic when searching the solution-space which is further guided by deterministic tools such as compilers, linters, etc. - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.

I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking[2] even among people who understand how the models operate, let alone the general public.

Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

[0]: https://news.ycombinator.com/item?id=47356000

[1]: http://prize.hutter1.net/

[2]: https://en.wikipedia.org/wiki/ELIZA_effect

[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...

ninjagoo 21 hours ago

> Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

I think you'll find that this is not settled in the courts, depending on how the data was obtained. If the data was obtained legally, say a purchased book, courts have been finding that using it for training is fair use (Bartz v. Anthropic, Kadrey v. Meta).

Morally the case gets interesting.

Historically, there was no such thing as copyright. The English 1710 Statute of Anne establishing copyright as a public law was titled 'for the Encouragement of Learning' and the US Constitution said 'Congress may secure exclusive rights to promote the progress of science and useful arts'; so essentially public benefits driven by the grant of private benefits.

The Moral Bottomline: if you didn't have to eat, would you care about who copies your work as long as you get credited?

The more people copy your work with attribution, the more famous you'll be. Now that's the currency of the future*. [1]

You'll do it for the kudos. [2][3]

  *Post-Scarcity Future. 
  [1] https://en.wikipedia.org/wiki/Post-scarcity
  [2] https://en.wikipedia.org/wiki/The_Quiet_War, et. al.
  [3] https://en.wikipedia.org/wiki/Accelerando

martin-t 20 hours ago

> The Moral Bottomline: if you didn't have to eat, would you care about who copies your work as long as you get credited?

Yes.

I have 2 issues with "post-scarcity":

- It often implicitly assumes humanity is one homogeneous group where this state applies to everyone. In reality, if post-scarcity is possible, some people will be lucky enough to have the means to live that lifestyle while others will still be dying of hunger, exposure and preventable diseases. All else being equal, I'd prefer being in the first group, and my chance for that is being economically relevant.

- It often ignores that some people are OK with having enough while others have a need to have more than others, no matter how much they already have. The second group is the largest cause of exploitation and suffering in the world. And the second group will continue existing in a post-scarcity world and will work hard to make scarcity a real thing again.

---

Back to your question:

I made the mistake of publishing most of my public code under GPL or AGPL. I regret it because even though my work has brought many people some joy, and a bit of it was perhaps even useful, it has also been used by people who actively enjoy hurting others, who have caused measurable harm and who will continue causing harm as long as they're able to - in a small part enabled by my code.

Permissive licenses are socially agnostic - you can use the work and build on top of it no matter who you are and for what purpose.

(A)GPL is weakly pro-social - you can use the work no matter what, but you can only build on top of it if you give back - this produces some small but non-zero social pressure (enforced by violence through governments) which benefits those who prefer cooperation instead of competition.

What I want is a strongly pro-social license - you can use or build on top of my work only if you fulfill criteria I specify such as being a net social good, not having committed any serious offenses, not taking actions to restrict other people's rights without a valid reason, etc.

There have been attempts in this direction[0] but not very successful.

In a world without LLMs, I'd be writing code using such a license, but more clearly specified, even if I had to write my own. Yes, a lawyer would do a better job; that does not mean anything written by a non-lawyer is completely unenforceable.

With LLMs, I have stopped writing public code at all, because the way I see it, it just makes people much richer than me even richer, at a much faster rate than I can ever achieve myself. It just makes inequality worse. And with inequality, exploitation and oppression tend to soon follow.

[0]: https://json.org/license.html

ninjagoo 17 hours ago

KK7NIL a day ago

> I strongly object to anthropomorphising text transformers (e.g. "Assisted-by").

I don't think this is anthropomorphising, especially considering they also include non-LLM tools in that "Assisted-by" section.

We're well past the Turing test now; whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature's, especially when it comes to programming.

davemp 17 hours ago

> We're well past the Turing test now

Nope, there is no “The” Turing Test. Go read his original paper before parroting pop sci nonsense.

The Turing test paper proposes an adversarial game to deduce if the interviewee is human. It’s extremely well thought out. Seriously, read it. Turing mentions that he’d wager something like 70% of unprepared humans wouldn’t be able to correctly discern in the near future. He never claims there to be a definitive test that establishes sentience.

Turing may have won that wager (impressive), but there are clear tells, similar to "how many r's are in strawberry?", that an informed interrogator could reliably exploit.

martin-t a day ago

Would you say "assisted by vim" or "assisted by gcc"?

It should be either something like "(partially/completely) generated by" or if you want to include deterministic tools, then "Tools-used:".

The Turing test is an interesting thought experiment but we've seen it's easy for LLMs to sound human-like or make authoritative and convincing statements despite being completely wrong or full of nonsense. The Turing test is not a measure of intelligence, at least not an artificial one. (Though I find it quite amusing to think that the point at which a person chooses to refer to LLMs as intelligence is somewhat indicative of his own intelligence level.)

> whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming

It absolutely makes a difference: you can't own a human but you can own an LLM (or a corporation which is IMO equally wrong as owning a human).

Humans have needs which must be continually satisfied to remain alive. Humans also have a moral value (a positive one - at least for most of us) which dictates that being rendered unable to remain alive is wrong.

Now, what happens if LLMs have the same legal standing as humans and are thus able to participate in the economy in the same manner?

zbentley a day ago

williamcotton 8 hours ago

"Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0]."

That LLM response is describing a specific project with full attribution.

martin-t 6 hours ago

And it proves the code is stored (in a compressed form) in the model.

williamcotton 4 hours ago

tmp10423288442 a day ago

On https://news.ycombinator.com/item?id=47356000, it looks like the user there was intentionally asking about the implementation of the Python chardet library before asking it to write code, right? Not surprising the AI would download the library to investigate it by default, or look for any installed copies of `chardet` on the local machine.

martin-t a day ago

The comment says "Opus 4.6 without tool use or web access"

user34283 11 hours ago

For [0], it was supposedly shown to do it when specifically prompted to do so.

Despite agentic tools being used by millions of developers now, I am not aware of a single real case where accidental reproduction of copyrightable code has been an issue.

Further, some model providers offer indemnity clauses.

It seems like a non-issue to me, practically.

shevy-java a day ago

Fork the kernel!

Humans for humans!

Don't let skynet win!!!

aruametello a day ago

> Fork the kernel!

pre "clanker-linux".

I am more intrigued by the inevitable Linux distro that will refuse any code that has AI contributions in it.

pawelmurias 9 hours ago

Tardux Linux

baggy_trough a day ago

Sounds sensible.

spwa4 a day ago

Why does this file have an extension of .rst? What does that even mean for the file format?

jdreaver a day ago

https://en.wikipedia.org/wiki/ReStructuredText

This format really took off in the Python community in the 2000's for documentation. The Linux kernel has used it for documentation as well for a while now.

adikso a day ago

reStructuredText. Just like you have .md files everywhere.

SV_BubbleTime 16 hours ago

Everyone missed a great opportunity to lie to you and tell you that the Linux kernel now requires you to program in rust.

bitwize a day ago

Good. The BSDs should follow suit. It is unreasonable to expect any developer not to use AI in 2026.

vips7L 5 hours ago

It’s perfectly reasonable. We’ve been doing it for decades. It’s completely unreasonable to expect every developer to use “ai”, especially when it comes at such a heavy monetary cost.

NetOpWibby 21 hours ago

inb4 people rage against Linux

SV_BubbleTime 16 hours ago

Scroll down, some nerds have no chill.

NetOpWibby 14 hours ago

Good grief

gnarlouse 14 hours ago

I wonder if this is happening because Mythos

rwmj 13 hours ago

Interesting that coccinelle, sparse, smatch & clang-tidy are included, at least as examples. Those aren't AI coding tools in the normal sense, just regular, deterministic static analysis / code generation tools. But fine, I guess.

We've been using Co-Developed-By: <email> for our AI annotations.