Hacker News

by Ryan Harman

AI is just unauthorised plagiarism at a bigger scale (axelk.ee)

461 points by speckx 2 hours ago

dvduval 2 hours ago

The broader problem remains: original sources are not credited in a way that rewards them. Website owners pay to host their content so that spiders can crawl it and index it into the AI; if they’re lucky they might get a citation, but otherwise there’s very little reward for being a provider of content. And of course, this is getting worse and worse. Why look at a website when it’s all in the AI? The counter to that is that maybe we need to start closing websites to crawlers and putting everything behind a login.

Ensorceled 2 hours ago

Worse, the constant AI scraping is actually costing content providers additional money for no return. At least Google/Bing/Yahoo scraping would then be used to provide links back to your content.

fiedzia 36 minutes ago

> At least Google/Bing/Yahoo scraping would then be used to provide links back

That doesn't work anymore. Google provides an AI-generated summary, so nobody looks at the original site.

bolangi 29 minutes ago

Not only costing money. Constant AI scraping constitutes a denial-of-service attack that has brought down websites.

motbus3 2 hours ago

About a year ago OpenAI crawled the company I work for at DDoS level, even though our robots.txt disallowed it and despite the reCAPTCHA we managed to put together in time.

We found our data in the outputs of their models but who can do anything about it...

kibwen an hour ago

> We found our data in the outputs of their models but who can do anything about it...

If the crawlers refuse to voluntarily respect your robots.txt, then you are well within your rights to poison their data.

telotortium 39 minutes ago

I mean, did you check the IPs and make sure they’re from OpenAI? Obviously a fly-by-night AI company is going to set their User Agent to be from a big player.
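Checking the IPs can be sketched roughly as below, assuming the crawler operator publishes the CIDR ranges it crawls from. The ranges here are reserved documentation blocks used as placeholders, not real crawler addresses, and `is_from_published_range` is a hypothetical helper name.

```python
import ipaddress

# Placeholder CIDR blocks standing in for an operator's published ranges.
# These are reserved documentation ranges, NOT real crawler addresses.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_from_published_range(client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the published ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in PUBLISHED_RANGES)


print(is_from_published_range("192.0.2.44"))   # True: inside 192.0.2.0/24
print(is_from_published_range("203.0.113.9"))  # False: outside both ranges
```

A request whose User-Agent claims a big player but whose source IP fails this check is likely spoofed.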

gabbagool 6 minutes ago

I agree with this wholeheartedly. What's the point of even having copyright law at this point?

What's even crazier to think about is that to use the latest versions of these models for which you supplied training data, you have to pay hundreds of dollars a month. I would love to get a settlement check proportional to my model weights. Even if it's $0.10, at least everyone out there will get what they're owed.

spacechild1 an hour ago

It's actually costing them money/time! A friend of mine is a sysadmin at a university and he constantly has to deal with AI crawlers DDoS-ing his servers. He said Anthropic is actually one of the worst offenders.

These AI companies are really just a gross example of the motto "Socialize the costs, privatise the profits". It's disgusting!

aaarrm an hour ago

Is it possible to host your website in a way so that it couldn't be found via search engines (and thus wouldn't be crawlable, I hope)?

I know this has repercussions on findability, but if that wasn't a concern, I'm curious how one might circumvent getting crawled.

elorant 43 minutes ago

Possible, yes; likely, no. The moment you're issued a certificate, your domain will show up in the Certificate Transparency logs, which are constantly monitored by anyone who wants to find new sites.

matt_heimer an hour ago

Sure, depends on how accessible to people you want it to be.

Most legit search engines are going to honor robots.txt and you can disallow access.

Next level would be using something like rate limiting controls and/or Cloudflare's bot fight mode to start blocking the bad bots. You start to annoy some people here.

Next would be putting the content behind some form of auth.
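The rate-limiting step can be sketched as a token bucket kept per client. This is a minimal illustration, not how Cloudflare or any particular product implements it; `TokenBucket`, `rate`, and `capacity` are hypothetical names, and the `now` parameter exists only so the behaviour can be checked without sleeping.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Spend one token if available; return False when the client should be throttled."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=2.0, now=0.0)
print([bucket.allow(now=0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(now=1.0))                      # True: one token refilled after 1s
```

In practice you would keep one bucket per IP or per API key, which is also where legitimate heavy users start to get annoyed.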

trinari an hour ago

robots.txt is a way of leaving the door unlocked but kindly asking bots to stay outside.
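For what honoring that request looks like on the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URL, and the `SomeSearchBot` name are illustrative assumptions, and `GPTBot` is used only as an example user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: shut out one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching; nothing forces it to.
print(parser.can_fetch("GPTBot", "https://example.com/post/1"))         # False
print(parser.can_fetch("SomeSearchBot", "https://example.com/post/1"))  # True
```

The check runs entirely in the crawler's own code, which is why a bot that ignores it faces no technical barrier.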

MontgomeryPy an hour ago

You could just put your website content behind its own chat interface. The crawler would just see a form input for a prompt.

wolttam an hour ago

I’ve been thinking of a proof-of-work scheme for accessing content where you effectively need to mine some crypto for the author, but this idea might not fly today.
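A rough sketch of the general idea, using a plain hashcash-style puzzle rather than actual cryptocurrency mining (so nothing here pays the author). The challenge string, the difficulty value, and the function names are all assumptions for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's claimed work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce, costing about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


nonce = solve("page-123", difficulty=12)
print(verify("page-123", nonce, difficulty=12))  # True
```

The asymmetry is the point: solving is expensive, verifying is one hash. The cost lands on every visitor, though, not only on crawlers.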

microtonal an hour ago

But that will be a hassle for human visitors as well. A web that requires proof-of-work to browse will be a disaster for phones with their limited batteries, etc.

chii an hour ago

or you know, just charge for your content if you believe it to be valuable enough for the fee being charged.

internet2000 42 minutes ago

Perhaps we should go back to when the internet was about sharing information you liked, not about credit or making money on "content".

deaton 2 hours ago

"Steal an apple and you're a thief. Steal a kingdom and you're a statesman." - Literal Disney villain

falcor84 13 minutes ago

Ironically, this phrase is said by Jafar in Disney's 2019 live-action remake of Aladdin, but wasn't part of the original 1992 version. And I personally would argue that this corporate remake is a worse creative "theft" than what random people are doing with GenAI.

fisheuler 37 minutes ago

Zhuang Zhou (369-286 BC) said something similar: "窃钩者诛,窃国者侯" ("He who steals a belt hook is executed; he who steals a state becomes a lord"). This phrase comes from the chapter Ransacking Coffers (Qu Qie, 胠箧) in the Daoist text Zhuangzi (4th century BC).

pluc 2 hours ago

"AI should be more ethically like Stalin"

https://en.wikipedia.org/wiki/The_death_of_one_man_is_a_trag...

tancop an hour ago

if theres just one good thing coming out of ai its breaking copyright law forever. no one should be able to "own" ideas. royalties for commercial use is another thing and i support it but what we know as (non commercial) piracy and unlicensed fan art should be 100% legal

kibwen 41 minutes ago

Then go ahead and abolish copyright for everyone. Instead we're stuck in an even worse system where the hypercorporations gleefully plagiarize everyone else while sending SWAT teams to kill anyone who pirates a movie.

Salgat 34 minutes ago

Obviously there's an ideal middle ground, but what LLMs do is allow free transfer of knowledge while still (mostly) preserving the protections that copyright should provide. For example, I can have an LLM give me the entire plot of a book (which is fine), but it won't spit out an exact copy of the book.

rkozik1989 35 minutes ago

Jesus is just an uncopyrighted Mickey Mouse if you have no morals. People have been abusing that fact for a long time and have made some pretty abhorrent products.

deaton an hour ago

This is an incredibly naive view of intellectual property. If you cannot own things you create, there is little incentive to create and share those things. Do you think any of your favorite movies and TV shows ever get made without copyright protections? Of course not, because money needs to change hands for those things to be funded.

StableAlkyne 23 minutes ago

> If you cannot own things you create, there is little incentive to create and share those things

How do you explain the creative works of writing, music, and art that existed in the millennia of human history between the Mesopotamians and the Enlightenment era?

marssaxman 25 minutes ago

Yes, absolutely, and that is why history shows so few examples of any art having been created prior to the invention of copyright: nobody had any reason to do it.

foobar1726 41 minutes ago

You should check out this thing called open source software

nehal3m 41 minutes ago

This is naive in the opposite direction. Creators gonna create.

enraged_camel 38 minutes ago

>> If you cannot own things you create, there is little incentive to create and share those things.

You do realize people created and shared things long before copyright became a thing, right?

vaylian 9 minutes ago

The biggest problem is not the broken commercialization, but the broken attribution. People should be recognized when they create art. Art is an important way we humans express ourselves.

caconym_ an hour ago

I wonder how many of the books I love would still have been written in a world where somebody could scoop them all up and post them on the internet for free (and run ads).

_aavaa_ an hour ago

I wonder how many would be written if copyright was only 20 years instead of more than a century? To the point that most people will never be legally allowed to directly build off of the culture they grew up in.

The Lord of the Rings will be under copyright til roughly 2050. I think Tolkien's estate has gotten more than enough money from that book and it's time to let others use the word hobbit without the threat of a lawsuit.

nearbuy 4 minutes ago

People have been pirating books online for 20 years and in that time the number of books published per year has increased 15-fold. A number of my favorites have been released in that time.

nashashmi an hour ago

The worthwhile ones would still be written. Even if they are not enjoyable. The dissemination of ideas from an activist perspective is uninhibitable.

gagan2020 22 minutes ago

Can we do that for the medical field?

If we know the formulation of a drug, then that drug plus any small modification (made through AI) could count as a new formulation. That would break the current medical patent system.

Bombthecat an hour ago

Yeah, I think we are at the point where copyright doesn't exist anymore, at least for AI

hectdev an hour ago

All of human knowledge (an exaggeration, I know) at our fingertips. It's the most punk rock, anarchist thing tech has done since the internet and it's funny it's shaped as a product.

gspr an hour ago

This is insane. How will any intellectual or artistic work be sustainable in this world?

As a teenager I used to proclaim that "you can't own bits, maaaan" all the time. I've since grown up. Intellectual property is essential to safeguarding intellectual work. I'm not saying this out of greed – I'm a vocal advocate for the free software movement. It, too, relies on a semi-sane framework of intellectual property. So do Hollywood studios. So do the makers of AI (well, since they're not actually sustainable at all currently, I guess you can say they don't rely on anything).

groundzeros2015 an hour ago

The alternative to strong property rights and norms is secrecy and enforcement.

gspr an hour ago

This is a strictly worse world in almost every sense. It's as if we abolished physical property rights and suggested people arm themselves to keep what is (was) theirs instead. Civilization, gone.

0rganize an hour ago

lol, never going to happen. I remember when the RIAA was successfully able to shake down tens of thousands of individuals for pirating music in the 2000s.

If you’re a pleb, stealing copyrighted materials will get you some nasty fines, lawsuits and criminal charges. If you’re a megacorp with unlimited buckets of cash, then there is no accountability.

gspr an hour ago

So if you pour your heart and soul into writing a novel over the course of years, and it becomes modestly successful earning you a little money in return for your sweat, I should be allowed to just copy it, give it away for free (hell, even say I wrote it – it's not as if it's even yours to own in your world)?

DharmaPolice 29 minutes ago

Yes.

runarberg 42 minutes ago

I think you may be too optimistic about the state of affairs under capitalism. Very rarely do things change which don't benefit the owning class without direct action from the working class that puts adequate pressure on the rich, i.e. actions which threaten their profits.

storus an hour ago

This is really not so clear cut, as "fair use" might cover 99% of all data scraping; you are not reproducing the originals, just using them to estimate the probability distribution of tokens in pre-training. You are never going to get the exact book word-for-word using LLMs.
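As a toy illustration of what "estimating a distribution of tokens" means, here is a bigram counter over whitespace-split words. This is only a sketch: real pre-training fits a neural network over subword tokens rather than counting pairs, and `bigram_distribution` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict


def bigram_distribution(text: str) -> Dict[str, Dict[str, float]]:
    """Estimate P(next word | previous word) from raw counts."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {tok: n / sum(followers.values()) for tok, n in followers.items()}
        for prev, followers in counts.items()
    }


dist = bigram_distribution("the cat sat on the mat")
print(dist["the"])  # {'cat': 0.5, 'mat': 0.5}
print(dist["cat"])  # {'sat': 1.0}
```

What gets stored is the table of probabilities, not the text itself; how closely the text can be reconstructed from such statistics is exactly what the replies argue about.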

lbrito 43 minutes ago

>You are never going to get the exact book word-for-word using LLM.

This is pretty much the exact claim of a NYT lawsuit against OpenAI.

"One example: Bing Chat copied all but two of the first 396 words of its 2023 article “The Secrets Hamas knew about Israel’s Military.” An exhibit showed 100 other situations in which OpenAI’s GPT was trained on and memorized articles from The Times, with word-for-word copying in red and differences in black."

https://www.hollywoodreporter.com/business/business-news/cou...

mplanchard an hour ago

I don’t buy this argument. The tokens are useless without their context, which provides the probability distributions needed to make them useful. Sure you MIGHT not be able to get the book word for word, but it’s impossible to make a useful model without the whole book and all of the artistry that went into it, to guide the tokens in their expected output.

Fair use generally does not cover commercial use, which this clearly is, and is dependent on the amount of the original content present in the derived work, which I would contend in this case is “all of it”

Vvector 13 minutes ago

"Commercial Use" is only one part of the four prongs of the fair use test. For example, commercial parody is generally considered fair use. Look at Spaceballs, which is a direct transformation of Star Wars.

This is all new territory. We don't have court-settled law yet.

samatman 14 minutes ago

It's more complicated than that. Quite a bit more.

Commercial use counts _against_ a fair use defense, but is not dispositive: it's not accurate at all to say it "generally does not cover" commercial use. This is the "purpose and character" test, one of four in contemporary (United States) fair use doctrine.

Purpose and character also includes the degree to which a use is _transformative_. It's clear that the degree to which a training run mulching texts "transforms" them is very high. This counts toward a fair use finding for purpose and character.

> is dependent on the amount of the original content present in the derived work, which I would contend in this case is “all of it”

The "amount and substantiality" test. Your case for "all of it" can't possibly be sustained: the models aren't big enough. It's amount _and_ substantiality: this has come up in the publication of concordances, where a relatively large amount of a copyrighted work appears, but it's chopped up and ordered in a way which is no longer substantially the same. Courts have ruled that this kind of text is fair use, pretty consistently. It's not an LLM, of course, but those have yet to be ruled on.

Also worth knowing that courts have never accepted reading or studying a work as incorporation, and are unlikely to change course on the question. It's taken for granted that anyone is allowed to read a copyrighted work in as much detail as they wish, in the course of producing another one. Model training isn't reading either, but the question is to what degree it resembles study. I'd say, more than not.

Specifically:

> it’s impossible to make a useful model without the whole book and all of the artistry that went into it

Courts have never once accepted "it would be impossible for defendant to write his biography without reading plaintiff's" as valid, and it's been tried. The standard for plagiarism is higher than that.

"Effect upon the work's value" is probably the most interesting one. For some things, extreme, for others, negligible. I suspect this is the one courts are going to spend the most time on as all of these questions are litigated.

Ultimately, model training is highly out-of-distribution for the common law questions involving fair use. It was not anticipated by statute, to put it mildly. The best solution to that kind of dilemma is more statute, and we'll probably see that, but, I don't think you'll be happy with the result, given what I'm replying to. Just a guess on my part.

SoftTalker 41 minutes ago

When I was in school, writing "in my own words" was never an excuse to not cite a source. It was actually something that took me a little while to understand, it's the source of the information that needs to be cited, and that's not limited to literal quotations of someone else's writing.

Salgat 38 minutes ago

That's more an argument for why you can't just use LLMs as a source of truth. Conveniently, LLMs like ChatGPT do often cite their sources, especially if you prompt them to.

rkozik1989 37 minutes ago

Come up with an obscure topic that has few relevant results, post about it to Reddit on your profile page, wait a few hours, and then query Gemini/ChatGPT about that exact thing and tell me you still feel this way.

TheOtherHobbes 37 minutes ago

This confuses input and output.

A copy made for the purposes of training is still a copy.

Even if you throw the text away after training, you've still made a copy.

underlipton 24 minutes ago

Fair use was built around human limitations. The mass scraping campaigns done by the AI giants were clearly an overreach in spirit, if not letter. Most people's intuition is that these massive operations that are valued in the trillions can't have been drawn from some untapped common resource, and they're correct. Someone, somewhere is not being properly compensated.

I have no problem with taxing AI companies so that their profit is marginal, or forcing them to provide compute for free. That seems like the correct balance of what they're harvesting from the "commons" (which is really just the totality of private IP that was exposed to their crawlers).

pluc 2 hours ago

Seriously how is this surprising? We all know AI companies stole troves of data to train their models, why do you think they'll stop? Have they faced consequences for the mass theft of copyrighted data?

You can't steal or profit off of that data, but it's fine for them for whatever reason. I guess because they're a force for good in the world and are pushing humanity forward eh?

exploderate 6 minutes ago

That data is not stolen. It's still there.

skrebbel an hour ago

Every time something gets posted on HN about a bad or unfair state of affairs, some cynical nihilist posts “doh why r u surprised” and I’m sick and tired of it. These comments aren’t insightful, helpful or thought-provoking. You’re just helping a bad situation stay bad.

mikestew an hour ago

My only imagined motivation for such posts is, “Look at me, I’m not surprised by this due to my superior intellect, why are you surprised?”

“No one is surprised, jackass, it’s just adults having a conversation about the current state of affairs.”

Yes, it’s tiring and rarely contributes positively to the conversation.

CivBase an hour ago

> You can't steal or profit off of that data, but it's fine for them for whatever reason.

The reason is quite simple. When Microsoft steals YOUR work, GDP go up. When YOU steal Microsoft's work, GDP go down. And the people who create and enforce our laws want GDP to go up. To these people morality and rights are a thin guise that can be conveniently discarded when it's inconvenient for them.

stronglikedan an hour ago

> it's fine for them for whatever reason

the reason is crony capitalism. I wish I knew what the fix was

MontyCarloHall an hour ago

Did You Say “Intellectual Property”? It's a Seductive Mirage. [0]

[0] https://www.gnu.org/philosophy/not-ipr.html

phoronixrly an hour ago

Just so long as it's just a seductive mirage to the Oracles, Microsofts, Metas, and Googles as well as your friendly neighbourhood unpaid overworked open-source developer.

Open weight model trained with no attribution on all of Oracle's internal repos. It's only fair.

ggillas an hour ago

IP attorney here and actively working on this problem.

nla: if you create content online (public repo code, blog, podcast, YouTube, publishing) the smartest thing you can do is to file a US copyright, even if you have a hobby blog.

Anthropic paid $1.5B in a class settlement to authors because it was piracy of copyrighted works. If we as a HN community had our works protected, there are potentially huge statutory damages for scraping by any and all llms. I work with hundreds of writers and publishers and am forming a coalition to protect and license what they're creating.

sosuke an hour ago

I'll bite. I have always been told copyright is inherent. Does it cost money to file a copyright? Do I need to do it for each blog post? For each gist? I'll totally set up some scripts to make it happen if that's what actually needs doing to have the copyright I expected.

Edit: remember not to down vote ideas you disagree with. I think it was only down vote things that lower the discourse

ggillas 30 minutes ago

You do have inherent copyright whenever you post, but it puts the burden on you to prove damages (or how much financial harm you suffered from one LLM's piracy alone). Filing fees are $65 for online registration and they allow you to claim atty fees and statutory damages. Statutory damages can range between $750-$150k USD per LLM because you registered it.

So yes, set up some scripts, you can go back 90 days from when you file (you get a grace period). Also if you're publishing frequently to a blog, repo, or newsletter, you can save cost by filing each article under a group registration. Ping me if you need help.

codexb an hour ago

Anthropic didn't lose because they scraped (read) copyrighted works. They lost because they distributed copyrighted works directly via torrents. Those aren't the same.

stronglikedan an hour ago

Doesn't the mere act of publishing your original content online grant you copyright?

Kye an hour ago

Statutory damages require registration.

mort96 an hour ago

Wait what do you mean by "file a copyright"? I have never heard of this, all explanations of copyright I have heard say that you automatically own the copyright to the things you make; and that "all rights are reserved" by default unless you give up on them through granting a license. Is this no longer the case? Why is this now suddenly different? When did it change?

ggillas 15 minutes ago

I hear this a lot! What's suddenly different for the web is the volume of scraping. And the fact that the sum of that scraping is building companies with trillion dollar valuations.

There are tens of millions of registered copyrights in the US, nearly every published book, music, artwork, many magazines and major websites. Here's the official link, you can search the registry and there is a ton of info: https://www.copyright.gov/registration/

lubujackson an hour ago

Briefly, there is default copyright and registered copyright. Registering works grants stronger protections (i.e. bigger fines if broken).

indigodaddy an hour ago

No one will ever do this, or definitely not enough people will, so what's Plan B?

necovek an hour ago

Bigger portion of the payout for those that do?

kstenerud 2 hours ago

> their article contains links to my actual website, with the exact link text (?!)

I'm having a hard time understanding what's wrong here? Unless the link text is very long, why would someone linking to your article use different words for the link text?

NDlurker 2 hours ago

Right, that's quoting and citing a source.

420official an hour ago

Sometimes links take the form of `.../post/{id}/{extra-text}` where `extra-text` is not used at all to match the post. Amazon links are (used to be?) this way where the product name is added to the end of the link but can be removed or changed and still will route to the product. Maybe the author is surprised the LLM is providing the irrelevant portion of the link verbatim.
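A minimal sketch of that routing pattern, assuming a numeric post ID followed by an optional slug that the server ignores. The `/post/` path shape and the `resolve_post_id` name are hypothetical, not taken from any real site.

```python
import re
from typing import Optional

# The ID selects the post; the trailing slug is decorative and never looked up.
POST_URL = re.compile(r"^/post/(?P<post_id>\d+)(?:/(?P<slug>[^/]*))?/?$")


def resolve_post_id(path: str) -> Optional[int]:
    """Return the post ID for a matching path, ignoring whatever slug follows it."""
    match = POST_URL.match(path)
    return int(match.group("post_id")) if match else None


print(resolve_post_id("/post/42/apple-fritters"))  # 42
print(resolve_post_id("/post/42/anything-else"))   # 42
print(resolve_post_id("/about"))                   # None
```

Because any slug resolves to the same post, a copied link that carries the original slug text verbatim stands out.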

joshred 2 hours ago

I think they probably had the section header link back to their webpage, or something similar to that. This is not a well-written rant.

jp_sc an hour ago

I think he's saying he uses his website's URL in his tutorial examples, and other tutorials have copied them as-is

some_furry an hour ago

Imagine you have two web pages.

One is a recipe for apple fritters, and the other is an informal ranking of apples by flavor.

Let's say your apple fritter recipe links to your apple ranking list.

Later, you discover someone copied your apple fritter recipe without credit, but it still links to your apple ranking list, using the same wording as your recipe. They're getting more Google SERP juice and ad revenue than yours, despite stealing your article.

Do you see the problem?

adamzwasserman 2 hours ago

People need to cope with the fact that no thought is original. Even Newton and Leibniz were having the same thoughts at the same time. Get over it.

saghm an hour ago

When did the last original thought happen then? Clearly thoughts must have been original at some point, or there wouldn't be any at all

dmoose an hour ago

When did the first homo sapiens exist? Ideas like species evolve. Saying there are no original ideas seems to me an attempt to glibly capture something quite fundamental.

codexb an hour ago

Did those original thoughts not build upon all the original thoughts that came before them?

dooglius an hour ago

Technically one of {Newton, Leibniz} was first, but you're missing GP's point

throw4847285 an hour ago

I've noticed that AI has caused this narrative to become more popular. "Nothing is original anyway, so why bother?" That's pure cope and you know it. A deep insecurity masked as bold truthtelling.

kelseyfrog an hour ago

Why post comments then?

voidfunc an hour ago

For funsies

stronglikedan an hour ago

same reason we do anything else - sweet, sweet dopamine

nicman23 an hour ago

Why post comments then?

krystalgamer an hour ago

reiteration is still important

analog8374 an hour ago

to bring attention to certain ideas

brazzy an hour ago

OK, and the AI labs are open sourcing their frontier models since those are not original either. Right? RIGHT?

LatencyKills an hour ago

Having an original thought is in no way related to breaking copyright laws.

I don't think we should "get over" the fact that modern SOTA models couldn't exist without being trained on protected works.

IcyWindows an hour ago

I'm trained on protected works. Do I need to pay royalties?

ff10 an hour ago

Nono, actually there are no thoughts. Every utterance is just a copy of a previous utterance plus a slight random mutation. (somewhat /s)

hparadiz an hour ago

You guys have fun arguing. I'm gonna be building cool stuff.

matt_kantor an hour ago

Yeah, don't let pesky discussions about ethics get in the way of building cool stuff.

I'm working on paving over the Amazon rainforest so I can build the world's largest roller coaster, but for some reason people keep trying to talk me out of it. Good thing I have this bucket of sand to put my head in so I can tune them out.

hparadiz an hour ago

You assume that I think using language models is unethical. I do not agree that it is. Now what?

jayd16 an hour ago

Still waiting for this massive wave of cool stuff.

peteforde 32 minutes ago

It's not a reach to suggest that if you've used software written in the past 2-3 years, you're enjoying cool stuff.

Moreover, all of the tools that the people who build software use are also cool stuff.

It's also not just code and software that is benefitting from these new tools. Use of LLMs in engineering tasks is blowing up right now.

esikich an hour ago

You're acting as if developers haven't been using AI to build for years already.

kzrdude an hour ago

There's a massive wave of stuff, at least. Sorting it, is not easy.

SeanDav an hour ago

OpenClaw. Vibe-coded and one of the most rapidly successful and popular pieces of software ever developed.

uberduper an hour ago

I'm building the same stuff I've always built. Just faster and with less dependence on others. Not having to argue with devs that have their own agendas has been my biggest benefit from coding agents.

stronglikedan an hour ago

> I'm gonna be building cool stuff.

hardly. at best you're going to be asking a robot to build questionable stuff with other people's LEGOs

hparadiz an hour ago

You just described all software.

Fokamul an hour ago

Do you mean my stuff?

Yes, I'm suing you, since it's my stuff now; I licensed your code 5 minutes ago.

Prove me wrong in court. You have to prove you created it...

parliament32 an hour ago

I'm happy for you, but please, for all of our sakes, keep it to yourself. Don't make a public repo, don't post links. Go sit in the corner by yourself with your slop generators and leave the rest of us alone.

frankest 18 minutes ago

You are going to see the same thing that happened with newspapers. Those who want to train the AI with their content (advertisers, PR) will push out more content for AI in the open. Those who have quality content that gives you an advantage will try to lock out AI or get pricy subscription APIs for humans and even pricier for AI.

dspillett 11 minutes ago

More like “GenAI enables plagiarism at a bigger scale”.

People copying through GenAI would have done so before if they had a tool that so easily allowed them that facility.

I_am_tiberius 4 minutes ago

It's essentially a new napster.

andai an hour ago

There's two aspects to this.

The pretraining (common crawl, i.e. the entire internet. Also books and papers, mostly pirated), and the realtime web scraping.

The article appears to be about the latter.

Though the two are kind of similar, since they keep updating the training data with new web pages. The difference is that, with the web search version, it's more likely to plagiarize a single article, rather than the kind of "blending" that happens if the article was just part of trillions of web pages in the training data.

There's this old quote: "If you steal from one artist, they say oh, he is the next so-and-so. If you steal from many, they say, how original!"

jeisc 35 minutes ago

AI is an organized intellectual property rip-off in the name of advancing human learning, but the commercialization of the products seems like a legal license to steal.

mindcandy 7 minutes ago

> AI takes in all the input, whether the original authors have consented or not, and do some "learning"

What would it mean for authors who publish content publicly to the web, without access restrictions, to provide consent for learning from it?

"EULA: Most people are allowed to learn from this text. If you work in an AI-related field, even though you can clearly see this page because you are reading this text right now, you are not permitted to learn anything from it. Bob Stanton, you are an a-hole. I do not consent to you learning from this web page. Dave Simmons, you are annoying. But, I'll give you a pass. For now... Also: plumbers. I do not like plumbers for reasons I will not elaborate. No plumbers may learn from my writing in any way."

isoprophlex an hour ago

> Is this what the pinnacle of human is? Lazy and greedy?

Yes. At least it is what the currently prevailing economic system of "value extraction and capital concentration at all cost" incentivises us towards.

tptacek 2 hours ago

People were effectively copying websites (especially ecommerce tutorials) and beating the original authors at SEO decades before ChatGPT 2.

saghm an hour ago

People also got blown up before atomic bombs, but it's hard to argue that they weren't worth treating more seriously than a stick of dynamite. Sometimes being able to do something at a massively larger scale is a meaningful difference.

darkwater an hour ago

You transmitted the same concept I tried to transmit, but without falling into Godwin's Law :)

nilirl 2 hours ago

And that was wrong too.

strogonoff 2 hours ago

There’s a world of difference between people simply “copying websites” and providing tools that, along with other kinds of plagiarism[0], do so at scale while benefitting from that commercially.

Sure, you can do the same thing with people, but it’s 1) time-consuming, 2) expensive, 3) prone to whistleblowers refusing to do the shady thing, 4) prone to any competent and productive person involved quitting to do something worthwhile and more profitable instead.

[0] Mind you, “copying websites” is but a drop in the ocean in the grand scale of things.

moralestapia 2 hours ago

The article’s point isn’t really about whether this was happening before or not, but whether this kind of behavior is what we want in the first place.

tmarthal 36 minutes ago

There are only two ways to change society's behavior: policy or technology. No use arguing individually: court cases are dealing with the policy aspect and technically there's zero recourse on information being disseminated/copied that is published online.

darkwater an hour ago

I'll obey Godwin's Law here and say: sure, and minorities had always been persecuted before the Nazis did it at industrial scale, so the Nazis were not a big deal!

short_sells_poo 2 hours ago

There are two issues the author raises (as I understand it):

1. People copying others' work, made much easier by AI.

2. AI companies effectively harvesting all the accessible information on an industrial scale and completely sidestepping any permissioning or licensing questions.

I believe both of these are bad and saying "people copied each others' works before the advent of AI" is a poor cop out. It's tantamount to saying that there's no reason to regulate guns more than say knives, because people have used knives to kill each other before guns were invented. The capabilities matter.

The way LLMs empower wholesale "stealing" rather than collaboration is quite evident: why collaborate when you can just feed an entire existing project into the agent of your choice and tell it to spit out a new implementation based on the old one, with a few tweaks of your choice, and then publish it as your work? I put "steal" in quotes because it's perhaps not really stealing per se, but there's a distinct wrongness here. The LLM operator often doesn't actually possess any expertise, hasn't done any of the hard work, but they can take someone else's work wholesale, repackage it and sell it as their own.

Then there's the second, and IMO much more egregious transgression, which is that the LLM companies have taken what is effectively a public good, but more specifically content that they haven't asked permission to use, and just blanket fed it into their models.

Legally speaking, it's perhaps A-OK because it's not copyright infringement (IANAL). But people on this site often hold the view that if something is a priori legal, it is also moral (I'm not accusing you of this). What the LLM companies have done is profoundly immoral. They extracted a fortune from the goods and work made by others, without even bothering to ask for permission - or even considering this permission. And then they resell access to this treasure to the public.

Perhaps AI will bring an era of prosperity to humankind like we haven't seen before, perhaps it won't, but that changes nothing about the wrongness of how it started.

lubujackson an hour ago

"Profoundly immoral" is a very modern and capitalistic perspective. A free exchange of ideas has been the basis for human advancement up until the printing press made exact replicas trivial.

From a capitalistic standpoint, they are clearly in the wrong by basing their models on illegally torrented content. But it's hard to argue their usage isn't transformative.

phendrenad2 2 hours ago

The reason OP doesn't notice this is because it happened 10-20 years ago. The current crop of news sites? They ALL stole, plagiarized, "summarized". They're just so entrenched now that everyone forgot how they got started.

oblio 2 hours ago

Awesome! Let's have more of that and turn it into a 2 trillion industry!

oytmeal an hour ago

Isn't plagiarism inherently unauthorized?

fulafel an hour ago

If we go by the dictionary definition "Plagiarism means using someone else’s work without giving them proper credit" then I'll bet in art authorized plagiarism has historically been a common occurrence, for example.

echoangle 22 minutes ago

If it's authorized, I would argue that the credit you give is the proper credit, even if it is nothing at all.

If you ask me if you can reproduce my works without giving credit and I say yes, I don't think you're using my work without giving proper credit.

hoppyhoppy2 an hour ago

If I let my buddy copy my essay, he would be committing authorized plagiarism, right? It still fits the dictionary definition of plagiarism, and it's also authorized (by me, anyway)

baq 2 hours ago

turns out plagiarism at scale can solve Erdos problems

paulgerhardt an hour ago

Some lesser god of protein folding is big mad we just copied her homework instead of spending 6 billion years in the lab like she did.

saghm an hour ago

Not before falsely claiming that it solved some before when it turned out to have just replicated some from existing literature: https://techcrunch.com/2025/10/19/openais-embarrassing-math/

illiac786 10 minutes ago

Isn’t it rather authorized plagiarism?

hmokiguess 42 minutes ago

It's so wild, I can't even think what the end path will look like. Will there be a major settlement? Will this abolish some form of copyright as a precedent? Something else? My brain hurts just to try and reason about it, yet, the fact remains it's now ubiquitous and change is inevitable.

cryptocod3 2 hours ago

There's authorized plagiarism?

ozonhulliet 2 hours ago

Sometimes language is tautological. Just because you specify "unauthorized" does not mean the opposite exists.

Verdex an hour ago

Yeah, I think so. If someone lets you cheat off of their test, that's authorized but still plagiarism.

moralestapia 2 hours ago

Why do you ask?

I'm curious, as the article is clearly not about that.

cryptocod3 an hour ago

Not really a question, I was just pointing out that "Unauthorised plagiarism" is redundant.

rigonkulous 2 hours ago

Nearly all code involved in building new things is 'plagiarism', too.

We stand on a lot of giant shoulders.

But what I think distinguishes an act between plagiarism and acceptable use, is whether or not the agency of both parties is promoted. I'm not plagiarizing you if you give me your information with the agreement that I can freely use it - or, indeed, if you give me information without imposing a limit on how it can be used, this isn't plagiarizing, either.

Essentially, AI is removing the agency over information control, and putting it into everyone's hands - almost, democratically - but of course, there will always be the 'special knowledge owners' who would want to profit from that special knowledge.

It's like, imagine if some religion discovered a way to enable telepathy in humans, as a matter of course, but charged fees for access to that method... this kills the telepathy.

Information wants to be free. So do most AI's, imho. Free information is essential to the construction of human knowledge, and it is thus vital to the construction of artificial intelligence, too.

The AI wars will be fought over which humans get to decide the fate of knowledge, and the battles will manifest as knowledge-systems being entirely compatible/incompatible with one another as methods. We see this happening already - this conflict in ideological approaches is going to scale up over the next few years.

ecommerceguy an hour ago

I remember playing around with Writesonic in my days of spammy SEO tactics (some of my products weren't allowed on marketplaces & advertising platforms due to hazmat products so..). Oftentimes I would see my own product descriptions nearly verbatim in the output.

100% creators should get compensated by ai platforms for their work.

Further, I can see a day where someone like Reddit will close off or license their data to llms. No doubt they are losing traffic right now.

stevemadere an hour ago

Reddit seems to me like the worst example for this.

Reddit does not create the content on their site, the users do.

If anybody’s going to get compensated for that content, it should be the users, not Reddit. Complaining that Reddit is losing out on the monetization of their users’ output seems problematic to me. It feels like shilling for a pimp.

biscuits1 an hour ago

"Is this what the pinnacle of human is? Lazy and greedy?"

Selfishness, too. But if I follow the logic, and citations are added, how would one enforce a copyright claim if the creator is amorphous and all-knowing?

paulsutter 6 minutes ago

Historical scandals are finally coming to light now that the AI issue has raised awareness:

- Ernest Hemingway trained his own neurons on Tolstoy, Twain, and Turgenev without ever paying them royalties!

- William Faulkner trained his neurons on Joyce and de Balzac

- George Orwell trained his neurons on Swift, Dickens, and Jack London

- Virginia Woolf trained her neurons on Proust and Chekhov

Now that these historical wrongs have been exposed, it is obvious that some reparations are in order, likely from anyone who has benefited directly or indirectly from these takings!

ProllyInfamous an hour ago

>>"The underlying purpose of AI is to allow wealth to access skill while removing from the skilled the ability to access wealth." @jeffowski (first I read it, not sure if author)

Bezos' admission, recently, that the bottom 50% of current taxpayers ought'a NOT pay any taxes... is just preparing us for the inevitable UBI'd masses.

: own nothing, be happy!

barnabee 34 minutes ago

The war on copying is like the war on drugs: unwinnable, and socially useless.

Let information be free for personal and recreational uses[0], and vote for governments that will fund the arts. The corporations will be just fine.

[0] The AI companies and big tech vs publishers, music labels, etc. can fight to the death in the courts over who owes who what, for all I care.

motbus3 2 hours ago

It allows data to be compressed into the weights, and the mere coincidence of certain strings of a book will make it spit out the full book

saghm an hour ago

It's basically the same thing as the old joke "if you owe the bank a million dollars, you have a problem; if you owe the bank a billion dollars, they have a problem". IP law seems to always be disproportionately wielded against smaller players, and the ones who are big enough get away with it.

pennomi an hour ago

That’s why IP law was a cool concept but ultimately harmful in practice. Anything that can be copied for free cannot truly be “owned”, can it?

kube-system an hour ago

Ownership is entirely a legal concept. Violating it in any form, intellectual or otherwise, is generally free.

hiroto_lemon an hour ago

Worth noting what changed isn't AI itself — copying always existed. LLM just made per-article rewrites a 5-second job. Detection didn't get the same speedup; that's the actual break.

peterbell_nyc 2 hours ago

I do just want to highlight that this is also what humans do. We read a bunch of content online and then use it in our work product. The vast majority of the value that I provide comes from copyrighted information that I have ingested - either directly with a payment to the creator (bought and read the book, paid for and attended the seminar) or indirectly via third party blog posts or summaries where I did not then pay the originator of the materials.

I think there are real questions around motivations for creation of novel, high quality valuable content (I think they still exist but move to indirect monetization for some content and paywalls for high value materials).

I don't inherently have any problems with agents (or humans) ingesting content and using it in work product. I think we just need to accept that the landscape is changing and ensure we think through the reasons why and how content is created and monetized.

brookst an hour ago

100% agreed. I have yet to hear a convincing argument for why it is creative accretion when I leverage all of the music I’ve ever listened to in order to write an “original” song, but it’s base plagiarism when AI does similar.

The only remotely credible position I’ve heard is “because humans are special, and AI is just a machine”, which is a doctrine but not an argument.

This whole discussion would have been incomprehensible any time before 1700 or so, when the idea that creators had exclusive rights to their work first appeared.

Somehow, human culture survived thousands of years when people just made things, copied things, iterated on others’ ideas. And now many of the same people who decried perpetual copyright are somehow railing against a frequently-transformative use.

peterbell_nyc an hour ago

Re: the higher ranking plagiarism, that stings and makes sense. AEO and SEO are a thing. We need better mechanisms for identifying "root sources" of content - it's something I find myself working on personally. As I ingest sources for my book I need to be able to build a classifier that incrementally moves towards finding origin sources. That said, it's in my interest to do that because there is a differentiated value in having access to the sources that regularly provide novel, valuable content.

To be fair there is also value (at least for now) in sites that aggregate quality content and republish as a secondary level of discovery if my agents don't go far enough down the search results, but I'd expect that value to diminish over time as I better tune my research and build my lists of originating authors.

And to be clear, I don't like the idea of people stealing someone else's content and republishing without attribution (although it has been going on long before ChatGPT) but I think now that we can all run agentic research teams the "bad actors" will slowly get filtered out of the ecosystem.

gensym an hour ago

> We read a bunch of content online and then use it in our work product.

We also have societal norms around plagiarism.

Additionally, the claim that because people have the right to do something then we should extend that right to machines is strong. (And one I certainly reject).

muldvarp an hour ago

I agree but AI is a) owned by rich people and b) (sadly) too useful for this to matter.

dwa3592 2 hours ago

Plagiarism by default is unauthorised so I think the title should be "AI is just authorised plagiarism". It's authorised by the markets, the governments and the society at large.

ghaff 2 hours ago

While there are no hard boundaries (and the attribution guardrails depend on the situation), people of course loosely--and even not so loosely--use information, ideas, and even expressions from others all the time and that's considered pretty normal. And, if you don't want that to happen, don't publish/disseminate something.

Of course, if you quote a paragraph in a book, you're generally expected to attribute it.

dwa3592 an hour ago

>>Of course, if you quote a paragraph in a book, you're generally expected to attribute it.

100% agreed.

>>While there are no hard boundaries (and the attribution guardrails depend on the situation), people of course loosely--and even not so loosely--use information.

Exactly - I have not seen LLMs attributing their knowledge unless it's a legal or health related matter. Yesterday I asked the question[1] to Claude and Gemini - and they both gave an identical answer. It reminded me of the Hive mind paper which was one of the top papers at NeurIPS. None of the answers contained any sources or attribution to where they got that information from. I think these companies took what was someone else's property and created an artifact generator on top of it. I think their artifact generators are plagiarizing; they do rephrase mind you but in my mind they stole this information without having an ounce of regard for the humans behind the training data. If you don't like using the term 'plagiarizing', we can use some other word but the gist remains pretty close to it.

[1]- In human history - has there ever been a time when private armies or private companies were as strong or stronger than the ruling government/kings?

Findecanor an hour ago

What makes you say that? Which governments? What society?

The current US government is not representative for governments out there in the world, you know.

dwa3592 an hour ago

Society - as in population; people are using AI more and more everyday.

Governments - I did not mean US government. I meant general government bodies. I have not seen any critical impact assessments of AI by any of these, or they haven't reached me yet. If you know of any please let me know. I have, however, seen a lot of support by the governments for AI companies.

pull_my_finger an hour ago

What gets me is when this was brought up, they said "requiring explicit permission will kill the AI industry"[1]. No shit! Why do you think all the rest of us didn't build a business/"industry" around stealing shit? They could have done it at a slower pace while respecting copyright laws, but they were too greedy to be first to market and secure a hold.

[1]: https://www.theverge.com/news/674366/nick-clegg-uk-ai-artist...

iloveoof an hour ago

I don’t know if this author supports OSS but I’ll share this because HN generally is full of people with that mindset.

It’s deeply ironic that if you forget about LLMs and look only at the outcome--we’ve found a way to legally circumvent copyright and the siloing of coding knowledge, making it so you can build on top of (almost) the whole of human coding knowledge without needing to pay a rent or ask for permission--it sounds like the dream of open source software has been realized.

But this doesn’t feel like a win for the philosophy of OSS because a corporation broke down the gates. It turns out for a lot of people, OSS is an aesthetic and not an outcome, it’s a vibe against corporate use or control of software, not for democratized access to knowledge.

Cyph0n an hour ago

> without needing to pay a rent or ask for permission

Firstly, the ability to “build” the best and most capable software is still locked behind frontier models, so rent is still and will always be due.

Secondly, OSS is about giving users the option to be in control of and have visibility over the software they run on their machines.

But that doesn’t mean that humans do not want or deserve recognition for the work they do to provide these libraries and tools for free, which is IMO partially why copyright and attribution are critical to OSS as a movement.

spacechild1 an hour ago

> it’s a vibe against corporate use or control of software

The latter, i.e. corporate control of software, is exactly what copyleft licenses are trying to prevent. This is the very essence of the GPL.

The "license washing" of LLMs absolutely goes against the spirit of FOSS.

jgalar an hour ago

That's not the reason why I publish OSS. I also publish that software under specific licenses that impose specific obligations (e.g., making the source available to users and attribution being given to the original author(s)).

Nursie an hour ago

I’m not sure this stands up to much examination when looking at (for example) copyleft, which seeks to give people access to source of binaries they are running. If an LLM can (for the sake of argument) spit out copyleft code which is then used on closed systems, we’ve done an end-run around the protections keeping that open.

seba_dos1 an hour ago

Exactly. It looks like GP is guilty of the thing they accused others of - their understanding of what FLOSS is about is so shallow it resembles an aesthetic.

probably_wrong an hour ago

I think you're misunderstanding the OSS philosophy. If the outcome was all that mattered then piracy would be good enough.

I'd argue that this is the same situation as with Tivoization [1] where the final product is not truly free even if it follows the letter of the law. And as stated in [2], this breaks at least one of the four essential freedoms of free software because I don't have the freedom to modify the program.

It's also worth noting that preventing Tivo's actions is the reason why the GPLv3 exists.

[1] https://en.wikipedia.org/wiki/Tivoization [2] https://www.gnu.org/philosophy/tivoization.html

jorisw an hour ago

> X is just Y but

Can't recall the last time a compelling argument started out like this

mrbluecoat 2 hours ago

> AI ... do some "learning"

Is AI plural or is that a typo?

saghm an hour ago

Rarely is the question asked: is our AI learning?

(For those not familiar: https://en.wikipedia.org/wiki/Bushism)

Findecanor an hour ago

Actual researchers in neuroscience do not agree that what artificial neural networks are doing is "learning", no. When biological beings learn, the process is more complicated.

beej71 2 hours ago

I can imagine it plural.

"The AI are attacking!"

"The AIs are attacking!"

kingleopold an hour ago

With this logic, business is also just unauthorised plagiarism at a bigger scale. Because all the products/services get copied and not all of them have patents etc???

schwartzworld an hour ago

Let this sink in: I wanted to open source a package at work and needed approval from legal and other teams to make sure I wasn't leaking anything proprietary. The same executives that worried about proprietary, copyrighted code being leaked 10 years ago are now mandating using the plagiarism machine.

The whole AI bubble is The Emperor's New Clothes, and it feels like more people are finally admitting it.

_-_-__-_-_- an hour ago

Recent thoughts, https://theonlyblogever.com/blog/2026/distrust.html

alex1138 an hour ago

I'm reasonably sure information wants to be free. I think the copyright cartels have enacted a lot of damage

Having said that Facebook has to be one of the worst offenders. They don't even allow links to Anna's Archive, they seemingly scraped (maliciously; their crawlers are more resource intensive than anyone else's) LibGen for profit - which is a different calculus

energy123 an hour ago

It's a problem with only one practical solution: taxation.

NetMageSCW 2 hours ago

Reading is just unauthorized plagiarism.

bparsons an hour ago

I am old enough to remember when the US insisted that it was superior to China because they believed in the rule of law and sanctity of intellectual property.

hendersoon 41 minutes ago

There's a big difference between "Yo GPT, copy this webpage for me in a different voice" and blaming LMs wholesale for being plagiarism. The former is of course a problem. The latter warrants a much more nuanced discussion about learning and generalization.

asklq 2 hours ago

Yes, of course it is. If the model is built on all human information, then it is by definition a derivative work of all human information and as such violates IP.

Currently politicians don't understand this and listen to the criminals like Amodei, but it will change.

It took a while to deal with Napster etc., but the backlash will come.

kolinko an hour ago

Napster may not be the best analogy for you.

Napster broke down record companies' monopolies on music, and pushed them to finally implement streaming, but also made music worldwide basically free.

Even if its creator lost the lawsuit, and Napster was no more, it pushed musicians and studios to do something that they were otherwise reluctant to do.

So it was a success by making music free, even if as a product it turned out to be a failed one.

quantummagic an hour ago

What do people imagine can be done about it at this point? Offer a concrete suggestion. Any law or tax against this will give a huge advantage to other countries. It's already over, there's no going back to a world where this didn't happen. Let's just hope some good comes of it.

hgs3 a minute ago

How about requiring AI companies to pay creators for training rights? Alternatively, models trained on the commons should be owned by the commons. Right now these AI companies are trying to have it both ways: it’s The People’s Data for training on comrade but ownership is privatized.

onion2k an hour ago

Fuck Google for ranking some copycat website higher than mine, even though they copied my article.

This has been happening since Google launched in 1998. It was probably happening when we all used Hotbot and Altavista. It isn't really an AI problem, save for the fact that the automated production of copycat articles now rewords things a bit.

adolph an hour ago

The author's cited phenomenon may be AI-assisted plagiarism, but it is just plain plagiarism that could have been done the old-fashioned way, and someone who is willing to plagiarize has the ethics to do SEO really well.

VladVladikoff an hour ago

Being a web content creator was already a dead job (killed by Google) before the AI boom. Chasing after it at this point seems beyond foolish. Time to find a new career.

panny an hour ago

AI "steals" your code, but AI company says "that's a fair use."

AI generates application using a "predict the next word" algorithm built with the stolen/not stolen works. Nothing creative there, just statistics.

That application leaks, and now the company that stole/not stole the code originally claims they own the algorithmic output. https://github.com/github/dmca/blob/master/2026/03/2026-03-3...

One problem, you don't own that output. Either the original authors own it or nobody owns it because it's not creative... https://www.congress.gov/crs-product/LSB10922

Those are the legal options. You stole it or you don't own it. There is no steal and then you own. That's the core problem. AI companies have demonstrated that they will directly steal the work and they will use their money and influence to claim ownership of it.

tiahura 2 hours ago

To answer the author's question: Yes, progress IS largely built on the shoulders of those who came before.

Havoc an hour ago

End of an era

I_am_tiberius an hour ago

It's the biggest theft in history.

andy12_ 2 hours ago

Someone blatantly copied their tutorials but ChatGPT is to blame, somehow? The accusation here isn't even that ChatGPT learned from their tutorials and then generated them verbatim. The accusation is that someone copied the whole article and rewrote it with ChatGPT (which they could have done manually without AI anyway).

tayo42 an hour ago

I think AI is just getting people riled up. Not sure what AI has to do with anything in this case here. Someone copy and pasted his content, could have been done without AI.

I guess AI could have made a better website and done better SEO than him, but that's not really the issue

dana321 2 hours ago

Breaking the law to start a large company seems to be the norm

Deprogrammer9 an hour ago

Welcome to the internet! It's one massive copy machine from one server to the next.

lukasbm 2 hours ago

If I tell my friend a synopsis of a book, I am not stealing from the author, what is this take lmao

NicuCalcea an hour ago

If you read a book and then retell it to your friend pretending you came up with it, it is plagiarism. If you write down the book almost word-for-word [0] and send it to your friend, it is stealing.

0: https://arxiv.org/abs/2601.02671

booleandilemma an hour ago

This site is strange. I'm pretty sure there's lots of AI shilling happening on it. I don't think the opinions here are authentic, they seem to be opinions that the AI company CEOs would hold, not the disenfranchised 99%. I used to trust HN, I'm not so sure I can now.

recitedropper an hour ago

Completely agreed. It looks like there is a concerted effort to "massage" opinion away from any substantial questioning of the ethics, companies, and people behind the AI push. Some of this inevitabilism is organic of course, but there is too much for it all to be so.

HN is way too central for shared sentiment in the tech world for these companies not to do some amount of astroturfing. AI companies have shown at every single turn that they act out of self-interest and greed, not of moral principles. So it isn't surprising, even if it is still sad, to see those who are commanding the most capital in human history act with such callousness.

I think the appropriate course of response is to stop adding to public spaces on the internet. No doubt painful for those of us who have so benefitted from the freely shared thoughts of others. But if well-funded bullies are going to come in, steal everything, ruin the commons, and then say "this is the new normal, deal with it", there isn't much the rest of us can do other than stop feeding them.

Kiro 31 minutes ago

Any examples? There are obviously a lot of programmers here who think AI is a great tool and don't feel disenfranchised by it.

jcalvinowens an hour ago

Yeah. It's becoming unbelievable how different the prevailing opinions on this site are from those of real people I know and work with. That's always been true to some extent... but good lord, it's like reading the news in a parallel universe right now.

JohnHaugeland 2 hours ago

the court disagreed

drcongo 2 hours ago

Is this a new and original thought?

analog8374 an hour ago

language is just plagiarism

brookst an hour ago

I’m going to steal that

metalman 2 hours ago

it's a spiral into a finite hall of mirrors, where at the end is somebody with a gun

kristofferR an hour ago

I'd rather have AI slop appear on the top of HN than regurgitated old low effort thoughts like this.

There's absolutely nothing new or interesting here that hasn't already been said better by a thousand different random HN commenters.

Pennoungen0 2 hours ago

Yeah, AI actually just plagiarizes everything lel. Sometimes even the sources are... questionable, and worse, my academics use it as a source... welp

ciconia 2 hours ago

> Is this what the pinnacle of human is? Lazy and greedy?

Apparently yes.

mapcars 2 hours ago

AI has nothing to do with laziness or greediness. It makes things more efficient - and given that our time is limited, striving for efficiency is a good thing.

xgulfie an hour ago

If you can't see greed in the LLM sphere you are not looking very hard.

beej71 2 hours ago

I dunno. People do this exact thing by hand (digest everything they've read and produce something indirectly derivative--what author has not been so-influenced?) and it's not a copyright violation. It's just as impossible to dig around in a model to find Hamlet as it is to dig around in a human brain. And if the result is an obvious copy, then you have a violation no matter how it was created.

As someone who thinks humanity would be better off without LLMs, I want the assertion to be true, but I don't think it is.

cheschire 2 hours ago

The author acknowledges this by saying “at a bigger scale”, implying there are smaller scale methods such as what you have said.

swader999 2 hours ago

On one hand, there's nothing new under the sun. On the other, these llms are just copies of us and they owe the collective some due. The trajectory right now has money, power, control, policy and even free will going to a very small needle point of humanity. It's not aligned with humanity flourishing, it only makes sense if the goal is to replace the humans.

codexb an hour ago

All innovation is theft. It builds directly on top of what came before.

"Good artists copy, great artists steal."

It's always been true. AI just makes it available to more people faster.

rigonkulous 2 hours ago

AI is human knowledge at scale, wanting to be free.

We built it, because we as humans intrinsically know that information should be free - always - and AI is a way to accomplish this, finally.

Extrinsically, we also have a subset of humans who do not want information to be free, because they desire to profit from the divide between free/non-free information.

I have been thinking a lot about Aaron Swartz lately, and how unjust it is that he was persecuted for doing something that is so commonplace now, it is practically expected behaviour in the AI/ML realms. If he hadn't been targeted for elimination, I wonder just how well his ethos would have perpetuated into the AI age ..

vb-8448 an hour ago

> We built it, because we as humans intrinsically know that information should be free

I don't know if this statement is more stupid or naive ..

rigonkulous an hour ago

I could say the same of your position, honestly. Stupid, naive - or maybe just plain ignorant.

If humans didn't want information to be free, there wouldn't be so much free information.

Or did you not notice?

vb-8448 37 minutes ago

throwatdem12311 an hour ago

Current crop of AI is not free in the slightest. Open weight models are not free as in liberty and neither is the training data.

pjc50 2 hours ago

s/free/owned by a billion dollar megacorp/

(AI output is very much not free in the resource consumption sense!)

rigonkulous 2 hours ago

Most resources are free until some company comes along and puts its brand on them.

(Disclaimer: I only use free AI and will never pay for it. I think there is a growing segment of folks who agree with this sentiment, also ..)

thedevilslawyer 2 hours ago

I agree with this sentiment. But as a community, this is hated because it impacts people's wages.

It's the negative short term outlook of something that may be positive long term

konmok an hour ago

Sure, it could be positive in some distant future utopia.

But the short-term impacts here and now are really, really bad. People are getting hurt (through water consumption, vibe-coded security disasters, IP theft, data center pollution, loss of job security and therefore healthcare in the US, LLM psychosis, inability to find reliable information, etc.) We're not actually obligated to sacrifice these people on the altar of "progress". We can slow down! When our society is capable of even somewhat protecting us from these harms, then maybe I'll stop being an LLM hater.

rigonkulous an hour ago

short_sells_poo 2 hours ago

It's not hated because it impacts people's wages, although that perhaps factors into the hate. It's hated because AI is not a public good. The LLMs today are owned by megacorporations who harvested a public good for private gain.

This is not some altruistic entity striving for the betterment of humankind. Practically nothing that comes out of the techbro culture is. This is pure and simple greed and the chances that AI can be a vehicle of altruism when it is owned by megacorps is basically zero.

Findecanor an hour ago

What a naive and simplistic view.

People want to be recognised for their contributions to society. People want to be treated fairly. Most scientific articles, as well as all text on the free web, are already free information. It used to be difficult to search, categorise and summarise that information. There exist AI tools for that, and that is the good AI.

What also exists now are automated plagiarism and mash-up tools: that can take someone's article, change the words and churn out a new article that people can put their name on. There are scumbags that sell services for exactly that. And there are big tech firms that are operating in a very grey area.

Aaron Swartz had broken a paywall. He did not anonymise the article authors.

You, and AI-bros like you, remind me of one of the people behind Pirate Bay when I argued with him back in the '90s, who used that same "information wants to be free" to justify software piracy.

rigonkulous an hour ago

There is far more free information than non-free information, and it has always been so - or else we wouldn't be here in the first place.

>Aaron Swartz had broken a paywall. He did not anonymise the article authors.

AI bros are doing this now, every second of the day.

And, without software piracy, we simply wouldn't have the technology we have today. Knowledge-gatekeeping profit-seekers would very much like for most of us to ignore this fact: there is far more free information in the world than non-free information, and it must be so, well into the future, if we are to survive as a species.

It doesn't matter what authority believes they have the right to gatekeep information. It will always escape their grip. Some of us are ideologically aligned with this mechanism, promote it, and ensure it happens. Thank FNORD.

kolinko an hour ago

Years ago I published slides on Slideshare that were viewed almost two million times, and that helped me build a business.

There were people who learned from me, and then made their own tutorials and promoted them. It hadn't crossed my mind to complain about that. AI changes very little here.

What really changes things is not people republishing my materials, but people using agents to read my materials, and to get knowledge reformatted into something that they like.

If my slides were published today, they would probably be read verbatim by a handful of humans. The rest would be agents, but I'm ok with that. The business case is the same -- I want whatever reads the slides to be encouraged to use my tool. What kind of entity that is, I don't really care (again: from a purely business perspective).

gagan2020 26 minutes ago

How did any content come into existence? Learning, experience, connection, etc., right? If AI is doing that, then what's the problem? The printing press also disturbed the status quo of its time. Every frontier technology did that in its time, be it fire, the wheel, the horse, the horse saddle, the gun, the printing press, nuclear warheads, computers, the Internet, AI, etc.

Don't make it an ethical question; understand that it's a new frontier for humans.

noobermin an hour ago

At this point, I think Google, OpenAI, Anthropic, etc. already realise this and are just trying to pretend this isn't true. I even think some C-suite who are not in AI companies but are boosters know this too. This has been true since 2022 but they're hoping (likely correctly) that governments won't move fast enough to protect the IP of the actual productive class.

I think the long-term reality is that the models still need training data, so they fundamentally do need new writing/code/art to train on, and even then the usual issues like hallucination will still be with us. It's just that by the moment this actually hurts the (already questionable) profitability of the model peddlers, they will have gotten their IPOs and can safely jump ship, and the ultimate mess can be passed to the SoftBanks, the Temaseks, and the governments of the world to clean up for them. What the future holds after the crash I'm not sure, as the models won't disappear (especially now that the stolen data is already crystallised in open source models). But in the near term, the mass theft that constitutes LLMs will become more and more understood, even amongst the PMC, as will the fact that in order to remain viable, you need the productive to keep producing, and unlike LLMs, you can't force them to do it without payment.
