YouTube as Storage (github.com)

148 points by saswatms 13 hours ago

repeekad 13 hours ago

I once asked one of the original YouTube infra engineers “will you ever need to delete the long tail of videos no one watches”

They said it didn’t matter, because the sheer volume of new data flowing in was growing so fast that it made the old data just a drop in the bucket.

arjie 12 hours ago

Kwpolska 11 hours ago

Of course videos disappear for copyright, ToS violations, or when the uploaders remove them. They do not disappear just because nobody watched them.

Gigachad 9 hours ago

leephillips 7 hours ago

MagicMoonlight 6 hours ago

Now that they can harvest it all for AI training, that decision was the cheapest and greatest thing they ever did.

Imagine trying to pay for all that content, nobody on earth would be able or willing to supply it.

wasmainiac 12 hours ago

I wonder if that still holds true? The volume of videos increases exponentially, especially with AI slop. I wonder if at some point they will have to limit the storage per user, with a paid model if you surpass that limit. Many people who upload many videos, I guess, make some form of income off YouTube, so it wouldn’t be that big of a deal.

weird-eye-issue 12 hours ago

What they said only holds true because the growth continues so that the old volume of videos doesn't matter as much since there's so many more new ones each year compared to the previous year. So the question is more about whether or not it will hold true in the long term, not today

raincole 10 hours ago

pogue 12 hours ago

I assume it's an economics issue. As long as they continue making money off the uploads to a higher extent than it costs for storage, it works out for them.

throw_await 12 hours ago

ranger_danger 12 hours ago

I wonder if anyone has ever compiled a list of channels with abnormally large numbers of videos? For example this guy has over 14,000:

https://www.youtube.com/@lylehsaxon

HeliumHydride 12 hours ago

jl6 9 hours ago

One day, it will matter. Not even Google can escape the consequences of infinite growth. Kryder's Law is over. We cannot rely on storage getting cheaper faster than we can fill it, and orgs cannot rely on being able to extract more value from data than it costs to store it. Every other org knows this already. The only difference with Google is that they have used their ad cash generator to postpone their reality check moment.

One day, somebody is going to be tasked with deciding what gets deleted. It won't be pretty. Old and unloved video will fade into JPEG noise as the compression ratio gets progressively cranked, until all that remains is a textual prompt designed to feed an AI model that can regenerate a facsimile of the original.

asah 9 hours ago

You can see how Google rolls with how they deleted old Gmail accounts - years of notice, lots of warnings, etc. They finally started deletions recently, and I haven't heard a whimper from anyone (yet).

flux3125 8 hours ago

1313ed01 5 hours ago

dyauspitr 7 hours ago

It depends. At the roughly 2 PB of new data they get per day, that’s about 10 sq ft of physical rack space per day. Each data center is around 500,000 sq ft, so each data center can hold 120 years of YouTube uploads. They’re not going to have to restrict uploads anytime soon.
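
The arithmetic in the comment above can be checked in a few lines. Both inputs are the commenter's own rough figures, not measured values, and the variable names are made up for illustration. Plugging them in gives about 137 years, the same ballpark as the 120 quoted.

```python
# Back-of-the-envelope check using the figures quoted in the comment.
# Both inputs are the commenter's estimates, not verified numbers.
RACK_SQFT_PER_DAY = 10        # floor space consumed by one day of uploads
DATACENTER_SQFT = 500_000     # floor space of one data center

days_of_uploads = DATACENTER_SQFT // RACK_SQFT_PER_DAY
years_of_uploads = days_of_uploads / 365

print(f"{days_of_uploads} days, about {years_of_uploads:.0f} years")
# prints "50000 days, about 137 years"
```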

semitones 6 hours ago

ntoskrnl_exe 9 hours ago

Wouldn't it also be a performance nightmare?

The energy bill for scanning through the terabytes of metadata would be comparable to that of several months of AI training, not to mention the time it would take. Then deleting a few million random 360p videos and putting MrBeast in their place would result in insane fragmentation of the new files.

It might really just be cheaper to keep buying new HDDs.

dev1ycan 9 hours ago

This is why they removed searching for older videos (by specific time) and why their search pushes certain algorithmic videos. Other older videos, when found by direct link, are on long-term storage and take a while to start loading.

joecool1029 9 hours ago

eMPee584 6 hours ago

stogot 9 hours ago

S3 allows delete and is efficient here. I’m sure Google can figure it out

They allow search by timestamp, I’m sure YouTube can write an algorithm to find videos with <=1 view

moffkalast 9 hours ago

Besides, with their search deteriorating to the point where a direct video title doesn't result in a match, nobody can see those videos anyway and they don't have to cache them.

sfn42 9 hours ago

Smalltalker-80 10 hours ago

Technically cool, but ToS state: "Misuse of Service Restrictions - Purpose Restriction: The Service is intended for video viewing and sharing, not as a general-purpose, cloud-based file storage service." So they can rightfully delete your files.

ilaksh 10 hours ago

It's interesting that this exact use case is already covered in their ToS. I wonder when the first YouTube as storage project came out, and how many there have been over the years.

Valkryst 9 hours ago

At least as far back as 2017, when I wrote Schillsaver: https://github.com/Valkryst/Schillsaver

None of us, in the original discussion threads, knew of it being done before then IIRC.

kingstnap an hour ago

The idea of exploiting someone else's server to store files is incredibly old.

https://en.wikipedia.org/wiki/GMail_Drive

When Google launched Gmail (2004) with a huge 1GB storage quota, Richard Jones released GMailFS to mount a Gmail account as a standard block device.

altmanaltman an hour ago

I mean, it is pretty likely they figured out it could be a pretty obvious possible misuse before anyone actually started doing it.

j-bos 12 hours ago

This is really cool but also feels like a potential burden on the commons.

vasco 12 hours ago

That great commons that are the multi trillion dollar corporations that could buy multiple countries? They sure worry about the commons when launching another datacenter to optimize ads.

asah 9 hours ago

no the "commons" in this case is the fundamental free-ness of YT - if abused then any corporations will have to shut it down...

OTOH I'm 100.0% sure that google has a plan, has been expecting this for years and, in particular, has prior experience from free Gmail accounts being used for storage.

justinclift 8 hours ago

agnishom 11 hours ago

You are right, but YouTube is also a massive repository of human cultural expression, whose true value is much more than the economic value it brings to Google.

anjel 10 hours ago

komali2 11 hours ago

cheonn638 11 hours ago

> That great commons that are the multi trillion dollar corporations that could buy multiple countries?

Exactly which countries could they buy?

Let me guess: you haven’t actually asked gemini

cheschire 11 hours ago

gregoryl 11 hours ago

RobotToaster 10 hours ago

russfrank 11 hours ago

ninjagoo an hour ago

Interestingly, this is a specific implementation of a more general idea - leverage social media to store encrypted content, that requires decoding through a trusted app to surface the actual content.

AI tools can use this as a messaging service with deniability. Pretty sure humans already use it in this way. In the past, classifieds in newspapers were a similar messaging service with deniability.

thrdbndndn 12 hours ago

I don't get how it works.

> Encoding: Files are chunked, encoded with fountain codes, and embedded into video frames

Wouldn't YouTube just compress/re-encode your video and ruin your data (assuming you want bit-by-bit accurate recovery)?

If you have some redundancy to counter this, wouldn't it be super inefficient?

(Admittedly, I've never heard of "fountain codes", which is probably crucial to understanding how it works.)
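
For readers in the same position, here is a minimal sketch of the fountain-code idea. It is a toy LT-style code, not the project's actual scheme: the degree distribution, chunk size, loss rate, and every function name are assumptions made for illustration. Each encoded symbol is the XOR of a few source chunks chosen from a seed, the sender can keep producing symbols indefinitely, and the receiver decodes once it has collected enough of them, no matter which ones were lost.

```python
import random


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def split(data, size):
    """Split data into equal-size chunks, zero-padding the last one."""
    padded = data + bytes(-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def chunk_indices(seed, k):
    """Derive which source chunks a symbol covers from its seed alone."""
    rng = random.Random(seed)
    degree = min(k, rng.choice([1, 1, 2, 2, 3, 4]))  # toy degree distribution
    return rng.sample(range(k), degree)


def make_symbol(chunks, seed):
    """One encoded symbol: the XOR of the chunks selected by the seed."""
    payload = bytes(len(chunks[0]))
    for i in chunk_indices(seed, len(chunks)):
        payload = xor_bytes(payload, chunks[i])
    return seed, payload


def decode(symbols, k):
    """Peeling decoder. Returns the k chunks, or None if more symbols are needed."""
    recovered = {}
    pending = [(set(chunk_indices(seed, k)), payload) for seed, payload in symbols]
    progress = True
    while progress and len(recovered) < k:
        progress = False
        still_pending = []
        for indices, payload in pending:
            # Strip out chunks that are already known.
            for i in [i for i in indices if i in recovered]:
                payload = xor_bytes(payload, recovered[i])
                indices.remove(i)
            if len(indices) == 1:
                recovered[indices.pop()] = payload
                progress = True
            elif indices:
                still_pending.append((indices, payload))
        pending = still_pending
    return [recovered[i] for i in range(k)] if len(recovered) == k else None


def roundtrip(data, size=4, loss=0.3, max_symbols=10_000):
    """Send symbols over a simulated lossy channel until decoding succeeds."""
    chunks = split(data, size)
    channel = random.Random(1234)
    received = []
    for seed in range(max_symbols):
        symbol = make_symbol(chunks, seed)
        if channel.random() < loss:
            continue  # this symbol was lost in transit
        received.append(symbol)
        chunks_out = decode(received, len(chunks))
        if chunks_out is not None:
            return b"".join(chunks_out)[:len(data)], len(received)
    raise RuntimeError("ran out of symbols before decoding completed")


data = b"fountain codes demo!"
restored, symbols_used = roundtrip(data)
print(restored == data, symbols_used)
```

The overhead is the gap between `symbols_used` and the number of source chunks. Note that this only handles symbols that go missing entirely; surviving re-encoding inside a frame is a separate problem.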

brandonli28 3 hours ago

Hey there, Brandon here (developer). I've uploaded an explanation video here, which might be useful to watch :D

https://youtu.be/l03Os5uwWmk?si=nJDwz4s7_E4WFOwC

Jaxan 12 hours ago

Yes it is inefficient. But youtube pays the storage ;-). (There is probably a limit on free accounts, and it is probably not allowed by the TOS.)

genidoi 11 hours ago

Right, you just pay daily in worrying when, not if, youtube will terminate your account and delete your "videos".

madmads 11 hours ago

sdenton4 6 hours ago

Yeah, I would assume that transcodes kill this eventually...

zokier 12 hours ago

Also, how to get your google account banned for abuse.

newqer 12 hours ago

Just make sure you have a bot network storing the information with multiple accounts. Also with enough parity bits (e.g. PAR2) to recover broken vids or removed accounts.

compsciphd 11 hours ago

par2 is very limited.

It only supports 32k parts in total (in practice that means 16k parts of source and 16k parts of parity).

Let's take 100GB of data (relatively large, but within the realm of reason for what someone might want to protect). That means each part will be ~6MB in size. But you're thinking you also created 100GB of parity data (6MB*16384 parity parts) so you're well protected. You're wrong.

Now let's say one has 20,000 random bit errors over that 100GB. Not a lot of errors, but guess what, par will not be able to protect you (assuming those 20,000 errors are spread over > 16384 blocks it precalculated in the source). So at the simplest level, 20KB of errors can be unrecoverable.
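
A rough sketch of that worst case, using only the figures in this comment (a 32k part limit split evenly between source and parity, 100GB of data). It assumes that a part containing even one flipped bit is discarded as a whole and that each parity part can stand in for exactly one damaged part. The names are invented for illustration and this does not call any real par2 tool.

```python
# Figures taken from the comment above; not checked against the PAR2 spec.
TOTAL_PARTS = 32768
SOURCE_PARTS = 16384
PARITY_PARTS = TOTAL_PARTS - SOURCE_PARTS
DATA_BYTES = 100 * 10**9

part_bytes = DATA_BYTES / SOURCE_PARTS  # roughly 6.1 million bytes per part


def recoverable(damaged_parts):
    """Assumed rule: recovery needs at least SOURCE_PARTS intact parts
    out of all source and parity parts combined."""
    return TOTAL_PARTS - damaged_parts >= SOURCE_PARTS


# Smallest number of single-bit errors that can defeat the parity,
# if every error lands in a different part (source or parity).
min_fatal_errors = TOTAL_PARTS - SOURCE_PARTS + 1

print(f"part size: {part_bytes / 1e6:.1f} MB")
print(f"worst case: {min_fatal_errors} flipped bits are enough")
```

This is the adversarial placement, where every error lands in a different part. Errors that cluster in the same parts cost far less parity.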

par2 was created for usenet when a) the size of binaries being posted wasn't so large b) the size of article parts being posted wasn't so large c) the error model they were trying to protect was whole articles not coming through or equivalently having errors. In the olden days of usenet binary posting you would see many "part repost requests", that basically disappeared with par (then quickly par2) introduction. It fails badly with many other error models.

e145bc455f1 11 hours ago

catlikesshrimp 6 hours ago

wellf 11 hours ago

Or.... backblaze B2

metroholografix 35 minutes ago

willis936 10 hours ago

encom 9 hours ago

ranger_danger 2 hours ago

There are already channels with millions of AI-generated videos on them.

pcthrowaway an hour ago

Brilliant, but I hope it doesn't hasten Youtube's use of AI to "enhance" videos automatically: https://news.ycombinator.com/item?id=46169554

polotics 12 hours ago

Wot no steganography? Come on pretty please with an invisible cherry on top! :-) Here to get you started: https://link.springer.com/article/10.1007/s11042-023-14844-w

zahlman 3 hours ago

That's harder to sneak through video compression artifacts.

blackhaz 12 hours ago

Has anyone got an example of what such a video looks like? Really curious. Reminds me of the Soviet ArVid card that could store 2 GB on an E-180 VHS tape.

https://en.wikipedia.org/wiki/ArVid

equinumerous 5 hours ago

Mostly just noise. This is an example data video from the creator: https://www.youtube.com/watch?v=tIRXaQWjiA8

(YouTube video for this project: https://www.youtube.com/watch?v=l03Os5uwWmk)

esskay 2 hours ago

I imagine something like Reddit might make for better storage than this. It'd be pretty trivial to set up a few accounts with private subs too, to just store encrypted text-based data. Not fast or anything but surely easier to work with.

xnx 10 hours ago

An idea as old as YouTube. Here's one implementation: https://github.com/therealOri/qStore

predkambrij 9 hours ago

brandonli28 3 hours ago

Hey there, Brandon here (developer). I've uploaded an explanation video here for anyone that's interested, which might be useful to watch :D

https://youtu.be/l03Os5uwWmk?si=nJDwz4s7_E4WFOwC

madduci 13 hours ago

Love this project, although I would never personally trust YT as Storage, since they can delete your channel/files whenever they want

rzzzt 11 hours ago

Upload to other video sharing sites for redundancy. RAIVS!

iberator 10 hours ago

Stop ruining the internet and exploiting free resources

rzzzt 9 hours ago

qwertox 12 hours ago

The explainer video on the page [0] is a pretty nice explanation for people who don't really know what video compression is about.

[0] https://www.youtube.com/watch?v=l03Os5uwWmk

KellyCriterion 8 hours ago

I can remember the years when YouTube was used by content distributors who uploaded high-quality material protected with a password :-D

shevy-java 7 hours ago

Interesting idea. But I actually think we need to overcome Google. Google has become such a huge problem in so many domains. There need to be laws for the people; Google controls way too much now. YouTube should become a standalone company.

ranger_danger 12 hours ago

Other examples of so-called "parasitic storage": https://dpaste.com/DREQLAJ2V.txt

nunobrito 6 hours ago

What kind of storage level can be expected from this method for 10 minutes of video?

nubinetwork 9 hours ago

How do you manage to get youtube to not re-encode the video, trashing the data?

neals 9 hours ago

Flashing a bunch of qr codes should do it
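
A minimal sketch of why coarse, QR-like patterns can survive re-encoding: each bit is drawn as a large black or white block, and the decoder averages the block and thresholds it, so moderate per-pixel distortion cancels out. The block size, frame layout, and the additive-noise stand-in for a real video codec are all assumptions; real transcoding artifacts are more structured than random noise.

```python
import random

BLOCK = 8  # assumed edge length, in pixels, of the square that carries one bit


def bits_from_bytes(data):
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bytes_from_bits(bits):
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[p:p + 8]))
        for p in range(0, len(bits), 8)
    )


def encode_frame(data, blocks_per_row):
    """Draw each bit as a BLOCK x BLOCK square of 0 (black) or 255 (white)."""
    bits = bits_from_bytes(data)
    bits += [0] * (-len(bits) % blocks_per_row)  # pad the last row of blocks
    frame = []
    for start in range(0, len(bits), blocks_per_row):
        row_bits = bits[start:start + blocks_per_row]
        pixel_row = [255 * bit for bit in row_bits for _ in range(BLOCK)]
        frame.extend(list(pixel_row) for _ in range(BLOCK))
    return frame


def degrade(frame, amplitude=60, seed=0):
    """Crude stand-in for lossy compression: bounded random noise per pixel."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in row]
        for row in frame
    ]


def decode_frame(frame, n_bytes):
    """Average every block and threshold at mid-gray to get the bit back."""
    bits = []
    for top in range(0, len(frame), BLOCK):
        for left in range(0, len(frame[0]), BLOCK):
            total = sum(
                frame[y][x]
                for y in range(top, top + BLOCK)
                for x in range(left, left + BLOCK)
            )
            bits.append(1 if total / (BLOCK * BLOCK) > 127.5 else 0)
    return bytes_from_bits(bits[:n_bytes * 8])


message = b"hello"
frame = encode_frame(message, blocks_per_row=16)
print(decode_frame(degrade(frame), len(message)) == message)
```

The trade-off is density: one bit per 64 pixels is very wasteful, which is the inefficiency other commenters point out.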

the_dude_ 12 hours ago

Reminds me of GmailFS: https://en.wikipedia.org/wiki/GMail_Drive

Very interesting project explanation video on YouTube.

j45 6 hours ago

This is a digital version of a cassette tape to load and save data, love it!

https://www.tapeheads.net/threads/storing-data-on-your-analo...

andrewstuart 12 hours ago

How does it survive YouTube transcoding?

finalhacker 12 hours ago

After compression, all data is lost.

sneak 13 hours ago

Something at this link crashes both MobileSafari and iOS Firefox on my device.

Hamuko 12 hours ago

The GitHub link? Works fine in Safari on my M4 iPad Pro.

sneak 3 hours ago

Yup. Even after a device reboot at that time, too. Still doing it a half day later. Odd.