Node.js needs a virtual file system (blog.platformatic.dev)
136 points by voctor 5 hours ago
indutny 4 hours ago
Setting aside the question of whether this would be a useful addition to Node.js core, it must be noted that this 19k LoC PR was mostly generated by Claude Code and manually reviewed by the submitter, which in my opinion is against the spirit of the project and directly violates the terms of the Developer's Certificate of Origin set out in the project's CONTRIBUTING.md.
syrusakbary an hour ago
Fully disagree with this take. Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.
As an aside, the OpenJS executive director mentioned it's ok to use AI assistance on Node.js contributions [1]:
> I checked with legal and the foundation is fine with the DCO on AI-assisted contributions. We’ll work on getting this documented.
[1]: https://github.com/nodejs/node/pull/61478#issuecomment-40772...
indutny 39 minutes ago
I appreciate hearing your point of view on this. In my opinion the future of Open Source and AI assisted coding is a much bigger issue, and different people have different levels of confidence in both positive and negative outcomes of LLM impact on our industry.
It is great to have a legal perspective on compliance of LLM generated code with DCO terms, and I feel safer knowing that at least it doesn't expose Node.js to legal risk. However it doesn't address the well known unresolved ethical concerns over the sourcing of the code produced by LLM tooling.
szmarczak 16 minutes ago
> Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.
It's not an AI issue. Node.js itself is lots of legacy code and many projects depend on that code. When Deno and Bun were in early development, AI wasn't involved.
Yes, you can speed up the development a bit but it will never reach the quality of newer runtimes.
It's like comparing C to C++. Those languages are from different eras (relative to each other).
mixologic 3 hours ago
Worth noting that mcollina is a member of the Node.js Technical Steering Committee
everlier 2 hours ago
We call it a slip slop at work, it's ok to slip some slop if it's "our" slop :-)
digikata 3 hours ago
Large PRs could follow the practices that the Linux kernel dev lists follow. Sometimes large subsystem changes could be carried separately for a while by the submitter for testing and maintenance before being accepted in theory, reviewed, and if ready, then merged.
While the large code changes were maintained, they were often split up into a set of semantically meaningful commits for purposes of review and maintenance.
With AI blowing up the line counts on PRs, it's a skill set that more developers need to mature. It's good for their own review to take the mass changes, ask themselves how they would want to systematically review them in parts, then split the PR up into meaningful commits: e.g. interfaces, docs, subsets of changed implementations, etc.
dakiol 2 hours ago
Nobody wants to review AI-generated code (unless we are paid for doing so). Open source is fun, that's why people do it for free... adding AI to the mix is just insulting to some, and boring to others.
Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write? I couldn't care less if it improves (best case scenario) my open source codebase, I simply don't enjoy the imbalance.
goalieca 3 hours ago
> With AI blowing up the line counts on PRs,
Well, the process you’re describing is mature and intentionally slows things down. The LLM push has almost the opposite philosophy. Everyone talks about going faster and no one believes it is about higher quality.
digikata 3 hours ago
tracker1 an hour ago
dotancohen 2 hours ago
epolanski 4 hours ago
Do as I say, not as I do.
On a more serious note, I think that this will be thoroughly reviewed before it gets merged, and Node has an entire security team that oversees these.
indutny 3 hours ago
As someone who was a part of the aforementioned security team, I'm not sure I'd be interested in reviewing such a volume of machine-generated code, expecting a trap around every corner. The implicit assumption that I have observed in many OSS projects I've been involved with is that first-time contributions are rarely accepted if they are too large in volume, and the "core contributor" designation exists to signal "I put effort into this code, stand by it, and respect everyone's time in reviewing it". The PR in the post violates this social contract.
epolanski 3 hours ago
athorax 3 hours ago
How exactly does it violate the Developer's Certificate of Origin clause?
indutny 3 hours ago
The submitted code must satisfy one of clauses (a), (b), or (c), and separately clause (d), of: https://github.com/nodejs/node/blob/main/CONTRIBUTING.md#dev...
If the submitter picks (a), they assert that they wrote the code themselves and have the right to submit it under the project's license. If (b), the code was taken from another place with clear license terms compatible with the project's license. If (c), the contribution was written by someone else who asserted (a) or (b) and is submitted without changes.
Since LLM-generated output is based on public code but lacks the attribution and license of the original, it is not possible to pick (b). (a) and (c) cannot be picked, based on the submitter's disclaimer in the PR body.
athorax an hour ago
Dylan16807 14 minutes ago
benatkin 24 minutes ago
charcircuit 3 hours ago
wccrawford 5 hours ago
I'm not convinced that allowing Node to import "code generated at runtime" is actually a good thing. I think it should have to go through the hoops to get loaded, for security reasons.
I like the idea of it mocking the file system for tests, but I feel like that should probably be part of the test suite, not Node.
The example towards the end that stores data in a sqlite provider and then saves it as a JSON file is mind-boggling to me. Especially for a system that's supposed to be about not saving to the disk. Perhaps it's just a bad example, but I'm really trying to figure out how this isn't just adding complexity.
Normal_gaussian 3 hours ago
node -e "new Function('console.log(\"hi\")')()"
or more to the point
node -e "fetch('https://unpkg.com/cowsay/build/cowsay.umd.js').then((r) => r.text()).then(c => new Function(c + 'console.log(exports.say({ text: \"like this\"}))')())"
that one is particularly bad, because umd messes with the global object - so this works
node -e "fetch('https://unpkg.com/cowsay/build/cowsay.umd.js').then((r) => r.text()).then(c => new Function(c)()).then(() => console.log(exports.say({ text: 'oh no'})))"
phendrenad2 2 hours ago
Well there you have it.
I had to laugh, because the post you're replying to STRONGLY reminds me of this story, https://news.ycombinator.com/item?id=31778490 , in which some people on the GNOME project objected to thumbnails in the file-open dialog box because it might be a "Security issue" (even though thumbnails were available in the normal file browser, something those commenters probably should have known about, but didn't, but they just had to chime in anyway).
TheRealPomax 4 hours ago
But then you go "hang on, doesn't ESM exist?" and you realize that argument 4 isn't even true. You can literally do what this argument says you can't, by creating a blob instead of "writing a temp file" and then importing that using the same dynamic import we've had available since <checks his watch> 2020.
dfabulich 3 hours ago
A virtual filesystem makes it possible for the ESM you import to statically import other files in the virtual filesystem, which isn't possible by just dynamically importing a blob. Anything your blob module imports has to be updated to dynamically import its dependencies via blobs.
notnullorvoid 4 hours ago
There's also a module expression proposal, that would remove the need to use blob imports.
lacoolj 2 hours ago
Using Claude for code you use yourself or at your own company internally is one thing, but when you start injecting it into widely-shared projects like this (or, the linux kernel, or Debian, etc) there will always be a lingering feeling of the project being tainted.
Just my opinion, probably not a popular one. But I will be avoiding an upgrade to Node.js after 24.14 for a while if this is becoming an acceptable precedent.
PaulHoule 5 hours ago
Would be nice if node packages could be packed up in ZIP files so to avoid the security/metadata tax for small file access on Windows.
MarleTangible 5 hours ago
The number of files in the node_modules folder is crazy; any amount of organization that can tame that chaos is welcome.
koolba 4 hours ago
And if you thought malware hiding in a mess of files was bad, just wait till you see it in two layers of container files.
PaulHoule 4 hours ago
Dangeranger 4 hours ago
There are alternative package managers like Yarn that use zip files as a way to store each Node package.[0]
chrisweekly 3 hours ago
Strong recommendation to use PNPM instead of yarn or npm. IME (webdev since 1998) it's the only sane tool for stewardship of an npm dependency graph.
See https://pnpm.io/motivation
Also, while popularity isn't necessarily a great indicator of quality, a quick comparison shows that the community has decided on pnpm:
Normal_gaussian 2 hours ago
PaulHoule 3 hours ago
... and of course JAR files in Java are just ZIP files with a little extra metadata and the JVM can unpack them in realtime just fine.
buttsack an hour ago
When npm decided to have per-project node_modules (rather than shared like ruby and others) and human readable configs and library files I think the goal was to be developer friendly and highly configurable, which it is. And package.json became a lot more than that as a result, it’s been a great system IMO.
Combined with a hackable IDE like Atom (Pulsar) made with the same tech it’s a pretty great dev exp for web devs
fmorel 5 hours ago
I remember when Firefox started putting everything into jars for similar reasons.
https://web.archive.org/web/20161003115800/https://blog.mozi...
zadikian 3 hours ago
Would accessing deps directly from a zip really be faster? I'd be a little surprised but not terribly, given that it's readonly on an fs designed for RW. If not, maybe just tar?
pie_flavor 6 minutes ago
You just cat the exe with the zip file, then it is all loaded into memory at the same time on process init. This is how e.g. LÖVE does game code packaging. (It can't be tar, because this trick only works because the PKZIP descriptor is at the end of the file.)
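The reason the append trick works is that a ZIP reader locates the archive from the back: it looks for the end-of-central-directory (EOCD) record, whose signature is the bytes `PK\x05\x06`, within the last 22 bytes plus an optional comment of up to 65535 bytes. A minimal sketch of that backward scan follows; the function name and the fake "executable" are made up for illustration, and this is not how LÖVE or any particular tool implements it.

```javascript
// EOCD signature as a little-endian uint32 (bytes on disk: 50 4B 05 06).
const EOCD_SIGNATURE = 0x06054b50;
const EOCD_MIN_SIZE = 22;    // fixed-size part of the EOCD record
const MAX_COMMENT = 0xffff;  // the trailing archive comment is at most 65535 bytes

// Hypothetical helper: returns the offset of the EOCD record, or -1 if none is found.
function findEndOfCentralDirectory(buf) {
  const lowest = Math.max(0, buf.length - EOCD_MIN_SIZE - MAX_COMMENT);
  // Scan backwards, because anything may precede the archive (e.g. an executable).
  for (let i = buf.length - EOCD_MIN_SIZE; i >= lowest; i--) {
    if (buf.readUInt32LE(i) === EOCD_SIGNATURE) return i;
  }
  return -1;
}

// Demo: a stand-in "executable" followed by an empty ZIP archive,
// which consists of nothing but a 22-byte EOCD record.
const fakeExe = Buffer.from('pretend this is a native executable');
const emptyZip = Buffer.alloc(EOCD_MIN_SIZE);
emptyZip.writeUInt32LE(EOCD_SIGNATURE, 0);
const combined = Buffer.concat([fakeExe, emptyZip]);

console.log(findEndOfCentralDirectory(combined) === fakeExe.length);
```

A tar archive has no such trailer: its headers are read from the start of the file, which is why the same concatenation trick does not carry over.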
pverheggen 2 hours ago
You can always use virtualized Linux to avoid the NTFS penalty (WSL2, VS Code dev containers, etc.)
hrmtst93837 2 hours ago
Moving your whole workflow into WSL or nested containers just to dodge NTFS is a band-aid. Then you get flaky file watchers, odd perms, and a dev setup that feels like a workaround piled on top of another workaround. A fast Node VFS would remove a lot of this nonsense.
pverheggen an hour ago
MBCook 4 hours ago
It’s insane to me that node works how it does. Zip files make so much more sense, I really liked that about Yarn.
sheept 4 hours ago
Would it work to run a bundler over your code, so all (static) imports are inlined and tree shaken?
butz 21 minutes ago
How about trying to reduce dependencies? 11ty is going in the correct direction, dropping a significant chunk of various dependencies, replacing them with packages that have no dependencies, or using platform features as they become readily available.
mg 4 hours ago
> You can’t import or require() a module that only exists in memory.
You can convert it into a data url and import that, can't you?
afavour 3 hours ago
What happens to relative imports?
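A rough sketch of both halves of this exchange, assuming Node's ESM loader: `import()` accepts `data:` URLs with a `text/javascript` MIME type, but a `data:` module has no directory, so a relative specifier such as `./dep.js` has nothing to resolve against. The workaround shown here is to embed an absolute specifier (another `data:` URL) in the source text. The helper names `toDataUrl` and `importFromString` are made up for this example.

```javascript
// Hypothetical helper: encode module source as a data: URL.
// Buffer is a Node.js global, so no import is needed.
function toDataUrl(source) {
  return 'data:text/javascript;base64,' + Buffer.from(source).toString('base64');
}

// Hypothetical helper: import an ES module that exists only as a string.
async function importFromString(source) {
  return import(toDataUrl(source));
}

async function demo() {
  // The dependency also exists only in memory.
  const depUrl = toDataUrl('export const greeting = "hello from memory";');

  // "./dep.js" would not resolve from a data: module, so the dependency's
  // full data: URL is spliced into the import statement instead.
  const main = await importFromString(
    `import { greeting } from ${JSON.stringify(depUrl)};
     export default greeting.toUpperCase();`
  );
  console.log(main.default);
}

demo();
```

Every module in the graph has to be rewritten this way before it can be loaded, which is the bookkeeping a virtual file system with ordinary path resolution would make unnecessary.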
doctorpangloss 4 hours ago
Yeah but Claude didn't suggest that when it wrote this blog post and did all the work so...
syrusakbary an hour ago
Funnily enough, we just released Edge.js, which uses Wasmer under the hood for sandboxing Node.js apps.
With it, you have a virtual fs automatically, just by using the `node:fs` package (or any other filesystem calls!)
We wrote about this in depth here: https://wasmer.io/posts/edgejs-safe-nodejs-using-wasm-sandbo...
szmarczak 9 minutes ago
HN comments aren't a place to advertise your product.
austin-cheney 5 hours ago
Most of the 4 justifications mentioned sound like mitigations of otherwise bad design decisions. JavaScript in the browser went down this path for the longest time where new standards were introduced only to solve for stupid people instead of actually introducing new capabilities that were otherwise unachievable.
I do see some original benefits to a VFS though, bad application decisions aside, but they are exceedingly minor.
As an aside, I think JavaScript would benefit from an in-memory database. This would be more of a language enhancement than a Node.js enhancement. Imagine the extended application capabilities of an object/array store native to the language that takes queries using JS logic to return one or more objects/records. No SQL language and no third-party databases for stuff that you don't want to keep in offline storage on a disk.
iainmerrick 3 hours ago
Why would you want a language enhancement for that, rather than just writing it in JS code? (or perhaps WASM)
dotancohen 2 hours ago
> I think JavaScript would benefit from an in-memory database.
That database would probably look a lot like a JSON object. What are you suggesting, that a global JSON object does not solve?
austin-cheney 2 hours ago
Whether it is an object, array, something else, or a combination thereof is a design decision. It is not so much about the design of the structure, which should be determined by execution performance considerations, but how information is added, removed and retrieved. Gathering one or more records from a JSON object, or array index, by value of some child property somewhere in a descendant structure of the instance index always feels like a one-off based upon the shape of the data. That could just be a query which is more elegant to read and yet still achieves superior execution performance compared to a bunch of nested loops or string of function array methods.
The more structures you have in a given application and the larger those structures become in their schemas the more valuable a uniform storage and retrieval solution becomes.
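To make the idea concrete, here is a minimal userland sketch of a store that takes queries written as plain JS predicates rather than hand-written nested loops. The `MemoryStore` class and its method names are invented for this example; it does a linear scan, so it illustrates only the uniform query interface and not the execution performance a native, indexed implementation might offer.

```javascript
// Hypothetical in-memory record store queried with ordinary JS functions.
class MemoryStore {
  constructor() {
    this.records = [];
  }

  // Add a record; returns the store so inserts can be chained.
  insert(record) {
    this.records.push(record);
    return this;
  }

  // Return every record for which the predicate is truthy.
  find(predicate) {
    return this.records.filter(predicate);
  }

  // Delete matching records and report how many were removed.
  remove(predicate) {
    const before = this.records.length;
    this.records = this.records.filter((record) => !predicate(record));
    return before - this.records.length;
  }
}

const store = new MemoryStore();
store
  .insert({ id: 1, user: { name: 'ada', roles: ['admin'] } })
  .insert({ id: 2, user: { name: 'bob', roles: ['dev'] } })
  .insert({ id: 3, user: { name: 'cy', roles: ['dev', 'admin'] } });

// Query by a value nested in a descendant structure, without shape-specific loops.
const admins = store.find((record) => record.user.roles.includes('admin'));
console.log(admins.map((record) => record.id)); // ids 1 and 3
```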
curtisblaine an hour ago
sorted maps with log(n) access.
duped 2 hours ago
> As an aside I think JavaScript would benefit from an in-memory database.
isn't that just global state, or do you mean you want that to be persistent?
gnarbarian 25 minutes ago
one of the reasons I prefer deno is the availability of indexeddb (and all the other great stuff that comes with it out of the box)
sidewndr46 an hour ago
Don't all projects eventually grow to encompass service discovery?
torginus an hour ago
Why do people keep reinventing OS features?
There's Docker, OverlayFS, FUSE, ZFS or Btrfs snapshots?
Do you not trust your OS to do this correctly, or do you think you can do better?
A lot of this stuff existed 5, 10, 15 years ago...
Somehow there's been a trend for every effing program to grow and absorb the features and responsibilities of every other program.
Actually, I have a brilliant idea, what if we used nodejs, and added html display capabilities, and browser features? After all Cursor has already proven you can vibecode a browser, why not just do it?
I'm just tired at this point
williamstein an hour ago
This exact thing solves a huge problem with SEA binaries as he points out in his post. You can include complicated assets easily and skip an ugly unpack step entirely. This is very useful.
ryandrake an hour ago
One of the worst is media players that all insist on grafting their own "library" on top of my already-working OS filesystem. So I can't just run the media player and play files. No, that would be too simple. I have to first "import" my media into a "library" abstraction and then store that library somewhere else on my filesystem. Terrible!
mohsen1 3 hours ago
Yarn, pnpm, webpack all have solutions for this. Great to see this becoming a standard. I have a project that is severely handicapped due to FS. Running 13k tests takes 40 minutes, whereas a virtual file system that Node just worked with would cut the run time to 3 minutes. I experimented with some hacks and decided to stay with the slow but native FS solution.
What I really want is a way of swapping FS with VFS in a Node.js program harness. Something like
node --use-vfs --vfs-cache=BIG_JSON_FILE
So basically Node never touches the disk and loads everything from memory.
Normal_gaussian 3 hours ago
The way to do this today is to do it outside of node. Using an overlay fs with the overlay being a ramfs. You can even chroot into it if you can't scope the paths you need to be just downstream from some directory. Or, just use docker.
mohsen1 3 hours ago
making that work cross platform is pure pain
Normal_gaussian 3 hours ago
Normal_gaussian 4 hours ago
yarn pnp is currently broken on Node v25.7+;
- https://github.com/yarnpkg/berry/issues/7065
- https://github.com/nodejs/node/issues/62012
This is because yarn patches fs in order to introduce virtual file path resolution of modules in the yarn cache (which are zips), which is quite brittle and was broken by a seemingly unrelated change in 25.7.
The discussion in issue 62012 is notable - it was suggested yarn just wait for vfs to land. This is interesting to me in two ways: firstly, the node team seems quite happy for non-trivial amounts of the ecosystem to just be broken, and suggests relying on what I'm assuming will be an experimental API when it does land; secondly, it implies a lot of confidence that this feature will land before LTS.
chrisweekly 3 hours ago
Strong rec to choose PNPM over yarn. I just posted this in a peer comment: https://news.ycombinator.com/item?id=47415173
Not spamming, not affiliated, just trying to help others avoid so much needless suffering.
Normal_gaussian 2 hours ago
This is quite spammy; you could mitigate it by explaining what you think the "needless suffering" is. Having used npm, pnpm, and yarn for many years, the only benefit I find with pnpm is a little bit of speed when using the cli, but not enough that I notice; I've outlined the major yarn benefit to me 'in a peer comment' (which I didn't realise was you when I answered) https://news.ycombinator.com/item?id=47415660
I expect yarn to have a real competitor sooner rather than later that will replace it; and I do wonder if it is this vfs module that will enable it.
zadikian 3 hours ago
I just use npm because I like to stay as vanilla as possible. Glad that alternatives exist though.
Normal_gaussian 2 hours ago
notnullorvoid 4 hours ago
I could see something like this being useful if it could be passed to workers to replace any fs access inside the worker.
gwbas1c 2 hours ago
Can you dynamically load code via eval?
(I know, I know, it's ugly and has its own set of problems)
ozlikethewizard 4 hours ago
I'm not convinced this needs to be in core Node, but being able to have serverless functions access a file system without providing storage would definitely have some use cases. Had some fun with video processing recently that this would be perfect for.
adzm 3 hours ago
How does electron do this with its packaged files? I suppose it does not work with module resolution?
verdverm an hour ago
Separate from the valid critiques in other comments, Go's io.FS interface is really nice for making these sorts of things. Is there something like this in Node already? (with base implementations like host and in memory)
themafia an hour ago
> You can’t import or require() a module that only exists in memory.
Sure you can. Function() exists and require.cache exists. This is _intentionally_ exploitable.
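A minimal sketch of the `Function()` route, under stated assumptions: the sources live in a plain `Map`, and `requireFromMemory` is a made-up stand-in that mimics the CommonJS wrapper and module cache. It does not touch Node's real `require.cache`, which is keyed by resolved filename, and it runs the code with full privileges, so it is only appropriate for trusted source text.

```javascript
// Hypothetical in-memory "files": module name -> CommonJS source text.
const sources = new Map([
  ['greet', 'module.exports = (name) => `hello, ${name}`;'],
  ['main', 'const greet = require("greet"); exports.run = () => greet("vfs");'],
]);

// Stand-in for a module cache: name -> module object.
const cache = new Map();

function requireFromMemory(name) {
  if (cache.has(name)) return cache.get(name).exports;

  const source = sources.get(name);
  if (source === undefined) {
    throw new Error(`Cannot find in-memory module '${name}'`);
  }

  const module = { exports: {} };
  // Cache before executing so that circular requires terminate.
  cache.set(name, module);

  // Same idea as the wrapper Node puts around CommonJS files,
  // minus __filename and __dirname, which have no meaning here.
  const wrapper = new Function('module', 'exports', 'require', source);
  wrapper(module, module.exports, requireFromMemory);
  return module.exports;
}

console.log(requireFromMemory('main').run()); // prints "hello, vfs"
```

The limitation is the one raised elsewhere in the thread: modules loaded this way only see each other through the custom `require`, so third-party code that calls the real `require()` or `fs` is unaware of them.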
bronlund 2 hours ago
Yeah. That’s what we need. More Node.
westurner 4 hours ago
Is node::vfs the new solution for JupyterLite filesystems?
From https://github.com/jupyterlite/jupyterlite/issues/949#issuec... :
> Ideally, the virtual filesystem of JupyterLite would be shared with the one from the virtual terminal.
emscripten-core/emscripten > "New File System Implementation": https://github.com/emscripten-core/emscripten/issues/15041#i... :
> [ BrowserFS, isomorphic-git/lightningfs, ]
pyodide/pyodide: "Native file system API" #738: https://github.com/pyodide/pyodide/issues/738 re: [Chrome,] Filesystem API :
> jupyterlab-git [should work with the same VFS as Jupyter kernels and Terminals]
pyodide/pyodide: "ENH Add API for mounting native file system" #2987: https://github.com/pyodide/pyodide/pull/2987
moralestapia 5 hours ago
> Let me be honest: a PR that size would normally take months of full-time work. This one happened because I built it with Claude Code.
The Node.js codebase and standard library have a very high standard of quality; I hope that doesn't get washed out by sloppy AI-generated code.
OTOH, Matteo is an excellent engineer and the community owes a lot to him. So I guess the code is solid :).
petcat 5 hours ago
Are people still building new projects on Node.js? I would have thought the ecosystem was moving to deno or bun now
dzogchen 5 hours ago
I don't really understand what the value proposition of Bun and Deno is. And I see huge problems with their governance and long-term sustainability.
Node.js on the other hand is not owned or controlled by one entity. It is not beholden to the whims of investors or a large corporation. I have contributed to Node.js in the past and I was really impressed by its rock-solid governance model and processes. I think this is an under-appreciated feature when evaluating tech options.
packetlost 5 hours ago
Deno has some pretty nice unique features like sandboxing that, afaik, don't exist in other runtimes (yet). It's enough of a draw that it's the recommended runtime for projects like yt-dlp: https://github.com/yt-dlp/yt-dlp/issues/14404
worksonmine 4 hours ago
zamadatix 4 hours ago
If one gets nothing from them directly, they've at least been a good kick to get several features into Node. It's almost like neovim was to vim, perhaps to a lesser extent.
zadikian 3 hours ago
Note that Bun was recently acquired by Anthropic.
gavmor 3 hours ago
Faster, no transpilation, dev-ex sugar.
pier25 3 hours ago
I agree about the governance and long-term sustainability points, but if you don't see any value in Bun or Deno it is probably because (no offense) you are not paying attention.
jitl 5 hours ago
loud people on twitter are always switching to the new hotness. i personally can't see myself using bun until its reputation for segfaults goes away after a few more years of stabilizing. deno seems neat and has been around for longer, but its node compatibility story is still evolving; i'm also giving it another year before i try it.
_flux 4 hours ago
Wow, I thought you were exaggerating, but no: https://github.com/oven-sh/bun/issues?q=is%3Aissue%20state%3...
Open 80, closed 492.
petcat 2 hours ago
zadikian 3 hours ago
Yes people are using Node.js, most likely the majority.
rrr_oh_man 5 hours ago
Why?
kitsune1 5 hours ago
The delusion in this comment is insane.
pier25 3 hours ago
The Node team has lost the plot IMO.
By far the most critical issue is the overreliance on third-party NPM packages for even fundamental needs like connecting to a database.
afavour 3 hours ago
What would a Node-native database connection layer look like? What other platforms have that?
Databases are third party tech, I don’t think it’s unreasonable to use a third party NPM module to connect to them.
mike_hearn 3 hours ago
Most obviously, Java has JDBC. I think .NET has an equivalent. Drivers are needed but they're often first party, coming directly from the DB vendor itself.
Java also has a JIT compiling JS engine that can be sandboxed and given a VFS:
https://www.graalvm.org/latest/security-guide/sandboxing/
N.B. there's a NodeJS compatible mode, but you can't use VFS+sandboxing and NodeJS compatibility together because the NodeJS mode actually uses the real NodeJS codebase, just swapping out V8. For combining it all together you'd want something like https://elide.dev which reimplemented some of the Node APIs on top of the JVM, so it's sandboxable and virtualizable.
LunaSea 3 hours ago
pier25 3 hours ago
Bun provides native MySQL, SQLite, and Postgres drivers.
I'm not saying Node should support every db in existence but the ones I listed are critical infrastructure at this point.
When using Postgres in Node you either rely on the old pg which pulls 13 dependencies[1] or postgres[2] which is much better and has zero deps but mostly depends on a single guy.
adzm 3 hours ago
ksherlock 3 hours ago
Perl has DBI. PHP has PDO.
Spivak 3 hours ago
beart 3 hours ago
Outside of sqlite, what runtimes natively include database drivers?
pier25 3 hours ago
Bun, .NET, PHP, Java