The Prompt API (developer.chrome.com)
156 points by gslin 10 hours ago
haberman 7 hours ago
This API seems perfect for an idea I've had for a while: a de-snarkifier for social media.
Social media can be intellectually stimulating and educational, but it's also easy to get sucked into ideological sniping and flamewars, even if you didn't go looking for it. The emotional and intellectual energy spent flaming strangers on the Internet is a complete waste of human capital.
With an API like this, I assume you could have a browser extension that could de-snarkify content before showing it to you. You could ask the LLM to preserve all factual content from the post, but to de-claw any aggressive or snarky language. If you really wanted to have fun, you could ask it to turn anything written in an aggressive tone into something that sounds absurd or incompetent, so that the more aggressive the post, the more it would make the author look silly.
This could have a double benefit. For the reader, it insulates them from the personal attacks of random strangers on the Internet. Don't get me wrong, there is a time and a place for real, charged arguments about important issues that affect us all. But there is little to be gained from having those fights with strangers; on the contrary, I think it poisons the body politic when strangers are screaming at each other.
For the writer, it takes away any incentive to be snarky or rude. If other people filter their content this way, there's no point in trying to be mean to them, and no "race to the bottom" for who can be more nasty.
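A minimal sketch of what such an extension's content script might look like with the Prompt API. `LanguageModel` is the global Chrome exposes in its origin trial; the prompt wording is invented for illustration:

```javascript
// Build the rewrite instruction; a pure function, so it can be tested
// independently of the browser API.
function buildDesnarkPrompt(postText) {
  return [
    "Rewrite the following social media post.",
    "Preserve every factual claim exactly.",
    "Remove snark, sarcasm, and personal attacks; keep a neutral tone.",
    "Post:",
    postText,
  ].join("\n");
}

// Extension-side sketch: only runs where the Prompt API exists.
async function desnarkify(postText) {
  if (typeof LanguageModel === "undefined") return postText; // API unavailable: show the original
  const session = await LanguageModel.create();
  return session.prompt(buildDesnarkPrompt(postText));
}
```

Keeping the prompt builder separate from the session call means the "preserve facts, drop snark" instructions can be iterated on without touching the API plumbing.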
whatarethembits 14 minutes ago
Kinda looking forward to something like this, as it has the potential to remove empty junk calories from the internet, hopefully leading to SIGNIFICANTLY less use of today's popular platforms.
My wish list:
- Eliminate ALL clickbait titles and ads. I only want to see a dry factual title.
- For any given topic, I only care about the main article (with the option to only see a summary, unless it's a high-quality blog) and a couple of substantive comments; the rest is junk I don't want to see.
The current state of popular social media sites means I don't use them at all (except HN, which is trending in the same direction due to saturation with AI), but every other week or so I end up wasting a few hours, which I'd like to avoid entirely.
Ideally this would lead to 98% of content being filtered/summarised out, and over time I'd only use the internet for looking things up with intention. I want this to remove the majority of the "entertainment" value from the internet (by default) so that time/energy can be refocused on real life and high-quality sources (books) only.
seanhunter 5 minutes ago
I actually have built myself a personal AI agent that does this for the main news headlines and for a summary of my personal email (sadly I can't run it on work email yet). It can extract any actions required from a mail and turn them into tasks. It also has a killer feature: a "sort out my email" button that archives all the emails it classifies as FYI, spam, mailing list, or moot (it has classifiers for this), first producing a one-page markdown summary of the whole lot in one shot, leaving only the emails marked "action required" or "urgent". Email summaries are deliberately dry and factual, with all advertising false urgency removed.
I can manually “hold” emails so they don’t go in the “sort out my email” woodchipper. It’s been life-changing.
nsilvestri 7 hours ago
This is the Soylent of written communication. Full nutritional value with an unremarkable flavor.
haberman 6 hours ago
That is unironically exactly what I want from social media.
I want the option to engage with the substance of new developments in the world, technology, etc. without the drama. I don't want to be drawn into the drama of strangers (who could, for all I know, just be bots or ragebaiting AIs).
If I want drama, there's plenty of it on TV, or I could talk to my friends about what is going on with people I actually know.
The anti-pattern, in my mind, is logging on to engage with substantive content and to be inadvertently drawn into flamewars with strangers.
jychang 6 hours ago
Are humans supposed to enjoy the "flavor" of diarrhea, as the result of giving every village idiot a microphone so they can spew shit from their mouths?
Sure, you might say this sort of thing is boiling flavor out of your food, but... boiling the bacteria out of what you consume isn't a bad thing.
encrux 5 hours ago
For YouTube, this already exists and I'm using it. The extension is called DeArrow and aims to reduce sensationalism via crowdsourcing, though I wouldn't be surprised if top contributors are bots using LLMs.
niek_pas 2 hours ago
Man, that before-after slider on the home page makes me so sad... YouTube used to just be random people sharing cool stuff, and those de-sensationalized titles really brought me back to that time for a second! Cool stuff.
sebzim4500 an hour ago
For people like me who tried it in the past and found it annoying: note that it now has a 'casual' mode where it only changes the truly useless titles and leaves reasonable ones alone.
netcan 6 hours ago
I think it's an interesting idea to explore.
But... It's the type of idea that is unpredictable as it comes into contact with reality. If it works, it probably works very differently from the initial idea of how it will work.
haberman 6 hours ago
I 100% agree with this. I am certain that I cannot foresee how this would play out in reality.
jychang 6 hours ago
Yeah, I 100% agree with the caution in this comment.
I see the merit in such a proposal. It's the linguistic equivalent to boiling the food you consume, instead of eating it raw with all the associated bad stuff.
The problem is, as you said, that this plan is unlikely to be as rosy as it's portrayed and probably has a lot of drawbacks in real life.
Interesting to think about and explore, though.
dotancohen 6 hours ago
Though I hate the idea of this, I can see it becoming popular in some use cases, such as schools with "safe places".
jurgenburgen 7 hours ago
On the other hand it would make all comments sound the same and further dilute internet content into average slop.
sidkhanooja 7 hours ago
on reflection, i would appreciate average slop more than the occasional heinous slop people say when they are opinionated..
altmanaltman 6 hours ago
Don't you think it's better to just curate your social media and follow communities where the default is not toxicity? This is basically a distortion layer for reality and will just encourage more echo chambers.
Also, what is toxic to one person is not toxic to another, depending on their subjective choices. How will you solve for this without everyone just seeing what they want to see, even if reality is not like that? I feel that will enhance the problems of social media rather than reduce them.
It kind of falls apart when you start to think of edge cases rather than with a "hey, this tool will keep morons off my feed!" mentality.
domenicd 5 hours ago
I led the design effort on this API, before retiring. Here's my writeup on some of the considerations that went into it: https://domenic.me/builtin-ai-api-design/
comboy 4 hours ago
How do you envision short term and long term target usage of it?
And do you guys communicate between other browsers when doing something like this to try to settle on something common? I don't mean W3C but practically, it's a small world after all.
domenicd 3 hours ago
I can't speak for "you guys" anymore, as I'm retired, but from my personal perspective/recollection:
The target usage for the prompt API is anything that would benefit from the general capabilities of a language model, and can't be encompassed by the more-specific APIs for summarization/writing/rewriting. Realistic use cases currently are things like sentiment analysis, keyword extraction, etc. I have a number of ideas on how to integrate it into my current retirement project around Japanese flashcards, e.g. generating example sentences. If the small (~10 GiB) model class keeps getting smarter, the class of things possible on-device in this way gets larger and larger over time.
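The sentiment-analysis/keyword-extraction use case can be sketched with the API's schema-constrained output (`responseConstraint`, per the Chrome documentation); the schema and prompt here are illustrative:

```javascript
// Illustrative JSON schema the model's answer must conform to.
const sentimentSchema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
    keywords: { type: "array", items: { type: "string" } },
  },
  required: ["sentiment", "keywords"],
};

async function analyze(text) {
  if (typeof LanguageModel === "undefined") return null; // not in a supporting browser
  const session = await LanguageModel.create();
  const raw = await session.prompt(
    `Classify the sentiment of this text and extract up to 5 keywords:\n${text}`,
    { responseConstraint: sentimentSchema },
  );
  return JSON.parse(raw); // constrained output is guaranteed to parse as JSON
}
```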
We definitely communicated with other browsers. There were the standing WebML Community Group meetings at the W3C every few weeks. There were async discussions like https://github.com/mozilla/standards-positions/issues/1213 and https://github.com/WebKit/standards-positions/issues/495 . (Side note, I love the contrast between Mozilla's helpful in-depth feedback and WebKit's... less helpful feedback.) There was also a bit of a debacle where the W3C Technical Architecture Group tried to give "feedback" but the feedback ended up being AI-generated slop... https://github.com/w3ctag/design-reviews/issues/1093 .
But overall, yeah, the goal with the prompt API, as with all web APIs, is to put something out there for discussion as early as possible, and get input from the broad community, especially including other browsers, to see if it's something that they are interested in collaborating on. https://www.chromium.org/blink/guidelines/web-platform-chang... (which I also wrote) goes into how the Chromium project thinks about such collaboration in general.
meander_water 3 hours ago
This looks like it uses Gemini Nano under the hood. But the latest Gemma4 E2B and E4B models appear to be much better, so you'd probably be better off deploying quantized versions through an extension for now.
- Gemini Nano-1: 46% MMLU, 1.8B
- Gemini Nano-2: 56% MMLU, 3.25B
- Gemma4 E2B: 60.0% MMLU, 2.3B
- Gemma4 E4B: 69.4% MMLU, 4.5B
Sources:
- https://huggingface.co/google/gemma-4-E2B-it
- https://android-developers.googleblog.com/2024/10/gemini-nan...
domenicd 3 hours ago
I no longer have any inside knowledge, but from my time on this team they were very quick about getting the latest small (Google) models into Chrome. I expect that if Gemma 4 (or its equivalent Gemini Nano) isn't already in Chrome, then it will be soon.
Note that the article here was last updated 2025-09-21, and as of that time it was already on Gemini Nano 3.
meander_water 2 hours ago
Thanks for the insider info! Do you know if there are any published benchmarks for Nano 3?
avaer 8 hours ago
It works, I've shipped this as a "local inference"/poor person's ollama for low-end llm tasks like search. The main win is that it's free and privacy preserving, and (mostly) transparent to users in that they don't have to do anything, which is great for giving non-technical users local inference without making them do scary native things.
But keep in mind the actual experience for users is not great; the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back. That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.
zozbot234 2 hours ago
> But keep in mind the actual experience for users is not great; the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back.
With MoE models, you could fetch expert layers from the network on demand by issuing HTTP range queries for the corresponding offset, similar to how bittorrent downloads file chunks from multiple hosts. You'd still have to download shared layers, but time to first token would now be proportional to active-size rather than total-size. Of course this wouldn't be totally "offline" inference anymore, but for a web browser feature that's not a key consideration.
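Mechanically, the per-expert fetch would be an ordinary HTTP range request. The offset table below is hypothetical; a real model file format would carry tensor offsets in its header:

```javascript
// Hypothetical offset table: byte ranges of each expert's weights in the model file.
const expertIndex = {
  "layer5.expert3": { offset: 1_234_000, length: 512_000 },
};

// Build the HTTP Range header value for one expert (inclusive byte range).
function rangeHeaderFor(name, index = expertIndex) {
  const { offset, length } = index[name];
  return `bytes=${offset}-${offset + length - 1}`;
}

// Sketch: fetch just that slice from a server that supports range requests.
async function fetchExpert(url, name) {
  const res = await fetch(url, { headers: { Range: rangeHeaderFor(name) } });
  if (res.status !== 206) throw new Error("server ignored the Range header");
  return new Uint8Array(await res.arrayBuffer());
}
```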
NitpickLawyer 2 hours ago
> With MoE models, you could fetch expert layers from the network on demand
This is a common misconception, probably due to the unfortunate naming. Expert layers are not "expert" at any particular subject, and active-size only refers to the activated layers per token. You'd still need all (or most of all) the layers for any particular query, even if some layers have a very low chance of being activated.
All in all, you'd be better off lazy-loading the entire model; at least then you'd know you have the capability to run inference from that point on.
Yokohiii 8 hours ago
> That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.
Maybe the next big thing will be premium software subscription offers with a bunch of 5090s as an extra.
paganel 4 hours ago
> operating systems start reliably shipping their own prebaked models
Here's to hoping that that dystopia will never happen.
subhobroto 8 hours ago
> It works, I've shipped this as a "local inference"/poor person's ollama for low-end llm tasks like search
fantastic!
> the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back
Sure, but does this mean the model is lazily downloaded? That is, if I used this and mine was the first call to the model, the user would be waiting until the model finished downloading at that point?
That sounds like a horrible user experience. Maybe Chrome reduces the confusion by showing a download status dialog or similar?
Also, any idea what the on-disk impact is?
avaer 7 hours ago
The model download is lazy and cached, so it's a one-time cost presumably across all origins (I assume so since the alternative would be a trivial DoS waiting to happen).
So it's once per browser, not once per site.
You can track the download state yourself and pop whatever UI you want.
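A hedged sketch of that download-state tracking, using the `availability()` and `monitor`/`downloadprogress` hooks from the Chrome docs (the callback wiring is illustrative):

```javascript
// Check availability, then create a session while surfacing download progress.
async function createWithProgress(onProgress) {
  if (typeof LanguageModel === "undefined") return null; // API not exposed here
  // One of: "unavailable" | "downloadable" | "downloading" | "available"
  const state = await LanguageModel.availability();
  if (state === "unavailable") return null;
  return LanguageModel.create({
    monitor(m) {
      // e.loaded reports the downloaded fraction (0..1) in recent Chrome builds.
      m.addEventListener("downloadprogress", (e) => onProgress(e.loaded));
    },
  });
}
```

The `onProgress` callback is where a site would pop its own progress UI while the one-time, per-browser download runs.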
tastroder 7 hours ago
chrome://on-device-internals reports "Model Name: v3Nano Version: 2025.06.30.1229 Folder size: 4,072.13 MiB" on a random Windows machine I just checked.
why_is_it_good 7 hours ago
> Storage: At least 22 GB of free space on the volume that contains your Chrome profile.
jfoster 5 hours ago
Doesn't sound great, but consider how much better this is than every webpage trying to load their own models.
If it turns out useful enough I'm sure browsers will just start including it as (perhaps optional?) part of installation.
tom1337 42 minutes ago
The idea of having local LLMs accessible in the browser for privacy reasons is nice, I guess, but when each browser has a different model attached to this API, testing becomes even more of a nightmare than it is now. I wonder if this will drive more users towards Chrome, because most usages of this API might just be tailored to fit the Gemini Nano model?
rock_artist 7 hours ago
I think it's a step into a future of proper Model API. But it's just a small step. It reminds me of Apple's Foundation Models [1]
While many AI integrations are focused on text communication / chat style, a lot of software benefits from non-text interfaces.
I believe at some point OSes and browsers should provide an API to manage models so you'll have access to on-device/remote ones with a simplified interface for the app. Making something standardized that is cross-platform would be fantastic. It also needs to be on mobile devices, so the players that can easily make it happen are mostly Apple and Google. (Meta will follow or vice-versa I guess)
Key-point: it shouldn't be exclusive to promoted models.
So the app would be able to query and get the right model(s).
[1] https://developer.apple.com/documentation/foundationmodels
jameslk 8 hours ago
Seems like a good way for a rogue JS script to offload token generation to a bunch of unsuspecting visitors
It would actually be pretty interesting to see if it's possible to decentralize the compute: generate something useful from a larger prompt by breaking it down and sending the pieces to a bunch of browsers using a subagent pattern or something like RLM, each working on a smaller part of the prompt.
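The fan-out half of that idea is just chunking; everything below (chunk size, overlap, the summarization prompt) is invented for illustration, and the merge step is left unspecified:

```javascript
// Split a long prompt into overlapping word chunks, one per participating browser.
function splitPrompt(text, wordsPerChunk = 200, overlap = 20) {
  const words = text.split(/\s+/);
  const chunks = [];
  for (let i = 0; i < words.length; i += wordsPerChunk - overlap) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(" "));
    if (i + wordsPerChunk >= words.length) break;
  }
  return chunks;
}

// Each visitor's browser would run one chunk through the local model;
// a coordinator would then merge the partial answers.
async function runChunkLocally(chunk) {
  if (typeof LanguageModel === "undefined") return "";
  const session = await LanguageModel.create();
  return session.prompt(`Summarize this fragment:\n${chunk}`);
}
```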
varun_ch 8 hours ago
This feels like a lot of work for low reward, the technical/business infrastructure would be wild. And if anyone wants to offload their prompts to users browsers, they might as well just use the Chrome API correctly? How many server side prompts would realistically be useful to offload to a low end model like this?
Plus even if you really wanted to do that, WebGPU exists and has for a while right?
dnnddidiej 5 hours ago
Nefarious use cases. Run that on some sucker's machine.
Edit: a simple example is a spam bot
dotancohen 6 hours ago
> This feels like a lot of work for low reward
Low per-device reward combined with a high user count - either by large legitimate players or by botnets - has been the monetisation strategy of most online enterprises.
jameslk 8 hours ago
> How many server side prompts would realistically be useful to offload to a low end model like this?
There's a lot of ways this API could go, e.g. more powerful models eventually, or perhaps integration with cloud models. For example, I could see Google trying to make Gemini the default model for users signed into Chrome.
mudkipdev 3 hours ago
Gemini Nano, unlike Gemma, is not open-weight, right? I would be interested in dumping the model weights, unless someone has done that already
benjaminbenben 3 hours ago
We use this for summarising our hack day write-ups: https://remotehack.space/previous-hacks/
It's a tiny script that looks up the RSS feed and uses the content to generate summaries; quite a nice fit with our static site. At some point I'd like to extend it to ask different questions about the content.
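The remotehack.space script itself isn't shown; a minimal version might look like this, with a deliberately naive regex item extractor (a real script should use an XML parser) and the Prompt API doing the summaries:

```javascript
// Extract {title, description} pairs from an RSS feed string.
// Regex parsing is fine for a sketch, not for production.
function parseRssItems(xml) {
  const items = [];
  for (const m of xml.matchAll(/<item>([\s\S]*?)<\/item>/g)) {
    const title = (m[1].match(/<title>([\s\S]*?)<\/title>/) || [])[1] || "";
    const description = (m[1].match(/<description>([\s\S]*?)<\/description>/) || [])[1] || "";
    items.push({ title, description });
  }
  return items;
}

// Sketch of the summarization loop (browser-only).
async function summarizeFeed(xml) {
  if (typeof LanguageModel === "undefined") return [];
  const session = await LanguageModel.create();
  const out = [];
  for (const item of parseRssItems(xml)) {
    out.push(await session.prompt(`Summarize in two sentences:\n${item.description}`));
  }
  return out;
}
```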
nl 8 hours ago
The model this uses is useless for anything beyond a 2-round chat at most.
If you want to do anything interesting you need transformers.js and a decent model. Qwen 0.9B is where things start working usefully.
me551ah 3 hours ago
I’m just wondering how much more RAM and VRAM Chrome will use after these changes.
gopalv 6 hours ago
The better part of this is having a local-first AI, particularly because it has tool-calling built in and structured output.
I haven't pushed out a full version[1] which uses ducklake-wasm + this to make a completely local SQL answering machine, but for now all it does is retype prompts in the browser.
izietto 4 hours ago
Can I pass the current page contents to it for an AI-based AdBlock / cookie manager / etc.?
skybrian 8 hours ago
Still in origin trial? Looks like they're adding a temperature parameter:
fg137 8 hours ago
"sorry, to use our website, you must have at least 22 GB of free disk space."
cdrini 7 hours ago
True, but arguably better than "sorry, to use our website, you must have a ChatGPT subscription."
fg137 2 hours ago
More like "you need to sign up for our website and pay for a subscription", and I'd much rather do that if it's actually providing value. I am absolutely not going to run model locally which slowly churns out words at 5 tps while making the computer hot to touch.
_pdp_ 6 hours ago
That is ~9% of the total available disk space on baseline phones and laptops, for a model that is not that useful.
jfoster 5 hours ago
Also much better than every website wanting its own 22 GB rather than the 22 GB being a shared resource.
Ronsenshi 3 hours ago
It won't be long before all web content goes through these AI pipelines, where the user might not even see the original webpage.
tethys 5 hours ago
Slightly off-topic: Refreshing to see these two authors link to their Bluesky and Mastodon profiles. No Twitter/X in sight!
gorgoiler 8 hours ago
Imagine a Vendor API that adds a way to link from the page straight into a device purchase workflow. As a trial of the API in Chrome you can order a new Google Pixel 9b directly from any page with the word Android in it!
Or a LocalNet API that integrates with trusted hardware devices on your local network. As a trial (Chrome beta programme — strictly limited but here’s 3x signup links to share with your friends) you can adjust your Google Next Mini underfloor heating directly from Chrome!
Or a DirectCast API that lets you stream <video> elements to a device of your choice even over a VPN. As a Chrome trial, you can use your Google Cloud account to stream directly from YouTube Premium to any linked Google Chromecast devices you own!
danny_codes 7 hours ago
Domain names are a nice candidate for a Georgist tax