Gemma 4 on iPhone (apps.apple.com)
820 points by janandonly a day ago
pmarreck a day ago
Impressive model, for sure. I've been running it on my Mac, now I get to have it locally in my iPhone? I need to test this. Wait, it does agent skills and mobile actions, all local to the phone? Whaaaat? (Have to check out later! Anyone have any tips yet?)
I don't normally do the whole "abliterated" thing (dealignment) but after discovering https://github.com/p-e-w/heretic , I was too tempted to try it with this model a couple days ago (made a repo to make it easier, actually) https://github.com/pmarreck/gemma4-heretical and... Wow. It worked. And... Not having a built-in nanny is fun!
It's also possible to make an MLX version of it, which runs a little faster on Macs, but won't work through Ollama unfortunately. (LM Studio maybe.)
Runs great on my M4 Macbook Pro w/128GB and likely also runs fine under 64GB... smaller memories might require lower quantizations.
I specifically like dealigned local models because if I have to get my thoughts policed when playing in someone else's playground, like hell am I going to be judged while messing around in my own local open-source one too. And there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, and at a level never before possible until now.
Note: I tried to hook this one up to OpenClaw and ran into issues
To answer the obvious question- Yes, this sort of thing enables bad actors more (as do many other tools). Fortunately, there are far more good actors out there, and bad actors don't listen to rules that good actors subject themselves to, anyway.
jwr 11 hours ago
> It's also possible to make an MLX version of it, which runs a little faster on Macs
FWIW, I found MLX variants to perform consistently worse (in terms of expected output, not speed) than GGUF when measured on the benchmark that matters to me (spam filtering). I used MLX models in LM Studio. GGUF was always slightly better.
Perhaps someone who knows more can pitch in and explain this.
embedding-shape 7 hours ago
It isn't 100% clear, but what quantization were you using for each? I've had worse results with MLX 8-bit than with Q4 GGUF on the same model; it seems mxfp8 or bf16 is needed when run with MLX to get something worthwhile out of them. But I've done very little testing, and it could have been something specific to the model I was testing at the time.
pmarreck 7 hours ago
I was not aware of this. I might not be willing to trade accuracy for speed in this case, then.
c2k a day ago
I run mlx models with omlx[1] on my mac and it works really well.
pmarreck 19 hours ago
Holy hell, how new is this? I've never heard of it, looks great!
nothinkjustai 18 hours ago
barbazoo a day ago
> And there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, and at a level never before possible until now.
I checked the abliterate script and I don't yet understand what it does or what the result is. What are the conversations this enables?
SL61 21 hours ago
LLMs are very helpful for transcribing handwritten historical documents, but sometimes those documents contain language/ideas that a perfectly aligned LLM will refuse to output. Sometimes as a hard refusal, sometimes (even worse) by subtly cleaning up the language.
In my experience the latest batch of models are a lot better at transcribing the text verbatim without moralizing about it (i.e. at "understanding" that they're fulfilling a neutral role as a transcriber), but it was a really big issue in the GPT-3/4 era.
dolebirchwood 21 hours ago
spijdar a day ago
Realistically, a lot of people do this for porn.
In my experience, though, it's necessary to do anything security related. Interestingly, the big models have fewer refusals for me when I ask e.g. "in <X> situation, how do you exploit <Y>?", but local models will frequently flat out refuse, unless the model has been abliterated.
tredre3 a day ago
int_19h 10 hours ago
throwuxiytayq a day ago
The in-ter-net is for porn
rav3ndust a day ago
pmarreck a day ago
1) Coming up with any valid criticism of Islam at all (for some reason, criticisms of Christianity or Judaism are perfectly allowed even with public models!).
2) Asking questions about sketchy things. Simply asking should not be censored.
3) I don't use it for this, but porn or foul language.
4) Imitating or representing a public figure is often blocked.
5) Asking security-related questions when you are trying to do security.
6) For those who have had it, people who are trying to use AI to deal with traumatic experiences that are illegal to even describe.
Many other instances.
tshaddox 19 hours ago
ryanjshaw 10 hours ago
peyton 21 hours ago
eloisant a day ago
I tried it on my mac, for coding, and I wasn't really impressed compared to Qwen.
I guess there are things it's better at?
OtherShrezzing 12 hours ago
Assuming you’re not copy/pasting for these tasks. What’s the stack required to use local models for coding? I’ve got a capable enough machine to produce tokens slowly, but don’t understand how to connect that to the likes of VSCode or a JetBrains ide.
mcintyre1994 11 hours ago
nkohari a day ago
You're comparing apples to oranges there. Qwen 3.5 is a much larger model at 397B parameters vs. Gemma's 31B. Gemma will be better at answering simple questions and doing basic automation, and codegen won't be its strong suit.
kgeist a day ago
tredre3 a day ago
saagarjha 10 hours ago
I have found that a lot of the techniques used to decensor models (as far as I can tell, they basically get all their weights to say no turned off) also make them really stupid. Like, sure, it will help you rob a bank, but if you ask whether you should rob the bank it will go "The positives: … The negatives: … My take: You should ABSOLUTELY rob the bank".
pmarreck 7 hours ago
The abliteration in particular that Heretic does apparently results in a best-in-class lack of "stupefying" the underlying model. You haven't read its claims, apparently.
lxgr 8 hours ago
I wonder if this is due to abliteration actually "damaging" the model, or just an artifact of the model never having been properly trained on "forbidden" topics (as it's enough for them to recognize them, and there's no point in dedicating neurons to something that will never be exercised anyway).
zozbot234 7 hours ago
magospietato a day ago
Haven't built anything on the agent skills platform yet, but it's pretty cool imo.
On Android the sandbox loads an index.html into a WebView, with standardized string I/O to the harness via some window properties. You can even return a rendered HTML page.
Definitely hacked together, but feels like an indication of what an edge compute agentic sandbox might look like in future.
bossyTeacher a day ago
>there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, and at a level never before possible until now.
Mind giving us a few of the examples that you plan to run in your local LLM? I am curious.
pmarreck 20 hours ago
I'm not sure what you're angling at but I already gave a set of questions that are ethically legitimate yet routinely censored by the public models:
https://news.ycombinator.com/item?id=47654013
Not to mention that doing what the big model makers do literally dumbs the model down.
They should at least allow something like letting you prove your age and identity to give you access to better/unaligned models, maybe even requiring a license of some sort. Because you know what? SOMEONE in there absolutely has access to the completely uncensored versions of the latest models.
satvikpendem 18 hours ago
karimf a day ago
This app is cool and it showcases some use cases, but it still undersells what the E2B model can do.
I just made a real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B. I posted it on /r/LocalLLaMA a few hours ago and it's gaining some traction [0]. Here's the repo [1]
I'm running it on a Macbook instead of an iPhone, but based on the benchmark here [2], you should be able to run the same thing on an iPhone 17 Pro.
[0] https://www.reddit.com/r/LocalLLaMA/comments/1sda3r6/realtim...
[1] https://github.com/fikrikarim/parlor
[2] https://huggingface.co/litert-community/gemma-4-E2B-it-liter...
dang 14 hours ago
Re-upped here:
Show HN: Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B - https://news.ycombinator.com/item?id=47652007
karimf 14 hours ago
Oh wow, that's awesome. Thanks a lot, dang!
storus 16 hours ago
That's cool! You can add SoulX-FlashHead for real-time AI head animation as well if you want to simulate a teacher.
karimf 14 hours ago
Thanks for sharing! I'm still torn about it. Sure it'll feel more natural if you have the AI head animation, but I don't want people to get attached to it. I don't want to make the loneliness epidemic even worse.
nothinkjustai a day ago
Parlor is so cool, especially since you’re offering it for free. And a great use case for local LLMs.
karimf a day ago
Thanks! Although, I can't claim any credit for it. I just spent a day gluing what other people have built. Huge props to the Gemma team for building an amazing model and also an inference engine that's focused for edge devices [0]
PullJosh a day ago
This is awesome!
1) I am able to run the model on my iPhone and get good results. Not as good as Gemini in the cloud, but good.
2) I love the “mobile actions” tool calls that allow the LLM to turn on the flashlight, open maps, etc. It would be fun if they added Siri Shortcuts support. I want the personal automation that Apple promised but never delivered.
3) I am so excited for local models to be normalized. I build little apps for teachers and there are stringent privacy laws involved that mean I strongly prefer writing code that runs fully client-side when possible. When I develop apps and websites, I want easy API access to on-device models for free. I know it sort of exists on iOS and Chrome right now, but as far as I’m aware it’s not particularly good yet.
buzzerbetrayed a day ago
For me the hallucination and gaslighting is like taking a step back in time a couple of years. It even fails the “r’s in strawberry” question. How nostalgic.
It’s very impressive that this can run locally. And I hope we will continue to be able to run couple-year-old-equivalent models locally going forward.
dimmke 20 hours ago
I haven't seen anybody else post it in this thread, but this is running on 8GB of RAM. It's not the full Gemma 4 32B model. It's a completely different thing from the full Gemma 4 experience if you were running the flagship model, almost to the point of being misleading.
It's their E2B and E4B variants (so 2B and 4B but also quantized)
https://ai.google.dev/gemma/docs/core/model_card_4#dense_mod...
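A rough back-of-envelope sketch of why quantized 2B/4B variants fit on an 8GB phone while a ~31B model does not. It takes the "E2B"/"E4B" labels as plain 2B and 4B parameter counts, the way the comment above does, and counts weights only; the KV cache, activations, and any extra embedding tables are ignored, so treat the numbers as a lower bound rather than a measured footprint.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Parameter counts are assumptions taken from the thread, not from a model card.
for name, params in [("2B", 2.0), ("4B", 4.0), ("31B", 31.0)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

At 4 bits per weight the 4B case comes to roughly 2 GB of weights, while the 31B case is over 15 GB before any runtime overhead.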
zozbot234 20 hours ago
1f60c 20 hours ago
Strangely, reasoning is not on by default. If you enable it, it answers as you'd expect.
shtack 17 hours ago
With reasoning on I found E4B to be solid, but E2B was completely unusable across several tests.
janandonly a day ago
OP Here. It is my firm belief that the only realistic use of AI in the future is either locally on-device for almost free, or in the cloud but way more expensive than it is today.
The latter option will only be used for tasks where humans are more expensive or much slower.
This Gemma 4 model gives me hope for a future Siri or other with iPhone and macOS integration, “Her” (as in the movie) style.
crazygringo a day ago
> or in the cloud but way more expensive then it is today.
Why? It's widely understood that the big players are making profit on inference. The only reason they still have losses is because training is so expensive, but you need to do that no matter whether the models are running in the cloud or on your device.
If you think about it, it's always going to be cheaper and more energy-efficient to have dedicated cloud hardware to run models. Running them on your phone, even if possible, is just going to suck up your battery life.
mbesto a day ago
> It's widely understood that the big players are making profit on inference.
This is most definitely not widely understood. We still don't know yet. There's tons of discussions about people disagreeing on whether it really is profitable. Unless you have proof, don't say "this is widely understood".
victorbjorklund 8 hours ago
int_19h 10 hours ago
igtt 18 hours ago
petesergeant 12 hours ago
zozbot234 a day ago
The big players are plausibly making profits on raw API calls, not subscriptions. These are quite costly compared to third-party inference from open models, but even setting that up is a hassle and you as an end user aren't getting any subsidy. Running inference locally will make a lot of sense for most light and casual users once the subsidies for subscription access cease.
Also while datacenter-based scaleout of a model over multiple GPUs running large batches is more energy efficient, it ultimately creates a single point of failure you may wish to avoid.
janalsncm 21 hours ago
> It's widely understood that the big players are making profit on inference.
If you add in the cost of training, it’s not profitable.
Not including the cost of training is a bit like saying the only cost of a cup of coffee is the paper cup it’s in. The only way OpenAI gets to charge for inference is by selling a product people can’t get elsewhere for much cheaper, which means billions in R&D costs. But because of competition, each model effectively has a “shelf life”.
tybit 18 hours ago
jfoster 15 hours ago
They will always be training new models, so if training is expensive, that's just part of the business they are in.
Vast amounts of capital have been poured in, but they continue to raise more. Presumably because they need more.
Is the capital being invested without any expectation of ROI?
huijzer a day ago
Laptop/desktop could work. Most systems are on a charger most of the time anyway.
jrflowers a day ago
> It's widely understood that the big players are making profit on inference.
I love the whole “they are making money if you ignore training costs” bit. It is always great to see somebody say something like “if you look at the amount of money that they’re spending it looks bad, but if you look away it looks pretty good” like it’s the money version of a solar eclipse
skybrian 21 hours ago
victorbjorklund 8 hours ago
nothinkjustai a day ago
> It's widely understood that the big players are making profit on inference.
Are they? Or are they just saying that to make their offerings more attractive to investors?
Plus I think most people using agents for coding are using subscriptions which they are definitely not profitable in.
Locally running models that are snappy and mostly as capable as current sota models would be a dream. No internet connection required, no payment plans or relying on a third party provider to do your job. No privacy concerns. Etc etc.
nl 20 hours ago
zozbot234 a day ago
_pdp_ a day ago
If you can run free models on consumer devices, why do you think cloud providers cannot do the same, except better and bundled with a ton of value worth paying for?
amelius a day ago
A local model running on a phone owned and controlled by the vendor is still not really exciting, imho.
It may be physically "local" but not in spirit.
0dayman a day ago
this is not that first step towards your dream
kennywinker a day ago
Did you really watch “Her” and think this is a future that should happen??
Seriously????
jfreds a day ago
I don’t think OP’s point has anything to do with AI companions.
The big benefit of moving compute to edge devices is to distribute the inference load on the grid. Powering and cooling phones is a lot easier than powering and cooling a datacenter
kennywinker 18 hours ago
satvikpendem 19 hours ago
What does what they said have anything to do with Her? Local LLMs are better than big corporations owning your data and offering LLMs for a huge cost.
kennywinker 18 hours ago
teolandon 15 hours ago
sambapa a day ago
Torment Nexus sounds fun
kennywinker 18 hours ago
aninteger a day ago
Having Scarlett Johansson's voice might not be so bad or even something less robotic.
kennywinker 21 hours ago
esafak 21 hours ago
Unfortunately, one man's dystopia is another's utopia.
jeroenhd a day ago
English version of the page: https://apps.apple.com/us/app/google-ai-edge-gallery/id67496...
Also on Android: https://play.google.com/store/apps/details?id=com.google.ai....
It's a demo app for Google's Edge project: https://ai.google.dev/edge
om252345 15 hours ago
Gemma 4 runs really slowly for me: the E2B model on my Samsung Galaxy S21 Ultra takes at least 20-30 sec to warm up and then reply.
jeroenhd 8 hours ago
Running LLMs is probably the first time I've found the SoC of that generation lacking. Even Google's underpowered Tensor CPUs make a huge difference when it comes to LLM performance.
You can check your settings for GPU acceleration, it's possible that enabling that makes a big difference.
From what I've found online the difference may also simply be Snapdragon versus Exynos GPU driver optimizations, in which case I don't think the performance can be fixed by anyone but Samsung. Others online seem to get decent performance out of the model on the S21 Ultra at the very least.
satvikpendem 15 hours ago
Needs a modern phone, local LLMs don't work well on older phones.
thepbone 9 hours ago
The bigger E4B model is pretty fast on my Galaxy S21 Ultra even with thinking enabled. Maybe GPU acceleration was not enabled?
jeroenhd 8 hours ago
cobicobi 14 hours ago
need s24 ultra and above i think
ysleepy 11 hours ago
The S25 (edge) runs this very well. 29 tok/s for E2B.
amai 4 hours ago
The cooperation of Apple and Google is going to crush the competition: https://blog.google/company-news/inside-google/company-annou...
The combination of Apple's hardware and Google's software is unbeatable.
lemonish97 30 minutes ago
Isn't their competition (at least in the mobile space) google/android themselves?
bigyabai 3 hours ago
With Google's graveyard and Apple's walled garden, nothing can stop the enshittification train from trundling down the tracks.
lemonish97 34 minutes ago
I hope they add a web search tool to the agent skills too. Most of my LLM usage on my phone is just quick lookups and search summarizations. Would love to do these with a local model rather than Google AI mode or any other cloud-based inference tools.
rock_artist 13 hours ago
I really believe in the future of local models.
As both an app developer and a user, my main concern for now is bloated devices. Until we have something like Apple’s foundation model, where multiple apps can share the same model, we’re stuck with something as horrible as Electron: every app ships a full-blown model (the browser, in the Electron story) instead of reusing one.
On desktops we had DLL hell for years. But with sandboxed apps on mobile devices it becomes a bigger issue, one that I guess will/should be addressed by the OS.
For my app I’ve been trying to add some logic based on a large model, but bloating a simple Swift app with 2-3GB of model, or even a few hundred MB, feels wrong and conflicts with code-reuse principles.
janandonly 8 hours ago
This app unlocks using the Apple Foundation model itself: https://apps.apple.com/nl/app/locally-ai-local-ai-chat/id674...
al_borland 17 hours ago
I find it odd they are using the term “edge” to brand this, if its target is the general public.
I’ve been to a few tech conferences and saw the term used there for the first time. It took me a little bit to see the pattern and understand what it meant. I have never heard the term used outside of those circles. It seems like “local” would be the term average users would be familiar with. Normal people don’t call their stuff “edge devices”.
ycombinete 11 hours ago
Funnily enough I work in the security industry and the term is ubiquitous there, so I didn’t even notice it.
bigyabai 14 hours ago
> if it’s target is the general public.
It's not - Apple is working with Google right now to make Siri into the public-facing version of this. This is kinda just the tech preview before all the branding has been painted on.
areys an hour ago
The use cases that open up when inference stays on-device are genuinely different. Health apps, journaling, anything where users are (justifiably) paranoid about their data leaving the phone — that's a big surface area that cloud APIs can't really touch. Surprised this is happening at the speed it is on consumer hardware.
orf 10 hours ago
I’d recommend locally.ai[1] - it’s really good and has a wide range of models. Also has shortcuts support.
1. https://apps.apple.com/gb/app/locally-ai-local-ai-chat/id674...
janandonly 8 hours ago
Thanks for the link. Gemma 4 also works in this app.
dhbradshaw a day ago
My son just started using 2B on his Android. I mentioned that it was an impressively compact model and next thing I knew he had figured out how to use it on his inexpensive 2024 Motorola and was using it to practice reading and writing in foreign languages.
allpratik a day ago
Nice! Tried on iPhone 16 pro with 30 TPS from Gemma-4-E2B-it model.
Although the phone got considerably hot while inferencing. It’s quite an impressive performance and cannot wait to try it myself in one of my personal apps.
golem14 18 hours ago
It's at least somewhat limited in non-English content. It knows how to make lentil soup, so I was happy that I never need to look up recipe sites with awful UX and ads, but then it couldn't find a recipe for "Kalter Hund"/"Kalte Schnauze". So sad ;)
Still, absolutely fabulous. What a time to be alive!
mudkipdev 13 hours ago
It's strange that my iPhone 14 is at regular temperature when using the E2B model. But also it's a lot slower (not sure how to measure the exact tokens per second, ~12 if I had to guess)
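On measuring tokens per second rather than guessing: a minimal sketch, assuming the model sits behind something that streams tokens to your own code (the Gallery app itself does not expose such a hook as far as this thread shows, so there you would rely on whatever stats it displays). `fake_stream` is a hypothetical stand-in for a real token stream. Time to first token is reported separately because it covers prompt processing, not steady-state decoding.

```python
import time
from typing import Iterable, Iterator


def measure_throughput(tokens: Iterable[str]) -> dict:
    """Consume a token stream and report time-to-first-token and decode rate."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in tokens:
        if first is None:
            first = time.perf_counter()
        count += 1
    end = time.perf_counter()

    decode_time = (end - first) if first is not None else 0.0
    rate = (count - 1) / decode_time if count > 1 and decode_time > 0 else None
    return {
        "tokens": count,
        "time_to_first_token_s": (first - start) if first is not None else None,
        "decode_tokens_per_s": rate,
    }


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields n tokens, one every delay_s seconds."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


stats = measure_throughput(fake_stream(20, 0.01))
print(stats)
```

The decode rate divides by the time after the first token arrives, which is why it uses `count - 1` intervals rather than `count`.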
TGower a day ago
These new models are very impressive. There should be a massive speedup coming as well: AI Edge Gallery is running on GPU, but NPUs in recent high-end processors should be much faster. The A18 chip, for example (MacBook Neo and iPhone 16 series), has 35 TOPS of Neural Engine vs. a 7 TFLOPS GPU. Similar story for Qualcomm.
api a day ago
That’s nuts actually for such a low power chip. Can’t wait to see the M series version of that.
I’m sure very fast TPUs in desktops and phones are coming.
zozbot234 a day ago
The Apple Silicon in the MacBook Neo is effectively a slimmed down version of M4, which is already out and has a very similar NPU (similar TFLOPS rating). It's worth noting however that the TFLOPS rating for Apple Neural Engine is somewhat artificial, since e.g. the "38 TFLOPS" in the M4 ANE are really 19 TFLOPS for FP16-only operation.
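The halving described above, written out as a quick calculation. The only inputs are figures quoted in this thread (38 and 35 for the Neural Engine headline numbers, 7 TFLOPS for the GPU); the 0.5 ratio is the commenter's claim that FP16 runs at half the headline rate, taken here as an assumption rather than a vendor spec.

```python
def fp16_equivalent(headline_tops: float, fp16_rate_ratio: float = 0.5) -> float:
    """Scale a headline TOPS figure to an FP16-only rate (ratio is an assumption)."""
    return headline_tops * fp16_rate_ratio


m4_fp16 = fp16_equivalent(38.0)     # figure quoted for the M4 ANE
phone_fp16 = fp16_equivalent(35.0)  # figure quoted earlier in the thread
gpu_tflops = 7.0                    # GPU figure quoted earlier in the thread

print(m4_fp16)                  # 19.0
print(phone_fp16)               # 17.5
print(phone_fp16 / gpu_tflops)  # 2.5
```

Even after halving, the quoted NPU figure stays about 2.5x the quoted GPU figure, so the expected speedup shrinks but does not disappear under these assumptions.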
haizhung 6 hours ago
I encourage everybody to try this, if they have an iPhone. If you’re like me and don’t have the time to tinker with the latest and greatest all the time; this app lowers the barrier to entry significantly and provides a glimpse into what’s possible locally, on device.
Honestly, I was extremely impressed by the speed and quality of the answers considering this thing runs on a phone. It honestly makes me want to sit down and spin up my own homegrown AI setup to go fully independent. Crazy.
two_handfuls 18 hours ago
The description says it's private, but the legalese it makes you agree to makes no promise. Rather, the opposite:
> We collect information about your activity in our services
mjlee 10 hours ago
I was about to ask if anybody had looked at what it was sending home. I’m travelling so I’m not in a position to run this through a proxy for a couple of weeks, but also I’m travelling so this could be useful!
bigyabai 18 hours ago
The app is open source[0], although given Apple's stance on sideloading it's hard to confirm if you're using the open version.
selfsigned 10 hours ago
Two (very quick) minutes on their GitHub repo and it's pretty obvious that they're using firebase-analytics and at the very least seem to be sending URLs[1] and info such as the model you download or the capabilities[2] you use.
[1] https://github.com/google-ai-edge/gallery/blob/main/Android/...
[2] https://github.com/google-ai-edge/gallery/blob/main/Android/...
kaliqt 4 hours ago
kaliqt 4 hours ago
That is the Android repo, where is the iOS repo?
_nagu_ 10 hours ago
If this works smoothly on iPhone, it could change how we think about mobile apps. Less backend dependency, more on-device intelligence.
jcutrell 6 hours ago
This is what Apple promised a long time ago, and just couldn't quite connect on delivery.
deckar01 a day ago
It doesn’t render Markdown or LaTeX. The scrolling is unusable during generation. E4B failed to correctly account for convection and conduction when reasoning about the effects of thermal radiation (31b was very good). After 3 questions in a session (with thinking) E4B went off the rails and started emitting nonsense fragments before the stated token limit was hit (unless it isn’t actually checking).
3abiton 2 hours ago
They have very limited capabilities compared to bigger, more complex models, but for general stuff, they are fantastic. We need to set expectations correctly about what they can do. I know there's a lot of hype around Gemma 4, even though Qwen3.5 outperformed it. It's just a reliable overall small model, with great small model abilities.
hadrien01 a day ago
Is it me or does the App Store website look... fake? The text in the header ("Productiviteit", "Alleen voor iPhone") looks pixelated, like it was edited on Paint, the header background is flickering, the app icon and screenshots are very low quality, the title of the website is incomplete ("App Store voor iPho...")
lateforwork a day ago
Here's the US version of the same page: https://apps.apple.com/us/app/google-ai-edge-gallery/id67496...
The design quality is still poor. But that's the new Apple. Design is no longer one of their core strengths.
giarc a day ago
It's the dutch version, see /nl/ in the url.
If you just go to https://apps.apple.com/ it does look better, but I agree, still a bit "off".
throwatdem12311 a day ago
Issues caused by a low effort localization?
On my iPhone it opens on the App Store app, so it looks fine to me.
piperswe a day ago
What browser are you using? I don't see any of this behavior on Firefox...
hadrien01 a day ago
Firefox on Windows, but it looks about the same in Edge
Screenshot of the header: https://i.imgur.com/4abfGYF.png
morpheuskafka a day ago
t-sauer a day ago
OJFord 21 hours ago
Firefox on Android: 'Google AI' (in app name) is clipped off the top; the Apple 'share' button is clipped on the bottom.
j0hax a day ago
Everything renders crystal clear with Firefox on GrapheneOS.
ezfe a day ago
Nothing weird on my side
sshrajesh 3 hours ago
> Note: I tried to hook this one up to OpenClaw and ran into issues
Anyone worked on hooking up OpenClaw to gemma4 running locally?
burnto a day ago
My iPhone 13 can’t run most of these models. A decent local LLM is one of the few reasons I can imagine actually upgrading earlier than typically necessary.
Gigachad 16 hours ago
I’ve got a 17 pro and tbh I haven’t found any use for local models yet. They are a neat curiosity but the online ones are absolutely massively far ahead. Considering they are being given away for free currently, it’s hard to justify not making use of them over dumber local models.
mchusma 6 hours ago
I’m expecting the new iPhone release this fall to be coupled with some great version of Siri/model. This could be the first reason I’ve seen to upgrade in a while (although even that I’m not sure of, as I am kind of in the “always use the best model it’s worth it” camp.)
Apple has a great shot at making a highly optimized 4.5 version of this model highly tuned to the next gen iPhone, which could work great.
carbocation a day ago
It would be very helpful if the chat logs could (optionally) be retained.
davecahill 18 hours ago
I really like Enclave for on-device models - looks like they're about to add Gemma 4 too: https://enclaveai.app/blog/2026/04/02/gemma-4-release-on-dev...
robbru 6 hours ago
I've been using Enclave ever since, they have been the best App Store option for a long time.
rudedogg 18 hours ago
This is fun, FYI you don’t have to sign in/up with a Google account. I hesitated downloading it for that reason.
satvikpendem 18 hours ago
This is also on Android and has an option to use AICore with the NPU which can run much faster than even the GPU models.
nout 18 hours ago
How do you get it running on Android?
satvikpendem 18 hours ago
It's the same app, Google AI edge gallery.
dwa3592 a day ago
I think with this google starts a new race- best local model that runs on phones.
dwa3592 a day ago
I wonder why the cut off date for 3n-E4B-it is Oct, 2023. That's really far in the past.
satvikpendem 19 hours ago
Because that's Gemma 3, not 4.
danielrmay 16 hours ago
I spent some time getting Gemma4-e4b working via llamacpp on iPhone and I'm really impressed so far! I posted a short video of an example application on LinkedIn here https://www.linkedin.com/feed/update/urn:li:activity:7446746... (or x: https://x.com/danielrmay/status/2040971117419192553)
derwiki 8 hours ago
I asked it about the “Altamont Free Concert” (exact name of Wikipedia article), and it’s been a while since I’ve seen a hallucination this rich. Doesn’t give me confidence to use it.
thot_experiment 21 hours ago
Gemma 4 E4B is an incredible model for all the home assistant stuff I normally used Qwen3.5 35BA4B + Whisper for, while leaving me with wayy more empty vram for other bullshit. It works as a drop-in replacement for all of my "turn the lights off" or "when's the next train" type queries and does a good job of tool use. This is really the first time vramlets get a model that's reliably day-to-day useful locally.
I'm curious/worried about the audio capability, I'm still using Whisper as the audio support hasn't landed in llama.cpp, and I'm not excited enough to temporarily rewire my stuff to use vLLM or whatever their reference impl is. The vision capabilities of Gemma are notably (thus far, could be impl specific issues?) much much worse than Qwen (even the big moe and dense gemma are much worse), hopefully the audio is at least on par with medium whisper.
totetsu 10 hours ago
I have been looking at ARGmax https://www.argmaxinc.com/#SDK for running on Apple devices, but I'm not sure yet what's involved in porting a model to work with their SDK
MysticOracle 16 hours ago
Crashes for me on a couple of different iDevices (2 generations behind) after only 2-3 chats. Probably not enough RAM.
Saw this one on X the other day updated with Gemma 4 and they have the built-in Apple Foundation model, Qwen3.5, and other models:
Locally AI - https://locallyai.app/
neurostimulant 21 hours ago
I'm able to sweet talk the gemma-4-e2b-it model in an iphone 15 to solve a hcaptcha screenshot. This small model is surprisingly very capable!
rcarmo 12 hours ago
This is fun. I just wish I could add more skills, the UX is too dumbed down but knowing there is a run_js tool there is a lot that can be done here.
XCSme a day ago
Gemma 4 is great: https://aibenchy.com/compare/google-gemma-4-31b-it-medium/go...
I assume it is the 26B A4B one, if it runs locally?
adrian17 21 hours ago
No, only E2B and E4B.
rotexo 20 hours ago
E4B is pretty good for extracting tables of items from receipt scans and inferring categories, wish this could be called from within a shortcut to just select a photo and add the extracted table to the clipboard
nickvec 17 hours ago
Extremely impressed by how fast responses are on iPhone 17 Pro Max. Can’t wait for this to be used for Siri’s brain one of these days (hopefully!)
gdzie-jest-sol 10 hours ago
I also need a normal server mode on the local network, so I can run the chat on another device and do the 'computing' on the iPhone.
A second idea is audio input in other languages, like Czech, Polish, French.
modeless 13 hours ago
It's so ridiculous that Google made a custom SoC for their phones, touting its AI performance, even calling it Tensor, and Apple is still faster at running Google's own model.
Google really ought to shut down their phone chip team. Literally every chip from them has been a disappointment. As much as I hate to say it, sticking with Qualcomm would have been the right choice.
ulfw 13 hours ago
It runs very fast on my Qualcomm Elite Gen 5 SoC Oppo Find N6
allpratik 13 hours ago
How many tokens per second? Also, does it get warm/hot?
modeless 12 hours ago
Sharmaji000 17 hours ago
They still didn't release the training recipe, data, methodology etc., unlike DeepSeek. Mostly released to build a developer ecosystem around their Android built-in AI. Still good and interesting, but not exactly philanthropic toward open source progress.
MagicMoonlight 8 hours ago
It seems really capable. A few more iterations of this and you won’t even need a subscription.
All it needs is web search so that it can get up to date information.
mc7alazoun 20 hours ago
Would it work locally on a Mac Pro M4 24gb? If so I'd really appreciate a step-by-step guide.
weberer 19 hours ago
These E2B and E4B models are very small so that they can fit into phones with around 8gb of RAM. You can get away with a much larger model. Just run:
brew install ollama
ollama run gemma4:26b-a4b-it-q4_K_M
mc7alazoun 10 hours ago
Legend! Thanks heaps.
jdthedisciple 12 hours ago
it's Google, so is it really private?
remember, megacorps are dying for infinite amounts of analytics data
prism56 11 hours ago
Are there any alternatives for on-device Android LLMs that aren't Google and/or are more private?
classified 12 hours ago
When I saw it wants me to "agree" to Google's "privacy policy", I deleted the app on the spot.
rickdg a day ago
How do these compare to Apple's Foundation Models, btw?
simonw a day ago
So much better. Hard to quantify, but even the small Gemma 4 models have that feels-like-ChatGPT magic that Apple's models are lacking.
snarkyturtle a day ago
AFM had a 4096 token context window and this can be configured to have a 32k+ token context window, for one.
Waterluvian 20 hours ago
I see a phenomenal opportunity for old phone re-use by arraying them in some dock and making them be my "home AI."
garff a day ago
How new of an iPhone model is needed?
tithos 20 hours ago
Most of the models are not available. I’m guessing they will become available soon enough… At least I hope.
beeflet a day ago
Isn't this already possible in a much more open-ended way with PocketPal?
https://github.com/a-ghorbani/pocketpal-ai
https://apps.apple.com/us/app/pocketpal-ai/id6502579498
https://play.google.com/store/apps/details?id=com.pocketpala...
lzzqrd 21 hours ago
Could you clarify what you mean by 'open-ended' in this context, since both initiatives are essentially open-source?
imadselka 9 hours ago
good model!
dzhiurgis a day ago
I recently got my first practical use out of it. I was on a plane, filling out a landing card (what a silly thing these are). I looked up my hotel address using a Qwen model on my iPhone 16 Pro. It was accurate. I was quite impressed.
After some back and forth the chat app started to crash tho, so YMMV.
lol8675309 21 hours ago
It’s gotta be free!?!? Right!?!? Oh oh wait
__natty__ a day ago
That's a great project! I just wondered whether Google would have a problem with you using their trademark
tech234a a day ago
This is an app published by Google itself
yalogin 19 hours ago
Are these models open source? If so this is Google’s attempt to collect user data from their models.
int_19h 10 hours ago
How is Google going to collect user data from a locally running model?
yalogin 7 hours ago
If you do it yourself they don’t, that is why they are packaging it into an app