Hacker News

by Ryan Harman

Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon (github.com)

119 points by sanchitmonga22 2 hours ago

Hi HN, we're Sanchit and Shubham (YC W26). We built a fast inference engine for Apple Silicon. LLMs, speech-to-text, text-to-speech – MetalRT beats llama.cpp, Apple's MLX, Ollama, and sherpa-onnx on every modality we tested. Custom Metal shaders, no framework overhead.

Also, we've open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon. Mic to spoken response, entirely on-device. No cloud, no API keys.

To get started:

  brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
  brew install rcli
  rcli setup   # downloads ~1 GB of models
  rcli         # interactive mode with push-to-talk

Or:

  curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash

The numbers (M4 Max, 64 GB, reproducible via `rcli bench`):

LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files):

- Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295)
- Qwen3-4B: 186 tok/s (vs mlx-lm 170, llama.cpp 87)
- LFM2.5-1.2B: 570 tok/s (vs mlx-lm 509, llama.cpp 372)
- Time-to-first-token: 6.6 ms
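For readers checking the headline multipliers, the per-model speedups can be computed directly from the three rows listed. This is a minimal sketch; the dictionary layout and the use of a geometric mean are illustrative choices, not the authors' methodology. For these three models MetalRT comes out about 1.09x to 1.19x ahead of mlx-lm and 1.53x to 2.23x ahead of llama.cpp, so the headline 1.67x and 1.19x figures are presumably aggregated over the fuller benchmark set in the linked LLM methodology post rather than derived from these rows alone.

```python
# Decode throughput in tok/s, copied from the three rows in the post (M4 Max, 64 GB).
RESULTS = {
    "Qwen3-0.6B": {"metalrt": 658, "mlx-lm": 552, "llama.cpp": 295},
    "Qwen3-4B": {"metalrt": 186, "mlx-lm": 170, "llama.cpp": 87},
    "LFM2.5-1.2B": {"metalrt": 570, "mlx-lm": 509, "llama.cpp": 372},
}

def speedups(results, baseline):
    """MetalRT's per-model speedup over `baseline` (ratio of tok/s)."""
    return {model: r["metalrt"] / r[baseline] for model, r in results.items()}

def geometric_mean(values):
    """Geometric mean, the usual way to average ratios."""
    values = list(values)
    product = 1.0
    for v in values:
        product *= v
    return product ** (1.0 / len(values))

for baseline in ("mlx-lm", "llama.cpp"):
    per_model = speedups(RESULTS, baseline)
    for model, ratio in per_model.items():
        print(f"{model}: {ratio:.2f}x vs {baseline}")
    print(f"geometric mean vs {baseline}: {geometric_mean(per_model.values()):.2f}x")
```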

STT – 70 seconds of audio transcribed in *101 ms*. That's 714x real-time. 4.6x faster than mlx-whisper.
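The real-time factor is audio duration divided by processing time. A small sketch, assuming that standard definition: with the figures as stated (70 s of audio, 101 ms) it works out to roughly 693x rather than 714x. At 101 ms, 714x would correspond to about 72.1 s of audio, so the quoted multiplier presumably comes from unrounded measurements.

```python
def real_time_factor(audio_seconds, processing_seconds):
    """Seconds of audio processed per second of wall-clock time (higher is faster)."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds

print(f"{real_time_factor(70, 0.101):.0f}x")  # 693x with the figures as stated
```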

TTS – 178 ms synthesis. 2.8x faster than mlx-audio and sherpa-onnx.

We built this because demoing on-device AI is easy but shipping it is brutal. Voice is the hardest test: you're chaining STT, LLM, and TTS sequentially, and if any stage is slow, the user feels it. Most teams fall back to cloud APIs not because local models are bad, but because local inference infrastructure is.

The thing that's hard to solve is latency compounding. In a voice pipeline, you're stacking three models in sequence. If each adds 200ms, you're at 600ms before the user hears a word, and that feels broken. You can't optimize one stage and call it done. Every stage needs to be fast, on one device, with no network round-trip to hide behind.
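The compounding described above is plain addition: in a strictly sequential pipeline nothing is spoken until every stage has finished, so the stage latencies sum. A minimal sketch using the post's 200 ms-per-stage example; the 50 ms "optimized LLM" figure is a hypothetical number chosen for illustration.

```python
def time_to_first_audio_ms(stage_latencies_ms):
    """Sequential pipeline: the user-perceived delay is the sum of stage latencies."""
    return sum(stage_latencies_ms.values())

baseline = {"stt": 200, "llm": 200, "tts": 200}  # the post's example
print(time_to_first_audio_ms(baseline))          # 600

# Hypothetical: make only the LLM stage 4x faster (200 ms -> 50 ms).
llm_only = dict(baseline, llm=50)
print(time_to_first_audio_ms(llm_only))          # 450
```

Speeding up one stage by 4x only trims 600 ms to 450 ms, which is why every stage has to be fast.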

We went straight to Metal. Custom GPU compute shaders, all memory pre-allocated at init (zero allocations during inference), and one unified engine for all three modalities instead of stitching separate runtimes together.
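A bump-pointer arena is one common way to get "all memory pre-allocated at init (zero allocations during inference)". MetalRT's internals aren't shown in this thread (one commenter below describes it as closed-source), so this is a hypothetical Python sketch of the pattern only: `PreallocatedArena`, `alloc`, and `reset` are made-up names, the scratch sizes are arbitrary, and in a real Metal engine the backing store would be a GPU buffer allocated once rather than a Python `bytearray`.

```python
class PreallocatedArena:
    """Reserve one block up front, hand out non-overlapping slices, reset per step."""

    def __init__(self, capacity_bytes):
        self._buffer = bytearray(capacity_bytes)  # the only large allocation
        self._view = memoryview(self._buffer)
        self._offset = 0

    def alloc(self, nbytes, align=16):
        """Return an aligned slice of the reserved block; never grows the block."""
        start = (self._offset + align - 1) // align * align
        end = start + nbytes
        if end > len(self._buffer):
            raise MemoryError("arena exhausted; size it for the worst case at init")
        self._offset = end
        # Slicing a memoryview creates a small view object but no new backing memory.
        return self._view[start:end]

    def reset(self):
        """Make the whole block reusable for the next inference step."""
        self._offset = 0

    @property
    def used(self):
        return self._offset

arena = PreallocatedArena(1 << 20)       # reserve 1 MiB once, at init
for step in range(3):                    # e.g. three decode steps
    logits = arena.alloc(4 * 32_000)     # hypothetical scratch sizes
    scratch = arena.alloc(4 * 4096)
    arena.reset()                        # reuse the same memory next step
```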

MetalRT is the first engine to handle all three modalities natively on Apple Silicon. Full methodology:

LLM benchmarks: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...

Speech benchmarks: https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t...

How: Most inference engines add layers between you and the GPU: graph schedulers, runtime dispatchers, memory managers. MetalRT skips all of it. Custom Metal compute shaders for quantized matmul, attention, and activation - compiled ahead of time, dispatched directly.
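The post doesn't describe MetalRT's quantization format. As a hedged illustration of what a fused quantized-matmul kernel computes, here is a pure-Python sketch of generic 4-bit group quantization with a per-group scale and minimum (similar in spirit to common 4-bit formats, but not any specific one). The fused dot product never materializes dequantized weights; it uses the identity in the comment, which is the kind of work a GPU kernel does in registers. The group size of 4 and the weights are arbitrary, chosen only to keep the example small.

```python
GROUP_SIZE = 4  # hypothetical; real formats typically use groups of 32 to 128
BITS = 4

def quantize_row(row, group_size=GROUP_SIZE, bits=BITS):
    """Quantize one weight row into a list of (scale, minimum, codes) per group."""
    levels = (1 << bits) - 1  # 15 steps for 4 bits
    groups = []
    for start in range(0, len(row), group_size):
        chunk = row[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        scale = (hi - lo) / levels or 1.0  # constant group: any nonzero scale works
        codes = [min(levels, max(0, round((w - lo) / scale))) for w in chunk]
        groups.append((scale, lo, codes))
    return groups

def fused_dot(groups, x, group_size=GROUP_SIZE):
    """Dot product against quantized weights without dequantizing them first.
    Uses sum_j (q_j*scale + lo) * x_j == scale * sum_j q_j*x_j + lo * sum_j x_j."""
    total = 0.0
    for g, (scale, lo, codes) in enumerate(groups):
        xs = x[g * group_size:(g + 1) * group_size]
        total += scale * sum(q * xi for q, xi in zip(codes, xs)) + lo * sum(xs)
    return total

def quantized_matvec(quantized_rows, x, group_size=GROUP_SIZE):
    """y = W x, one fused dot product per quantized row."""
    return [fused_dot(groups, x, group_size) for groups in quantized_rows]

W = [[0.1, -0.2, 0.3, 0.05, 1.0, -1.0, 0.5, 0.25],
     [0.4, 0.4, 0.4, 0.4, -0.3, 0.2, 0.0, 0.1]]
x = [1.0, 2.0, -1.0, 0.5, 0.25, 1.0, -0.5, 2.0]
approx = quantized_matvec([quantize_row(r) for r in W], x)
exact = [sum(w * xi for w, xi in zip(r, x)) for r in W]
print(approx, exact)
```

Each dequantized weight lies within half a quantization step (`scale / 2`) of the original, which bounds the error of every output element.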

Voice pipeline optimization details: https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai...

RAG optimizations: https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retr...

RCLI is the open-source voice pipeline (MIT) built on MetalRT: three concurrent threads with lock-free ring buffers, double-buffered TTS, 38 macOS actions by voice, local RAG (~4 ms over 5K+ chunks), 20 hot-swappable models, and a full-screen TUI with per-op latency readouts. Falls back to llama.cpp when MetalRT isn't installed.
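A single-producer/single-consumer (SPSC) ring buffer is the usual building block behind lock-free hand-offs between pipeline threads (e.g. STT thread to LLM thread). This Python sketch shows the index discipline only. In C, C++, or Swift the head and tail indices would be atomics with acquire/release ordering; here the code relies on CPython's GIL, so it is illustrative rather than a performance claim. The class and function names are hypothetical, not RCLI's.

```python
import threading
import time

class SPSCRingBuffer:
    """Bounded single-producer/single-consumer queue with no lock.

    Only the producer writes `_head` and only the consumer writes `_tail`, so the
    two threads never contend for the same index. One slot is kept empty so that
    head == tail unambiguously means "empty".
    """

    def __init__(self, capacity):
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to write (producer-owned)
        self._tail = 0  # next slot to read (consumer-owned)

    def try_push(self, item):
        nxt = (self._head + 1) % len(self._slots)
        if nxt == self._tail:
            return False          # full: caller decides whether to spin or drop
        self._slots[self._head] = item
        self._head = nxt          # publish only after the slot is filled
        return True

    def try_pop(self):
        if self._tail == self._head:
            return False, None    # empty
        item = self._slots[self._tail]
        self._slots[self._tail] = None
        self._tail = (self._tail + 1) % len(self._slots)
        return True, item

def run_pipeline_demo(n=1000, capacity=8):
    """One producer thread and one consumer thread passing n items through the ring."""
    ring = SPSCRingBuffer(capacity)
    received = []

    def producer():
        for i in range(n):
            while not ring.try_push(i):
                time.sleep(0)     # full: yield to the consumer

    def consumer():
        while len(received) < n:
            ok, item = ring.try_pop()
            if ok:
                received.append(item)
            else:
                time.sleep(0)     # empty: yield to the producer

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

received = run_pipeline_demo()
print(len(received), received[:3])
```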

Source: https://github.com/RunanywhereAI/RCLI (MIT)

Demo: https://www.youtube.com/watch?v=eTYwkgNoaKg

What would you build if on-device AI were genuinely as fast as cloud?

rushingcreek 2 minutes ago

Very cool, congrats! I'm curious how you were able to achieve this given Apple's many undocumented APIs. Does it use private Neural Engine APIs or fully public Metal APIs?

Either way, this is a tremendous achievement and it's extremely relevant in the OpenClaw world where I might not want to have sensitive information leave my computer.

vessenes 2 hours ago

Just tried it. Really cool, and a fun tech demo with rcli. I filed a bug report; not everything is loading properly when installed via Homebrew.

Quick request: Unsloth quants; bit for bit they're usually better. Or, more generally, a UI for Hugging Face model selection. I understand you won't be able to serve everything, but I want to mix and match!

Also - grounding:

"open safari" (safari opens, voice says: "I opened safari") "navigate to google.com in safari" (nothing happens, voice says: "I navigated to google.com")

Anyway, really fun.

Tacite 30 minutes ago

How did you try it? You said on github it doesn't work.

wlesieutre 10 minutes ago

They said it didn't work installed from homebrew, so I assume they went back and did the curl | bash install option

Tacite a minute ago

focusgroup0 39 minutes ago

The fact that Apple didn't ship this in the years after the Siri acquisition is an indictment of its product leadership.

liuliu 19 minutes ago

This is not different from mlx-lm other than it uses a closed-source inference engine.

jonhohle an hour ago

If I send a Portfile patch, would you consider MacPorts distribution?

AmanSwar 44 minutes ago

yes please

alfanick 2 hours ago

I'm not looking for STT->AI->TTS, I'm looking for a truly good voice-to-text experience on Linux (and others). Siri/iOS-Dictation is truly good when it comes to understanding the speech. Something at this level on Linux (and others) would be great: yeah, always listening, maybe sending the data somewhere, but give me the UX (hidden latency, optimizing for the first characters recognized) as a good (virtual) input device.

coder543 an hour ago

> Siri/iOS-Dictation is truly good when it comes to understanding the speech.

What...? It is terrible, even compared to Whisper Tiny, which was released years ago under an Apache 2.0 license so Apple could have adopted it instantly and integrated it into their devices. The bigger Whisper models are far better, and Parakeet TDT V2 (English) / V3 (Multilingual) are quite impressive and very fast.

I have no idea what would make someone say that iOS dictation is good at understanding speech... it is so bad.

For a company that talks so much about accessibility, it is baffling to me that Apple continues to ship such poor quality speech to text with their devices.

derefr 42 minutes ago

Maybe they have exactly the accent iOS dictation was trained to recognize.

swindmill an hour ago

Have you tried https://handy.computer ?

DetroitThrow 2 hours ago

Wow, this is such a cool tool, and love the blog post. Latency is killer in the STT-LLM-TTS pipeline.

Before I install, is there any telemetry enabled here or is this entirely local by default?

bigyabai an hour ago

Don't give RunAnywhere your GitHub: https://news.ycombinator.com/item?id=47163885

shubham2802 2 hours ago

Fully local - no data is collected!!

computerex an hour ago

Amazing, this is what I am trying to do with https://github.com/computerex/dlgo

stingraycharles 2 hours ago

I’m a bit confused by what you’re offering. Is it a voice assistant / AI as described on your GitHub? Or is it more general purpose / LLM ?

How does the RAG fit in? Voice-to-RAG seems a bit random as a feature.

I don’t mean to come across as dismissive, I’m genuinely confused as to what you’re offering.

glitchc an hour ago

From the TFA: Document Intelligence (RAG): Ingest docs, ask questions by voice — ~4ms hybrid retrieval.

Seems pretty clear. You can supply documents to the model as input and then verbally ask questions about them.
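The "~4ms hybrid retrieval" isn't spelled out in this thread. One common way to build hybrid retrieval is to rank chunks separately by keyword match and by embedding similarity, then merge the two rankings with reciprocal rank fusion (RRF). The sketch below assumes that approach; the naive keyword scorer, the toy 2-d embeddings, and the corpus are illustrative, and k=60 is the constant commonly used with RRF, not a value taken from RCLI.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def keyword_score(query, text):
    """Naive lexical score: how many words in the chunk also appear in the query."""
    terms = set(query.lower().split())
    return sum(1 for word in text.lower().split() if word in terms)

def rank(scores):
    """Chunk ids ordered best-first by score."""
    return sorted(scores, key=scores.get, reverse=True)

def reciprocal_rank_fusion(rankings, k=60):
    """Merge rankings: each list contributes 1 / (k + position) per chunk."""
    fused = {}
    for ranking in rankings:
        for position, chunk_id in enumerate(ranking, start=1):
            fused[chunk_id] = fused.get(chunk_id, 0.0) + 1.0 / (k + position)
    return rank(fused)

def hybrid_search(query, query_vec, chunks, top_k=3):
    """chunks maps chunk id -> (text, embedding)."""
    lexical = rank({cid: keyword_score(query, text) for cid, (text, _) in chunks.items()})
    semantic = rank({cid: cosine(query_vec, emb) for cid, (_, emb) in chunks.items()})
    return reciprocal_rank_fusion([lexical, semantic])[:top_k]

# Toy corpus with made-up 2-d "embeddings"; a real system would use an embedding model.
chunks = {
    "pie": ("apple pie recipe", [1.0, 0.0]),
    "engine": ("car engine repair", [0.0, 1.0]),
    "orchard": ("apple orchard tour", [0.9, 0.1]),
}
print(hybrid_search("apple recipe", [1.0, 0.0], chunks, top_k=2))
```

RRF only needs ranks, not comparable scores, which is why it is a popular way to combine lexical and vector results.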

drcongo 2 hours ago

I came to the comments here to see if anyone had worked out what it is, so you're not alone.

Tacite 2 hours ago

Doesn't work. " zsh: segmentation fault rcli"

esafak an hour ago

You could share your setup details, on GH if not here, to make it actionable.

Tacite 33 minutes ago

I did on Github. This looks vibecoded? EDIT: Dev is using Claude Code as stated in their github updates.

tristor 2 hours ago

> What would you build if on-device AI were genuinely as fast as cloud?

I think this has to be the future for AI tools to be truly useful. The things that are truly powerful are not general-purpose models that have to run in the cloud, but specialized models that can run locally and on constrained hardware, so they can be embedded.

I'd love to see this able to be added in-path as an audio passthrough device, so you can add on-device native transcription to any application that does audio, such as video conferencing applications.

tiku an hour ago

Personally I'm so disappointed about the state of local AI. Only old models run "decent", but decent is way too slow to be usable.

j45 30 minutes ago

"Apple M3 or later required. MetalRT uses Metal 3.1 GPU features available on M3, M3 Pro, M3 Max, M4, and later chips. M1/M2 support is coming soon. On M1/M2, RCLI automatically falls back to the open-source llama.cpp engine."

Tacite 15 minutes ago

Funny you mention that, because on their GitHub they just pushed an update to say that it didn't work on M3 and M4.
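The fallback rule j45 quotes (MetalRT on M3 or later, llama.cpp otherwise, and per the launch post also when MetalRT isn't installed) can be sketched as a small selection function. This is a hypothetical illustration, not RCLI's code: the function names are made up, and the chip name is assumed to look like the string macOS reports via `sysctl -n machdep.cpu.brand_string` on Apple Silicon (e.g. "Apple M3 Max").

```python
import re

MIN_METALRT_GENERATION = 3  # "Apple M3 or later required" per the quoted README

def apple_silicon_generation(chip_name):
    """Return N for names like 'Apple M3 Max', or None for anything else."""
    match = re.match(r"Apple M(\d+)\b", chip_name)
    return int(match.group(1)) if match else None

def choose_backend(chip_name, metalrt_installed):
    """Prefer MetalRT on M3+ when it is installed; otherwise fall back to llama.cpp."""
    generation = apple_silicon_generation(chip_name)
    if metalrt_installed and generation is not None and generation >= MIN_METALRT_GENERATION:
        return "metalrt"
    return "llama.cpp"

print(choose_backend("Apple M4 Max", metalrt_installed=True))  # metalrt
print(choose_backend("Apple M1 Pro", metalrt_installed=True))  # llama.cpp
```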

john_strinlai an hour ago

i knew i recognized this name from somewhere.

they are a company that registers domains similar to their main one, and then uses those domains to spam people they scrape off of github without affecting their main domain reputation.

edit: here is the post https://news.ycombinator.com/item?id=47163885

Imustaskforhelp an hour ago

Yup. The craziest aspect was that they had intentionally bought the domain just 1 month prior to that whole fiasco.

Maybe it's just (n=2), and only the two of us remember this fiasco, but I don't agree with that. I don't really understand how this got so many upvotes in such a short time, especially given its history of not doing good things, to say the very least... I am especially skeptical of it.

Thoughts?

Edit: I looked deeper into Sanchit's Hacker News account and found that 3 days ago they posted the same thing, as far as I can tell. The only difference is that it used the runanywhere.ai domain rather than github.com/runanywhere, but this could very well be because on Hacker News you can't submit the same link twice within a short period, so they are definitely skirting that rule by posting the GitHub link.

Another point: that post (https://news.ycombinator.com/item?id=47283498) has been stuck at 5 points until now (at the time of writing).

So this got a lot crazier now, which is actually wild.

john_strinlai an hour ago

i unfortunately dont know enough about vote patterns on hn, or what is expected/normal voting behavior.

what i do know is that their name is etched into my mind under the category of "shady, never do business with them".

Imustaskforhelp an hour ago

david_shaw 39 minutes ago

I think the title should read "RunAnywhere," not "RunAnwhere."

Imustaskforhelp 35 minutes ago

Dang has changed the title, and it seems he may have made a minor error doing it. Must have been a typo on his side, and that's okay! I think Dang will update it sooner rather than later.

Edit: just reloaded, it's fixed now.

Imustaskforhelp an hour ago

I am just gonna link the stats of this Hacker News post[0] and let the public decide the rest, because for context, this is the same company that was mentioned in a blow-up post 12 days ago which got 600 upvotes, and they didn't respond back then[1]. (I have rarely seen posts get such a 2x factor within minutes of posting; that's just my personal observation. Usually one gets it after an hour or two or three.)

I was curious, so I did some more research into the company and found more shady stuff going on, like intentionally buying new domains a month prior to sending that spam so that their main website's mail reputation wouldn't suffer. You can read my comment here[2]

Just to be on the safe side here, @dang (yes, pinging doesn't work, but still), can you give us some average stats on the people who upvoted this, and run an internal investigation into whether botting was done? I could be wrong about it, and I never mean to harm any company, but in good faith I can't understand this.

Some stats I would want are: average karma, words written, and creation date of the accounts that upvoted this post. I'd also like to know what the conclusion of an internal investigation is, if one takes place.

[There is a bit of a conflict of interest, with this being a YC product, but I trust the Hacker News moderators and dang to do what's right.]

I am just skeptical, that's all, and this is my opinion. I just want to provide some historical context into this company and I hope that I am not extrapolating too much.

It's just really strange to me, that's all.

[0]: https://news.social-protocols.org/stats?id=47326101 (see the expected upvotes vs real upvotes and the context of this app and negative reception and everything combined)

[1]: Tell HN: YC companies scrape GitHub activity, send spam emails to users: https://news.ycombinator.com/item?id=47163885

[2]: https://news.ycombinator.com/reply?id=47165788

dang an hour ago

The upvotes on the current post are fine - the reason you saw the submission rise in rank is that startup launch posts by YC startups get special placement on the front page (this is in the FAQ: https://news.ycombinator.com/newsfaq.html). Not every such post does, but some do.

In other words, your perception wasn't wrong, but the interpretation was off. I've put "Launch HN" and "YC W26" back in the title to make that clearer - I edited them out earlier, which was my mistake.

As for the booster comments, those are pretty common on launch threads and often pretty innocent - most people who aren't active HN users have no idea that it's against the rules. We do our best to communicate about that, but it's not a cardinal sin—there are far worse offenses.

john_strinlai 28 minutes ago

hi dang. while you are here -- are comments artificially ordered on this post?

https://news.ycombinator.com/item?id=47326953 is grey (i.e <=0 karma). my top-level comment is at 14 karma. we posted within 15 minutes of each other. their comment is higher up the page. ive never seen something like that before.

the two posts calling out unethical behavior have been living at the bottom of this post the entire time, until a couple of actually [flagged] comments ended up under them.

(edit: 16 karma now, and still lower than a 0 karma comment posted at roughly the same time)

i do not care about the karma itself, at all. but i do care to know whether launch/show posts' comment sections have cherry-picked ordering or organic ordering.

Imustaskforhelp 15 minutes ago

Imustaskforhelp 37 minutes ago

Thanks dang, but can you please explain the two accounts that wrote very short comments, one completely new and the other only 7 months old, both apparently active only in this thread?

Clearly I am not the only one here, as john_strinlai seems to have reached somewhat the same conclusion as me.

Dang, I know you care about this community, so can you please say more about what you think of this in particular as well?

I understand that YC companies get preferential treatment. Fine by me. But this feels like something larger to me.

I have written up everything I could find in this thread, from the same post being submitted here 3 days ago with the runanywhere.ai link to it now being changed to GitHub to skirt the HN rule that the same link can't be posted within a short period.

This feels somewhat intentional, just like the spam issue. I hope you understand what I mean.

(If you also feel suspicious, can you then do a basic analysis/investigation with all of these suspicious points in mind and upload the results anonymously if possible?)

I wish you a nice day, and I'm waiting for your thoughts on all of this.

dsalzman an hour ago

[flagged]

iharnoor an hour ago

Lets go!!

Imustaskforhelp an hour ago

This is a 7-month-old account which has only responded to this particular comment.

And sorry to say, but I don't think that "Lets go!!" is a valid comment; this makes me even more suspicious.

Especially given the history and suspicions I already had.
