Hacker News

by Ryan Harman

OpenAI unveils its first custom chip, built by Broadcom (techcrunch.com)

793 points by jamdesk a day ago

Announcement: https://openai.com/index/openai-broadcom-jalapeno-inference-...

https://decrypt.co/371971/openai-broadcom-jalapeno-first-cus...

https://www.cnn.com/2026/06/24/tech/openai-broadcom-jalapeno...

sharkjacobs a day ago

> Developed from design to production in nine months, accelerated by OpenAI’s models

> the use of OpenAI models to accelerate parts of the design and optimization process.

I wish there was more about this. As is I kind of have to assume that this is just meaningless marketing, like saying development was accelerated by Microsoft Office or their 5k LG Ultrafine 40-inch monitors.

Like, if this was as big a deal as it kind of vaguely implies, they would be making a bigger deal of it, right?

zgao a day ago

Chip CEO here. It really depends on what "design" or "production" means. Does "design" mean that the design was complete? Does "production" mean the beginning of production, i.e. tapeout? If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip. If measuring from concept (no RTL at all, block diagram of architecture) to tapeout, this is an amazing timeline. The truth is probably somewhere in between. A more concrete statement would use actual technical milestones and gates.

otterdude a day ago

Not a chip CEO, but I read this article and thought that they're working on some kind of application specific chip only for serving models. Similar to how an FPGA can optimize certain tasks.

Given constant weights / biases of a Transformer / DNN you could use pipelining to feed forward calculations through the array one layer at a time. For DNNs with thousands of layers you might see 1:1 speed up per layer channel.

I doubt they would undergo this process for marginal gains.

kmacdough 8 hours ago

zgao 20 hours ago

xdavidliu a day ago

pama 16 hours ago

If you look at the timelines for the hiring of the hardware team, this was an extremely fast and high risk implementation from concept to tapeout. Amazing it works at all during bringup.

nonethewiser a day ago

>If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip.

Even for a company’s first design?

hailwren a day ago

zgao 20 hours ago

formerly_proven a day ago

Aurornis a day ago

The hardware description languages (HDL) used in chip development are like programming languages. The existing models understand them and can do a lot with them. You don’t need to have separate, specialty models designed for this work to use LLMs in chip design workflows.

Design verification also involves a lot of traditional programming which benefits from LLMs.

So it’s not meaningless at all. You could download some of the open source chip design software today and the LLMs could even help you get started on your own tiny chip if you are so interested.

knicholes a day ago

I tried making a button using Claude entirely (including the 3D printed enclosure) and it effed up pretty hard with the traces and the header spacing. The project was a big red arcade button that plays the "ah-my-groin.mp3" when pushed (from Simpsons). It did cool work on saving battery life, and the 3d enclosure was awesome, but yeah, I'm convinced I'd have to do another version or two of the custom chip until it came back right. I used a Blender MCP for the 3d modeling. I used a KiCAD MCP server for the chip design/validation.

I think we're not there yet. I've been meaning to look at this flux.ai to see if it has the prompts/workflow worked out better than what I was able to cobble together in a few hours. Maybe Alteryx's MCP server would have been better. I'll try that this weekend for another board I've got.

Aurornis a day ago

rpcope1 a day ago

chamomeal 13 hours ago

ses1984 a day ago

The question isn’t whether or not they employed a particular tool, the question is how big of an impact did it have.

nradov a day ago

Most HDL code is locked up behind corporate firewalls and not available as training data. While LLMs can handle it to an extent there's a lot of room for improvement. I'll bet that OpenAI and their competitors are racing to license this IP from major hardware vendors in order to compete in the chip design vertical.

tonfa 20 hours ago

bsder 19 hours ago

doxeddaily a day ago

This reminds me of the dude on youtube building a chip fab in his shed.

einpoklum 5 hours ago

> The existing models understand them

No they don't.

holoduke 19 hours ago

One day we can design our own pcb with chips, hardware and other io. Companies will accept these as files and you can collect your pcb the same day. I think in China they are doing this already

remexre 16 hours ago

IshKebab a day ago

> The existing models understand them and can do a lot with them.

In my experience they are not especially good at SystemVerilog. There's a lot of knowledge about it that is locked behind paywalls and it's very niche.

My guess is the "from scratch" here is quite the exaggeration. Otherwise why did they need Broadcom?

whynotminot a day ago

aseipp 21 hours ago

cloudengineer94 7 hours ago

aurareturn 21 hours ago

Broadcom already has a ton of IP for AI SoCs. I'm guessing the hard parts of this inference chip were already designed by Broadcom and OpenAI simply told Broadcom what it wanted. It's likely very similar to Google's TPU.

> Early testing shows that the first-generation accelerator will deliver performance per watt substantially better than current state-of-the-art

What is substantial here? Vera Rubin is shipping in volume later this year and it is expected to be 10x more power efficient for inference than Blackwell.[0] Even if they've already taped out the chip, getting bugs fixed, getting chips manufactured, getting HBM allocation, getting a rack design, hooking them up together, putting them in a data center will likely take at least another 12 months or likely more. By the time this chip is in data centers in volume, they're likely competing against Vera Rubin Ultra or maybe even Feynman.

Personally, I don't think OpenAI should have invested in this project. It's too early for them. They should have focused on models like Anthropic and won there. When they're profitable, they can take on these projects.

The risk here is very high for OpenAI because AI has a hard cap in energy. If you have a gigawatt, you should only install the best chips. If Nvidia's chips are better, then this is a wasted project and likely wasted billions.

[0]https://developer.nvidia.com/blog/scaling-token-factory-reve...

AtlasBarfed an hour ago

"Substantial" seems like a damning word.

So one of my pet theories I haven't seen in general discourse is that AI came from the massive vector processing jump available commercially in GPUs when it left CPU bound processing behind. That's a factor of 100x-1000x of processing power.

AI is not-quite-there, and to get even another leap might take another 10-100x processing power.

Now... what? ASICs probably won't deliver even a 10x? There's only so much you get out of node shrinks.

"Substantial" doesn't even mean twice IMO. "Substantial" almost sounds like ... 15% better?

cptskippy 21 hours ago

Why do you assume Broadcom has a ton of IP for AI SoCs but hasn't done any of the other work around data center scale deployments?

aurareturn 20 hours ago

dofm a day ago

Right. There are two possible meanings and shades in-between:

1) OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)

2) OpenAI designed test/verification models and kernels that could be run on the simulated hardware to test its performance

As you and others have said, it's hard to trust when they are happy to write something that could easily only mean the latter but sounds like the former.

lovasoa a day ago

3) The engineers working on the chip used ChatGPT from time to time.

Catloafdev a day ago

fl4regun a day ago

reducesuffering a day ago

wongarsu a day ago

Or OpenAI accelerated the design and optimization process by summarizing emails exchanged during the design and optimization process, or made it possible to ask an AI questions about meeting notes

Aurornis a day ago

> 1) OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)

Chip design languages (HDLs like Verilog or VHDL) are well understood by LLMs. They don’t need specialty tools to use GPT-5.5 or other LLMs with them.

You could even try it yourself with open source chip design tooling if you wanted to see it.

dofm a day ago

dpe82 a day ago

wmf a day ago

https://dl.acm.org/doi/10.1145/3785362

https://developer.nvidia.com/culitho

https://www.synopsys.com/blogs/chip-design/analog-layout-syn...

https://arxiv.org/abs/2302.06415

etempleton a day ago

I feel like they would be very specific if it was no.1.

scrollop a day ago

Perhaps they used gpt 5.5 mini to draft emails. Create a coffee schedule.

oceanplexian a day ago

> OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)

Why is that a bold and unlikely claim?

Are you saying that AI, which has been proven to cure diseases, solve our hardest math problems, write complex computer code and generate entire generated worlds and HD video from a simple prompt would somehow be like, my bad, I guess I can't design chips?

smokel a day ago

dofm a day ago

cess11 a day ago

nixon_why69 a day ago

There is a lot of verilog out there, it's pretty feasible that they had AI assistance writing more to design their chip.

It doesn't have to be revolutionary, it could just be AI-assisted design and lined up well enough with their operations for a custom ASIC to be worth it.

KeplerBoy a day ago

Also there's so much boilerplate around everything. Writing a testbench with codex is extremely feasible. This is the kind of verifiable feedback loop the agents shine at.

u1hcw9nx 7 hours ago

Written with AI is the new written in Rust. Both are nonsensical statements and tell nothing about the quality of the software.

Without context, both are warnings about the quality of the developers.

blitzar 21 hours ago

> the use of email, spam filters and spellchecker to accelerate parts of the design and optimization process

honestly you don't realise how much more efficient it is until you are stuck using the wrong flavour of outlook, the spam filter breaks or sloppy spelling, punctuation and grammar force you to clarify details needlessly.

nickvec 21 hours ago

I feel like "the use of OpenAI models to accelerate parts of the design and optimization process" just means that engineers were using ChatGPT to sanity check their designs and suggest potential optimizations, though that's just my take (and I'm quite cynical about AI marketing in general!)

Kiro 8 hours ago

I think this kind of "hard work" is a perfect fit for AI, and something where the complexity for a human is incorrectly extrapolated to LLMs.

Tirelessly wading through heaps of specifications and documentation with very clear goal definitions is hard for a human but easy for an AI. Meanwhile, taking UX and edge cases into account in a business application is easy for a human but hard for an AI.

SCUSKU 21 hours ago

My girlfriend works at Broadcom doing chip design, and based on what she's told me they JUST got claude code like 3 weeks ago, so I really doubt this means anything beyond them vibe coding some scripts or something...

figassis a day ago

VHDL, VLSI are well-documented languages, with well-built test and verification frameworks and harnesses. Even just by iteration you could get there if you have the money to pay for it.

FanaHOVA a day ago

NVIDIA already designs most of their chips using AI. Why would you assume it's meaningless marketing?

fecal_henge a day ago

Perhaps because they are suggesting what they are doing is novel.

DoctorOetker a day ago

seydor a day ago

realistically, how hard are AI accelerators to design?

WithinReason 8 hours ago

The hardware? Not too difficult, there are dozens of startups. The software? Only NVIDIA could do it so far sufficiently well.

sentinalien 5 hours ago

therealcamino 14 hours ago

Uh, pretty hard?

HarHarVeryFunny a day ago

I would assume they've already made as big a deal of it as they can without outright lying too much. Read the rest of the press release.

FWIW, Google is now on their 8th generation TPU, having put out the last 4 generations on a 1-year cadence.

davidpapermill 2 hours ago

> Google is now on their 8th generation TPU

Remarkable that the TPU pre-dates the attention paper. Was a solid bet on energy efficient dense matrix multiplication and has stood the test of time.

xnx a day ago

AlphaChip is what a chip design with AI is. I'm very suspicious that OpenAI has anything like this or they would be bragging about it.

https://deepmind.google/blog/how-alphachip-transformed-compu...

shellcromancer a day ago

Probably obvious but still omitted in the OpenAI post: chips are being made by TSMC [1]. Wasn't sure if Intel got it.

1. https://www.investing.com/news/stock-market-news/openai-unve...

HarHarVeryFunny a day ago

I just read a claim on Twitter that the reason these companies (Google and Amazon as well as OpenAI) are using Broadcom isn't just for design expertise, but because Broadcom have allocation agreements in place with TSMC and the memory manufacturers.

alephnerd a day ago

Most design partners have allocation agreements. The thing is Broadcom is an absolute GIANT in the ASIC design space, and its closest competitor Marvell is a fraction of its size.

There are a lot of large tech companies that most of HN has never heard about that completely dominate entire segments.

ahartmetz a day ago

...and because most hardware sales except AI accelerators are down due to RAM prices, Broadcom probably can't otherwise use their allocation at TSMC.

NavinF 21 hours ago

a_conservative a day ago

I recently put 2+2 together.

Broadcom has become wealthy by being Google's TPU hardware partner, including sharing their TSMC capacity with Google, and evidently now they are doing the same thing with OpenAI. What a brilliant way to take advantage of the AI gold rush!

I wish they weren't using their piles of money to extort money out of the software industry like they are with VMWare and Bitnami.

kccqzy 21 hours ago

Well Google has reduced reliance on Broadcom already. They found a new hardware partner, MediaTek, that’s probably much, much cheaper than Broadcom.

https://finance.yahoo.com/sectors/technology/articles/broadc...

mschuster91 19 hours ago

alephnerd a day ago

> Broadcom has become wealthy by being Google's TPU hardware partner...

Kinda, but not exactly.

Broadcom cornered the enterprise infra and security market in the late 2010s and early 2020s after acquiring CA Technologies, BMC (EDIT: Did NOT acquire them, they were considering it back in 2018 but decided against it and KKR ended up acquiring them), Symantec (which they bought instead of BMC), and VMWare and were able to make a strong cybersecurity story during the late 2010s cybersecurity and SaaS boom.

That gave them plenty of cashflow that helped subsidize their hardware business when hardware was not viewed as hot as it is today.

Additionally, Broadcom is GCP's marquee customer and has been for a little under a decade so they were able to make a sweetheart deal where all the software businesses at Broadcom would be exclusively using GCP and in return GCP would be working with Broadcom to design its silicon and source infra needed for their DC buildouts.

Ironically, the DoJ blocking Broadcom's acquisition of Qualcomm was the best thing it ever could have done for Broadcom, because it gave Broadcom the dry powder to dominate the Enterprise SaaS and build a strong niche in the cybersecurity space.

> piles of money to extort money out of the software industry

From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.

Working in an industry that historically had to deal with high commodification, low margins, and long tail sales leads to leadership that can execute. Additionally, no one climbs the leadership ladder without having spent years as a line-level engineer, but that's true for software as well to an extent.

Edit: can't reply

> Did they acquire also BMC?

Nope.

Broadcom was considering acquiring them in 2018 but decided not to go through with the opportunity and KKR jumped in.

vb-8448 a day ago

a_conservative a day ago

nickpinkston a day ago

This is very cool to see - seems like soooo much efficiency waiting to be unlocked at the chip level.

What's everyone think of Taalas?

They're actually burning the LLM model into the silicon, with some onboard memory for fine-tuning. They claim huge cost / latency wins.

Super fast demo live at: https://chatjimmy.ai/

https://taalas.com/

https://www.reddit.com/r/singularity/comments/1r9frzk/taalas...

jsenn 5 hours ago

Their demo is almost unbelievably fast, but as I understand it, the limitation of Taalas's strategy is KV-cache. This grows with context length, so either needs to be stored in SRAM (small) or streamed in (slow). Even for a tiny model like the Llama 8B they have in their demo, the KV cache will be ~64 KB per token at 8-bit quantization, so at a 1,000-token sequence length you are already at 64 MB of SRAM for a single user. This is probably why their demo only lets you generate 1,000 tokens: they can't go beyond that without slowing down inference.
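The per-token figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a Llama-3-8B-style shape (32 layers, 8 KV heads with grouped-query attention, head dimension 128); those numbers are assumptions for illustration and are not stated in the comment.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache added per generated token.

    Each layer stores one key vector and one value vector per KV head,
    hence the factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Assumed model shape (hypothetical, Llama-3-8B-like) at 8-bit quantization.
per_token = kv_cache_bytes_per_token(
    n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=1
)
total_1k = per_token * 1000  # cache size for a 1,000-token sequence

print(per_token)         # 65536 bytes, i.e. 64 KiB per token
print(total_1k / 2**20)  # 62.5 MiB, roughly the 64 MB quoted above
```

The cache grows linearly with sequence length and with the number of concurrent users, which is why it competes with the weights for on-chip memory.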

So I'm curious what their strategy is. It seems to me that the options are:

1. Target smaller usecases that can live with a tiny context window

2. Use huge amounts of SRAM (at which point they look like Groq or Cerebras)

3. Make it up with extreme KV-cache compression/quantization

4. Run linear-attention/sliding window attention models

Other commenters have mentioned robotics as a potential application, which sounds interesting.

kccqzy 21 hours ago

> seems like soooo much efficiency waiting to be unlocked at the chip level

Well if you are exclusively using GPUs that are general purpose, of course you leave so much efficiency on the table. That’s why Google started making TPUs more than a decade ago. I remember that kerfuffle when Google fired Timnit Gebru when Gebru’s paper used GPUs to calculate the environment impact of LLMs while ignoring the efficiency of TPUs; this basically made Jeff Dean very angry due to that wide efficiency gap.

redox99 16 hours ago

These NVIDIA GPUs aren't general purpose in the way that you think. They can't even run games. Nvidia Blackwell is probably slightly more efficient than TPUs for training. Do you really expect a $4 trillion company with the majority of its revenue being AI for some years now, not to have built its flagship product fully around AI? The GPU name stuck around, but they are pretty terrible at graphics.

The real efficiency win in these chips is that they are made for inference only. You can throw away the vast majority of a chip if you only need a few ops, a single precision (like INT8 or FP8) and don't need ultra fast interconnects.

jacques_chester 21 hours ago

That ... wasn't the kerfuffle

janalsncm 17 hours ago

Herring 18 hours ago

qnleigh 10 hours ago

I haven't read any of these papers, but given the environmental impact of LLMs in 2026, it seems like Timnit Gebru has been thoroughly vindicated...

Catloafdev a day ago

It'd be cool to see more of this type of thing, but I have to imagine the ability for it to be updated to a brand-new model as new models come out is limited. If that is the case, it's going to be an extremely hard sell.

NitpickLawyer a day ago

> extremely hard sell.

It really depends on the pricepoint at which they can get a board. If they can do a ~32B model for 1k$ and a size of an external HDD, I'd buy one now, even knowing that it won't be upgradeable / the model remains fixed. The speeds they've shown are a quality of its own, and there's plenty you can do with such a model and faster than instant responses.

nemonemo 21 hours ago

runeks 6 hours ago

If performance per watt is 100x better than GPUs (as GP link claims) then I don't think it's a hard sell at all. That's actually a cost reduction that matters.

empath75 a day ago

You don't need SOTA models for all tasks, and being able to do more routine tasks at something like 10% of the cost and 70x speed unlocks LLM use for things that are just unthinkable now (bulk classification tasks, real time speech interaction, etc)

wongarsu a day ago

A hard sell right now. The rate of change will slow down

gpm a day ago

ianm218 21 hours ago

cmrdporcupine a day ago

I think the model they chose is out of date and hard to sell, but there are plenty of use cases where today's dumb small models are fine. A Qwen 3.5/3.6 or Gemma 3 model on silicon at those speeds would be genuinely world changing even if it's only 1-3B params. Such a model at those speeds will remain extremely useful even over a 5-6 year timespan, I think.

If you consider the places you could deploy it -- with no network access, and at those high speeds... very useful .. for adding vague "common sense" fuzzy thinking to all kinds of applications that right now piss consumers off with poor UX. Esp if the model can do voice-to-text and text-to-speech well (some of the smaller models can)

crote a day ago

mdp2021 16 hours ago

martythemaniak a day ago

In a chatbot, 17k tok/s is a neat but nearly useless showcase. In a coding agent it is a meaningful improvement. In robotics, it could be an absolute revolution.

8B models aren't useful in general, but for specific use cases they can provide an enormous amount of intelligence - nVidia's Tesla/Waymo competitor is a 7B LLM with a 2B diffusion model, and running that at those speeds could be an order of magnitude cheaper than existing solutions.

hadlock a day ago

17K tok/s is approaching realtime motor cortex needs for a robot with ~12 actuators (bipedal humanoid) and an IMU. I don't know how many parameters a motor cortex would need but 8B feels like it is within 2 orders of magnitude.

nok22kon a day ago

cruffle_duffle 21 hours ago

Bumping the speed of these things would be more than meaningful. It would be a massive game changer.

I assert like 80% of this “multi agent parallel workflow” business is simply a workaround to models being soooooo slow. Like as the dude driving these things… you kick it off and twiddle your thumbs waiting minutes to hours sometimes for all the inference and token generation to finish. So you dispatch multiple workstreams in parallel to be more efficient.

I assert that if the model was even 10x faster we’d be using these things radically different. You’d be doing things that are currently time prohibitive. At 100x, holy shit will software dev get crazy. You’d be kicking off hundreds of parallel workers attacking a problem from every angle and stuff. Who even knows!!!

And the thing is, 10x will absolutely come and probably even 100x. And it will be sold like a video game cartridge or something depending on how the actual model gets “baked” into the hardware. No remote inference at all.

Imustaskforhelp a day ago

Could you give me some example how in robotics it can be an absolute revolution?

My understanding is that robotics doesn't really rely much on LLM's in the first place but rather other things.

Is the thing that you are suggesting that it would ingest all real time data and then reason through it at an incredibly fast speed and then act on it and re-iterate? I might imagine some problems with this though I am not a robotics engineer and perhaps someone who deeply understands this topic can give more information.

nok22kon a day ago

martythemaniak a day ago

typ 15 hours ago

Low latency is nice. But it would be more interesting if they could demonstrate the efficiency of energy consumption.

flumes_whims_ 3 hours ago

Tokens/seconds and watt-hours seem related?

rebeccajae 20 hours ago

It seems technically interesting, but they seem very sparse on details. I don't know if I like the idea of a single unchanging model forever on a chip. How much more expensive would the silicon be if they used rewritable ROM for the weights? Such an arrangement would permit fine-tunes of the model it was designed for, which might minimize concerns about the model becoming outdated.

mdp2021 17 hours ago

There is no memory storage of weights in the Taalas cards but translation of the weight multiplier into a circuit.

dcchambers 21 hours ago

I think hardware like this is the future for LLM-providers once we reach a point where the models aren't advancing much any more. You could argue we're close now.

The hyperscalers like AWS will make great use of these to serve up models that will be relevant for several years. But right now, we're still seeing significant bumps in model quality every couple of months - especially with open-weight models like Deepseek/Kimi/GLM.

Until that point, though, I don't see how this is ever going to be cost effective vs general purpose hardware.

I also think we'll see miniature versions of this baked into mobile hardware for super fast and efficient on-device LLMs.

WASDx 21 hours ago

I see only these two possibilities:

1. If LLMs keep improving, burning models onto silicon becomes obsolete too fast and is not worth doing. Outcome: We keep getting better LLMs.

2. If LLM improvements slow down, they will be burned onto silicon. Outcome: We get faster, cheaper and energy-efficient LLMs.

Either way sounds great to me. It will certainly be a mix so we can even get both.

londons_explore 21 hours ago

I wanna see an inference chip where the weights are part of the rom of the chip.

There would be 1 multiplier per weight (and since they're constant, the whole thing turns into a bunch of simple adders), and the total pipelined system throughput would be one token per clock cycle.
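To illustrate why a constant weight lets a multiplier collapse into adders: multiplying by a fixed integer is the same as summing shifted copies of the input, one per set bit of the constant. The sketch below is only an illustration with hypothetical helper names and non-negative integer weights; real synthesis flows may use cleverer encodings.

```python
def shift_add_terms(weight: int) -> list[int]:
    """Bit positions that are set in a non-negative constant weight."""
    return [i for i in range(weight.bit_length()) if (weight >> i) & 1]


def multiply_by_constant(x: int, weight: int) -> int:
    """Compute x * weight using only shifts and adds.

    In hardware the shifts are free (just wiring), so the cost is
    roughly one adder per set bit beyond the first.
    """
    return sum(x << shift for shift in shift_add_terms(weight))


# weight 10 is binary 1010, so x * 10 == (x << 1) + (x << 3)
print(shift_add_terms(10))          # [1, 3]
print(multiply_by_constant(7, 10))  # 70
```

A weight with few set bits (or a heavily quantized one) needs correspondingly few adders, which is part of the appeal of baking quantized weights into the circuit.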

That means you can probably have millions of users simultaneously using a single bit of silicon, with perhaps 500 million tokens per second coming out the output bus.
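A rough sketch of the throughput arithmetic, under assumptions not in the comment: a hypothetical 500 MHz clock and a hypothetical pipeline depth of 1,000 stages. Since each sequence's next token depends on its previous one, a single sequence can only have one token in flight, so the chip reaches one token per clock only when at least as many independent sequences as pipeline stages are interleaved.

```python
def per_sequence_tokens_per_second(clock_hz, pipeline_depth):
    """One sequence must wait for its token to traverse the whole pipeline."""
    return clock_hz / pipeline_depth


def aggregate_tokens_per_second(clock_hz, pipeline_depth, active_sequences):
    """Total output rate when independent sequences are interleaved.

    At most `pipeline_depth` tokens can be in flight at once, one per stage.
    """
    in_flight = min(active_sequences, pipeline_depth)
    return clock_hz * in_flight / pipeline_depth


CLOCK_HZ = 500_000_000  # assumed clock, chosen to match 500M tokens/s
DEPTH = 1_000           # hypothetical number of pipeline stages

print(per_sequence_tokens_per_second(CLOCK_HZ, DEPTH))      # 500000.0
print(aggregate_tokens_per_second(CLOCK_HZ, DEPTH, 1))      # 500000.0
print(aggregate_tokens_per_second(CLOCK_HZ, DEPTH, 5_000))  # 500000000.0
```

Under these assumptions the headline rate is an aggregate across many users, while any single user sees the clock rate divided by the pipeline depth.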

Downside is this chip would be huuuuge - a whole wafer.

Wafer level faults probably won't matter though - neural nets are resistant to a few missing or wrong weights.

Due to the speed the industry moves, you'd want to race from model weights to production super fast, make 50 wafers, use them for a year, then bin them when that model is obsolete.

sometimelurker 21 hours ago

this appeared some time ago, https://taalas.com/, but I'm sure there's others thinking these same thoughts. this would be best for small models imo, nothing frontier because that changes too fast

1e1a 20 hours ago

you can try it out here: https://chatjimmy.ai/

Meetvelde 16 hours ago

agazso 10 hours ago

Smaug123 21 hours ago

By the way, you've seen Cerebras? It's not gone as far as what you described - loads of cores and RAM but you still load up the weights onto it as software and they need to be streamed into the chip for large models - but it is a whole wafer.

trouve_search 21 hours ago

Cerebras is a whole lot of SRAM, basically a ton more L1/L2 cache, hence increasing throughput.

They're pretty supply constrained right now though and their production costs seem prohibitive.

The interesting players at the moment are from Toronto: taalas (print the model onto the silicon) and tenstorrent (dataflow programming based hardware)

londons_explore 20 hours ago

There is a huge downside to weights being modifiable - it means you need to have multipliers (not simply adders), and SRAM to store those weights.

I suspect for equal performance, that's probably a 5x increase in silicon area (and therefore cost).

whazor 5 hours ago

You don't need a single wafer, you can split the model into many smaller different chips and connect inputs/outputs.

Skip VHDL and directly go for GDSII / OASIS. Try to find similar vectors so you get re-usable blocks.

You can dynamically calibrate a chip by fine tuning output.

phkahler 21 hours ago

>> I wanna see an inference chip where the weights are part of the rom of the chip.

I've been wondering about that for a while now. For a lot of tasks putting weights in ROM is probably OK. OTOH:

>> There would be 1 multiplier per weight...

I'm not sure that is a good idea. Maybe if its quantized down to 2 bits... Otherwise maybe a small ROM near each multiplier (or row of them or whatever) so the multipliers could handle N distinct matrix operations without having to move the data from far away.

Another fun thought is to have a row of MAC units on DRAM so a DRAM row would be a vector. Row size might be 64Kbit or 8K weights if they're 8bit. This also keeps the weights and calcs on the same chip. I'm not sure this would put enough multipliers on one chip though. Systolic arrays can have tens or hundreds of thousands each doing one op per clock cycle.

cyptus 21 hours ago

analog chips could also be very interesting instead of using digital signals and processing them against the weights in the ROM. I have no idea if that scales with such big models though.

mdp2021 17 hours ago

freakynit 13 hours ago

This may be extreme, or, completely stupid, but, why are we not using genetics to "grow" chips in a chemical soup yet? Similar to Verilog/VHDL, don't we have some similar language to express circuits using gene sequences?

marcosqanil 9 hours ago

I've worked for one of Europe's biggest synthetic biology labs and I know lots of biologists are low-key interested, but current players in semiconductors see it as kind of a tarpit.

IBM used to have a program using DNA origami for lithography back in 2009, which makes sense as lithography masks are a pain to make. I really wish I knew why the program was stopped, but most of the researchers are retired by now.

As to whether you can just "grow" the whole chip from scratch, the answer is probably, but it would require lots of non-trivial scientific discoveries. For instance, we can't really make sizable chips using DNA without horrible defect rates. Biology is much better at making redundant rube goldberg machines, than very precise machines with no tolerance for errors.

I think we'd have a better chance of success if we made very weird kinds of chips that better took advantage of the medium, perhaps even something that we "train" rather than just use out of the box.

I'd love it if anyone here knew more about this !

freakynit 8 hours ago

whalee 12 hours ago

We lack robust frameworks for 'forward engineering' stochastic thermodynamic computation over molecular free-energy landscapes (which is basically what a "chemical soup" is doing) like we do for analog/optical/digital computing. This is why, as a field, medicine is so heavily empirical and reverse engineering oriented.

freakynit 10 hours ago

AceJohnny2 12 hours ago

Are you referencing the 1998 short story "Taklamakan" by Bruce Sterling?

freakynit 10 hours ago

fallat 13 hours ago

Do that at scale

freakynit 13 hours ago

voidUpdate 10 hours ago

> "Downside is this chip would be huuuuge - a whole wafer."

Why don't we have chips like that? If a CPU the size of a postage stamp can do x amount of performance, imagine how much performance you could get if you used an entire wafer of chips running in parallel. Obviously there would be certain use cases, like you couldn't fit an entire wafer in a phone, but still

ngomez 9 hours ago

Using the space of an entire wafer for one chip would result in extremely low manufacturing yields. Even with state of the art silicon cleanrooms, there will still be defects in parts of the output.

With CPUs and GPUs, chip makers can disable faulty cores and bin them as lower SKUs to get some yield out of it. But if you're using an entire wafer to embed weights, and a speck of dust causes a printing defect that makes the weights wrong, the entire wafer is worthless.
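
A rough sketch of why yield collapses with area, using the simple Poisson yield model Y = exp(-A * D0). The defect density D0 below is an assumed, illustrative value, not a figure from any fab, and real foundries use more refined models.

```python
import math

def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under the Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)

D0 = 0.1  # ASSUMED defect density in defects per cm^2, for illustration only

small_die = poisson_yield(1.0, D0)                   # ~1 cm^2 die
reticle_die = poisson_yield(8.58, D0)                # 26 mm x 33 mm die
whole_wafer = poisson_yield(math.pi * 15.0 ** 2, D0) # full 300 mm wafer, ~707 cm^2

for name, y in [("1 cm^2 die", small_die),
                ("reticle-limit die", reticle_die),
                ("whole wafer", whole_wafer)]:
    print(f"{name}: {y:.3g} chance of zero defects")
```

Under these assumptions almost no full wafer comes out defect-free, so a wafer-sized design only works if it can tolerate or route around bad regions.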

voidUpdate 7 hours ago

Jyaif 5 hours ago

cactusplant7374 7 hours ago

kimsey0 7 hours ago

We do. The Cerebras line of Wafer Scale Engines is exactly an entire wafer of cores running in parallel with fast memory next to each one. It's intended for very high throughput LLM inference. https://www.cerebras.ai/chip

WithinReason 5 hours ago

One token per clock cycle at 1B parameters would imply 2 ExaFLOPS, consuming about 10 kW.
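
The arithmetic behind that figure, as a back-of-envelope sketch. The 2 FLOPs per parameter per token (one multiply, one add) and the 1 GHz clock are my assumptions; the power figure is the one quoted above.

```python
def flops_per_second(params: float, clock_hz: float,
                     flops_per_param: float = 2.0) -> float:
    """Throughput needed to emit one token on every clock cycle."""
    flops_per_token = params * flops_per_param
    tokens_per_second = clock_hz  # one token per cycle
    return flops_per_token * tokens_per_second

total = flops_per_second(params=1e9, clock_hz=1e9)  # ASSUMED 1 GHz clock
power_watts = 10e3                                   # "about 10 kW" from the comment
joules_per_flop = power_watts / total

print(f"{total / 1e18:.1f} ExaFLOPS")
print(f"{joules_per_flop * 1e15:.1f} fJ per FLOP")
```

So the power claim amounts to roughly 5 femtojoules per operation under these assumptions.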

yuriyguts 21 hours ago

I've also been thinking about this. Although the forward pass of a transformer model also involves some heavier operations like normalization, reciprocals, exponentiations or other non-linearities (GeLU, SiLU) which may (though typically don't) involve learned weights as operands.

Salgat 18 hours ago

Supposedly memristors would be ideal for this (and it would be reprogrammable), but then again, memristors seem to be the carbon nanotubes of the computing world.

mdp2021 17 hours ago

> weights [as] part of the rom of the chip

Not really that: you are pointing to Compute-In-Memory (CIM) - techniques where the data (here, a multiplier value) is part of the processor (here, the multiplying circuit).

The problem of "fetch and process" is bypassed completely architecturally: the data is there where the processing happens - it's not moved, there is no latency.

zkmon 20 hours ago

firmware upgrade would mean flashing a huge BIN file.

HDThoreaun 16 hours ago

How would the pipelining work when the next token depends on the last token?

cruffle_duffle 21 hours ago

“Wafer level faults probably won't matter though - neural nets are resistant to a few missing or wrong weights.”

Brain science people “love” traumatic brain injury cases because it can help explore what happens when bits of the “brain wafer” get damaged. We’ve learned a lot from such things.

I wonder if people are intentionally “destroying” parts of the model weights to learn more about what happens? Like could you strategically wipe a gig of the model so it’s “all zeros” and see what happens?

I have to wonder

zurfer 21 hours ago

This is called mechanistic interpretability. There are lots of fascinating insights already, since you can do basically everything down to the neuron or weight level thousands of times. The human brain is many orders of magnitude harder to make sense of.
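
For a flavor of what that looks like, here's a minimal zero-ablation sketch on a toy two-layer network with random weights. Everything here (the network, its sizes, the input) is hypothetical; real interpretability work does this on trained models with proper tooling.

```python
import random

def forward(x, w1, w2):
    """Two-layer ReLU network: input -> hidden units -> scalar output."""
    hidden = [max(0.0, sum(xi * wij for xi, wij in zip(x, row))) for row in w1]
    return sum(h * w for h, w in zip(hidden, w2))

def ablate_unit(w1, unit):
    """Copy of w1 with one hidden unit's incoming weights set to zero."""
    return [[0.0] * len(row) if i == unit else list(row)
            for i, row in enumerate(w1)]

random.seed(0)
n_in, n_hidden = 4, 8
w1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
w2 = [random.uniform(-1, 1) for _ in range(n_hidden)]
x = [1.0, -0.5, 0.25, 2.0]

baseline = forward(x, w1, w2)
for unit in range(n_hidden):
    # How much does the output move when this one unit is wiped?
    delta = forward(x, ablate_unit(w1, unit), w2) - baseline
    print(f"unit {unit}: output change {delta:+.4f}")
```

Units whose removal barely moves the output are the ones the network can afford to lose for this input; the interesting ones are those where the change is large.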

sometimelurker 21 hours ago

Cantinflas 20 hours ago

mdp2021 17 hours ago

Of course tampering with chunks or nodes in the NNs is a way to study the "spawned" (through gradient descent etc.) configuration and "reverse-engineer the black box" to get "AI transparency".

Anthropic published an important work about a year and a half ago.

mdp2021 11 hours ago

Computer0 21 hours ago

Reminds me of Golden Gate Claude (https://www.anthropic.com/news/golden-gate-claude)

maz1b a day ago

Pretty huge move. Google and their TPUs are looking infinitely more prescient as I think they are on their 7th generation, along with the offshoots it inspired like the LPU and even others, perhaps like Cerebras and their Wafer Scale Engine.

However, based off first impressions, it seems like this is meant for the inference side, and not training, which is also an interesting choice.

skeledrew a day ago

Training is pretty much a 1x cost, and efficiency there is already on the way down with architectural improvements. Inference though is an ongoing cost which over time takes orders of magnitude more resources, so focusing on making that far more efficient means way greater gains over time.

ggcr 7 hours ago

With Reinforcement Learning, inference is very present in post-training stages now too

forrestthewoods a day ago

Inference costs are higher than training now. I think.

Nvidia is king of general purpose training chips. But inference can be specialized.

lugu 18 hours ago

What makes you think this? With wider adoption the ratio shall shift in favor of inference. And API price is becoming more important than SOTA capability.

forrestthewoods 18 hours ago

cactusplant7374 20 hours ago

Cerebras's Codex Spark 5.3 has been a huge flop. Small context window and old model. But hopefully they can improve so that we can benefit from 1000 tokens/second with GPT 5.5.

zer00eyz a day ago

> early testing shows that Jalapeño will deliver performance per watt substantially better than current state-of-the-art

We're starting to see what really matters here, and though this is hand wavy the TPU makes similar claims.

I think Google's memo about having no moat still stands (see: https://newsletter.semianalysis.com/p/google-we-have-no-moat... if you are unaware). It kind of makes sense that all of this is looking more like 60's to 90's IBM, DEC, Cray, Sun and the hardware race that happened then. History doesn't repeat but it often rhymes and I suspect that these efforts will follow the same trajectory.

granzymes a day ago

To be clear, that is not "Google's memo". It's a memo by a guy who happened to work at Google. There is a diversity of opinions at a company that employs 180,000 people.

deweywsu a day ago

With the pace of AI, and with AI helping to pave the way for faster/better AI, I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI. Huge AI models can be run with fewer resources already through quantization and offloading, but that's just the beginning. One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop. Think that's crazy? Look at the size of the first hard drives. The IBM 350 was a disk with 50 platters, 24 inches in diameter, that held about 3.75 MB, and was leased for today's equivalent of $35K.

https://www.computerhistory.org/storageengine/first-commerci...

Compare that to a multi-terabyte SSD. Now apply that improvement to how an LLM is architected and run now. With AI assisting, it won't be long before a leap occurs and these data centers with all their current ultra-cutting edge Nvidia cards are nearly obsolete overnight.

admax88qqq a day ago

> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.

But if you have such a breakthrough, could you not also apply it and run 200T models on today's datacenters?

pennomi a day ago

That assumes scaling laws still hold up. A bigger model might end up only incrementally more intelligent.

ACCount37 21 hours ago

Not only could you: you would also want to.

The likes of Mythos show that the scaling laws are real, and you can x5/x2 the total/active params and get meaningful gains. If "inference per param" gets cheaper? Up the params and get more intelligence for the same price.

deweywsu a day ago

Quite true

simonebrunozzi a day ago

Interesting comment, but the comparison with hard disk drives is probably unfair.

The IBM 350 was commercialized 70 years ago; it took 70 years for someone like you to be able to compare that to a multi-TB SSD.

Furthermore, nothing says that Moore's Law will necessarily apply to LLMs, for decades to come.

deweywsu a day ago

Very true, and all I am basing my comment on is the improvement in speed AI has demonstrated when applied to software development, and inferring it might enable a similar 10X or 100X improvement in both hardware architecture and LLM structure and/or interface methods. If that speed improvement applies to the performance of AI, that could mean the 70 years it took for people to improve storage technology might be compressed to achieve a step change in AI performance in a drastically shorter timeframe.

LZ_Khan a day ago

I think Jevons Paradox and scaling laws will make this not the case. If bigger models are always better (which it seems they are), then we will always need high-end hardware.

gdiamos a day ago

Usually breakthroughs in computing lead to more usage of computing, not less.

3abiton a day ago

> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.

I think there will be specialized hardware (besides GPUs) that would be custom made for LLMs. Yes, TPUs exist, but mainly for datacenters. GPUs exist, but they are adapted mainly from graphics applications. Once all the demand from data centers dries up, innovation will kick in.

andriy_koval a day ago

> I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI

it will build expertise/infra/know-how foundation for next generation of hardware

dwa3592 a day ago

True, but as someone else pointed out, at that time we'd be interested in running a 200T parameter model rather than 200B. Why, you might ask? Law of human laziness - a human will become as lazy as the technology allows them to. With the 200T or 20,000 T model - I'd be heavily incentivized to ask it to make the bread for me that I enjoy making now or create a movie for me (featuring myself) which will maximize the dopamine production in my brain.

zabriel_goss a day ago

I agree with you. Stepping stones are still a part of getting there, if only to be briefly useful.

hyhatqtv a day ago

Looking at the development of memory bandwidth, capacity and prices over the last 10 years there is little indication that’s likely.

v5v3 a day ago

>designed for initial deployment by the end of 2026 and expanding in the years ahead,

So after the IPO and will be featured heavily in the IPO sales brochure as a future promise?

I'm sceptical over any pre-IPO announcements.

estetlinus a day ago

Yeah, the narrative feels like pre-IPO shenanigans, and it looks like the lid on my laundry basket. I wouldn’t be surprised if this is a con.

Culonavirus 19 hours ago

Con or not it is an obvious thing they have to do. Might as well promise.

IIRC their biggest cost they're "hiding" in their financials by doing creative accounting is inference (putting it into marketing and whatnot, in the billions)... if they can't hide it in their S-1 then they have to rationalize it, either by a) increasing the prices (not gonna happen, with token based billing orgs are already watching their codex spends) or b) lowering the inference costs. You can lower that by "soft optimizing" (dumbing down) your models but then you have the other players breathing down your neck (see quick rise of Claude), or actually optimizing, in software and in hardware. We're like 5 years into the rise of LLMs, there's not THAT much left on the table unless you write to the metal you specifically designed for your models (and I'm pretty sure the lack of "nvidia tax" would help with covering most of the r&d costs of a custom solution, at least in the long term).

50% cheaper inference without losses in fidelity would unquestionably be a massive win for OpenAI.

frandroid a day ago

Whose IPO? Broadcom and Google are already listed, obviously.

airspresso a day ago

OpenAI's upcoming mega IPO

awestroke a day ago

OpenAI, the non profit organization, is going to become a publicly traded profit maximizing corporation

hk__2 a day ago

signatoremo 20 hours ago

I haven't seen this discussed here:

> So far, the accelerator is showing cost savings of roughly 50% compared with typical AI graphics processing units, Broadcom Chief Executive Officer Hock Tan said in an interview. - [0]

50% cost saving. The picture changes so quickly, and there is still so much low-hanging fruit, that I find any discussion about whether a vendor has a moat, or whether they can recoup their investment, moot and futile.

[0] - https://www.bloomberg.com/news/articles/2026-06-24/openai-an...

wmf 20 hours ago

If GPUs have 75% margin then 50% cheaper is no surprise.
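
To make the arithmetic concrete, a small sketch with a normalized price. It assumes the custom chip costs about as much to build as the GPU, which is a simplification of mine and not something the article states.

```python
def unit_cost(price: float, gross_margin: float) -> float:
    """Build cost implied by a selling price and a gross margin."""
    return price * (1.0 - gross_margin)

gpu_price = 100.0                       # normalized GPU selling price
gpu_cost = unit_cost(gpu_price, 0.75)   # 75% margin leaves 25 of cost

custom_price = gpu_price * 0.5          # "50% cheaper"
# ASSUMPTION: the custom chip costs the same to build as the GPU.
implied_margin = (custom_price - gpu_cost) / custom_price

print(f"GPU build cost: {gpu_cost:.0f}")
print(f"Margin left at half price: {implied_margin:.0%}")
```

In other words, halving the price still leaves half of it as margin to split between the designer and the buyer, before any operating costs are counted.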

epolanski 19 hours ago

Operational costs far outweigh hardware cost.

lugu 18 hours ago

Schiendelman 19 hours ago

"Typical" is doing a lot of work there. That could mean much older chips than Nvidia is currently selling.

signatoremo 14 hours ago

"Typical" usually means typical, i.e. median. Also they are claiming cost saving, not performance. The saving would even be more impressive if much older chips are less efficient than the newer ones -- costing more to run.

chris_money202 a day ago

Microsoft, Google, and Amazon also do this, but they also have the hyperscaler datacenter infrastructure to host the chips. Designing and taping out the chip is one thing; packaging, cooling, deploying, powering, and managing the fleet is another stack entirely. Wonder where that will come from?

wmf 21 hours ago

Don't forget Stargate.

Update: Somebody on Twitter said it's going to be hosted 50/50 at Microsoft and Oracle.

chris_money202 21 hours ago

I forgot Stargate

cpldcpu a day ago

I had Opus 4.5 design an LLM inference engine in Verilog, including firmware and automated verification, a while ago: https://github.com/cpldcpu/smollm.c

It's of course far from optimal. But lowering the implementation through the abstraction levels turned out to be extremely powerful.

smetannik a day ago

Can you suggest some tutorials for Verilog and FPGAs in general?

I have a spare Tang Nano 9k but I don't feel confident about blindly asking Claude to vibecode me a solution and still would like to have at least a basic level of understanding.

cpldcpu 20 hours ago

hm.. has been quite a while for me. The good thing about the Tang Nano is that it is supported by the Yosys open source toolchain. There are quite a few resources on the web when you search for the combination.

jared0x90 21 hours ago

the hdlbits course is really good imo

digitaltrees a day ago

We’ve entered the “if you care about software, build hardware” phase of AI

some-guy a day ago

I have been eyeing what Taalas is doing [1] by making pure hardware models. The speed is absurd.

[1] https://taalas.com/products/

mikewarot a day ago

They talk about products, but they don't sell the hardware, thus they don't really have a product, just a service.

I know, it's nitpicking, but when people can just reach in and take services away, like Fable/Mythos, hardware is the only thing worth buying.

LoganDark a day ago

arcanemachiner a day ago

jupr a day ago

crazy product. their test chatbot feels like a db query.

https://chatjimmy.ai

digitaltrees 19 hours ago

I have and it was wild. Paradoxically it made me realize that I actually like reading the stream as it's generating.

wmf a day ago

“People who are really serious about software should make their own hardware.” ― Alan Kay

zwarag a day ago

What are the other phases? Or what are you referring to in general?

digitaltrees 12 hours ago

Mainframe punch card -> PC floppy disk -> cloud SaaS -> AI --> return to the land agrarian

yiyingzhang 33 minutes ago

This is another Cerebras? fwiw, it took Cerebras many years to finally get a handle on the yield and the cooling problem. Wondering if they just hired a bunch of people from Cerebras.

bogdiyan a day ago

I am not sure how much of the work is done by OpenAI, or whether it is basically a Broadcom chip specifically built for OpenAI models. It is a necessary step, but building a high-performance chip is not easy. Look at companies like Groq, Amazon, and Google.

u1hcw9nx a day ago

Both Google and Amazon also codesign heavily with Broadcom (Amazon also with Marvell and Alchip).

Broadcom does stuff like physical design, providing IP blocks, managing the manufacturing process with TSMC, packaging, and testing. Google and Amazon work on system architecture, performance targets, and requirements, with Broadcom as a consultant.

_boffin_ 4 hours ago

My question is: what will this do to Cerebras? It validates them, but did they just have their lunch eaten?

kilroy123 a day ago

I hope to see something like this, but in a small form factor like the NVIDIA spark.

I want a super fast LLM that is Opus 4.6+, like, in ability.

wmf a day ago

Memory bandwidth is the bottleneck in the Spark. If you replace the SoC with an optimized ASIC but keep the same 256-bit LPDDR5 the performance will be the same. You can increase performance by using wider memory but that's also more expensive.

phonon a day ago

M3 Ultra has a 1024 bit memory bus (819 GB/s) and starts at $3,999 (96GB of RAM). It can be done....
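
The 819 GB/s figure falls out of bus width times transfer rate, and the same arithmetic gives a ceiling on decode speed. A rough sketch, assuming LPDDR5 at 6400 MT/s for the 1024-bit bus, LPDDR5X at 8533 MT/s for a 256-bit bus, and a hypothetical dense model occupying 35 GB whose weights are all read once per generated token:

```python
def bandwidth_gb_s(bus_bits: int, mega_transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * mega_transfers_per_s * 1e6 / 1e9

def max_decode_tokens_per_s(bandwidth_gb: float, model_gb: float) -> float:
    """Upper bound for batch-1 decoding if every weight is read once per token."""
    return bandwidth_gb / model_gb

wide = bandwidth_gb_s(1024, 6400)   # ASSUMED transfer rate for the 1024-bit bus
narrow = bandwidth_gb_s(256, 8533)  # ASSUMED transfer rate for the 256-bit bus
model_gb = 35.0                     # HYPOTHETICAL: dense 70B model at 4 bits/weight

for name, bw in [("1024-bit", wide), ("256-bit", narrow)]:
    limit = max_decode_tokens_per_s(bw, model_gb)
    print(f"{name}: {bw:.1f} GB/s, at most {limit:.1f} tok/s")
```

This ignores KV-cache traffic, batching, and mixture-of-experts sparsity, so treat it as a ceiling for the simplest case and not a prediction.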

bigyabai a day ago

smith7018 a day ago

Unfortunately Sam Altman won't be the one to deliver us at-home hardware that can run Opus-level models

blitzar 21 hours ago

I wonder what is happening with the OpenAI / Jony Ive crossover episode.

flyinglizard a day ago

Forget about it. Datacenter class hardware is getting farther and farther from desktop use. It’s not PCIe GPUs anymore.

theowaway213456 a day ago

This seems like more competition for Cerebras? Am I understanding correctly?

HarHarVeryFunny a day ago

This is just an uncut wafer - I don't think it's intended to be a wafer-scale chip.

Cerebras etch memory onto the wafer alongside the processing elements, but AFAIK OpenAI are going to be using HBM memory and a conventional chiplet design.

KeplerBoy a day ago

Still competition for cerebras. Seems quite unlikely they will get an OpenAI deal anytime soon.

smsx 21 hours ago

HarHarVeryFunny 21 hours ago

dadoum a day ago

> May we scale smoothly, exponentially and uneventfully through A[SI]

That sentence sounds weird to me. I can't really put my finger on why, maybe the combination of adverbs, or just the fact of writing the desire of scaling as a company so directly. It feels (to me) like openly claiming their selfish goals. Or maybe I am just misinterpreting and they are referring to the whole of humanity as "We" (but knowing Broadcom's and, to a lesser extent, OpenAI's doings, I am not convinced).

lifeisstillgood 20 hours ago

So I’ve been wondering about “one or two levels back” chip design. If I understand it, 28nm chips (pre EUV) are just about suitable to run (not train, just inference) frontier models.

And so if I was a mid-level State would it be worth while to take my nascent chip industry and push it out to build a 28nm foundry and supporting eco-system.

The models will come but the real challenge of the future is having enough compute power for every one and every use. Even if LLMs don’t become AGI they will still be incredible tools - and as OpenAI seems to spend 8000 for each 200 monthly subscription building one’s own data centres seems sensible

paxys 17 hours ago

You are underestimating how difficult it is even for a large nation state to attract the kind of talent and investment it would take to set up a chip industry. It is out of reach for anyone outside of the 3-5 largest national economies and a few big American/Chinese multinational corporations.

wmf 16 hours ago

> 28nm chips is just about suitable to run frontier models

I doubt it. 28 nm is 4-5 generations back so inferencing would need a large number of chips with very high power consumption. Maybe you're thinking more of 7 nm which is what Chinese fabs have; it seems to be OK for companies like Huawei.

> And so if I was a mid-level State would it be worth while to take my nascent chip industry and push it out to build a 28nm foundry and supporting eco-system.

It never reaches breakeven so you'd have to provide billions in subsidies per year forever. The sovereign chip stuff only makes sense for the US and China; even the EU probably isn't large enough to make it work. A single country definitely couldn't.

eggsome 13 hours ago

But the energy requirements per token would be orders of magnitude worse than chips made at 3nm. So probably better for your hypothetical state to just pay the extra for more efficient chips so that they don't have (as much) of an energy problem.

mdp2021 16 hours ago

> Even if LLMs don’t become AGI they will still be incredible tools

(Mostly an aside, but: LLMs have paved the way, now the problem is there, it is a challenge and a geopolitically relevant race... AGI is a goal set: not-having-reached-it will be just a stage.)

groundzeros2015 19 hours ago

This is starting to sound like startup scope creep. Instead of making the AI model it’s now custom silicon, web browsers, and consumer electronics?

krick 6 hours ago

But there never really was a moat in LLM?.. I mean, I don't know where you stand, but my perception is that we all kinda knew that the whole time since 2017, and really knew that since DeepSeek. What they really care about is:

1. Customer acquisition.

2. Cheap(er) electricity/hardware.

So it's really surprising to me that them making their own chip surprises anyone at all. The electricity thing is already kinda being taken care of by earlier strategic alliances with some other evil people, the chip is a natural next step.

glaslong 19 hours ago

Definitely has that smell... At the same time though, they NEED inference cost to drop substantially, and even better for them if it only happens for their models on their hardware.

I assume they're doing everything they can to make that happen model-side, but coming at it from the other end makes sense too if they can swing it.

guywithahat 17 hours ago

Maybe, but they’re also a massive company. At some point Google stopped being a startup and became a massive company with margins to look after.

groundzeros2015 5 hours ago

After they were wildly profitable.

brcmthrowaway 18 hours ago

Nearly all those initiatives have failed though

jnaina 13 hours ago

Two turkeys don't make an eagle.

I don't have much confidence in either OpenAI/Sama or Broadcom, given past history. Again, this is just pre-IPO shenanigans.

As credible as the "Datacenter in Space" claim by Elmo, before the SPCX IPO.

olalonde 9 hours ago

Why even "unveil" it? Seems like giving away competitive intelligence for no reason at all... other than hyping the stock?

mobile6test 13 hours ago

“OpenAI says early results show significantly better performance-per-watt than current state-of-the-art alternatives”

would be very interesting to see any papers/data around this

GL26 9 hours ago

OpenAI is going to close the gap on the one thing it needs to be profitable: compute power. Love this website: https://isaiprofitable.com/, which shows who wins at the AI revolution. Nvidia wins because it has instant revenue; OpenAI is going to close that gap.

MangoCoffee a day ago

Cheap tokens are more important now than ever. Chinese open-weight models are getting pretty good. The real cost of AI adoption will come down to who (China or US) can provide cheap tokens for consumers and companies. Microsoft considering DeepSeek for their Cowork is one example, and now OpenAI with its own AI inference chip is another.

SV_BubbleTime 19 hours ago

I’m not understanding. If cost per token hits the floor that does not mean that you want a model that uses tokens.

If the Chinese are optimizing for token usage, that’s also speed.

Why use more token if few do trick?

BLKNSLVR 19 hours ago

*requires VMWare license.

paxys 20 hours ago

Very interested to know the distribution of effort between the two companies. Is this truly a brainchild of OpenAI engineers or did they pay to white label and use a new Broadcom chip?

satvikpendem a day ago

I'm assuming they used LLMs to (help humans) do custom circuit design. Even pre LLM there were various computer optimizations that didn't require humans like genetic algorithms. It'd be cool to see a paper on how they did it.

Legend2440 a day ago

The only surprising thing about this is that they didn't do it three years ago.

OrvalWintermute a day ago

Word of Advice for OpenAI:

Never underestimate Broadcom’s ability to shaft their own customers

- VMware

- CA Technologies

- Symantec Enterprise Security

- Brocade

- LSI Corporation

SV_BubbleTime 19 hours ago

I don’t know. I’m kind of glad that two of my least favorite companies are working together.

antonvs a day ago

CA Technologies was much worse than Broadcom in its heyday.

Three of their top execs - CEO, CFO, and head of sales - went to federal prison on securities fraud, conspiracy, and other charges. The CEO, Sanjay Kumar, who was at least partly the fall guy for co-founder Charles Wang, served 10 years.

Being acquired by Broadcom could only have been an upgrade, as strange as that may sound.

fennecbutt a day ago

I mean I'd love to be able to buy something like the 17k tps taalas chip as a pcie or m.2.

Imagine when we can roar along at that speed, low power. Can just have the model reason for a while about anything and everything. It reminds me of the "race to idle" for mcus etc.

ipdashc a day ago

> 17k tps taalas chip

It's odd to me that I haven't heard anything about this approach (baking LLMs/weights into silicon directly) since. It seems almost common-sense that we're going to end up there eventually. And it feels like that point is drawing ever closer now that model capabilities, if not quite plateauing out, are at least getting to a "good enough" point for a LOT of use cases.

I wonder if it's being worked on in secret, if there's something about it that makes it infeasible, or if companies are really too nervous to lock in one model like that because the next one down the line could be a huge improvement. Re. infeasibility, I have heard that the Taalas demonstration chip ran Llama 3.1 8B (a pretty horrible model) and that even that took a massive amount of transistors / die area. So it might just be the case that the good models are too big to fit on silicon?

topspin a day ago

I have also been thinking about this a lot, and share your belief that this is inevitable.

Taalas has a running demo here: https://chatjimmy.ai/

It's eye opening: generated an AVX-512 optimized Mersenne Twister in C in 0.076s, 13,706 tok/s. Too fast for the tok/s to be terribly accurate.

mdp2021 16 hours ago

> It's odd to me that I haven't heard anything about this approach ... I wonder if it's being worked on in secret, if there's something about it that makes it infeasible

The studies and efforts are ongoing and public, and there are technical hurdles to be faced - but the relevant works go back in time quite a lot and there is heightened interest in it now.

It seems that you simply took the "hyped headlines" for the whole of the work.

ipdashc 3 hours ago

coder543 15 hours ago

> It's odd to me that I haven't heard anything about this approach since.

It has only been four months since they unveiled their first prototype. I don't understand your confusion. Chip development does not happen overnight...?

Their initial blog post laid out a roadmap, so theoretically they should have another thing to demonstrate this summer.

ipdashc 3 hours ago

mdp2021 15 hours ago

wmf a day ago

Good models will require multiple Taalas chips but Groq and Cerebras also require a lot of chips and that hasn't stopped them.

ipdashc 3 hours ago

MichaelNolan a day ago

The current Taalas chip is for a 3.1B param model. I hope so much that they can get that up to the 30B range. Just imagine Gemma 4 or Qwen 3.6 at 17k tps.

coder543 15 hours ago

Taalas' first chip is for a Llama 3.1 8B quant, not a 3.1B parameter model, to clarify.

imglorp 6 hours ago

Is broadcom really the best business partner? 100,000 VMware customers might say no.

skyberrys a day ago

The new chip sounds like it's custom made to accelerate a few specific models they really need to run fast. The advantage is it's truly an ASIC, not an xPU. There are several new startups targeting EDA tooling automation; Chip Agents is the biggest one I can think of, but there are smaller players too, Silimate is one I recall. These companies are focusing on building fast AI powered tools to speed up the tape out cycle.

Jyaif 5 hours ago

Broadcom will let the entire industry leverage the decade of research done for TPUs.

The AI business of Nvidia is cooked.

mangomanai 9 hours ago

owow... what's gonna be next... their own robot????

shevy-java 14 hours ago

So this mafia is driving up RAM prices. And now build their own overpriced hardware.

Either RAM prices go down, or that mafia must pay us all compensation money for this cartel build up. Why is the USA protecting this? How much does the orange man profit personally from helping drive up the prices here?

BobbyTables2 18 hours ago

Why the hell Broadcom of all companies?

philjohn 8 hours ago

Because they have the skills necessary to help bring custom designed ASICs to fruition. Google uses them for their TPUs, Meta uses them for their custom ASICs as well.

qsxfthnkp2322 a day ago

aw shucks nvda has some spicy competition

Make sure you all use that fancy ñ

boarush a day ago

They don't have true competition, what they lose out on is market share with hyperscalers, since OpenAI would have no plans to share inference hardware with any other company right now. Plus, I don't know how NVIDIA's investment equation pans out long term, given OpenAI will be investing in a more purpose-built inference stack for the future.

ismailmaj a day ago

they're still kings for training, though I've heard Anthropic is training now on a JAX+TPU setup, so might not be a monopoly in that segment.

fibonacci112358 a day ago

So this is where all the memory they bought is going to.

babelfish a day ago

that's not really how it works

jonhohle 17 hours ago

If it’s really a differentiator, why announce it? Why not keep it secret and make it a competitive advantage?

bakies 17 hours ago

Investors, I'm sure everyone's had the idea and they're doing it.

gravypod a day ago

I wonder how close OpenAI is getting to using the memory they purchased. Are they planning to stack a huge amount of HBM2 into these chips?

wmf a day ago

I assume OpenAI has been buying memory and "giving" it to Nvidia in exchange for a discount.

renoir a day ago

Look at the SIZE of that chip.

Cerebras stock is down nearly 20% today.

Not only is the approach overlapping, OpenAI is also Cerebras's only major customer.

tantalor a day ago

If you're referring to the big circle of silicon, that's a wafer, which generally contains many chips (100s to 1000s).

arcanemachiner a day ago

The alt text of the first image describes it as the "Jalapeño inference chip".

As a non-RTFA-er, I'm assuming it's a wafer-scale chip, similar to the ones made by Cerebras.

EDIT: From TechRadar[0]: "The 300mm wafer that both CEOs are holding will generate about 50 to 60 ASICs."

[0] https://www.techradar.com/pro/broadcom-and-openai-debut-jala...
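
For what it's worth, 50 to 60 is about what a common dies-per-wafer approximation gives for a die near the reticle limit. A quick sketch; the die size is my assumption (the 26 mm x 33 mm single-exposure field), since the actual Jalapeño die area isn't stated here.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    gross = math.pi * (wafer_diameter_mm / 2) ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

reticle_limit_mm2 = 26 * 33  # ASSUMED die size: 858 mm^2 reticle-limit field

# Candidate dies on a 300 mm wafer, before any are lost to defects.
print(dies_per_wafer(300, reticle_limit_mm2))
```

That count is before yield loss, so the number of good chips per wafer would be lower still.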

jupr a day ago

That made me chuckle but I guess if you have never seen one I could see how that assumption could be made.

If this photo is real I wonder what can be revealed about the approach they have taken by analyzing the architecture of what we can see.

mdp2021 15 hours ago

thrtythreeforty a day ago

For reticle-limit chips, it's on the order of 100. And less than that once you filter out bad dies.

moralestapia a day ago

Everybody here knows that.

What some don't know (including you) is that the industry is doing wafer-sized chips nowadays, of which Cerebras is the flagship company.

That's why the stock movement could be related, and that is why GP wrote that comment.

AxiomaticSpace a day ago

I think Cerebras stock going down could also be partly caused by the lock-up period ending today for 200k shares (page 73 of their prospectus) - https://www.sec.gov/Archives/edgar/data/2021728/000162828026...

maxall4 a day ago

It doesn’t seem like it? Unless I am misunderstanding these Nasdaq insider trading reports: https://www.nasdaq.com/market-activity/stocks/cbrs/insider-a...

moralestapia a day ago

Dang, I just checked and CBRS is in free-fall since the IPO.

Sucks, I think they're a cool company.

OTOH, I was the only person back then pushing hard during my time at KAUST (back in 2019) to buy one of their systems when they were nobody, eventually resulting in a partnership between the two.

Then I joined their online discourse, very few users, I was semi-active there but they didn't care much.

Then I came to Toronto and heard they were opening an office here, tried to get noticed several times but got mostly ignored. I asked about upcoming events several times, anything to get involved, "yeah man, maybe one day". Then they made an event during Toronto Tech Week and didn't even tell me ... idk.

I don't get schadenfreude as I still think they're a cool company.

My point is they put all the eggs in one basket (AI inference) and neglected everything else. They seem to be on shaky ground now ... sad.

fl4regun a day ago

My friend briefly worked there and then got hit by layoffs; as a result, I am enjoying the schadenfreude.

ksd482 a day ago

That's just the wafer disc. Looks like it was presented to Sam Altman for ceremonial purposes.

The wafer disc is what the CPU gets "printed" on.

delduca a day ago

NVidia stocks are red now

dgellow a day ago

Because of Micron, no? I don't think it's related to OpenAI's announcement

brcmthrowaway 20 hours ago

What happened with Micron?

dgellow 10 hours ago

bluegatty 21 hours ago

'braodcom' ha ha ... it's not OpenAI's chip then ...

kazinator a day ago

There is a never ending torrent of money coming, so why not make custom chips.

Whoo ... party!

mdp2021 16 hours ago

Although, custom HW has to be the focus right now, simply because we are dealing with a technology (big NNs) that is not the best match for Von Neumann architectures.

duendefm a day ago

If this is something that will hurt Nvidia, I'm all for it

jabedude a day ago

how much does this chip help with inference speed?

wmf a day ago

It's probably the same speed but cheaper.

Buttons840 19 hours ago

Fucking Broadcom?

The only time I've ever seen that name before is when trying to solve driver issues, on both Linux and Windows.

Are they especially stingy with their IP related to drivers or something?

m3kw9 20 hours ago

They tested on the spark model; I bet it's a mix of that with a focus on inference speed. Whatever it is, hopefully it shows up as faster current models. Tokens/s is as big a thing as anything else, and that's where they can really gain some edge over the competition.

tehjoker 21 hours ago

No information on how significant the reduction in energy per token is. No information on amortized price per request. Increasingly it's clear OpenAI must demonstrate order-of-magnitude reductions in cost to not die; this is investor story time without that information.

rvz a day ago

No surprise here. [0]

[0] https://news.ycombinator.com/item?id=45429514

mdp2021 15 hours ago

Actually, I find the idea of using Cerebras etc. for /training/ (not just inference) surprising: I have not stumbled upon much data or discussion about "super-CPUs" in that area, where NVidia (with the tools focused on it) has that long-built edge...

Edit: contextually,

> Jalapeño is specifically designed for inference

Imustaskforhelp a day ago

Although this seems to be for inference only and not training, inference is a recurring cost and training is a one-time cost. So to me, even if Nvidia keeps its moat on training, I don't think that could ever justify its massive valuation, because, for example, some Chinese models are actually trained on non-Nvidia hardware. The moat there is incredibly thin.

(At the moment) I think that if I were Nvidia, I would be a bit terrified, and I imagine the stock is not doing super great, as I can just imagine everyone online might start talking about it, for better or for worse.

I am a bit impressed by OpenAI, but can this be classified as a plan for OAI to salvage itself and all the commitments it has made, nearing 1.4 trillion dollars from my memory? And this article[0] is from 2025.

But could OpenAI simply walk out of its commitments when necessary (for example to Nvidia) if this chip works out? Or what exactly might happen in the future as these commitments come due? It's still smart for OAI to diversify with this chip and to have deeper sources of revenue than just being a simple middleman, but I imagine that Nvidia and others have also invested in OpenAI, and they must not be happy with this change.

The thing with AI deals is that they have become so complicated that it is hard for me to find the first-order impact of things, let alone second- or third-order impacts. Financial accountability seems to be impacted quite heavily because of all of it, and there is some sense that it is done so intentionally.

[0] https://techcrunch.com/2025/11/06/sam-altman-says-openai-has...

wilg a day ago

> significantly better performance-per-watt than current state-of-the-art alternatives

An interesting example of how the current market dynamics incentivize low cost and therefore power efficiency and therefore lowering resource use.

zuzululu 21 hours ago

I'm very excited that frontier model labs now have so much money and revenue that they are releasing their own chips, which could change the relationships and bottom line.

gaigalas a day ago

But nvidia's moat is software support, isn't it?

KeplerBoy a day ago

You don't need a whole lot of software support if you just want to serve a single family of LLMs.

gaigalas 21 hours ago

A lot of companies that serve a single family of LLMs seem to prefer nvidia though. Why is that?

It's not just good drivers, which is what moats them for games and ML. It's a multi-decade work of making chips that are nice to program for and software infrastructure around them.

Apple and Google have excellent chips, yet they needed to invest a lot in long-tail software projects to make those chips do actual premium work. Still not state of the art for serving LLMs (although Google is strong in that, mostly because it piggybacked on previous chip-related software work for phones and so on).

SV_BubbleTime 19 hours ago

hari_vardhan 12 hours ago

xyst 13 hours ago

> built by Broadcom

AI is cooked bro. Broadcom is the death sentence of anything.

jauntywundrkind 20 hours ago

Is there any actual content on what the chips are?

You can't purchase Microsoft or AWS chips, but both of them do pretty good write-ups on what they've done. https://blogs.microsoft.com/blog/2026/01/26/maia-200-the-ai-...

This seems utterly empty of actual substance.

sehw a day ago

lol

flyinglizard a day ago

I call BS. It’s probably a white label around existing Broadcom IP, impossible to go from zero to this kind of chip in nine months. I doubt OpenAI had any significant contribution.

zerohp a day ago

That’s exactly what this is.

9 months to production is completely impossible anyway.

9 months from design to early samples is probably impossible given that TSMC takes 3 months after tape out to produce them. Then it’s up to the customer to qualify and revise for production. TSMC doesn’t do that.

There’s no AI that makes this happen in 9 months.

Mistletoe a day ago

The similarities between the AI world and the crypto world are so much closer than any AI fanboy would ever admit.

samrus 19 hours ago

This is why RAM prices are fucked. Because Altman doesn't give a shit about normal people as long as OpenAI succeeds.

Africa-Ai a day ago

Wow, that sounds tempting, to use OpenAI's newest chips.

nullbio 16 hours ago

Big tech AI labs will develop LLM accelerators and hardware LLMs that increase frontier model output to tens of thousands to hundreds of thousands of TPS.

These chips will be used internally for their own business goals, giving them the capability to iterate at such an insane pace they will be able to clone every software product and software company on Earth. Meanwhile they'll trickle out 100-300 tps access to the rest of subscription users to drain them of their cash and keep the beast fed with fresh training data.

How can any individual company building a product, with access to 100-300 TPS behind-frontier, security-gated, censored, and capability-gated models, expect to compete with a company like Anthropic or OpenAI with frontier, unrestricted, unlocked models that can produce 100-1000x the output? 3-5 of their employees working to clone your 500-staff business will likely find it easy pickings.

This should concern everyone.

The only reason they aren't 100% in on the strategy of replacing everyone is because they need us for training material and they needed the bootstrap. But the bootstrap problem is already gone, and they don't need to give us fair access to keep training data rolling.

jerojero a day ago

One thing I don't like about California based companies is how cringe the names always are.

"Jalapeño" is such a bad name, having an "ñ" already makes it difficult and annoying to deal with in so many little ways. Good luck with that.

But also, there's the sort of "yes, let's use Mexican-related things because we're California" thought that I just really hate. I don't know, it's like corporate Memphis to me. You see a product like this, you know it's an uppity California-based firm that came up with it.

thewebguyd a day ago

No worse, I suppose, than the obsession with Lord of the Rings that the authoritarian surveillance companies have. Palantir, Anduril. Then we have the non-defense/surveillance ones: Mithril, Valar, Narya, Erebor.

skeledrew a day ago

What kinds of names would you suggest?

thewebguyd a day ago

utopiah a day ago

Strawberry was too complicated as a codename.

CrzyLngPwd a day ago

Too many Rs.

smallmancontrov a day ago

anthk a day ago

Don't worry, in Europe it's the same, but for insurance/lawyer stuff. Tons of companies have names based on Latin words such as Civitas/Insalus/Legalia/Legalitas or whatever, which look tacky/rancid/old-fashioned from kilometers away.

qsxfthnkp2322 a day ago

Jalapeño

Really has a… ring to it
