Uber's $1,500/month AI limit is a useful signal for AI tool pricing (simonwillison.net)

575 points by pdyc a day ago

ValentineC a day ago

> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.

Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

Many lower-budget individuals are now moving to China open weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inferencing costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.

vidarh 20 hours ago

We can tell that the inferencing costs for many of these models are low enough that these models are being sold close to real costs on the basis that many of them are open weight and available from third party providers who have no incentive to subsidize them.

I think the frontier labs will need to drop their high per-token prices at least for their low and mid-level models for the reason that several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that with the right harness they are cost effective alternatives.

They won't necessarily need to close the gap (at least not yet), because these models won't necessarily compete at the same token counts: at least some of them need to do far more work to solve the same problems.

But, yeah, the prices will come down one way or the other.

At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time.

White_Wolf 8 hours ago

I really doubt Deepseek is subsidised. It's roughly the same price everywhere you look. Deepseek is using the Huawei hardware (as far as I managed to understand from various articles) and hence the savings.

vidarh 44 minutes ago

xyzsparetimexyz 6 hours ago

vablings 3 hours ago

bel8 6 hours ago

Add MiMo 2.5 to the list. Priced like DeepSeek, performs similarly but it also has vision capability.

dgellow a day ago

One aspect Paul Kedrosky mentioned recently is the concept of „duration mismatch“. The price per token goes down over time (either because the AI vendor reduces it under competitive pressure, or because customers are now incentivized to use older, cheaper models). But datacenters are financed through debt, on the assumption that their revenue increases over time. Quoting him: „[AI vendors are] paying for a fixed cost with a depreciating commodity“[0].

So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt.

0: https://youtu.be/wGZboZcSGDY?is=64GuKyqBh_4aSjTE

missedthecue 21 hours ago

"So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt."

Not necessarily, the bond holders could simply take a massive hair cut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massive over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters.

solatic 12 hours ago

20k 11 hours ago

dgellow 9 hours ago

biztos 20 hours ago

Frieren 9 hours ago

geysersam 20 hours ago

Current AI datacenter/model development investment rate is roughly 1T/year. That's a lot. But the US economy is 33T/year. So the investment pays back (roughly) over ten years if, each year, the AI investments increase overall productivity by 0.6%, assuming the AI companies can capture half of the value of that productivity gain.
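The arithmetic above can be checked with a quick sketch. This is only a back-of-envelope model: every input is one of the comment's round numbers, and the function names are made up for illustration.

```python
# Back-of-envelope check of the payback arithmetic.
# All inputs are the comment's round numbers, not measured data.

def captured_value_per_year(gdp, productivity_gain, capture_share):
    """Yearly value AI vendors capture from one year's investment."""
    return gdp * productivity_gain * capture_share

def payback_years(investment, gdp, productivity_gain, capture_share):
    """Years of captured value needed to repay one year's investment."""
    return investment / captured_value_per_year(gdp, productivity_gain, capture_share)

GDP = 33e12         # US economy, $/year (figure from the comment)
INVESTMENT = 1e12   # AI investment, $/year (figure from the comment)
GAIN = 0.006        # 0.6% productivity increase per year of investment
CAPTURE = 0.5       # share of the gain captured by AI companies

yearly = captured_value_per_year(GDP, GAIN, CAPTURE)
years = payback_years(INVESTMENT, GDP, GAIN, CAPTURE)
print(f"captured per year: ${yearly / 1e9:.0f}B, payback: {years:.1f} years")
```

With these inputs each year's $1T buys roughly $99B/year of captured value, so it takes about ten years to pay back, ignoring interest and depreciation.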

> „[AI vendors are] paying for a fixed cost with a depreciating commodity“

That's just a confusing way to say you don't think future models will be worth the development costs. Because if future models are significantly better, why would the price of tokens to access those models depreciate?

timacles 16 hours ago

PunchyHamster 8 hours ago

dgellow 10 hours ago

flextheruler 19 hours ago

jiggawatts 20 hours ago

treis an hour ago

Relative to current usage, demand for tokens is effectively unlimited. If the price of tokens goes down, people will send more tokens to compensate. We are very, very far away from a cost per token where people run out of things they want to send through an LLM.

try-working 15 hours ago

If you have a good model router, you can route to older, cheaper models that run on older hardware, for simpler tasks. That helps labs extend the economic life of their hardware investments. They will likely fight it at first though as they see it as reducing ASP.
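A minimal sketch of the routing idea, assuming tasks arrive with a difficulty tier already assigned. The model names, prices, and tiers below are placeholders, and this is not a description of how role-model or any real router works.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    price_per_mtok: float  # hypothetical $ per million tokens
    max_difficulty: int    # hardest task tier this model is trusted with

# Hypothetical tiers; names and prices are placeholders.
MODELS = [
    Model("old-small", 0.10, 1),
    Model("old-large", 1.00, 2),
    Model("frontier", 10.00, 3),
]

def route(difficulty: int) -> Model:
    """Return the cheapest model whose tier covers the task difficulty."""
    for model in sorted(MODELS, key=lambda m: m.price_per_mtok):
        if model.max_difficulty >= difficulty:
            return model
    raise ValueError(f"no model can handle difficulty {difficulty}")

def cost(difficulty: int, tokens: int) -> float:
    """Dollar cost of a task after routing."""
    return route(difficulty).price_per_mtok * tokens / 1_000_000

print(route(1).name, route(3).name, cost(2, 2_000_000))
```

The hard part in practice is assigning the difficulty tier, which this sketch takes as given.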

This is why I'm building role-model, a routing protocol and a router runtime: https://role-model.dev/

jurgenburgen 7 hours ago

bandrami 11 hours ago

The other part of that is that while price per token may be going down, tokens per task is going up
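To make that concrete, cost per task is just price times tokens. The numbers below are invented for illustration, not taken from any provider's price list.

```python
def cost_per_task(price_per_mtok: float, tokens_per_task: int) -> float:
    """Dollar cost of one task: price per million tokens times tokens used."""
    return price_per_mtok * tokens_per_task / 1_000_000

# Hypothetical: the price halves while the harness uses three times the tokens.
before = cost_per_task(10.0, 50_000)
after = cost_per_task(5.0, 150_000)
print(f"before: ${before:.2f}, after: ${after:.2f}, ratio: {after / before:.1f}x")
```

In this made-up case the task gets 1.5x more expensive even though the per-token price fell by half.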

no-name-here 11 hours ago

Forgeties79 9 hours ago

I really wouldn’t be surprised if we saw some of these data centers scrapped in the next few years

bijowo1676 a day ago

Do GPU chips really depreciate physically? There are no moving parts; I don't think memory chips or GPU chips deteriorate naturally.

I think it's only accounting depreciation.

I have been using my laptop for a decade, what is stopping datacenters from using the purchased GPU chips for a decade?

bgnn 21 hours ago

Aurornis a day ago

munk-a a day ago

tardedmeme a day ago

vb-8448 a day ago

malfist a day ago

threetonesun a day ago

mattalex 20 hours ago

whateverboat a day ago

numpad0 a day ago

foobarian a day ago

dgellow a day ago

manyatoms a day ago

bigfishrunning a day ago

ozim 19 hours ago

fooker 16 hours ago

sandworm101 a day ago

bethekidyouwant 19 hours ago

Using a shittier model is just more work for the user, I’m not sure why anyone does it, unless they’re playing with it like a toy.

SoMomentary 17 hours ago

no-name-here 11 hours ago

Kaliboy 17 hours ago

dgellow 9 hours ago

satvikpendem 21 hours ago

Don't worry, they'll just lobby to ban Chinese models instead to keep their token revenues high.

> Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.

https://www.anthropic.com/research/2028-ai-leadership

CuriouslyC 21 hours ago

If you do the math, they don't have a choice. If China captures America's AI market it'll cause a major depression. They'll give it the BYD treatment, though it'll be a lot less effective.

arealaccount 20 hours ago

WarmWash 20 hours ago

le-mark 8 hours ago

> Once a model is open-weight, safeguards that do exist can be removed

Safeguards trained into the model (i.e. that exist in the weights) can’t be removed.

gck1 6 hours ago

regularfry 5 hours ago

throwyu8 17 hours ago

China is the worst trading partner in the world. They banned most companies from functioning in their country for decades

evolighting 10 hours ago

Animats a day ago

Raise them, more likely. NVidia says that GPU hardware prices won't decrease until at least 2030. The world is out of fab capacity.

davedx 11 hours ago

Meanwhile, Google...

stingraycharles 10 hours ago

kristianp 18 hours ago

> The world is out of fab capacity.

Can anyone expand on this point? I read an article saying that the big AI co's datacentre spend was a bunch of lies because they can't build datacentres at anywhere near the rate they want to.

stingraycharles 10 hours ago

no-name-here 10 hours ago

EA-3167 21 hours ago

Seriously, they’re trying to justify trillion+ IPOs while setting piles of money on fire; prices aren’t going DOWN.

criddell 21 hours ago

dakolli 19 hours ago

freediddy a day ago

Most sane US companies will disallow use of cloud-based Chinese AI providers, because everything including code, data, PII, etc is being sent to them.

eikenberry a day ago

Then don't use the cloud-based Chinese providers; use cloud-based US/EU providers serving Chinese models. The interesting Chinese models are all open, making this issue mostly moot.

daemin 13 hours ago

ceejayoz a day ago

Saner companies ask the same question about models from their own country too.

rd a day ago

I wonder if I could start a US-based company with good data regulation and just serve open-weight models at a competitive price. I feel like the real barrier is just that most companies willing to adopt AI usage enough to make it worth it at this point don't want to be using inferior models.

CobrastanJorji a day ago

tokioyoyo a day ago

mediaman a day ago

dakolli 19 hours ago

fg137 21 hours ago

amunozo a day ago

You can run DeepSeek as it's open weights, unlike Claude or GPT.

HWR_14 8 hours ago

Do you trust OpenAI with your code, data, PII? What makes you so sure it's not all part of the next training set anyway?

tmp10423288442 21 hours ago

There are some objections here saying that some US firms are using Chinese AI providers, but I wonder if any of those are subject to compliance. Large firms that are disproportionately responsible for AI spending are all subject to compliance.

cheeze a day ago

Deepseek has some models in Bedrock. There is definitely a huge market for a "good enough" model running within the country of the company

KronisLV 19 hours ago

LastTrain 18 hours ago

> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

Raise, they are going to raise the prices. We will spend more on AI infrastructure in 2026 and 2027 than the gross sales of the entire global software and services sector. Current pricing is at a major loss for current providers.

xyzsparetimexyz 6 hours ago

Why would I even pay for DeepSeek? I get DeepSeek v4 flash for free with opencode. If I somehow run out of tokens for the day, I can just turn on my VPN.

testdelacc1 a day ago

Per token costs will fall, but the harnesses will get more token hungry. Instead of just centering the div it’ll spin up a battery of agents to architect, critique, advise, code, review, refactor and so on.

sevenzero a day ago

I wish I could disable most of these. I already hate all the "oh you're actually right, let me fix that" nonsense. Then it proceeds to burn 50k tokens on the git history instead of copying logic A from a different part of the codebase to logic B, where I want that exact logic without having to write the boilerplate myself...

apsurd a day ago

sfn42 a day ago

ed_elliott_asc 12 hours ago

If Anthropic are, then they are making a big mistake; their token-hungry Claude Code is far too greedy.

bigbuppo 17 hours ago

They're going to need to bring in a few trillion dollars fast to meet wall street expectations. Expect prices to rise.

PunchyHamster 8 hours ago

> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

Are they even making money off them now?

SecretDreams a day ago

> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.

aDyslecticCrow a day ago

An inference-only platform selling good open-weight model inference without the research overhead could capture a lot of the market for smaller-model uses (Haiku, Gemini Flash). Diffusion transformers and clever caching, which are improving at a high rate, can drop inference costs even lower.

The biggest reason large models are unattainable for local applications is the lack of hardware with large amounts of unified/graphics memory (and the cost of the platforms that do have it). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity, effectively opening the door for slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity).

At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is looking increasingly bleak, with an average catch-up of just a few months) or market dominance through integration (integration in platforms and OSes, exclusive deals with companies or governments).

For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be hit with a lot more AI integrated into our other tooling costs, such as GitHub, Microsoft suite, G-suite, which use their market position to force in AI functions as a value-add in the total cost without giving the option to exclude them.

pianopatrick 21 hours ago

SecretDreams a day ago

HDThoreaun 20 hours ago

Prices can go down while tokens sold increase, so that profit increases. The labs' number one goal right now is moving past software engineers so that every white-collar worker in the country finds AI assistants indispensable. Speculation here, but I think OpenAI/Anthropic API inference is insanely profitable; it just needs more volume to amortize the training costs.

SecretDreams 20 hours ago

cyanydeez a day ago

I'd be amazed if any American business will send data to China.

linkregister a day ago

HuggingFace offers DeepSeek as one of its models— it's pretty simple to spin up instances under your control.

I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.

For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model.

dghlsakjg a day ago

alpinisme a day ago

“Any” is a very high bar. Unless laws prevent it, I don’t see why a substantial minority wouldn’t buy services from where they can get them at a similar quality and much lower price.

dkersten a day ago

Together.ai provide many open weights models and as far as I’m aware their servers are US based (the company certainly is).

lowbloodsugar a day ago

Any IT cost center will send to the lowest bidder. This isn’t intellectual property: it’s annoying shit that is an unwelcome cost of doing business. China might copy our tedious scripts? Will they make a product out of it? Can I buy it and fire my IT staff? Great!

Not everyone using AI is using it to code core value IP.

vinzenzu 10 hours ago

API prices of Anthropic, OpenAI, and Google are massively inflated.

https://martinalderson.com/posts/no-it-doesnt-cost-anthropic...

There's no way that all AI inference providers are colluding and/or all running at a massive loss, meaning the cheap Chinese model prices must be the real cost it takes to run frontier-class models PLUS their margin.

Look at Deepseek 4 Pro. https://openrouter.ai/deepseek/deepseek-v4-pro/providers Deepseek and Baidu are subsidising prices but they probably train on inputs. I have no model training and ZDR in OpenRouter enabled, and the first provider that shows up there is Deepinfra, significantly more expensive than Deepseek. BUT much cheaper than Sonnet 4.6 and ChatGPT GPT-5.4.

f311a a day ago

How many more months do we need to wait, until big companies realize that flash models work just fine if you:

1) Don't ask LLMs for big changes

2) Review everything and point them in the right direction

Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.

The codebase quickly becomes a mess if you don't pay enough attention. It does not matter which model.

So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.

_jab a day ago

It's pretty simple: organizations are willing to tolerate paying $1500/month/engineer, which seems to be roughly in line with "normal" consumption for most full-time engineers. If that number grows significantly, then I bet companies will start exploring flash models more, as you propose.

lavezzi a day ago

They are willing to tolerate it now, which is quite a switch up from the free for all we had a few weeks ago, and if they aren’t able to tie in this new ~$1500p/m cap to demonstrable productivity and revenue increases then that will be kneecapped even faster

phreeza 11 hours ago

aiisjustanif 39 minutes ago

rudedogg 21 hours ago

> organizations are willing to tolerate paying $1500/month/engineer

One organization, that is a software company

> which seems to be roughly inline with "normal" consumption for most full-time engineers

My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.

epolanski 19 hours ago

Which organizations?

Uber is not representative of any trend beyond big tech and VC over funded startups.

mrothroc a day ago

The easy decision is to just go with the biggest SOTA model you can afford.

But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.

The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.

It's the pipeline, not the model, that gets you quality at a given token budget.

chaoz_ 4 hours ago

There is something about using the most advanced tooling possible. Why would you pay for IntelliJ, if Eclipse can do the same thing a bit worse?

You want to master your craft, develop "optimal" systems, understand where things are going by utilizing SOTA.

You can call it FOMO, but you get the point.

jmtulloss 19 hours ago

Is your argument that $1500 / mo is too much? Why would the engineering team not be more rigorous in their model selection given a constraint?

gravypod 19 hours ago

If you had a business task to complete that was only possible with ai and it cost you >$1500/month of work, how long would you have to delay the task so that it's cheaper long run to buy hardware and do local models?

$1,500/mo * 14 months = $21,000.

If local models are 14 months behind, as many on HN say, it may be profitable to just wait. Maybe just spend a few hundred dollars on tokens and buy hardware piece by piece.
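The same comparison as a small sketch. The $1,500/month and 14-month figures come from the comment; the $10,000 hardware price is invented, and electricity, maintenance and resale value are ignored.

```python
import math

def api_spend(monthly_spend: float, months: int) -> float:
    """Total API spend over a number of months."""
    return monthly_spend * months

def break_even_months(hardware_cost: float, monthly_spend: float) -> int:
    """Whole months of avoided API spend needed to cover the hardware."""
    return math.ceil(hardware_cost / monthly_spend)

MONTHLY = 1_500   # $/month, from the comment
LAG_MONTHS = 14   # assumed lag of local models, from the comment

budget = api_spend(MONTHLY, LAG_MONTHS)
print(f"API spend over {LAG_MONTHS} months: ${budget:,}")
# Hypothetical hardware price, for illustration only.
print(f"$10,000 of hardware pays off after {break_even_months(10_000, MONTHLY)} months")
```

Any hardware cheaper than the 14-month API bill breaks even sooner, provided the local model is good enough for the task.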

therealdrag0 12 hours ago

pchristensen 18 hours ago

edmundsauto 15 hours ago

econ a day ago

I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly? Perhaps, if it can measure complexity, even generate a quote?

Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.

AgentMasterRace a day ago

Many harnesses do this. I've recently dropped all my big subscriptions in favor of DeepSeek. Codewhale (formerly deepseek-tui) will use pro for large tasks and route smaller ones to flash. It's pretty good, but I just use pro for everything as the cost is quite low.

This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4 (99-100% cache hit).

ValentineC a day ago

> I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?

This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.

jorl17 21 hours ago

andersmurphy 21 hours ago

This a thousand times. The bigger models also have a habit of overcomplicating things.

warmwaffles a day ago

> Don't ask LLMs for big changes

> Review everything and point them in the right direction

Sorry upper management doesn't care. That's an engineering problem that you need to solve.

eikenberry a day ago

They were proposing a solution: use flash models, and use them in a way that best amplifies your work.

AgentMasterRace a day ago

epolanski 19 hours ago

I'm legit annoyed at opus 4.8 at any setting above 4.8.

I believe it can be great for vibe coding, but mundane day work? Hell no, I'd rather work with Haiku. It's too slow, checks too many things, it's annoying as hell.

thundergolfer 19 hours ago

> That means each employee's AI spending cap is ~11% of that median compensation package.

Probably better to use the fully-loaded cost of the engineer, which is much higher than their compensation package. The fully-loaded cost is the total cost paid for the labor power of the engineer, and it includes big ticket items such as office space, food, equipment, insurance, payroll tax, fringe benefits, recruiting costs.

If the median compensation package is $330k/year then the median fully loaded cost is probably around $450-500k.
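As a rough sketch of how the percentage shifts, take the article's ~11% of a $330k package as the annual cap (an assumption derived from the quote above, not a separately sourced number) and divide by the estimated fully loaded cost instead.

```python
MEDIAN_COMP = 330_000     # $/year, from the quoted article
CAP_SHARE_OF_COMP = 0.11  # ~11%, from the quoted article

# Assumed annual AI cap implied by the two figures above.
annual_cap = MEDIAN_COMP * CAP_SHARE_OF_COMP

def share_of_cost(cap: float, annual_cost: float) -> float:
    """AI cap as a fraction of a yearly cost figure."""
    return cap / annual_cost

# Fully loaded cost range estimated in this comment.
for loaded in (450_000, 500_000):
    pct = share_of_cost(annual_cap, loaded) * 100
    print(f"fully loaded ${loaded:,}: cap is {pct:.1f}% of cost")
```

Under these assumptions the cap drops from ~11% of compensation to roughly 7-8% of fully loaded cost.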

munk-a 19 hours ago

My usual rule of thumb for the US is north of double the received compensation but something in that range sounds reasonable with such high compensation. It's actually really interesting and underappreciated how that fully-loaded cost varies from country to country. Canada (for most salary ranges) is about half again instead of double owing to the insurance portion coming out of income tax rather than being a hidden expense so Vancouver ends up being attractive for trading 160k USD for like 120k CAD in compensation and then also lowering overhead from 100k USD down to like 60k CAD. The savings can be extremely dramatic.

hansvm 18 hours ago

Why would double be a good rule of thumb for typical US SWEs? Most of the costs aren't proportional to salary, and the ones which are aren't anywhere approaching 50%, much less double.

jmalicki 17 hours ago

ptero 16 hours ago

newsoftheday an hour ago

> $330k/year

For a traditional software engineer? I retired last year after 3 decades and my salary was about the same as it was in the early 2000s at the last company I was at. Maybe I should have negotiated more, but I thought only FAANG paid traditional pre-AI engineers more than $250K.

simplyluke 27 minutes ago

Uber's comp packages are probably right in line with that. Tech salaries are trimodal, and Uber's right in line with the big public tech companies.

https://newsletter.pragmaticengineer.com/p/trimodal

stingraycharles 17 hours ago

I’ve even heard the rule “twice the salary” being used here in EU, but the tax and insurance burden may be higher. All kinds of those are based primarily on total payroll amount.

consp 10 hours ago

That number usually includes cost of habitat and others. It's also a stupid number as it is skewed by how much you can squeeze out of your employees. A better number would be to compare it vs revenue per capita.

spacemanspiffii 8 hours ago

It is also possible that capping at $1500 will give you ~99% of the benefits. So even with gains that are much higher, a cap could be a rational decision. Also, most decisions, especially around AI, aren't exactly rational, so I wouldn't read too much into this number.

notnullorvoid 16 hours ago

Both metrics are valuable.

If one uses AI minimally and is able to out perform peers who are maxing out AI spend, one might want to use that in salary negotiations.

ransom1538 17 hours ago

"$330k/year" Lol. I thought I clicked on hacker news 2022.

barumrho 15 hours ago

Is it too high or too low? Honestly cannot tell

random__duck 14 hours ago

Quoting the article:

> Levels.fyi lists the median yearly compensation package for Uber software engineers in the USA at $330,000.

stego-tech 18 hours ago

It’s also worth noting that’s the peak benefit. Expect most engineers to not hit those limits on the regular (if at all, since limiting this puts skills in focus again), and that limit to come down over time as the easy processes are automated and humans are re-tasked with harder problems relative to their TC.

This is not a good bellwether for the AI industry, including its adherents. Their growth assumed a level of indispensability that’s not being reflected in hard numbers and real costs, which lends credence to the notion that these IPOs being fast-tracked are meant to try and cash out before the bubble really pops in earnest. There’s no way consuming enterprises are going to pay such insane costs for such minimal uplift in the long run, and the AI companies can’t keep offering subsidized tokens via subscription plans at their current pricing.

tuesdaynight a day ago

Why are there so many people that still believe that AI coding is a fad? It's something that started less than two years ago and companies are already paying thousands per seat. I know one that gives you 5k per month. Which other tool went from nothing to this level of acceptance so quickly?

OptionOfT a day ago

Because companies are betting that this spending will allow them to reduce cost by firing people.

Right now the AI LLM PRs we're seeing are just introducing more work for other people, while these so-called builders are looking good with their new dashboards and functionality they're demoing.

But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.

It's not built up from the ground with experience from x people taken into account. It's materialized from nothing, with no foundational separation, and barely any abstractions.

No one wants to touch it. The PRs are too large, and the 'authors' of the PRs aren't on call with us.

They get all the glory, but do none of the work.

It's kinda like designing a house and then sending it to an architect and engineer saying: make this work.

saulpw a day ago

> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.

You can absolutely do this. It's even right most of the time.

chmod775 21 hours ago

datsci_est_2015 a day ago

ssss11 20 hours ago

scubbo 18 hours ago

> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.

There are plenty of valid criticisms or warnings about over-reliance on AI coding, but this is not one of them. Today, I am using a semi-autonomous agentic coding system which has an `interview` functionality built in - when it spits out the PR from the input, if you have questions about the motivation or context for a particular choice, you can start up a clone of the original agent in a sandbox to question it.

Now, you might claim that those responses aren't always reliable, accurate, or consistent, and that claim has a little more weight (though, in my experience, decreasingly so) - but it is _certainly_ not the case that you cannot interview an agent about choices made. I'm literally doing it every day.

OptionOfT 16 hours ago

com2kid 14 hours ago

> Because companies are betting that this spending will allow them to reduce cost by firing people.

I've never worked at a company that didn't have a technical backlog measured in years.

LtWorf 5 hours ago

scuff3d 20 hours ago

Literally in the middle of ripping apart a vibe coded mess at work to figure out what's even worth keeping. Not fun :(

bvcp 8 hours ago

foolserrandboy 15 hours ago

HNisCIS 21 hours ago

It's so fucking bad. I'm watching a team try to maintain a huge dashboard/control application that interfaces with a large amount of hardware using solely AI workflows.

Literally nothing works, all the timers/time counters are different across the pages, constantly commands hardware to do stupid shit, breaks during critical moments/in front of clients.

Eventually mgmt had to institute change freezes for high profile events because the team was breaking too much shit all the time.

The average C suite dipshit doesn't realize that the performance drops off a cliff once your project is more than some fraction of the context window so they will make pretty dashboards all day long but once you need to cover all the edge cases of a real system it all explodes.

AI isn't trained on the style of software we'll need to create systems using AI; it's trained on how we used to write software. It doesn't reuse code or elegantly structure anything, it just adds more code until the thing builds and passes some fake tests, even if half of it is functionally dead/unused.

lbrito a day ago

That's just a non sequitur. "companies are already paying thousands per seat" has zero correlation with something being a fad or not. There are much more reasonable rationales explaining why companies are acting the way they are than "because AI coding is not a fad"

Kiro 21 hours ago

It's just silly to claim it has zero correlation.

tmp10423288442 21 hours ago

Can you name a service that charged companies thousands/seat/month that turned out to be almost or completely useless? There's lots of random services sold to corporates that are not very useful (all the random benefits besides health care, life insurance, and other big-ticket items), but the per-seat charge of those is much smaller.

edent 21 hours ago

iammrpayments 14 hours ago

marcosdumay 20 hours ago

sdevonoes 19 hours ago

mike_hock 20 hours ago

overfeed 10 hours ago

ipaddr 17 hours ago

LtWorf 5 hours ago

agumonkey a day ago

I would use these exact facts as a sign that it's maybe not what it seems. It's much too big and too fast to feel stable. It might keep at that level, increase even more, or drop down to a saner level of use / allocation.

teeray a day ago

I can see a corporate future where tokens are haggled over in department budgets just like any other line item. Some projects will get more of them, other projects will get less of them. "Use AI for everything" will become "use AI economically and build things that outlast our budget for it."

mekael 6 hours ago

Aurornis a day ago

> It might keep at that level, increase even more, or drop down

Bold prediction. :)

I think anyone predicting a drop or near-term flattening is not thinking beyond the online bubbles where these tools are discussed. In a local tech meetup a lot of the normal companies are barely coming online with AI tools at their company, and even then with very low limits.

johnfn a day ago

So it might either go up, stay the same, or go down? :)

agumonkey 19 hours ago

sirsinsalot 18 hours ago

Fear of loss to competitors embracing a technology creates a fear driven adoption.

Let me ask you this: is any technology worth so much break-neck adoption without first seeing clear evidence of ROI? No. The adoption is irrational.

anamexis 17 hours ago

What makes you think there is no clear evidence of ROI?

techblueberry 17 hours ago

tokioyoyo a day ago

“AI coding is a fad” is not just one big camp of similar-minded people. Different groups have to give up on their pre-existing beliefs in order to be ok with AI coding.

Think of people who were very strict with variable names. People who pushed for multiple-levels deep of abstractions for a single API logic that’s not going to be reused. People who believed that coding is craft, rather than just a process to get to the end during work hours. This makes most of these people’s points more-or-less moot.

I was in some of those camps, but I’ve seen coding evolve in the last 15 years. So I understand that these priors need to be updated, as most arguments don’t apply to today’s world.

devin 20 hours ago

"as most arguments don't apply to today's world" makes me want to roll my eyes so hard at you. The vast majority of problems we had with building complicated systems are all still just sitting there. People are speedrunning relearning things we've known about software engineering for decades.

The more things change, the more they stay the same.

rootusrootus 20 hours ago

tokioyoyo 20 hours ago

fragmede a day ago

What's an int vs a float vs a boolean? What's a function? What's a class? What's a variable? You don't actually need to know the answer to those questions in order to vibe code. That's a lot of priors to update!

tokioyoyo a day ago

harry8 19 hours ago

nomel 21 hours ago

malfist 21 hours ago

javier2 20 hours ago

Because the vibe coded stuff is sometimes great, sometimes it breaks stuff, sometimes it breaks things that we fixed multiple times earlier. The PRs are too large, nobody can review that mess and you better be on call for your deployment. Maybe it will get better, maybe not. I don't know yet.

marcosdumay 20 hours ago

Oh, it won't get any better. LLMs have already been trained on every bit of code ever published; they won't get any more material.

therealdrag0 12 hours ago

throwatdem12311 19 hours ago

Gigachad 19 hours ago

Massive PRs are something that probably has to end. You can AI-generate smaller changes in reviewable PR sizes. It probably even helps the AI code review tools to break the work into smaller logical chunks.

therealdrag0 12 hours ago

What about that means AI coding is a fad?

toasty228 21 hours ago

There is a whole spectrum between "ai coding is a fad" and "unlimited tokens for every employees we don't even care if it actually ends up being a net positive financially"

tmp10423288442 21 hours ago

> "unlimited tokens for every employees we don't even care if it actually ends up being a net positive financially"

That was clearly a short-term trend that would obviously get fixed. Doesn't say much about AI coding as a business model.

perlgeek 9 hours ago

As a side note, I wonder when we'll hear the first reports about employees reselling (parts of) their token budget.

Probably not worth it risking your job for a 200$/month good, but at 5K, I'm sure some folks will be tempted. Especially if companies do stupid things like token usage leaderboards.

anthonypasq a day ago

perhaps the personal computer? Companies were spending 3-5k (10-15k inflation adjusted) on every employee for just hardware.

everyone making comparisons to the dotcom bubble seems misguided. this is clearly computing 2.0 imo

thewebguyd a day ago

No disagreement on computing 2.0, but companies spending 3-5k per employee for hardware isn't generally a monthly cost. It's a cost at the time of hire, and then once every 3 to 5 years after that, for a monthly amortized cost of about $50/employee.
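A rough sketch of the amortization arithmetic behind that figure, assuming straight-line spreading of the purchase price over the refresh cycle. The function name is made up for illustration; the dollar amounts and cycle lengths are the ones quoted in the comment.

```python
def amortized_monthly_cost(purchase_price: float, refresh_years: float) -> float:
    """Spread a one-time hardware purchase evenly over its refresh cycle."""
    return purchase_price / (refresh_years * 12)


# Figures from the comment: $3-5k per machine, replaced every 3 to 5 years.
low = amortized_monthly_cost(3_000, 5)   # cheapest machine, longest cycle
high = amortized_monthly_cost(5_000, 3)  # priciest machine, shortest cycle

print(f"${low:.0f} to ${high:.0f} per employee per month")
```

The "$50/employee" figure matches the low end of that range; even the high end stays an order of magnitude below a $1,500/month token cap.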

I have my concerns with current inference pricing in that there's a non-zero possibility for a rug pull in the future for the subscription plans for organizations and individuals that can still use them. For now, it's only companies larger than ~150 users that need to pay per token, but what if that wasn't the case? Not every company can afford over $1k/month/employee to give them access to AI tooling, further making it harder to compete against the behemoths. If we get to a point where an individual can no longer pay $100/month for nearly unlimited usage and instead must pay per token, that's going to be a problem.

Personal computing eventually became an equalizer (until we started centralizing on mainframes again, aka the cloud) because it got cheap. My hope is that inference also gets just as cheap, if not cheaper.

I have high hopes for local AI and open weight models and we will continue the ethos of local, personal computing and not needing to offload everything to OpenAI/Anthropic/Google, etc. to get work done once the hardware and hardware availability catch up.

GrinningFool 20 hours ago

dghlsakjg a day ago

dghlsakjg a day ago

The Dotcom bubble is an interesting comparison.

The general thrust that everything would be online was correct, it was just that the market mistimed and misallocated capital by a decade or more. There was massive spending on infrastructure capacity that we wouldn't end up needing until the 2010s. There were hype driven valuations completely disconnected from business fundamentals just because a company was an 'internet' company. Things were going from cutting edge to obsolete in less than a year. There were breathless promises that this was business 2.0! Of course, none of that sounds remotely like what is going on today...

I'm optimistic about AI, but I also don't think that it is going to change everything as fast as promised.

threetonesun a day ago

jghn a day ago

Two things can be true at the same time. It can be true that this is here to stay. It can also be true that companies are grossly overvalued right now and that the market is irrationally exuberant. This would mean we could both have a crash and also see AI coding be the new future.

pixelesque a day ago

Hardware's not generally a subscription, monthly cost though.

You update it for them every 3/4 years (if they're lucky).

It probably makes a bit more sense to compare it to existing software subscriptions like Office, or the old-school 'per-seat' licenses per user for software.

thewebguyd a day ago

pmg101 a day ago

I think the right comparison is the invention of the microprocessor. At that time people were grappling with a lot of the same things we are today - would it automate jobs away, would it transform education and the work place, etc.

tikhonj 14 hours ago

I still believe Scrum is a fad and yet companies have been spending obscene amounts to push it down developers' throats for decades now.

therealdrag0 12 hours ago

Scrum spending is very rare IMO. No company I have worked at pays anything for scrum.

maplethorpe 16 hours ago

> Which other tool went from nothing to this level of acceptance so quickly?

NFTs? My company had nothing to do with blockchain but I ended up working on NFT integration regardless.

Barrin92 a day ago

>Why there are so many people that still believe that AI coding is a fad?

Because there's not a single piece of evidence that this has improved the quality of the delivered software, or for that matter even the speed of features any of these companies produce, in fact if anything the opposite.

The point of software development, the hint is in the name, is to develop software, not consume tokens. If Uber was now full of 10x engineers the stock price of Uber would be up, not down on a yearly basis. Hilariously enough the only company whose stock price is up appears to be Anthropic.

tuesdaynight 3 hours ago

I don't believe that the quality is the best metric for these companies. I doubt that Google has top-notch code quality in every product they developed, but it does not matter if they are making billions per month. Furthermore, I honestly believe that the quality stayed the same, at least.

sirsinsalot 18 hours ago

How dare you mention evidence! This isn't engineering you know!

jbvlkt 21 hours ago

Because writing huge amounts of code is easy for humans too. Agents already proved that they can do it. But are agents able to maintain it? I do not know and unless I know for sure, I am not fully committing to AI generated code.

E.g. I am able to write about 1k lines of code of "acceptable" quality per week. Which means in 1 year, there will be about 50k LoC. I am pretty sure that I would have to spend like 60-80% of my time maintaining the 1st year's code and the rest making new features in the second year, so I would have to hire more people and spend time onboarding them to maintain velocity. All of those are rough estimates, probably overoptimistic and way worse in the 3rd year. Good luck doing such estimates with code agents. Even worse if you already have huge amounts of legacy code.
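The estimate above can be written out as a few lines of arithmetic. This is only a sketch of the commenter's own back-of-envelope model: the 50 working weeks per year is an assumption chosen to match the "about 50k LoC" figure, and the names are illustrative.

```python
LINES_PER_WEEK = 1_000  # "acceptable" quality output, per the comment
WORKING_WEEKS = 50      # assumption: roughly 50 working weeks per year


def new_lines_in_year(maintenance_share: float) -> float:
    """Lines of new code left over after maintenance eats its share of the year."""
    return LINES_PER_WEEK * WORKING_WEEKS * (1 - maintenance_share)


year1 = new_lines_in_year(0.0)        # nothing to maintain yet
year2_best = new_lines_in_year(0.6)   # 60% of time spent on maintenance
year2_worst = new_lines_in_year(0.8)  # 80% of time spent on maintenance

print(f"year 1: {year1:.0f} new lines")
print(f"year 2: {year2_worst:.0f} to {year2_best:.0f} new lines")
```

Under these assumptions a single developer's feature output falls to between a fifth and two fifths of the first year's, which is the velocity drop the comment says would force extra hiring.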

themafia 20 hours ago

Why are there so many people who mistake simple anecdotes for actionable data? Why do the majority of businesses fail rather than succeed?

LAC-Tech 20 hours ago

Because we have spent a lot of time and money using AI to generate code and have been unimpressed with the results.

As for why they got accepted so quickly 1) the industry's long running desperation to deskill computer programming 2) the addictive psychology baked into LLMs "That's an elegant solution! Shall I ... ?"

asadotzler 16 hours ago

Also, a bucket for VC to put all that NFT, IoT, blockchain, VR investment into. VCs gonna VC and the last 15 years of bets failed so the last few years have been a transition away from those toward "the next thing".

jujube3 21 hours ago

It's cope. People desperately want to believe that AI coding is going away so that they can go back to partying like it's 2020.

So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???) or that code bases that AI contributes to will spontaneously combust, or something.

maplethorpe 9 hours ago

> So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???)

I mean, Github Copilot's pricing just went up considerably, so I guess they were right?

dofm 21 hours ago

I don't think it is unreasonable to say both will happen, is it?

In the long term, tokens will fall in price. Obviously. (If "tokens" continues to be the unit)

In the short to medium term, for the IPOs to succeed, people have to start actually paying for what they are using, so the price will go up, and is going up, quite a lot. Once their value is set they will slowly fall from that point (or some point maybe halfway, depending on how much the market is willing to continue to subsidise).

I am an AI cynic, but I am now an informed cynic; I am learning agentic tools so I know where they are useful and I know my enemy.

I think the "fad" here is cloud-based, metered AI being a dominant work mode.

Nothing, so far, has suggested to me that any other outcome is likely than edge- to local-scale, on-device, on-laptop, on-prem models getting good enough to the point where people use them by default and use the cloud models only when they need the extra oomph.

I cannot believe that there is anything other than an enormous incentive for companies like Uber to find local, small model and on-premises solutions to their problems, not least while pricing is so changeable and people are getting nasty surprises.

Betting on OpenAI and Anthropic being around over the long term in the form that they are now, that feels like valley hopium. Utility monopolies essentially always derive from physical/geographical limitations, don't they?

jujube3 17 hours ago

Der_Einzige 17 hours ago

Token costs do go down over time for sure due to software optimizations (e.g. better attention kernels) but acting like hardware INFLATION isn't happening for at least a few more years is just nonsense. Objectively an A100 is more expensive to rent today than it was in 2024 (a 7 year old GPU - Big short guy is a turbo idiot) and rising. As such, over short time horizons, it's possible to see limited amounts of "price per token goes up" for the same model.

oblio 10 hours ago

CharlieDigital a day ago

$1500/mo is $18,000/seat/annum.

Maybe Microsoft and Nvidia are on to something.

128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?

pqtyw a day ago

How is tok/s not a bottleneck? I assume most people still use AI agents interactively rather than leaving them to do their own thing during the night.

I find anything below 50 tps or so entirely unusable...

Regardless, it's apples to oranges anyway: inference is quite cheap for open weight models, it's just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various providers on OpenRouter, since open models are a commodity.

brianwawok a day ago

I start up 4 or so projects then go do other things for 4 hours. I don’t have enough energy to steer overnight, but I’m at least “semi afk” for daytime steering. So throughput is king for me, tokens per hour. Not latency or actual tokens per second.

smallerize a day ago

sweetjuly 21 hours ago

Is interactive use for coding something that actually works today? With unsafe mode, even frontier hosted models are slow enough I end up just tabbing out to work on other tasks. It would need to be much faster if I am to sit and stare at it while it churns. Local models might be a lot slower but workflow-wise it doesn't change much for me.

cyanydeez a day ago

It's not a bottleneck if you care about the actual code.

pqtyw a day ago

Buttons840 a day ago

I think companies will eventually just buy a local AI server.

Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.

These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.

I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.

pm90 a day ago

Yep, it's already quite easy to do so with tools like opencode/openrouter. I've used some open source models and they seem … ok? I'm not doing foundational math, just refactoring code, understanding existing code etc. I don’t see a future where companies blow 11% of employee compensation on a single tool; the hosted AI server + oss models will 99% win out.
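A quick sanity check on the 11% figure, assuming it refers to the $1,500/month cap discussed in this thread. The implied compensation below is derived from those two numbers and is not stated anywhere in the comment.

```python
MONTHLY_TOKEN_CAP = 1_500  # dollars per seat per month, from the thread
SHARE_OF_COMP = 0.11       # "11% of employee compensation", from the comment

annual_token_spend = MONTHLY_TOKEN_CAP * 12
# Total compensation at which the annual token spend equals 11% of pay.
implied_compensation = annual_token_spend / SHARE_OF_COMP

print(f"annual token spend: ${annual_token_spend:,}")
print(f"implied compensation: about ${implied_compensation:,.0f} per year")
```

For higher-paid engineers the same $18,000 is a smaller share of compensation, so the percentage depends heavily on which salary is used as the baseline.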

dangus a day ago

I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?

“AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.

The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.

And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.

pm90 a day ago

zozbot234 a day ago

I agree on the basic point, but running $1500/mo's worth of SOTA local AI is non-trivial already, and that's a figure for a single seat. That's equivalent to generating at least 20 tok/s on a 24/7 basis, in fact probably quite a bit more than that (because open-weight models are vastly cheaper than proprietary ones even when served from reputable Western providers - reaching the same spend would take around 100 tok/s or more, which is well within datacenter hardware territory).

You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
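To make the tok/s equivalence concrete, here is a minimal sketch that converts a sustained generation rate into monthly token volume and the blended price per million tokens that a $1,500 budget would imply. It assumes a 30-day month and round-the-clock generation; the implied prices are derived from the comment's numbers, not taken from any provider's price list.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month, 24/7 use


def tokens_per_month(tok_per_s: float) -> float:
    """Tokens generated in a month at a constant rate."""
    return tok_per_s * SECONDS_PER_MONTH


def implied_price_per_million(budget: float, tok_per_s: float) -> float:
    """Blended $/1M tokens at which `budget` buys a month at this rate."""
    return budget / (tokens_per_month(tok_per_s) / 1_000_000)


for rate in (20, 100):
    volume = tokens_per_month(rate)
    price = implied_price_per_million(1_500, rate)
    print(f"{rate} tok/s -> {volume / 1e6:.1f}M tokens/month, ${price:.2f} per 1M")
```

Prefill (input) tokens are not modeled separately here, which is exactly the part the comment flags as the harder constraint for on-prem hardware.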

physicsguy 11 hours ago

It’s non trivial now - will it get easier in 12 months though?

dgellow a day ago

You’re way better to run your own on premise models. Laptops are depreciating assets, do not benefit from economy of scale, have fixed specs, result in a fragmented fleet where you need to keep models up to date. Without talking about power consumption and cooling issues. I really don’t see why companies would go that direction

bluGill 21 hours ago

You don't need to run on laptops, desktops plugged into mains power get more power consumption and better cooling. I want my laptop to work, but I can accept when I'm on an airplane at 32k feet I get less abilities.

CharlieDigital a day ago

Even if the laptop costs $5k and you upgrade it every year with the latest hardware and run local models (assuming your workload can tolerate smaller models at slower tok/s), you win.
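The comparison being made here is simple enough to spell out. A minimal sketch using only the thread's numbers; it deliberately ignores electricity, throughput and model capability, which other comments argue are the real sticking points.

```python
MONTHLY_TOKEN_CAP = 1_500  # dollars per seat per month
LAPTOP_PRICE = 5_000       # dollars, replaced every year in this scenario

annual_token_spend = MONTHLY_TOKEN_CAP * 12
annual_savings = annual_token_spend - LAPTOP_PRICE
# Months of token spend needed to equal the price of one laptop.
breakeven_months = LAPTOP_PRICE / MONTHLY_TOKEN_CAP

print(f"tokens: ${annual_token_spend:,}/year, laptop: ${LAPTOP_PRICE:,}/year")
print(f"difference: ${annual_savings:,}/year, break-even after {breakeven_months:.1f} months")
```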

darkwater a day ago

> it's WTF did Uber build with all of that spend?

You can ask the same for the median 330k salary in the US for Uber Engineering... and being a bit snarky, attending Uber engineers' talks here and there at a few conferences, it looks like they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.

EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.

SlinkyOnStairs a day ago

> You can ask the same for the median 330k salary in the US for Uber Engineering

People DO.

It's well known that most tech companies are run incompetently. As you say, it's not the engineers' fault.

But most projects and hiring in these companies exist to juice promotion criteria. And, depending on perspective, these companies are either massively overstaffed or massively underproductive.

The comparison to AI spending being wasteful holds up pretty well, these are companies that readily piss away billions in pointless spending.

hibikir 21 hours ago

CharlieDigital a day ago

This is what all "platform engineers" have to do once things are working nicely: you have to keep inventing work.

I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.

darkwater a day ago

quantified a day ago

Sure, but has their rate of value added increased as a result? It's a good question to ask. They added value before LLM coding, and now are more expensive than before thanks to token costs.

throwaw12 a day ago

you don't get promotion for supporting existing things, but for "inventing" you can get promoted. also for large migrations

FergusArgyll a day ago

This is a very good answer but there's a flip side too.

The idea of "if you add intelligence you make more money" is contradicted by the fact companies don't just always hire more people. Why doesn't Google just hire everyone?

ricardobayes a day ago

128GB machines can't run anything locally that is even nearly as capable as a frontier model like Claude. We can get an idea from deepseek v4 pro being a 1.6T model, requiring approx. 860GB VRAM to run.
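The relationship between parameter count and memory can be sketched with the usual rule of thumb: weight memory is roughly parameters times bits per parameter, divided by 8. The snippet below takes the comment's 1.6T and 860GB figures at face value and counts weights only; KV cache and activations would add to the total, and the bit widths shown are illustrative assumptions, not details of any specific model release.

```python
def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def implied_bits_per_param(params: float, memory_gb: float) -> float:
    """Average bits per parameter implied by a given memory footprint."""
    return memory_gb * 1e9 * 8 / params


PARAMS = 1.6e12  # "1.6T model", from the comment

for bits in (4, 8, 16):
    print(f"{bits}-bit weights: about {weight_memory_gb(PARAMS, bits):,.0f} GB")

print(f"860 GB implies about {implied_bits_per_param(PARAMS, 860):.1f} bits per parameter")
```

Even at 4 bits per parameter the weights alone are several times larger than 128GB, which is the point the comment is making.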

dkdcdev a day ago

at their scale they could also just run a large on-premise or rented (basically still cloud, but cheaper) GPU cluster and run through that. fixed costs, even license a SOTA model’s weights if you’d like

embedding-shape a day ago

> even license a SOTA model’s weights if you’d like

Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.

throwway120385 a day ago

idiotsecant a day ago

mrweasel a day ago

The problem isn't really Uber, Microsoft or Nvidia, it's all the smaller non-IT companies that also have developers on staff. They are screwed. $1500 per seat per month is just way too expensive, but they also can't afford to build and maintain their own on-premise solution. If Microsoft can't afford to run CoPilot for their own developers, what chance does any of their customers stand?

If the large, well-funded IT companies in the world believe the current AI cost is too high, then Anthropic, OpenAI and CoPilot have no actual customer base. AI is then relegated to a very profitable niche business, but that can't fund the R&D for the models.

skybrian a day ago

treis a day ago

mvdtnz a day ago

danans 17 hours ago

> How did it meaningfully impact their revenue in a positive direction?

It probably allowed them to avoid hiring as many people to build a certain amount of software. Even if it didn't increase revenue, it could have lowered human labor costs.

> 128 GB machines that can run local LLMs are a bargain even if priced $5-8k.

Don't forget the energy costs. Searching around, advanced models use an average of 25 Wh/1000Tok.

$1500/month gets you about 150M tokens.

At the aforementioned energy/token, that's 3750kWh.

What are your local office electricity rates/tariffs? (Hint: they are going up because of AI data centers). Even if my price and energy assumptions are wrong above, you probably aren't going to get the rates that the hyperscalers do.

Even at cheap (i.e Texas) retail electricity rates, that many tokens will probably cost you hundreds per month. In most other electricity markets, probably far more.
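The energy estimate above works out as follows. This is a sketch of the commenter's arithmetic only: the 25 Wh per 1,000 tokens and 150M tokens per month are the comment's own figures, while the two electricity rates are hypothetical placeholders, not quoted tariffs.

```python
WH_PER_1000_TOKENS = 25         # from the comment
TOKENS_PER_MONTH = 150_000_000  # what $1,500/month buys, per the comment
MONTHLY_BUDGET = 1_500

# Price per million tokens implied by the comment's budget and volume.
price_per_million = MONTHLY_BUDGET / (TOKENS_PER_MONTH / 1_000_000)

# Wh -> kWh: divide by 1000 once for "per 1000 tokens", once for Wh to kWh.
monthly_kwh = TOKENS_PER_MONTH / 1000 * WH_PER_1000_TOKENS / 1000


def monthly_energy_cost(rate_per_kwh: float) -> float:
    """Electricity bill for the month's tokens at a given $/kWh rate."""
    return monthly_kwh * rate_per_kwh


print(f"implied price: ${price_per_million:.2f} per 1M tokens")
print(f"energy: {monthly_kwh:,.0f} kWh/month")
for rate in (0.10, 0.30):  # hypothetical retail rates in $/kWh
    print(f"at ${rate:.2f}/kWh: ${monthly_energy_cost(rate):,.0f}/month")
```

At either placeholder rate the bill lands in the hundreds of dollars or more per month, consistent with the comment's conclusion, though the result scales linearly with the Wh-per-token assumption.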

eclipticplane 16 hours ago

How much more software does Uber need?

Unless they are iteratively replacing expensive vendors and optimizing other headcount costs?

jvanderbot a day ago

Right - the future of LLMs is like ol' windows XP+Dell. Commercialized "things" you run locally offline, co-designed with hardware, with a known productivity suite, and large businesses building the next generation thing and suite with 18mo release cycles (ish).

nonethewiser a day ago

XP? I can see the argument for enterprise support but in that case the latest windows OS is going to be virtually free and I don't know if MS and Dell etc. would even support an XP machine. Might even be required for hardware. If no enterprise support wouldn't Linux make a lot more sense?

I get that if it's offline the security downside of XP doesn't matter, and I assume XP is free, but being free doesn't really seem that valuable compared to alternatives (free linux and virtually free OS if buying wholesale).

jvanderbot a day ago

treis a day ago

I don't see it. Leasing equipment and paying per seat license fees makes a lot of accounting and cash flow sense. Maybe when it gets to the point where you can run SOTA LLMs on consumer hardware. But that seems a solid decade and probably much more away.

Even then it makes more sense to rent the bigger GPU and get your answer faster.

gedy a day ago

There's waayyyy too much money betting on that not happening, to the point I feel there'll be regulations popping up for "safety reasons" etc to ensure the big players control this.

thewebguyd a day ago

sajithdilshan 21 hours ago

I don't think it's necessarily what Uber built, but the gained productivity. If the engineers use the AI tools the correct way, it can drastically increase the productivity and that means they can actually use the LLM as a junior or an associate engineer. $1500/mo is way cheaper for that level of productivity, whereas they would have had to pay far more for a human engineer.

ssivark a day ago

Even if companies decided to move away from expensive models from the major labs, it's probably much more economical to pay a cloud provider to host some open weights model which could then be amortized across all (internal) users and do inference at a substantial batch size, rather than giving everyone their own hardware -- which means the company would need to provision for peak usage and inference at batch size of one.

ungreased0675 a day ago

Your last question is really important. What did they accomplish with all that spend?

I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?

oblio 7 hours ago

Never confuse movement with action.

devttyeu a day ago

If you believe a 128gb machine that is essentially DGX Spark in a laptop chassis can run models comparable to SOTA you either never ran open models on hard tasks, or you aren't scratching the surface of SOTA closed LLM capability in how you're using them.

f311a a day ago

Can you show me an example of a hard task that can't be achieved using light models? When we don't want the model to work on autopilot without reviewing the code at all. Even SOTA models will produce garbage code, if you don't guide them all the time.

Hard tasks require a lot of guidance and code reviewing, unless you are creating another throw away project where correctness, maintainability and code understanding does not matter.

infecto a day ago

I am wondering more and more if this becomes true as these smaller models take off. I might be old fashioned but I have yet to crack the workflows some of the hype people spout, like Claude Code's Boris, where he and others talk about running hundreds of agents overnight.

I have still found the sweet spot for me is using LLMs but I am still in the driver's seat.

CharlieDigital a day ago

That's because for some of these folks, the cost of the tokens doesn't have to match the value of the output; the hype from the story is all they need.

Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value.

ofjcihen a day ago

Running hundreds of agents overnight is almost certainly 99 percent waste.

thelastgallon 17 hours ago

>WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?

Uber (and quite a few bay area companies and startups) can afford to spend that money. There is no expectation of profit, Uber lost ~62B and growing: https://uberlosses.com/

oblio 7 hours ago

As much as I love to hate on Uber, that website is from 2022. Uber has been profitable since 2023.

Its profit margin seems to have stabilized around 10%.

The real economic crime is losing at least $40bn over 10 years scaling a business that ended up having retail profit margins (i.e. low profit margins).

sourcecodeplz a day ago

$1.5kpm for SOTA. 128gb you run DSV4 Flash.

pqtyw a day ago

What's the point of running it locally though? Inference for open models is quite cheap already. They could just selfhost, anyway. The experience of running LLMs locally will be excruciatingly bad in comparison at least for the near future.

jcgrillo a day ago

> WTF did Uber build with all of that spend?

WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using ai coding tools there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?

awesan a day ago

I can say at least for me at a small-ish company (~40 FTE) there has been a surge in internal productivity tools. Nothing to improve the end user product directly but a lot of tools to make processes easier and less error prone.

What would previously be janky internal dashboards or excel sheets are now actually nice to use tools. That said of course the maintenance cost of all that has yet to be discovered, and the ROI is questionable.

CharlieDigital a day ago

jcgrillo a day ago

ftkftk a day ago

~70 FTE Engineering team. We are shipping more features, especially features that previously would not have survived the cut to make it on the roadmap. Even though we are shipping more, our total amount of escaped bugs has not increased, so our escape rate has actually lowered. On top of that we are able to triage and fix escaped bugs more quickly now. And then of course there has been an uptick in internal tooling that makes the rest of the company more efficient, and we have been able to address tech debt at a higher rate than before.

I don't think this would have been possible without having solid engineering culture and processes in place before bringing in ai coding tools.

And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.

CharlieDigital a day ago

nonethewiser a day ago

The real answer?

Software engineer quality of life.

There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a days work in an hour then fucking off in a variety of ways.

pqtyw a day ago

MengerSponge a day ago

slopinthebag a day ago

RugnirViking a day ago

Imo it's pretty clear that anyone who is taking the issue at least somewhat seriously knows the amount of value they provide is non-zero. However, the problems are manifold: firstly, toolchains vary wildly, from fancy autocomplete, to engineers chatting with codebases they're unfamiliar with, to people integrating them into devops and infra, to people doing spec driven development, with a thousand philosophies in between. Many people suspect that those above them in the ladder are on the cusp of massive failure due to losing track of the code, and many people higher on the ladder think those below them are overly cautious. I hate to be the guy saying "oh it must be somewhere in the middle", but I will say at the very least I like being able to use it to read docs for me, and to synthesize syntax and simple scripts (give me a join that works across these tables and gives me column x, y and z - give me a python script that parses a file like this example and extracts abc data - given this api spec figure out how I can get this data from this endpoint, go)

As for building actually complex software, the art of that is not in simply chaining together such scripts. It's the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think llms are great at this, but they aren't terrible either. A lot of the more active users in the space are doing stuff where they've realised they need more detailed specs, which like, yeah, we knew this already - better defined problems lead to better software.

jcgrillo a day ago

empath75 a day ago

I think probably the correct spend is something closer to 10x that if people can figure agent coordination problems out. It's not even really about capability at this point, it's about keeping track of what agents are doing.

m3kw9 a day ago

You can't get an edge using local models, these guys may have competitors that will spend on SOTA models. They won't likely ever consider local machines even for some offloading scenarios, the complexity and costs will be even higher.

CharlieDigital a day ago

Consider rewiring your perspective: getting an edge doesn't really matter; the only thing that matters is will customers pay for this? Is this a useful, valuable problem to solve?

Coding faster doesn't really solve that.

Uber makes more money if people buy more rides, order more food, have some breakthrough in autonomous driving. They can save money if they can optimize some ops or spend somewhere. Is there any evidence that with the spend on AI that they achieved any of this? If they did, I'm sure we'd hear about it in some engineering blog.

analognoise a day ago

18k/yr? None of the LLMs generate anything like that in value!

simonw a day ago

I'm definitely getting that much value out of Claude Code and Copilot.

CharlieDigital a day ago

ofjcihen a day ago

siliconc0w a day ago

I use the $100/mo sub but my 30 day API cost is about $1700/mo.

It really depends how you use it, if you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents - it's really easy to burn through many thousands.

If you're being more deliberate and using a few agents at a time interactively, having it review PRs/resolve issues, automated clean-ups and performance optimization, etc it could be more like $1500.

If you're just throwing it one-off questions like a better stack-overflow that is well under $100.

I've really gotten into /goal, if you can find something verifiable and leave it overnight - it's kinda like christmas morning to see where it landed.

thesumofall 10 hours ago

Plenty of comparisons here between salaries and token costs. All fair but very much assumes that salaries are rational. Why do we pay some engineers 10x as much for the same role just because they are in a different location? The WFH discussion surfaced some of that. If money is cheap, all sorts of funny things are happening. Is it worth spending 1500 USD on AI? I don’t know. Is it worth paying engineers 300k USD instead of 30k? Honestly, I don’t know

wiseowise 8 hours ago

Why even pay them at all? Just lock them in a cell and give them a bowl of rice.

palmotea 10 hours ago

> All fair but very much assumes that salaries are rational. Why do we pay some engineers 10x as much for the same role just because they are in a different location?

Who's this "we" you're talking about? Are you a software engineer or a temporarily embarrassed billionaire? Do you think the rational thing is to pay the lowest regional salary worldwide?

inemesitaffia 9 hours ago

If your competitors do, you likely will

palmotea 2 hours ago

bjackman 9 hours ago

As well as rational vs irrational they are also just different types of spending.

Hiring someone vs paying a vendor for a service:

- different level of commitment

- might tie your org to a physical location

- different legal risks

- shows investors a different picture (probably this would even influence a bank loan)

- manager has to fight a different bureaucracy

Not to mention that comparing the cost of a hire by looking at their salary is pretty dumb. ISTR hearing at Google that the overall estimated cost of employing a SWE is like 4X their compensation? Can't remember the exact figures though.

marcosdumay 19 hours ago

Just to put this in context. If every company did this, all over the world, with that same limit, we are talking about something around $45B monthly in revenue for all AI companies to share.

vb-8448 18 hours ago

There are a lot of places in Europe where 1.5k$ is more than 50% of the total cost of an employee.

And the obvious question: what's the cost of that revenue? Because it looks huge but ...

luisgvv 18 hours ago

Don't forget about India and Latin America... No way I see companies paying that much for outsourced employees

marcosdumay 17 hours ago

One could hire a competent developer here in Brazil for that amount. I know because my workplace has hired competent developers for that amount. You can even call them senior developers, but you can't get "non-startup seniors" with actual experience, those expect a bit more.

I just wanted to take their number at face value. It's not like it needs more real information to make AI a bubble.

FanaHOVA 19 hours ago

Are you saying there are only 30 million people employed in white collar jobs in the world?

marcosdumay 19 hours ago

About 30 million software developers. At least that's what a quick web search says.

piskov 19 hours ago

patrickmcnamara 19 hours ago

45 billion / 1500 $ is 30 million workers. How did we arrive at 30 million?
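For what it's worth, the arithmetic behind the $45B figure can be checked in a few lines. This is a minimal sketch, assuming roughly 30 million software developers worldwide (the web-search figure quoted upthread) and every one of them spending the full $1,500 cap. The function and constant names are just illustrative.

```python
def monthly_revenue(workers: int, spend_per_worker: float) -> float:
    """Total monthly spend if every worker uses the full per-person cap."""
    return workers * spend_per_worker


DEVELOPERS = 30_000_000  # assumption: rough worldwide developer count from upthread
CAP_USD = 1_500          # the reported monthly cap

total = monthly_revenue(DEVELOPERS, CAP_USD)
print(f"${total / 1e9:.0f}B per month")  # prints "$45B per month"
```

Both inputs are upper bounds, so this is a ceiling rather than a forecast.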

tuvix 19 hours ago

I think maybe he meant specifically for software engineers?

cousinbryce 16 hours ago

World bank says there are 3.7B employed humans. Putting the total addressable market at around 67T if all of us spend USD 1.5k on tokens every month. This lines up well with current forecasts from the major AI labs

root_axis 15 hours ago

> Putting the total addressable market at around 67T if all of us spend USD 1.5k on tokens every month

However, that's an absurd scenario.

0xc0c0c0 11 hours ago

Congrats, you're hired at Anthropic.

barumrho 15 hours ago

well, you couldn't justify the cost if you still employed all 3.7B

oblio 19 hours ago

That's a bold assumption. Increasing costs by roughly $18 000 per employee worldwide is highly unlikely. For reference even at FAANG in Europe, that would be a 7-15% cost increase for a senior developer. More like 15-30% for non FAANG and even more for non-European markets.

credit_guy 18 hours ago

I don't think it's a bold assumption, but I also don't think the assumption would lead to the conclusion.

1. Why it's not a bold assumption: it's a bit shocking now. But in two years or so, many/most companies will realize this is the cost of doing business. Just like people are ok with using Outlook, or Office 365, or (in the case of Wall Street) Bloomberg terminals, people will realize that developers will need AI coding assistants.

2. Why the conclusion does not follow from the assumption: if the limit is set at $1500/developer/month, it does not mean all developers will use it. Companies will set incentives for people to not be very wasteful. It is more likely that on average developers will consume $100-200 worth of tokens per month, and there will be some outliers who will consume 10, 100, or 1000 times as much, but they'll be few.

oblio 18 hours ago

jkwang a day ago

The $1500 number is less interesting than the fact that they hit a ceiling at all. Most engineering teams I've talked to have no idea what their AI spend is per developer because it's buried in a consolidated cloud bill. Having a hard cap forces two useful conversations: what workflows actually justify API calls vs local inference, and whether the output is being measured against any real productivity metric. Without that feedback loop it's just a race to see who can burn tokens fastest.

simonw a day ago

Both the Anthropic and OpenAI "Enterprise" plans include per-developer analytics:

Anthropic: https://support.claude.com/en/articles/12883420-view-usage-a...

OpenAI: https://help.openai.com/en/articles/10875114-workspace-analy...

Igrom a day ago

I believe you might be replying to a bot account.

lazyasciiart a day ago

c7b 6 hours ago

1,5k. For two months of that spend you could buy a machine that can self-host decent models, plus a year's worth of electricity. It's not up there in terms of quality, but with a bit more effort it works pretty decently. I'm completely baffled that that's not way more common, is it really just the quality?

reddec 6 hours ago

Second here. From the recent Alibaba Qwen conference: the all-in-one box (DC in a box - I think it was called Apsara, 0.6x0.6x1.5m) is plug and play, with 1.5TB GPU RAM, the capability to run in a fully air gapped environment, any open models... All of that is roughly $300k one time. And this box can do non LLM tasks as well. Performance (throughput) is around 20k t/s. Delivery time - around 2 months. For any medium sized company it's perhaps cheaper to just buy it once than to spend 1.5k per user on cloud
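A rough way to sanity-check that claim is a break-even calculation. This is a minimal sketch, assuming the $300k one-time price quoted above and that every user would otherwise spend the full $1,500/month on cloud tokens. It ignores electricity, staffing, and whether ~20k t/s is enough throughput for the team. The team sizes are hypothetical.

```python
def break_even_months(box_cost: float, users: int, cloud_per_user: float) -> float:
    """Months until a one-time hardware purchase equals cumulative cloud spend."""
    if users <= 0 or cloud_per_user <= 0:
        raise ValueError("users and cloud_per_user must be positive")
    return box_cost / (users * cloud_per_user)


BOX_COST_USD = 300_000  # one-time price quoted above
CLOUD_USD = 1_500       # per user per month, i.e. the full cap

for users in (20, 100, 500):  # hypothetical team sizes
    months = break_even_months(BOX_COST_USD, users, CLOUD_USD)
    print(f"{users} users: {months:.1f} months")
```

If average cloud spend is closer to $100-200 per user than to the cap, the break-even point moves out by roughly a factor of ten.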

jon_adler 39 minutes ago

Where can I find more information on this? A web search didn’t reveal much for me.

dmos62 6 hours ago

Decent vs best-money-can-buy. Further, a self-hosted LLM will be much slower.

VBprogrammer 6 hours ago

I think we're all past the "best-money-can-buy" stage. The most expensive models are an order of magnitude more expensive than the middle ground ones, so you need to be selective about what you run where.

And with a bit of careful routing - there isn't a lot stopping you sending the hard stuff to a cloud model and the average stuff to an on prem model.
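As an illustration of what that routing could look like, here is a minimal sketch. Everything in it is hypothetical: the `difficulty` score, the 0.7 threshold, and the backend names stand in for whatever heuristic and endpoints you actually have.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    difficulty: float  # hypothetical score in [0, 1], however you estimate it


def route(task: Task, threshold: float = 0.7) -> str:
    """Pick a backend: hard tasks go to the cloud model, the rest stay on prem."""
    return "cloud" if task.difficulty >= threshold else "on_prem"


tasks = [
    Task("rename a variable across a file", 0.1),
    Task("design a migration plan for the billing schema", 0.9),
]
for t in tasks:
    print(route(t), "<-", t.prompt)
```

The hard part in practice is estimating difficulty before running the task, which this sketch simply takes as given.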

dmos62 6 hours ago

VBprogrammer 6 hours ago

I'd think for most companies the pace of change is too high at the moment. Give it a few years, a bit of a plateau in the improvements in frontier models and I can't see how many of these companies don't implode under the weight of competition on inference prices.

blobbers 19 hours ago

I think the main thing companies should try to understand is avoiding the use of 'claude -p'.

I definitely have written a goal file, and then just ran claude in a loop over the goal in order to 'token max'... why not? I'm doing research and have some clear KPIs where research into all kinds of techniques / tuning can improve the results. I can spend my budget on an "experiment with blah blah blah to improve blah blah" or give it a list of things to try that I know will take awhile.

It's no problem hitting hundreds of $ of API spend while sitting at a computer with 3 monitors and 6 windows of useful claude code interactive sessions, while working on 2 or 3 projects and using worktrees, and it's a little weird when you hit your limit by 2 o'clock and have to wait for token budgets to reset; god forbid, I manually edit code... which I did do for the first time in months.

You can also start to generate a lot of token spend if you do something like "hey make me a stylized slide deck using internal skill / agent XYZ based on commits A through C", which, as an engineer, makes building presentations much less painful.

This Uber limit is not high compared to the big SV companies.

jatora 19 hours ago

I also randomly wrote some code in a bind yesterday, while I was on the toilet, and it felt so strange. That was the first I'd written in probably 6 months.

tcoff91 18 hours ago

You don't even make small tweaks by hand? There's so many things that are honestly faster to do by hand than wait for agents to do.

jatora 14 hours ago

suncemoje 20 hours ago

Lock-in / switching costs are increasingly concerning me. I have been using Claude for a good year now and have accumulated so much "knowledge" in there. If Claude became less favorable in terms of price/performance in the future, that would worry me. I've started to think about a distributed solution, where my storage is detached from the inference, but currently Claude is still the way to go for me. Wondering if anyone has similar concerns?

dadoomer 20 hours ago

Isn't all the "knowledge" just text files? I've transitioned between services easily by simply copying the text files.

darkwizard42 20 hours ago

You can even just instruct the LLM to create a context file for you! They are surprisingly good at that as well.

iLoveOncall 18 hours ago

ksajadi 17 hours ago

This.^ I realized this first when moving a design spec from Claude chat to Claude Code and panicked. I literally had to build something like Notion but for agents to act as a portable memory between all cloud and local models and agents. But honestly it paid off!

If you are interested you can try it out at markbase.cloud (disclaimer and all that). I am not charging for it.

NichoPaolucci 16 hours ago

We run a "context" repository that enables us to transition pretty seamlessly from model to model (usually codex to claude and back). It has skills / plugins / connectors / tooling in relatively malleable MD files. That's what I see as the future. Rather than exporting IDE settings we'll just carry our markdown to the next best tool.

It's hedging a bet at this point, but that's why people say there's no moat. If the tools are properly used + maintained, there should be no reason we can't use a new provider even next week (maybe with a little tweaking).

ksajadi 15 hours ago

fg137 19 hours ago

What knowledge?

Unless you work in some obscure domain, chances are that any general "knowledge" Claude has "learned" is already public data somewhere.

If you don't believe me, launch Codex and immediately start working on the same project(s). You might discover that all the knowledge accumulated means almost nothing.

linsomniac 19 hours ago

Claude Code definitely remembers things about you. For just one of the more obvious examples: I was recently asking it to make some suggestions on software alternatives, and part of the answer included (paraphrased) "While a hosted service may be attractive due to your small ops team size, your experience with hosting Linux container-based services puts this squarely in the realm of an option for you." My prompt mentioned nothing about this.

This isn't something that is public knowledge, in the sense that you mean it.

Just earlier today it asked me if I wanted to create a jira ticket for something I asked it about doing. My prompt mentioned nothing about jira.

If you use Claude Code, you might want to take a look at the "auto memories" files that it creates. See "/memory" for some more information.

danny_codes 13 hours ago

Not worried at all. Switching is trivial. Rebuilding context isn't very difficult and harnesses are a dime-a-dozen.

sparrc 20 hours ago

My favorite solution to this is to use the Cline coding agent, which is open and allows you to easily switch between different providers and models.

spicyusername 20 hours ago

Knowledge in there?

Where is the knowledge stored?

All of my knowledge typically gets stored in plans outside of the agent?

And each agent window gets archived regularly, anyways.

john01dav a day ago

Why isn't self hosting at large companies (even just renting a GPU server, not necessarily on premise), or hosting via something like together AI to run the open weight models, more common? I've tried the open weight models and the premium models like Opus and Gemini Pro, and I find that the latter are a little better, but not nearly to the degree to justify the extreme price difference, since the differences largely don't matter for what I've tried them for, and I expect that many other users likely have similar use cases.

soleveloper a day ago

If the premium models are just about 10% better - that could justify the price vs. self hosting a ~0.5-1T open weights model.

Remember that utilization of these huge racks will not be 24/7, and these are usually not GPU intensive shops that would train models on the spare compute. With prices of 100-200k USD and north with ~2 years lifetime, that would be hard to justify financially.

Self hosting could easily amount to ~1000 USD a month amortized across many developers. In rush hours - there will be hard rate limits.

Would that $1,500 - $1,000 = $500 in monthly savings justify the 10% decrease in "AI Productivity"? I guess not, in most cases.
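One way to frame that trade-off: the $500 saving only wins if a 10% productivity loss is worth less than $500 a month. This is a minimal sketch using the numbers above, where the 10% gap and the ~$1,000 self-hosting figure are this comment's own rough estimates. The function name is illustrative.

```python
def break_even_value(monthly_savings: float, productivity_loss: float) -> float:
    """Monthly value of AI-assisted output at which savings equal the loss."""
    if not 0 < productivity_loss <= 1:
        raise ValueError("productivity_loss must be in (0, 1]")
    return monthly_savings / productivity_loss


CLOUD_USD = 1_500      # per developer per month
SELF_HOST_USD = 1_000  # amortized estimate from above
LOSS = 0.10            # assumed 10% drop in "AI productivity"

savings = CLOUD_USD - SELF_HOST_USD
threshold = break_even_value(savings, LOSS)
print(f"Self-hosting only wins if AI output is worth under ${threshold:,.0f}/month")
```

With these inputs the threshold comes out at about $5,000 a month per developer. If the tools are worth more than that, the premium models pay for themselves.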

For everyone that asks me around, I'd say that in short term, unless there's a really good reason to self host these coding assistant models, then the big 2/3 coding assistants providers are the better choice.

No one got fired for licensing Claude Code.

Jianghong94 21 hours ago

I just went through a similar discussion at my $WORK (traditional finance company on NYSE with average IT expertise) and I think the thought process is as such: it's one thing to just give your stellar dev/hacker a beefy GPU server and run whatever model they can run; it's another thing to maintain such a platform company wide. You would need people (likely way above normal software dev paygrade) to understand and maintain such models, maintain the backend, availability etc. All this extra hassle makes it just easier to pay a top tier external lab + slap a reasonable spending limit on everybody.

esikich 21 hours ago

Why do you think it would be more common? The pooling of GPUs to serve multiple users and connecting to docs/datalakes while respecting security controls, as a start, is non-trivial. You'd end up paying a team to manage that.

datsci_est_2015 21 hours ago

There’s probably plenty of money to be made in LLMs as a service - but not enough time has passed for the commodification to occur. I’m with you in that when the dust settles I don’t think any of the frontier model providers will have a moat. Just like during the dotcom boom a catchy URL and a webpage that could accept payments wasn’t a moat, either.

malfist 21 hours ago

Where are you buying the GPUs to have enough compute to run a medium size business?

fg137 20 hours ago

> I've tried the open weight models ...

You tried that on a personal machine for yourself once. It's a completely different calculation when serving a model to 3000 employees with ever evolving hardware and software requirements. You'll need dedicated hardware in data centers and experts to run them. A company will need to figure out how to manage acquisition, assets and expenses plus 1000 other things, in addition to its actual business. Guess who has figured out all of that already? AWS/Azure/OpenAI etc.

fg137 21 hours ago

For the same reasons companies are not building data centers for their "regular" hosting and storage needs but put things on AWS, Azure etc.

It costs money to maintain the hardware and hire experts to manage the services. For something as common as LLM models, there is absolutely no reason a company serves models on their own hardware unless they are maniac about sending bytes to AWS.

linuxhansl 17 hours ago

I use Claude every day. Often for multiple hours a day. Basically doing my job not worrying how many tokens I spend (as in too many or too few). This is a pretty complex code base (database optimizer and related).

Just looked at my spend for the past 30 days; it didn't even come to $600. 95% of my tokens are from cache. If I were to reach even $1500 I'd have to let Claude run unsupervised overnight (and with the amount of mistakes it still makes and guidance it needs, I do not believe we are there yet.)

root_axis 15 hours ago

> didn't even come to $600.

That's still in the ballpark. A modest change in your usage habits or workload could easily get you there.

fontain 16 hours ago

is this with a subscription or pure API billing?

geodel a day ago

> A $1,500 monthly limit per tool strikes me as a rational policy response to over-spending,...

> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.

This whole article seems to me like Multi level marketing "businesses" where 'Diamonds' have made their money by promoting MLM in seminars and telling hopefuls at bottom that "Buying AI subscription now is their one shot to be a winner in life"

Perhaps there is something to MLM vs LLM to create a FOMO effect.

iLoveOncall 21 hours ago

That's just Simon Willison since LLMs came out. It's glaringly obvious that he's a paid shill.

simonw 17 hours ago

Genuine question: what would make me a "paid shill"?

Who do you think would be paying me, and what would they expect in return?

iLoveOncall 11 hours ago

fontain 21 hours ago

oh come on, a paid shill?

Simon is very fascinated by AI and at times he can be a little too optimistic but he is generally balanced and his perspective evolves over time which can be seen in his writing.

Nerd who loves nerd things a little too much? Sure. Paid shill by Big LLM? Nah.

emp17344 19 hours ago

iLoveOncall 21 hours ago

dzonga 16 hours ago

> That means each employee's AI spending cap is ~11% of that median compensation package.

when looking at costs - numbers make sense. however decisions as an org/company/solo founder - costs help you set prices, but to reach profitability you want to model around ROI.

now the question is what's the ROI for a $36K investment per engineer or $90M for the total org?

I bet the ROI is negative.

NichoPaolucci 16 hours ago

I'm in a similar boat - it's hard to measure, but let's say you pay an engineer 150K. Giving them a tool that costs 15K a year is effectively a 10% increase in that expense.

If we were seeing 3X, 5X etc improvement from individual engineers, that 10% increase in expense would be a fantastic investment (even 3 engineers for the price of 1.1??!). I have a feeling they are just not seeing that much of an improvement.
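To make that concrete, here is a small sketch of cost per unit of output, assuming the illustrative $150K salary and $15K tool cost above and treating "speedup" as a simple output multiplier (which is, of course, the hard-to-measure part).

```python
def cost_per_unit_output(salary: float, tool_cost: float, speedup: float) -> float:
    """Cost per unit of output relative to an engineer with no tool (1.0 = unchanged)."""
    if salary <= 0 or speedup <= 0:
        raise ValueError("salary and speedup must be positive")
    return ((salary + tool_cost) / salary) / speedup


SALARY = 150_000  # illustrative salary from above
TOOL = 15_000     # illustrative yearly tool cost from above

for speedup in (1.0, 1.1, 3.0, 5.0):  # hypothetical productivity multipliers
    print(speedup, round(cost_per_unit_output(SALARY, TOOL, speedup), 3))
```

The break-even point is a 1.1X speedup. Anything below that makes each unit of output more expensive than before, and a genuine 3X would cut it to roughly a third.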

CSMastermind 19 hours ago

A blanket cap makes no sense to me. There's a power distribution of AI use in my company and I'd imagine it's the same at a much greater scale at Uber.

I'd guess there should be a few people Uber is basically allocating unlimited AI spending to and a large swath they're giving basically nothing.

seanlinehan 19 hours ago

I would assume that at least one of two things are true:

1. Their costs are so out of control that they need to impose a blanket cap immediately. Figuring out an allocation mechanism that can be deployed company wide is time consuming and they need to staunch the bleeding now, despite it being obviously suboptimal.

2. The few people who should have unlimited tokens were given exactly that. No reason to introduce such nuance to a public PR move. The hard-cap limit is a great negotiating posture with token providers.

watershawl 5 hours ago

Do you think companies are gonna be like?:

Wait a minute. We didn’t save money by adding AI. We just added an expense.

Now we have to pay for employees AND AI.

szatkus a day ago

That's a lot. On my usual day I burn less than $1 on Opus. I could get beyond $10 only if I have a complex and well-defined problem, which is rare (the second part at least).

sothatsit 18 hours ago

You must not be using coding agents. You can sneeze and spend $1 on Opus in Claude Code.

colonelspace 21 hours ago

If a worker doesn't use their AI/LLM budget, can they get a raise?

asadm 21 hours ago

probably will get fired for lack of performance.

colonelspace 20 hours ago

Let's just say their performance (OKR, KPI, whatever "impact" metric you want) was indistinguishable from a peer that used the AI/LLM monthly allowance in full.

Maybe a $10k raise would be nice?

HDThoreaun 20 hours ago

cdavid 19 hours ago

no because it does not come from the same budget

colonelspace 17 hours ago

Money spent is money spent.

PessimalDecimal a day ago

These are still at currently subsidized prices. We'll see if they think they're getting $1500/month of value when that buys significantly fewer tokens.

square_usual a day ago

There is no evidence that per-token inference prices (which is what Uber is setting a cap on) are subsidized.

jordanscales 5 hours ago

The evidence that per-token inference _is_ subsidized is (a) competition is a bloodbath (b) these companies are raising more money than any company has raised ever (c) a maybe-profitable quarter is maybe-coming for Anthropic after maybe-signing a compute deal with SpaceX that legitimizes both companies.

The evidence that per-token inference _is not_ subsidized is... a quote or two from Dario and Sam Altman

pier25 a day ago

AI companies have more expenses than inference.

RugnirViking a day ago

lelanthran a day ago

Is there any evidence that it's not?

Topfi a day ago

pqtyw a day ago

thejazzman a day ago

pdyc a day ago

afaik, enterprise plans are not subsidized. It's $20/seat + API pricing. Unless you are saying API pricing itself is subsidized.

LurkandComment a day ago

This is market introductory pricing that hasn't factored in cost recovery. Most of it has been run on early investment with the assumption they will recover costs in the long run. The prices are subsidized across the board and they will need to go up significantly to recover them.

swiftcoder a day ago

pqtyw a day ago

logancbrown a day ago

boringg a day ago

True but they will raise prices slowly so people will optimize their workflow so they aren't just throwing as much inference as fast as possible like the current state. Right now you should do everything you wanted to try out because it is cheap (as long as you don't become dependent ... the risk).

sourcecodeplz a day ago

I understand current Codex $20 sub is worth about $480 GPT5 api credits.

esafak a day ago

pqtyw a day ago

The inference prices for very large open models would indicate that Anthropic's and OpenAI's margins are quite large.

MagicMoonlight a day ago

It's not. They recently forced enterprise customers onto API billing instead of the cheap consumer pricing. Now the pricing is brutal.

pmontra a day ago

I wonder what they are doing with $1500 per month. I'm on Claude Pro $20 plan and I'm doing well. That's 3 days per week. On the other 2 days I'm using a customer's Claude Max, I don't know if it's the $100 or the $200 plan, but I'm sharing it with some of its other developers.

hrpnk a day ago

$1500/mth is token pricing.

Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if the majority of users spend less on tokens, in $ terms, than the plan costs. This subsidizes the gap vs. power users who spend multiple k$ monthly in API tokens.
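A toy version of that cross-subsidy, as a sketch: the subscriber mixes below are entirely hypothetical, and "usage" is valued at API list price, which is not the same as the provider's real serving cost.

```python
def plan_margin(price: float, usage_usd: list[float]) -> float:
    """Subscription revenue minus the API-list-price value of tokens consumed."""
    return price * len(usage_usd) - sum(usage_usd)


PLAN_PRICE = 100.0

# Hypothetical subscriber mixes (API-equivalent monthly usage in USD).
few_light_one_heavy = [20.0, 35.0, 50.0, 60.0, 1_700.0]
many_light_one_heavy = [20.0] * 30 + [1_700.0]

print(plan_margin(PLAN_PRICE, few_light_one_heavy))   # -1365.0
print(plan_margin(PLAN_PRICE, many_light_one_heavy))  # 800.0
```

Whether the plan "works" depends entirely on the ratio of light to heavy users, and on how far list price sits above actual cost.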

pmontra a day ago

> Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly.

Or the fixed cost plans reflect the real cost and the people paying API prices give them the profit.

Anyway, none of my customers will let me bill them $1500 more (about $75 per day) because I'm using AI. And what for? I'm not working to move money from the pockets of my customers to the pockets of AI companies.

fontain 21 hours ago

kingstnap a day ago

Next to no one would be using less than the subscription price given how expensive Opus API is.

flyinglizard a day ago

Yea, I’m sure the personal plans are subsidized. I have $200 Claude Max at home and straight API pricing at work and equivalent work would easily cost me 5x if not more on the API.

SyneRyder a day ago

I'm on a $100 Claude Max plan, my usage is only about 50% of the plan limits, but in the last 30 days my usage was equivalent to API token spend of $1850. If you save all your Claude Code conversations, the saved files include API costs and you can calculate this yourself.

One of my most expensive sessions cost me over $100 in token spend in a single evening. I'd just found out that the time tracking & invoicing SaaS I use is increasing their monthly pricing by 2.4x - so I assigned Claude Opus 4.8 to recreate the entire SaaS for myself, and load in 13 years of my historical data. I've only completed a full read-only implementation so far, with adding & editing of records still to come, but I do expect Claude will have fully recreated the entire SaaS for me at an API cost less than a single 1 year seat of continued subscription to their service. And since I'm actually on a Max plan, it didn't actually cost me $200 of tokens at all.

coff i would not buy the Bending Spoons IPO coff saaspocalypse

I could ramble on about where the other $1750 of usage goes, but I imagine it's similar for most heavy Claude / AI users. Interactive coding sessions, a daily personalized podcast, some automated overnight agentic "proactive" sessions, a daemon that wakes up if I send Claude an email or voicetext to check something when I'm out. I've also noticed that if Claude's tool-use goes haywire & Claude gets confused or lost, sometimes a single email reply session that would normally be just $1 of API might spiral to $12 of API while it bangs its head against trying to run a program that's in a different folder to the one it's currently in. Sometimes a simple 'pwd' would save you a lot of headache, Claude....

idiliv a day ago

Uber is likely on an enterprise plan - these charge tokens at API cost, which can be much more expensive than the $20 flat rate.

deviation 4 hours ago

$300/day at Apple, with an increase to $500 with manager approval.

newobj a day ago

It's also a useful signal for AI value. Looks like it's a max value add of $18,000 per engineer per year.

Anon1096 a day ago

No, that's not what it means at all even if just doing it purely in math terms. Really it is just a reasonable amount to cap at to stop the long tail of super spenders (tokenmaxxers). You could also call it "the amount of AI spend after which Uber has decided there are diminishing returns for the average engineer".

dandellion a day ago

I'm sure if a dev can show useful results at 1k they won't have trouble getting permission for a higher cap as well.

csallen a day ago

It's not so simple to determine and generalize how much value AI adds. It's going to be different on a per-company basis and a per-engineer basis. It's also affected by the competitive market place and how many other companies are using AI for their engineers.

For example, what if you're a tiny startup and you're considering whether to hire an extra engineer or do all the coding yourself. I would estimate that AI is worth far more than $18,000 a year in that situation where you might reasonably decide to put off hiring an engineer.

pqtyw a day ago

I find it really doubtful anyone has managed to quantify that in any meaningful way. Seems like mostly an arbitrary number. Also the article does claim that it's actually several times more than 18k if you are fine with using Codex, Cursor, etc. when your Claude tokens run out.

alasano a day ago

Their initial budget for determining how much value AI adds is $18,000 per engineer.

tfehring a day ago

Not really. There are clearly diminishing marginal returns, so it's likely that the first $2,400/engineer/year adds >>$2,400 of value, even if the 18,001st $/engineer/year adds <$1 of value.
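To illustrate the shape of that argument, here is a sketch with a made-up concave value curve. The logarithmic form and both constants are assumptions, picked only so that the marginal dollar breaks even at exactly $18,000. Nothing here is measured.

```python
import math

K = 19_000.0  # hypothetical scale, chosen so marginal value hits 1.0 at $18,000
C = 1_000.0   # hypothetical curvature


def value(spend: float) -> float:
    """Total yearly value produced by `spend` dollars (assumed concave curve)."""
    return K * math.log(1 + spend / C)


def marginal_value(spend: float) -> float:
    """Value of the next dollar at a given spend level (derivative of value)."""
    return K / (C + spend)


for spend in (2_400, 18_000, 18_001):
    print(spend, round(value(spend)), round(marginal_value(spend), 3))
```

Under this curve the first $2,400 returns several times its cost while the dollar after $18,000 returns slightly less than a dollar, so a cap says something about the margin and little about total value.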

themafia 20 hours ago

It means Uber thinks they can sustain that level of expense. Whether engineers at Uber are representative of the rest of the work force is an easily debatable question.

eqvinox a day ago

It's among a wave of fresh "non-insane" takes on AI in the enterprise. Maybe we can reel things in to a sustainable level before a giant bubble bursts.

cmiles8 a day ago

And $1500 a month is on the very high end of where most companies will land. When you run the numbers there isn’t a realistic path that connects the dots between likely market size and the claimed valuation of the AI companies. The math simply does not add up.

schnitzelstoat 7 hours ago

How are people using so many tokens? I'm on the $200/month enterprise plan for Claude Code (because it's a better deal than the API pricing) and I don't come close to the limits.

If you use stuff like opusplan and /advisor so you use Sonnet for most of the work and only Opus for the really complex stuff then it's quite easy to keep costs low without affecting performance.

Deathmax 7 hours ago

All new/renewing enterprise contracts with Claude Enterprise and ChatGPT Enterprise no longer offer usage-based subscriptions, but instead will charge API pricing for all tokens consumed, and as you've said, the subs are better deals than raw API pricing.

Marciplan 6 hours ago

BigCo's are not using the plans we are using, they can't.

throw0606 2 hours ago

When blue-collar workers were losing jobs they were told to learn to code, and now engineers are vilifying AI for taking jobs

kixiQu an hour ago

Do you believe the same people were saying those things? (Were they really?) The idea that "different attitudes towards labor have been expressed by different people" doesn't feel too remarkable

827a 13 hours ago

This week an S&P 20 company with previously unlimited Claude limits also set a $250/mo/person limit; though it's unclear to me how widely the limits are being enforced, and it may be the case that it's just non-software engineers. Do with this info what you will.

etothet a day ago

In my experience, this is far below the cost the average dev will incur per month so this seems very reasonable to me. And, no doubt there are exceptions for heavy users so they can get some extra token usage when they need it.

waffuldrop a day ago

unless they changed something in the like 2 months (edit: besides implementing a cap for claude code specifically, since other tools already had caps) since ive left my job there im pretty sure 1500$ is the very max you can use after maxing out free calls, initial budget, then 2 extensions individually reviewed by your manager

higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget

sameersri2004 5 hours ago

It's a lot when using Chinese models, less when using Opus 4.8

andix 18 hours ago

It finally puts a number on the productivity gain of engineers with AI. This is probably less than 10% of the cost of an average Uber developer. So they don't assume much more productivity gain from AI than 10%.

(Cost of an employee is much higher than their salary, it includes things like office space, supporting structures like HR/accounting, insurance, hardware/software, and much more)
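
To make that arithmetic concrete, here is a minimal sketch, assuming the cap is spent in full every month. The function name and the $300,000 loaded-cost figure are illustrative assumptions, not numbers from Uber.

```python
def cap_share_of_cost(monthly_cap: float, annual_loaded_cost: float) -> float:
    """Return the yearly AI cap as a fraction of a fully loaded annual employee cost."""
    return (monthly_cap * 12) / annual_loaded_cost

# Yearly spend if the $1,500/month cap is used in full.
annual_cap = 1500 * 12  # 18000

# Loaded cost at which the cap is exactly 10%; above this, the cap is under 10%.
break_even_loaded_cost = annual_cap / 0.10  # 180000.0

# Hypothetical loaded cost of $300,000/year (an assumption for illustration).
share = cap_share_of_cost(1500, 300_000)  # 0.06
```

So the "less than 10%" claim holds for any fully loaded cost above $180,000 per year.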

al_borland 18 hours ago

But is it an accurate number? Does AI reach diminishing returns after $1,500/month, or is that all they are willing to risk/burn to stay in this game?

andix an hour ago

> But is it an accurate number

No. There is no accurate number.

epsteingpt a day ago

Uber engineers reported that loading their workspace and pulling recent commits exhausted that AI limit for Claude Code (4.8 x-high) immediately.

wmf a day ago

I don't think loading up a single context window costs $1,500. Which limit are you talking about?

rasbmn a day ago

Uber is in the business of experimenting with robotaxis and automated food delivery.

They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.

There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.

Probably they'll quietly reduce the number more soon.

lazyasciiart a day ago

Is this inside knowledge, or speculation?

LurkandComment a day ago

1) This happened because they fundamentally misunderstand how to use AI and how AI is priced. 2) Most organizations are throwing everything in for analysis and not limiting the answer they want. You need to be specific about what you analyze and what answers you want. 3) People undervalue prompting or templated responses. I will have written, validated and sanity-checked a prompt several times and run it across several models before I say it's ready for use. But when it is, I know what it will give me and that the scope of its research and answer is as close to what I want as it can be, with as little excess as I can manage. This all saves tokens.

galaxyLogic a day ago

It's probably a good thing that Uber developers are now forced to do some coding on their own and only use AI where it absolutely helps.

sva_ a day ago

Or be smarter about their usage. $50 on tokens per day can get you a long way.

estomagordo a day ago

Some people also take weekends off.
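
A quick sketch of the per-day figures, assuming a 30-day month for the calendar case and 21 working days for the weekends-off case (the 21 is an assumption, not a number from the thread):

```python
MONTHLY_CAP = 1500.0

def daily_budget(monthly_cap: float, days: int) -> float:
    """Spread a monthly cap evenly over the given number of days."""
    return monthly_cap / days

calendar_daily = daily_budget(MONTHLY_CAP, 30)  # 50.0
workday_daily = daily_budget(MONTHLY_CAP, 21)   # about 71.43
```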

aerhardt a day ago

I don't think at $1,500 you're forced to code on your own at all, in the sense of typing code. You're simply forced to not yolo-max twelve parallel agents at all times.

zkmon 12 hours ago

The big question is, will the productivity gains be absorbed by the needs? Societies don't have a need for infinite amount of luxury and laziness offered by the productivity of the machines. At some point, you would shake off things, get up from the couch and start walking again, breathing afresh.

meszmate 10 hours ago

It still probably produces better results than some junior engineers in a lot of cases.

But yeah, for a company at Uber’s scale, I can see why they would want real engineering discipline around it.

sylwk 11 hours ago

Due to recent Copilot price increase my friend was capped to $70 per month of usage. Not on a subscription…

My $100 subscription is not cheap. At the same time our product burns orders of magnitude more tokens.

packspro 16 hours ago

The tool categories that pay for themselves fastest: (1) Anything that gets invoices out faster and makes it easier for clients to pay. (2) Scheduling links that eliminate email back-and-forth. Everything else is optimization. I keep notes on which freelancer tools hit each threshold at freelancerkit.surge.sh

Galanwe 10 hours ago

I think the logical follow up will be for Uber to lay off a bunch of people so that the remaining ones can token maxx.

To the mooooon!

jwpapi a day ago

If you estimate a 10k monthly salary per engineer, that marks the point where it's cheaper for them to hire another engineer. But that doesn't mean it's improving productivity by 15%. If 15% is the point where it stops being better than another human, can we assume 7.5%?

Probably even less because you would spend those 1500 extra per employee also if you just save 10% so 150 per employee that’s 1.5% on salary.

This is imho one of the best ranges we can assume for now. How much would that be across the whole SWE market?

ilia-a a day ago

Seems an odd limit, especially since it's highly dependent on the token provider used. With Opus this is not much and could easily be burnt in a week or less, but with something like DeepSeek the $1,500 can literally be an annual budget.

That being said, I do have to wonder why someone as big as, say, Uber doesn't simply roll out an OSS model in the cloud for their team. I'd imagine that would be the cheapest and most flexible option, while also keeping all the data shared with the LLM private.

iceman28 a day ago

It’s not just about the model but also about setting up the system to create and share compute (GPUs), which is quite complicated on its own. Uber's primary business focus isn’t infrastructure.

5701652400 a day ago

eventually tokens will cost the price of energy. and china is miles ahead.

china will be major token exporter soon. mark my words.

cmiles8 20 hours ago

Electricity actually is only a small part of the data center costs. There are challenges in getting enough electricity that create problems, but the cost of the electricity really isn’t an issue.

dude250711 21 hours ago

Technically, tokens travel both ways.

5701652400 11 hours ago

Technically, on both sides there is an intelligence producing them.

easygenes 18 hours ago

If I were paying API rates this year, I would have already burned through $20k in tokens. Looking forward to the costs of this level of capability coming down.

era-epoch 19 hours ago

Reading the headline

Oh that's actually really economical! I wonder if they're doing a lot on locally running models or managing a shared context or knowledge-base in some clever way, maybe just encouraging employees to be efficient and mindful.

...

> each employee

...

> per AI coding tool

...

> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI

What on this godforsaken earth are all you rich idiots doing???

transitorykris 21 hours ago

Is anyone doing story point estimation in terms of tokens? If you have a token budget, does this change how you prioritize?

sanex 20 hours ago

I think there's too much variance between what model you're using and how much you turn your brain off. If I just paste a ticket number into 4.8xHigh it's going to use a lot more tokens than if I read the ticket, tell Sonnet what it needs to do, make my commit, run unit tests myself, etc.

ewangzzz 19 hours ago

I'm curious how much of the usage comes from vibe coding vs using agents/harnesses in internal tooling

hrpnk a day ago

If budgeted at $1,500/month per user, power users still can get 5-10x of that allocation if the user pool is large enough.
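
A minimal sketch of how that could work, assuming the budget really is pooled and unspent allocation can flow to heavy users. The head counts and the $600 average light-user spend are hypothetical.

```python
def power_user_multiple(n_users: int, per_user_budget: float,
                        n_power: int, light_avg_spend: float) -> float:
    """Return how many times the nominal allocation each power user can spend
    when the leftover from light users is split evenly among power users."""
    pool = n_users * per_user_budget
    light_total = (n_users - n_power) * light_avg_spend
    per_power_user = (pool - light_total) / n_power
    return per_power_user / per_user_budget

# Hypothetical: 100 users at $1,500 each, 90 light users averaging $600/month,
# 10 power users sharing whatever is left of the pool.
multiple = power_user_multiple(100, 1500.0, 10, 600.0)  # 6.4
```

With these made-up numbers each power user gets $9,600/month, which sits inside the 5-10x range; a smaller pool or heavier average use shrinks the multiple.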

gck1 6 hours ago

A lot of talk about cheaper models here. Just curious, is there any non-Anthropic model that can do UI well? GPT-5.5 is laughably bad, and I'm never restarting my Anthropic subscription after their 6-month sprint of gaslighting, even if Opus was really good at UI.

walthamstow 20 hours ago

I think a lot of people are missing that this is $1500 _per tool_ which is still rather a lot of money.

spprashant 19 hours ago

Outside of coding what other tools expend that kind of tokens? People are not creating that many slide decks or videos are they?

LeicaLatte 18 hours ago

If China captures the market now, well deserved. Way cheaper compared to US providers.

ChrisArchitect a day ago

Related:

Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing

https://news.ycombinator.com/item?id=48268871

Uber torches 2026 AI budget on Claude Code in four months

https://news.ycombinator.com/item?id=47976415

Corporate America Is Starting to Ration AI as Cost Skyrockets

https://news.ycombinator.com/item?id=48335388

cloudking a day ago

They are also beholden to enterprise pricing and can't use the subsidized consumer max plans.

cadamsdotcom 20 hours ago

Token costs rising because data center build costs must be paid down.. is not the whole picture. It is actually possible for token costs to fall despite the spending frenzy.

Naively you’d expect to always keep paying more - but growth in token usage is what changes the equation. Amortizing debt over an exponentially growing amount of spend across a growing customer base (not per customer) lets the debt be paid off & costs covered even as each individual’s spend stays steady or even goes down - but it only works if there’s growth beyond some threshold that makes the whole thing hang together. No one on the outside knows how much growth that is, and everyone chases maximum growth.

Jevons Paradox ends up being your friend as well as the friend of the inference providers as well as the friend of the inference financiers.

If it’s a strong enough effect, it has potential to cancel out all the circular financing too, and let everyone ride out the bursting of the bubble.
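
A toy sketch of the amortization argument, assuming a fixed yearly debt payment and token volume that grows by a constant factor each year. Every number here is made up for illustration; as noted above, nobody on the outside knows the real ones.

```python
def debt_cost_per_million_tokens(annual_debt_service: float,
                                 tokens_year0_millions: float,
                                 growth: float,
                                 years: int) -> list[float]:
    """Spread a fixed yearly debt payment over token volume that multiplies
    by `growth` every year; returns the debt cost per million tokens per year."""
    return [
        annual_debt_service / (tokens_year0_millions * growth ** t)
        for t in range(years)
    ]

# Hypothetical: $1M/year debt service, 100,000 million tokens in year 0,
# volume doubling every year.
costs = debt_cost_per_million_tokens(1_000_000.0, 100_000.0, 2.0, 4)
# [10.0, 5.0, 2.5, 1.25]
```

The debt share of each token halves every year in this toy model, which is the sense in which prices can fall while the build-out is still being paid down. With `growth` at 1.0 the share never drops.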

gck1 6 hours ago

ccusage for codex tells me the medium feature I prompted in codex, with a $200 subscription, running for 72 hours and still not delivering full result would have cost ~ $2200 at API rates.

I also misconfigured something in my agent's configuration and a simple web tool request (maybe 4 turns) through OR went to GPT-5.5 accidentally and that cost me ~$0.4.

I have no idea how any business can afford API rates without having a mindset of casually setting money on fire.

KnuthIsGod 20 hours ago

China will bring down the price per million tokens.

edg5000 15 hours ago

Why are people getting these high spending numbers? A 200 USD subscription for either Codex or Claude should give you plenty of usage. What am I missing? Are they just being dumb?

fontain 15 hours ago

The subscriptions are not available to enterprise users. Enterprise users must pay per-token. A $200 subscription gives you roughly the equivalent of $1500 in per-token billing.

morpheos137 4 hours ago

the really interesting way to address the question of token effectiveness would be internal alpha vs beta testing and measuring marginal revenue generated by similar teams using AI at different usage levels. right now $1500 a month is not a meaningful signal of anything beyond current executive willingness to spend. in the long run executives will cut spending where it does not support income generation.

nalekberov 19 hours ago

What is the point of allowing a developer to spend $18,000 a year on AI subscriptions? Can't they hire a decent developer who is capable of producing a quality solution faster? Clearly, these decisions are all made by the high-level management team.

I was recently talking to an HR person from a European company, and she goes: 'We are forcing our developers to use AI coding agents, but they are still kind of hesitant.' This person had never written a single line of code, nor did she know what software engineering is. For these people, using AI coding agents = faster delivery without breaking anything.

tmp10423288442 18 hours ago

It costs a lot more than $18,000 to hire a decent developer, pretty much anywhere in the world. Also using a model is better than another developer in some ways, because there aren't two independent minds trying to work with each other.

insane_dreamer 21 hours ago

I still have never hit a ceiling with my Claude Max $100 account, much less the Max $200 account. I'm not burning tokens needlessly, nor running it all day, but I do use CC almost daily. What are these devs doing that they are burning more than $1500 in tokens a month?

Maybe it's just me, but I still find that I really have to "shepherd" the AI and work with it to get the results I want. And I read every line of code added and challenge the model's logic. So that limits my token burning. Maybe these people are just "vibe-coding" without really checking the results?

era-epoch 19 hours ago

I would not be surprised if they have engineers vibecoding 2-3 projects each simultaneously, nonstop, on largely un-moderated review-suggest-iterate-test feedback loops.

All the code gets summarized and fed into their manager's agent contexts, probably duplicated several times across levels and departments, with some generated back-and-forth emails pinging around the org chart, eventually generating 2-3 long-winded reports that nobody will read chock full of generated visualizations that can all get consolidated into a generated slide deck that they'll show (maybe, at some point) to a handful of humans with more money than a human brain can conceptualize to demonstrate all of the innovation they're doing.

I am increasingly convinced that many of these companies are dead trees whose only function is to burn money lest it fall into the hands of the peasantry.

HDBaseT 18 hours ago

You are paying account pricing. Uber is paying API pricing.

Your $100/m plan is likely equivalent to thousands of dollars of API pricing. You are being subsidized by the companies using AI.

kshacker 16 hours ago

And this is why as the freeloader (includes me) volume goes up, they add more and more rules to constrain us.

insane_dreamer 15 hours ago

I wasn't aware the Max $100/user plan wasn't available to Enterprise; it used to be IIRC

human305893 15 hours ago

just don't care about the output. Produce more. Don't check the results.

sremani a day ago

I have a strong conviction that companies will now choose tech stacks/programming languages based on 'tokenomics'. I am vibe coding using Clojure, a language I can read but cannot write, and I never hit the usage limits even when using the latest model on Claude. I have a similar experience with F#, which is a bit more verbose than Clojure but absolutely beats every OOP language, Python, Typescript etc.

The reason I use F# & Clojure is that they target the JVM and CLR, two popular enterprise stacks.

In my not so humble opinion Lisp(Clojure) still remains the language of AI.

genericone 21 hours ago

Typescript is also hugely represented. My projects are TS in a big way, where I have no experience with it at all.

noncoml 19 hours ago

They want to replace employees with AI, then replace paid AI with unpaid AI.

Their wet dream was never automation. It was zero marginal cost labor. And that dream is starting to rot.

ipunchghosts 21 hours ago

Why aren't they using Claude code 20x for 200/month?

hazelnut 21 hours ago

if you have more than x seats, you have to use Enterprise pricing as far as I know which is pay as you go with a pool.

nphardon 20 hours ago

It's wild; at my shop in Silicon Valley they dropped us from unlimited use to 60% prem budget on copilot. People are walking around like zombies.

conartist6 20 hours ago

Poor people! Thinking takes calories

nphardon an hour ago

This is funny, I get it; but the idea that using LLMs precludes thinking is silly. We're doing some heavy lifting over here. There's a lot of noise around pie in the sky AI show n tell projects, but then there's quieter real work being done as well, with highly skilled engineers. 100x is a thing.

cyanydeez a day ago

no....the fact that you could buy a reasonably priced Mac or AMD 395+, that's AI tool pricing; it loads a big enough model and spits out tokens just fast enough that you can read what it's doing and comprehend it instead of magic.

That's the most useful signal. Pre OpenAI mafia RAM pricing, that comes out to $250/month.

jedisct1 a day ago

A lot of things can be done with local models.

rimliu a day ago

Even more things can be done without any models just as well.

dude250711 a day ago

Single developers seeking local models.

fHr 21 hours ago