Claude Code users hitting usage limits 'way faster than expected' (theregister.com)

198 points by samizdis 7 hours ago

garrickvanburen 3 minutes ago

Considering: - Anthropic decides how much a token is worth. - Users have no visibility into, or control over, how many tokens a given response will burn.

This is the only expected answer. https://forstarters.substack.com/p/for-starters-59-on-credit...

midnightdiesel 32 minutes ago

It seems like Anthropic is constantly changing the rules and pulling out rugs, and always entirely by surprise. I’m not sure if they’re incompetent or just careless, but I stopped paying them because of this a while ago, and my days are much more interesting and enjoyable using my own brain instead.

carefree-bob 31 minutes ago

As long as they keep losing money and are reliant on investments to pay their operating expenses, they are going to be thrashing about in search of a sustainable business model and I don't blame them.

pxtail 5 hours ago

Recently, after noticing how quickly limits are consumed and reading others' complaints about the same issue on Reddit, I started wondering how much of this is a real error or bug hidden somewhere, and how much is testing what threshold of constrained limits will be tolerated without accounts being cancelled. Eventually, in a "shit hits the fan" situation, it can always be dismissed by waving hands and apologizing (or not) about some abstract "bug".

The lack of transparency and accountability behind all of this is, in my view, incredible.

vintagedave 4 hours ago

I've run into this, and I highly doubt I am one of the more extraordinary users. I leave delays between sessions, don't have many running at once, work on smaller codebases, etc. Yet just a few minutes ago I hit a quota. In the past I did far more work with it without running into the quota.

I emailed their support a few days ago with details, concerns, a link to the twitter thread from one of their employees, and a concrete support request, which had an AI agent ('Fin') tell me:

> While our Support team is unable to manually reset or work around usage limits, you can learn about best practices here. If you’ve hit a message limit, you’ll need to wait until the reset time, or you can consider purchasing an upgraded plan (if applicable).

I replied saying that was not an appropriate answer.

You're absolutely right re the lack of transparency and accountability. On one hand, Anthropic generates goodwill by appearing to have a more ethical stance than OpenAI, and a better product. On the other hand, they kill that goodwill fast through extremely poor treatment of their customers.

If they have a bug, they need to resolve it: and in the meantime refund quotas. 'Unable to' - that's shocking. This is simple and reasonable. It's basic customer service. I don't know if they realise the damage their attitude is doing.

Kim_Bruning 4 hours ago

Fin is the most useless thing ever. There's no obvious way to get reports in front of a human in a timely manner, and there's no reason to believe Fin interactions are even retained.

Ultimately this means no loyalty. I can't stay loyal to a brand that doesn't actually respond to inquiries, bug reports, or outage reports at all.

I do understand that Anthropic is operating at a tremendous scale and can't have enough humans in the loop. This sounds like a good use for AI classification and triage, really!

joshuak 33 minutes ago

It is also interesting to observe that the most valuable accounts in this kind of pricing model are the ones that are least used and therefore never confront the limits. Heavy users cancelling their accounts in frustration is a win for Anthropic, not a punishment - at least in the short term.

JambalayaJimbo 5 hours ago

Once you get used to using claude as an abstraction layer you start getting pretty reckless with it.

My organization has the concept of "premium models" where our limits reset every month. I hit my limit pretty quickly last month because I was burning tokens doing things that would have been a simple bash loop in the past - all because I was used to interfacing with Claude at the chat layer for all my automation needs and not thinking any more about it.

devmor 4 hours ago

This is a real danger that I think a lot of people will run into as prices go up more and more in the future.

Completely outside of the productivity debate, offloading cognitive tasks to LLMs leaves you less practiced in them and less ready to do them when the LLM isn't available. When you have to delegate only certain tasks to the LLM for financial reasons, you may find yourself very frustrated.

foxyv 2 hours ago

I suspect that Claude had a bug that undercounted tokens and they fixed it.

mmmlinux an hour ago

I wonder if that was why they were offering the bonus off-hours limits: easing people into the transition.

joshuafuller 5 hours ago

This feels a lot like the same playbook we’re seeing with dynamic pricing in retail, just applied to compute instead of products. You never really know what you’re getting, and the rules shift under you.

What makes it worse is the lack of transparency. If there were clear, hard limits, people could plan around it. Instead it’s this moving target that makes it impossible to trust for real work.

At some point it stops feeling like a bug and starts feeling like a pricing experiment on users.

bayarearefugee 5 hours ago

The clear trend over the past decade or so has been using analytics and data gathering to extract maximum rents from every customer in every industry and AI is going to massively accelerate this.

The only way out is government regulation which means we are screwed in the US (our government is too far gone to represent average citizen interests in any meaningful way) but Europeans maybe have a chance if they get it together and demand change.

tartoran 5 hours ago

What a horrid glimpse in the future. I hope we won't get there and we all collectively fight back with our wallets.

tjoff 2 hours ago

Working as intended? They openly state that how quickly your limit is reached depends on many factors (that you don't know) as well as current load on their systems.

Could just be that usage has gone up.

thisisit 4 hours ago

They keep running experiments like free $50 in extra use credits or 2x usage outside certain windows where inference is very slow. You can’t help but think this is all a slowly boiling the frog experiment. Experimenting how much they can charge.

nicce 5 hours ago

Are they going to pay you back if the subscription was paid but the token limit was less than advertised? Is there some fine print somewhere preventing you from suing or doing a chargeback with your credit card?

jadar 5 hours ago

Part of the issue is that they don't actually advertise what the token limit is - just some vague "this is 5x more than free, and 5x more than Pro". They seem to be free to change the basis however they please, because most of us are more than happy to use what they give us at the discounted subscription pricing.

dinakernel 5 hours ago

This turned out to be a bug. https://x.com/om_patel5/status/2038754906715066444?s=20

One Reddit user reverse-engineered the binary and found that it was a cache invalidation issue.

They are doing some hidden string replacement if the Claude Code conversation talks about billing or tokens, and it looks like that invalidates the cache at that point.

If that string appears anywhere in the conversation history, the starting text is replaced and - I think - your entire cache rebuilds from scratch.

So, nothing devious, just a bug.
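If the cache really is keyed on an exact transcript prefix, the failure mode is easy to reproduce in miniature. A toy sketch of my assumption about how prefix caching behaves, not Anthropic's actual implementation:

```python
# Toy model of prefix-based prompt caching: a cache entry is keyed by an
# exact prefix of the transcript, so any edit to early text is a full miss.
def cached_prefix_tokens(transcript: str, cache: dict) -> int:
    """Return how many leading characters of `transcript` are served from
    cache (characters stand in for tokens here)."""
    best = 0
    for prefix in cache:
        if transcript.startswith(prefix):
            best = max(best, len(prefix))
    return best

cache = {}
history = "SYSTEM PROMPT\nuser: hello\nassistant: hi\n"
cache[history] = True                      # first request populates the cache

extended = history + "user: add a feature\n"
assert cached_prefix_tokens(extended, cache) == len(history)  # cheap: prefix hit

# A hidden string replacement near the start of the transcript changes the
# prefix byte-for-byte, so nothing matches and everything re-tokenizes.
rewritten = extended.replace("SYSTEM PROMPT", "REDACTED PROMPT")
assert cached_prefix_tokens(rewritten, cache) == 0            # full rebuild
```

The same mechanics would explain the `--resume` report: if resuming reconstructs the transcript even slightly differently, the prefix match fails and the whole conversation is re-billed as fresh input.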

ibejoeb 4 hours ago

> BUG 2: every time you use --resume, your entire conversation cache rebuilds from scratch. one resume on a large conversation costs $0.15 that should cost near zero.

I use it with an API key, so I can use /cost. When I did a resume, it showed the cost from what I thought was the first go. I don't think it's clear what the difference is between API key and subscription, but am I to believe that simply resuming cost me $5? The UI really makes it look like that was the original $5.

replwoacause 5 hours ago

Nothing devious, but is Anthropic crediting users? In a sense, this is _like_ stealing from your customer, if they paid for something they never got.

arvid-lind 4 hours ago

Not seeing any quota returned on my Pro account. My weekly usage went up to 20% in about one hour yesterday before I panicked and stopped the task. It was outside of the prime hours too, which are supposed to run up your quota at a slower rate.

novaleaf 4 hours ago

Your linked bug is a cherry-pick of the worst-case scenario for the first request after a resume.

While it should be fixed, this isn't the same usage issue everyone is complaining about.

kif 5 hours ago

Anecdotally when Claude was error 500'ing a few days ago, its retries would never succeed, but cancelling and retrying manually worked most of the time.

mook 5 hours ago

It looks like that's a summary and a picture of https://old.reddit.com/r/ClaudeAI/comments/1s7mkn3/psa_claud...?

TazeTSchnitzel 5 hours ago

That bug would only affect a conversation where that magic string is mentioned, which shouldn't be common.

dinakernel 4 hours ago

I guess so - but for people working on the billing section of a project, or even people who include things like "add billing capability" in their CLAUDE.md, it might be an issue, I think.

sibtain1997 9 minutes ago

Faced this too. Tried https://github.com/rtk-ai/rtk to compress CLI output, but some commands started failing and the savings were minimal. Ended up just being more deliberate about context size instead of adding more tooling on top.

p2hari 6 hours ago

I cancelled my Pro plan last month. I was using Claude as my daily driver; in fact I had the API plan as well and topped it up with $20 more, so it was around $40 each month. Starting from December last year it has been like this: sessions used to last a couple of hours, covering everything from deep boilerplate and db queries to architecture discussions and tool selection. Over the last two months it just runs out - one prompt and a few discussions about why this and not that, and it's done.

ramon156 5 hours ago

After they forced OpenCode to remove their Claude integration, and with the insane token hogging, I also cancelled my subscription.

aliljet 5 hours ago

There's a weird 'token anxiety' you get on these platforms. You basically don't know how much of this 'limit' you may consume at any time, and you don't even know what the 'limit' is or how it's calculated. So far, people have just assumed Anthropic will do the kind thing and give you more than you could ever use...

sumtechguy 5 hours ago

This reminds me of the early days of cell phones. Limits everywhere and you paid for it by the kilobyte. Think at one point I was paying 45c per text message. I hope this gets better and we do not need gigawatt datacenters to do this stuff.

jauntywundrkind 3 hours ago

Yeah, I've been juggling some patches to opencode to help me see where my codex usage limits are at. As of a month ago, that information was not visible on the ChatGPT web UI.

You just work until suddenly the AI dumps you out, and you sit there wondering how many hours or days you have to wait. It's incredible that this experience is considered at all OK, that it's accepted.

robviren 6 hours ago

I find Claude Code to be a token hog. No matter how confidently the papers say context rot is not an issue, I find curating context to be highly important to output quality. Manually managing this in the Claude web UI has helped my use cases more than freely tossing Claude Code at them. Likely I am using both "wrong", but my way is easier for me to reason about and minimizes context rot.

elephanlemon 6 hours ago

Yesterday (pro plan) I ran one small conversation in which Claude did one set of three web searches, a very small conversation with no web search, and I added a single prompt to an existing long conversation. I was shocked to see after the last prompt that I had somehow hit my limit until 5:00pm. This account is not connected to an IDE or Code, super confusing.

master_crab 6 hours ago

Tool calls (particularly fetching for context) eat the context window heavily. I explicitly send MCP calls to sub agents because they are so "wordy".

bensyverson 6 hours ago

Everyone who has not hit this bug thinks it’s user error… It’s not. It happened to me a few days ago, and the speed at which I tore through my 5 hour usage cap was easily 10x faster than normal.

Also: sub agents do not get you free usage. They just protect your main context window.
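On that sub agent point, my understanding is that a sub agent's wordy tool output is billed like any other tokens but never enters the main transcript; only a summary does. A toy sketch of that accounting (character counts stand in for tokens; all names are made up):

```python
# Toy accounting for sub agent delegation: total tokens are still spent,
# but the main context only grows by the summary, not the raw tool output.
def delegate(main_context: list, tool_output: str, summary: str) -> int:
    """Run a 'sub agent', append only its summary to the main context,
    and return total tokens consumed (approximated as characters)."""
    subagent_tokens = len(tool_output) + len(summary)   # burned in the sub agent
    main_context.append(summary)                        # only this hits main window
    return subagent_tokens

main = ["user: audit the logs"]
wordy = "x" * 50_000          # stand-in for a huge MCP/tool result
spent = delegate(main, wordy, "summary: 3 errors found")
assert spent > 50_000                       # usage is not free
assert sum(map(len, main)) < 100            # but the main window stays small
```

In other words, delegation protects context quality, not your quota.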

0xbadcafebee 4 hours ago

I've found a lot of people are almost belligerently pro-Claude. They refuse to consider other providers or agents, and won't consider using any model other than the latest Opus. The most common reasons I hear are: 1) they don't want to use anything other than the greatest model, afraid anything else would waste their time; 2) they believe their experience shows it's far better than anything else.

Even if you show them benchmarks where another model is equally good, if not better, they refuse to use it. My suspicion is they've convinced themselves that Opus must be the best because of reputation and price. They might have used a different model once, had a bad experience, and doubled down.

I hope a research institution will run an experiment. My hypothesis is that if you swapped a couple of similar state-of-the-art models, even changing the "class" of model (Sonnet <-> Opus, GPT 5.4 <-> Sonnet), users wouldn't be able to tell which is which. This would show that the experience is subjective, and that bias rather than rationality is informing their decisions.

It's like wine tasting experiments. People rate a $100 bottle of wine higher than a $10 bottle, but if they actually taste the same, you should be buying the $10 bottle. People don't, because they believe the $100 bottle is better. In the AI case, the problem is people won't stop buying the expensive bottle, because they've convinced themselves they must use it.

danny_codes 3 hours ago

This has largely been my experience. I can't tell the difference between Claude and Kimi.

1970-01-01 5 hours ago

This has been verified as a bug. Naturally, people should see some refunds or discounts, but I expect there won't be anything for you unless you make a stink.

https://old.reddit.com/r/ClaudeCode/comments/1s7zg7h/investi...

Kim_Bruning 3 hours ago

How do you even make a stink? I haven't found an easy way to find a human.

kneel 6 hours ago

I asked it to complete ONE task:

You've hit your limit · resets 2am (America/Los_Angeles)

I waited until the next day to ask it to do it again, and then:

You've hit your limit · resets 1pm (America/Los_Angeles)

At which point I just gave up

dewey 5 hours ago

Whether this is reasonable or not is pretty hard to judge without any info on that "ONE" task.

kaoD 5 hours ago

I only asked Claude to rewrite Linux in Rust.

edbern an hour ago

Yesterday I asked Claude to write up a simple plan adding some very basic features to a project I'm working on, and it took 20% of the 5-hour Pro plan limit. Meanwhile Codex somehow seems to be infinite. Is OpenAI just burning through way more cash, or are they more efficient?

jditu 26 minutes ago

Still on 2.1.87, exclusively Opus for coding — haven't hit this yet. Wondering if the bug is personal vs team plan specific?

I'm sure it's more complex, but why not improve internal implicit caching and pass the savings on? Presumably Anthropic already benefits from caching repeated prompt prefixes internally — just do that better, extend the TTL window, and let users benefit. Explicit caching stays for production use cases with semi-static prompts where you want control.

The current 5-min default TTL + 2x penalty for 1-hour cache feels punitive for an interactive coding tool.
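For context, explicit caching in the Anthropic Messages API is opt-in via `cache_control` markers on prompt blocks. The sketch below only builds a request payload (no API call); the model name is a placeholder, and the `ttl: "1h"` field reflects my understanding of the extended-TTL option and should be checked against current docs:

```python
# Build a Messages API payload with an explicitly cached system prompt.
# The `ttl` value for extended caching is an assumption; verify against docs.
def build_payload(system_text: str, user_text: str, ttl: str = "5m") -> dict:
    cache_control = {"type": "ephemeral"}
    if ttl != "5m":                      # default 5-minute TTL needs no ttl field
        cache_control["ttl"] = ttl
    return {
        "model": "claude-sonnet-4-5",    # placeholder model name
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_text, "cache_control": cache_control}
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

payload = build_payload("You are a coding assistant.", "Refactor foo()", ttl="1h")
assert payload["system"][0]["cache_control"] == {"type": "ephemeral", "ttl": "1h"}
```

Implicit caching would mean the provider doing this automatically for repeated prefixes, with no markers in the payload at all.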

ZeroCool2u 6 hours ago

I'm finishing my annual paid Gemini Pro plan, so I'm on the free plan for Claude. I asked one (1) single question - admittedly about a research plan, using the Sonnet 4.6 Extended Thinking model - and instantly hit my limit until 2 PM (it was around 8 or 9 AM).

Just a shockingly constrained service tier right now.

notyourwork 5 hours ago

Free is free. Want more, fork over money.

Forgeties79 5 hours ago

They are saying even for free it is very constrained. This isn’t productive.

jlharter 5 hours ago

I mean, even the paid tier where you fork over money is constrained, too!

pagecalm an hour ago

Hit this myself recently, along with a bunch of overloaded errors. I think it's growing pains for where we are with AI right now.

As the tooling matures I think we'll see better support for mixing models — local and cloud, picking the right one for the task. Run the cheap stuff locally, use the expensive cloud models only when you actually need them. That would go a long way toward managing costs.

There's also the dependency risk people aren't talking about enough. These providers can change pricing whenever they want. A tool you've built your entire workflow around can become inaccessible overnight just because the economics shifted. It's the vendor lock-in problem all over again but with less predictability.

reenorap 5 hours ago

The only way AI will be profitable to companies like Anthropic or OpenAI is to make the cost $1000-2000/month or more for coding. Every programmer will be forced to pay for it because it's only a fraction of their salary (in the US anyway) and it's the only way the programmer will be competitive. Whether the company pays for it, or they pay for it themselves, it will need to be paid.

There's no other way these companies can compete against the likes of Google and Facebook unless they sell themselves to those companies. With AWS and GCP spending hundreds of billions of dollars per year, there's no way Anthropic or OpenAI can keep competing unless they make an absurd amount of money and throw it at resources like their own datacenters, and they can't do that at $20/month.

danny_codes 3 hours ago

Even worse, the open weight models are practically indistinguishable from the closed ones. I just don’t see why you’d pay full price to run Claude when you can pay 10x less to run Kimi. There are already loads of inference providers and client layers.

Without heavy collusion or outright legislative fiat (banning open models) I don’t see how Anthropic/OpenAI justify their (alleged) market caps

techgnosis 2 hours ago

* Hardware will manage models more efficiently

* Models will manage tokens more efficiently

* Agents will manage models more efficiently

* Users will manage agents more efficiently

Why are we acting like technology is on pause?

paulbjensen 40 minutes ago

I have found that:

- If I ask Claude to go and build a product idea out for me from scratch, it can get quite far, but then I will hit quota limits on the pro plan ($20pm).

- I have not drunk the Kool-aid and tried to indulge in ClaudeMaxxing (Max plan at $200pm). I need to sleep and touch grass from time to time.

- I don't bother with a Claude.md in my projects. I just raw-dog context.

- If I have a big codebase, and I'm very clear about what code changes I want to make Claude do, I can easily get a lot of changes made without getting near my quota. It's like Mr Miyagi making precision edits to that Bonsai Tree in Karate Kid.

My last bit of advice - use the tool, but don't let the tool use you.

canada_dry 4 hours ago

I hit my limit on the project I've been working on (after I let "MAX" run out and moved to "PRO") after only about 2 hours!

TIP (YMMV): I've found that moving the current code base into a new 'project' after a dozen or so turns helps as I suspect the regurgitation of the old conversations chews up tokens.

canada_dry 4 hours ago

An aside: https://www.buchodi.com/chatgpt-wont-let-you-type-until-clou...

It seems that Anthropic has added something similar to their browser UI, because just in the last few days chat has become almost unusable in Firefox. %@$#%

nitekode 3 hours ago

This could also be because of the recently introduced 1 million token buffer. I also saw my tokens drain away quickly; then I noticed I was pushing 750k tokens through for every prompt :) Sometimes it's hard to get into the habit of clearing.

delphic-frog 5 hours ago

The token usage differs day to day - that's the most frustrating part. You can't effectively plan a development session if you aren't sure how far you'll likely get into a feature.

stavros 6 hours ago

Anthropic went about this in a really dishonest way. They had increased demand, fine, but their response was to ban third-party clients (clients they were fine with before), and to semi-quietly reduce limits while keeping the price the same.

Unilaterally changing the deal to give customers less for the same price should not be legal, but companies have slowly boiled the frog in such a way that now we just go "welp, it's corporations, what can you do", and forget that we actually used to have some semblance of justice in the olden days.

anon7000 3 hours ago

I think I ran into this yesterday, with Claude Code taking FOREVER on a lot of tasks. But using Claude within Cursor seems way faster

giancarlostoro 6 hours ago

I'm guessing their newer models take way more compute than they can afford to give away. The biggest challenge of AI will eventually be how to bring down the compute a powerful model takes. I hope Anthropic puts more emphasis into making Haiku and Sonnet better; when I use them via JetBrains AI, it feels like only Opus is good enough, for whatever odd reason.

medwards666 5 hours ago

I get the same. Work has shifted to being agentic-first, and whenever I use anything other than Claude Opus, the model easily gets lost spinning its wheels on even the simplest query - especially with some of our more complex codebases - whereas Opus manages not only to reason adequately about the codebase, but also to produce decent-quality code/tests in fairly short order.

Oddly though, at home I use Sonnet via the standard chat interface, and that, while it will produce substandard code in its output, is still reasonably capable - even on more niche tasks. Granted, my personal projects are far simpler than the codebase I handle at work.

giancarlostoro 5 hours ago

Funny, I use Opus at home, but I have a Max plan, and I only use it during their non-peak hours. I can't bring myself to downgrade to Haiku or Sonnet.

lukewarm707 6 hours ago

please tell me if i'm crazy.

i just refuse to use openai/google/anthropic subscriptions, i only use open source models with ZDR tokens.

- i like privacy in my work, and i share when i wish. somehow we accepted that our prompts and work may be read and moderated by employees. would you accept people moderating what you write in excel, google docs, apple pages?

- i want a consistent tool, not something that is quantised one day, slow one day, a different harness one day, stops randomly.

- unless i am missing something, the closed source models are too slow for me to watch what they are doing. i feel comfortable with monitoring something, usually at about 200-300tps on GLM 5. above that it might even be too fast!

muskstinks 5 hours ago

It's a question of price, quality and other factors.

If my company pays for it, i do not care.

If I have a hobby project where it's about converting an idea into what I want in my spare time, I'm happily paying $20. I just did something like this over a few hours on the weekend. I really enjoy having small tools based on a single HTML page with JavaScript and JSON as a data store (I ask it to also add an import/export feature so I can literally edit the data in the app, then save it and commit it).

As for the main agent I'm waiting for - the one which will read my emails and have access to systems - I would love a local setup, but just buying the hardware today still costs a grand and a lot of energy. It's still significantly cheaper to just use a subscription.

Not sure what you mean though regarding speed, they are super fast. I do not have a setup at home which can run 200-300 tps.

lukewarm707 5 hours ago

i don't use local models, i just use the APIs of cloud providers (e.g. Fireworks, Together, Friendli, Novita, even Cerebras or Groq).

you can get subscriptions to use the APIs from Synthetic, Ollama, or Fireworks.

susupro1 6 hours ago

You are not crazy, you are just waking up from the SaaS delusion. We somehow allowed the industry to convince us that paying $20/month to rent volatile compute, have our proprietary workflows surveilled, and get throttled mid-thought is an 'upgrade'. The pendulum is swinging violently back to local-native tools. Deterministic, privately owned, unmetered—buying your execution layer instead of renting it is the only way to build actual leverage.

muskstinks 5 hours ago

I'm quite aware of my dependency, and I've been balancing it regularly over the last 10 years.

Owning is expensive. Not owning is also expensive.

Energy in Germany is at 35 cents/kWh, and it skyrocketed to 60 when we had the Russian problem.

I'm planning to buy a farm and add cheap energy but this investment will still take a little bit of time. Until then, space is sparse.

lukewarm707 5 hours ago

i don't use local llms. it's mostly the closed source subscriptions that are not private, it really is a choice.

there are many cloud providers of zero data retention llm APIs, and even cryptographic attestation.

they are not throttled, you can get an agreed rate limit.

staticassertion 5 hours ago

No one was convinced to spend money for the reasons you're describing; that's just disingenuous. People rent models because (a) it moves compute elsewhere and (b) providers offer higher-quality models.

NoMoreNicksLeft 5 hours ago

If I could buy this to run it locally, what's that hardware even look like? What model would I even run on the hardware? What framework would I need to have it do the things Claude Code can do?

zackify 4 hours ago

After using it all week on pro plan it worked fine for me. Hit limits a couple times.

But if I was doing deep coding on pro plan it would have sucked.

You can't expect to use massive context windows for $20

GrinningFool 4 hours ago

I'm burning through pretty fast with context sizes of only 32-64kb. I regularly clear when I change topics.

A simple "how do I do x" question used 2% of my budget.

I paid extra and chewed through $5 in a few minutes of analyzing segments of log files.

At this rate it's not worth the trouble of carefully managing usage to avoid ambiguous limits that disrupt my work.

If that's the way it is in order for them to make money, that's fine - but I need a usable tool that I don't have to micromanage. This product is not worth it ($, time) to me at this rate.

I hope it changes because when it works it's a great addition to my tools.

ryan42 5 hours ago

Claude automatically enabled "extra usage" on my Pro account (I had it disabled), and the total got to $49 extra before I noticed. I sent an email asking wtf, but I don't expect much.

Asmod4n 6 hours ago

When I asked it to write an HTTP library which can decode/parse/encode all three versions of the protocol, the day's usage limit got hit with one sentence. On the Pro plan. Even when you hand it a library which does HPACK/Huffman.

aperture_hq 4 hours ago

There are no transparent metrics on the token usage count; they just compare their plans with their own plans.

sudo_and_pray 5 hours ago

I gave claude code a try at home ($20 sub), since we use it at work without any limits and I wanted to see how I can use it on some of my projects.

It was a big disappointment: it burned through tokens so fast that I hit the first limit after 30 minutes, while it was still gathering info on my project and doing web searches.

My experience was that when I wanted to use it, maybe 2-3 days per week, the Pro sub was not enough, even though on some days I did not use it at all. The daily and weekly token limits were really restrictive.

arvid-lind 4 hours ago

Well, they just ran a promo with two weeks of double quota for everyone, 18 hours of the day, even free users. Of course it feels like we're getting rugpulled.

nprateem 6 hours ago

I literally ran out of tokens on the top Antigravity plan after 4 new questions the other day (Opus). Total scam. Not impressed.

spongebobstoes 5 hours ago

try codex, it's really good and doesn't have the same limits issues

jdefr89 6 hours ago

Over-reliance on LLMs is going to become a disaster in a way no one would have thought possible. Not sure exactly what, who, when, or where - just that having your entire product or repo dependent on a single entity is going to lead to some bad times…

jorvi 5 hours ago

For a second I hoped you were going to comment on how LLMs are going to rot our skillset and our brains - like the people already complaining they "have to think" when ChatGPT or Claude or Grok is down.

Oh well.

Retr0id 5 hours ago

The other day I was doing some programming without an LSP, and I felt lost without it. I was very familiar with the APIs I was using, but I couldn't remember the method names off the top of my head, so I had to reference docs extensively. I am reliant on LSP-powered tab completions to be productive, and my "memorizing API methods" skill has atrophied. But I'm not worried about this having some kind of impact on my brain health because not having to memorize API methods leaves more room for other things.

It's possible some people offload too much to LLMs but personally, my brain is still doing a lot of work even when I'm "vibecoding".

ahsillyme 5 hours ago

I read that as implied.

toss1 5 hours ago

Unsurprising people complain.

"Thinking is the hardest work there is, which is why so few people do it" — attrib Henry Ford

Now we have tools that can appear to automate your thinking for you. (They don't really think, but they do appear to, so...)

bitwize 5 hours ago

AI will totally rot our brains, just like television, video games, and the internet all did before.

xnx 6 hours ago

> on a single entity

Contrary to the popular opinion here, there are other services beyond Claude Code. These usage limits might even prompt (har har) people to notice that Gemini is cheaper and often better.

bigbinary 5 hours ago

On-premise LLMs are also getting better and likely won't stop; as costs rise with technical improvements, I'd expect cost-saving methods to improve too.

kakugawa 4 hours ago

gemini-cli has not been usable for weeks. The API endpoint it uses for subscription users is so heavily rate-limited that the CLI is non-functional. There are many reports of this issue on GitHub. [1]

1/ https://github.com/google-gemini/gemini-cli/issues?q=is%3Ais...

solarkraft 3 hours ago

Gemini better? What are y’all doing that it doesn’t crash and burn within the first minute of using it?

It might be acceptable for some general tasks, but I haven't EVER seen it perform well on non-trivial programming tasks.

ikidd 5 hours ago

Last time I used Gemini, I watched it burn tokens at three times the rate of other models, arguing with itself, and it rarely produced a result. This was around Christmas or shortly after.

Has that BS stopped?

dewey 5 hours ago

There are so many different models, from hosted to local, and there's almost no switching cost since most of them are API-compatible or supported by one of the gateways (Bifrost, LiteLLM, ...).

There are many things to worry about, but which LLM provider you choose doesn't really lock you in right now.
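The low switching cost is concrete: most gateways and providers speak the OpenAI-compatible API, so moving between them is often just a base-URL change. A minimal sketch - the URLs and model names are illustrative and should be verified against each provider's docs:

```python
# Switching providers behind an OpenAI-compatible API is mostly a config change.
# Base URLs and model names below are illustrative; check each provider's docs.
PROVIDERS = {
    "openai":     {"base_url": "https://api.openai.com/v1",    "model": "gpt-4o-mini"},
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "model": "some/model"},
    "local":      {"base_url": "http://localhost:11434/v1",    "model": "llama3"},
}

def client_config(provider: str, api_key: str) -> dict:
    """Keyword arguments you'd pass to an OpenAI-compatible client."""
    cfg = PROVIDERS[provider]
    return {"base_url": cfg["base_url"], "api_key": api_key, "model": cfg["model"]}

cfg = client_config("local", api_key="unused-for-local")
assert cfg["base_url"].startswith("http://localhost")
```

The same config shape works whether the endpoint is a hosted gateway or a local server exposing the compatible API.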

wutwutwat 5 hours ago

So, like, GitHub then?

gonzalohm 5 hours ago

Or Cloudfare or AWS

classified 2 hours ago

It should be abundantly clear that depending on a single entity will screw you royally, but obviously we don't learn from the mistakes of others. We are condemned to repeat history because we don't know it.

adolph 5 hours ago

I don't get this pov, maybe b/c I'm not a heavy Claude Code user, just a dabbler. Any LLM tool that can selectively use part of a code base as part of the input prompt will be useful as an augmentation tool.

Note the word "any." Like cloud services, each tool will have unique aspects, but just like cloud services there is a shared basic value proposition that allows for migration from one to another and competition among them. If Gemini or OpenAI or Ollama running locally becomes the better choice, I'll switch without a care.

Subscription sprawl is likely the more pressing issue (just remembered I should stop my GH CoPilot subscription since switching to Claude).

dude250711 5 hours ago

How can automatic slop-prevention be a disaster? It's a feature.

nickphx 5 hours ago

if you rely on the black box of bullshit... you deserve your own fate.

firebot 5 hours ago

The first hit is free.

shafyy 6 hours ago

What is the best way to get started with open weight models? And are they a good alternative to Claude Code?

lukewarm707 5 hours ago

i would recommend getting an API account on Fireworks; it's ZDR and typically the fastest provider.

otherwise, check the list of providers on OpenRouter where you can see pricing and quantisation, then sign up directly rather than via a router. make sure to look at the caching prices, not just the raw input/output API prices.

GLM 5 is a frontier model, Kimi 2.5 is similar with vision support, and Minimax M2.7 is a very capable model focused on tool calling.

If you need server-side web search, you could use the Z AI API directly (again ZDR), or Friendli AI, or just install a search MCP.

For the harness, opencode is the usual one - it has subagents and parallel tool calling - or just use Claude Code by pointing it at the Anthropic-compatible APIs of various providers like Fireworks.

MarsIronPI 5 hours ago

If you want to still use APIs, I like OpenRouter because I can use the same credits across various models, so I'm not stuck with a single family of models. (Actually, you can even use the proprietary models on OpenRouter, but they're eye-wateringly expensive.)

Otherwise you should look into running e.g. Qwen3.5-35B-A3B or Qwen3.5-27B on your own computer. They're not Opus-level, but from what I've heard they're capable for smaller tasks. llama.cpp works well for inference, on both CPU and GPU, and can even split across both if you want.

wolvoleo 6 hours ago

Just install ollama.

And no, they're not as capable as SOTA models. Not by far.

However, they can help reduce your token expenditure a lot by routing the low-hanging fruit to them: summaries, translations, stuff like that.
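That routing idea can be as simple as a keyword gate in front of the model call. A hypothetical sketch (the task categories and model labels are invented for illustration):

```python
# Route "low-hanging fruit" to a local model, everything else to the cloud.
# Category keywords and model labels are illustrative, not a real API.
CHEAP_TASKS = ("summarize", "translate", "rephrase", "extract")

def pick_model(prompt: str) -> str:
    words = prompt.strip().lower().split()
    first_word = words[0] if words else ""
    if first_word in CHEAP_TASKS:
        return "local:llama3"        # e.g. served by ollama on this machine
    return "cloud:frontier-model"    # paid API, burns subscription tokens

assert pick_model("Summarize this changelog") == "local:llama3"
assert pick_model("Refactor the auth module") == "cloud:frontier-model"
```

Real routers classify with a small model rather than keywords, but the cost structure is the same: only the hard prompts reach the metered endpoint.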

ramon156 5 hours ago

no need for ollama, simonw's llm tool is good enough

scottcha 5 hours ago

We offer multiple SOTA models at https://portal.neuralwatt.com at very generous pricing, since we have options to bill per kWh instead of per token. Recipes for your favorite tools here: https://github.com/neuralwatt/neuralwatt-tools

raincole 5 hours ago

Opus 4.6 price:

Input: $5/M tokens, output: $25/M tokens.

GPT Codex 5.3:

Input: $1.75/M tokens, output: $14/M tokens.

> Claude Code users hitting usage limits 'way faster than expected'

No shit, Sherlock.
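For scale, a quick back-of-the-envelope session cost at the list prices quoted above (the 2M-input/100k-output session shape is hypothetical, roughly what an agentic session re-reading context might consume):

```python
# Cost of a session at the list prices quoted above (USD per million tokens).
def session_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

# Hypothetical agentic session: 2M input tokens (context re-reads), 100k output.
opus  = session_cost(2_000_000, 100_000, 5.00, 25.00)   # 12.50 USD
codex = session_cost(2_000_000, 100_000, 1.75, 14.00)   #  4.90 USD
assert opus > 2.5 * codex
```

Note this ignores prompt-cache discounts, which is exactly why the cache-invalidation bug discussed upthread matters so much for the subscription math.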