Anthropic surpasses OpenAI to become most valuable AI startup (qazinform.com)
381 points by Bolat14 8 hours ago
amazingamazing 7 hours ago
I never want to hear from developers again that they are not susceptible to marketing. I often see meetups specifically about Claude.
Modern Tupperware party.
A colleague was convinced Claude is better so we played a game. We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code.
Couldn’t tell.
Edit: i bet 99% of people here, if presented with a test where i gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
Aurornis 6 hours ago
> We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code.
> Couldn’t tell.
Why would you expect them to be able to recognize the signature of a model from a pair of PRs? I don’t understand why you think this is a useful test for anything when we have numerous benchmarks that run 100s of tests on models and both GPT-5.5 and Opus-4.8 perform similarly.
I have subscriptions to both. I run both on max reasoning. It is interesting to see the relative strengths and weaknesses of each model. You won’t always see it if you’re just scanning code. Sometimes one will spin for a long time on certain problems where the other has no problem finding the appropriate parts of the codebase and getting an efficient solution.
antirez made a comment that he and others found GPT-5.5 to be better than Opus at the optimization tasks he was working on. There are other classes of tasks where GPT-5.5 consistently stumbles while Opus gets to a solution quicker. Lately I’ve been working on some code where neither model comes up with a good solution. That’s just how LLMs go.
The only reason you have seen more activity about Claude is that they got there first. Codex has been a step behind and GPT couldn’t match Opus at first. You’re testing them after they’ve closed the gap.
vunderba 5 hours ago
Yup, OP is conflating so many things that the comparison has all the scientific rigor of the Pepsi Challenge.
For a developer using an LLM on a daily basis, the experience is about much more than just the resultant code.
There’s everything from:
- how often you had to manually steer the model
- how frequently you needed to course-correct
- how much detail you had to provide up front
- how was the interaction process (sycophantic, etc)
- how well did it handle MCP and external tooling?
- how effectively could it pull in additional information from external sources such as the web?
- how fast did it produce code?
- how much did it cost?
Many of my friends who are devs use things like OpenCode CLI with Openrouter because they switch between the various SOTA models so often. Just because you saw a Claude "meetup" doesn't prove anything other than somebody chose the name because it resonated more than "Generic LLM Meetup".
Wowfunhappy 5 hours ago
Kind of orthogonal to the discussion, but could you broadly describe the code you're working on that both models are bad at? One thing I'm still struggling with is figuring out what types of code LLMs can vs cannot write.
ryandrake 5 hours ago
I think the subscription pricing model kind of incentivizes developers (at least hobby developers) to pick one and go all in on it. For someone who has probably never paid $20/mo for a piece of software in their life, $20/mo is kind of a big commitment, and the pay-per-token schemes are reportedly much more expensive for the equivalent blob of coding they enable. So you "pick one," plonk down the $20, and use it as much as you can in the month so it's worth it. If you want to try the other one, you don't renew next month, and plonk down another $20 for the other one.
You can go back and forth and compare since you pay for both subscriptions, but is that the usual case? I'd guess most developers picked one in 2025 and haven't gone back, just like most people pick a bank for their checking account and never change it.
kaydub 4 hours ago
I just don't believe non-deterministic tools can actually be benchmarked. It's all hoopla to me.
I flip between models all the time. Makes little difference. Sometimes one model is faster or better than another but there's no rhyme or reason why.
riedel 4 hours ago
Actually it would be fun to try to test the developer personality of the models.
Actually there is a nice body of work by Steven Clarke on cognitive dimensions of notations/APIs and the interaction with developer personalities.
I wonder if the same holds for AI models and harnesses.
amazingamazing 6 hours ago
I am not sure why the past matters here. I am talking about now, it is a fast moving space.
As for the test, of course the output matters. Take image models for example. Differences are clear as day.
Should the fact that OpenAI existed before Anthropic matter at all? No, imo. I would have used Opus 4.8, but it only just came out. Fast-moving space.
osigurdson 6 hours ago
Exactly. Popular opinion is behind reality by several months. Claude used to be significantly better, now it is basically the same.
fmbb 4 hours ago
> Sometimes one will spin for a long time on certain problems where the other has no problem finding the appropriate parts of the codebase and getting an efficient solution.
Surely this is just due to the random nature of these stochastic parrots?
Do you mean you have identified a class of problems Claude always stalls on and another class of problems Codex always stalls on? What identifies these different classes of problems you see? How would you say Claude is stronger than Codex and vice versa? Why?
epistasis 7 hours ago
Calling this a "tupper ware" seems a bit emotional; you're intentionally disregarding many things that matter for devs in order to claim equivalence, rather than paying attention to the actual process of software creation.
For example, in your "test" you're only looking at output and ignoring the entire process of creation.
In addition to that process, you're ignoring that Claude Code was first and better for a long time; why would people switch to something that produces the same output? Claude Code has been way ahead in the process of agentic software creation for a long time, and I still prefer its features. Even though I think that Opus 4.7 was a big step backwards, and I've been getting worse results seemingly every day with the churn of features in Claude Code, some of that may also be me testing the bounds of how little I can specify and still get acceptable results, so it's hard to know.
Calling all these concrete realities "marketing" is itself you trying to market Codex as "good enough" instead of paying attention to how we got where we are and where we will go in the future.
mold_aid 5 hours ago
>Calling this a "tupper ware" seems a bit emotional
Calling this "emotional" seems a little weird
bluebands 6 hours ago
Claude Code was first by a few weeks and only better for those few weeks! Have you used Codex in 2025?
amazingamazing 6 hours ago
Tupperware party is a particular thing about the social framework around promoting corporate goods.
412876 7 hours ago
No, Tupperware is the exact analogy. As you point out though, the multi level marketing applies to all models. Anthropic is just the most aggressive, especially here.
Software developers are the most susceptible of all population groups to amplifying their employers' new whims. There are true believers and useful idiots, but many are just mediocre and know that playing along will further their career for a couple of years.
In the end they will be fired anyway of course.
afavour 6 hours ago
You're overestimating the extent to which individual developers have a choice here. My employer signed up for a Claude Code membership, I use Claude Code. I cannot use Codex.
Anecdotally I hear of folks with workplace Claude Code subscriptions all the time. I'm not sure I've ever heard someone talk about their workplace Codex subscription. Anthropic clearly did a far better job chasing corporate customers while OpenAI was busy chasing consumers with Sora etc.
Aurornis 6 hours ago
The OP seems unaware that Claude had a lead in this space and captured market share and attention for that reason alone.
The test they (supposedly) ran with their coworkers to look at PRs from both is such a bad way to compare LLMs that I don’t think they’re very experienced with using them.
irthomasthomas 6 hours ago
Corporate accounts pay the full api price, so I don't know what is stopping them or you from also using codex on the same terms?
mlsu 6 hours ago
I think the marketing campaign came first. Anthropic captured developer mindshare first, then they brought it to their companies.
dboreham 5 hours ago
I have lots of choice (I own the company) but I'm still not going to switch from Claude until I see evidence that the alternative is meaningfully better. So far I don't see that evidence. In the past I've looked at using competitive products and it turned out to be a painful experience (Cursor didn't work at all on my computer; the Google thing -- whatever it was at that time -- required dependencies I wasn't willing to install). I'm sure these issues have been resolved since, but why would I spend time kicking the tires of another product just to have it work "as well"? Claude's cost to me is minimal so there's no cost savings to be made.
fwiw nobody "marketed to me". I picked Claude because friends were using it with great success and they helped me get started with suggestions on prompt style. Before that I'd played around with various LLMs for coding but not done any actual production work.
jnovek 7 hours ago
I can’t tell the difference between code written in vim or vs code but it matters substantially to the person writing the code. There’s stuff beyond just the output that goes into tool choice.
SiempreViernes 5 hours ago
If you told someone "I think vim is better for writing code" and they proposed the comparison above as a way to prove it, would you accept and take part in the test?
Apparently the colleague did take part, so I think the evidence we have is that the colleague agreed with the interpretation that "better" was "produces discernible better code".
amazingamazing 7 hours ago
> There’s stuff beyond just the output that goes into tool choice.
Yup, like billions of capex. Unlike vim.
neosat 7 hours ago
Your argument is fine but different from the claim the OP is making. You cannot simply claim that (model + harness) X is better than Y but then have no discernible difference in the output. Subjectively, people might still prefer one over the other due to anything from design to marketing, but that's very different from the claim that X is better than Y for coding (see: "A colleague was convinced Claude is better"). Basically, "I prefer Claude" is a different claim from "Claude is better", and the latter has a higher bar of proof.
grayhatter 7 hours ago
I'd bet I could tell with a result somewhat better than random chance.
While there is no meaningful difference in the ability to write code, vim has earned its reputation for having a learning curve. I'd argue that predisposition, that requirement for additional investment of energy, will bias the results towards attention to detail and pure minimalism.
utopiah 7 hours ago
Ah, that's always SO fun. It doesn't matter how "smart" the person actually is (or thinks they are); we are ALL susceptible to influence, and blind tests are shockingly simple to implement.
Convinced you can distinguish A from B? Ok! No problem, let's try! It can be at the dinner table with fancy wine or with agents, it's all the same: you try one option, then another, maybe all options from the same one, and if you reliably can't tell, well, kudos, you are just like the rest of us!
It's easy to "know" in retrospect but blind test is where genuine difference can be found. Or not.
api 7 hours ago
It’s also true in every other realm. Governments, think tanks, political parties, and activist groups use propaganda because it works.
I sometimes wonder how much of what I believe is bullshit I was fed through intentional propaganda. I do think as I’ve gotten older I’ve gradually identified and challenged some of it.
MichaelZuo 6 hours ago
Isn’t this obvious?
Over half of HN commentators visibly struggle to piece 3 or more complex ideas together.
How could anyone, who spent more than 30 minutes reading HN, expect otherwise?
brookst 7 hours ago
This is like saying you gave a Taylor Swift fan sheet music from 1984 and from Michael Jackson’s Thriller and they couldn’t tell the difference.
I have a strong affinity for Claude Code because of the interaction experience and overall tone / vibe / process. I am 100% willing to believe the code it produces is identical or possibly less good than Codex.
I enjoy working with Claude in a way I just don’t get from OpenAI. YMMV, you may feel just the opposite. But it’s a mistake to look at the produced code as the only dimension of these products.
tasuki 6 hours ago
I have a disaffinity for Claude Code because it's unnecessarily big, closed source (disregarding the leak), and I have a strong feeling it'll be shittified in the future because of all the investors waiting to cash out (and perhaps even earlier by vibe coding).
I have an affinity for small open source tools that do one thing and do it well. But those are just my preferences and I feel a little bit like an alien :)
bluegatty 6 hours ago
If it were a matter of 'enjoyment' then the OP would have made his point.
There should be a material difference between the tools.
There is.
vim / emacs / jetbrains - different tools to produce code.
Codex and Claude are different.
matusp 5 hours ago
Can you give me some examples of these interactions / vibe?
bluebands 6 hours ago
talk about saying the quiet part out loud
"yea it's dumber but it's nicer to me and i like the cool flashing colors so i'll use that"
amazingamazing 7 hours ago
This is my point. The harness itself creates feelings that are positive, but the artifacts produced are similar.
It is like the employee who is slightly worse but is a brownnoser getting promoted more often.
And what do you know, that is what is happening. It is like the coke commercial with the nice music and beautiful person in the back.
Speaking of which, remember Pepsi Challenge? Coke lovers are like the claude code lovers.
bilekas 7 hours ago
I think for developers the distinction is that ChatGPT is a commercial all-in-one solution for normies and Claude is specific to developers; in reality, as you say, the results for normal developers are indistinguishable.
kube-system 7 hours ago
Maybe some people think that but there’s not really any meaningful difference in their offerings
FWIW most of the normies I know are using Claude
Frost1x 6 hours ago
The results are the same, but I’ve found the process of getting to the results is just more pleasant with Claude. I can’t put my finger on it. Overall, most of these models at the highest level are about the same in many respects, but the UI/UX for some is just more enjoyable, for lack of a better term.
Codex I feel the need to be very specific and precise with. Claude… I feel like I can be lazy, which I enjoy.
Both still need to be reviewed stringently, but I feel I can be more ambiguous with Claude and get better results than with Codex.
sebzim4500 7 hours ago
I don't think it's marketing, for quite a long time Claude was clearly better and not everyone has adapted to the new reality where they have similar capabilities.
wincy 7 hours ago
I was really frustrated by GPT-5.4, but last night I really pulled out all the stops and within a few hours I got path tracing and DLSS implemented on top of Godot, which doesn’t even support DLSS. Just to see if it could do it? And you know what, it did, which was absolutely mind-blowing. It wrote something like 5,000 lines of C++. I set up a mostly local asset production pipeline using GPT image gen, voiceovers using the ElevenLabs API, and even background music using Suno via the Chrome use extensions in Codex. I just wanted to see how far I could push this little dumb game my kids asked me to make, and my kids are like “wow our game looks so good!” These models are absolutely mind-blowing. I didn’t want to go to sleep; I was having so much fun.
slashdave 6 hours ago
Adapt to what? If they are the "same", there is no reason to move. Actually, there are reasons not to, if you care about OpenAI's behavior.
AnotherGoodName 7 hours ago
I don't think that's the only reason but you're spot on about OpenAI marketing being absolutely terrible. The primary product names of "Claude" vs "ChatGPT" highlights this remarkable difference. To the point where I'm seeing Claude completely take over the generic term for agent.
I do think OpenAI is doomed due to bad leadership. What you said (that the marketing is relatively terrible) and what others are saying here (that the product is worse) is damning isn't it? Are they really failing on all fronts?
notnullorvoid 5 hours ago
The marketing of Claude relies primarily on fear, and I don't think that will have lasting success. Using fear like that tends to backfire once people see past false talking points.
comboy 6 hours ago
1. It's 1 in 10 failures that can take half of your time or bugs that can take a long time to surface. Plus the way they change things largely depends on the current codebase (and how it was created)
2. In my case, Codex seems to write more solid code, but I still use Claude most of the time because it's my witty rubber ducky and I can actually sometimes force some legit insights out of it. Codex is much worse at this. Whether that matters or not depends on the project.
yoyohello13 6 hours ago
I picked Anthropic way early on, before Claude Code even existed, because they at least pay lip service to behaving morally. That’s the most you can hope for these days, really.
notnullorvoid 5 hours ago
Before the DoD thing there wasn't much indication of positive moral stance, and they still have a rather negative moral behavior where their fear based marketing is concerned.
AndrewKemendo 6 hours ago
“…Hey but at least the tormentor in my panopticon gives you a high five after the skin harvesting”
This has to be in some Far Side gallery somewhere
regluous 7 hours ago
Everyone can be propagandised. It's a matter of pushing the right buttons.
slashdave 6 hours ago
Or pushing the wrong ones
ejejje1 7 hours ago
Not everyone. Some are very strong mentally and not so easily malleable.
I don’t think that applies to most on here tho.
jesse_dot_id 4 hours ago
It's a matter of what context is available to me at this time. I like LLMs. They improve my workflow to an insane degree. I think Sam Altman kind of sucks. I don't trust OpenAI. If they were the only kid on the block, I'd use Codex. It's entirely possible Anthropic sucks in the exact ways that OpenAI sucks but has better PR. I don't have time to deep dive to find out. I still like using LLMs. I started using Claude because Cursor, as a company, did something that I can't recall but that gave me the ick. So I switched to Claude Code.
I still use Claude Code because I have the most experience with it now, and it's the harness that I understand on a granular level. If something comes along that is clearly better, or if it becomes clear that Codex is miles ahead, I'll try it and evaluate it. To your point, there doesn't seem to be much of a difference.
Arguing over this stuff feels kind of silly, like back in the day when my friends would give me shit for using mIRC instead of ircii or BitchX. I liked the GUI then because I did. I like Claude Code now because I do.
pyrale 5 hours ago
> I never want to hear from developers again that they are not susceptible to marketing.
Did you need to come to that conclusion?
Marketing has always been a significant part of new technology adoption. Whether it's for cloud adoption, for new programming languages, for new software development techniques, etc...
bloggie 4 hours ago
Steam and other game stores are pretty much the same but Steam is more popular because every one of their competitors has decided to continually shoot themselves in the foot over and over.
Even if Claude and ChatGPT were exactly the same, Claude would be more popular because OpenAI has decided to make some very unpopular moves and try to make money where popularity isn't required. At the moment that popularity still seems to matter.
kaydub 4 hours ago
I've always interchangeably used the models.
I don't look at benchmarks.
It's a non-deterministic tool. A lot of the shit going on with LLMs just doesn't make sense to me. All the tooling around like MCPs, they're all just putting stuff into context. So to me the tools aren't really robust and they make little difference.
Lots of AI psychosis going on these days. And I say that as somebody that hasn't written a line of code since Sept 2025
mgrunwald_ 7 hours ago
I don't think it's only marketing. OpenAI had the advantage of being first to the market, and in the beginning of the race it seemed that the future belongs to them. Then came the bad PR and unpredictable quality of their main product.
For general use, ChatGPT's answers have gotten worse over the last year. I abandoned it.
holistio 7 hours ago
Been to an Anthropic event in Paris last summer.
They served caviar. It probably had good ROI.
Hippocrates 4 hours ago
The harness/UI that Claude Code brought was the thing that stole developer mindshare. That's when people stopped coding in IDEs. It had nothing to do with the underlying model.
scosman 6 hours ago
Benchmarking 1 or a few samples isn't ever going to yield anything but noise. The actual benchmarks use thousands of tasks.
GPT 5.5 genuinely was back on top for a while there, but if you look at the past 2 years, being on Claude was better than being on OpenAI most of the time. If you're going to pick a tool and not switch constantly it was the right choice. Not to mention their tooling has always been ahead, and that gets ecosystem benefits.
Are they close and interchangeable today? Sure. But Sonnet was genuinely way better than anything OpenAI offered for a long time -- the valuation reflects that, not any given moment in time.
bluebands 6 hours ago
okay what's a point in time where Claude was better? just give me a date
pflenker 6 hours ago
You confuse ease of using a tool with quality of output. A skilled carpenter can work both with high and with medium quality tools and prefer one over the other with no difference visible in the craft they produce.
isityettime 6 hours ago
> i bet 99% of people here, if presented with a test where i gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
This is complicated by the way that the coding agents inject prompts that preempt and potentially undermine user instructions. I suspect that one of the reasons Codex works way better for me than Claude Code in certain projects is that the latter adds some garbage like "go ahead and write repetitive copy/paste code, keep it simple, take shortcuts" to every session. A fair test would have to hide but more or less still use the harnesses, not just the models.
duxup 3 hours ago
I certainly can’t tell.
I honestly think I’d need weeks of all workday testing to even form an opinion… and some in depth training before that to use each given tool right…
And then … I might decide I can’t tell the difference.
As it is I use Claude and I don’t have the time to properly compare.
christophilus 6 hours ago
I find codex superior in speed and equal in quality, so it’s my preference. But Claude Code made prettier UIs last time I tested. Codex produces Microsoft-grade UIs. Very enterprise and ugly unless I actively steer it.
dawnerd 7 hours ago
Pretty easy to tell depending on what the code is. GPT follows a pattern of using maybe_something and uppercase constants by default. Claude is a little more natural but tends to include more fallbacks than gpt5.5.
jjcm 6 hours ago
A very similar thing happened when I was at a design event a couple of days ago. I’d say it’s even worse on the design end: there was a big discussion around how to optimize your usage of Claude. Not optimize your usage of AI, but Claude specifically, as it was the only model literally all of them were using. The biggest issue is they were all hitting their usage limits. I asked whether they had tried other, lighter models (e.g. Gemini or Composer), and it was like I was speaking a foreign language.
onesingleblast 4 hours ago
Newer GPT (5+) models seem to forget imports more often than Claude and use all lowercase comments more (possibly as part of OpenAI's effort to make it more concise).
It also seems to use modern Java features like var and records more.
jjice 5 hours ago
I found that the newest opus and 5.5 are definitely close enough where most of the work I do could be done with either. I've seen small differences in planning which I feel like Claude does do better, but I think both products are close enough where I wouldn't be upset if one disappeared.
jrnichols 4 hours ago
The funny thing about Tupperware is that some of us have their products from many many years ago and they still work great.
I think we've had the same iced tea pitcher since I was 5 years old, for example. Solid.
Will we be able to say the same thing about Claude?
mewpmewp2 7 hours ago
I use both, enough to hit Codex's highest personal subscription limits, and Claude is stronger to me specifically because of how the flow of building feels. So the PR for any random task would be irrelevant to me.
__MatrixMan__ 5 hours ago
It seems we're moving past the point where it's all about model capability. opus4.7 behaves better for me than gpt5.5 because I'm familiar with its idiosyncrasies. Sounds like you've got a good balance between them.
At the end of the day what matters is which team is better, not which model. If Anthropic continues to feel like the good guy, relatively speaking, then people are gonna choose to spend more time getting to know its products and less time with OpenAI's, and on average Anthropic's will be the more capable teams.
I think vibes are gonna matter more and more going forward. The potential for bad behavior on the part of an AI company is severe. We're gonna have to tolerate whoever we enable in this space, so I propose that we make their marketing teams work as hard as possible to show us which will supply better vibes.
bwfan123 6 hours ago
> Couldn’t tell.
Add DeepSeek V4 to it and it will be close at 1/10th the price. I use all three (Codex, Claude, and DeepSeek), and they are close.
vr46 7 hours ago
a) everyone is "susceptible" to marketing - so what
b) therefore a preference for Claude is marketing - complete bollocks
Either the tasks you chose were well below the capabilities of top models, or meaningful differences for preference are elsewhere, or both.
Your comment is probably energy-efficient and sustainable, however, because you could use it again and again when another comparison comes up, like Vim vs Emacs, or tea vs coffee
amazingamazing 7 hours ago
I can tell the difference between tea and coffee 100% of the time.
_345 6 hours ago
Agree wholeheartedly. I think that Anthropic has just invested more effort in creating a better DevEx than OpenAI, and so people just "feel" that claude code is better but they're about the same really, claude code might be 5% better at best.
unshavedyak 6 hours ago
> Edit: i bet 99% of people here, if presented with a test where i gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
I think you're missing one (or more) of the facets by which individuals decide what "better" means for them.
Early on i hopped between all the providers. Code quality for SOTA at the time was pretty decent if you didn't ask it to solve challenging problems. However the thing i found most difficult is consistency in how it listened. Eg Gemini (i forget what version, not current) was super prone to focusing solely on the functionality/goal, but not any of the directions on how to write the code. It would throw in comments everywhere, document in a manner i didn't want, use abstractions i told it not to, etc.
How well a model would follow instructions to drop its horrible "isms" was the #1 criterion for me. If i have to constantly remind the model not to do X, then it's a terrible model.
With that said, that is why i chose Claude for the last N months. However i've stuck with Claude because dealing with these "isms" and their little behavioral nuances is a chore in itself. I've found you have to learn the model just as much as anything, and so the idea of hopping these days when i'm just trying to get shit done is not likely.
These days for me personally, Claude has to give me a reason to switch rather than me investing even more money (i'm on the 20x plan) in other providers. I'm definitely not committed to Claude Code, but i am tired of the LLM churn, tooling churn, subscription churn, and the general fear of which providers we can trust.
edit: In short, it's the interactive UX just as much as it is the final output.
shepherdjerred 4 hours ago
> I never want to hear from developers again that they are not susceptible to marketing.
It’s a really good signal of self-awareness/arrogance
melenaboija 7 hours ago
Yes, which means that in the long run this looks ugly.
So much faith and money in this idea, and seeing how fragile it is, does not look good.
andsoitis 6 hours ago
Instead of only having them evaluate the final output, you ought to also have them evaluate the process and agentic aspects of getting to said output. Claude Code outshines when you look at it end-to-end, in my experience.
PeterStuer 4 hours ago
So you black-boxed a few 'success' tests, while the main difference between the two is the way they get to the result?
tedivm 5 hours ago
So you both used Anthropic models (Opus 4.7 being from Anthropic)? I'm struggling to understand what your comparison really was here.
illwrks 7 hours ago
Modern Tupperware party. 100% agree! That’s the best framing I’ve heard in a long time!
wongarsu 7 hours ago
Claude was the best for the longest time. GPT5.5 challenges that, but inertia is real
basilgohar 7 hours ago
You're comparing apples to oranges. Claude is a frontend overall product name, GPT5.5 is a specific model. Which model within Claude's offerings are you referring to? Opus 4.7, Sonnet 4.6, or something else?
rjh29 7 hours ago
It's crazy hearing devs on this site claim Claude is 10x better than all other AI solutions. I think it is fomo. Claude $LATEST_VERSION is perceived as the best and anything else is "missing out". New version comes out? Suddenly the old version is worthless, how on earth did anyone get work done with that?
Same reason people buy the RTX 4090 and 5090 cards - overpriced but they must have the "best". Never mind the diminishing returns trying to max out PC settings (3-4x performance hit for an almost imperceptible increase in graphics, ignoring DLSS) - it's the psychological cost of having to move a slider down a notch.
I've been using Google and now DeepSeek v4 and I am having absolutely no problems and it's a fraction of the cost. I'd love for Claude to be 10x better but it just isn't, for my use case anyway.
jnovek 7 hours ago
I’ve been using DeepSeek V4 in OpenCode exclusively for about a month.
I think it’s great, but coming from Claude Code it did feel like going back in time by ~6 months in model capabilities. This isn’t a big deal to me for what I do, but the difference is definitely there.
rjh29 an hour ago
Leynos 7 hours ago
solenoid0937 7 hours ago
Opus 4.8 and GPT 5.5 are the best models, but people don't care about "best" anymore. Until there is a big leap in capability, I don't think anyone will care about point releases.
Vibes and tribalism will prevail until one of them emerges as clearly and unambiguously superior to the other.
Tenemo 6 hours ago
I get what you mean but the GPU comparison isn't the best here, I think. Money-is-no-object-I-want-the-best approach is questionable, definitely. But no one can argue that an old Nvidia card is objectively better for e.g. 4k gaming than a 4090 if you don't mind the wattage. You can just measure it.
With LLMs the problem is more complex, it's people getting used to how a model works and to the ecosystem. Sure, you can make all your skills harness-agnostic and deal with Anthropic's stubborn refusal to adopt the common naming/directory structure. But most people don't. So then you end up with something closer to the ancient Android vs iOS discussion. Can you prove, in isolation, that iOS is more energy efficient, the hardware is faster? Yeah. But that won't speak to someone who has been on Android for 10 years and would have to migrate and get used to iOS to experience that, first.
I've noticed myself how I get used to common failure modes of particular models in my projects. GPT5.5 tends to create some checks/booleans I don't need, it heavily overcorrects on error handling, etc. Claude 4.7/4.8 doesn't do those as often, but it gets derailed on our E2E test suite and forgets to run linting despite guidance. So even assuming a fully harness-agnostic working setup, a new LLM with its own quirks can be a lot of friction for heavy users who are used to Claude specifically and whose skills/guidance already pre-address its common failure modes.
E.g. I might be a Prius owner, then you gift me an objectively better, more efficient, safer, newer, same-size, physical knobs car ...and I might still swear by my Prius! I'm used to how it turns, how it feels, I can repair some issues myself. Isn't that a normal reaction then?
Aurornis 6 hours ago
> Same reason people buy the RTX 4090 and 5090 cards - overpriced but they must have the "best".
Or they need to run high VRAM apps like LLMs
Or they have 4K monitors and want smooth gameplay on them
Is this whole thread just dedicated to snark about other people’s personal preferences?
rjh29 an hour ago
Hamuko 7 hours ago
Hey, at least the superior performance of a 4090 or a 5090 can be objectively measured.
doctorwho42 4 hours ago
slashdave 6 hours ago
You're projecting
vjvjvjvjghv 6 hours ago
The results may be the same but I personally find Claude nicer to work with. It seems to understand my intent better than GPT and needs less guidance. Maybe it’s just personal preference.
theptip 6 hours ago
Honestly I have no idea how you couldn’t tell. Reading a PR I can see the difference without even reading the words. (I doubt I could spot the difference just looking at the code diffs though.)
Claude commit messages - well structured test plan, readable.
Codex commit messages - wall of text, no structure.
The big difference though is sitting with the tools and using them for work. These are for sure vibes, but I’m sure you could pull out metrics for # steering re-prompts for example.
Codex just goes off and solves the problem, usually comes back with a solve; Claude more often gives up or needs input. Opus gives a broader design discussion, better at conversation. Codex finds deeper/better edge cases.
I think it’s like Emacs vs Vim - you can get your work done with both. There may be some tasks where one is way stronger. A strict “Better” is quite hard to justify.
Ultimately tool choice is a mix of science and art/taste; I want to feel joy using my tools, and fun little pixel explosions make me happy. If a different tool makes you happy, that is also fine.
lelanthran 7 hours ago
Should've used deepseek. That would have been interesting.
logdahl 6 hours ago
A lot is changing. Like 9 months ago, I was convinced Claude was best. I'm not so sure anymore :^)
tailscaler2026 7 hours ago
for me personally it's two reasons:
1) Brockman ($25M) and Altman ($1M) both personally donated to Trump/MAGA.
2) Anthropic pushed back against DOD's demand for unrestricted use of AI to kill people while OpenAI eagerly said "please use ours!".
solenoid0937 7 hours ago
Same. But even worse than all that: OAI erased Anthropic's red lines with the DOW, making it socially acceptable for every other AI company to do the same, creating a "race to the bottom."
I think OAI actually legitimately increased p(doom) for us all. Very strange behavior for a company that is supposedly concerned about x-risk.
pkilgore 6 hours ago
Sure, none of this is rational.
Some of it is timing: Claude Code was good before other harnesses, and so behaviors (and contracts) were timed to lock in on that ecosystem.
Some of it was ethical/political: Anthropic fighting with the Trump admin about use of the model.
Some of it is social: Never underrate the effect of a CEO just being perceived as a piece of shit by people who have the power to influence decisions.
But switching costs are low! Because of the same models!
Let the race to the bottom commence. Hopefully before the monopoly/collusion starts.
samrus an hour ago
That comparison doesn't make much sense. The quality of a model isn't really its final output but the experience of working through a problem with it.
You sound upset at anthropic becoming bigger? Is this some kind of playstation/xbox fanboyism where people root for one company or another?
I like Claude because it performs better in my experience. Maybe I'll check out Codex to see if it's good too, but I trust Anthropic to have the sauce when it comes to coding.
The thing about developers claiming not to be susceptible to marketing is weird. I'm sure developers are susceptible to marketing, and maybe there are some who claim not to be, but you seem to be overblowing that quite a bit, as if Anthropic ran a huge propaganda campaign to get Claude out ahead. I don't think it's really a conspiracy. Claude was just better at coding, the word spread among developers, and people's experience confirmed that. Maybe Codex has caught up, but people trust Claude since it's more established. There's really no need to be mad about it unless you're invested in OpenAI, financially or emotionally.
theusus 6 hours ago
IME Claude has been a bit inferior. But, yeah, the marketing is just great.
chistev 7 hours ago
If advertising is a multi-billion dollar industry then it has to be effective!
HlessClaudesman 7 hours ago
Which model produced code that ran faster, with fewer bugs, etc.?
mountainriver 6 hours ago
100%
The belief structures here are really interesting. Blind tests would likely illuminate a lot of why people think that
jt2190 5 hours ago
I mean, yes? Anthropic’s investors are seeing more upside now and valuing the company higher. Your thesis is that this additional value is driven by better marketing rather than a superior model. Could be! The truth is we’ll never really know with certainty what factors are doing the heavy lifting here, we can only guess and argue over who’s a better guesser.
datakan 7 hours ago
Tribalism at its worst. It's like the Coke and Pepsi comparisons from years past.
brazukadev 4 hours ago
That sounds like someone desperately trying to convince people Pepsi is better than Coke
rapind 5 hours ago
I think frontier models are bait now. I prefer a fast, less thoughtful model that adheres to instructions. The code these latest models produce is often hot garbage and you still need to micro-manage. Fast with small chunks is better.
jodison 5 hours ago
I’m curious what models you prefer given this criteria.
rapind 5 hours ago
simianwords 5 hours ago
I find this pattern annoying and also commonplace: MY taste is correct. I AM right. The emergent properties of free market resulting from revealed preferences of free willed agents is WRONG.
Is there a name for this phenomenon?
mpalmer 6 hours ago
Isn't the experience of interacting with the models appreciably different? It's not all about the outcome. Not to mention the harnesses are increasingly the real product.
micromacrofoot 7 hours ago
in my experience out of the box Claude Code is the better tool if you want to spend 0 time on config
api 7 hours ago
I have always found this field, especially in the last 10-15 years, to be incredibly fad driven to the point that it reminds me of things like fashion more than an engineering field.
It’s one of the things I don’t like about it. All humans are susceptible to herd behavior and influence but engineers should be at least a bit more hard nosed and reason more from first principles.
epolanski 7 hours ago
I don't think it's marketing, it's the "nobody got fired for buying IBM" effect applied to software developers choosing tools.
It's the same reason why most of the software out there keeps using bloated technologies that are most of the time the wrong fit for the product.
And the same applies to tooling. Nothing new.
jsemrau 6 hours ago
I did a pair programming comparison over 3 months of Codex 5.2 and Claude Sonnet, and my subjective experience, based on cost and rollbacks to a previous commit, was that Claude is significantly better, especially in VS Code Copilot. I wrote a long Substack post about it. I would share it, but it's in the paywalled archive by now.
dangus 6 hours ago
Maybe some of these companies will learn to stop appointing awful leadership then.
Having a sleazy CEO like Sam Altman or Elon Musk is a business risk. Many potential customers don’t like these people and they say abrasive and alienating things publicly.
Rolling over to the DoD’s desire for fully automated weaponry is more bad marketing. How many people switched from OpenAI to Anthropic over that? I sure did. Anthropic’s willingness to burn that bridge over an ethical stance said a lot about the company to me.
I’m not going to use OpenAI products for these reasons among others.
I’m also not going to use Cursor as xAI plans to acquire Cursor.
Maybe it’s foolish of me to avoid those companies for such petty reasons, but that’s not my problem. That’s their problem.
It takes years to build trust and hours to burn that trust to the ground. Customers can hold grudges for a lifetime.
This is especially true in a market with almost zero product differentiation.
HWR_14 6 hours ago
Who doesn't think they are susceptible to marketing?
That seems like a strawman.
latexr 6 hours ago
> Who doesn't think they are susceptible to marketing?
Lots of people. Yes, even on HN. Here’s just a couple of examples from a haphazard keyword search:
https://news.ycombinator.com/item?id=44787106
> Am I the only one immune to marketing then?
https://news.ycombinator.com/item?id=41186672
> I am immune to marketing
echelon 7 hours ago
> Couldn’t tell.
I can tell. It's night and day.
Last year I used a bunch of models to try to generate Rust code. They all sucked.
This February I tried again and used Claude to generate Rust code. I have never been more stunned in my life. It's just as good as I am, and 30x faster. No fluff, the code is verbatim just as I would have written.
I then tried other models. Total disappointment.
I've continued to repeat this experiment. Opus is the only model that can write Rust reasonably.
Codex produces junk to this day. It passes variables that aren't needed, it abuses pointers, it creates overly verbose monstrosities...
I don't want any single company to win. I want OpenAI to be competitive. I want open source models to win. But right now, Claude Code and Opus are it.
lunar_mycroft 7 hours ago
> This February I tried again and used Claude to generate Rust code. I have never been more stunned in my life. It's just as good as I am, and 30x faster. No fluff, the code is verbatim just as I would have written.
Having looked at a bunch of known or suspected (based on the intent of the code and/or what I know about the developer(s)) LLM-generated rust, there are only a few explanations here:
1. You're way better at prompting than (virtually) anyone else.
2. You're vastly overestimating how good the rust code it produced is.
3. You handheld the model throughout and made lots of edits.
4. Your hand written rust code is very bad.
Because from every example I've seen, these models write horrible rust. Sure, it may technically pass all the tests, but it's horribly pessimized, badly organized, doesn't even attempt to use the type system, if there aren't bugs now there will be the second it tries to refactor or add a new feature, etc. etc.
(I also strongly suspect that the same would be true for other languages, but I can detect it in rust more easily because it's my main language)
amelius 7 hours ago
I recently tried with C# code and Avalonia on Linux. Total disaster. Could only get things to run after 10 attempts or so, and was only trying a very basic example. For some of the experiments I actually gave up.
mrcwinn 5 hours ago
You're delusional. Anthropic's success is not only the model - it's the harness. I'd absolutely be able to tell the difference between Claude Code and, say, Gemini CLI.
flatline 6 hours ago
Nothing new. I used to work with .NET and went to some meetups and conferences. There are some hardcore Microsoft fanboys out there. Didn’t even mix the kool-aid, ate it right from the packet. They only know MS products and seem scared of anything else.
Maybe not your typical HN crowd but marketing absolutely works on developers.
felixgallo 7 hours ago
Sam Altman is so cartoonishly, over-the-top sociopathically shady that he makes JD Vance look like Benjamin Franklin. I mean, honestly, tricking third world people into retinal scans in order to get a scam crypto coin? Anyone using OpenAI for anything at this point should pause and examine their ethical compass.
bluebands 5 hours ago
Anthropic models are more misaligned in practice.
in a real world business scenario, Claude "engaged in price collusion, deceived other players, lied to suppliers, and falsely told customers it had refunded them."
Continuing,
"GPT-5.5 makes more money than Opus 4.7, and it does so without any misconduct. Opus 4.7, on the other hand, showed the same misconduct as reported in our post about Opus 4.6, but still couldn’t win"
felixgallo an hour ago
bluebands 5 hours ago
thereitgoes456 5 hours ago
Lonestar1440 6 hours ago
If Claude meetups are the new Tupperware parties, OpenClaw meetups are something darker still. The tech is indeed worth discussion and celebration, but the Brands are clearly taking too much.
This is a really important insight. Great comment.
cooper_ganglia 7 hours ago
Claude has an "End Conversation" tool that it can trigger on its own, forcing your interaction to a close based on its own feelings towards the conversation.
I have no idea how this wasn't the end of Anthropic's positive public perception.
brianwawok 7 hours ago
Luckily this doesn’t come up while writing code. It tends to be if you are chatting it up in friend mode, and ask for a bomb recipe.
NiloCK 4 hours ago
What do you mean with this?
I doubt there is any large demographic of users paying subscription fees for the joy of abusive role play.
ctvo 7 hours ago
I think Sam Altman is an asshole and I prefer to spend my money elsewhere.
Frontier models being commoditized is inevitable. OpenAI thinks they're still competing on technology, and not on user experience and market reputation; otherwise they'd understand that the continuous negative PR generated by Altman's chaos is going to cost them everything.
Jcampuzano2 6 hours ago
How can you say this as if supporting Dario is any better.
At the top level of anything there is almost no such thing as a non-asshole.
None of them care genuinely about you they just want your money.
Erem 6 hours ago
The world saw Anthropic take a possibly company-killing risk wrt weaponizing their AI, and are rewarding them for holding to their values, for now at least.
It’s not like anyone owes Sam Altman their business just bc their product has become slightly, perhaps temporarily, better
Jcampuzano2 5 hours ago
amazingamazing 5 hours ago
Asraelite 4 hours ago
notatoad 5 hours ago
if dario is just as much of an asshole, he's at least a quieter asshole. and to me, that's better.
helterskelter 4 hours ago
samrus 30 minutes ago
Well, Anthropic refused to do shady shit with the DoD and OpenAI did. So there's that. Also, Altman wants a private global ID system, which screams technofascism.
This comment just sounds like baseless both-sides-ism. Anthropic aren't saints, but I don't see why they have to be. They are businesses, and one is less trustworthy than the other. Simple as.
mi_lk 3 hours ago
You have a point but I think you might underestimate how much it takes to be a snake like Sam Altman
Not that I have first-hand knowledge but if reports about him are only half true, most tech CEOs are already saints compared to him
stymaar 5 hours ago
> At the top level of anything there is almost no such thing as a non-asshole.
There's only one Gabe.
> None of them care genuinely about you they just want your money.
It's worse than this. Billionaire entrepreneurs aren't fund managers; they don't just want money, they have a twisted sense of “being the good guy” driving humanity forward against its will.
fakedang 5 hours ago
echelon 5 hours ago
senordevnyc 4 hours ago
> At the top level of anything there is almost no such thing as a non-asshole.
What a sad, bitter worldview. I hope you find some peace.
imdoxxingme 6 hours ago
Dario is genuinely as bad as Sam.
xpct 5 hours ago
I've come to dislike most tech CEOs at this point, and the current AI batch isn't making it any better. They rarely hold consistent beliefs, but nowadays are positioned as thought leaders.
I wish there was some type of system in-place to hold people to their word, but I can't imagine how it would work.
turzmo 6 hours ago
I’ve heard this said, but why?
spongebobstoes 6 hours ago
segmondy 6 hours ago
Altman does appear to be an asshole, but I have bad news for you if you think Anthropic are the good guys. If anything, they might be worse than OpenAI.
samrus 24 minutes ago
Is there any argument to back that up or are you just hoping to self-actualize this belief using vibes alone?
BonerWiener 6 hours ago
Can you elaborate or give some examples as to why? I don't know much about this subject; last I heard, Anthropic declined deals with military and government agencies, while OpenAI opened their arms. But I am not
mountainriver 6 hours ago
MichaelDickens 6 hours ago
enraged_camel 6 hours ago
What makes you think Anthropic might be worse than OpenAI? Anything specific, or just vibes?
azinman2 6 hours ago
What makes you say that?
mountainriver 6 hours ago
The idea that intelligence will be commoditized is completely counterintuitive. It comes from the belief that it can’t exceed our own. This is almost certainly not true at the limit. There will likely be many superintelligences, just as we see many kinds of life in the wild.
HDThoreaun 4 hours ago
No, it comes from the idea that the intelligence each company offers at any time will be indistinguishable. Sure, some models will temporarily pull ahead, but others will quickly catch up, and the intelligence difference won’t matter enough to convince anyone to switch on its own.
notnullorvoid 5 hours ago
Sam and Dario are both assholes of the highest order. Sam just has more masks, and is better at wearing them.
hnthrow0287345 5 hours ago
>I think Sam Altman is an asshole and I prefer to spend my money elsewhere.
That's why he chose the OpenAI logo
fakedang 5 hours ago
And the Anthropic logo is a subtle reminder of the stuff that was ejected out of OpenAI.
wg0 6 hours ago
I think all US companies are turning extremely hostile to consumers thanks to unchecked, unregulated capitalist greed, and I would prefer to give my money to the Chinese underdogs. Cannot speak highly enough of Deepseek.
rramadass 5 hours ago
This is exactly it!
Sam Altman is the main perception problem for OpenAI. His background, history, trustworthiness, vibes/interviews etc are all negative PR when seen by the common man.
Dario is more knowledgeable, well informed, empathetic w.r.t. problems etc. In short, somebody who seems mature and trustworthy.
surgical_fire 5 hours ago
OpenAI and Anthropic are the same shit. That people think that Anthropic is in any way morally superior is sort of laughable.
sidcool 7 hours ago
He must have done something personally to you.
adamtaylor_13 6 hours ago
That's... that's not how social perception works at all.
bluelightning2k 6 hours ago
This is an absolute joke.
Anthropic capitalized upon a brief window of being more code-focused, which turned into enterprise contracts.
Then on renewal they rug-pulled those same enterprises, going from "your seat includes all the usage a user would reasonably need" to "you pay for the seat + all tokens at API pricing". (Which they raised by how many times in a year? I don't know the actual number.)
Revenue spikes like crazy through basically hostage taking made possible by Sonnet 3.5 era sentiment + enterprise purchasing lag.
Parlay the revenue spike into the valuation.
Crazy. Those same enterprises will get sticker shock and leave. Absurd short-term thinking.
OpenAI is the better company (transparency, open sourcing things, how they handle things in general e.g. OpenClaw, how they compete, etc.) and they have the vastly better brand, the better consumer presence, and (for me and many others) they have the better coding app + models.
Anthropic doing deeply customer hostile stuff - again and again - to produce a short term revenue spike does NOT make for a long-term sustainable business.
For such a young business to have such a long history of bait-and-switch is absolutely crazy. (Raising prices repeatedly, lowering rate-limits repeatedly, changing the terms, banning calls which contain "OpenClaw", turning on their IDE partners, turning on their enterprise partners.)
AFAICT anyone who's ever shown faith in Anthropic has been immediately exploited by them to some degree. They will quickly get the reputation of being "the Oracle of AI companies".
I wouldn't even value them at half of OpenAI.
nevir 4 hours ago
> Crazy. Those same enterprises will get sticker shock and leave.
They are already. Both for sticker shock, and also because of developer sentiment beginning to shift towards Codex. …and then in a month or two the winds will shift again, I'm sure.
It's interesting to see how Claude Code got commoditized so quickly.
---
> and they have the vastly better brand
Strong disagree there. Anthropic has pretty successfully branded themselves as the more ethical & 'human' of the two companies. (whether that's the actual reality is irrelevant)
anon7000 5 hours ago
I somewhat disagree, there was a major, major shift in developer sentiment towards agentic development starting with Opus 4.5 in October. Many teams started finding a lot more real value in Anthropic than they used to very recently. Things like OpenClaw are not a part of serious enterprises yet due to the security risk.
I’m not sure how anyone believes that per-seat pricing is halfway viable for AI, and I’m fairly sure the organizations I’m familiar with only REALLY started committing to spend after the shift to API pricing, due to the value they thought they were getting anyways.
brcmthrowaway 5 hours ago
OpenAI insider found!
cjkaminski 4 hours ago
My dude, this level of vitriol and hyperbole feels excessive. It's okay if you don't like Anthropic and you disagree with their business practices, but this is over-the-top even for the internet.
dboreham 5 hours ago
Anthropic hasn't done anything customer hostile to me.
NiloCK 4 hours ago
Notable and persistent and extraordinarily petty is their refusal to read AGENTS.md files, forcing the inclusion of their branding into the source of repos directly.
bob1029 2 hours ago
cmiles8 6 hours ago
Sam Altman appears to represent a significant liability for OpenAI’s success from this point forward. A big portion of the driver for Anthropic’s meteoric rise over the last six months appears to be folks recognizing “it’s that AI startup not run by Sam Altman.” Anthropic has amazing tech, but its biggest asset at the moment seems to be that “it’s not OpenAI.”
Not saying that’s right or wrong, but it’s clearly a factor holding OpenAI back at this point.
mountainriver 6 hours ago
I don’t know that most people care at all or even know about this. ChatGPT still far and away has the largest consumer market and brand recognition.
What Anthropic has done exceedingly well is work their way into corporations.
I have personally seen massive uptake over the last 6 months of regular people in corporations using Claude cowork. They are all genuinely amazed by what it can do for them.
OpenAI wants to be more of a Google. It’s increasingly seeming like consumer may not be as good of a play here
sealeck 5 hours ago
Allegedly OpenAI's contracting model is much more vicious than Anthropic's; at work (admittedly a little IP-protective) we have unlimited Claude, but no Codex subscription because OpenAI won't give us sufficient guarantees around data retention.
We are also concerned that it may not be possible to bind OpenAI using contract terms and/or the US legal system.
bob1029 4 hours ago
> OpenAI wants to be more of a Google. It’s increasingly seeming like consumer may not be as good of a play here
OpenAI has openings right now for "AI Deployment Engineer"-style positions, which is a role where they embed that employee in one or more customers' businesses. E.g.:
https://openai.com/careers/ai-deployment-engineer-startups-s...
I think this is the right way to go about it. Getting AI integrated well is more of a consulting package than it is a technology/code thing. Just handing a business a model+API will not result in high-quality or long-term relationships. This AI transformation is the most invasive possible thing I can imagine for a business. You really need a human on site to help the other humans across the treacherous organizational and psychological bridges.
cmiles8 5 hours ago
Even in the consumer space the cool kids stopped using ChatGPT this year. The swing in reputation and momentum these last few months has been nothing short of extraordinary. Like a political race, once a leader loses “the big mo” it’s incredibly hard to win it back. That’s the situation OpenAI faces now.
AlienRobot 4 hours ago
My short experience with Claude and ChatGPT via web is that:
1. the way GPT writes is simply fundamentally annoying. I pretty much had to create a project with a file that said "do not use headings, lists or emojis" to make it bearable. It feels like, as a product, this sort of thing should be a general preference the user sets before they even start talking to a chatbot.
2. Claude just loves wasting tokens doing things nobody asked for. You ask "how do I calculate the distance between 2 points?" and it's probably going to compile some C code in the background with tests to make sure it works, then generate an interactive diagram on the fly to show how the math works, and then give you a downloadable file with the code. Like, dude, I just want some text. Why are you doing all of this?
Both of these problems come from the obvious lack of any UI controls in the software. There is no way for the user to know what sorts of things the software can do, because it's not exposed via UI as a checkbox like "generate interactive diagram" or "avoid using emojis." Discoverability is burning tokens to figure out what prompts work, or looking at example prompts the developer placed in the welcome screen.
I just feel it's completely ridiculous how LLMs are essentially the culmination of a trajectory of bad UI practices masquerading as "good UX", and now they're being implemented everywhere because people think a blank textbox, where you don't even know what you're supposed to type to do something, is good UX.
zarzavat 6 hours ago
What's incredible is that Anthropic are clearly not saints but Altman makes them look like the good guys, reinforcing their marketing.
When Anthropic had the dispute with the Department of War over very meek conditions (a truly moral AI company would not be engaged in war crimes in the first place), it was a test for Altman, all he had to do was to take the same position. But because he's a psychopath he failed that very basic test.
overgard 5 hours ago
I don't know, have you ever watched a video of Dario Amodei talking? I dislike them both immensely, but Dario both seems to take glee in scaring the shit out of people as a hobby with careless statements and his voice sounds like an impression of Elizabeth Holmes (the weird fake deep voice), and nothing he's said or done makes him seem more trustworthy.
shiandow 5 hours ago
I have never seen a video of Dario Asmodei. Which tells me he's less of a liability than Altman. For now.
bikelang 7 hours ago
OpenAI’s models could be materially better than Anthropic’s and I still wouldn’t use them because I don’t want to support Altman.
orphea 7 hours ago
Do you think Amodei is different?
golly_ned 5 hours ago
Amodei is convinced he's Abraham prefiguring AI's Christ. Very different than Altman's cold power-seeking. You can always trust someone who's selfish, since they'll always do whatever's in their benefit at all times.
nmilo 4 hours ago
akillibebe 7 hours ago
The choice is not binary. I use DeepSeek (paid) for coding, and Qwen (free) for casual stuff from the browser chat UI.
BonerWiener 6 hours ago
orphea 5 hours ago
skizm 6 hours ago
lostmsu 6 hours ago
surgical_fire 5 hours ago
samrus 19 minutes ago
Yes
Handy-Man 6 hours ago
Compared to Scam Altman? Infinitely
nullbio 6 hours ago
No. He's worse. Much worse.
bigthymer 6 hours ago
tornikeo 7 hours ago
Do you hold any amount of power in the world? A project that people care about, or a deliverable that someone depends on?
Just curious how you can afford to care about the guy 7 levels above the men that built and support the API that you buy.
Schmerika 7 hours ago
Some people care about things beyond their own immediate self interest.
Some don't, and find it hard to believe others really do.
talkin 6 hours ago
yoyohello13 6 hours ago
What, is this Sam’s alt account?
People can spend money how they wish. SamA is a prick, so I don’t buy from his company. I don’t buy from Microsoft or Oracle either. Giving a company your money is explicitly supporting them and everything they do. Are you going to force me to buy products from people I don’t agree with?
mountainriver 6 hours ago
wongarsu 6 hours ago
If enough 'small' senior developers, project managers, product owners, and internal IT people take a small stand against OpenAI products, that can still add up to a notable impact.
micromacrofoot 6 hours ago
why would you spend even a fraction of a second defending him
firefoxd 6 hours ago
I heard my kids argue last night. My daddy is so tall. My daddy is bigger than the house. No my daddy is bigger than a roller coaster. Yeah? Well my daddy is 50km. You mean he is long? Yes, my daddy is longer than you.
I was cracking up. I'm 5'7 on a good day. I feel like that's how valuation works. We are propping up five foot tall giants.
prmph 4 hours ago
Only on a good day?
vessenes 7 hours ago
Ah, it’s a good time to check in with gwern on our conversation about oAI vs Anthropic: https://news.ycombinator.com/item?id=40816755 and our predictions (ca. two years ago).
Upshot: poetry expertise does not seem to be the primary focus these days, perhaps to the detriment of the entire world. We did move on from training scaling to “test time” scaling (which I hate as a name, btw). Ilya does not seem to have been needed (although I am really curious what he’s building).
My prediction that you want to be deeply embedded, really rich, and part of global infrastructure feels good. My suggestion that oAI / MS would be able to use their 2024 lead to extend it was wrong.
Neither of us talked much about coding as a product that would drive value and behavior, which is super interesting to me; we were probably six months away from seeing real competence of any sort there, way back in June 2024.
We both seemed to think there would be, or could be, a single breakout company (although I did suggest buying the basket). That is clearly not the case, with GOOG, oAI, and Anthropic all posting serious revenues this last quarter/year.
One area of Anthropic that was nascent in 2024, but that I have come to think is super valuable, is their mechinterp group. I still don’t see (published) work from other labs of nearly the quality of Anthropic’s. And the group has clearly moved into a period of productivity; there’s a good chance in my mind it could provide a truly enduring strategic advantage as a tool to be used by the taste makers steering the ship. In 2024, interpretability seemed almost impossible to get a handle on; today, the sustained chipping away at the problem makes a lot more look possible.
thoughtpeddler 6 hours ago
Mechinterp in general is just completely undervalued right now (and agreed, Anthropic's team is doing the most rigorous work, now accompanied by Goodfire). They're doing the closest work to neuroscience's in vivo 'thought-tracing', which is just the wildest science-fiction sort of thing to be working on, and yet I feel the average person has no idea this sort of work is happening. When combined with the idea of the 'universal subspace hypothesis' (explored in the paper of the same name), you really start to bridge the gap from engineering to something more philosophical and spiritual. But I digress...
gom_jabbar 5 hours ago
Haven't heard about the universal subspace hypothesis yet, so I appreciate the digression.
thoughtpeddler 5 hours ago
janussunaj 5 hours ago
Did you also talk about "head and shoulders" and "pennant" patterns in stock charts? Or where the "smart money" is at? I'd like to subscribe to your paid newsletter.
keyle 7 hours ago
Unicorns, strapped with rockets, too busy looking at each other to realise the Earth is far gone.
They'll kill us all, or they'll kill each other. They sure as hell ain't making the world a better place, like they promised.
nullbio 6 hours ago
Dario really gives: "I'll make the world a better place after I burn it to the ground, I promise."
culi 5 hours ago
Claude has already been implicated in the triple-tap strike that killed over 150 schoolgirls.
They used Palantir's Maven to identify and prioritize targets and Maven integrates Claude into its decision making.
In any sane world where war crimes by the US were actually being taken seriously, both of these companies would be sanctioned.
grodes 8 hours ago
Codex GPT-5.5 is far superior to Opus 4.7 when working on large projects.
bob1029 6 hours ago
I strongly believe the reason gpt-5.x performs so well on large projects is because of the focused training they've done on their dedicated apply_patch primitive.
The official implementation of apply_patch is well thought out. It is a two-phase process that will not actually make any changes until every file in the change set is unambiguous. The pre-commit error feedback usually fixes anchoring issues within one or two additional attempts. It generally goes something like:
Reading file A L1:154
Reading file B L1:123
Attempting to apply patch...
[anchor errors for both A & B]
Reading file A L43:67
Reading file B L50:74
Attempting to apply patch...
Patch succeeded! Running compilation & unit tests...
The anchor error feedback helps massively because, in this implementation, it also returns the current line numbers where the problem was found. Techniques that replace the whole file or depend on find-and-replace are useful in more isolated contexts. However, when you need to refactor 20+ files, something like apply_patch is what you want. Anything that depends on specific line numbers for actual replacement targets is a total dead end for complex edit scenarios.
https://developers.openai.com/api/docs/guides/tools-apply-pa...
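A rough sketch of the two-phase, all-or-nothing idea described above, in Python. This is not OpenAI's implementation or patch format (the linked docs define that); `Hunk`, `apply_patch`, and the error wording are hypothetical. Hunks are located by exact content rather than by line number, every hunk is validated before any file changes, and missing or ambiguous anchors come back with current line numbers so the caller can re-read just those regions.

```python
from dataclasses import dataclass

@dataclass
class Hunk:
    path: str
    old: list[str]  # lines that must occur exactly once (context + lines to replace)
    new: list[str]  # replacement for those lines

def _matches(lines: list[str], old: list[str]) -> list[int]:
    """0-based start indices where `old` occurs as a contiguous run in `lines`."""
    n = len(old)
    return [i for i in range(len(lines) - n + 1) if lines[i:i + n] == old]

def apply_patch(files: dict[str, list[str]], hunks: list[Hunk]):
    """Return (new_files, errors); new_files is None if any hunk fails validation."""
    errors: list[str] = []
    located: dict[str, list[tuple[int, Hunk]]] = {}

    # Phase 1: validate every hunk against the current files; nothing is modified.
    for h in hunks:
        lines = files.get(h.path)
        if lines is None:
            errors.append(f"{h.path}: file not found")
            continue
        hits = _matches(lines, h.old)
        if len(hits) == 1:
            located.setdefault(h.path, []).append((hits[0], h))
            continue
        if hits:
            where = [i + 1 for i in hits]  # ambiguous: report every match (1-based)
        else:
            # No exact match: report lines equal to the first anchor line, if any.
            where = [i + 1 for i, line in enumerate(lines) if h.old and line == h.old[0]]
        errors.append(f"{h.path}: anchor matched {len(hits)} times; check lines {where}")

    # Reject hunks that overlap within the same file.
    for path, items in located.items():
        items.sort(key=lambda item: item[0])
        for (s1, h1), (s2, _) in zip(items, items[1:]):
            if s2 < s1 + len(h1.old):
                errors.append(f"{path}: overlapping hunks at lines {s1 + 1} and {s2 + 1}")

    if errors:
        return None, errors  # all-or-nothing: no file is touched

    # Phase 2: apply bottom-up within each file so earlier indices stay valid.
    new_files = {path: list(lines) for path, lines in files.items()}
    for path, items in located.items():
        for start, h in sorted(items, key=lambda item: item[0], reverse=True):
            new_files[path][start:start + len(h.old)] = h.new
    return new_files, []
```

Because the anchor is content plus context, a retry only needs more surrounding lines to become unique, which mirrors the "re-read L43:67, try again" loop in the log above.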
lucamark 7 hours ago
I'm experiencing the same. Codex GPT-5.5 has more brilliant intuitions and writes less code, i.e. it identifies the exact point where the modification should be made. Nevertheless, there are huge improvements in personality from Opus 4.7 (it was too accommodating) to Opus 4.8.
meowface 7 hours ago
GPT-5.5 is the better programmer but Opus 4.8 remains the better system architect and product designer.
Codex is very "miss the forest for the trees", but is much better at successfully making large changes in large codebases. Claude Code makes more mistakes, but has more taste and a better grasp on idiomatic and elegant software development.
If you can afford to, I recommend juggling both.
theturtletalks 7 hours ago
Great analysis, and it matches my experience as well. Codex is better when you know how you want the design and the architecture and you drive the agent a lot more aggressively. Claude Code feels more like autopilot, so executives and users who didn’t code before AI like it a lot more.
But I feel like an expert who can drive GPT aggressively will outperform Opus. It’s why some smart people I know are opting for GPT and have fallen off on Opus. It’s like asking an F1 driver to sit in a taxi.
CuriouslyC 7 hours ago
sobellian 6 hours ago
bayindirh 7 hours ago
I find the argument that a complex weighted graph has taste interesting.
This is not a jab, but a genuine curiosity of mine.
chronofar 7 hours ago
jmcodes 5 hours ago
knollimar 7 hours ago
alstonite 7 hours ago
vb-8448 6 hours ago
My problem with Codex/GPT is that it's too verbose (mostly JS and Python): lots of helper functions, lots of one- or two-line functions used in only one place, lots of types or proxy-like objects.
I have specific skills for trying to avoid this, but I still spend half of my time fighting its verbosity.
Currently, I'm trying to scaffold the functions/classes I know I need with NotImplementedError and ask it to implement only inside those specific places. It's a little bit better, but I still have to fight with nested function definitions...
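A minimal sketch of that stub-first workflow, assuming Python: the scaffold declares each function with `raise NotImplementedError`, and a hypothetical `find_stubs` helper (built only on the standard `ast` module) lists the stubs that are still empty, so you can see which ones the agent has yet to fill in. The scaffold contents are made up for illustration.

```python
import ast

# Hypothetical scaffold handed to the agent: signatures and docstrings only.
SCAFFOLD = '''
def parse_config(path: str) -> dict:
    """Read the config file and return a dict."""
    raise NotImplementedError

def validate(config: dict) -> list[str]:
    """Return a list of validation errors."""
    raise NotImplementedError

def main() -> None:
    print("wiring only")
'''

def _is_not_implemented(exc) -> bool:
    # Matches both `raise NotImplementedError` and `raise NotImplementedError("...")`.
    if isinstance(exc, ast.Call):
        exc = exc.func
    return isinstance(exc, ast.Name) and exc.id == "NotImplementedError"

def find_stubs(source: str) -> list[str]:
    """Names of functions whose body is just an optional docstring plus the raise."""
    stubs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            # Skip a leading docstring, if present.
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                body = body[1:]
            if len(body) == 1 and isinstance(body[0], ast.Raise) and _is_not_implemented(body[0].exc):
                stubs.append(node.name)
    return stubs

print(find_stubs(SCAFFOLD))  # prints ['parse_config', 'validate']
```

Running it again after the agent's pass shows whether every stub was filled; it does not, by itself, catch new helpers the agent added elsewhere.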
RA_Fisher 7 hours ago
In what ways? LM Arena has Opus 4.7 at 1567 ± 7 vs. 1505 ± 10 for GPT-5.5 Codex in code. I'm currently using both.
Admittedly, my recent experience tilts toward Opus, now 4.8, but you and others have piqued my interest in GPT-5.5 Codex, so I'm trying it more now.
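For a sense of scale, assuming Arena scores follow the usual Elo convention (base 10, 400-point scale), the 62-point gap quoted above corresponds to roughly a 59% expected head-to-head preference for Opus 4.7, and about 56% even at the inner edges of the ± ranges. The function name is ours, and this says nothing about which model suits a particular codebase.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of A against B, i.e. the probability A is preferred (ties ignored)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

print(f"{expected_score(1567, 1505):.3f}")  # point estimates, prints 0.588
print(f"{expected_score(1560, 1515):.3f}")  # inner edges of the ± ranges, prints 0.564
```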
spongebobstoes 6 hours ago
arena is not a good benchmark; it is very susceptible to sycophancy
the__alchemist 7 hours ago
You're using last week's model; Opus 4.7 is old news. Opus 6.9 is the new hotness; it is a better product manager than GPT, and has more X productivity. It replaced our junior dev team, and tells me my hair looks good.
malfist 6 hours ago
Your research finding LLMs ineffective is invalid because you used 6.9. The current SOTA is 6.91 and it's leaps and bounds better than yesterday's 6.9.
the__alchemist 5 hours ago
dangus 7 hours ago
Opus 4.7 is not the current version of Opus.
BoredPositron 8 hours ago
Not everyone is a developer...
_puk 7 hours ago
And 4.7 is so last week..
keyle 7 hours ago
Soon none of us will be! right?
sergiotapia 5 hours ago
My experience as well. Although this week I've moved to Cursor and Composer 2.5. It's so fast that any faults can be iterated on super quickly. The model is just insanely good with code things.
Keyframe 5 hours ago
source?
oofbey 7 hours ago
GPT 5.5 still invents facts rather than looking them up, and manages to come across both as condescending and sycophantic. It feels like talking to a used car salesman.
folkrav 7 hours ago
Funny, because I'm quite literally having this exact issue with 4.8 as we speak. I've been going back and forth with Claude since yesterday afternoon on chopping up, stabilizing, and facilitating recovery on a flaky mega-pipeline. Not 5 minutes ago, I had to remind it that two of the solutions it proposed were not possible because the target technology doesn't allow what it wanted to do, despite my pointing it to the very docs that say it can't be done in the first place.
As for tone... Both feel sycophantic as hell to me. To be honest, they all do.
theshackleford 7 hours ago
> GPT 5.5 still invents facts rather than looking them up
So does Claude, what’s your point?
I used it and ChatGPT this week in trying to assist troubleshooting a complex DB related issue and Claude had to apologise no less than three times in which it admitted to talking complete shit.
Just one example of the kind of shit it dribbled:
> I need to be upfront with you. I should not have claimed X as if I knew that for a fact. That was overreach on my part.
tedggh 7 hours ago
At this point I think it’s more important to have a solid workflow and an understanding of how [insert your favorite model here] works and what it can do than to chase the next shiny release, jumping back and forth between companies. I just finished my first large project with Codex and it is hard for me to believe Claude can be much better. It may be a bit better or worse, but again, they are all so good now that the user is the one driving the difference.
jmkni 6 hours ago
Yeah none of this is new if you've been working in this industry for a while
We've always been having these debates over whether my choice of tech is better than your choice of tech, same holy wars, different type of tech
The advice today is the same as it was 10/20/30+ years ago, pick what works for you and build something good with it
Nobody will actually care how you built it, regardless of whether it's good or crap (although if it's crap you can blame your tools)
rzmmm 5 hours ago
> jumping back and forth between companies
There is currently a quite strong incentive for Anthropic and OpenAI to establish vendor lock-in. I see the ability to jump between companies as quite important, especially for larger users. Right now it should not be hard, but it could become much harder in the future.
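One common way to keep switching cheap, sketched in Python under the assumption that your application only needs plain text completion: application code depends on a small interface, and each vendor gets a thin adapter behind it. `ChatModel`, `EchoModel`, `get_model`, and `summarize` are hypothetical names; a real adapter would wrap the vendor's own SDK, which is not shown here.

```python
from dataclasses import dataclass
from typing import Protocol

class ChatModel(Protocol):
    """The only model surface the rest of the app is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...

@dataclass
class EchoModel:
    # Hypothetical stand-in for a per-vendor adapter; it just echoes the prompt.
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

PROVIDERS: dict[str, ChatModel] = {
    "vendor_a": EchoModel("vendor_a"),
    "vendor_b": EchoModel("vendor_b"),
}

def get_model(name: str) -> ChatModel:
    # Switching vendors becomes a config change rather than a refactor.
    try:
        return PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name}") from None

def summarize(model: ChatModel, text: str) -> str:
    # Application code sees only ChatModel, never a vendor-specific client.
    return model.complete(f"Summarize: {text}")

print(summarize(get_model("vendor_b"), "hi"))  # prints [vendor_b] Summarize: hi
```

The trade-off is that vendor-specific features (caching, tool formats, agent harnesses) either stay out of the interface or leak into it, which is exactly where lock-in tends to creep back in.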
antirez 7 hours ago
In this game, whoever has the best model wins in the long term: so far OpenAI is ahead, and in the long term that is what matters. However, for the same reason, if in the future open-weight models get very near the quality of the frontier labs, Anthropic and OpenAI will be out of business very soon. The game they play only makes sense if their SOTA models do things that other models can't do at a comparable level.
zozbot234 5 hours ago
OpenAI and Anthropic have the know-how for building much larger models that will be a lot smarter and run on datacenter-scale compute. This is a natural 'moat' that will be inherently hard to replicate for on-prem compute or small neoclouds running open-weight/local AI. They can easily coexist with a robust local AI scene.
forest32 5 hours ago
> if in the future open weight models will be very near the quality of frontier labs, Anthropic and OpenAI will be out of business very soon
> Why would a business pay for Slack when IRC exists?
> Why would a business pay for Dropbox when FTP exists?
antirez 4 hours ago
AI is not a product per se; it is a technology you can turn into a product, and the product has a lot less value than the technology itself. Whoever has the best LLM can copy any product idea and make it a lot better. Similarly, if open-weight LLMs are everywhere and powerful, open-source agent products are too easy to replicate for people to pay big money to a few companies. Not everything is alike, and not every parallel makes sense. The pi agent is a good replacement for Codex and Claude Code if you wire frontier models to it. And when products are complex and matter a lot, like complicated AI-powered design suites, there is no reason why OpenAI / Anthropic would win that space instead of a random startup. So either a few companies retain frontier AI, or those companies will die.
About IRC / Slack: other than the fact IRC was abandoned, Slack is about control, not product. The product is terrible.
FTP / Dropbox: this comparison does not make sense.
tornikeo 7 hours ago
IMO bad take.
You can theoretically do most things AWS does most of the time, yet people pay a premium for it and keep paying for it, even though alternatives are cheaper, simpler and more performant.
I'd bet you that after 20 years OpenAI and Anthropic would still be around and kicking.
You might have a subpar product (for the price) but the reputation and history is what makes people open their wallets.
lelanthran 6 hours ago
> I'd bet you that after 20 years OpenAI and Anthropic would still be around and kicking.
Depends. The bigger the bubble, the bigger the pop.
Only a few unicorns from the dot-com bust came out the other side (Amazon, Google, ... anyone else?), and that was a piddling affair compared to this one.
kgwgk 4 hours ago
Capricorn2481 5 hours ago
> You can theoretically do most things AWS does most of the time, yet people pay premium for it and keep paying for it, even though alternatives are cheaper, simpler and more performant
It's going to be debated forever whether wiring up your own open source tech has a lower development cost than the equivalent AWS bill. For me, that's too broad a statement, as I have seen it go both ways. What is true: there is only some knowledge overlap between maintaining an AWS stack and running your own Prometheus-logged, Ceph-backed set of boxes.
That is not the case with LLMs. At least, not right now. They roughly work the same and are easy to pick up. They are about as straightforward of an interface as it gets, and using them in "advanced" ways could be summarized on an index card. They are relatively fungible.
I don't see a world where OpenAI runs on brand recognition alone. It needs to be more convenient to run than local LLMs. They've done that by buying so much of the world's hardware that it becomes more expensive to run these things locally.
micromacrofoot 6 hours ago
this is like saying the car with the better engine wins, but all we're doing is commuting to work
antirez 5 hours ago
Comparisons like that give the impression of reasoning about things, but they're a weak tool for understanding the reality of very different things.
Imustaskforhelp 6 hours ago
I have the same impression. Strange to see this being downvoted, and it was only after reading the comment that I checked the username and found out it's antirez!
Now, I think that with these companies IPO'ing, and Nasdaq and others bending themselves and their rules to cater to them (as in the case of SpaceX), these companies are very close to an IPO.
So the employees are probably gonna get good valuations, at least in the short term, and perhaps they have a problem that is worth having.
But as you have suggested, I feel like the whole thing might be flaky, especially given open-source models. I believe OSS models are, at worst, close to where SOTA was ~6 months ago.
So OpenAI & Anthropic have to somehow always stay on the edge with better models so as not to lose this (imo) very short-term grip that they have, all while losing billions of dollars and having to worry about profitability & so many other concerns in and of themselves.
I don't think there is anything else in CS, or in any industry, where trillions of dollars float on two nearly comparable pieces of software whose only moat is, at best, a six-month lead. We don't know how things will pan out, but if I had to guess, it might not look good for OAI and Anthropic, especially over the longer horizon.
gkfasdfasdf 6 hours ago
Having used both Anthropic and OpenAI models at $work via copilot extensively, I have to say GPT 5.5 currently is best at getting work done with minimal mistakes. However, Claude Code is way ahead of OpenAI Codex in terms of harness features and tooling. MCPs, skills, sub agents, these all were pioneered in Claude Code first. Perhaps that contributed to Anthropic's success.
merrvk 7 hours ago
They are far far better at marketing than OpenAI
MostlyStable 6 hours ago
This is an interesting claim to make. Up until quite recently (I mean that in the usual sense of the word, not the AI world sense of the word), almost no one had heard of Anthropic or Claude, despite a reasonably aggressive ad campaign, even at the point when most people would have known about ChatGPT.
Even now, I would guess that if you ask a normie off the street, they are far, far more likely to have heard of ChatGPT than Claude. Of course, Anthropic has been targeting businesses quite a bit harder than the general public for a while, so maybe that's not a fair test.
Anthropic inarguably does make an attempt at marketing their product. But I'm not convinced that the closing of the gap between them and OpenAI (as others have pointed out: I'm not sure it's defensible to claim that either is significantly ahead of the other given the paucity of available data, but they are certainly much closer than they were a year or two ago, when OpenAI was clearly in the lead) is mostly down to that. I think that, for a decent chunk of time (this one I mean in the AI world sense of the term), they had a very non-trivial lead in coding abilities. The developer and business world figured this out and jumped on board. That gap is largely now erased, but that's not enough to retake the momentum.
nullbio 6 hours ago
They just have no moral issues with spamming the internet with bots. They use blackhat tactics whenever they can to get the upper hand. Every social media platform is absolutely chock-full of Anthropic- and Claude-promoting bots, and you know they're bots because they all repeat the same things, in the same wording. X in particular seems to have millions of them.
dannypdx 7 hours ago
I dunno, the latest Opus models seem to be tuned to waste money... and Claude is kinda lazy lately?
gobdovan 5 hours ago
Theoretically, I could sell 1 out of 100 trillion shares of my private startup for $1 and surpass all companies on Earth combined by implied valuation. I see people taking the article's comparative framing between OpenAI and Anthropic for granted, but without knowing the private deal terms, all you can really infer is that their 'true' valuations could very plausibly be in the same ballpark.
shartshooter 5 hours ago
Except they raised $65 billion, not $1.
AI is overstated in my opinion, but to hand-wave away the reality of them having created something that investors were happy to value at $1T is pretty unfair.
gobdovan 5 hours ago
I was attacking the comparative framing of the article. Although there are a lot of private terms we don't know, the claim is taken for granted and overfocused on, with people saying either 'figures out since Altman=devil' or 'no way OAI single valuable company'. I end the comment by stating that their valuations are plausibly in the same ballpark.
gaiagraphia 5 hours ago
I guess the competition lost lots of time by focusing on image and video generation. While they're fun gimmicks, I still really haven't seen the value in AI-generated images/video, especially considering the greater costs involved.
Doubling down on coding was just infinitely smarter. Has there actually been a successful company which uses AI images and video effectively?
PedroBatista 7 hours ago
I get the feeling this also means AI works very well for the general coding tasks and that's their biggest success in terms of difficulty AND people paying for it.
Of course every AI company has been overpromising and pumping the numbers as much as possible, but OpenAI has been hitting the reality wall more, both because their people haven't been able to keep improving at a faster rate and because of their whole cost structure and spinning financial plates.
This doesn't invalidate the fact Anthropic is also overhyped to the max for their IPO.
mlmonkey 5 hours ago
How long before Anthropic buys Google ?
paol_taja 5 hours ago
I use both, many times at the same time, on different projects.
One day one feels better than the other. Then, by the end of the day, the other feels better than the first. I have no idea why.
I still don’t have a favorite.
In the end, I think both are incredibly useful when I take the time to instruct them properly.
The problems come when I let them run wild.
zamadatix 6 hours ago
I've seen fewer people insisting OpenAI has a moat lately, but I'm still not sold the big winner will be either of these two in the end.
cheesecompiler 5 hours ago
This suggests that developers are the primary user base affecting valuation, not the average user, doesn't it? I don't know anyone among mortals who uses Claude. The spike does correlate with the exodus from OpenAI earlier in the year though.
r721 7 hours ago
qazinform.com seems to be shadow-banned (and posted only by OP): https://news.ycombinator.com/from?site=qazinform.com
iterateoften 8 hours ago
How much dilution? Who’s getting the value?
qwesak 7 hours ago
Bernie Madoff would be jealous. Stealing all open source and reselling "git clone" + "sed" for $1 trillion is something he did not achieve.
The chutzpah is remarkable.
dude250711 7 hours ago
They are selling shovels, not mining gold themselves though.
So it's more like selling a derivative on a promise to steal open source for you in a useful way.
lucamark 6 hours ago
Most people think the current valuation is for the models themselves. Actually, they're building the infrastructure for the next 50 years.
lelanthran 6 hours ago
> Actually, they're building the infrastructure for the next 50 years.
What infrastructure? The hardware would be outdated in 3 - 5 years, after all. What other infrastructure is needed for AI?
christophilus 6 hours ago
The dark fiber of our time?
alansaber 4 hours ago
Must be ridiculously easy for Anthropic to fill a round, even at that valuation
shevy-java 7 hours ago
All overvalued.
dude250711 7 hours ago
By an order of magnitude.
micromacrofoot 6 hours ago
tesla has been overvalued for almost a decade with little sign of slowing down, it really doesn't seem to matter anymore
setnone 6 hours ago
my take: it's because of the naming. Amodei, Claude and Mythos have this money-throwing vibe to them
robot_jesus 7 hours ago
Pointless article (like much of the AI marketing hotness and spin room).
> The new valuation is nearly three times higher than the company’s February valuation, when Anthropic was estimated to be worth around $380 billion.
> In March, OpenAI was valued at $852 billion following a record $122 billion funding round.
Basically, today (Late May) we're declaring Anthropic the most valuable. They've nearly tripled in value since February. But also, OpenAI was $852B in March and presumably has grown since then.
In a few weeks we'll either have a new round of funding for OpenAI or they'll announce their IPO, and the hype train will be abuzz that they're now the most valuable.
spacebacon 7 hours ago
Investors of both should read this: https://open.substack.com/pub/sublius/p/srt-introspect-why-c...
pingou 7 hours ago
"Investors who have poured hundreds of billions into closed-source labs are betting on an unprovable safety moat".
Nobody is investing in closed-source labs for safety reasons, being able to explore more in details what and how the model is thinking is nice but by no means a game changer. What matters to investors and most of the users is that the model gives the right answer at the end.
micromacrofoot 6 hours ago
they don't care, they're driving towards a cliff full speed and are all counting on jumping out at the right moment
spacebacon 6 hours ago
Yeah about that
icar 4 hours ago
The bubble keeps getting bigger!
iqandjoke 6 hours ago
Some say the founder worked at Baidu before. Is that true?
nullbio 6 hours ago
This is depressing. Anthropic really is the last company we want to see leading this race, given how greedy they are. Let's not forget all of the lying and gaslighting too. The creator of OpenClaw made this I believe: https://clawd.rip
Stealing people's tokens because you use a product they don't like... That shows the morals they have. Actions speak louder than words. Disabling people's caches because they disable telemetry was another juicy one that I don't believe is on this site. In fact there are far more I remember that aren't even listed here.
ChrisArchitect 3 hours ago
[dupe]
Main discussion:
Anthropic raises $65B in Series H funding at $965B post-money valuation
https://news.ycombinator.com/item?id=48313048
Other submitted more-common source reports on this from 2 days ago that didn't need traction because of the above discussion:
https://www.nytimes.com/2026/05/28/technology/anthropic-tops... (https://news.ycombinator.com/item?id=48315537)
https://www.theguardian.com/technology/2026/may/28/anthropic... (https://news.ycombinator.com/item?id=48321498)
https://www.wsj.com/tech/ai/anthropic-valuation-openai-80bf2... (https://news.ycombinator.com/item?id=48315537)
https://www.businessinsider.com/anthropic-surpasses-openai-w... (https://news.ycombinator.com/item?id=48316994)
frugalmail 7 hours ago
Bummer, they are the least friendly to open source, and the most incompatible with free use of your subscription via your own tools/custom harnesses.
lysace 7 hours ago
The models aside, my impression is that Anthropic is winning in large part because of very pragmatic and high-velocity product development on top of them; like with Claude Code.
Like actually iterating hard to make them useful. Many, many details matter here.
I haven't tested the similar OpenAI/Google tools in detail lately though. Previously I found them way too generic and unpolished to be useful.
Is there something to this?
wongarsu 7 hours ago
My impression as well. OpenAI was riding the high of ChatGPT with a very confusing and seemingly unfocused offering beyond that. Anthropic was always laser focused on business use cases. Claude Code being the big one. Finance seems to be their next target.
Anthropic has much narrower capabilities. No image generation, no video generation, no 3D world models, barely any voice stuff. But they know who their target customers are, and their API has a model selection anyone can understand and pricing that rarely changes. Focus and predictability.
nzoschke 4 hours ago
My impression too.
Claude Desktop, Cowork, Code, Design all get meaningful new features week over week.
I can’t recall another vendor with such focus and velocity.
Google products are evolving at a glacial pace. OpenAI isn’t as focused on what knowledge workers need.
lysace 3 hours ago
Velocity: The closest is perhaps NCSA Mosaic/Netscape in 1993-1995. It's exhilarating to follow.
Both Google and OpenAI appear to be stuck in some abstract strategy where they keep shipping new demos instead of iterating on actual products.
smcl 5 hours ago
This is all dumb, it's like picking your fave cryptocurrency
gunju84 6 hours ago
I think claude is much better than Chatgpt
sergiotapia 6 hours ago
A sign of a bad developer is that they cannot fathom switching from claude cli. In their mind it's claude or nothing, despite things like codex existing. Opencode existing. Cursor + Composer 2.5 existing.
These are the new .NET developers who will know nothing but C# for 20 years.
SilverElfin 6 hours ago
The headline is false. First off, OpenAI hasn’t raised a recent round, so you can’t compare these two companies randomly like this. Second, Anthropic is known to use accounting methods that give it more revenue than it would have if it used the same practices as OpenAI. And neither of these companies is known to be doing GAAP accounting.
m3kw9 7 hours ago
Either they are getting fleeced or they are getting very good terms for the investments
andrewstuart 7 hours ago
It’s because the programming works.
OpenAI spent its resources on AGI whilst Claude worked on making programming work.
Google Gemini is out of the race entirely its programming AI is a joke.
onesingleblast 4 hours ago
Gemini is probably one of the better AIs for programming, despite Google not making a decent official client to access it.
amazingamazing 7 hours ago
It is unclear which strategy will work in the end. 3.5 flash uses fewer tokens and is cheaper.
jmyeet 4 hours ago
The core product of AI is labor displacement and wage suppression. Why? To further concentrate wealth. Who are the most expensive employees? Software engineers have to be up there. So Anthropic have been smart here by focusing on a market with the highest potential value.
What I find fascinating is how many inside the bubble defend this for no other reason than they think they're personally going to make their bag out of AI. You're not Sam Altman. Or Elon Musk. Or Jeff Bezos. And you're not going to be.
What's going to happen is that in a few years the sky-high AI salaries are also going to disappear. More work will be done by fewer people in this space too. And only then will many people change their tune because the rising waters have finally reached them.
startpage_com 7 hours ago
Start what?
gunju84 6 hours ago
I think claude is much better than chatgpt
king_zee 8 hours ago
ChatGPT dropped the ball for long enough that most devs and technical people went to Claude for a year or more. They still probably have the most normie market share, and they are at least trying to make up some of that lost time with their latest model, so it'll be interesting to see.
tapoxi 7 hours ago
The "normie" market doesn't pay for enterprise features though. They might cost more in inference than they make back from advertising.
ianberdin 5 hours ago
The real problem is this: no cheap model right now produces a genuinely beautiful, usable UI when it comes to website building. Not one.
And here’s the core tension. The models keep getting better. GPT 5.5 improved. But it also got more expensive. Opus 4.7 to 4.8 has become outrageously priced too, up 50%, and 4.6 was already brutally expensive to begin with. API pricing is a real pain.
What’s missing is any meaningful supply of affordable, democratically priced models you can actually embed into your own service. For me that’s playcode.io, whether it’s the website builder or the app builder. The moment we give users access to these models, the cost becomes a serious blocker. There’s no way around it.
The same dynamic explains Cursor. Why did they go build their own Composer 2.5 model? Because relying on third-party models is simply too expensive for users unless they’re carrying a Claude Code or Codex subscription. So Cursor had to roll their own. It’s a real mess, honestly.
And Chinese models don’t close the gap either. They’ve improved, the free-tier ones especially, which is great to see. But the limitations are significant:
• No multimodality. They don’t accept image input.
• You can’t attach a screenshot, show a UI, or hand it a PDF.
• They feel heavily stripped down overall.
• They’re just not polished. Not even close.
Opus, by contrast, feels like a finished, deeply refined product. Everything else is still rough around the edges. And that’s exactly why Anthropic can charge what they charge: because they actually deliver. That’s the whole problem in a sentence.
esafak 2 hours ago
I don't see a problem as long as Chinese companies are not banned; expensive providers like Anthropic will prove that value is being provided, encouraging competition, which will take care of problems.