Hacker News

by Ryan Harman

Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving (qwen.ai)

637 points by mfiguiere a day ago

alex7o 20 hours ago

Ok I find it funny that people compare models and are like, opus 4.7 is SOTA and is much better etc, but I have used glm 5.1 (I assume this comes form them training on both opus and codex) for things opus couldn't do and have seen it make better code, haven't tried the qwen max series but I have seen the local 122b model do smarter more correct things based on docs than opus so yes benchmarks are one thing but reality is what the modes actually do and you should learn and have the knowledge of the real strengths that models posses. It is a tool in the end you shouldn't be saying a hammer is better then a wrench even tho both would be able to drive a nail in a piece of wood.

mikenew 13 hours ago

GLM 5.1 was the model that made me feel like the Chinese models had truly caught up. I cancelled my Claude Max subscription and genuinely have not missed it at all.

Some people seem to agree and some don't, but I think that indicates we're just down to your specific domain and usage patterns rather than the SOTA models being objectively better like they clearly used to be.

operatingthetan 13 hours ago

It seems like people can't even agree which SOTA model is best at any given moment anymore, so yeah I think it's just subjective at this point.

fwipsy 13 hours ago

hamdingers 11 hours ago

ulfw 6 hours ago

vidarh 39 minutes ago

I feel like it's Sonnet level for implementation, but not matching up to Opus for planning.

But I agree it's close enough that it's worth using heavily. I've not cancelled my Claude Max subscription, but I've added a z.ai subscription...

mettamage 4 hours ago

Hmm

Will try it out. Thanks for sharing!

abustamam 12 hours ago

What is your workflow? Do you use Cursor or another tool for code Gen?

mikenew 6 hours ago

LoganDark 12 hours ago

The value in Claude Code is its harness. I've tried the desktop app and found it was absolutely terrible in comparison. Like, the very nature of it being a separate codebase is already enough to completely throw off its performance compared to the CLI. Nuts.

deaux 10 hours ago

Mashimo 6 hours ago

bink-lynch 12 hours ago

I have been using GLM-5.1 with pi.dev through Ollama Cloud for my personal projects and I am very happy with this setup. I use pi.dev with Claude Sonnet/Opus 4.6 at work. Claude Code is great but the latest update has me compacting so much more frequently I could not stand it. I don't miss MCP tool calling when I am using pi.dev; it uses APIs just fine. I actually think GML-5.1 builds better websites than Claude Opus. For my personal projects I am building a full stack development platform and GLM-5.1 is doing a fantastic job.

zackify 9 hours ago

I'm using pi the same as you. However, I have an MCP I need to use and the popular extension for that support works fine for me.

Really liking pi and glm 5.1!

jadbox 11 hours ago

Why use ollama cloud versus like Openrouter?

bink-lynch 5 hours ago

zackify 9 hours ago

vidarh 40 minutes ago

I don't find GLM 5.1 beating Opus personally, but I do think it is good enough to consider it part of the SOTA pack at this point. It feels like it needs more time and tokens to achieve things, but that's okay - it's so much cheaper per token.

If Qwen3.6-Max is up there as well, it will be very interesting.

jxmesth 18 hours ago

The only reason I'm stuck with Claude and Chatgpt is because of their tool calling. They do have some pretty useful features like skills etc. I've tried using qwen and deepseek but they can't even output documents. How are you guys handling documents and excels with these tools? I'd love to switch tbh.

embedding-shape 18 hours ago

> I've tried using qwen and deepseek but they can't even output documents

What agent harness did you use? Usually, "write_file", "shell_exec" or similar is two of the first tools you add to an agent harness, after read_file/list_files. If it doesn't have those tools, unsure if you could even call it a agent harness in the first place.

jxmesth 18 hours ago

chillfox 11 hours ago

ecocentrik 18 hours ago

When was the last time you used Qwen models? Their 3.5 and 3.6 models are excellent with tool calling.

jxmesth 18 hours ago

sscaryterry 17 hours ago

You can use GLM-5.1 with claude code directly, I use ccs, GLM-5.1 setup as plan, but goes via API key.

zrn900 2 hours ago

You can just use Cline in VSCode to get most of the tooling you need - it works with all models. Including Xiaomi's new Mimo with 1m context window and blazing fast speed. It's much cheaper than Claude's biggest plan and with much, much more quota.

NobleLie 13 hours ago

Yep Claude Code CLI does A LOT (which is now confirmed even more)

jwitthuhn 18 hours ago

I've been using qwen-code (the software, not to be confused with Qwen Code the service or Qwen Coder the model) which is a fork of gemini-cli and the tool use with Qwen models at least has been great.

ycui1986 12 hours ago

qwen3.5 and qwen3.6 are both good at tool calling.

estimator7292 16 hours ago

You can use both codex and Claude CLI with local models. I used codex with Gemma4 and it did pretty well. I did get one weird session where the model got confused and couldn't decide which tools actually existed in its inventory, but usually it could use tools just fine.

Moosdijk 18 hours ago

I wonder why glm is viewed so positively.

Every time I try to build something with it, the output is worse than other models I use (Gemini, Claude), it takes longer to reach an answer and plenty of times it gets stuck in a loop.

pkulak 18 hours ago

I've been running Opus and GLM side-by side for a couple weeks now, and I've been impressed with GLM. I will absolutely agree that it's slow, but if you let it cook, it can be really impressive and absolutely on the level of Opus. Keep in mind, I don't really use AI to build entire services, I'm mostly using it to make small changes or help me find bugs, so the slowness doesn't bother me. Maybe if I set it to make a whole web app and it took 2 days, that would be different.

The big kicker for GLM for me is I can use it in Pi, or whatever harness I like. Even if it was _slightly_ below Opus, and even though it's slower, I prefer it. Maybe Mythos will change everything, but who knows.

tasuki 17 hours ago

Mashimo 18 hours ago

I have used GLM 4.7, 5 and 5.1 now for about 3 month via OpenCode harness and I don't remember it every being stuck in a loop.

You have to keep it below ~100 000 token, else it gets funny in the head.

I only use it for hobby projects though. Paid 3 EUR per month, that is not longer available though :( Not sure what I will choose end of month. Maybe OpenCode Go.

Mashimo 4 hours ago

gck1 14 hours ago

chillfox 3 hours ago

GLM is the first open source model that actually worked for me, where I found the output ok.

And yes, sonnet/opus is better and what I use daily. But I wouldn’t be that upset if I had to drop down to GLM.

Akira1364 18 hours ago

IDK about GLM but GPT 5.4 Extra High has been great when I've used it in the VS Code Copilot extension, I see no actual reason Opus should consume 3x more quota than it the way it does

spaceman_2020 16 hours ago

I think it offers a very good tradeoff of cost vs competency

4.7 is better, but its also wildly expensive

slopinthebag 18 hours ago

You're probably just holding it wrong.

blurbleblurble 7 hours ago

Opus 4.6 was incredible but Opus 4.7 is genuinely frustrating to me so far. It's really sharp but can be so lazy. It's constantly telling me that we should save this for tomorrow, that it's time for bed (in the middle of the day), and very often quite sloppy and bold in its action. These adjustments are getting old. The next crop of open models seems ready to practically replace the big ones as sharp orchestrator agents.

chillfox 3 hours ago

I have never seen a model be “lazy” before (I have seen them go for minimal change). I have been using the models through the api with various agents and no custom system prompt.

So I am curious, how do people get these lazy outputs?

Is it by having one of those custom system prompts that basically tells the model to be disrespectful?

Or is it free tier?

Cheap plans?

enraged_camel 3 hours ago

ternaryoperator 19 hours ago

The models test roughly equal on benchmarks, with generally small differences in their scores. So, it’s reasonable to choose the model based on other criteria. In my case, I’d switch to any vendor that had a decent plugin for JetBrains.

ezekiel68 18 hours ago

Qwen3-Coder produced much better rust code (that utilized rust's x86-64 vectorized extensions) a few months ago than Claude Opus or Google Gemini could. I was calling it from harnesses such as the Zed editor and trae CLI.

I was very impressed.

gck1 14 hours ago

I think claude in general, writes very lazy, poor quality code, but it writes code that works in fewer iterations. This could be one of the reasons behind it's popularity - it pushes towards the end faster at all costs.

Every time codex reviews claude written rust, I can't explain it, but it almost feels like codex wants to scream at whoever wrote it.

lambda 11 hours ago

Their latest, Qwen3.6 35B-A3B is quite capable, and fast and small enough I don't really feel constrained running it locally. Some of the others that I've run that seem reasonably good, like Gemma 4 31B and Qwen3.5 122B-A10B just feel a bit too slow, or OOM my system too often, or run up on cache limits so spend a lot of time re-processing history. But the latest Qwen3.6 is both quite strong, and lightweight enough that it feels usable on consumer hardware.

justincormack 17 hours ago

Codex is pretty good at Rust with x86 and arm intrinsics too, it replaced a bunch of hand written C/assembly code I was using. I will try Qwen and Kimi on this kind of task too.

sirnicolaz 17 hours ago

Consider that SWE benchmarking is mainly done with python code. It tells something

cornedor 19 hours ago

I tried GLM and Qwen last week for a day. And some issues it could solve, while some, on surface relatively easy, task it just could not solve after a few tries, that Opus oneshotted this morning with the same prompt. It’s a single example ofcourse, but I really wanted to give it a fair try. All it had to do was create a sortable list in Magento admin. But on the other hand, GLM did oneshot a phpstorm plugin

dev_l1x_be 18 hours ago

Do you use Opus through the API or with subscription? Did you use OpenCode or Code?

cornedor 17 hours ago

odie5533 16 hours ago

If you showed me code from GLM 5.1, Opus 4.6, and Kimi K2.6, my ranking for best model would be highly random.

mkhalil 5 hours ago

Not to mention, that Opus cost orders of magnitude more money. These are VERY impressive and usage.

FAANGS love to give away money to get people addicted to their platforms, and even they, the richest companies in the world, are throttling or reducing Opus usage for paying members, because even the money we pay them doesn't cover it.

Meanwhile, these are usable on local deployments! (and that's with the limited allowance our AI overlords afford us when it comes to choices for graphics cards too!)

FlyingSnake 19 hours ago

I tried GLM5.1 last week after reading about it here. It was slow as molasses for routine tasks and I had to switch back to Claude. It also ran out of 5H credit limit faster than Claude.

bensyverson 19 hours ago

If you view the "thinking" traces you can see why; it will go back and forth on potential solutions, writing full implementations in the thinking block then debating them, constantly circling back to points it raised earlier, and starting every other paragraph with "Actually…" or "But wait!"

nothinkjustai 19 hours ago

FlyingSnake 19 hours ago

nothinkjustai 19 hours ago

Z.ai’s cloud offering is poor, try it with a different provider.

complexworld 7 hours ago

dev_l1x_be 18 hours ago

Benchmarking is grossly misleading. Claude’s subscription with Code would not score this high on the benchmarks because how they lobotomized agentic coding.

solomatov 18 hours ago

>but I have seen the local 122b model do smarter more correct things based on docs than opus

Could you please share more about this

alex7o 15 hours ago

Maybe a bit misleading. I have used in in two places.

One Is for local opencode coding and config of stuff the other is for agent-browser use and for both it did better (opus 4.6) for the thing I was testing atm. The problem with opus at the moment I tired it was overthinking and moving itself sometimes I the wrong direction (not that qwen does overthink sometimes). However sometimes less is more - maybe turning thinking down on opus would have helped me. Some people said that it is better to turn it of entirely when you start to impmenent code as it already knows what it needs to do it doesn't need more distraction.

Another example is my ghostty config I learned from queen that is has theme support - opus would always just make the theme in the main file

OtomotO 20 hours ago

Many people averted religion (which I can get behind with), but have never removed the dogmatic thinking that lay at its root.

As so many things these days: It's a cult.

I've used Claude for many months now. Since February I see a stark decline in the work I do with it.

I've also tried to use it for GPU programming where it absolutely sucks at, with Sonnet, Opus 4.5 and 4.6

But if you share that sentiment, it's always a "You're just holding it wrong" or "The next model will surely solve this"

For me it's just a tool, so I shrug.

balls187 20 hours ago

> I've used Claude for many months now. Since February I see a stark decline in the work I do with it.

I find myself repeating the following pattern: I use an AI model to assist me with work, and after some time, I notice the quality doesn't justify the time investment. I decide to try a similar task with another provider. I try a few more tests, then decide to switch over for full time work, and it feels like it's awesome and doing a good job. A few months later, it feels like the model got worse.

runarberg 19 hours ago

e12e 19 hours ago

taurath 20 hours ago

I agree - the problem is it’s hard to see how people who say they’re using it effectively actually are using it, what they’re outputting, and making any sort of comparison on quality or maintainability or coherence.

In the same way, it’s hard to see how people who say they’re struggling are actually using it.

There’s truth somewhere in between “it’s the answer to everything” and “skill issue”. We know it’s overhyped. We know that it’s still useful to some extent, in many domains.

balls187 19 hours ago

psychoslave 20 hours ago

What is it that is dogma free? If one goes hardcore pyrrhonism, doubting that there is anything currently doubting as this statement is processed somehow, that is perfectly sound.

At some point the is a need to have faith in some stable enough ground to be able to walk onto.

Wolfbeta 18 hours ago

ecshafer 20 hours ago

All people think dogmatically. The only difference is what the ontological commitments and methaphysical foundations are. Take out God and people will fit politics, sports teams, tools, whatever in there. Its inescapable.

smallmancontrov 18 hours ago

bensyverson 19 hours ago

OtomotO 19 hours ago

taneq 19 hours ago

I wonder to what degree it depends on how easy you find coding in general. I find for the early steps genAI is great to get the ball rolling, but rapidly it becomes more work to explain what it did wrong and how to fix it (and repeat until it does so) than to just fix the code myself.

slopinthebag 11 hours ago

ninjahawk1 21 hours ago

The way to develop in this space seems to be to give away free stuff, get your name out there, then make everything proprietary. I hope they still continue releasing open weights. The day no one releases open weights is a sad day for humanity. Normal people won’t own their own compute if that ever happens.

culi 20 hours ago

I think that's an overgeneralization. We've seen all the American models be closed and proprietary from the start. Meanwhile the non-American (especially the Chinese ones) have been open since the start. In fact they often go the opposite direction. Many Chinese models started off proprietary and then were later opened up (like many of the larger Qwen models)

robot_jesus 20 hours ago

> We've seen all the American models be closed and proprietary from the start

What about Gemma and Llama and gpt-oss, not to mention lots of smaller/specialized models from Nvidia and others?

I would never argue that China isn't ahead in the open weights game, of course, but it's not like it's "all" American models by any stretch.

walthamstow 20 hours ago

InkCanon 6 hours ago

1dom 5 hours ago

embedding-shape 20 hours ago

> We've seen all the American models be closed and proprietary from the start.

Most*.

OpenAI, contrary to popular belief, actually used to believe in open research and (more or less) open models. GPT1 and GPT2 both were model+code releases (although GPT2 was a "staged" release), GPT3 ended up API-only.

culi 20 hours ago

zozbot234 20 hours ago

3836293648 2 hours ago

GPT started off open? They just closed before anyone else even joined the space

visarga 21 hours ago

I think it is in the interest of chip makers to make sure we all get local models

qalmakka 20 hours ago

I think they're in a win-win situation. Big AI companies would love to see local computing die in favour of the cloud because they are well aware the moment an open model that can run on non ludicrous consumer hardware appears, they're screwed. In this situation Nvidia, AMD and the like would be the only ones profiting from it - even though I'm not convinced they'd prefer going back to fighting for B2C while B2B Is so much simpler for them

zozbot234 20 hours ago

BobbyJo 20 hours ago

ycui1986 12 hours ago

zozbot234 21 hours ago

Definitely. Many big hardware firms are directly supporting HuggingFace for this very reason.

ninjahawk1 21 hours ago

True, chip companies have the opposite mindset, Nvidia is making their own open weights I believe

elorant 20 hours ago

This is obviously a strategic move at a national level. Keep publishing competing free models to erode the moat western companies could have with their proprietary models. As long as the narrative serves China there will be no turn to proprietary models.

Barrin92 10 hours ago

>This is obviously a strategic move at a national level.

no it isn't. That's the kind of thing people say who've never worked in the Chinese software ecosystem. It's how the Chinese internet has worked for 20+ years. The Chinese market is so large and competition is so rabid that every company basically throws as much free stuff at consumers as they can to gain users. Entrepreneurs don't think about "grand strategic moves at the national level" while they flip through their copies of the Art of War and Confucius lol

elorant 3 hours ago

stingraycharles 10 hours ago

That has been a viable commercial strategy for most modern, funded businesses. Capture market share at a loss, then once name is established turn on the profit.

try-working 13 hours ago

Exactly. Open source is a commercial strategy for Chinese labs. They have no other effective way of marketing their models and inference services: https://try.works/writing-1#why-chinese-ai-labs-went-open-an...

baq 21 hours ago

Always has been, it’s literally saas; the slight difference is that the lowest tier subscriptions at the frontier labs are basically free trials nowadays, too

Zavora 20 hours ago

Its the new freeware model!

CamperBob2 20 hours ago

I'm a little more optimistic than that. I suspect that the open-weight models we already have are going to be enough to support incremental development of new ones, using reasonably-accessible levels of compute.

The idea that every new foundation model needs to be pretrained from scratch, using warehouses of GPUs to crunch the same 50 terabytes of data from the same original dumps of Common Crawl and various Russian pirate sites, is hard to justify on an intuitive basis. I think the hard work has already been done. We just don't know how to leverage it properly yet.

thesz 20 hours ago

Change layer size and you have to retrain. Change number of layers and you have to retrain. Change tokenization and you have to retrain.

altruios 19 hours ago

CamperBob2 19 hours ago

dTal 19 hours ago

pduggishetti 20 hours ago

I do not think it's common crawl anymore, its common crawl++ using paid human experts to generate and verify new content, weather its code or research.

I believe US is building this off the cost difference from other countries using companies like scale, outlier etc, while china has the internal population to do this

testbjjl 21 hours ago

Any reason for them to do this other than altruism? I don’t think this can be regulated.

Rohansi 20 hours ago

Bake ads into them.

WarmWash 20 hours ago

The Chinese state wants the world using their models.

People think that Chinese AI labs are just super cool bros that love sharing for free.

The don't understand it's just a state sponsored venture meant to further entrench China in global supply and logistics. China's VCs are Chinese banks and a sprinkle of "private" money. Private in quotes because technically it still belongs to the state anyway.

China doesn't have companies and government like the US. It just has government, and a thin veil of "company" that readily fool westerners.

subw00f 20 hours ago

As opposed to the US, which just has companies and a thin veil of “government”.

culi 20 hours ago

zozbot234 20 hours ago

I'm not sure how local AI models are meant to "entrench China in global supply and logistics". The two areas have nothing to do with one another. You can easily run a Chinese open model on all-American hardware.

WarmWash 20 hours ago

devilsdata 14 hours ago

I'm Aussie. Please explain to me; why should I care whether Chinese SOEs or the US tech companies are winning? Neither have my best interests at heart.

jillesvangurp 20 hours ago

Like with nuclear technology, it's not healthy for only one country to dominate AI. The cat is already out of the bag and many countries now have the ability to train and run models. Silicon Valley has bootstrapped this space. But it should be noted that they are using AI talent from all over the world and it was sort of inevitable that this technology would get around. Lots of Chinese, Indian, Russian, and Europeans are involved.

As for what comes next, it's probably going to be a bit of a race for who can do the most useful and valuable things the cheapest. If OpenAI and Anthropic don't make it, the technology will survive them. If they do, they'll be competing on quality and cost.

As for state sponsorship, a lot of things are state sponsored. Including in the US. Silicon Valley has a rich history that is rooted in massive government funding programs. There's a great documentary out there the secret history of Silicon Valley on this. Not to mention all the "cheap" gas that is currently powering data centers of course comes on the back of a long history of public funding being channeled into the oil and gas industry.

WarmWash 20 hours ago

OtomotO 20 hours ago

So an OPEN model that I can run on my own fucking hardware will entrench China in global supply and logistics how?

Contrary: How will the closed, proprietary models from Anthropic, "Open"AI and Co. lead us all to freedom? Freedom of what exactly? Freedom of my money?

At some point this "anti-communism" bullshit propaganda has to stop. And that moment was decades ago!

Zetaphor 20 hours ago

grttsww 20 hours ago

So what?

I still prefer that over US total dominance.

Let them fight it out.

joquarky 19 hours ago

spwa4 20 hours ago

darkwater 20 hours ago

Well, isn't this what the US and really any other power in the world has always done, since forever?

ai_fry_ur_brain 19 hours ago

Why is it sad? These things are useles all around, along with the people who overuse them.

It would be a great day for humanity if people would stopping glazing text autocomplete as revolutionary.

seanw265 17 hours ago

Kimi K2.6 also released today. I think it's fair to compare the two models.

Qwen appears to be much more expensive:

- Qwen: $1.3 in / $7.8 out

- Kimi: $0.95 in / $4 out

The announcement posts only share two overlapping benchmark results. Qwen appears to score slightly lower on SWE-Bench Pro and Terminal-Bench 2.0.

Qwen:

- Teminal-Bench 2.0: 65.4

- SWE-Bench Pro: 57.3

Kimi:

- Terminal-Bench 2.0: 66.8

- SWE-Bench Pro: 58.6

Different models have different strong suits, and benchmarks don't cover everything. But from a numbers perspective, Kimi looks much more appealing.

archon810 11 hours ago

I wonder if this means a better Cursor Composer model update is coming, since it builds on top of Kimi K2.

mchusma 15 hours ago

i think as the pricing has gone up on the Chinese models it has made them less appealing, and with the introduction of Gemma-4 not many are at the pareto frontier (also in my experience, not just the stats): https://arena.ai/leaderboard/text/overall?viewBy=plot

0xbadcafebee 21 hours ago

Everybody's out here chasing SOTA, meanwhile I'm getting all my coding done with MiniMax M2.5 in multiple parallel sessions for $10/month and never running into limits.

Aurornis 21 hours ago

For serious work, the difference between spending $10/month and $100/month is not even worth considering for most professional developers. There are exceptions like students and people in very low income countries, but I’m always confused by developers with in careers where six figure salaries are normal who are going cheap on tools.

I find even the SOTA models to be far away from trustworthy for anything beyond throwaway tasks. Supervising a less-than-SOTA model to save $10 to $100 per month is not attractive to me in the least.

I have been experimenting with self hosted models for smaller throwaway tasks a lot. It’s fun, but I’m not going to waste my time with it for the real work.

zozbot234 21 hours ago

You need to supervise the model anyway, because you want that code to be long-term maintainable and defect free, and AI is nowhere near strong enough to guarantee that anytime soon. Using the latest Opus for literally everything is just a huge waste of effort.

senordevnyc 18 hours ago

dandaka 21 hours ago

0xbadcafebee 13 hours ago

You don't magically get better results by spending 10x more on a model. If your prompt is crap and harness is crap, you get crap results, regardless of model. And if you run into limits, you aren't working at all.

Buying the most expensive circular saw doesn't get you the best woodworking, but it is the most expensive woodworking.

itake 11 hours ago

slopinthebag 16 hours ago

$100 / month will get you rate limited to much to rely on with the Claude plans. People still report getting rate limited on the $200 / plan.

Also not everyone wants to use Claude Code, so if they're paying API pricing it's more likely thousands of dollars a month. If you can get the same results by spending a fraction of that, why wouldn't you?

chillfox 2 hours ago

esperent 10 hours ago

gck1 14 hours ago

AnonymousPlanet 20 hours ago

For actually serious work, it's a stark difference if your proprietary and security relevant code is sent abroad to a foreign, possibly future hostile country, or is sent to some data center around the corner. It doesn't even need to be defence related.

flatline 19 hours ago

chatmasta 19 hours ago

Who are you paying $10/month? OpenRouter?

0xbadcafebee 13 hours ago

OpenCode Go, BlackBox, Chutes. https://codeberg.org/mutablecc/calculate-ai-cost/src/branch/...

chatmasta 12 hours ago

tgrowazay 18 hours ago

https://platform.minimax.io/docs/guides/pricing-token-plan

xutopia 18 hours ago

How do you use this? Do you use opencode or another frontend?

0xbadcafebee 12 hours ago

yep, OpenCode with a few plugins (context management, memory, a few MCPs)

jjice a day ago

With them comparing to Opus 4.5, I find it hard to take some of these in good faith. Opus 4.7 is new, so I don't expect that, but Opus 4.6 has been out for quite some time.

SwellJoe 20 hours ago

The thing is, Opus 4.5 is where the model reached Good Enough, at least for a wide variety of problems I use LLMs for. Before that, I almost never thought it was a more productive use of my time to use AI for development tasks, because it would always hallucinate something that would waste a bunch of my time. It just wasn't a good trade.

But, if for some reason everything stopped at Opus 4.5 level and we never got a better model (and 4.6/4.7 are better, if only marginally so and mostly expanding the kind of work it can do rather than making it better at making web apps), we could still do a lot of real work real fast with Opus 4.5, and software development would never go back to everyone handwriting most of the code.

A model as good as Opus 4.5 (or slightly better according to the mostly easily gamed benchmarks) at a 10th the price is probably a worthwhile proposition for a lot of people. $100 a month, or more, to get Opus 4.7 is well worth it for a western developer...the time the lower-end models waste is far more expensive than the cost of using the most expensive models. For the foreseeable future, I'll keep paying a premium for the models that waste less of my time and produce better results with less prodding.

But, also, it's wild how fast things move. Open models you can run on relatively modest hardware are competitive with frontier models of two years ago. I mean, you can run Qwen 3.6 MoE 35B A3B or the larger Gemma 4 models on normal hardware, like a beefy Macbook or a Strix Halo or any recentish 24GB/32GB GPU...not much more expensive than the average developer laptop of pre-AI times. And, it can write code. It can write decent prose (Qwen is maybe better at code, Gemma definitely has better prose), they can use tools, they have a big enough context window for real work. They aren't as good as Opus 4.5, yet.

Anyway, I use several models at this point, for security and code reviews, even if Claude Code with Opus is still obviously the best option for most software development tasks. I'll give Qwen a try, too. I like their small models, which punch well above their weight, I'll probably like the big one, too.

Someone1234 a day ago

If money is no object, then nothing else is worth considering if it isn't Codex 5.4/Opus 4.7/SOTA. But for many to most people, value Vs. relative quality are huge levers.

Even many people on a Claude subscription aren't choosing or able to choose Opus 4.7 because of those cost/usage pressures. Often using Sonnet or an older opus, because of the value Vs. quality curve.

dd8601fn 21 hours ago

Also us weirdos with local model uses. But your point stands.

seplite 21 hours ago

CamperBob2 21 hours ago

Cost may or may not be a factor in my choice of model, but knowing the capabilities and knowing they will remain consistent, reliable, and available over time is always a dominant consideration. Lately, Anthropic in particular has not been great at that.

jpfromlondon 20 hours ago

anecdotally the quality of output isn't significantly different, the speed seems to be what you're really paying for, and since the alternative is free I'll stick to local.

paprikanotfound 19 hours ago

elAhmo 20 hours ago

Codex 5.4 is not out?

wahnfrieden a day ago

Codex subscription is very generous at pro tiers

oidar 21 hours ago

Opus 4.6 performance has been so wildly inconsistent over the past couple of months, why waste the tokens?

vidarh 21 hours ago

When Sonnet 4.6 was released, I switchmed my default from Opus to Sonnet because it was about en par with Opus 4.5. While 4.6 and 4.7 are "better", the leap is too small for most tasks for me to need it, and so reducing cost is now a valid reason to stay at that level.

If even cheaper models start reaching that level (GLM 5.1 is also close enough that I'm using it at lot), that's a big deal, and a totally valid reason to compare against Opus 4.5

jasonjmcghee 20 hours ago

Wow I couldn't disagree more.

For me, Opus 4.5 and 4.6 feel so different compared to sonnet.

Maybe I'm lazy or something but sonnet is much worse in my experience at inferring intent correctly if I've left any ambiguity.

That effect is super compounding.

hirako2000 a day ago

You compare with what's most comparable.

In any case a benchmark provided by the provider is always biased, they will pick the frameworks where their model fares well. Omit the others.

Independent benchmarks are the go to.

culi 20 hours ago

Opus 4.6 was released in February. It can take quite some time to run all these benchmarks properly

alex_young 21 hours ago

Quite some time is a little over 2 months. I understand this is actually true right now, but it’s still a bit hard to accept.

cute_boi 20 hours ago

Comparing it with Opus 4.6 is difficult, since Anthropic may ban accounts and accuse users of state-sponsored hacking.

bluegatty 21 hours ago

I think its only been like 10 weeks. I meant that's forever in AI time, but not a long time in normie people time.

jdw64 19 hours ago

https://www.alibabacloud.com/help/en/model-studio/context-ca... I’ve also been testing models like Opus, Codex, and Qwen, and Qwen is strong in many coding tasks. However, my main concern is how it behaves in long-running sessions.

While Qwen advertises large context windows, in practice the effectiveness of long-context usage seems to depend heavily on its context caching behavior. According to the official documentation, Qwen provides both implicit and explicit context caching, but these come with constraints such as short TTL (around a few minutes), prefix-based matching, and minimum token thresholds.

Because of these constraints, especially in workflows like coding agents where context grows over time, cache reuse may not scale as effectively as expected. As a result, even though the per-token price looks low, the effective cost in long sessions can feel higher due to reduced cache hit rates and repeated computation.

That said, in certain areas such as security-related tasks, I’ve personally had cases where Qwen performed better than Opus.

In my personal experience, Qwen tends to perform much better than Opus on shorter units like individual methods or functions. However, when looking at the overall coding experience, I found it works better as a function-level generator rather than as an autonomous, end-to-end coding assistant like Claude.

ezekiel68 17 hours ago

TBF, it's certainly best practice, advised by the model providers themselves, to cut sessions short and start new ones.

Anthropic's "Best Practices" doc[0] for Claude Code states, "A clean session with a better prompt almost always outperforms a long session with accumulated corrections."

[0] https://code.claude.com/docs/en/best-practices

hedora 16 hours ago

Unless stuff changed since I last checked, context caching just reduces cost / latency. It does not change what tokens are emitted.

fr3on 16 hours ago

The irony of this announcement is in the name: Max-Preview is proprietary, cloud-only. The Qwen models that actually matter — the ones running on real hardware people own — are the open weights series. I run the 32B and 72B variants locally on dual A4000s. The gap between those and the hosted Max is real, but it's shrinking with every release. The interesting question isn't how Max compares to Opus. It's how long until the open-weight tier makes the cloud tier irrelevant for most workloads.

greyskull 12 hours ago

I've been using Claude Code regularly at work for several months, and I successfully used it for a small personal project (a website) not long ago. Last weekend, I explored self-hosting for the first time.

Does anyone have a similar experience of having thoroughly used CC/Codex/whatever and also have an analogous self-hosted setup that they're somewhat happy with? I'm struggling a bit.

I have 32GB of DDR5 (seems inadequate nowadays), an AMD 7800X3D, and an RTX 4090. I'm using Windows but I have WSL enabled.

I tried a few combinations of ollama, docker desktop model runner, pi-coding-agent and opencode; and for models, I think I tried a few variants each of Gemma 4, Qwen, GLM-5.1. My "baseline" RAM usage was so high from the handful of regular applications that IIRC it wasn't enough to use the best models; e.g., I couldn't run Gemma4-31B.

Things work okay in a Windows-only setup, though the agent struggled to get file paths correct. I did have some success running pi/opencode in WSL and running ollama and the model via docker desktop.

In terms of actual performance, it was painfully slow compared to the throughput I'm used to from CC, and the tooling didn't feel as good as the CC harness. Admittedly I didn't spend long enough actually using it after fiddling with setup for so long, it was at least a fun experiment.

ihowlatthemoon 8 hours ago

I run a setup similar to yours and I've had the best results with Qwen3.5 27B. Specifically the Q4_K_M variant. https://unsloth.ai/docs/models/qwen3.5

I use llama-server that comes with llama.cpp instead of using ollama. Here are the exact settings I use.

llama-server -ngl 99 -c 192072 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --host 0.0.0.0 --sleep-idle-seconds 300 -m Qwen3.5-27B-Q4_K_M.gguf

greyskull 7 hours ago

Thanks, I'll have to continue experimenting. I just ran this model Qwen3.6-35B-A3B-GGUF:UD-Q4_K_XL and it works, but if gemini is to be believed this is saturating too much VRAM to use for chat context.

How did you land on that model? Hard to tell if I should be a) going to 3.5, b) going to fewer parameters, c) going to a different quantization/variant.

I didn't consider those other flags either, cool.

Are you having good luck with any particular harnesses or other tooling?

ihowlatthemoon 2 hours ago

martinald 12 hours ago

Try using a MoE model (like Gemma 4 26b-a4b or qwen3.6 35b-a3b) and offload the inference to CPU. If you have enough system RAM (32GB is a bit tight tbh depending on other apps) then this works really well. You may be able to offload some layers to GPU as well though I've had issues with this in MoE models and llama.cpp.

You can keep the KV cache on GPU which means it's pretty damn fast and you should be able to hold a reasonable context window size (on your GPU).

I've had really impressive results locally with this.

I'd strongly recommend cloning llama.cpp locally btw (in wsl2) and asking a frontier model in eg Claude code to set it up for you and tweak it. In my experience the apps that sit on top of llama.cpp don't expose all the options and flags and one wrong flag can mean terrible performance (eg context windows not being cached). If you compile it from source with a coding agent it can look up the actual code when things go wrong.

You should be able to get at least 20-40tok/s on that machine on Gemma 4 which is very usable, probabaly faster on qwen3.6 since it's only 3b active params.

greyskull 11 hours ago

Thanks! These things you're mentioning like "You may be able to offload some layers to GPU...", "You can keep the KV cache on GPU..." configured as part of the llama.cpp? I wouldn't know what to prompt with or how to evaluate "correctness" (outside of literally feeding your comment into claude and seeing what happens).

Aside: what is your tooling setup? Which harness you're using (if any), what's running the inference and where, what runs in WSL vs Windows, etc.

I struggle to even ask the right questions about the workflow and environment.

Ey7NFZ3P0nzAe 6 hours ago

In my case, I was also running an ASR model and a TTS model so it was a bit much for my RTX 3090. I opted to offset like 5 layers to the cpu while adding a GPU-only speculative decoding with their 0.8B model.

Working well so far.

madtowneast 12 hours ago

You are experiencing the fact that you might not have enough VRAM to load the entire model at a time. You might want to try https://github.com/AlexsJones/llmfit

greyskull 11 hours ago

It's certainly part of the problem. Thanks, I'll give that a shot.

daemonologist 11 hours ago

First of all nothing you can run locally, on that machine anyways, is going to compare with Opus. (Or even recent Sonnet tbh - some small models benchmark better but fall off a bit in the real world.) This will get you close to like ~Sonnet 4 though:

Grab a recent win-vulkan-x64 build of llama.cpp here: https://github.com/ggml-org/llama.cpp/releases - llama.cpp is the engine used by Ollama and common wisdom is to just use it directly. You can try CUDA as well for a speedup but in my experience Vulkan is most likely to "just work" and is not too far behind in speed.

For best quality, download the biggest version of Qwen 3.5 27B you can fit on your 4090 while still leaving room for context and overhead: https://huggingface.co/unsloth/Qwen3.5-27B-GGUF - I would try the UD-Q5_K_XL but you might have to drop down to Q5_K_S. For best speed, you could use Qwen 3.6 35B-A3B (bigger model but fewer parameters are active per token): https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF - probably the UD-Q4_K_S for this one.

Now you need to make sure the whole model is fitting in VRAM on the 4090 - if anything gets offloaded to system memory it's going to slow way down. You'll want to read the docs here: https://github.com/ggml-org/llama.cpp/tree/master/tools/serv... (and probably random github issues and posts on r/localllama as well), but to get started:

  llama-server -m /path/to/above/model/here.gguf --no-mmap --fit on --fit-ctx 20000 --parallel 1

This will spit out a whole bunch of info; for now we want to look just above the dotted line for "load_tensors: offloading n/n layers to GPU" - if fewer than 100% of the layers are on GPU, inference is going to be slower and you probably want to drop down to a smaller version of the model. The "dense" 27B will be slowed more by this than the "mixture-of-experts" 35B-A3B, which has to move fewer weights per token from memory to the GPU.

Go to the printed link (localhost:8080 by default) and check that the model seems to be working normally in the default chat interface. Then, you're going to want more context space than 20k tokens, so look at your available VRAM (I think the regular Windows task manager resource monitor will show this) and incrementally increase the fit-ctx target until it's almost full. 100k context is enough for basic coding, but more like 200k would be better. Qwen's max native context length is 262,144. If you want to push this to the limit you can use `--fit-target <amount of memory in MB>` to reduce the free VRAM target to less than the default 1024 - this may slow down the rest of your system though.

Finally, start hooking up coding harnesses (llama-server is providing an OpenAI-compatible API at localhost:8080/v1/ with no password/token). Opencode seems to work pretty reliably, although there's been some controversy about telemetry and such. Zed has a nice GUI but Qwen sometimes has trouble with its tools. Frankly I haven't found an open harness I'm really happy with.

greyskull 9 hours ago

Thank you for all this, I'll give it a shot. Out of curiosity, are there any resources that sort of spell this out already? i.e., not requiring a comment like this to navigate.

> nothing you can run locally, on that machine anyways, is going to compare with Opus

Definitely not expecting that. Just wanted to find a setup that individuals were content with using a coding harness and a model that is usable locally.

What does your setup look like? Model, harness, etc.

unethical_ban 7 hours ago

This is exactly what I have been looking for: Something straight to the point. Thanks a lot!

trvz a day ago

The fun thing is, you can be aware of the entire range of Qwen models that are available for local running, but not at all about their cloud models.

I knew of all the 3.5’s and the one 3.6, but only now heard about the Plus.

Alifatisk 21 hours ago

Their Plus series have existed since Qwen chat was available , as far as I remember. I can at least remember trying out their Plus model early last year.

wg0 20 hours ago

Notice the pattern that Chinese providers are now:

1. Keeping models closed source.

2. Jacking up pricing. A lot. Sometimes up to 100% increase.

embedding-shape 20 hours ago

Huh yeah, that's truly a unique trait these Chinese companies don't share with companies in other countries.

aerhardt 18 hours ago

No it is not, but they had a unique positioning around open-source and the parent commenter means that they are losing it.

esperent 10 hours ago

halJordan 13 hours ago

Qwen max has always been cloud only. And its a 1T+ model so it would be expensive

nicce 18 hours ago

> Jacking up pricing. A lot. Sometimes up to 100% increase.

How is that different from American?

Tepix 19 hours ago

Are you talking about GLM 5.1, DeepSeek V3.2 or Kimi K2.6 (released one hour ago!)?

Oh wait, it doesn't apply to those…

Kerrick 18 hours ago

Z.ai's Coding Plan with GLM 5.1 (Max) did more than double in price. It was $80 two weeks ago, and now it's $160.

slopinthebag 17 hours ago

dingocat 18 hours ago

Yet.

OtomotO 20 hours ago

US companies hate that trick?!

rc_kas 19 hours ago

you mean: invented

sunaookami 17 hours ago

cnlwsu 19 hours ago

what only Oracle can do it?

cute_boi 20 hours ago

Well, they can't subsidize forever. And, it is kinda expected?

gpm 19 hours ago

Considering the propaganda value in controlling the inputs to the machine that answers peoples questions, I rather expect them to be subsidized forever.

bigyabai 18 hours ago

ai_fry_ur_brain 19 hours ago

Yeah, its almost like the casinos started rigging the game after they got all the addicts hooked. Who saw that coming???

If you overuse LLMs or get excited about them at all, you're ngmi and a complete idiot.

atilimcetin 21 hours ago

Nowadays, I'm working on a realtime path tracer where you need proper understanding of microfacet reflection models, PDFs, (multiple) importance sampling, ReSTIR, etc.. Saying that mine is a somewhat specific use case.

And I use Claude, Gemini, GLM, Qwen to double check my math, my code and to get practical information to make my path tracer more efficient. Claude and Gemini failed me more than a couple of times with wrong, misleading and unnecessary information but on the other hand Qwen always gave me proper, practical and correct information. I’ve almost stopped using Claude and Gemini to not to waste my time anymore.

Claude code may shine developing web applications, backends and simple games but it's definitely not for me. And this is the story of my specific use case.

wg0 20 hours ago

I have said similar things about someone experiencing similar things while writing some OpenGL code (some raytracing etc) that these models have very little understanding and aren't good at anything beyond basic CRUD web apps.

In my own experience, even with web app of medium scale (think Odoo kind of ERP), they are next to useless in understanding and modling domain correctly with very detailed written specs fed in (whole directory with index.md and sub sections and more detailed sections/chapters in separate markdown files with pointers in index.md) and I am not talking open weight models here - I am talking SOTA Claude Opus 4.6 and Gemini 3.1 Pro etc.

But that narrative isn't popular. I see the parallels here with the Crypto and NFT era. That was surely the future and at least my firm pays me in cypto whereas NFTs are used for rewarding bonusess.

wg0 20 hours ago

Someone exactly said it better here[0] already.

[0]. https://news.ycombinator.com/item?id=47817982

esperent 10 hours ago

To be fair, I've had the extreme misfortune of working on Odoo code and I can understand why an LLM would struggle.

Yearly breaking changes but impossible to know what version any example code you find is related to (except that if you're on the latest version, it's definitely not for your version), closed and locked down forum (after several months of being a paying customer, I couldn't even post a reply, let alone ask a question), weird split between open and closed, weird OWL frontend framework that seems to be a bad clone of an old React version, etc. etc. Painful all around. I would call this kind of codebase pre-LLM slop, accreted over many years of bad engineering decisions.

amarcheschi 20 hours ago

a semester ago i was taking a machine learning exam in uni and the exam tasked us with creating a neural network using only numerical libraries (no pytorch ecc). I'm sure that there are a huge lot of examples looking all the same, but given that we were just students without a lot of prior experience we probably deviated from what it had in its training data, with more naive or weird solutions. Asking gemini 3 to refactor things or in very narrow things to help was ok, but it was quite bad at getting the general context, and spotting bugs, so much that a few times it was easier to grab the book and get the original formula right

otoh, we spotted a wrong formula regarding learning rate on wikipedia and it is now correct :) without gemini and just our intuition of "mhh this formula doesn't seem right", that definitely inflated our ego

muyuu 19 hours ago

for Anthropic and OpenAI there is a very real danger that people invest serious time finding the strengths of alternative models, esp Chinese/open models that can to some degree be run locally as well

it puts a massive backstop at the margins they can possibly extract from users

zozbot234 21 hours ago

What size of Qwen is that, though? The largest sizes are admittedly difficult to run locally (though this is an issue of current capability wrt. inference engines, not just raw hardware).

atilimcetin 21 hours ago

I'm directly using https://chat.qwen.ai (Qwen3.6-Plus) and planning to switch to Qwen Code with subscription.

jasonjmcghee 20 hours ago

You may be interested in "radiance cascades"

hedora 16 hours ago

What do you use instead of the Claude code client app?

jansan 21 hours ago

How "social" does Quen feel? The way I am using LLMs for coding makes this actually the most important aspect by now. Claude 4.6 felt like a nice knowledgeable coworker who shared his thinking while solving problems. Claude 4.7 is the difficult anti-social guy who jumps ahead instead of actually answering your questions and does not like to talk to people in general. How are Qwen's social skills?

zozbot234 20 hours ago

Qwen feels like wise Chinese philosopher. Talks in very short elegant sentences, but does very solid work.

Alifatisk 20 hours ago

johnnyApplePRNG 6 hours ago

Nowhere near the power of ChatGPT 5.4 Pro imho... thought for maybe 15 seconds on a problem that pro would have spend 15 minutes on... and the results really show :/

djyde 15 hours ago

I've been using glm5.1 for pretty much all my coding work, but Claude is too expensive for me. Haven't tried qwen yet though. China's coding models are now very cost-effective.

djyde 14 hours ago

But I've recently found that Cursor's composer2 is also really good to use.

freely0085 12 hours ago

Composer 2 is just Kimi 2.5, it's not their own model.

piotraleksander 5 hours ago

Oras 21 hours ago

I find it odd that none of OpenAI models was used in comparison, but used Z GLM 5.1. Is Z (GLM 5.1) really that good? It is crushing Opus 4.5 in these benchmarks, if that is true, I would have expected to read many articles on HN on how people flocked CC and Codex to use it.

ac29 21 hours ago

GLM 5.1 is pretty good, probably the best non-US agentic coding model currently available. But both GLM 5.0 and 5.1 have had issues with availability and performance that makes them frustrating to use. Recently GLM 5.1 was also outputting garbage thinking traces for me, but that appears to be fixed now.

cmrdporcupine 21 hours ago

Use them via DeepInfra instead of z.ai. No reliability issues.

https://deepinfra.com/zai-org/GLM-5.1

Looks like fp4 quantization now though? Last week was showing fp8. Hm..

wolttam 21 hours ago

coder68 21 hours ago

In fact it is appreciated that Qwen is comparing to a peer. I myself and several eng I know are trying GLM. It's legit. Definitely not the same as Codex or Opus, but cheaper and "good enough". I basically ask GLM to solve a program, walk away 10-15 minutes, and the problem is solved.

Oras 21 hours ago

cheaper is quite subjective, I just went to their pricing page [0] and cost saving compared to performance does not sell it well (again, personal opinion).

CC has a limited capacity for Opus, but fairly good for Sonnet. For Codex, never had issues about hitting my limits and I'm only a pro user.

https://z.ai/subscribe

kardianos 21 hours ago

Yes. GLM 5.1 is that good. I don't think it is as good as Claude was in January or February of this year, but it is similar to how Claude runs now, perhaps better because I feel like it's performance is more consistent.

vidarh 21 hours ago

GLM 5.1 is the first model I've found good enough to spring for a subscription for other than Claude and Codex.

It's not crushing Opus 4.5 in real-life use for me, but it's close enough to be near interchangeable with Sonnet for me for a lot of tasks, though some of the "savings" are eaten up by seemingly using more tokens for similar complexity tasks (I don't have enough data yet, but I've pushed ~500m tokens through it so far.

pros 21 hours ago

I'm using GLM 5.1 for the last two weeks as a cheaper alternative to Sonnet, and it's great - probably somewhere between Sonnet and Opus. It's pretty slow though.

bensyverson 19 hours ago

This is what kills it for me… The long thinking blocks can make a simple task take 30 minutes.

Alifatisk 21 hours ago

GLM-5 is good, like really good. Especially if you take pricing into consideration. I paid 7$ for 3 months. And I get more usage than CC.

They have difficulty supplying their users with capacity, but in an email they pointed out that they are aware of it. During peak hours, I experience degraded performance. But I am on their lowest tier subscription, so I understand if my demand is not prioritized during those hours.

ekuck 20 hours ago

Where are you getting 3 months for $7?

Alifatisk 19 hours ago

culi 20 hours ago

If you only look at open models, GLM 5.1 is the best performance you can get on on the Pareto distribution

https://arena.ai/leaderboard/text?viewBy=plot&license=open-s...

c0n5pir4cy 21 hours ago

I've been using it through OpenCode Go and it does seem decent in my limited experience. I haven't done anything which I could directly compare to Opus yet though.

I did give it one task which was more complex and I was quite impressed by. I had a local setup with Tiltdev, K3S and a pnpm monorepo which was failing to run the web application dev server; GLM correctly figured out that it was a container image build cache issue after inspecting the containers etc and corrected the Tiltfile and build setup.

cleaning 21 hours ago

Most HN commenters seem to be a step behind the latest developments, and sometimes miss them entirely (Kimi K2.5 is one example). Not surprising as most people don't want to put in the effort to sift through the bullshit on Twitter to figure out the latest opinions. Many people here will still prefer the output of Opus 4.5/4.6/4.7, nowadays this mostly comes down to the aesthetic choices Anthropic has made.

Oras 21 hours ago

Not just aesthetics though, from time to time I implement the same feature with CC and Codex just to compare results, and I yet to find Codex making better decisions or even the completeness of the feature.

For more complicated stuff, like queries or data comparison, Codex seems always behind for me.

throwaw12 21 hours ago

maybe they decided OpenAI has different market, hence comparing only with companies who are focusing in dev tooling: Claude, GLM

edwinjm 21 hours ago

Haven’t you heard about Codex?

throwaw12 21 hours ago

blockcipher 21 hours ago

Yeah GLM’s great for coding, code review, and tool use. Not amazing at other domains.

esafak 21 hours ago

I use it and think its intelligence compares favorably with OpenAI and Anthropic workhorses. Its biggest weakness is its speed.

XCSme 19 hours ago

A bit weird to be comparing it to Opus-4.5 when 4.7 was released...

chatmasta 19 hours ago

Is this going to be an open weights model or not? The post doesn’t make it clear. It seems the weights are not available today, but maybe that’s because it’s in preview?

zozbot234 19 hours ago

The Max series has never been open.

marsulta 20 hours ago

I think the benchmarks and numbers need to be easier to read. Those benchmarks are useless to the regular consumer.

digimantis 11 hours ago

i dont get why people defend $200/month models against open source model that cost 1/10 of the price, like literally

o10449366 18 hours ago

I have the M3 Max MBP with 128 GB of memory and the 40 core GPU. What's the best local model I can run today for coding?

alx-ppv 17 hours ago

You can try https://github.com/AlexsJones/llmfit

fragmede 7 hours ago

This thing on Celebras is going to be ridiculous.

Aeroi 16 hours ago

why do people continue to benchmark their sota models against older models.

xmly 18 hours ago

Very impressive!

DeathArrow 21 hours ago

I am trying since one week to subscribe Alibaba Coding Plan (to use Qwen 3.6 Plus) but it's always out of stock.

They brag about Qwen but don't let people use it.

dakolli 18 hours ago

ToKeN PrIcEs ArE gOiNg tO PluMmEt, InTelLigEnCe WiLl Be AfForDaBlE FoR EvErYOnE

Hacker News

by Ryan Harman

Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving (qwen.ai)

alex7o 20 hours ago [-]

mikenew 13 hours ago [-]

operatingthetan 13 hours ago [-]

fwipsy 13 hours ago [-]

hamdingers 11 hours ago [-]

ulfw 6 hours ago [-]

vidarh 39 minutes ago [-]

mettamage 4 hours ago [-]

abustamam 12 hours ago [-]

mikenew 6 hours ago [-]

LoganDark 12 hours ago [-]

deaux 10 hours ago [-]

Mashimo 6 hours ago [-]

bink-lynch 12 hours ago [-]

zackify 9 hours ago [-]

jadbox 11 hours ago [-]

bink-lynch 5 hours ago [-]

zackify 9 hours ago [-]

vidarh 40 minutes ago [-]

jxmesth 18 hours ago [-]

embedding-shape 18 hours ago [-]

jxmesth 18 hours ago [-]

chillfox 11 hours ago [-]

ecocentrik 18 hours ago [-]

jxmesth 18 hours ago [-]

sscaryterry 17 hours ago [-]

zrn900 2 hours ago [-]

NobleLie 13 hours ago [-]

jwitthuhn 18 hours ago [-]

ycui1986 12 hours ago [-]

estimator7292 16 hours ago [-]

Moosdijk 18 hours ago [-]

pkulak 18 hours ago [-]

tasuki 17 hours ago [-]

Mashimo 18 hours ago [-]

Mashimo 4 hours ago [-]

gck1 14 hours ago [-]

chillfox 3 hours ago [-]

Akira1364 18 hours ago [-]

spaceman_2020 16 hours ago [-]

slopinthebag 18 hours ago [-]

blurbleblurble 7 hours ago [-]

chillfox 3 hours ago [-]

enraged_camel 3 hours ago [-]

ternaryoperator 19 hours ago [-]

ezekiel68 18 hours ago [-]

gck1 14 hours ago [-]

lambda 11 hours ago [-]

justincormack 17 hours ago [-]

sirnicolaz 17 hours ago [-]

cornedor 19 hours ago [-]

dev_l1x_be 18 hours ago [-]

cornedor 17 hours ago [-]

odie5533 16 hours ago [-]

mkhalil 5 hours ago [-]

FlyingSnake 19 hours ago [-]

bensyverson 19 hours ago [-]

nothinkjustai 19 hours ago [-]

FlyingSnake 19 hours ago [-]

nothinkjustai 19 hours ago [-]

complexworld 7 hours ago [-]

dev_l1x_be 18 hours ago [-]

solomatov 18 hours ago [-]

alex7o 15 hours ago [-]

OtomotO 20 hours ago [-]

balls187 20 hours ago [-]

runarberg 19 hours ago [-]

e12e 19 hours ago [-]

taurath 20 hours ago [-]

balls187 19 hours ago [-]

psychoslave 20 hours ago [-]

Wolfbeta 18 hours ago [-]

ecshafer 20 hours ago [-]

smallmancontrov 18 hours ago [-]

bensyverson 19 hours ago [-]

OtomotO 19 hours ago [-]

taneq 19 hours ago [-]

alex7o 20 hours ago

mikenew 13 hours ago

operatingthetan 13 hours ago

fwipsy 13 hours ago

hamdingers 11 hours ago

ulfw 6 hours ago

vidarh 39 minutes ago

mettamage 4 hours ago

abustamam 12 hours ago

mikenew 6 hours ago

LoganDark 12 hours ago

deaux 10 hours ago

Mashimo 6 hours ago

bink-lynch 12 hours ago

zackify 9 hours ago

jadbox 11 hours ago

bink-lynch 5 hours ago

zackify 9 hours ago

vidarh 40 minutes ago

jxmesth 18 hours ago

embedding-shape 18 hours ago

jxmesth 18 hours ago

chillfox 11 hours ago

ecocentrik 18 hours ago

jxmesth 18 hours ago

sscaryterry 17 hours ago

zrn900 2 hours ago

NobleLie 13 hours ago

jwitthuhn 18 hours ago

ycui1986 12 hours ago

estimator7292 16 hours ago

Moosdijk 18 hours ago

pkulak 18 hours ago

tasuki 17 hours ago

Mashimo 18 hours ago

Mashimo 4 hours ago

gck1 14 hours ago

chillfox 3 hours ago

Akira1364 18 hours ago

spaceman_2020 16 hours ago

slopinthebag 18 hours ago

blurbleblurble 7 hours ago

chillfox 3 hours ago

enraged_camel 3 hours ago

ternaryoperator 19 hours ago

ezekiel68 18 hours ago

gck1 14 hours ago

lambda 11 hours ago

justincormack 17 hours ago

sirnicolaz 17 hours ago

cornedor 19 hours ago

dev_l1x_be 18 hours ago

cornedor 17 hours ago

odie5533 16 hours ago

mkhalil 5 hours ago

FlyingSnake 19 hours ago

bensyverson 19 hours ago

nothinkjustai 19 hours ago

FlyingSnake 19 hours ago

nothinkjustai 19 hours ago

complexworld 7 hours ago

dev_l1x_be 18 hours ago

solomatov 18 hours ago

alex7o 15 hours ago

OtomotO 20 hours ago

balls187 20 hours ago

runarberg 19 hours ago

e12e 19 hours ago

taurath 20 hours ago

balls187 19 hours ago

psychoslave 20 hours ago

Wolfbeta 18 hours ago

ecshafer 20 hours ago

smallmancontrov 18 hours ago

bensyverson 19 hours ago

OtomotO 19 hours ago

taneq 19 hours ago

slopinthebag 11 hours ago

ninjahawk1 21 hours ago

culi 20 hours ago