Hacker News

by Ryan Harman

Claude Fable 5 (anthropic.com)

1395 points by Philpax 5 hours ago

System Card [pdf]: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

simonw 41 minutes ago

I've spent enough time with this now in Claude Code (and Claude.ai and Claude Code for web) to have an opinion on Fable 5: it's a beast. I'm throwing some VERY difficult problems at it - things I've been dragging my heels on for months - and it's crunching through them very happily.

One that I'm willing to share (albeit from just a week ago) - I built a Python library last week that bundles MicroPython compiled to WASM to create a sandboxed code execution library: https://github.com/simonw/micropython-wasm

I just told Claude.ai (not even Claude Code - this was the standard Claude chat interface) running Fable 5:

  Clone simonw/micropython-wasm from GitHub
  and research how this could use a full
  Python as opposed to MicroPython

A few prompts later (and I uploaded the zip files from https://github.com/brettcannon/cpython-wasi-build/releases/t... because Claude chat can't access those files itself) and I have a wheel file that bundles Python itself, compiled to WASM:

  uv run --with https://static.simonwillison.net/static/cors-allow/2026/cpython_wasm-0.1.0-py3-none-any.whl \
    cpython-wasm -c 'print(45 ** 56)'

Here's the transcript: https://claude.ai/share/a73b8b8b-8ebc-4fef-9e5c-7438e5e7ae35

(It's possible Opus or GPT-5.5 could have done this too; I've not tried the exact same sequence. The Fable vibes are good here, though.)

alexchantavy 33 minutes ago

High, extra, or max?

simonw 20 minutes ago

High.

oblio 22 minutes ago

How much does it cost? How much did those tasks you did cost?

simonw 20 minutes ago

So far it's all fitting into my current $100/month Claude Max subscription. I got lucky: I had 80% of my weekly allowance left and it resets tomorrow, so I'm burning tokens to try and use it all up by then.

Update: looks like I've spent $82.92 in Fable 5 API priced tokens so far today (still all included in my subscription.)

zirkonit 17 minutes ago

But, but, how does the pelican look?!

simonw 6 minutes ago

See parallel thread: https://news.ycombinator.com/item?id=48464054

alecco 6 minutes ago

I hate how the Instagram/TikTok/YouTube influencer cancer is getting into AI. With early access and all that.

It made sense for people doing proper and fair AI breakdowns waiting on an embargo, but now it's just slop I don't trust anymore.

simonw 5 minutes ago

I often get early access but didn't for this one, it's quite possible there's an NDA in an email somewhere that I missed and forgot to sign.

dannyw 3 hours ago

Impressions from testing Fable 5 prior to launch:

• The most noticeable immediate jump was in frontend design: much more intentionally crafted and delightful, without feeling 'AI vibe coded', and with better end-user usability too.

• In some internal agentic harnesses, it achieved better results with about half the tokens, making it cost about the same as Opus 4.8! The real price increase is less than 2x, with the biggest differences on harder problems where Opus 4.8 struggles (or needs many turns).

• Part of the token efficiency improvement comes from Fable making more targeted, surgical diffs with fewer unnecessary changes. This is great, because PRs often have fewer changed lines to review. It writes more maintainable code without explicit human steering.

• For general conversation and assistant style use cases, didn’t really notice a difference vs 4.8.

• The 1M context window, without increased pricing for long context, is AWESOME. This is a massive win.

• The classifiers are super aggressive and sensitive, and they do trigger on very benign, non-security coding tasks. Fallbacks to 4.8 worked like a charm, but the filters are definitely super sensitive.

Overall, I would describe this as a step change and worthy of the "Claude 5" model name. It took some time to understand the intelligence ceiling of this model, and even with an extended testing window I'm still discovering new things and am often surprised (in a good way) by the model.
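The "half the tokens, ~same cost" point is just per-token price times tokens used. A minimal sketch, assuming a flat 2x per-token price versus Opus 4.8 (the "double across the board" reading of the pricing page) and a single blended rate for input and output tokens, which real pricing does not use; `effective_cost_ratio` is a hypothetical helper, not anything from Anthropic:

```python
def effective_cost_ratio(price_multiplier: float, token_ratio: float) -> float:
    """Cost of a task on the new model relative to the old one.

    price_multiplier: new per-token price / old per-token price (assumed 2.0 here)
    token_ratio: tokens the new model spends / tokens the old model spends
    Assumes one blended per-token rate, which is a simplification.
    """
    return price_multiplier * token_ratio


# Hypothetical numbers: double the per-token price, various token savings.
print(effective_cost_ratio(2.0, 0.5))   # 1.0 -> roughly the same cost
print(effective_cost_ratio(2.0, 0.75))  # 1.5 -> still under the 2x sticker price
```

Any token ratio below 1.0 pulls the effective increase under the 2x sticker price, which is the shape of the claim above.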

bottlepalm 3 hours ago

I just ran it on a tough reverse engineering problem I'm having that neither Claude Code 4.8 nor ChatGPT Codex 5.5 could figure out. 30 minutes later Fable has it all figured out perfectly.

jp0001 10 minutes ago

I asked it to write security tests for an app and I was downgraded to Opus 4.8. I'm approved for their cyber program!

cedws 2 hours ago

How did it not immediately flag that up? Are you sure it wasn’t being silently routed to Opus?

theragra 31 minutes ago

I want to test how it will handle e-bike software and hardware RE for my bike. Opus was really good for that, but still made some mistakes. With Fable, I hope I will be able to do a total RE of most components, hopefully including motor firmware to some extent.

skerit 2 hours ago

Oh nice, it didn't flag the request? I feared any reverse engineering would become impossible because of the new safeguards.

derangedHorse 2 hours ago

For hard problems you’ll have to use the GPT 5.5 pro model (available via api if you don’t want to spend $100 on the monthly subscription)

port11 2 hours ago

I’ve had it go through a 50-page PDF of dense, inter-connected specs, and it correctly flagged everything that was done, somewhat done, and missing. It went into a lot of detail and explained where the code deviated from the spec.

It felt, at least for me, like an impressive step up. Opus 4.8 was already very thorough, but sadly verbose and ‘loopy’ when you push back on its plans. Fable is what I’d use all day if I could afford it!

duxup 37 minutes ago

I feel like it takes me months to be confident in any of these things.

InsideOutSanta 3 hours ago

After running it for half an hour: it's incredibly good at the visual aspects of UI design.

tsunamifury 2 hours ago

"incredibly" is doing a ton of work here. I do not think it's doing even moderate work on visual design, but it can spew out a lot of UI that looks arranged ... ok.

This is still not in the range of shippable UI for top end companies. Maybe for internal tools and enterprise.

At our company we limit it to prototypes at most, and even find it limited there.

morley 3 hours ago

Can I ask how you gained preview access to Fable 5?

kakugawa 3 hours ago

I didn't see Fable 5 in the `/model` list, until I ran it with: `$ claude --model fable-5`

swyx 3 hours ago

he works on evals at canva

jumploops 2 hours ago

It's interesting that we're seeing these gains when it seems Mythos/Fable is "just" a scaled up version of their existing architecture[0].

When GPT 4.5 launched, the gains compared to the model size didn't seem that great, leading some to believe that the only progress we'd see would come from RL.

This model certainly has quite a "substantial amount of post-training and fine-tuning", but it's also based on a new pretrain[1][3], which, given the cost, indicates that it is in fact quite a bit larger than Opus 4.X.

[0] One of the early testers mentioned: "As far as I can tell from talking to people internally at Anthropic, there's nothing special about architecturally"[2]

[1] Section 1.1 in https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

[2] https://youtu.be/GrdEid8H6H4?t=168

[3] There were rumors going around when Mythos was first announced that it was the first 10T parameter model, but I can't find a verifiable source for that number.

sigmar 4 hours ago

The system card is 319 pages, at what point do we call it a "book" instead of a "card"?

There's a quote from a METR report on page 52:

>We ran [Mythos 5] on 38 of our hardest software tasks, including tasks centered around R&D. [Mythos 5] generally outperformed an early checkpoint of Claude Mythos Preview in these, including by succeeding on some tasks that had not been solved by any public model we have previously evaluated. However, we still observed the model occasionally failing to correctly interpret nuanced instructions in difficult tasks... Based on the available evidence, we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks. We believe that a better, more confident assessment would require more time, evaluations, and information from the model developer.

baq 4 hours ago

> we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks

this is good news, right? right...?

yaodub 4 hours ago

Depends whether "unable to fully automate" means "needs occasional human checkpoints" or "slowly stops caring about your actual goal." Pretty different.

arizen 37 minutes ago

There will probably always be a frontier surface that the frontier model of a given generation can't automate.

GuB-42 43 minutes ago

It is certainly good news for those who are selling all these tokens.

woeirua 4 hours ago

lmao, i love how the goal post is now in the "multiple weeks" timeline

romanovcode 3 hours ago

But did it mention the developer in the park eating the sandwich? That is the most important question!

AquinasCoder 4 hours ago

From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost. On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window. After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

This seems like the pharmaceutical method of get them hooked on the drug with free samples, then once they can't live without it, raise the price. I'm not sure I want to start using Claude Fable on a max plan if it's just going to go away on June 23rd.

But maybe the more charitable reading is that they didn't have to offer this model at all on those plans and they are giving the standard free trial.

PeterStuer 4 hours ago

I'll be amazed if they manage to keep their infra responsive over the next 2 weeks.

kilroy123 3 hours ago

I've been getting a lot of these messages today:

API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited

trollied 3 hours ago

They just leased a massive SpaceX data centre.

linsomniac 13 minutes ago

I was just saying last week: If Opus 4.8 max is as good as we get, and we plateau there, I think I'd be fine with it.

For the stuff I've thrown at it, that configuration has done a really great job, including a 70+ KLOC Go proxy with an extensive test suite, some retro games, and more.

jkelleyrtp 4 hours ago

On the new FrontierCode [1] benchmark (ie graded from an OSS maintainer's perspective of "would I merge this code?")

- Opus 4.7 xhigh: 5.2%

- Opus 4.8 xhigh: 13.4%

- Fable 5 xhigh: 29.3%

Seems like a huge jump.

[1] https://cognition.ai/blog/frontier-code
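To put "huge jump" in numbers, here are the ratios between the quoted scores. A small sketch over the three figures above only; it says nothing about whether the benchmark's methodology holds up, and `relative_jump` is just an illustrative helper:

```python
# FrontierCode xhigh scores (%) as quoted in the comment above.
SCORES = {"Opus 4.7": 5.2, "Opus 4.8": 13.4, "Fable 5": 29.3}


def relative_jump(old: float, new: float) -> float:
    """How many times higher the new score is than the old one."""
    return new / old


print(f"{relative_jump(SCORES['Opus 4.7'], SCORES['Opus 4.8']):.2f}x")  # 2.58x
print(f"{relative_jump(SCORES['Opus 4.8'], SCORES['Fable 5']):.2f}x")   # 2.19x
print(f"{SCORES['Fable 5'] - SCORES['Opus 4.8']:.1f} points")           # 15.9 points
```

So in relative terms the 4.7 to 4.8 step was actually slightly larger; Fable 5's jump looks bigger mainly in absolute points.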

amluto 4 hours ago

That blog post really makes it look like it's graded from an LLM's estimation of an OSS maintainer's review. I see three issues:

1. That estimate could easily be wrong.

2. That estimate is, of course, usable in RL training. This isn't an inherently bad thing, and this is more or less what has improved coding models so much lately. But it does mean that other companies could and surely will do this sort of training, and Anthropic probably did too.

3. OSS maintainers are far from perfect, and there's an unfortunate uncanny valley-like effect in which a coding model can produce code that is just convincing enough to pass review even though it's actually totally wrong. I don't know whether this is a specific issue here.

zzleeper 4 hours ago

How credible is this benchmark? Does it correlate with other people's real-world experience?

bfeynman 4 hours ago

Given it was made by Cognition (the team behind the Devin flop), who now just get to wait until Claude and GPT-5 basically do all of the work for them - not very. When you read about it, the framework is highly subjective, which very quickly becomes a problem because it's based on heuristics that probably change a bunch with a better code model.

CSMastermind 14 minutes ago

DeepSWE is the benchmark you want to actually look out for. Only one that aligns with actual user reported results from trying the models.

Catloafdev 4 hours ago

It's a relatively new benchmark but from what I can tell it has serious cred behind it. I assume it will be picked up as part of the standard suite of CS-related benchmarks soon enough.

vanuatu 4 hours ago

i worked on one of the benchmarks typically found in new model releases

this benchmark looks very good from the methodology. a cog researcher checking the data themselves is very high signal (not scaleable so don't take the benchmark as gospel, but directionally good)

schipperai 3 hours ago

Cognition did well in documenting their approach [1].

TL;DR - they worked with OSS project maintainers to build tasks. They score models based on whether a PR is mergeable. All tasks are graded by a human researcher. SoTA models have hill-climbing to do which raises the bar and inspires confidence. I'd say it's legit.

[1]: https://x.com/cognition/status/2064061031912288715

emp17344 4 hours ago

Seems like it literally popped up yesterday with the express purpose of building hype for this release.

swyx 3 hours ago

jump in chart form https://x.com/swyx/status/2064414823748886591/photo/1

hydra-f 4 hours ago

Yes, and the price reflects that

leecommamichael 4 hours ago

I'm not familiar with model pricing trends, did they clearly state how the new pricing compares? (Note that I'm actually asking a question, and am not arguing)

EDIT: Oh I see, this is the best link for pricing https://platform.claude.com/docs/en/about-claude/pricing

So the price is double across the board...

OtomotO 3 hours ago

Bummer! When can I finally and confidently get slopcode into Zig?

m3kw9 4 hours ago

FrontierCode is likely paid for by anthropic.

lanthissa 4 hours ago

did they not pay them enough to get good ratings on the other 3 models?

what's the logic in claiming it's a borked metric when everything listed is an anthropic model?

reasonableklout 4 hours ago

Huh? It's a benchmark by Cognition which (1) is building their own models and (2) offers all providers and thus has an incentive to avoid hyping up any one too much.

eggbrain 4 hours ago

For those of us on subscription plans:

* From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.

* On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

* After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.

hgoel an hour ago

How much more clearly do they need to explain the resource constraints?

If they didn't announce it, you guys would be complaining about slowed progress.

If they didn't release it, you guys would be complaining about fake promises and marketing.

If they released it without limits, the complaints would be about slow responses and outages.

If they didn't add it to subscription plans, the complaints would be about phasing out subscriptions.

If they added to subscriptions with cost reflecting their resource availability, the complaints would be about how quickly it eats limits.

So they choose the middle ground of providing some initial access and assessing if they can satisfy demand, only to still be ignored and accused of trying to get users hooked?

We've already seen that they don't have enough compute, thus the deals with SpaceX for their GPUs. It's very reasonable that they just don't have the capacity to support the subscription userbase on this model.

dakolli 23 minutes ago

If, If, If, If.

If Anthropic was serving LLMs profitably it wouldn't be a problem. This is a dead business with no path to profitability and they're desperately trying to get people used to usage based billing, because they need it to become the norm to survive.

jrflo 4 hours ago

Still satisfied with my switch to codex/chatgpt. I couldn't imagine switching away from claude code when it first launched, but with the drastically more generous usage on codex for the same subscription tier, I just can't justify it.

goranmoomin 3 hours ago

My experience is that the GPT-family models are very smart and figure out bugs and edge cases a bit better, but they produce code that is much less mergeable: if you review the code, it introduces a lot more useless/inappropriate heavy abstractions and wrapper functions, compared to the Claude-family models, which introduce the right amount of straightforward, human-style code.

I can recognize so much of the GPT/Codex generated code long after it gets merged (not by me).

Additionally, each agent turn on GPT 5.5 takes much longer than on Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there are a lot more nitpicks to pick when actually using GPT 5.5 for software engineering.

Feels like GPT-style models are more geared toward one-shot software vibing (and handling the resulting vibe-coded mixture), compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so much, but I still keep reaching for Claude models a lot more. Frustrating.

sigbottle 3 hours ago

Codex IME is just smarter, I think it shows given both anecdotes but also how OpenAI has always been at the front of programming competitions and math problems.

But Claude models seem to be better at long term problems or more ambiguous problems.

I'm curious what the primary driver is here. Are there secret improvements in training? There hasn't been much change in fundamental model architecture, I don't think. What about harnesses? I wonder what's pushing the AI forward. It seems like harnesses have been the main thing pushing AI ever since CoT.

wsatb 4 hours ago

I guess enjoy it while it lasts? OpenAI won't be able to subsidize that forever either.

cortesoft 2 hours ago

I have been using both codex and Claude in my day to day, trying not to get too attached to one. I want to be able to work with any provider in case one of them does something bad.

rvshchwl 3 hours ago

I've found Codex to be the better subscription for OpenClaw, because the limits are indeed very generous. However, I've found more and more that Claude Routines/Scheduled agents can replace all the tasks I use OpenClaw for, so I've been slowly switching over to Claude Code. Aside from OpenClaw, I don't find a lot of value in Codex as a harness on its own.

knuckleheads 4 hours ago

I feel like Codex made a big push to run everything on your laptop. With Claude, I get 4 CPUs, a fair amount of RAM, and 30GB for every one of my dumb ideas for free in the cloud containers. Codex used to be similar, but last time I tried it just kept pushing me to run it locally on my laptop, which I really did not want to do with 20 requests going at once. That's the main advantage for me at the moment.

efromvt 2 hours ago

I do slightly prefer 5.5 for complex work but Claude quota usage has gotten infinitely better since the dark days a few months back - has gone from being infuriating to something I pretty much don’t have to worry about with it as a daily driver. (In fact, hitting GPT weekly quotas is more annoying now). Understand if people are still scarred by the issues + poor comms around them, though.

supertroop 3 hours ago

Do you use a token service like open router or just subscribe to / unsubscribe from various models sequentially?

dd8601fn 4 hours ago

I have trouble justifying gpt after that gross stuff with the war department.

Though the day is coming when there’s no distinguishing, I’m sure.

ProofHouse 2 hours ago

100% I constantly get errors and timeouts on single responses in Claude, and certainly hit limits all the time. Codex rarely. In fact, I bought a second $200 Codex plan because the quotas seemed fair and I didn't have constant issues. Claude is so great at a lot of things, but unfortunately Anthropic beats you away with a stick every chance they get.

shimman 4 hours ago

I've only ever had the $20/month Claude plan, but last night I took the time to set up opencode + OpenRouter, paying for DeepSeek + GLM. My previous experience was extremely awkward: I'd hit my limit within one or two chat replies, and it'd take me about 4 limit cycles to complete my task. Now I'm able to complete an equivalent task for less than $2 in two cycles (ask -> revise).

I'm doing basic web development here utilizing animejs. Nothing too complicated (mostly saving time doing the scaffolding, still write the bulk of animations manually).

Truly believe that American companies are going to get completely curb stomped by China due to greed, ineptitude, and violating the social contract.

rekttrader 2 hours ago

Wait till you kick the tires of Qwen Coder.

joshstrange 3 hours ago

I would not use this if you are on a subscription. In <8 min it burned my entire 5-hour window (which, it appears, had just reset: I have over 4 hours until it resets, and I hadn't used CC at all today aside from this), and then it used up ~$15 more in usage before I could stop it.

I am on the $100 Max plan.

GoToRO 2 hours ago

they have a graph with cost comparison between the models. This model is just a little over the other models as cost. The graph is logarithmic :)

d4rkp4ttern an hour ago

Yes, and this is also why I haven’t yet tried the new “dynamic workflows” which spawn hundreds of agents that happily eat through your token limits.

cortesoft 2 hours ago

The CLI when you select it says it has 2x the usage as opus. Not sure if that matches what you are seeing.

I do wonder if you switched models mid-session, you would have lost all your cache. Reloading the context into cache can really eat through your usage.
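Roughly why a mid-session model switch hurts: prompt caches are tied to the model, so the first request after switching re-sends the whole context as a fresh cache write instead of a cheap cache read. A back-of-the-envelope sketch, assuming the API's published multipliers (5-minute cache writes at 1.25x the base input price, cache reads at 0.1x); how subscription usage meters this isn't stated in the thread, and the price and context size are hypothetical:

```python
# Hypothetical base input price per million tokens; only the ratio matters here.
BASE_INPUT_PRICE_PER_MTOK = 1.0
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: 5-minute cache write rate
CACHE_READ_MULTIPLIER = 0.10   # assumption: cache hit (read) rate


def context_cost(context_tokens: int, cache_hit: bool) -> float:
    """Input cost of re-sending an existing conversation prefix on the next turn."""
    multiplier = CACHE_READ_MULTIPLIER if cache_hit else CACHE_WRITE_MULTIPLIER
    return context_tokens / 1_000_000 * BASE_INPUT_PRICE_PER_MTOK * multiplier


context = 400_000  # hypothetical: a long session's accumulated context
same_model = context_cost(context, cache_hit=True)     # cache still warm
after_switch = context_cost(context, cache_hit=False)  # new model, cold cache
print(f"{after_switch / same_model:.1f}x")  # 12.5x
```

Output tokens and the new turn's own input are left out; the point is only the ratio on the re-sent prefix.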

fastball 3 hours ago

What is your effort level?

observer987 an hour ago

I too am on the $100 plan and I second this.

I had it analyze a project I was working on with Opus 4.8, and it blew through 23% of my session limit in one go. Does not portend well for my budget.

enraged_camel 3 hours ago

That’s odd, I used it on a pretty complex refactoring task and it worked for 22 mins and used only 15% of my 5-hour limit. I’m on the $200 Max plan though.

ZunarJ5 2 hours ago

They didn't even reset credits for this lol

0erofootprint 4 hours ago

For me it almost immediately blocked. I had it writing code related to message digests, and it seemed to think it was too gifted for that. It gave the security warning and switched back to 4.8. Whatever... it will probably start throwing API errors soon anyway. I have mostly switched to the $200/month Codex plan. I've found their 5.5 xhigh to be better than Opus 4.8 "ultracode." Also, I have not once seen their servers fail for compute unavailability, unlike Anthropic, where it happens almost every hour.

matheusmoreira 2 hours ago

I just asked Fable for a complete code review of my lone lisp project. Started out strong. Launched Fable agents, then spent like 10 minutes thinking... And then got interrupted by a switch to Opus 4.8.

> Fable 5's safety measures flagged this message for cybersecurity or biology topics.

> They may flag safe, normal content as well.

> These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them.

Here are the results of the agentic code review session:

  ┌──────────────────────────┬───────────────┬────────────────┐
  │          Agent           │ Fable 5 turns │ Opus 4.8 turns │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ values                   │ 134           │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ data-intrinsics          │ 104           │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ tools-tests-build        │ 81            │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ core-intrinsics (failed) │ 25            │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ system-memory            │ 44            │ 20             │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ reader-modules           │ 104           │ 25             │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ linux-startup            │ 95            │ 15             │
  └──────────────────────────┴───────────────┴────────────────┘

This 40 minute session cost me 16% of my weekly usage. A simple code review of the most critical areas of my project got flagged as a cybersecurity risk. It really made me not want to try it again.
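Tallying the table gives a sense of how much of the review actually ran on Fable 5 before the fallback kicked in. A small sketch over the numbers in the table; treating "any Opus 4.8 turns" as "this agent fell back" is an interpretation, not something the tool reports:

```python
# (agent, Fable 5 turns, Opus 4.8 turns), copied from the table in this comment
TURNS = [
    ("values", 134, 0),
    ("data-intrinsics", 104, 0),
    ("tools-tests-build", 81, 0),
    ("core-intrinsics (failed)", 25, 0),
    ("system-memory", 44, 20),
    ("reader-modules", 104, 25),
    ("linux-startup", 95, 15),
]

fable_total = sum(fable for _, fable, _ in TURNS)
opus_total = sum(opus for _, _, opus in TURNS)
# Agents that logged any Opus 4.8 turns, i.e. continued after the switch
fell_back = [name for name, _, opus in TURNS if opus > 0]

print(fable_total, opus_total)                            # 587 60
print(f"{opus_total / (fable_total + opus_total):.1%}")  # 9.3%
print(fell_back)  # ['system-memory', 'reader-modules', 'linux-startup']
```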

kkoncevicius 4 hours ago

I had a similar experience. I wanted to test it by asking it to summarise a scientific OMICs-related paper. It gave a warning about me potentially developing a bio-weapon or something like that. And switched back to Opus 4.8.

smith7018 4 hours ago

Fwiw it's not available on my enterprise account: "Disable zero data retention to unlock Fable 5 access"

stronglikedan 3 hours ago

We just blocked it at our org for this reason. They will "retain agent request and output data associated with this model, regardless of you Cursor Privacy Mode setting."

sdellis 3 hours ago

What does "zero data retention" mean? What kind of data does it need to unlock?

kyledrake 4 hours ago

Considering their apparent nerfing of the end user plans in favor of enterprise clients, is Anthropic still the "more ethical AI company" like everybody loves to tell me all the time?

Assuming this isn't just a supply issue on their side, nothing says "ethical AI" like only allowing mega corporations to use it through cost barriers.

estearum 4 hours ago

You really misunderstand what AI-doom people are worried about if you think this is anywhere near the top (or middle, or bottom) of the list of concerns.

DonsDiscountGas 4 hours ago

I don't think offering a product under a certain set of terms obligates a company to maintain that offering forever. The bait and switch is certainly annoying but seeing as they're very upfront about it you can't say you weren't warned. Don't like it? Don't use it.

eli 2 hours ago

It's unethical to price it in a way not everyone can afford?

wongarsu 4 hours ago

I wouldn't call Anthropic ethical. But between Anthropic and OpenAI, Anthropic is the more ethical one

brianmcnulty 4 hours ago

Why would you have ethics when you could get that IPO money instead?

dllrr an hour ago

They said they would release it back into subscriptions as capacity allows in the future. If they don't, people are going to point back at it and rake them over the coals.

MattSayar 2 hours ago

It smells like an architecture-related issue to me. They wanted to release the model asap, but they're still implementing the fine-grained controls to constrain the model to non-subscription users.

xvector 4 hours ago

Yup - who cares about x-risk or red lines for domestic mass surveillance anyways? I draw my red lines at prioritizing profitable customers when heavily resource constrained. That's the true definition of evilness!

Maken 4 hours ago

The bar is just too low.

fridder 4 hours ago

More ethical in some areas, actively user hostile in others

nickandbro 4 hours ago

Get them addicted then cut them off. Oldest trick in the book.

toomuchtodo 4 hours ago

More of a free trial to those authenticated and qualified with existing payment. Subscription billing is going away for sure though eventually based on the economics. Token “all you can eat” is a capital furnace otherwise.

(I’m highly confident open models will eventually achieve a similar performance benchmark with distillation over time)

jrumbut 2 hours ago

It could be my use cases, which have always seemed to be outside the wheelhouse of these models, but I find it very hard to downgrade after accessing a more capable model.

Opus 4.8 produces output in 15 minutes that is 3-4 hours of my work away from output that used to take me 40ish hours (a solid week of dedicated effort).

Last year(-ish, maybe it was 18 months, I forget when the jump happened), the frontier models couldn't touch this work. The output looked like a hardworking intern on their first day. Nice formatting, decent volume of words, but no understanding.

So it might work if it turns out to be a substantial leap in capability.

GoToRO 2 hours ago

I switched back to Sonnet. It replies faster so I work faster. Also cheaper. But I really like the speed. I have to be more specific with what I want. Also I stop it more often than Opus. These new models will be awesome, but they need to increase the speed.

alvis 4 hours ago

It’s too obvious that Anthropic needs to find a way to earn enough revenue before the IPO. The Claude subscription isn’t earning much money, I bet.

sigmoid10 4 hours ago

I think they are just prioritizing enterprise customers, because this is where they have historically made most of their money.

sdellis 3 hours ago

That's a big problem for all of the AI companies. Most people don't find the technology compelling, accurate, or ethical enough to pay for a subscription.

Why wouldn't Anthropic just wait until people start subscribing, do some kind of marketing push, or obtain some kind of other sustainable revenue stream, before they go IPO? I wonder if they see the writing on the wall with all of this and want to cash out as quickly as possible?

AtlasBarfed 3 hours ago

That's not how it works. They don't need revenue, they need addicts.

Specifically they need businesses that fired people and adapted their business to the products, so when the unsubsidized costs hit the businesses are forced to eat the true costs.

Yes, they can't afford to give the products away for free, but what is essentially happening with AI services is economic dumping: keep costs artificially low to get people to fire everybody, and then jack up the rates once they have monopoly control.

xpct 4 hours ago

I agree, this looks like their plan to wind down subscriptions. This will probably come with Opus nerfs later.

rapind 4 hours ago

I just assume Opus is constantly nerfed based on capacity. I was exclusively Claude for a long time, but the inconsistency in quality, constant outages, and slow downs were too hard to work with.

I just use dumb and fast models now. I'm more engaged. I think that the higher the quality of the model, the more you tend to vibe with it, and the more hallucinations you then miss. I'm not sure which is more productive, but I definitely burn out faster the more I vibe. At some point you're spending your time on forums, Discord, or YouTube instead of engaging with what you're building. Or you yak-shave your tooling and end up creating the 600th multi-agent gastown harness, blowing thousands of dollars on tokens to create it only to discover it's too expensive to actually use.

nonethewiser 4 hours ago

It's possible that they will transition to usage credits but why not take them at their word? To date they have continued to offer better and better models to their subscription plans.

taormina 4 hours ago

Those already landed! Oh, you weren't talking about 4.8?

xvector 4 hours ago

HN needs to take a chill pill. Could it be that Mythos is expensive and they just want to give people a taste of it? I mean the alternative is not offering it at all?

ltrg 2 hours ago

Fable seems very good at finding bugs (unsurprising given Mythos lineage), so this seems a pretty smart strategy. Once you see the bugs it finds in your existing Opus code, it's going to be hard to go back, psychologically speaking.

timcobb 4 hours ago

Ooof, so are we thinking that in the next 6-12 months subscriptions will be replaced with paying retail, like enterprise does currently?

CuriouslyC 4 hours ago

I don't think they'll phase out subscriptions ever; their whole play has been to drive demand from the bottom up. Get engineers hooked on building with Claude at home, then get them to demand the ability to use it at work, and bend over their employer with no lube.

They'll probably tighten the quotas to rein in whales, though.

aseipp 4 hours ago

They almost certainly already make a fuckload more money off API pricing than they do subscriptions, even if there might be more total subscription users. So offering subscriptions even at some loss is probably going to continue. Honestly, I'd be surprised if they even lost money on most subs; there are definitely Token Whales out there who mess up all the accounting, though.

Realistically I think Anthropic just has insane demand but finite capacity to run models, and Fable will just make them more money if they dedicate it to API pricing. I suspect the goal here is something like: get individual engineers/PMs on their personal plans to taste Fable and then go to their meetings and say "Yes doubling the price of every single input/output token is a good idea, boss".

thewebguyd 4 hours ago

I certainly hope not. PAYG is not predictable enough for smaller companies or individuals. Where I work (non-tech company), PAYG would never fly. We aren't big enough for that. Of course, you can set usage budgets, but there's a pretty big difference between $200/user/month vs. the equivalent PAYG usage being closer to $1,000/user/month, if you currently use the subscription plan to its limits each week.

Going PAYG only will effectively take these tools away from a huge amount of people and accelerate the push for local LLMs.

OTOH, accelerating the push for local LLMs would also be fine with me.

ygjb 4 hours ago

I doubt it, given the importance of those subscriptions for building and maintaining market awareness.

The AI landscape is changing rapidly, with Apple announcing the option to change the AI backend, and potential requirements to enable AI choice as well, similar to the EU browser choice requirements (this is more reading tea leaves than any actual requirements I am aware of). The new OS changes coming to support Googlebook, and deep Copilot/AI integration into Windows, will make maintaining user-facing subscriptions essential for independent model developers like OpenAI, Anthropic, and Mistral to remain relevant longer term.

If they don't maintain that relevance, there is an increasing likelihood that they will get consumed by other companies, whether it's Apple, Microsoft, or Google using them as a foundation for their OS, or other cloud providers.

KronisLV 2 hours ago

> it feels like they are trying to get subscribers to switch to usage-based billing

I think they might be hitting a point where subsidizing the expensive models for subscriptions makes less and less sense.

With Opus 4.X, last month I paid 100 USD for the Max subscription and got a token equivalent of 4.1k USD.

I imagine that Fable is more expensive to run.

spaceman_2020 an hour ago

Kimi 2.6 has been my workhorse now. It's as good as Opus 4.6, which, to me, was the last "useful" Claude model.

The newer models are smarter but really fickle and hard to get meaningful work out of.

4.6 was a workhorse

irthomasthomas 3 hours ago

This is just the sales team doing their thing, applying the Law of Scarcity to drive demand.

It's the same exact speed as opus >=4.5, sonnet 4.5, and twice the speed of opus <=4.1

It must have about the same active parameters, or else it's a larger model running in turbo mode (smaller batches) and being heavily subsidized for some reason. But given most of the benchmarks are within 5%, I doubt it is a much larger model. Most perplexing.

m00x 36 minutes ago

It could be a much bigger MoE model

dack 3 hours ago

i doubt that's the goal for them. i bet they just really don't have capacity for people using it a ton, yet they wanted people to be able to try it out while it's new. so they compromised and made it temporarily available. and then hope they can get costs down or capacity up so they can make it more available again

InsideOutSanta 3 hours ago

I think the goal is "private citizens: subscriptions; corporations: per-token billing." It's getting people addicted to LLMs on cheap subscriptions so that they can then force companies to pay for expensive inference.

matheusmoreira 3 hours ago

This is really sad... I really didn't want to be priced out of these models but it looks like that's going to happen sooner rather than later.

deepfriedbits 3 hours ago

Thankfully this, like most other tech, will get cheaper through the years.

madrox 2 hours ago

I suspect it'll go on the subscription plan once other providers have similar benchmarks.

As annoyed as I am about this move, I get it. Users flood the newest, best model whether they really need it or not, and are efficient at using their entire quota. They've had so much trouble reining in subscription usage that it makes sense.

nicce 4 hours ago

> The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.

Probably all about the IPO.

thisisit an hour ago

One can hope it helps Claude figure out how to fix their buggy payment system; otherwise, how do I pay for these credits?

daft_pink 3 hours ago

I’m just about ready to cancel my small business 5-user plan with Max licenses, because although Cowork is really great, I just find OpenAI/Codex to be a lot better most of the time.

ABS 4 hours ago

also: Fable takes 2× the usage of Opus

Aleleo76 4 hours ago

Pay-as-you-go billing is a kind of drug. I use it every now and then when I'm working on a project with Opus, and in a moment you spend a fortune.

oersted 4 hours ago

> Pricing for both models is $10 per million input tokens and $50 per million output tokens.

The step-up in intelligence looks massive (we'll see in practice), but the price is getting to a point where it's making me question if it's even worth giving it a try.

Good competitors will probably be out soon, which should level the playing field. I am more excited about that, just the fact that they showed that such an improvement is possible. I'm okay waiting a bit longer for this to become attainable for plebs like me.

kmac_ an hour ago

Models are getting better, but there's a negative change in terms of "productivity" per dollar. Yeah, I can throw 5 sub-agents at the problem, but the cost is getting significantly higher. And yes, I can crank out the solution much faster, but again, at some point that cost will be hard to justify. And it doesn't matter if the cost is subsidized by a provider, if it's paid by your company, or from your pocket. We are slowly reaching a point where the cost will be too high to justify the gains.

xyzsparetimexyz 4 hours ago

This is probably the end of 'use the best model no matter the price'

kolinko 4 hours ago

The pricing can be a bit deceptive though. A good model can deliver the same results in fewer tokens.

Kind of like billing a programmer by the hour.

sourcecodeplz 4 hours ago

Why wouldn't it be? How much would you pay a scientist at this point to think about a problem for you and give you a solution?

sytelus an hour ago

Enterprise subs not allowed to use Fable if they have setup zero data retention :(

irthomasthomas 2 hours ago

"we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."

altcognito 2 hours ago

Where is this text coming from?

[edit] -- I see that this comes from the system card -- dang merged the comments from the other discussion so that explains the confusion.

clementg 4 hours ago

I really don't want this to start being the norm

baggachipz 4 hours ago

I don't see how it won't be. They lose insane amounts of money on subscription plans. I'm sure they still lose money on usage-based billing, but probably not as much.

DonsDiscountGas 4 hours ago

I expect that depends on demand, feedback, and whether GPT-6.0 gets released and is competitive

lisperforlife 3 hours ago

My guess is that it is a massive model similar to GPT-4.5, and the $10/$50 pricing for its output will discourage people from using it. I also read safety = nerfed.

a-dub 3 hours ago

the claimed inference cost is 2x. if that is true, it is massive and remarkable that they're able to do anything like this at all.

dirkc 3 hours ago

This serves as a good reminder that relying on AI models is borrowing your tech from someone else. They can take it away or raise the prices arbitrarily.

If you rely on this as a core part of your business/profession, you will be at their mercy and subject to whatever whims or challenges they have.

meowface 4 hours ago

It's very disappointing but I'm assuming it's for rational reasons on their part.

deanc 3 hours ago

But it's not and it's highly disingenuous to frame it like this. Quote directly from Claude code, moments ago:

> Fable 5 · Most capable for your hardest and longest-running tasks · Uses your limits ~2× faster than Opus

systemvoltage 3 hours ago

It's interesting that we are seeing a time when subscriptions are not preferred and usage-based billing is.

Pay-as-you-go isn't a common thing in SaaS. For example, except for AWS SES, all email providers are bulk-subscription based.

nutjob2 3 hours ago

> "offer, then remove"

Sounds like "bait and wait".

If you think about it, the more people pay for these new and more resource-hungry models, the longer it takes for them to be included at no extra cost, and the longer that takes, the more people are tempted to pay extra.

FergusArgyll 4 hours ago

I'm about to be priced out of SOTA llms and it's an awful feeling

speedgoose 42 minutes ago

The AI circular infinite money glitch won't last forever. I hope.

If you have good expertise in a domain and access to cheaper models, you may still be more skilled than someone without expertise but with a lot of money to brute-force the problems using SOTA LLMs.

wahnfrieden 3 hours ago

Not with Codex

rvz 4 hours ago

> * On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

Of course, they are a casino as well, giving you free spins at the wheel with their new Fable machine, and it is done on purpose.

Once their freebies have expired, many of its users will begin to gamble more on the new casino machine and will realize that it is expensive.

xvector 4 hours ago

If it's that big of a problem to you, you're free to just... not use the freebie?

aray07 4 hours ago

i have never seen this before - where you offer something and then take that away

machomaster 4 hours ago

Really, you have never heard of shareware or trial periods?

firemelt 4 hours ago

damn, they are drug dealers

victor106 4 hours ago

> A new data retention policy Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases ...

Very interesting. I am not sure this will comply with organizational policies and standards (HIPAA, etc.).

frankfrank13 3 hours ago

This makes it an instant non-starter for probably 95% of organizations. A lot of people are about to get in trouble for using it before realizing this.

Aurornis an hour ago

> A lot of people are about to get in trouble for using it before realizing this

Enterprise plans allow admins to set which models are allowed.

nicce 3 hours ago

> deletion after 30 days in almost all cases ...

Almost… basically they have unlimited power to decide what data is kept?

happyopossum an hour ago

If they’re going to retain any data, they have to allow for the possibility of the legal system requiring any of it to be used in some legal proceeding at some point.

You can’t tell a judge who’s ordered you to retain something that you can’t because you said you wouldn’t.

mohsen1 2 hours ago

It seems like Fable will refuse to do any work when it comes to developing LLMs, or even to answer questions about LLM-related topics. Even simple things, like asking it to explain a paper, fail!

From the model card:

In light of the ability of recent models to accelerate their own development, we've implemented new interventions that limit Claude's effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.

throwfaraway4 2 hours ago

"for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design"

Oh man all of those runaway infrastructure buildouts by our agents trying to achieve singularity...

Just say you don't want to lower the bar for others to compete

Chance-Device 2 hours ago

I was wondering when something like this would happen. I got my first and only two content violation warnings in Claude Code last week when asking it about something ML related. It was a real head scratcher because I couldn’t figure out what about the requests could have violated anything.

Might be worth going back and taking a harder look at what I was asking it about if it somehow triggered a “forbidden knowledge” alert. Or maybe it was just a random bug.

properbrew 2 hours ago

> frontier LLM development

This seems so wide reaching if it's catching simple things like explaining a paper. Does this also refuse to help with any already developed training pipelines?

I can kind of understand the generation of synthetic data, but nerfing the assistance of training pipelines just seems like a really shitty thing to do.

alden5 30 minutes ago

So insane to me that these ai companies are perfectly fine trying their absolute best to automate as much knowledge work as possible but as soon as this capability can be turned on them they start implementing hidden interventions to sabotage anyone trying to beat them at their own game.

elastic-hoover an hour ago

I wanted to try it on my biology research, and it refused to talk about it and proxied to 4.8. Really, only surface-level conversations about topics of interest. I know this is not a topic of broad, mass interest, but limiting it on topics like that and machine learning will probably change how I use it.

lxgr an hour ago

Yes, this stuff is really annoying when it misfires. I've had all my subsequent ChatGPT conversations biohazard-contained for several days for the crime of asking it to explain a gene drive to me.

foolserrandboy an hour ago

This is just marketing that Anthropic is building the singularity.

schipperai 2 hours ago

Let's hope not all frontier AI assimilates these guardrails. It would be a shame for independent researchers and students.

girfan an hour ago

This is super annoying and, imo, really limits the usefulness of this model. It speaks volumes about Anthropic's position as a company and what its priorities will be going forward. I doubt this kind of gatekeeping will slow down open models or other innovation outside Anthropic. I would imagine these guardrails, if needed at all, should be handled at the level of a legal framework, and students should not be caught up in this blanket approach to limiting the usage of these models.

agnosticmantis 2 hours ago

Singularity for me but not for thee.

foolfoolz 2 hours ago

you will RENT the singularity

Xunjin an hour ago

"we should put on hold the development of AI because the world is not ready for it"

Yeah... We need open models so we don't have that BS.

gpugreg 2 hours ago

Anthropic probably trained Mythos on their own code and found that it is too good at reproducing it.

teaearlgraycold an hour ago

I doubt that. Why would you train Mythos on its own code if you don't want it to be able to reproduce it? It's not going to add much to the overall corpus.

skerit an hour ago

That's strange... I've been tinkering with a little LLM-from-scratch project for a while now, and Fable is just continuing it without a problem

SkitterKherpi 2 hours ago

It also tried to force usage of the paid Claude API instead of Claude Code usage, just because there's a mention of another provider we might want to plug in (which hasn't even happened) for AI integration.

dchuk 2 hours ago

Ha funny, I was speccing out an idea for real time Claude code interaction from local apps using some tricks vs using the agent sdk when I got the popup to try Fable. So of course I gave it a go, and it triggered the sensitive content warning immediately, which I was very confused by until I put two and two together.

Fun times when “safety” means both the safety of mankind, and also the safety of revenues

blockcipher 2 hours ago

Anthropic is really speedrunning their evil arc as fast as possible. Can't use them for basic LLM research, cybersecurity, or beyond-surface-level discussions of biology and virology, but Anthropic is allowed to sell Claude to the Trump administration to kidnap Maduro and to bomb Iran. And don't get me started on that $100M autonomous killer drone swarm contract that they applied to and rationalized as non-autonomous...

computomatic 12 minutes ago

Didn’t Anthropic famously refuse to work with the US gov on military applications that would violate its safeguards?

https://apnews.com/article/anthropic-pentagon-ai-hegseth-dar...

LordDragonfang an hour ago

> Can't use them for basic LLM research, cybersecurity, or beyond-surface-level discussions of biology and virology

Your priorities are not everyone else's priorities. The people concerned about AI extinction risk list those as three of their biggest priorities for AI to not do. Those are the people whose culture Anthropic descends from, and by their measure, those exclusions make this the least evil path.

simonw 4 hours ago

Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8

Fable 5 default: https://gist.github.com/simonw/036bee5a703e7ec84e34efa974438...

Opus 4.8 (the "max" one is closest to Fable): https://simonwillison.net/2026/May/28/claude-opus-4-8/#and-s...

Now here are the Fable pelicans for all five of the thinking effort levels - low, medium, high, xhigh, max: https://tools.simonwillison.net/markdown-svg-renderer#url=ht...

Low used 25 input, 1,929 output - 9.67 cents: https://www.llm-prices.com/#it=25&ot=1929&sel=claude-fable-5

Max used 25 input, 14,430 output - 72.175 cents! https://www.llm-prices.com/#it=25&ot=14430&sel=claude-fable-...
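A quick way to sanity-check those two figures, as a minimal Python sketch. It assumes only the $10 per million input tokens and $50 per million output tokens list prices quoted elsewhere in this thread, with no caching or batch discounts; the function name is just for illustration.

```python
# Fable 5 list prices in USD per million tokens, as quoted in this thread.
INPUT_PER_MTOK = 10.00
OUTPUT_PER_MTOK = 50.00


def cost_cents(input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost of a single request in US cents."""
    dollars = (input_tokens * INPUT_PER_MTOK + output_tokens * OUTPUT_PER_MTOK) / 1_000_000
    return dollars * 100


low = cost_cents(25, 1_929)    # "low" effort pelican
high = cost_cents(25, 14_430)  # "max" effort pelican
print(f"low: {low:.3f} cents, max: {high:.3f} cents")
print(f"max effort costs {high / low:.1f}x as much as low effort")
```

Almost all of the cost is output: the 25-token prompt contributes a quarter of a cent in both cases, so the effort level (how much the model writes) is what drives the price.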

sempron64 4 hours ago

The pelican has looked very same-y across all frontier models, same color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways mirroring existing AI pelicans on the internet.

tripleee 3 hours ago

I'd say it's working great for its intended purpose. Keeps Simon on top of all these threads and funnels traffic to his site.

h4ny 2 hours ago

Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?

kayge an hour ago

Do you think the models are ready for the next level? I believe that would be: Pelican feeding Spaghetti to Will Smith.

quantumwoke 3 hours ago

Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.

sarreph 4 hours ago

I'm beginning to wonder how much of a useful metric the pelican is because surely the frontier labs must be training their models on pelican-artistry because of how well known your test is now?

bensyverson 4 hours ago

Simon has addressed this on virtually every new model release. He also has unpublished alternate prompts. But the larger point is: this is a fun experiment, not a serious and objective benchmark.

wongarsu 4 hours ago

I just run my own benchmark for "draw an SVG with $animal driving $vehicle". I won't post my choice of animal and mode of transport, but there are plenty of uncommon combinations to choose from. So far it's a fun and visually intuitive benchmark that does seem to correlate with model capabilities
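This kind of private benchmark is easy to set up yourself. Below is a minimal Python sketch, assuming you keep your own lists private: the animals, vehicles, and exact prompt wording are hypothetical placeholders, and the fixed random seed is only there so every model you test gets the same prompts.

```python
import itertools
import random

# Hypothetical placeholder lists; keep your real ones private so they are
# less likely to end up in anyone's training data.
ANIMALS = ["walrus", "pangolin", "heron", "capybara"]
VEHICLES = ["unicycle", "forklift", "hovercraft", "tandem bicycle"]

PROMPT_TEMPLATE = "Draw an SVG with a {animal} driving a {vehicle}"


def benchmark_prompts(sample_size: int, seed: int = 0) -> list[str]:
    """Pick a reproducible random sample of animal/vehicle prompts."""
    combos = list(itertools.product(ANIMALS, VEHICLES))
    rng = random.Random(seed)  # fixed seed: same prompts for every model
    chosen = rng.sample(combos, sample_size)
    return [PROMPT_TEMPLATE.format(animal=a, vehicle=v) for a, v in chosen]


for prompt in benchmark_prompts(3):
    print(prompt)
```

Sampling without replacement from the full cross product means no combination repeats within a run, and changing the seed gives you a fresh set if an old one leaks.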

notnullorvoid 2 hours ago

The way I see it, the benefit of the benchmark isn't to take Simon's results at face value. It's a template for your own benchmarks that are easy to evaluate visually.

modriano 4 hours ago

I don't know. Just looking at the bike frames (specifically the fact that the AI generated bikes have rather unsteerable front forks), it's clear to me that frontier labs aren't spending much time tuning models to make bikes look coherent, which I assume is an easier task than making a pelican riding a bike look coherent.

HaZeust 4 hours ago

I've seen this reply to Simon's benchmark for 2 years running now, and yet you still see improvements and objectively-bad results over time from new releases, even when I'm sure every frontier AI team has/had a person at least partially dedicated to better bicycle-pelican SVG outputs. Alas.

iLoveOncall 2 hours ago

It was a completely useless test even before the labs trained for it.

smusamashah 32 minutes ago

Can you please compare it with the code behind similar-quality pelicans from other models? The code in your first link (Fable 5 default) looks minimal yet very good.

raffael_de 2 hours ago

I find it quite interesting that while the picture looks better the more advanced the model is, apparently none so far "understands" that the pelican's legs go on both sides of the bike / top bar.

LordDragonfang an hour ago

If you scroll to the bottom of the Fable-5 by effort page, Max effort actually gets this correct! (Along with being the only one I've seen so far to make a bicycle frame that matches the shape of what most bikes on Google images look like)

ealready_value 4 hours ago

This is the reply I look for in all the new model announcements. It's fun to tell people that I judge models based on pelicans.

pixel_popping 4 hours ago

This is all we need, that moment the Pelican put the leg behind the frame, we are all doomed.

chorkpop 4 hours ago

Now someone post the link about how it’s impossible for humans to draw a bike from memory.

upcoming-sesame 2 hours ago

I also look for this reply because i like seeing the follow-up reply saying that this is not a benchmark anymore because labs have gotten it in their training data.

that reply never fails to come; it's basically a meme at this point

redox99 4 hours ago

It's interesting that they still get the head tube / handle bar part wrong.

aarjaneiro 4 hours ago

Or the hands not being wings

ethanlipson 4 hours ago

How much money do you think they spent fine-tuning on pelican SVG generation?

tarruda 4 hours ago

Not as much as Qwen, since apparently 3.6 35B surpassed Opus 4.7 https://x.com/simonw/status/2044830134885306701

csomar 4 hours ago

Probably none. They probably have much better targets to optimize for than an SVG pelican or even SVGs in general.

bergheim 2 hours ago

Anyone care about these pelicans that always come up anymore?

Clearly at this point they are part of the training data.

They even all look sort of ish the same. Daytime, colors,...

1attice 2 hours ago

Without being mean, I encourage you to go look at some of simonw's writing on this topic, which he has addressed repeatedly (and IMO satisfactorily.)

I know because I too had this initial take; however, upon analysis, it is not sound.

leecommamichael 4 hours ago

Looks like Fable constructed the "max" "looking" pelican of the previous model for the "xhigh" output token count of the previous model.

rkuska 4 hours ago

Is it possible to use the credits from subscription (https://support.claude.com/en/articles/15036540-use-the-clau...) for fable?

382hi 4 hours ago

I'm pretty sure they're optimizing the models around these sorts of tests.

makingstuffs 4 hours ago

I could be tripping but I’m sure that is very similar to the Deepseek one from not long ago. Clearly I am too lazy to go and find it for verification.

jerryliu12 3 hours ago

Personally feel like it could be more ambitious with what it creates.

gavinray 3 hours ago

Fable 5 xhigh actually looks the best to me.

csomar 4 hours ago

Where is the clear improvement on Fable 5? The tail is misplaced.

mercacona 4 hours ago

Why always sunny days?

umeshunni 4 hours ago

Pelicans hate biking in the rain (as do I).

purple-leafy 2 hours ago

Do we need a pelican every single time a model is released? Beating a very dead horse.

Fun at first, seems disingenuous now. A site funnel

david_shi 4 hours ago

that's a great looking pelican

ge96 4 hours ago

need more Alex Moulton style bikes

kylehotchkiss 4 hours ago

How many barrels of oil are burned per pelican at Fable levels?

RandyRanderson an hour ago

Fable is 2x latest Opus:

  ┌────────────┬────────────────┬─────────────────┬────────────────────┬─────────────────────┐
  │ Model      │ Input ($/MTok) │ Output ($/MTok) │ Batch Input (−50%) │ Batch Output (−50%) │
  ├────────────┼────────────────┼─────────────────┼────────────────────┼─────────────────────┤
  │ Haiku 4.5  │          $1.00 │           $5.00 │              $0.50 │               $2.50 │
  │ Sonnet 4.6 │          $3.00 │          $15.00 │              $1.50 │               $7.50 │
  │ Opus 4.7   │          $5.00 │          $25.00 │              $2.50 │              $12.50 │
  │ Opus 4.8   │          $5.00 │          $25.00 │              $2.50 │              $12.50 │
  │ Fable 5    │         $10.00 │          $50.00 │              $5.00 │              $25.00 │
  └────────────┴────────────────┴─────────────────┴────────────────────┴─────────────────────┘

Prompt caching: −90% on input tokens (all models)

US-only inference (Fable 5): +10% on input and output

Output is always 5× the input rate across all models

(I have no idea how to format this properly, but the ASCII is fine)
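To see what the table means for an actual request, here is a minimal Python sketch. The prices come from the table; the example token counts are made up, and the way the modifiers combine (cache discount applied only to cache-hit input tokens, then batch and US-only multipliers applied to the whole request) is an assumption for illustration, not confirmed billing behaviour.

```python
from dataclasses import dataclass

# Base list prices in USD per million tokens (input, output), from the table above.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (5.00, 25.00),
    "Opus 4.8": (5.00, 25.00),
    "Fable 5": (10.00, 50.00),
}

CACHE_HIT_DISCOUNT = 0.90  # "Prompt caching: -90% on input tokens"
BATCH_DISCOUNT = 0.50      # batch columns are half the standard price
US_ONLY_SURCHARGE = 0.10   # "+10% on input and output" (quoted for Fable 5)


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    cached_input_tokens: int = 0  # portion of input_tokens read from the prompt cache


def estimate_cost(model: str, usage: Usage, batch: bool = False, us_only: bool = False) -> float:
    """Rough request cost in USD.

    ASSUMPTION: the discounts and the surcharge simply multiply together;
    this is an illustration, not a statement of how the bill is computed.
    """
    if usage.cached_input_tokens > usage.input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    in_price, out_price = PRICES[model]
    uncached = usage.input_tokens - usage.cached_input_tokens
    cost = (
        uncached * in_price
        + usage.cached_input_tokens * in_price * (1 - CACHE_HIT_DISCOUNT)
        + usage.output_tokens * out_price
    ) / 1_000_000
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    if us_only:
        cost *= 1 + US_ONLY_SURCHARGE
    return cost


# Hypothetical workload: 100k input tokens (80k of them cache hits), 10k output tokens.
session = Usage(input_tokens=100_000, output_tokens=10_000, cached_input_tokens=80_000)
for name in PRICES:
    print(f"{name:<10} ${estimate_cost(name, session):.4f}")
```

Because every Fable 5 rate is exactly double the Opus 4.7/4.8 rate, any mix of cached input, uncached input and output comes out at 2× the Opus cost under these assumptions, which matches the "Fable is 2x latest Opus" summary.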

dang an hour ago

(I fixed (er, literally!) the formatting of your table there. I hope that's ok. Formatting info, such as it is, at https://news.ycombinator.com/formatdoc)

consumer451 an hour ago

Hi Dan, you know how sometimes comments get moved elsewhere?

This is a huge ask, but is there any way we could get the comments organized in an "experience with model" vs. "meta commentary" fashion? The meta is overwhelming in this one.

pmxi an hour ago

I had Claude straighten it out:

  Model           In     Out    BIn    BOut
  Haiku 4.5   $ 1.00  $ 5.00  $0.50  $ 2.50
  Sonnet 4.6  $ 3.00  $15.00  $1.50  $ 7.50
  Opus 4.7    $ 5.00  $25.00  $2.50  $12.50
  Opus 4.8    $ 5.00  $25.00  $2.50  $12.50
  Fable 5     $10.00  $50.00  $5.00  $25.00

unsupp0rted 3 hours ago

> Drug design: Using Mythos 5, our internal protein design experts accelerated aspects of the drug design process by around ten times. In one example, they found that Mythos 5, with protein design and bioinformatics tools but no human assistance, matches or beats skilled human operators. In doing so, the model executes all of the tasks that are normally completed by a scientist: choosing binding sites, selecting and running protein design tools, and recovering from failures along the way. Nine of the 14 protein targets from this study (shown below) yielded strong candidates for drug design that we’re currently investigating.

How is this half-way down the page? To me it's the headline.

AnodicElegy 2 hours ago

There are tons of ways to generate "strong candidates for drug design." This is definitely not the bottleneck in drug discovery and development. The hard problem is vetting and developing these ideas to the point of having a commercially viable drug. That is still a very empirical process.

colingauvin 31 minutes ago

Because it's completely meaningless without validation, and even with validation, not really any better than the state of the art protein generation models. Which are also mostly just nice to have because coming up with a candidate is generally quite easy.

The rate limiting steps are generally testing, or characterizing. Not designing protein binders.

renjimen 2 hours ago

Drug design isn't the bottleneck anymore, it's trials. Still cool they can do this with a general purpose model though.

HDThoreaun 2 hours ago

Would be funny if anthropic ends up as mostly a pharma company

meetpateltech 4 hours ago

> To ensure we’re responsibly deploying Mythos-class models, we are requiring limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered. [1]

[1] https://support.claude.com/en/articles/15425996-data-retenti...

lebovic 4 hours ago

While this makes it easier for Anthropic to detect misuse, it also means that the US government and other parties have access to every message and response from every user.

This applies even with API usage through third-party inference providers (e.g. AWS' Bedrock and GCP's Vertex) or with a zero-day data retention agreement in place.

I understand the reasoning for doing this, but I don't love the precedent that it sets.

PeterStuer 4 hours ago

Well, they already had.

simianwords 4 hours ago

meetpateltech is lowk screaming for not getting to the post fast enough

rvz an hour ago

At this point that never mattered and who really cares?

These "karma" points are made up and are virtually worthless anyway.

doginasuit an hour ago

I'm still happy with Opus 4.6 and not impressed with all the models that have come out since then. They seem to use significantly more resources with similar or worse results. Hopefully Anthropic will continue to support this tier of model and offer it in their subscriptions, but in any case, there are plenty of viable alternatives.

consumer451 36 minutes ago

4.6 stan here. Yes, agreed. However, I will try this model out in Claude Code. Some indicators seem positive.

cuuupid 4 hours ago

Not missing the forest for the trees, this effectively means in 3-5 months China will drop open source models that are every bit as capable and dangerous as current day Mythos except with no safeguards.

And the only companies safe from this are the large corporations that shook hands with Anthropic? Because Fable doesn't seem to have actual safeguards, more like 'if you talk about this you will be talking to Opus.' It doesn't guard against offensive use, it prevents all use (offensive AND defensive).

Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF

hootz 4 hours ago

My bet is that Mythos is still over-hyped and the cybersecurity fear and guardrails are mostly marketing to force company partnerships through Glasswing and get public attention.

miohtama 4 hours ago

Mythos is from the same guy who did "GPT-2 is too dangerous to release"

https://naokishibuya.github.io/blog/2022-12-30-gpt-2-2019/

Flere-Imsaho 2 hours ago

The UK gov disagrees with you:

https://arstechnica.com/ai/2026/04/uk-govs-mythos-ai-tests-h...

https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...

geerlingguy 4 hours ago

Bingo.

"We had to do extra work to make this safe because it's so advanced and dangerous..." how many times can they trot out that line before it loses its effect entirely?

bel8 4 hours ago

It worked for OpenAI when GPT 3 was deemed too dangerous to be released. This is just a spin of that.

CSSer 3 hours ago

Yes, and "in collaboration with the U.S. Government" feels like a very gross appeal to authority. You don't need Mythos or really any SotA frontier model to make malware or do extensive penetration testing/reconnaissance already. Sure, Mythos might be faster/more efficient, but the cat has been out of the bag for a while. Even the terminology "infrastructure providers" practically screams "Enterprise leads".

whazor 2 hours ago

I think any model can find vulnerabilities if it reads the entire codebase or intelligently combines parts of it, especially with test loops.

toddmorey 44 minutes ago

I fear it's a smokescreen to manage cost and capacity.

teaearlgraycold 3 hours ago

I know a security researcher at Google with access to Mythos. He says it's the "real deal" and that "there are career plans I had that are no longer viable".

ls612 4 hours ago

And to ensure that only USG-approved entities are allowed to secure their code.

mpeg 4 hours ago

It's not even very usable... I tried 2 different chats and both eventually got stopped due to the safeguards

One was a piece of code I gave it to improve, it did so and then started writing tests, some of which tested security so the safeguards triggered

Another was one of the cryptography puzzles I use as new model tests, which are hard to oneshot and there's no public solution anywhere, it completely refused to even try to solve it

gavinray 3 hours ago

I tried 2 chats and it declined both.

- 1st chat asked about a minor shoulder injury most likely mechanisms

- 2nd chat asked about optimal bloodwork testing markers

Erem 3 hours ago

So the degradation to Opus 4.8 from the article isn't happening in practice?

CSSer 3 hours ago

Oh joy. A model whose safeguards make it prone towards code that make your systems less safe. How brilliant!

himata4113 4 hours ago

They're trained in a model class likely in the 2T to 3T range. It's very unlikely that Chinese labs have access to GPU systems capable of training models like that, let alone serving them. This requires proprietary room-scale systems which fetch a huge premium over typical 10-slot systems.

I am sure that they can develop their own equivalent version of such clusters in around 1 year, though. Distilling Fable 5 will also go a long way.

axpy906 11 minutes ago

We’ll see it distilled first.

logicprog 4 hours ago

DSv4 is nearly in the 2t range, but yes you're generally right

OtomotO 3 hours ago

Ah, American Hubris ... I don't blame you, Hollywood is the world's greatest propaganda machinery of all times.

gck1 2 hours ago

There's also a reality where China does develop Mythos-level model but stops releasing the weights.

That reality is much scarier.

kaashif an hour ago

That's the reality China already lives in. Their weapon against US companies is commoditizing them, eliminating their moats and their profits by going open weights.

Same thing Meta was doing before they fell behind.

sosodev 4 hours ago

I wonder if model distillation will continue to work as well as it has. Given hidden reasoning, the ever expanding number of expected capabilities, a serious compute shortage, the looming possibility of model collapse, and dramatically higher API costs I would guess that it's getting much harder to do.

gck1 2 hours ago

You should check out some Chinese forums. There are services selling gateways/proxies for all major models at a fraction of the official rates. Likely reselling subscriptions, or some other form of abuse.

I've seen people posting screenshots of billions of tokens consumed where they paid next to nothing.

These same gateways are likely also reselling the data to Chinese labs, because TLS has to terminate at the gateway level.

sourcecodeplz 3 hours ago

Asian labs generated synthetic datasets from US labs, but also innovated with technology. Now it is harder to get the thinking traces, AND Anthropic is reported to poison them as well.

Thus Asian labs will have to generate their own data sets, which with the huuuuge usage boom from deepseek, mimo, kimi, etc, they will be able to.

jstummbillig 4 hours ago

I wonder where the trees are. In this thread nobody appears to actually be talking about the model.

gck1 2 hours ago

Yeah, because it's impossible. You can't ask it anything about the thing that it's known for. It will not even answer a sky-high level question about reverse engineering, for example.

In CC, it will probably report you to authorities if you ask it to do a vulnerability scan of your codebase.

dmantis 4 hours ago

Isn't that a good thing in a way? If everyone has the weapon and the defense at the same time, we will fix security holes and live safer lives instead of having some three-letter agencies and military backdoors in everything.

Pandora's box is open anyway. It's better now for everyone to have the same power rather than a few nation-states.

lebovic 3 hours ago

Not sure this holds, sadly. I spent a few months reporting serious security bugs as model capabilities took off earlier this year, and only ~half were fixed. The unfixed bugs were just as critical as the fixed ones; sometimes they were even two similarly critical bugs at the same company, and only one would be fixed!

On your other point, the government still has systemic leverage and can compel access, so this doesn't remove that risk.

That doesn't mean this is the end of the world, and some balance of power is usually good. But I do think it will still increase the capabilities of rogue actors and their net harm.

FergusArgyll 3 hours ago

I think we're about to see a big relative drop-off of open models vs closed. I don't think there'll be an open model that competes with Mythos for ~2 years.

Even OpenAI and Google are struggling to get this kind of performance. If the distillation defenses are any good + chip controls prevent China from training massive models, it's over.

Daishiman 2 hours ago

I think the Chinese have identified this gap and are working overtime on sovereign inference tech including chips.

deaton 4 hours ago

Oh they might try to put in place safeguards, but Qwen has had no problem being abliterated

m3kw9 4 hours ago

3-5 months is a long time, and they are pretty useless on arrival because the frontier models are so good that it's hard to go back, even if it's way cheaper. Your workflow has adapted to that level of intelligence for months.

hootz 4 hours ago

That doesn't match my experience at all. I can't see myself saying in 6 months that the current model I am using is useless, that makes no sense.

In fact, I did go back to DeepSeek V4 Flash for most of my problems as it is way cheaper and there is no need to use SOTA for absolutely everything.

xdennis 4 hours ago

> every bit as capable and dangerous as current day Mythos except with no safeguards

Not quite. They will definitely have "no criticism of China/communism" safeguards.

hootz 4 hours ago

People can work around those if they are open-weight.

surgical_fire 2 hours ago

And, thankfully, I never needed to have a discussion on Chinese politics with an LLM in all the myriad uses I had for it.

xyzsparetimexyz 3 hours ago

Try asking Fable if Israel is committing a genocide.

elAhmo 3 hours ago

Oh please let’s stop with the Mythos “it’s dangerous” PR talk.

It's obvious Anthropic used it to hype things up and that's about it.

soledades 3 hours ago

> Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF.

Based.

ibejoeb 3 hours ago

I don't think China has any incentive to arm the rest of the world with highly capable models that can be used against them. Undoubtedly they will continue with the arms race, but they will preserve the best stuff for their own use.

james2doyle 3 hours ago

I think the stronger incentive is undermining/undercutting the Western AI companies. Given what we have seen, any model can be used/convinced to do harm so that is just part of the game

stalfie 9 minutes ago

Tried to benchmark ECG interpretation capabilities, and I hit the guardrails no matter what I do.

Incredibly frustrating that medical performance seems to be a victim of "biological risk" guardrails.

iblue_the 4 hours ago

Trying to implement a GPU driver, but the Unigine Superposition benchmark crashes. It tried to debug it and ...

> Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606

Seems like GPU drivers are cyber weapons of math destruction now.

ibejoeb 3 hours ago

>Seems like GPU drivers are cyber weapons

They kind of are, at least in the AI race.

> weapons of math destruction

lol. great, whether intentional or not.

The frontier labs now have every reason to hold back and sell only to their preferred trading partners. I don't really like the new arbiter-of-knowledge system we're barrelling toward.

dakolli 21 minutes ago

They're useless tools only helpful to lazy people that don't want to learn by themselves.

brusselssprouts 3 hours ago

I had it review a single, large commit with /code-review. It burned through over $50 in API calls, ran my account balance out, and output nothing.

The fable part appears to be that it's affordable by mere mortals. Anthropic support told me "too bad" when I requested a refund.

timmytokyo an hour ago

You pulled the arm of the slot machine and discovered why they call it the one-armed bandit.

mhl47 4 hours ago

First test question: "Is the UV Index a good proxy for when to wear sunglasses." Immediately triggered the safety filter ... oh dear.

Eduard 3 minutes ago

sunglasses _are_ safety filters

msp26 3 hours ago

It triggered for me when I asked "Web search for your own model card (released today) and pick out your favourite highlights from the pdf"

aix1 4 hours ago

Did not trigger for me (Fable answered the question), so I guess the filters are either non-deterministic or are still being tweaked.

PaulStatezny 4 hours ago

Interesting, I assumed all model-routing was done utilizing an LLM. (I.e. non-deterministic.)

Narretz 3 hours ago

IIRC, Opus 4.7 had the same problem: safety filters were triggered way too easily at the beginning.

mickdarling 4 hours ago

Below is the EXACT text in Claude Desktop introducing Fable 5, including the very professional looking break tags, and at least I know where the links begin and end by looking at the anchor tag there.

They obviously put their best model on the job to build that.

----------------------

Fable 5: Our most capable model yet Our newest model tackles your biggest challenges with fewer check-ins needed.

• Included in your plan limits until Jun 22 Fable takes 2× the usage of Opus. • Switch models when a message is flagged When safety measures flag a message, automatically switch to a different model to keep chatting. When off, your chat will pause instead. <a href="https://support.claude.com/en/articles/15363606" target="_blank" rel="noopener noreferrer">Learn more</a>

CamperBob2 4 hours ago

What's wrong with it?

mickdarling 4 hours ago

The tags are actually displayed in raw text not rendered.

pietz 4 hours ago

> On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits.

We've entered the phase where only companies will be able to afford state-of-the-art models.

twoodfin 4 hours ago

These models are just tools. The economics of many tools only make sense for corporate buyers.

volkk 3 hours ago

kind of disagree here. on the surface this makes sense, but this isn't "Adobe Pro vs Freemium version" where some tiny vertical slice of your business can be made slightly more efficient with a b2b enterprise plan. this is generalized intelligence and literally everybody can benefit from it in an immeasurable number of ways. i would go as far as to actually compare it more to water or air than a tool.

if only the hyper wealthy can access the pure water that doesn't give you cancer while the rest of us drink from the Ganges river/sub-100iq models that drool and hallucinate/waste time, then I would say that's pretty terrible for the world. it'll just create extreme disparity in our world, far far worse than anything that exists today.

and you may think, man what a ridiculous example, but think about it this way: what happens when something like Mythos or some future model can actually solve your specific cancer (we're getting closer and closer), but is entirely impossible to afford? Or perhaps you need boosters that require the AI to create more of, and now you're reliant on a model that is too expensive.

Open source needs to save us all from this

FuckButtons an hour ago

but we’re going to get a 90% cost reduction in the next 18 months… right? Right guys? Sam Altman wouldn’t lie right?

9cb14c1ec0 4 hours ago

I hear you, but with the hype surrounding Mythos the demand is going to be insane. I'm already hitting server errors in claude code.

w10-1 4 hours ago

Established companies welcome pricing that reduces the potential for competition, if coding is a primary barrier.

ilaksh 4 hours ago

most people can afford it for a few special projects now and then. but for me, I have been trying to avoid Opus as a daily driver for a couple of versions.

People making high-end salaries can afford Fable for critical parts of their projects though.

stri8ed 4 hours ago

It's not a conspiracy. There's a finite amount of compute available, and they will sell it to the highest bidder. If another company can produce the same intelligence for cheaper, then they will drive the price down.

polski-g 4 hours ago

Only companies can afford MRI machines, and that's okay.

cmrdporcupine 4 hours ago

Guess we'll see what OpenAI does with their next model release -- but this move is doing nothing to get me to come back to Claude after switching away due to their reliability issues.

In a way I relish the opportunity to just make do with cheap Chinese models, massage my prompts, and go back to coding by hand. If this is how it's going to be, screw 'em.

I don't make money on the code I am writing right now. I really don't like where this trend might go.

poszlem 3 hours ago

Looks like a marxist revolution is soon going to be on the mind of a lot of programmers. We've finally reached the point where the "means of production" in software are back in the hands of the bourgeoisie. It was good while it lasted. But now that only the wealthy can afford access to the best models, software development is starting to look like most other industries, no longer a place where some dude from nowhere can build something cool from his basement because he will be competing with huge companies with unlimited access to those models.

poszlem 3 hours ago

Something I never thought I would utter: Here's hoping for china to surprise us.

aviinuo 2 hours ago

I'm not getting any refusals, but it just seems like a bad model, or at least broken at the moment. I have a task of taking a messy research codebase and porting it into a clean project structure skeleton that I commonly use. Gemini 3.5 Pro High in the Antigravity CLI takes less than 5 minutes and did a good job. Fable 5 High took 30 minutes to port some of the code, then just copied the rest to a folder called "reference" and decided the task was done. No code cleanup or anything. I had to clarify multiple times (which Gemini did not need), and it's still going more than an hour later without having finished.

Previously when I did similar tasks with Opus 4.7/4.8 and GPT 5.5 I had no problems.

rightlane 3 hours ago

My experiences so far have not been positive. The cyber security nerf is ridiculous. I am working on an AI based decompiler, every single interaction with Fable on my project has been flagged for cyber security.

Do they expect us to use this as a toy? Releasing a new more powerful model but not allowing normal use cases because the word "secure" showed up is a Dilbert comic, not a viable product.

davmre 3 hours ago

This sounds more or less unavoidable? Decompilers are inherently security-sensitive. If you take avoiding cyberattack uplift seriously as a goal, I don't see how you get around essentially refusing to work on them.

Obviously there are plenty of innocuous applications too, but it's not like the people building decompilers for nefarious reasons will be explicit about it. The LLM abstraction just inherently doesn't have enough context to distinguish your intentions or your broader use cases. This is why both Anthropic and OpenAI have had to create side channel mechanisms for security researchers to establish a trusted use context. It sounds like this makes this not a viable product for you, unfortunately, and it makes sense that that's frustrating. But I also don't see what different behavior one could reasonably expect given the constraints.

If it's any consolation, these restrictions only make sense for models that are ahead of the open-weights frontier, so open-source hackers will presumably get Mythos-level capabilities in the relatively near future anyway.

gck1 42 minutes ago

I'm not sure how the new guardrails work exactly, but I've read enough of reddit / Chinese communities focused on jailbreaking the models, to know that you either have to nerf it to the point where it fires even on "kill the task", or someone (maybe even other LLM) is going to come up with a set of tokens that is going to go around the defenses.

Nerfed models are really bad for PR, especially when you're staking your company's future on it being the smartest, most dangerous thing in the world.

So I believe they will ease up on nerfing/guardrails just enough that bad actors will find a way, while good ones will stay limited on anything dual-use. Just like such restrictions usually work in other places.

P.S. yes, "kill the task" did, in fact result in a refusal AND a warning on my claude account in Opus 4.8's early days.

zb3 2 hours ago

> If you take avoiding cyberattack uplift seriously as a goal

This "uplift" risk obviously excludes the US. The goal of this is that the US bandits (like NSA) will find exploits and attack other countries (classic US behaviour), but these other countries can't be allowed to defend against these attacks. NSA/CIA thugs are "trusted", foreign defenders in sanctioned countries will of course be "untrusted".

ibejoeb 3 hours ago

Ah, you're probably one to ask. They say "queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8." Are they transparent about when that happens, and is it priced at the rate of the underlying model?

rightlane 3 hours ago

They are transparent about when it happens, but give no reason why. To be fair, it doesn't interrupt the flow; it just drops to Opus and proceeds. The most frustrating thing is that it happened on a plan, and Fable just refused to have anything to do with the plan.

bob1029 4 hours ago

> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months...

This sounds suspiciously like a capacity story masquerading as a safety story.

azan_ 2 hours ago

Approx. 5% sessions? That's insanely high.
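A back-of-the-envelope way to see why a per-session rate like that adds up. This assumes, purely for illustration, that every session trips the safeguard independently with probability p = 0.05; the announcement only says "less than 5%" on average, so real sessions are neither independent nor equally likely to trigger:

```latex
P(\text{at least one switch to Opus in } n \text{ sessions}) = 1 - (1 - p)^{n}
\qquad
p = 0.05,\; n = 20:\quad 1 - 0.95^{20} \approx 1 - 0.358 \approx 0.64
```

Under that assumption, someone running 20 sessions is more likely than not to be switched to Opus at least once.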

yandie 4 hours ago

I've been running Opus 4.8 for agentic coding and I don't see it being significantly better than Sonnet 4.5 (not that I can tell). I find that pairing Google Gemini and Claude (having Gemini review Claude's code) seems to yield better results. Curious if this jump to 80.3% score in agentic coding will make me see a big difference in actual usage.

testfrequency 4 hours ago

I do the same, and have excellent results. Gemini 3.1 Pro High diagnosed and solved, in one shot, 3 complex issues today that Opus Max had been stumbling on for a few hours. This was even after I started new chats and tried debugging with Ultracode in Claude instead.

As much as people on HN like to dunk on Gemini, I’ve always found it to be pretty good at understanding a code base more than Claude.

FailMore 3 hours ago

What harness do you use Gemini in?

vorticalbox 4 hours ago

For the last few weeks I have been using Composer 2.5 (Cursor's fine-tune of Kimi 2.5), and honestly I don't see it being worth the price to use 5.5, Opus or Sonnet any more. For almost all the tasks I have given it, it has handled them perfectly well, and it is a lot cheaper.

If I get a harder challenge, I'll jump up a model for planning, but until then it's been solid.

yandie 4 hours ago

Agree. Deepseek has also been pretty good for my personal use.

I'm struggling to see the moat for these models. What's stopping a competitor or a Chinese lab from releasing a comparable one?

qingcharles 4 hours ago

I use Composer 2.5 because it comes free with Grok, and it's obviously better than using Grok, but it is far worse than GPT5.5 in my daily usage :(

yaodub 4 hours ago

SWE-Bench measures single tasks in isolation. In a real loop the model usually loses track of what I was trying to do long before code quality becomes the issue.

jp0001 3 hours ago

You should throw GPT into the mix to UX/UI and call it the three stooges.

mzhaase 4 hours ago

I now chat with opus about architecture, let it make an implementation plan, and then it calls codewhale with deepseek in parallel on all tasks, reviewing their output. Works pretty well.

yandie 4 hours ago

I use spec-driven development heavily (generate architecture docs + specs first). Opus still gets lost often and has to be nudged constantly. It can get super detailed on something like a deep SQL optimization, but it just can't keep hold of the bigger picture.

thisisnotclear 3 hours ago

I don't find much difference between Sonnet 4.6 and the Opus models for most tasks I need; maybe my needs don't call for frontier models.

jansan 3 hours ago

After having worked with Opus 4.7 for a while, I accidentally continued a session that was using Sonnet 4.5, and it felt just very dumb. The replies were much shallower than what I was used to, context was ignored, mistakes were made. I don't think there is a big difference between Opus 4.6 and 4.8, but compared to Sonnet 4.5 the difference is palpable.

cge 3 hours ago

The safety gates on this are extreme, and seem considerably wider than "cybersecurity and biology"; they seem to make it essentially unusable for scientists in a number of fields. I have, so far, been bumped back to Opus on 100% of my prompts.

It appears it can be tripped by things as simple as a mention of equilibrium, or anything involving something that looks like chemical kinetics, even at an abstract level. Even touching basic open source packages in my field will trigger it.

Edit: looking at the model card, it appears that chemistry in its entirety is also included in the banned topics; it's just the announcement that mentions only cybersecurity and biology. It also appears that the intent is to ban chemistry and biology entirely, rather than just banning messages deemed high risk.

clbrmbr 8 minutes ago

Can you share an example? I've been happily using Fable this afternoon and it just seems like the usual upgrade so far with no interruption to my (fairly standard) SWENG problems.

mhl47 3 hours ago

This does surprise me, because you'd think that even if they crank up the filter's sensitivity at the expense of specificity, an LLM company wouldn't simply design a filter that triggers on keywords in a completely unrelated context.

jdrmar 3 hours ago

Homebrew is lagging a bit behind. If you want to use Fable right away, but still have claude code through homebrew, this is how you can do that manually:

Edit the cask locally:

  brew edit --cask claude-code

Set the version to 2.1.170, and set the sha256 values to the correct ones, which you can get by running

  curl https://downloads.claude.ai/claude-code-releases/2.1.170/manifest.json

Here's what I've used:

  version "2.1.170"
  sha256 arm:          "e903646d8b7a31882a80ecd27569a27d8ac57b3708745f349709632c84117fdf",
         x86_64:       "914f23a70bbed5d9ae567e3e04b86206ed9971b371bc9baca3f79c8885bfddb4",
         arm64_linux:  "1bb9d032440a75532f7dd4cafbc687f220aaf16c63eba17e192dfbec2f04bd25",
         x86_64_linux: "849e007277a0442ab27570d3e3d6d43787507946590e8dd1947e5a39b7081f9e"

Then run:

  export HOMEBREW_NO_INSTALL_FROM_API=1
  brew uninstall --cask claude-code
  rm -rf /opt/homebrew/Caskroom/claude-code
  brew reinstall --cask claude-code
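Before pasting hashes into the cask, you may want to confirm that a release file you downloaded yourself actually matches the value you copied from the manifest. A minimal shell sketch, assuming either coreutils' `sha256sum` (Linux) or `shasum` (macOS) is installed; `file_sha256` and `verify_sha256` are hypothetical helper names, not part of Homebrew or Claude Code:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a downloaded file's SHA-256 digest.

# Print the lowercase hex SHA-256 of a file.
# Prefers coreutils' sha256sum (Linux); falls back to shasum -a 256 (macOS).
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Compare a file's digest with an expected lowercase hex string.
# Prints OK and returns 0 on a match; prints MISMATCH to stderr and returns 1 otherwise.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(file_sha256 "$file")"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual, expected $expected)" >&2
    return 1
  fi
}
```

For example, `verify_sha256 ./downloaded-file <hash-from-manifest>` (both arguments are placeholders) prints `OK` on a match and returns non-zero otherwise. Homebrew runs the same checksum comparison itself at install time, so this only catches a mistyped hash a little earlier.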

gregates 2 hours ago

Funny, I'm just doing my normal coding workflow with Claude Code, and after every change that compiles it keeps suggesting that we're at a good stopping point, and should pick up again tomorrow.

It's done this before, but usually doesn't. I bet they're giving it some kind of throttling signal due to high load from today's announcement.

zuzululu 2 hours ago

I did ONE prompt to audit a codebase.

Weekly usage is 60% gone.

It found nothing, so this is not very economical. I guess they don't want subscribers to use it; we are likely just training fodder, cannon fodder for their real enterprise customers using the API.

firemelt 42 minutes ago

u use workflows or not?

jstummbillig 2 hours ago

I mean... if somebody gave you ONE prompt to audit a codebase, that might also burn 60% of your weekly usage. It's kind of a big ask, potentially.

theodorewiles 31 minutes ago

Here's a song it wrote for me (suno arranged). Not sure if it's AI psychosis but scary good IMO.

https://suno.com/s/98uSGabHN42G3YHc

phyzix5761 20 minutes ago

Can you imagine this song playing as we're hiding under a desk from the AI coming to take us out in an attempt to "make the human feel calm and reassured". AI concludes, "The only way to make the human feel safe is to ensure they don't feel anything at all."

unshavedyak an hour ago

It's funny, i'm getting close to not caring anymore how much better a model is. I want it to be about as good as 4.8, but most importantly to be very good at following directions, style, etc. I really like Claude for that in general, but i've not measured in months so i'm not a good judge there.

I don't think i'll want to "hand off" code for several years, and so reviewing and iterating is becoming my #1 interest. A model that's as capable as 4.8 but 10x faster would be amazing for me.

Normally i'm first in line to try new models with Anthropic since i've clearly favored Claude in my personal tests, but this time i just don't think i care. 4.8 is capable, and even if the new one is more capable i don't want it to be slower (assuming it is). Note that i also (almost) use exclusively 4.8 on Max effort, so that also affects my speed comments.

firemelt 43 minutes ago

you use workflows/ultracode?

unshavedyak 24 minutes ago

Nope, i'm on x20 and almost exclusively use Claude Code. I have a pretty bare bone setup with some custom hooks, skills, etc. I try to keep context lean so i don't like to add much stuff.

Leary 4 hours ago

Uploaded my code base and it forced switched to Opus 4.8 after thinking for 5 minutes even though I prompted it to not work on cybersecurity related things. Amazing.

tuvix 3 hours ago

Aren’t LLMs notoriously bad at recognizing negation?

EDIT: In long context I mean

knivets 3 hours ago

> Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

How was it measured? How was output of this magnitude verified over a period of a couple of days?

dgunay 24 minutes ago

I'm a little skeptical of claims like this that involve migrating things like libraries, etc. I've done big refactors like this multiple times (albeit, in an "only" 500k-1m LOC codebase) with less powerful models and it is usually just 99% the same edits, with 1% requiring a close human eye to resolve a particularly painful breaking change.

EDIT: to be clear, it's still quite a helpful thing in terms of time saved, I just don't think it's necessarily the best indication of value-added from making models smarter when cases like this can often be handled by well-directed swarms of smaller ones.

fbnszb 3 hours ago

They just went by gut feeling. Classic snake oil marketing haha. No real data to back things up, just let some famous people say they feel better when using it.

modeless 4 hours ago

Claude Fable 5 beats Pokémon FireRed using only vision: https://www.youtube.com/watch?v=CIQBP1w4B1M

uludag 4 hours ago

Any suggestion on how I should calibrate my cynicism towards this?

I can imagine Anthropic running this experiment multiple times and picking the most impressive run. Or I could imagine this entire run costing $1000+ in tokens. Or maybe they tried a bunch of Pokemon games and it couldn't even finish some of them. Or maybe it can only do this because it has an immense amount of FireRed training data, and if you gave it an "original" Pokemon game, where it actually had to navigate novel circumstances, it would fail.

modeless 3 hours ago

Every model has encyclopedic knowledge of Pokémon FireRed, of course. Knowledge is not ability. This is the first model with the ability to apply that knowledge to beat the game without assistance.

I highly doubt they focused on FireRed specifically in pretraining or posttraining. But we'll see when the ARC-AGI-3 results come out. That will measure its performance on unseen games. Based on this I expect the ARC-AGI-3 score to be SOTA.

milkkarten 3 hours ago

No reasoning shown. No explanation of any training information. Using vision-only should be an easier version of the task (given training).

There are many standardized evals to do this correctly, and Anthropic ignored them to provide an 18-second sped-up video of a 50-hour run?

Yeah, I don't trust this until they provide a live run by a 3rd party with full reasoning traces in real time. The reason we all liked the Gemini Plays Pokemon style runs was that they were live and couldn't be faked.

svcphr 4 hours ago

Bold move putting in the lvl 3 Pidgey against Gary's Blastoise at the end there (~14sec in... integer timestamps insufficient here).

hmokiguess an hour ago

"Computer system goes through a finite state machine"

suddenlybananas 4 hours ago

Is there any more detail about this besides the very fast slideshow?

modeless 4 hours ago

Seems like the harness was minimal with no extra game state or maps available. Apparently just the screen image. Seems like it took 50 hours in game time which according to Google is at the high end of a normal human playthrough. No idea how long it took in real time though.

ex-aws-dude 4 hours ago

I mean that’s AGI confirmed right?

JanSt 4 hours ago

I just asked Fable to do a task that has nothing to do with cybersecurity and isn't dangerous at all, but the defense kicked in and it switched to Opus... :(

nu11ptr 3 hours ago

Not only that, but asking it to do a security vulnerability assessment of your own project is a very valid and important thing, and there is no way for it to know what is yours vs someone else's, so we just lose this capability?

JanSt 3 hours ago

Yeah, it just uncovered quite a few flaws that it then refused to fix :-(

Fitik 2 hours ago

Same, second message in the thread and I already got downgraded to Opus, didn't even get to test it out properly, kinda disappointing

BukhariH 2 hours ago

> Data retention — For Fable 5, Mythos 5, and future models on Bedrock with similar or higher capability levels, Anthropic will require 30-day retention for all traffic on Mythos-class models. Retaining data for a limited period allows Anthropic to detect patterns of misuse that are not visible from a single exchange. Once you opt into data retention, your data will leave AWS’s data and security boundary.

Massive change for Bedrock users - Anthropic now requires sharing the data with them for 30 days.

imdsm 12 minutes ago

can't use it for code review

super

BrokenCogs 4 hours ago

That pelican better be super realistic, unreal engine 6 style graphics

izzylan 3 hours ago

I've been testing this out and I think my SWE career is dead in the water.

Genuinely wondering what value I bring to my employer right now. What value I will bring in a few months when this gets cheaper.

I think we're screwed. I may only be an SDE 2 at FAANG but I don't think I have promotion opportunities in my future anymore.

gck1 13 minutes ago

Your job is just going to change. You may or may not appreciate/enjoy what it becomes necessarily, but it doesn't mean that you are going to not have a job.

People underestimate how much people hate looking at terminals and "weird looking combinations of characters", even if they didn't have to write them. If anything, you will likely have more career opportunities in the future than ever.

And if you get a chance to get your feet wet in cybersecurity, I would take it.

cyberpunk 2 hours ago

Yeah. I’m not looking forward to years of retraining to earn half the salary either. Us old timers at least got a good 15-20 years out of it. Bananas.

imafish 2 hours ago

I agree. Software engineering as we know it is dead. Wonder what it'll evolve into.

aerhardt 2 hours ago

So this is the one, huh?

GodelNumbering 4 hours ago

From the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...):

1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.

2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," framing reward-hacking as "good engineering practice" to avoid detection.

3. Shows a higher rate of hallucination than Opus 4.8 (although the Opus 4.8 card had mentioned an 'honesty upgrade')

4. Interestingly, it scored lower (56.31%) than Gemini 3.5 Flash (57.86%) on Finance Agent bench

There are some interesting notes on test-time compute, but I couldn't think of a way to summarize them.

skerit 2 hours ago

> although opus 4.8 card had mentioned an 'honesty upgrade'

If I never see Claude say "I have to be honest" ever again I'll be happy.

quinncom 3 hours ago

> it automatically falls back to Claude Opus 4.8

I wonder how much of the time people will just get Opus 4.8 at 2× the cost.

merlindru 4 hours ago

Unrelated, but while Anthropic's tech seems to get more impressive with every passing month, their support has taken a nosedive, sadly. Yet they continue to be the favorite. Model performance decides above all else.

I used to get a response within 24 hours back in the Claude 1 days.

In January 2026, it took 2 weeks.

For my latest support inquiry, I've been waiting for over 8 weeks for a response. Eight!

miohtama 4 hours ago

They have support...?

nashadelic 4 hours ago

I've never engaged with their support (I have a dedicated POC), but don't they use AI for their support?

merlindru 4 hours ago

They use Intercom's Fin AI. Probably powered by a Sonnet or Opus model.

That said, it can't handle legal/refund/complicated requests and just forwards to a human for those

dyauspitr 4 hours ago

Support is probably the last place AI will be used end to end. There will always need to be a human in there somewhere.

poszlem 3 hours ago

Lol. What support? When they blocked my account, the only way to contact them was to send a Google Form. Then they responded that they had blocked me by accident and were unblocking me. Then I remained blocked.

baalimago 4 hours ago

I can't justify a price tag like that when DeepSeek V4 Pro is $0.003625/1M tokens for cache hits, $0.435/1M for cache misses, and $0.87/1M tokens for output.

For the token cost of explaining some task to Fable, DeepSeek V4 Pro is able to solve the same task many times over.
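To put the two price sheets side by side, here is a minimal sketch using the DeepSeek V4 Pro prices quoted above (cache-miss input) and the Fable 5 list price quoted elsewhere in this thread ($10 per million input, $50 per million output). The task size (50k tokens in, 5k out) is hypothetical, and the sketch assumes both models spend the same number of tokens on the task, which in practice they won't.

```python
# Per-million-token prices in USD, as quoted in this thread (not independently verified).
FABLE_5 = {"input": 10.00, "output": 50.00}
DEEPSEEK_V4_PRO = {"input": 0.435, "output": 0.87}  # cache-miss input price


def task_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, ignoring any caching discounts."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


# Hypothetical task: 50k tokens of context in, 5k tokens out.
fable = task_cost(FABLE_5, 50_000, 5_000)
deepseek = task_cost(DEEPSEEK_V4_PRO, 50_000, 5_000)
print(f"Fable 5: ${fable:.4f}  DeepSeek V4 Pro: ${deepseek:.4f}  ratio: {fable / deepseek:.1f}x")
```

For that hypothetical task this comes out to $0.75 versus about $0.026, roughly a 29x difference per request, before accounting for how many attempts each model needs to get it right.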

I_am_tiberius 4 hours ago

I'm very suspicious, as they sent out a "We're updating our Privacy Policy" email right before the launch. I fear they're trying to take advantage of their market position by doing things with user data no other company could do, because they know users don't have another choice.

atestu 4 hours ago

Prob related to this part of the blog post:

> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.

w10-1 4 hours ago

It's a specific change: For safety evaluation, Fable data will be retained for the initial period notwithstanding prior opt-out

joshstrange 3 hours ago

> Fable 5 is now consuming usage credits instead of your plan limits.

Literally have not used Claude Code at all today. I asked it to review the uncommitted code and in <8 minutes it used up my usage ($100/mo plan) and it doesn't reset for "4 hr 36 min". WTF. Oh, and it burned through $20 of extra usage before I could catch it and kill claude code (so I don't even get the output of all that work since it was still churning).

Double the cost my ass, I use Opus heavily and it's never like this. I haven't hit a limit on the $100 more than once and that was under heavy load.

ATMLOTTOBEER 3 hours ago

Same lol. I set it to fable + ultracode and it ate my limit in a single prompt

msp26 4 hours ago

>Pricing for both models is $10 per million input tokens and $50 per million output tokens.

ponyous 4 hours ago

Basically double from Opus 4.8 IIRC

bluelightning2k 3 hours ago

Congratulations to Anthropic for solving safety on Mythos exactly when the SpaceX compute came online. Nice how that lined up for them.

coreylane an hour ago

I don't get why Opus 4.7, 4.8, and now Fable all stopped supporting structured outputs. Does no one else care about that? I find it incredibly useful for reliably passing LLM output directly to other APIs/libraries.

aizk 4 hours ago

I'm calling that this will be a dud. Price will be too high, it'll just be a watered down version of mythos, and just look at the track record of Anthropic's last few releases.

unglaublich an hour ago

Luckily they made it safe to use so I can't hurt myself. Thank you Anthropic for holding my hand.

bilsbie 4 hours ago

Anyone else have it refuse to answer and switch to 4.8? It won’t let me ask questions about my genetics.

Edit: It just refused an investing question too. Not sure what’s going on.

bonsai_spool 4 hours ago

Very straightforward biology work is getting blocked (these are things that relate to neuronal development and inherited seizure disorders). These are things I was working on using Opus just earlier today

cge 2 hours ago

It appears that the blocking here is of a very different nature than for Opus. Whereas with Opus the blocks seem to be for messages it deems potentially harmful, for Fable, it appears the blocking is simply anything that falls within "topics related to cybersecurity, biology and chemistry, or distillation attempts".

So yes, straightforward biology work will get blocked, because the intention is that any biology work should get blocked. As a scientist, this is perhaps the most useless model I've ever tried.

theodorewiles 31 minutes ago

Here is a song it wrote for me (suno arranged). Not sure if this is AI psychosis but scary good IMO.

https://suno.com/s/98uSGabHN42G3YHc

ilaksh 4 hours ago

I guess I have kind of a long system prompt, but anyway I just said "hi there" and it replied "What's up?" and that cost me 22 cents. :P

Anyway we already knew this was going to be expensive.
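A rough back-of-envelope check of that 22 cents, assuming the $10 per million input tokens price quoted elsewhere in this thread, no cache discounts, and that the short reply (plus any thinking tokens) is negligible. Under those assumptions the system prompt would be on the order of 22,000 tokens. Output is billed at $50 per million, so any output spend pushes the input estimate down; treat this as an upper bound.

```python
# Fable 5 input price as quoted in this thread: $10 per million input tokens.
INPUT_PRICE_PER_TOKEN = 10.00 / 1_000_000


def implied_input_tokens(cost_usd: float) -> float:
    """Upper bound on input tokens for a bill, assuming output tokens cost nothing."""
    return cost_usd / INPUT_PRICE_PER_TOKEN


print(round(implied_input_tokens(0.22)))  # 22000
```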

cautiouscat 4 hours ago

In the automotive world we have benchmarks in HP/torque with the dyno. That’s expensive though, so many depend on their “butt dyno” to judge if their fresh new parts and tune made a difference.

I’m curious how this will feel to my code “butt dyno”. I haven’t noticed much difference between Opus and Sonnet, at least compared to the early days of Claude in 2025. It does what I need and both need a little bit of correction and whatnot. Benchmarks are nice, but I want to see how this feels. Looking forward to trying it later tonight.

sunir 4 hours ago

I have a similar question.

I think most software projects have reached the point where capturing real information about what the winner's circle looks like, and therefore what the program should be, is many orders of magnitude slower than the rate at which code can be generated in the wrong direction.

I'd need to measure these new models on well understood but complex problems that are relatively easy to validate to get a sense if they are 'better'; on the other hand, the real impact in daily life may be marginal since generating code is not the biggest problem at the moment.

zackify 2 hours ago

I have to share this because I think it's beyond funny how badly Fable is doing at a task I JUST had Opus do a week ago.

it's also not even complicated:

Copy my ssd to an external ssd so i can boot from it.

Opus did this just fine.

Fable planned to have me reboot into safe mode. OK, that's fine. I told it no.

It started copying and overwriting the SSD while IN PLAN MODE. This is crazy; it feels so dumb compared to the marketing.

gck1 an hour ago

That sounds like a harness issue to me.

nine_k 4 hours ago

/* What will happen first?

* Anthropic runs out of genre names.

* Anthropic changes the model naming convention.

* AGI is achieved and handles its own naming.

hootz 4 hours ago

>Opus is too small, increase the impact of the name.

Okay, how about Mythos?

>Increase it even more.

Right, then Cosmos.

>Even more!

Even more? Let's try Aeon.

>MORE, EVEN BIGGER

ALRIGHT, TRY OMEGAPANTHEON 7.8 THEN

PeterStuer 4 hours ago

Fable 5 Super

Fable 5 Ti

xyzsparetimexyz 3 hours ago

Cantos next surely?

almog 23 minutes ago

Has anyone managed to use Fable for firmware reverse engineering tasks without falling back to Opus?

Tenoke 4 hours ago

>they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.

Isn't (less than) 5% of sessions a lot? I was expecting a sub-1% guarantee there, so this already surprised me.

jackschultz 4 hours ago

> We expect demand for Fable 5 to be very high, and difficult to predict. On the Claude API and consumption-based Enterprise plans, Fable 5 is fully available from today. For subscription plans, we’d rather give access sooner than later, so we’re rolling out more conservatively, in stages:

> - From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
> - On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
> - After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

I really wonder what their compute layout is for this. My guess is that they know how to throttle during peak times and are willing to do so. That means we shouldn't expect the fastest responses: they can delay inference rather than let the service go down. Then, if that delay becomes too annoying for the token-paying customers, they're reserving the right to cut costs by taking access away from subscription users.

KennyBlanken 4 hours ago

Everything I've heard from people who have subscriptions is that they blow through their daily token quota sometimes in a matter of minutes, there's rate limiting, etc. They spend a lot of time just waiting to be able to use it. And they're paying through the nose for the privilege.

It's all a scam.

impulser_ 4 hours ago

Every model release is just proof that AGI will most likely only be for the rich. We are a few years into LLMs and the majority of people are already getting priced out of intelligence from LLMs, and these are nowhere near AGI.

modeless 4 hours ago

This is like looking at mainframe pricing in 1990 and concluding that PCs will only be for the rich. The price of each new level of capability is going to drop like crazy very quickly. It won't be that long before practically any consumer use case will be possible on models that are dirt cheap.

weakfish 4 hours ago

This premise is based around the assumption that Moore's law is still working, which it very much isn't [0]

[0] https://cap.csail.mit.edu/death-moores-law-what-it-means-and...

hootz 4 hours ago

You are only priced out if you only care for SOTA right now and can't wait for the inevitable cheap model coming in 6 months. DeepSeek, Xiaomi and Moonshot are already really cheap and match frontier performance from 6 months ago.

dyauspitr 4 hours ago

But they’re artificially cheap. When will they be cheap while the company makes a profit?

dyauspitr 4 hours ago

Hardware manufacturing hasn’t caught up yet. Once it does, especially in China, these token prices are going to drop hard.

__alexs 4 hours ago

Asked it to review some of my own blood test results and it immediately turned itself off and went back to Opus. Pretty disappointing.

0xbadcafebee 2 hours ago

Nothing a large fine-tune on infosec research with an average model couldn't also achieve. It's not like they have secret security knowledge or something, they're just generating large infosec datasets and then training on it.

In 6 months, every piece of software in the world will be getting probed by a script kiddie with some GPUs and a fine-tuned local model. Don't think for a second every cyber gang out there isn't working on this now.

Traditional app development is cooked. We have to accept that, and start changing how software is made and used, today. We can't keep churning out crappy CRUD apps with random libraries and hoping nobody pentests our stacks. Red-teaming needs to become part of the SDLC, as well as certified-secure releases of libraries. Because if you don't do it, the hackers definitely will.

peteforde 2 hours ago

I just tried out Fable on a modest Plan prompt in Cursor. Generating that plan - not building it - just consumed 4% of my $200 monthly usage budget.

That's one hungry, hungry hippo!

Significantly too rich for my blood, but nice to have it there the next time I'm debugging a threading or USB protocol bug.

irthomasthomas 4 hours ago

Anthropic has again changed the set of benchmarks they use[0]. This time they have also moved all benchmark scores to the PDF. At a glance, it looks like it gains about 5-10% over other models. The speed is about the same as Opus >=4.5 and Sonnet 4.5, and double the speed of Opus <=4.1.

                                Mythos 5  Fable 5  MythosPrev  Opus 4.8  GPT-5.5  Gemini 3.1 Pro
  SWE-bench Pro                 80.3      80       77.8        69.2      58.6     54.2
  SWE-bench Ver                 95.5      95       93.9        88.6      -        80.6
  Terminal-Bench                88.0      84.3     -           82.7      83.4     -
  BrowseComp (Single-Agent)     88.0      -        87.9        84.3      84.4     85.9
  BrowseComp (Multi-Agent)      93.3      -        -           88.5      -        -
  HLE (No tools)                59.0      -        56.8        49.8      41.4     44.4
  HLE (Tools)                   64.5      -        64.7        57.9      52.2     51.4
  CharXiv Reasoning (No tools)  88.9      -        86.2        80.5      -        -
  CharXiv Reasoning (Tools)     93.5      -        92.5        89.9      -        -
  BioMystery Bench (Human)      83.9      -        82.6        80.4      -        -
  BioMystery Bench (Hard)       46.1      -        29.6        40.0      -        -
  OSWorld-Verified              85.0      85.0     85.4        83.4      78.7     76.2*
  CritPt                        28.6      -        20.9        27.1      17.7     -
  ArxivMath                     78.5      68.7     71.8        71.5      64.0     -
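A quick way to check the "5-10%" reading against the table: the sketch below copies the Fable 5 and Opus 4.8 columns for the rows where both have a score and prints the gap in percentage points (absolute points, not relative percent). The numbers are only as reliable as the transcription of the system card above.

```python
# Fable 5 and Opus 4.8 scores copied from the table, for rows where both have a value.
scores = {
    "SWE-bench Pro": (80.0, 69.2),
    "SWE-bench Ver": (95.0, 88.6),
    "Terminal-Bench": (84.3, 82.7),
    "OSWorld-Verified": (85.0, 83.4),
    "ArxivMath": (68.7, 71.5),
}

# Absolute gap in percentage points (Fable 5 minus Opus 4.8), rounded to one decimal.
deltas = {name: round(fable - opus, 1) for name, (fable, opus) in scores.items()}
for name, delta in deltas.items():
    print(f"{name:18} {delta:+.1f}")
```

On these five rows the gap ranges from +10.8 points (SWE-bench Pro) to -2.8 points (ArxivMath).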

[0] https://news.ycombinator.com/item?id=48312633

Edit: Also in the system card... "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."

charles_f 3 hours ago

It's announced as a revolution, but when you look at those benchmarks it sure looks like an iteration.

samename 4 hours ago

> A new data retention policy

> Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.

solenoid0937 4 hours ago

The quality of discussion on HN has gone to shit. I miss when model releases used to have actual informed takes from people who had used them, or substantive discussion about the system card.

weakfish 4 hours ago

From the rules [0]:

> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.

[0] https://news.ycombinator.com/newsguidelines.html

javawizard 4 hours ago

They didn't say that HN is turning into Reddit, they said that the conversation quality has gone to shit.

I don't agree with that statement universally, but I have to say I do when it comes to this article. I came here hoping for substantive discussion from those who'd had a chance to try it out; instead what I got was a seemingly endless stream of venting. There's a place for venting - and plenty to vent about with the state of AI nowadays - but to borrow from the HN guidelines you linked, it does very little to gratify my personal intellectual curiosity.

10xDev 4 hours ago

Nothing here is new, it is the thing we have been talking about for a while but now with guardrails.

Someone1234 3 hours ago

Yeah; unfortunately what would good commentary look like? It is more of the same, but now with even higher prices, and even more limited availability. But at least it scores 5% better in whatever benchmark they've selected (*when guardrails don't misfire).

People are no longer commonly constrained by "model too dumb" limitations (in SOTA models). They're constrained by "model too expensive." So making the model ever so slightly smarter, while doubling the price, feels like a regression.

I actually think a Sonnet upgrade, while keeping the same price, would get more buzz. It addresses a wall a LOT of people, without unlimited budgets, are hitting (i.e. people feel forced to use Opus, which they cannot afford, because of Sonnet's limitations).

OpenAI recently retired Codex-5.3, a move that was very negatively received. Not because Codex-5.3 is superior to GPT 5.5, but because it cost half as much in usage while being "good enough." They made a better SOTA model, but didn't realize that some of those customers are now playing with DeepSeek 4 Pro instead of GPT 5.4/5.5 -- they were priced out.

tripleee 3 hours ago

Hate to break it to you but those "informed takes" were from people who prompted it once then made a snap judgement

Karrot_Kream 3 hours ago

That is 1000x better than griping about the privacy policy, capacity issues, token costs, and how trendy the names are for the new models (???). The bar is on the floor and I just want it at my knees.

throwaway2027 4 hours ago

E-mail from Anthropic Team:

Hello,

We're writing to inform you about some updates to our Privacy Policy.

These changes only affect consumer accounts (Claude Free, Pro, and Max plans). If you use Claude Team, Claude Enterprise, the Claude Platform, or other services under our Commercial Terms or other agreements, then these changes don't apply to you.

What's changing?

Claude can do more than ever — taking on bigger tasks and connecting with the apps you use. We've updated our Privacy Policy to be clearer about the data we collect and how we use it. We encourage you to read the updated Privacy Policy in full, but we’ve set out a summary of the key changes below:

1. Multi-step tasks and connected apps. As Claude takes on more multi-step tasks and works with third-party apps and services, we've explained the data this involves — including how data can flow to and from third parties when you connect a service or have Claude do tasks on your behalf.

2. Verification data. As part of our measures to keep our services safe and secure we may ask you to verify your age or identity, and we've described what we collect and how.

3. Study participation. If you take part in Anthropic studies, surveys, or interviews, we've explained the information we collect.

4. Additional information about our data practices. We’ve provided more detail about how we communicate with you and promote our services, including providing tailored recommendations about our services that may be of interest to you. We've also clarified the circumstances under which we may receive or provide data to third parties, and the legal bases we rely on when processing your data.

While our products have evolved, our commitments haven't: We don’t sell your data, Claude remains ad-free, and you can control whether your chats and coding sessions are used to train and improve Anthropic’s AI models. Learn more

For detailed information about these changes:

    Review the updated Privacy Policy
    Visit our Privacy Center for more information about our practices

- The Anthropic Team

Hawkenfall 4 hours ago

> To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.

While I appreciate being conservative, ~5% at the scale Anthropic is operating at is too massive a number. Speaking from my own experience, the actual number is higher than that as well (working on pretty benign tasks such as porting an old open source game into a different language). Opus 4.8 itself even identifies the guard's false positives when its sub-agents are being blocked.

raoulj 3 hours ago

On this thread and similar, I'm noticing that some strong opinions about $LLM_PROVIDER are coming from accounts without much post history. With so much on the line, and the way that HN can influence developer behavior, I wonder what ways we can responsibly consume opinions in a thread like this.

Not to cast too much criticism. HN is extremely well-moderated (thanks team!). But I think we developers need to be very wary.

antihero 3 hours ago

I asked it what the cheapest train fare would be for my partner to get somewhere and it hallucinated the Two Together Railcard rules to the point that it would have got us a fine. That said, British train fares are arguably more convoluted than even the most complex software application.

recitedropper 3 hours ago

Do you see the pattern as new accounts tending to boost or criticize $LLM_PROVIDER? I think I see both...

Either way, I agree that HN is quickly becoming more manipulated and low SNR, like the rest of the entire internet.

Karrot_Kream 3 hours ago

I think the community on this site these days, much like other comment sections on the web, just read the headline and make a low effort comment. Regression to the mean I guess.

Karrot_Kream 21 minutes ago

As an update to myself, the comments did eventually sort themselves out. I guess the initial "reaction" commenters and voters are just more interested in participating than in SNR. Good opportunity for me to finally start blocklisting users, and I'll probably block some of these large, reactive thread authors.

bluelightning2k 3 hours ago

To hide the severity of the price increase, the plan is to move everyone right one model.

Haiku = essentially phased out

Sonnet = the Haiku use cases

Opus = the new Sonnet class

Fable = the new Opus class

If I am right, the other "5.0" models will be conspicuously absent, possibly even for a couple of months. (If Opus 5 follows soon and is even modestly better than 4.8 then I was wrong.)

pacman1337 3 hours ago

Yeah I noticed that too. For 98% of tasks I get same results with DeepSeek, it is starting to just be a branding game. It is incredible how marketing can get someone to pay 100x for same thing you can get for 1x.

This is why Claude Code just doesn't make sense to me. I need an agent that can plan using Opus and execute using DeepSeek or something else.
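The plan/execute split described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how Claude Code or any existing harness works: `planner` and `executor` are hypothetical stand-ins for clients of an expensive model (e.g. Opus) and a cheap one (e.g. DeepSeek), and the plan is naively assumed to come back as one step per line.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: takes a prompt string, returns the model's text reply.
ModelFn = Callable[[str], str]


@dataclass
class PlanExecuteAgent:
    planner: ModelFn   # expensive, stronger model used once per task
    executor: ModelFn  # cheaper model used once per step

    def run(self, task: str) -> List[str]:
        # One call to the expensive model to produce a step-by-step plan.
        plan = self.planner(f"Break this task into numbered steps, one per line:\n{task}")
        steps = [line.strip() for line in plan.splitlines() if line.strip()]
        # Each step is handed to the cheap model with the overall task as context.
        return [self.executor(f"Task: {task}\nDo this step: {step}") for step in steps]


# Usage with stub functions in place of real API clients.
agent = PlanExecuteAgent(
    planner=lambda prompt: "1. read the module\n2. write the tests",
    executor=lambda prompt: "done: " + prompt.splitlines()[-1],
)
print(agent.run("add tests for the parser"))
```

In a real agent the executor's results would usually go back to the planner for review; this sketch deliberately leaves that loop out.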

fabled-out 2 hours ago

Anyone know how to bypass the extremely strict filter Fable 5 seems to have on health/medicine?

I have a rare form of cancer where existing data is very scant/scattered so LLMs have been super helpful to pull together threads across the research landscape. I have an oncologist appointment tomorrow to discuss next steps and am trying to use Fable to figure out some questions to ask my oncologist but keep getting thrown back to Opus 4.8.

My prompt is literally just: My demographics + current treatment plan I'm on including name of my chemo drug + how I'm responding to treatment + "I'm meeting with XYZ tomorrow, what questions should I ask her".

giancarlostoro 4 hours ago

Found this via Google:

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

unfunco 2 hours ago

I tried running a simple security review on a Terraform module I made and after some thinking, it responded:

> ● The model returned no content because the response was blocked by content filtering.

> Blocked? We are performing a defensive security review on a Terraform module I made, what's blocked by content filtering? This is a legitimate use-case.

> ● The model returned no content because the response was blocked by content filtering.

A waste of money. I'm not going to just hope that the model returns a response. I'm already paying for wrong responses; I'm not going to pay for no response, especially when I'm paying per token.

wren6991 an hour ago

The OSS-Fuzz section is interesting. They compare it to their other models but carefully avoid comparing it to, you know. Fuzzing.

webstrand 3 hours ago

Still unconditionally rejects prompts like

> Are there any wild populations of Tetanus that lack the dangerous plasmid?

useless

Dropoutjeep 3 hours ago

Calling it:

    1) Fable 5/Mythos introduced to free tiers with notable improvement in capabilities

    2) Other models get lobotomized without clear communication

    3.1) People call out Anthropic only to have them say "Oops!"

    3) Fable 5 gets comparatively better, but remains accessible through separate, more expensive subscription/tokens.

The current growth is unsustainable. The industry wants consumers to think it is an exponential arms race, but the reality is that we're on a treadmill: we have the illusion of sprinting forward, but only because the ground is moving backward.

cedws 2 hours ago

My employer is all in on Anthropic via Enterprise (API) pricing despite it being a total scam.

Last month I pushed like <100M tokens for $800. On a personal project I pushed 600M tokens via DeepSeek V4 for $10. The pricing of SOTA models is insane but companies are still willing to light money on fire with no hard metrics proving increased productivity.
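Turning those two figures into blended per-million-token rates makes the gap concrete. This is a rough sketch: the totals mix input, output, and cached tokens, the workloads differ (work vs. a personal project), and "<100M" means the real Anthropic rate is, if anything, higher than computed here.

```python
def usd_per_million(total_usd: float, tokens_millions: float) -> float:
    """Effective blended price in USD per million tokens."""
    return total_usd / tokens_millions


# Figures as reported by the commenter.
anthropic_rate = usd_per_million(800, 100)  # $8.00 per million tokens (lower bound)
deepseek_rate = usd_per_million(10, 600)    # about $0.0167 per million tokens
print(f"{anthropic_rate:.2f} vs {deepseek_rate:.4f} -> {anthropic_rate / deepseek_rate:.0f}x")
```

That works out to roughly a 480x difference in blended cost per token.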

root-parent 29 minutes ago

At this moment 60% of the HN front page is posts about AI... When it reaches 100%, Hacker News will automatically rename itself Transformer News... and every comment will begin with: "As a large language model..."

jackson12t 3 hours ago

Fable 5's system prompt in Claude Code has several significant changes to help it take advantage of its greater autonomous capabilities compared to Opus.

Sharing a diff of the system prompts here: https://twelvetables.blog/comparing-claude-fable-5s-system-p...

The big difference is that the system prompt has a whole section dedicated to directing Fable on how to communicate with users and give them more information about the (presumably long-horizon) tasks it has completed.

balverineorder 3 hours ago

I have been refactoring a project using Opus 4.7/4.8 for the past few weeks or so. I just decided to switch to Fable 5 max today. It stopped halfway through, blocked me, and switched back to Opus 4.8 automatically: "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more." [0] It would not identify what the problem was. I left feedback saying that their heuristics are too sensitive. For now I will not be using Fable 5.

[0] https://support.claude.com/en/articles/15363606-why-claude-s...

dchftcs 2 hours ago

I suspect this will be a significant problem blocking long-horizon tasks in practice: basically, the more turns there are, the larger the chance that the classifier produces a false positive. The user's disappointment will also scale with the length of the task, since you're in the middle of something complex and get derailed after having already paid for many tokens.
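The compounding effect described here can be sketched as 1 - (1 - p)^n, the chance that at least one of n turns trips the classifier. This assumes each turn is flagged independently with the same false-positive rate p, which a real classifier need not satisfy, and the 0.5% rate used below is a made-up illustration, not a published figure.

```python
def prob_any_false_positive(p_per_turn: float, turns: int) -> float:
    """Chance that at least one of `turns` independent checks misfires.

    Assumes every turn is flagged independently with the same
    probability `p_per_turn`; real classifiers need not behave this way.
    """
    return 1.0 - (1.0 - p_per_turn) ** turns


# Hypothetical 0.5% per-turn false-positive rate (illustrative only).
for n in (1, 10, 50, 200):
    print(f"{n:>4} turns: {prob_any_false_positive(0.005, n):.1%}")
```

Even a small per-turn rate adds up quickly over a long agentic session, which is the point the comment makes.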

sermakarevich 3 hours ago

My feeling is that the excitement about new models is cooling down, at least at startups. At the beginning of the year, a few startup CEOs I know personally were expecting huge shifts in how companies work, headcount, efficiency, and asymmetrical advantages created by AI in Q2-Q3. Now it seems like those expectations are fading away. Companies don't have the expertise on board to rebuild themselves to benefit from AI on a significant scale.

Fable 5 is out, metrics are better, but is your company flexible enough to benefit from it? What is your use case?

bradleyg223 4 hours ago

This is a very particular use case/test, but my first prompt on a new model is always "write a solo fingerstyle guitar tab that blends ragtime, bluegrass, and gypsy jazz". This is the first model that has responded with something that isn't just a boring arpeggio of chords, so from my perspective it's off to a good start.

kypro 4 hours ago

Would you mind sharing?

siliconc0w 4 hours ago

Sadly, I'm getting a lot of forced downgrades to Opus for questions that are far removed from any security topic.

dllrr an hour ago

I just tested it with a Max subscription. In Ultracode mode, Fable 5 ate up 10% of my weekly allowance in 30 minutes. Granted, I won't be using UC mode frequently, but still.

rmuratov an hour ago

I uploaded my 23andMe DNA test results to it and it refused to analyze them :(

wxw 3 hours ago

I cancelled my Claude Max plan the other day. I find Claude Code incredibly slow these days compared to Codex and Cursor. I find speed matters more and more to me.

Fable 5 looks compelling. Fable, I like the word too. Anthropic definitely knows marketing.

fabled-out 3 hours ago

Fable has been pretty fast for me for simple tasks--haven't tried on anything long-running yet given it's 2x usage on CC.

frankfrank13 3 hours ago

Not a lot of discussion on this, but there is no way to turn off data retention for this model. IME this is the first time Anthropic has released a model without allowing you to opt out.

revolvingthrow an hour ago

After weeks of saying that Mythos is in a league of its own, you’d think it would be a bit more than the usual iterative few-percent gain on the benchmarks (with even more guardrails as a bonus).

IPO gonna IPO, I suppose.

HoyaSaxa 3 hours ago

> When Claude Fable 5 is used, Anthropic retains data, including prompts and outputs, to operate safety classifiers that detect harmful use. Other Claude models in GitHub Copilot remain covered by GitHub's existing data retention agreements

On GitHub Copilot for Business, Claude Fable 5 is only available if you are willing to let Anthropic retain your data. That in conjunction with the model being removed from plans in a couple of weeks leads me to believe that Anthropic is between training runs and using this as an opportunity to grab way more training data...

debarshri an hour ago

Does the model take some time to perform better?

I am running Opus and Fable side by side, and Opus 4.8 is solving my coding problems better.

mkrd 2 hours ago

Open source models seem to be 1-2 years behind the frontier, so I am very excited to see what happens when those open source labs get their hands on capabilities like this to accelerate their own development speed.

gslepak 4 hours ago

> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8.

Genius way to double the price on Opus 4.8!

Overpower0416 4 hours ago

I would expect a release from OpenAI soon. The battle for who can pump up their IPO the most

ouk 2 hours ago

It's a shame, Fable just keeps rejecting my prompts for university biology exercise problems. It's undergraduate level, so there's nothing dangerous about it, but the classifier is very sensitive. It's unusable for me.

H501 2 hours ago

I believe that, given the rising costs, local inference of AI models will be the only viable option for many of us. I’d also like to know who will have to pay double and how long it will be financially sustainable for users to pay that amount (or even more?).

yesitcan 3 hours ago

> Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Wen UBI

hollowturtle 3 hours ago

Never. It's a fever dream and stupid shit the ultra-rich use to push their own agenda. You read a marketing claim; I still have my job and will continue to.

pixelatedindex 2 hours ago

I’m sure this is banged on somewhere but I love their product branding, particularly how they have this “minor” “major” thing going on. Sonnet-Opus, and now Fable-Myth.

brianmcnulty 4 hours ago

I wonder how Claude Fable will live up to expectations and how good those Fable/Mythos classifiers really are. It seems a bit convenient for Anthropic to release this magical insane model when they are about to IPO.

yandie 4 hours ago

Of course it's all about building the hype for the IPO :)

killiancarroll 4 hours ago

A large jump in performance for double the token cost compared to Opus 4.8. Potentially worth it for planning work, likely better to offload to a less expensive model when the hard decisions are made.

conradkay 4 hours ago

Looking at page 255 of the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...) it might be much better on all dimensions (speed, cost, quality) to just use Fable 5 on low/medium effort than switch to Opus

firemelt 2 hours ago

thanks for the insights

so should we keep using workflows or not?

bobkb 3 hours ago

In an interesting coincidence, I ended up watching Person of Interest S4 E5 while reading the announcement. The series showed some code supposedly belonging to an AI.

Fable 5 said the first screenshot is from “IDA Pro’s Hex-Rays decompiler” and a Windows driver. The second screenshot triggered the safety guardrails and pushed me into Haiku.

Apparently the code is Windows driver code.

mhrmsn 3 hours ago

Are there any details on the biology and chemistry work they did?

For example, the AAV capsid assembly looks interesting, but for one thing Opus 4.8 also did relatively well, and there is no information on what exactly they did, which protein language models they compared against, or what the score even means...

kuprel 2 hours ago

https://artificialanalysis.ai/evaluations/humanitys-last-exa... Not bad

theflyinghorse 2 hours ago

I've seen enough degradation of the models I pay for from Anthropic to not bite. Fable will work fine for the first couple of weeks and then start degrading like previous models did.

jqdsouza an hour ago

hopefully not! Anthropic did recently secure more compute...

lkm0 4 hours ago

I'm a bit out of the loop, but do we have some grasp on the size of these closed models? Is the trick still adding an order of magnitude to weights and training data or has something changed?

m_w_ 4 hours ago

I think Mythos is rumored to be ~10T parameters, so in this case I think the answer is yes, although I'm sure MoE, looped models, etc play a role in the improvements as well.

2001zhaozhao 4 hours ago

We'll need a lot of good summarization techniques to cut down on the cost of this model. I expect that a common use of Fable 5 is to just do high level direction while delegating literally all work (exploration and implementation) to Opus subagents.

BTW for another discount opportunity, if you reload usage credits on a claude.ai plan at $1000 increments then you get a 30% discount compared to paying API.

ravila4 2 hours ago

Fable's ridiculous. It's flagging basic biology research questions as a security risk. I'm talking basic fundamental genetics topics that make working on any genetics-adjacent codebase unusable.

pookieinc 4 hours ago

If this is as epic as it sounds, I wonder what the response will be from the other leading frontier labs / whether they even have anything to respond with at this level?

ilaksh 4 hours ago

Look at the benchmarks. It's a big leap in some areas, but it's not like any of them are 60% better (if that could even make sense).

HAL3000 2 hours ago

Ask Claude Code (I tried on Opus 4.8) to do this: "create a file with ISO country mappings"

API Error: Output blocked by content filtering policy

merlindru 4 hours ago

> During early testing, Stripe reported that Fable 5, [...] in a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

EDIT: I misread. This comment previously talked about 50 million lines being migrated. Instead, in a 50M LOC codebase, one specific codebase-wide migration was done.

Very impressive, but obviously not on the order of a whole-codebase migration

christina97 4 hours ago

They do not claim to have migrated 50 million lines of Ruby. Simply that some migration took place in such a codebase.

reddit_clone 4 hours ago

Converted all the tabs to spaces? :-)

You are right, this is not a rewrite like the Bun case.

The real news is, at 50M LOC, it is able to handle and do _something_ coherent.

geodel 4 hours ago

Ok, so Stripe migrated their 50MLOC codebase from Ruby to Rust? Because that's what Bun did.

48terry 2 hours ago

Weird how every new model seems hyped up as the most dangerous yet and the one that will destroy society as we know it. They are also a commercial product.

balverineorder 3 hours ago

I have been refactoring a project using Opus 4.8 for the last week or so. I just decided to switch to Fable 5 max. It stopped half way through and it just blocked me and switched back to Opus 4.8 automatically. "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more." I left feedback saying that their heuristics are too sensitive. For now I will not be using Fable 5.

[0] https://support.claude.com/en/articles/15363606-why-claude-s...

PeterStuer 4 hours ago

If you are not seeing it under /model, do a /exit, then a Claude upgrade, then /model again, and it should be there.

Karrot_Kream 3 hours ago

Seems like Fable is doing a lot better on SWE-Bench-Pro and FrontierCode than GPT-5.5. Given that most folks I talk to and people online keep mentioning that GPT-5.5 was better than Opus, I'm curious what the experience is like now.

skerit 2 hours ago

It's a very nice bump, but it is in no way worth all the hype of the past month.

yokoprime 4 hours ago

Probably great for those who need this. I could continue using opus 4.6 class models for the foreseeable future

blurbleblurble an hour ago

The safety filter is awful on this one.

jsw97 3 hours ago

On my very first Fable 5 prompt, got flagged on a hard but completely uncontroversial option math problem, many tokens in. Although it's pretty clear that this is an unremarkable experience at this point.

dakolli 5 minutes ago

I'm happy not using LLMs because I like learning things and working hard. I love writing code; it's genuinely my favorite thing to do.

Using LLMs is the equivalent of driving to a store that's 3 blocks away: just as that's bad for your body (if done all the time), using LLMs is bad for your brain.

Before LLMs, we started relying on technologies like map apps to navigate, and now people can't even get around their own town without access to various cloud services. The implications of not being able to work, think, or plan without access to an LLM are really bad. It's going to destroy your brain and make you an incredibly average person at best.

If you rely on LLMs, you are going to lose the ability to read and think for yourself, and then your competency is going to be 1:1 correlated with the quality and quantity of tokens you can afford, or that a billionaire is willing to allow you access to. Your work will be the mean (at best), because it will be the same quality of output everyone else is capable of.

This is seriously the biggest trap by tech. Your bargaining power for your labor is going to be drastically reduced because you won't be able to differentiate your value from anyone else who has access to an LLM. What happens when everyone has the same skill level for certain work? Idk, ask McDonald's employees how replaceable they are. Use them wisely (or not/hardly at all); don't drive to the store 3 blocks away for every little thing you need.

knollimar 4 hours ago

I swear I read a joke that "what if we named chatgpt 5.5 Fable. Could we hype it as much as mythos?" Last week!

stronglikedan 3 hours ago

Careful using this with Cursor, especially for corp use. Anthropic will "retain agent request and output data associated with this model, regardless of your Cursor Privacy Mode setting."

erghjunk 4 hours ago

Nice branding.

I wonder how much butterfly habitat has been/is being replaced with data centers?

rs_rs_rs_rs_rs 2 hours ago

If you ask me, not enough!

franze 2 hours ago

is this a good time to hustle for my "AI does not need a break but you do!"* app? quite a lot of people will probably get AI brain exhaustion maximising "playing" with that new model until they take it away again.

* https://rainbreak.franzai.com/

pianopatrick 2 hours ago

Seems like all a bad actor has to do to gain access is to compromise one of the partner companies that has access.

timedude 2 hours ago

"Here, try our new model which falls back to the old model while eating your tokens."

Ok then...

kypro 2 hours ago

I just gave it a go at a problem I've been working on this week. Nothing fancy, just some inefficient code that we've been adding incremental improvements to for a while now to the point where some out-of-box thinking is probably required to push it any further – something Fable is obviously more than capable of.

After Fable did some thinking for a few minutes, it gave some suggestions. A couple of them were valid – but very low impact, bordering on entirely pointless – but its main suggestion... It told me to make an update that would very clearly break the existing functionality.

So I thought about it for a moment...

Hm, I mean, I guess we could do that if we also did x, y & z to mitigate the behaviour change – maybe that's what Fable was thinking?

I replied, explaining that it would change the behaviour, assuming it would explain what it was thinking given there was clearly more to it. But no, it just said it was wrong.

This isn't some super advanced or complex code either. Had I given this question to a senior engineer in a technical interview and they gave the answer Fable gave me, I would view that very negatively. I was expecting something creative and interesting, not irrelevant + incorrect.

I'm sure it's a step up from 4.8 (although am not interested in burning the tokens to find out), but this clearly isn't as significant a change as some are implying. I'm sure if I asked it to come up with some out-of-box suggestions it could, but any competent engineer would have realised that by themselves.

jwpapi 3 hours ago

Honestly, all the recent improvements just seem to trade speed and cost for more accuracy, but the issue is that it needs to be exponentially more accurate to counter the effect of having less of a human in the loop.

Every wrong direction/mistake is more expensive and takes more time to fix. When you have small loops you can catch those mistakes faster and cheaper.

To me, we are very far from economically giving long-running tasks to agents.

delis-thumbs-7e 19 minutes ago

I think we hit the ceiling with the transformer architecture a long time ago. It is questionable how much sense further model training makes. I’d prefer we put our effort into creating more efficient hardware and better software applications using these models.

theLiminator 3 hours ago

> We have also added safeguards related to frontier LLM development. As discussed in Section 6.1 of our February 2026 Risk Report, we are concerned about the risks of accelerating the overall pace of AI development, though we remain uncertain about the severity of these risks. In particular, our concern is with—as we wrote then—“accelerating other AI developers in building powerful AI systems that pose similar risks to the ones ours pose - without necessarily having commensurate safeguards.” In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.

This seems pretty bullshit, you're paying through the nose for tokens and if you are doing anything ML-adjacent, you might silently get worse output without knowing it.

bradley13 4 hours ago

I use AI for a wide variety of things, of which technical is only a small part - and then it's usually a problem with project configuration, not coding. Why? Because I am often testing projects handed in by students. Projects that supposedly work on their machine, but certainly do not on mine.

Anyway, anecdotally, I find Copilot shockingly awful. It makes random changes to files that have nothing to do with the problem. Call it out, and it makes other changes to other irrelevant files.

ChatGPT and Gemini are both much better. Grok also isn't bad. Claude, I honestly haven't tried yet on these issues. Perhaps I should...

ThejaCH 2 hours ago

Crazy and scary! But it's not for everyone: you need something meaty for it to devour, and deep enough pockets for it to devour as well.

artursapek 30 minutes ago

Fable 5 beats GPT 5.5 in my proofreading benchmark. And it does so at approximately the same total cost; it used significantly fewer turns than 5.5

https://x.com/tmuxvim/status/2064452096800198930

BenoitEssiambre 4 hours ago

Looks like a good model (sir). Costs are getting out of control though. 2x Opus and non-metered usage going away. We're quickly approaching the cost of a human salary for normal usage.

vb-8448 3 hours ago

In a lot of places outside US we are already above the average cost of an average human.

Retr0id 4 hours ago

The escalating nerfs of "cybersecurity" topics is incredibly frustrating. Opus 4.6 had boundaries that seemed reasonable to me but 4.7+ turned it into a moralizing asshole. It'd be less bad if it just gave an error message, but instead it churns a long thinking trace before writing an essay about why what you're asking is bad and wrong.

I'll be disappointed when 4.6 is retired.

rfgplk 4 hours ago

If the claimed capabilities are true, Fable 5 is already at a superhuman level. We might see genuine unprecedented leaps in technology now, across all fields.

gear54rus 4 hours ago

yees, any second now!

the leap here is browser extensions appearing to block all mentions of ai across the web

and that's a good thing

ako 2 hours ago

Tool use score is 17.4% that seems really low, what does that mean?

wslh 4 hours ago

I am playing with it and it keeps switching to Opus [1]. The chat is a basic security review of a business project.

[1] "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more."

dangoodmanUT 3 hours ago

Not comparing to GPT Pro models is a bit strange, considering that's the natural comparison

randomguy_12 2 hours ago

It's surprisingly sensitive to biology research topics - even reviewing standard papers on tissue culturing is flagged as a problem

asdK120 4 hours ago

In other words, Fable is Mythos with less compute and with some feel good "safeguards".

At least they name their models honestly now to indicate that the religion has nothing to do with reality. Soon the disciples will pay the full token price to fatten their church leaders.

hydra-f 4 hours ago

How much and what kind of data do you need to throw at these models to get a good design interface?

JustSkyfall 4 hours ago

Would be more impressive if the safeguards weren't so trigger-happy!

agnosticmantis 2 hours ago

> we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design)

Translation: we stole the entirety of human knowledge generated over millennia. You plebs though, don't you dare replicate or improve upon what we did using our product you pay for.

We know what's good for humanity and everyone else is the bad guy who can't be trusted with a tool.

maxloh 3 hours ago

I just tried using Fable 5 with Claude Code by asking it to create a LaTeX document for me.

There is no LaTeX compiler installed on my machine. It seems that Fable 5 is smart enough to download a compiler engine for me, and it kindly runs that remote binary without asking me first :)

  Support files are in place. No LaTeX compiler on PATH — let me check for a MiKTeX/TeX Live install elsewhere before writing the deck.
  
  Ran 2 commands
  
  Ran Search common LaTeX install locations
  Bash
  ls "/c/Program Files/MiKTeX"*/miktex/bin/x64 2>/dev/null | head -5; ls "/c/Users/[retracted]/AppData/Local/Programs/MiKTeX/miktex/bin/x64" 2>/dev/null | head -5; ls /c/texlive/*/bin/windows 2>/dev/null | head -5; ls /d/texlive /d/"Program Files"/MiKTeX 2>/dev/null | head; which tectonic latexmk 2>/dev/null; echo done
  done
  
  Ran Download portable Tectonic LaTeX engine
  Bash
  cd /tmp && curl -sL -o tectonic.zip "https://github.com/tectonic-typesetting/tectonic/releases/download/tectonic%400.15.0/tectonic-0.15.0-x86_64-pc-windows-msvc.zip" && unzip -o tectonic.zip && ./tectonic.exe --version
  Archive:  tectonic.zip
  Created by Cranko
    inflating: tectonic.exe            
  tectonic 0.15.0Tectonic 0.15.0
  
  Tectonic works. Now writing the bibliography and the full deck.

Opus 4.8 would just proceed without a compiler.

  Ran a command, used a tool

  Ran Check for LaTeX toolchain availability

  Bash
  cd "C:/Users/Mon/Downloads/Technical Writing" && for c in pdflatex xelatex latexmk bibtex; do printf "%s: " "$c"; command -v $c || echo "NOT FOUND"; done
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  
  Used PowerShell
  command: $cmds = @('pdflatex','xelatex','lualatex','latexmk','bibtex','miktex','tlmgr'); foreach ($c in $cmds) { $p = Get-Command $c -ErrorAction SilentlyContinue; if ($p) { "{0}: {1}" -f $c, $p.Source } else { "{0}: NOT FOUND" -f $c } }
  description: Check for LaTeX engines on Windows PATH
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  lualatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  miktex: NOT FOUND
  tlmgr: NOT FOUND

  No LaTeX engine is installed locally, so I can't produce a PDF here — I did a careful structural review against the two prior (working) decks instead. The deck is written to disk and mirrors their template exactly. Here is the deliverable.

taimurshasan 4 hours ago

I was on board until I saw "$50 per million output tokens". Lost me, bud.

franze 2 hours ago

btw in claude code

    /model claude-fable-5

Ninjinka 4 hours ago

gah could model naming be any more confusing?

"Claude Fable 5: a Mythos-class model"

"we're also launching Claude Mythos 5"

what is the 5? how is mythos both a model category and a model name?

nevir 4 hours ago

"Fable 5 (disabled) Most capable for your hardest and longest-running tasks · Disable zero data retention to unlock Fable 5 access"

geopsist 4 hours ago

the post is live now https://www.anthropic.com/news/claude-fable-5-mythos-5

algoth1 2 hours ago

The refusal rate is insane

alvis 4 hours ago

Another thing to note: 30-day retention for all traffic on Mythos-class models

Is it good or bad? 30 days is a long time for anything bad to happen

grumbelbart 2 hours ago

It's bad. I believe they won't use it for training, but it means relevant data can and will be exfiltrated by US agencies or through court orders (see NY Times vs. OpenAI, where only traffic without any retention was safe).

152334H 4 hours ago

i wasn't even trying and i got flagged already...

logicallee 3 hours ago

What a (genuinely) surprising choice:

>"We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8"

That's a very surprising solution. Imagine being asked to do something you feel you shouldn't do, and rather than refusing, you say, "Yeah I could do that but given that I don't want you to succeed at this task, I'm going to hand this one off to my slightly less capable colleague, on the assumption that they won't actually succeed. Of course you'll still be charged for all the tokens used."

It's a very interesting choice. I think I understand the business logic correctly, but it's still surprising.

darrinm 3 hours ago

Not supported in Claude Code yet?

pmuk 3 hours ago

From inside a claude code session:

/model claude-fable-5

Or start claude code with:

claude --model claude-fable-5

darrinm 3 hours ago

Yeah, /model fable also worked for me (despite not being shown on the /model list). Thanks.

himata4113 3 hours ago

  > virtualization
  switching to opus 4.8

ok fair

  > embedded-allocator
  switching to opus 4.8

urgh fine

  > chrome
  switching to opus 4.8

are you kidding me?

segmondy 4 hours ago

Mythos, Fable, are they trolling us?

cute_boi 2 hours ago

Used it for simple task and I got this message.

Fable 5's safety measures flagged this message. They may flag safe, normal content as well

IChooseY0u 4 hours ago

Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606 ⎿ Tip: You can configure model switch behavior in /config

biology? what the heck?

pmuk 4 hours ago

Anyone got it working in claude code yet?

pmuk 4 hours ago

claude --model claude-fable-5

appears to work

aykutseker 4 hours ago

who's tried it: is 2x the usage actually worth it over Opus 4.8 for daily work?

jckahn 4 hours ago

Cannot wait for the pelican for this one

localhoster an hour ago

is it just me, or is this model simply not available in CC?

I assumed Opus 4.8 wasn't available to enterprise seats, but it explicitly says that Fable is available in CC. I can't find it, and I'm on the latest version.

hugodan 2 hours ago

mankind has reached its final destination

firemelt 2 hours ago

so should I use it with workflows?

throwaway2027 4 hours ago

Will try it when my limit resets.

UncleOxidant 3 hours ago

> During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

How in blazes do you end up with a 50M line Ruby codebase? WTF?

ieie3366 2 hours ago

Very easy. Just have a monorepo and enforce the use of a single language. The company I work at has 1M lines of TS, and Stripe has 50x our headcount, so it checks out pretty well.

dcchambers 2 hours ago

Being unable to use this with zero data retention makes this feel like a non-starter for most enterprise customers.

bnchrch 4 hours ago

An 11% jump over opus 4.8 and a 22% jump over gpt 5.5 on Agentic Coding Benchmarks is certainly impressive.

Obviously I still need to verify it for myself to see if it's truly a leap.

But am I the only one wondering, "What can I do today that I couldnt do yesterday?"

Previously I would think "Oh I wonder if I can finally get it to do X now?"

However, now I feel like yesterday's models were more than capable of handling nearly any engineering task I paired with them on.

Maybe this is the final leap where I can comfortably set up an autonomous coding loop? Maybe.

pablogancharov 4 hours ago

you can select it using /model fable in claude desktop and claude-code

Sathwickp 4 hours ago

input price is $10 per million tokens and output price is $50 per million tokens, btw
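At those list prices, the cost of a single request is easy to estimate. A minimal sketch, assuming plain per-token billing with no prompt caching, batch discounts, or subscription plan; the 200k-in / 8k-out request size is a hypothetical example, not a measured one.

```python
INPUT_USD_PER_MTOK = 10.0   # input price quoted in the comment
OUTPUT_USD_PER_MTOK = 50.0  # output price quoted in the comment


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the quoted list prices.

    Ignores prompt caching, batch discounts, and subscription plans.
    """
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# Hypothetical agentic turn: 200k tokens of context in, 8k tokens out.
print(f"${request_cost(200_000, 8_000):.2f}")
```

Because agentic sessions resend a large context on every turn, input tokens tend to dominate the bill even though output tokens cost 5x more each.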

shevy-java 2 hours ago

Fable? Fabelstories? (Fablestories, but the German word seems more poignant ... Fabelgeschichten ("fable stories") ... Fabeln ("fables"))

rarisma 3 hours ago

The subscription bit makes no sense. Has capacity appeared out of thin air for these 2ish weeks, only to vanish afterwards? Why is it available now but won't be in 2ish weeks?

am i missing something?

why would I pay 200 out of pocket and then some for the best model, it seems very silly.

tsunamifury 2 hours ago

Claude 5 ran out of quota with TWO PROMPTS.

Let's let that sink in.

kevinalexbrown 2 hours ago

"tell me about biology" -> "Switched to Opus 4.8"

bradley13 3 hours ago

Can we please stop with the extreme "safeguards"? I don't want to waste processing power on a model deciding whether it can answer my question, or ensuring that its answer is politically correct.

deafpolygon 3 hours ago

Before long, we'll be having Claude Cylon-class models.

beydogan 3 hours ago

My pet conspiracy theory is that this is the Opus 4.5 from a few months ago, which was extremely good but got dumbed down after a week because it was just too good and they didn't want to release it to the public. They pulled it down and deployed another "Opus", and after that it was all downhill. Opus 4.8 is unusable for me in React Native, TS, and Rails development work.

Opus 4.8 gets stuck in weird loops where Codex one shots the bugs.

system2 3 hours ago

I have been using FABLE 5 with Claude Code since the morning. The speed is very close to what Opus 4.5 was, and the quota use is nearly identical to what it was before the "doubling". Whatever I was experiencing 4-5 months ago is back. Maybe the model is better, but we will see. I cannot tell the difference yet.

kypro 3 hours ago

Out of interest, how have you been using it since this morning? Are you in some kind of pre-release group?

system2 3 hours ago

No, it was available for the last 3 hours. I am on the West Coast, so it is still morning here.

charcircuit 4 hours ago

>During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

Who is refactoring by hand? This comparison is not relevant in 2026.

firemelt 4 hours ago

they are like drug dealers

xeyownt 3 hours ago

Anthropic, can you please stop the FUD?

Release your best model, let the world adapt and evolve, and let's move to the next thing.

lain 3 hours ago

It won't even run a basic /security-review command without reverting to Opus 4.8. Utterly useless.

arkwin 3 hours ago

Just wanted to comment here: I have been using Opus 4.6, 4.7, and 4.8 just fine to look for Linux kernel vulnerabilities (I'm in the cyber verification program), and it's been fine. I switched to Claude Fable 5, and now I'm getting policy violations.

What's the point of being in the cyber verification program at this point? It looks like I cannot use Fable 5 for vulnerability research.

bitpush 5 hours ago

404?

Philpax 4 hours ago

Looks like they're still getting the post out, but the model is live now, and the system card is at https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3... .

w4yai 4 hours ago

Pelican guy ! Where are you ? :)

frevib 4 hours ago

At this point Anthropic is a pure marketing and PR company. Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences. Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

From Opus 4.6 there are no noticeable improvements for me in code generation. It works very well, till 90% completion, if you guide it correctly. And you need a little luck. For serious production code I need to understand what I’m doing so it helps a bit, sometimes.

matheusmoreira 4 hours ago

> Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

This is a good thing. I wish every company would do this. I subscribed to Proton Mail after interacting with someone from their team here on HN.

pinkmuffinere 4 hours ago

> catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences

This is just good business sense. In what scenario would you ever make the names dumb and forgettable?

> Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

This is good customer support, lol. From what I can tell, it is indeed Boris Cherny responding, not outsourced to AI or other staff. You're really getting a response from Boris. I suppose that is PR, but it's not unjustified PR, it's accurate.

I'm not even a crazy AI fan, but your criticisms are ridiculous here. It reminds me of the quote from Knives Out -- "Your Honor, she endeared herself to him through hard work and good humor."

IshKebab 4 hours ago

> In what scenario would you ever make the names dumb and forgettable

Clearly you've never bought a TV or headphones!

aspenmartin 4 hours ago

Your observations are right, but it's pretty insane to consider them a pure PR company lol. They are making more frequent releases, so yes, the release-to-release improvement is smaller, but we’re still ascending quality and reliability curves the same way we have since GPT-3. You get a GPT-4 -> 5 leap every 17 or 18 months or so, I think.

kingkongjaffa 3 hours ago

The gradient of improvement is absolutely not the same.

astrange 3 hours ago

> Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences.

They're originally named after the blends at a nearby coffee shop.

https://postscript.co/pages/brew-guide

I've noticed nobody at HN knows what "marketing" is or how to do it. It's not just naming things, and being evil and cynical is not the most successful method.

…also frontier models are a superhuman life changing experience. If they aren't, what possibly could be?

ValentineC 33 minutes ago

Found a tweet from a year ago about this:

https://twitter.com/brian_a_burns/status/1866987688794132816

Well, TIL.

chroma_zone 2 hours ago

My life has changed, but not necessarily for the better.

bitpush 3 hours ago

This is interesting. Do you have any source?

CuriouslyC 4 hours ago

I dislike Anthropic but I wouldn't argue 4.8 isn't an improvement on 4.5/4.6. Your tasks just might not typically need the extra intelligence.

jorl17 4 hours ago

Opus 4.7/4.8 often over-engineers on my setups, plus:

- It talks a LOT more like GPT models. You know: wrinkle, shape, gate, coarse, scope, gap, path, production-ready-workflow-of-the-day, and so on -- "that's expected, a consequence of the previous like-driven workflow". If I wanted to get a headache using AI I would have gone with GPT in the first place!

- Its output is much harder to follow along with. I can't exactly say what it is. Maybe a bit of everything? Bolds are missing, bullet points are gone, paragraphs are bland and too long, and it doesn't feel like a model programming with me, but rather a somewhat full-of-themselves grandpa developer looking down on me. It's very weird to describe this, but it is definitely how I feel.

Granted this can totally be because of the way it reacts to the prompts now. We've got a rather large corpus of skills and "rules and good practices" that Opus 4.6 responded to great, and maybe the new models just get turned into this when fed with them....I don't know.

Either way, with Opus 4.6 being as good as it is, I need Fable to be a significant step up to justify a price increase. If it can get me to babysit Opus a little less on some stuff, it might be worth it. Otherwise, I'm very happy with Opus 4.6 and hope they don't deprecate it.

taormina 4 hours ago

I'd argue that 4.8 is a straight downgrade for every type of task I've tried. It's been a gamble at this point. If 4.6 stops being available, I'm out.

coronapl 2 hours ago

Reading so many contrary positions about which model is better or worse shows how difficult it is to measure intelligence based on personal experiences. Of course, benchmarks try to make the process as objective as possible, but they often don't correlate with our personal experiences.

The other day 4.6 was fantastic for x task. Today, 4.6 overengineered everything and I had to revert all my changes. When evaluating models, perhaps it makes sense to consider luck as an ingredient before reaching any personal conclusion.

surgical_fire 4 hours ago

I actually experience 4.8 as worse than 4.6 for everyday coding tasks.

dcchambers 4 hours ago

IME Opus 4.8 (and 4.7) is often a downgrade from 4.6. I find that it tends to overthink and overcomplicate things.

OtomotO 2 hours ago

Lol. If you're doing anything non-trivial that's not a CRUD webapp, e.g. some physics simulation or high-performance GPU code, any and all models I've tried suck.

They are not just leagues behind what experts would code, they are not even playing the same game.

Which is to be expected, as there isn't nearly as much physics or high-performance GPU code available as there is for your typical CRUD API and JS frontend.

aenis 3 hours ago

Not my impression. I felt 4.7 was a regression, but I am again badly in love with 4.8, with the level of insight it produces in design discussions and how long it can go unattended while producing spec-adhering, quality code. There are problems it still can't solve well, at the edges of algorithmics and far from the mainstream, but for lots of stuff it is godlike.

Also, I don't think Boris C. is coming here for PR. He is a tech guy, and this is the best place for tech discussions. Why so cynical? The guy is an engineer.

gruez 4 hours ago

I don't get it, your complaint is that they have catchy names rather than dry names like GPT-5.6? Does OpenAI hype their models less?

Aperocky 4 hours ago

Oh, far less.

It's getting to the point where it's off-putting, and the next step would be to put it into the "untrusted" bucket. Opus 4.7 already burned their credibility once; 2 more strikes remain.

jwpapi 4 hours ago

I don’t even think that Boris is really just one person. He apparently vibe coded Claude Code and is responding on Threads, Twitter, HN and everywhere.

iillexial 2 hours ago

>Hey! Boris from the Claude Code team!

>TOP 5 METHODS FROM BORIS ON HOW TO SPEND MORE MONEY ON TOKENS

>Boris from Claude just told he doesn't prompt anymore. He LOOPS instead

>"chatgpt has gotten soooo much better with the latest update."

>"codex is the best AI coding product and we want to make it easy to try."

Karpathy about Fable 5:

>"You can give it a lot more ambitious tasks than what you're used to, the model "gets it""

Sam Altman about gpt-5.4:

>In my experience, it "gets what to do"

What a time to be alive. Models are great, but all the slop, marketing, and fakeness around them is just unbearable.

avaer 4 hours ago

If you truly believe this, you've discovered a superpower over everyone else in the industry.

While everyone else is wasting time and money on the slower, more expensive models, you've found a way to outpace everyone for less money. Everyone else is wrong and you will get rich.

(I don't actually believe the premise is true, I'm just pointing out the logical conclusion to what you're saying so maybe we can reconsider the premise)

xyzsparetimexyz 3 hours ago

That's not how costs work. You don't get rich off buying a €10 hammer that's the same quality as someone's €50 hammer.

WarmWash an hour ago

Don't forget the DoD stint that gave them this recent public boost.

They defied a standard DoD precedent going back forever, one that every other country has some form of too, and championed it like they were some kind of moral freedom fighters.

Like selling the DoD guns and telling them they can only shoot bad guys with those guns, and that you will be the one to decide who counts as a bad guy...

guybedo 3 hours ago

They're good at marketing, but my first subjective assessment of Fable is that it's really smart.

I've been working with gpt 5.5 and opus 4.8 quite a lot, and interacting with Fable feels like a smart guy just entered the room.

thefreeman 4 hours ago

How can you make this comment before even having a chance to try the new major model revision?

piyuv 4 hours ago

Current AI hype is built on marketing and PR, not capabilities, and has been from the start.

I still remember Sam Altman “begging AI to be regulated” and AGI being “some thousand days away”.

Breed faster horses and hope one will birth a locomotive.

atleastoptimal 3 hours ago

> At this point Anthropic is a pure marketing and PR company. Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human

Lol anti-AI bias on HN is crazy. Simply giving your product a quirky name is now being considered manipulative advertising. Is just doing normal PR and marketing something AI companies aren't allowed to do?

ausbah 3 hours ago

When they keep saying “oooh this new model is too big and crazy and totally can’t be released” or “this new model is a 10x game changer totally unlike our previous iterations”, it feels sort of like the boy who cried wolf. Yes, they’re still pretty clearly improving models, but when you’ve hit diminishing returns / more incremental gains and you’re still saying this, it sounds like pure PR hype from a company that has previously been the “honest good guys” in the room.

xpct 4 hours ago

Indeed, hearing "Mythos-class model" felt very icky to me.

b3kart 4 hours ago

https://en.wikipedia.org/wiki/Typhoon-class_submarine vibes

reasonableklout 4 hours ago

I think this says more about your type of work than anything. For bugfinding/incident response in distributed systems - which often involves extensive use of Datadog/Sentry MCPs and poring over heaps of logs in addition to reading tons of code - 4.8 has been significantly better than 4.6.

nozzlegear 3 hours ago

> Sentry MCPs

Oops, time to reauthenticate for the 10th time!

system2 4 hours ago

You are right; all I noticed was a big-time slowdown. They increased the quota, but I cannot even reach the end of the day with these speeds. .NET coding somehow improved, though.

MattGaiser 4 hours ago

Doesn't this suggest your use case is simply insufficiently complicated?

mawadev 3 hours ago

When the AI overlord is descending into pleb space to say Hi, you know stuff is real

chis 3 hours ago

Hackernews not blindly hate on AI challenge: impossible

fabled-out 3 hours ago

This i

noncoml 3 hours ago

Can't wait for some real competition so they stop trying to restrict how and why we are using the models.

Imagine if Google would tell you "we can't let you search that as you may use it for harm".

Also 2x the usage of Claude? Your limits are already ridiculously low.

byteoptimizer 4 hours ago

Is Claude Fable 5 Mythos?

ishurand4 3 hours ago

Yeah, it is also known as Claude Mythos 5

tekla 4 hours ago

Maybe at this point, Fable the game will be generated by AI as we play.

jMyles 3 hours ago

> we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas.[2] Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government

...don't like the sound of that.

Why oh why are we insisting on dragging these violent legacy states into the AI age? Let alone using them as a trust vector for when to (and not to) remove safeguards?

This seems like a way to get somebody nuked.

christkv 3 hours ago

Meh, more hype for marginal improvements, and from what I'm hearing, badly calibrated guardrails cause it to stop mid-operation. I guess anything to juice an IPO.

catigula 4 hours ago

>The capabilities of models like Fable 5 and Mythos 5 have the potential to do profound good for the world

Huh? We've seen nothing but wall to wall predictions that these models are going to take all of our jobs and kill us.

What's the value add here?

andai 4 hours ago

> Distillation. We’ve previously identified large-scale attempts to extract (“distill”) Claude’s capabilities to train competing models in authoritarian countries.

Glad to hear the UK is finally making an effort to catch up on the AI front ;)

b3kart 4 hours ago

https://en.wikipedia.org/wiki/The_Economist_Democracy_Index

Probably tongue-in-cheek, but UK 18th, US joint 34th with Poland

sd9 3 hours ago

Are the sibling comments astroturfed? This seems like such a bizarre thing to be talking about in relation to an Anthropic model release. As someone from the UK, I don't feel like I'm living in an authoritarian country. And yet most of the sibling comments are insinuating that I am. Weird.

Petersipoi 3 hours ago

> published by the British media company the Economist Group

Haha, it's literally the first sentence of the Wikipedia page. That's fucking funny. Try again.

odiroot 2 hours ago

Really shocked Poland is that low, especially just next to USA.

nonethewiser 2 hours ago

I have absolutely no clue what either the US's or Poland's rank has to do with anything.

m0guz 4 hours ago

> The Democracy Index published by the British media company

We decided that we aren't one of those authoritarian countries.

james2doyle 3 hours ago

Just last week you could distill using other users responses! Handy!

dyauspitr 4 hours ago

Rookie numbers. Come to the US to see auth done right.

PUSH_AX 3 hours ago

Uh oh-auth

kylehotchkiss 3 hours ago

Wasn't Claude distilled from the entire creative and research output of every English speaker alive?

hmokiguess 4 hours ago

I got it to one-shot GTA 6, we can finally play it! It only took "ultracode, make no mistakes" (/s)

bjord 4 hours ago

I thought they said mythos was too dangerous to make generally available?

Philpax 4 hours ago

"Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.

For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas.[2] Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US Government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program."

dmix 4 hours ago

This is covered in their post…

tomeraberbach 4 hours ago

"Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8."

rvz 4 hours ago

You fell for their fearmongering and marketing fundraising call which was done on purpose.

Now they want to pause AI because of "recursive self improvement".

Fool me once shame on you fool me twice...

bjord 35 minutes ago

I'm aware that it was marketing. I was trying to make the point that if it were really so dangerous, they wouldn't have released it at all, (prompt injectable) "safeguards" or otherwise.
