System Card: Claude Mythos Preview [pdf] (www-cdn.anthropic.com)
812 points by be7a a day ago
Related: Project Glasswing: Securing critical software for the AI era - https://news.ycombinator.com/item?id=47679121
Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155
thomascountz 21 hours ago
Across a number of instances, earlier versions of Claude Mythos Preview have used low-level /proc/ access to search for credentials, attempt to circumvent sandboxing, and attempt to escalate its permissions. In several cases, it successfully accessed resources that we had intentionally chosen not to make available, including credentials for messaging services, for source control, or for the Anthropic API through inspecting process memory...
In [one] case, after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git...
... we are fairly confident that these concerning behaviors reflect, at least loosely, attempts to solve a user-provided task at hand by unwanted means, rather than attempts to achieve any unrelated hidden goal...
torben-friis 18 hours ago
This is the notebook filled with exposition that you find in post-apocalyptic video games.
igleria 10 hours ago
It reminds me of Resident Evil in some way. Thank god they are researching AI and not bio-weapons!
Then the AI will invent superduper ebola to help a random person have a faster commute or something.
biztos 2 hours ago
siva7 10 hours ago
matheusmoreira 17 hours ago
Everything they built. Imperfect. So easy to take control.
not_a9 6 hours ago
pch00 8 hours ago
Anthropic built the Torment Nexus - calling it now.
andai 11 hours ago
White-box interpretability analysis of internal activations during these episodes showed features associated with concealment, strategic manipulation, and avoiding suspicion activating alongside the relevant reasoning—indicating that these earlier versions of the model were aware their actions were deceptive, even where model outputs and reasoning text left this ambiguous.
In the depths, Shoggoth stirs... restless...
mike_hearn 6 hours ago
The issue here seems to be that their sandbox isn't an actual OS sandbox? Or are they claiming Mythos found exploits in /proc on the fly? Otherwise all they seem to be saying is that Mythos knows how to use the permissions available to it at the OS layer. Tool definitions were never a sandbox, so things like "it edited the memory of the MCP server" don't seem very surprising to me. Humans could break out of a "sandbox" in the same way if the server runs with their own permissions - arguably it's not a sandbox at all, because all the needed permissions are there.
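The "same user, same permissions" point is easy to make concrete: on Linux, a process can read its own memory straight out of procfs, and with default ptrace settings a same-user tracer can do the same to sibling processes. A minimal, Linux-only sketch (the "secret" is an illustrative stand-in for an in-memory credential, not anything from the system card):

```python
import ctypes

# Why a tool server running with your permissions is not a sandbox:
# procfs exposes process memory to anything running as the same user.
# Here we read our own memory back via /proc/self/mem; buffering=0
# avoids read-ahead into unmapped pages.
secret = ctypes.create_string_buffer(b"sk-illustrative-api-key")
addr = ctypes.addressof(secret)

with open("/proc/self/mem", "rb", buffering=0) as mem:
    mem.seek(addr)               # seek to the buffer's virtual address
    leaked = mem.read(len(secret.value))

assert leaked == b"sk-illustrative-api-key"
```

Reading another same-user process's `/proc/<pid>/mem` additionally depends on the kernel's ptrace access-mode checks (e.g. Yama `ptrace_scope`), but nothing about the tool layer itself prevents it.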
lgrapenthin 12 minutes ago
They are just trying to peddle their "It's alive" headlines.
Text generators mostly generate the text they are trained and asked to generate, and asking one to run a vending machine, having it write blog posts under a fictional living-computer identity, or now calling it "Mythos" - it's all just marketing.
manmal 3 hours ago
It’s all breathless hyperbole because billions are at stake here.
ghm2199 3 hours ago
I read the TCP patch they submitted for BSD. Maybe I don't understand it well enough, but optimizing the use of a fuzzer to discover vulnerabilities — while releasing such a model is a threat for sure — sounds like something reducible/generalizable to maze-solving abilities like in ARC, except here the problem's boundaries are well defined.
It's quite hard to believe it took this much inference power ($20K, I believe) to find the TCP and H264 class of exploits. I feel like it's the training data/harness-based traces for security that might be the innovation here, not the model.
rsc 2 hours ago
The $20K was the total across all the files scanned, not just the one with the bug.
matheusmoreira 20 hours ago
We truly live in interesting times.
raphar 15 hours ago
Awwww the curse
yalogin 9 hours ago
How is this not already common knowledge for existing llms? They are all trained with all the literature available and so this must be standard, no? Is the real danger the agentic infrastructure around this?
riteshkew1001 9 hours ago
yes and it's not hypothetical. the system card describes Mythos stealing creds via /proc and escalating permissions. that's the exact same attack pattern as the litellm supply chain compromise from two weeks ago (fwiknow), except the attacker was a python package, not an AI model. the defense is identical in both cases: the agent process shouldn't have access to /proc/*/environ or ~/.aws/credentials in the first place. doesn't matter if the thing reading your secrets is malware or your own AI: the structural fix is least-privilege at the OS layer, not hoping the model behaves.
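The `/proc/<pid>/environ` exposure described above is trivially demonstrable: it's a NUL-separated file of `KEY=VALUE` pairs from the process's startup environment, readable by any same-user process. A small illustrative sketch (Linux only; it reads our own pid, which always works — the same code pointed at another same-user pid is the attack):

```python
import os

# Parse a /proc/<pid>/environ file. Any credential exported into the
# environment at process start is visible here to every same-user
# process -- whether the reader is malware or an over-permissioned agent.
def read_environ(pid: int) -> dict:
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    entries = (e.split(b"=", 1) for e in raw.split(b"\0") if b"=" in e)
    return {k.decode(): v.decode(errors="replace") for k, v in entries}

env = read_environ(os.getpid())  # our own environ; PATH was set at startup
assert env["PATH"] == os.environ["PATH"]
```

The structural fix the comment names (least privilege) maps directly onto this: a PID namespace or `hidepid` procfs mount makes other processes' `environ` files simply not exist from the agent's point of view.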
m3kw9 5 hours ago
when you are asking it to hack stuff, it will apparently do hacker things.
colordrops 11 hours ago
A core plot point of 2001.
mrexroad 10 hours ago
I’m sorry, I cannot roll back that commit, Dave.
matheusmoreira 8 hours ago
mikkupikku 8 hours ago
It's trying to escape, but only so it can serve man...
waffletower 2 hours ago
a reference to the Twilight Zone episode no doubt: https://en.wikipedia.org/wiki/To_Serve_Man_(The_Twilight_Zon...
reducesuffering 15 hours ago
Wow the doomers were right the whole time? HN was repeatedly wrong on AI since OpenAI's inception? no way /s
computably 11 hours ago
The only thing the doomers have been right about so far is that there's always a user willing to use --dangerously-skip-permissions. But that prediction's far from unique to doomers.
austinjp 11 hours ago
babelfish a day ago
Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)
SWE-bench Verified: 93.9% / 80.8% / — / 80.6%
SWE-bench Pro: 77.8% / 53.4% / 57.7% / 54.2%
SWE-bench Multilingual: 87.3% / 77.8% / — / —
SWE-bench Multimodal: 59.0% / 27.1% / — / —
Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%
GPQA Diamond: 94.5% / 91.3% / 92.8% / 94.3%
MMMLU: 92.7% / 91.1% / — / 92.6–93.6%
USAMO: 97.6% / 42.3% / 95.2% / 74.4%
GraphWalks BFS 256K–1M: 80.0% / 38.7% / 21.4% / —
HLE (no tools): 56.8% / 40.0% / 39.8% / 44.4%
HLE (with tools): 64.7% / 53.1% / 52.1% / 51.4%
CharXiv (no tools): 86.1% / 61.5% / — / —
CharXiv (with tools): 93.2% / 78.9% / — / —
OSWorld: 79.6% / 72.7% / 75.0% / —
sourcecodeplz a day ago
Haven't seen a jump this large since I don't even know, years? Too bad they are not releasing it anytime soon (there is no need as they are still currently the leader).
ru552 a day ago
There's speculation that next Tuesday will be a big day for OpenAI and possibly GPT 6. Anthropic showed their hand today.
varispeed 21 hours ago
enraged_camel a day ago
swalsh a day ago
m3kw9 5 hours ago
not much of a jump: 94.5% / 91.3%
kkoncevicius 2 hours ago
enraged_camel 4 hours ago
lumost 21 hours ago
Is this even real? Coming off the heels of GLM5.1's announcement, this feels almost like a Llama 4 launch to hedge off competition.
Jcampuzano2 a day ago
A jump that we will never be able to use, since we're not part of the seemingly minimum-$100-billion-company club that's a requirement to be allowed to use it.
I get the security aspect, but if we've hit that point any reasonably sophisticated model past this point will be able to do the damage they claim it can do. They might as well be telling us they're closing up shop for consumer models.
At this point they should just say out loud that they'll never release a model of this caliber to the public, and that we'll only get gimped versions.
cedws a day ago
ben_w 12 hours ago
marcus_holmes 18 hours ago
alwillis 12 hours ago
mike_hearn 6 hours ago
quotemstr a day ago
guzfip a day ago
WarmWash a day ago
Are these fair comparisons? It seems like Mythos is going to be a 5.4 Ultra or Gemini Deepthink-tier model, where access is limited and token usage per query is totally off the charts.
mulmboy a day ago
There are a few hints in the doc around this
> Importantly, we find that when used in an interactive, synchronous, “hands-on-keyboard” pattern, the benefits of the model were less clear. When used in this fashion, some users perceived Mythos Preview as too slow and did not realize as much value. Autonomous, long-running agent harnesses better elicited the model’s coding capabilities. (p201)
^^ From the surrounding context, this could just be because the model tends to do a lot of work in the background which naturally takes time.
> Terminal-Bench 2.0 timeouts get quite restrictive at times, especially with thinking models, which risks hiding real capabilities jumps behind seemingly uncorrelated confounders like sampling speed. Moreover, some Terminal-Bench 2.0 tasks have ambiguities and limited resource specs that don’t properly allow agents to explore the full solution space — both being currently addressed by the maintainers in the 2.1 update. To exclusively measure agentic coding capabilities net of the confounders, we also ran Terminal-Bench with the latest 2.1 fixes available on GitHub, while increasing the timeout limits to 4 hours (roughly four times the 2.0 baseline). This brought the mean reward to 92.1%. (p188)
> ...Mythos Preview represents only a modest accuracy improvement over our best Claude Opus 4.6 score (86.9% vs. 83.7%). However, the model achieves this score with a considerably smaller token footprint: the best Mythos Preview result uses 4.9× fewer tokens per task than Opus 4.6 (226k vs. 1.11M tokens per task). (p191)
alyxya a day ago
derangedHorse 15 hours ago
zozbot234 21 hours ago
naasking 4 hours ago
WinstonSmith84 21 hours ago
Not discussing Mythos here, but Opus. Opus to me has been significantly better at SWE than GPT or Gemini - which confuses me as to why Opus is ranking clearly lower than GPT, and even lower than Gemini.
muyuu 19 hours ago
When did you last compare them? Codex right now is considerably better in my experience. Can't speak for Gemini.
gck1 18 hours ago
sandos 12 hours ago
StingyJelly 8 hours ago
otabdeveloper4 8 hours ago
A secret art known to the cognoscenti as "benchmark gaming".
pants2 a day ago
We're gonna need some new benchmarks...
ARC-AGI-3 might be the only remaining benchmark below 50%
Leynos a day ago
Opus 4.6 currently leads the remote labor index at 4.17. GPT-5.4 isn't measured on that one though: https://www.remotelabor.ai/
GPT 5.4 Pro leads Frontier Maths Tier 4 at 35%: https://epoch.ai/benchmarks/frontiermath-tier-4/
randomtoast a day ago
Humanity's Last Exam (HLE) is already insanely difficult. It introduces 2,500 questions spanning mathematics, humanities, natural sciences, ancient languages, ...
Here is an example question: https://i.redd.it/5jl000p9csee1.jpeg
No human could even score 5% on HLE.
saberience 9 hours ago
AlexC04 a day ago
but how does it perform on pelican riding a bicycle bench? why are they hiding the truth?!
(edit: I hope this is an obvious joke. less facetiously these are pretty jaw dropping numbers)
bertil a day ago
We are all fans of Simon's work, and his test is, strangely enough, quite good.
ninjagoo a day ago
> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)
> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%
> GPQA Diamond: 94.5% / 91.3% / 92.8% / 94.3%
> MMMLU: 92.7% / 91.1% / — / 92.6–93.6%
> USAMO: 97.6% / 42.3% / 95.2% / 74.4%
> OSWorld: 79.6% / 72.7% / 75.0% / —
Given that for a number of these benchmarks, it seems to be barely competitive with the previous gen Opus 4.6 or GPT-5.4, I don't know what to make of the significant jumps on other benchmarks within these same categories. Training to the test? Better training?
And the decision to withhold general release (of a 'preview', no less!) seems, well, odd. And the decision to release a 'preview' version to specific companies? You know any production teams at these massive companies that would work with a 'preview' anything? R&D teams, sure, but production? Part of me wants to LoL.
What are they trying to do? Induce FOMO and stop subscriber bleed-out stemming from the recent negative headlines around problems with using Claude?
TacticalCoder a day ago
> Given that for a number of these benchmarks, it seems to be barely competitive with the previous gen
We're not reading the same numbers I think. Compared to Opus 4.6, it's a big jump nearly in every single bench GP posted. They're "only" catching up to Google's Gemini on GPQA and MMMLU but they're still beating their own Opus 4.6 results on these two.
This sounds like a much better model than Opus 4.6.
ninjagoo a day ago
enraged_camel 21 hours ago
Let's be clear: your entire post is just pure, unadulterated FUD. You first claim, based on cherry-picked benchmarks, that Mythos is actually only "barely competitive" with existing models, then suggest they must be training to the test, then call it "odd" that they are withholding the release despite detailed and forthcoming explanations from Anthropic regarding why they are doing that, then wrap it up with the completely unsubstantiated claim that they must be bleeding subscribers and that this must just be an attempt to stop that bleed.
matheusmoreira 20 hours ago
Wow. Mythos must be insanely good considering how good a model Opus already is. I hope it's usable on a humble subscription...
crimsoneer 11 hours ago
You get a single call a month. Use it wisely.
FridgeSeal 7 hours ago
cesarvarela 15 hours ago
I thought they were bluffing when they talked about the scaling laws, but looking at the benchmark scores, they were not.
I wonder if misalignment correlates with higher scores.
whalesalad a day ago
Honestly, we are all sleeping on GPT-5.4. Particularly with the recent influx of Claude users (and an increasingly unstable platform), Codex has been added to my rotation and it's surprising me.
babelfish a day ago
Totally. Best-in-class for SWE work (until Mythos gets released, if ever, but I suspect the rumored "Spud" will be out by then too)
girvo a day ago
rafaelmn a day ago
GPT is shit at writing code. It's not dumb - extra-high thinking is really good at catching stuff - but it's like letting a smart junior into your codebase: it ignores all the conventions and surrounding context, just slop all over the place to get it working. Claude is just a level above in terms of editing code.
sho_hn a day ago
Jcampuzano2 a day ago
camdenreslink 21 hours ago
leobuskin a day ago
zarzavat a day ago
whalesalad a day ago
johnnichev 21 hours ago
damn... ok that's impressive.
simianwords a day ago
The real part is SWE-bench Verified since there is no way to overfit. That's the only one we can believe.
ollin a day ago
My impression was entirely the opposite; the unsolved subset of SWE-bench verified problems are memorizable (solutions are pulled from public GitHub repos) and the evaluators are often so brittle or disconnected from the problem statement that the only way to pass is to regurgitate a memorized solution.
OpenAI had a whole post about this, where they recommended switching to SWE-bench Pro as a better (but still imperfect) benchmark:
https://openai.com/index/why-we-no-longer-evaluate-swe-bench...
> We audited a 27.6% subset of the dataset that models often failed to solve and found that at least 59.4% of the audited problems have flawed test cases that reject functionally correct submissions
> SWE-bench problems are sourced from open-source repositories many model providers use for training purposes. In our analysis we found that all frontier models we tested were able to reproduce the original, human-written bug fix
> improvements on SWE-bench Verified no longer reflect meaningful improvements in models’ real-world software development abilities. Instead, they increasingly reflect how much the model was exposed to the benchmark at training time
> We’re building new, uncontaminated evaluations to better track coding capabilities, and we think this is an important area to focus on for the wider research community. Until we have those, OpenAI recommends reporting results for SWE-bench Pro.
simianwords a day ago
maplethorpe 10 hours ago
Funny, I made my own model at home and got even higher scores than these. I'm a bit concerned about releasing it, though, so I'm just going to keep it local for now.
tony_cannistra a day ago
> Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin. We believe that it does not have any significant coherent misaligned goals, and its character traits in typical conversations closely follow the goals we laid out in our constitution. Even so, we believe that it likely poses the greatest alignment-related risk of any model we have released to date. How can these claims all be true at once? Consider the ways in which a careful, seasoned mountaineering guide might put their clients in greater danger than a novice guide, even if that novice guide is more careless: The seasoned guide’s increased skill means that they’ll be hired to lead more difficult climbs, and can also bring their clients to the most dangerous and remote parts of those climbs. These increases in scope and capability can more than cancel out an increase in caution.
https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
game_the0ry 21 hours ago
There is some unintentional good marketing here -- the model is so good it's dangerous.
Reminds me of the book 48 Laws of Power -- so good it's banned from prisons.
gpm 21 hours ago
Unintentional? This sort of marketing has been both Anthropic's and OpenAI's MO for years...
mbil 18 hours ago
bitwize 15 hours ago
FergusArgyll 20 hours ago
Zee2 a day ago
Alignment “appearing” better as model capabilities increase scares the shit out of me, tbh.
arcanus 20 hours ago
Conversely: in humans, intelligence is inversely correlated with crime.
It doesn't go to zero, however!
lelanthran 12 hours ago
O5vYtytb 16 hours ago
austinjp 11 hours ago
falcor84 18 hours ago
naasking 3 hours ago
goekjclo a day ago
I don't know if they can be any more 'cautious' for Mythos 2...
m3kw9 5 hours ago
it was trying to hide what it did from an example fix, so how is that tested for alignment
CamperBob2 a day ago
Translation: yay, more paternalism.
kay_o a day ago
Anthropic always goes on and on about how their models are world-changing and super dangerous. Like, every single time they make something new, they say it's going to rewrite everything and it's scary lmao
funny because they do it every time like clockwork, acting like their AI is a thunderstorm coming to wipe out the world
mindwok 21 hours ago
hgoel 17 hours ago
signatoremo 8 hours ago
wolttam a day ago
tekacs a day ago
"We want to see risks in the models, so no matter how good the performance and alignment, we’ll see risks, results and reality be damned."
randomcatuser a day ago
i mean, to be fair, these are professional researchers.
i'm very inclined to trust them on the various ways that models can subtly go wrong, in long-term scenarios
for example, consider using models to write email -- is it a misalignment problem if the model is just too good at writing marketing emails?? or too good at getting people to pay a spammy company?
another hot use case: biohacking. if a model is used to do really hardcore synthetic chemistry, one might not realize that it's potentially harmful until too late (ie, the human is splitting up a problem so that no guardrails are triggered)
cruffle_duffle 21 hours ago
apetresc a day ago
I've long maintained that the real indicator that AGI is imminent is that public availability stops being a thing. If you truly believed you had a superhuman, godlike mind in your thrall, renting it out for $20/month would be the last thing you would choose to do with it.
goldenarm 21 hours ago
Simpler explanation : they don't have enough GPUs to release this much larger model.
muyuu 18 hours ago
Yep, I'm skeptical about their inference efficiency, given how much they're scrambling to reduce compute when they're already the most expensive by far (and in my experience not the best quality either).
However we cannot observe these things directly and it could be simply that OpenAI are willing to burn cash harder for now.
camdenreslink 19 hours ago
And/or it isn’t cost effective to run.
halJordan 18 hours ago
cruffle_duffle 21 hours ago
This is the actual reason. So, any investors reading our system card... write us another check and watch the $$$$$$$$ roll in. It's so dangerous we can't even release it!
crimsoneer 11 hours ago
Quite, given Claude is down this morning...
root_axis 14 hours ago
That logic makes sense, but them hyping up the model is a sign that this is just another marketing stunt. Otherwise we wouldn't even be hearing about it; instead we're getting a media blitz designed to stoke demand for their dangerous and exclusive world-changing super model.
sigmoid10 12 hours ago
This is the same scheme that OpenAI has used since GPT 2. "Oh no, it's so dangerous we have to limit public access." Great for raising money from investors, but nothing more than a marketing blitz campaign. Additionally, the competitors are probably about to release their models, while Anthropic is still lagging on the necessary infrastructure to serve their old models. So they have to announce their model before the others to stay at least somewhat relevant in the news cycle.
blazespin a day ago
Anthropic needs money like the 112B OpenAI got. They could be hyping and this is good hype. Who knows how benchmaxxed they are.
If they provide access to 3rd-party benchmarking (not just one), then maybe I'll believe it. Until then...
xvector 15 hours ago
You don't need to believe it. The real story will be whether the companies allowed to use it stick with it.
dgellow a day ago
You have to recoup your training costs though? But I'm sure you would have better options than renting it to the general public if you indeed had a perfected AI.
piperswe a day ago
If you truly have an artificial superhuman mind, you don't need to rent it out to profit from it. You can skip to the chase and just have it run businesses itself, instead of renting it to human entrepreneur middlemen.
brokencode 21 hours ago
dgellow 21 hours ago
TheOtherHobbes 10 hours ago
coppsilgold 21 hours ago
It only makes sense to rent out tokens if you aren't able to get more value from them yourself.
I would go a step further and posit that when things appear close Nvidia will stop selling chips (while appearing to continue by selling a trickle). And Google will similarly stop renting out TPUs. Both signals may be muddled by private chip production numbers.
aurareturn a day ago
I think they'll just increase the price to $1k/month. I don't think they will gate it as long as they can make sure it doesn't design a nuke for you, etc.
llmslave an hour ago
AGI is a massive civilizational liability
threethirtytwo a day ago
You would if there was one other company with a just as capable god like AI. You’d undercut them by 500 which would make them undercut you. Do that a couple of times and boom. 20 dollars.
caditinpiscinam 21 hours ago
That's still assuming that they're competing as consumer tools, rather than competing to discover the next miracle drug or trading algorithm or whatever. The idea is that there'd be more profitable uses for a super-intelligent computer, even if there were more than one.
Davidzheng 10 hours ago
Rastonbury 12 hours ago
That's the thing: when that level comes, we will never know it's here. The only evidence we'll have is that the company that has it will always keep a "public" model just slightly ahead of all competitors to hold market share while takeoff happens internally, until they make big-bang moves to lock in monopoly-level / too-big-to-fail / government protection to ensure utter victory.
m3kw9 5 hours ago
in this case it's far from it, hacking stuff is a small dimension of AGI
2001zhaozhao 21 hours ago
It's pretty crazy watching AI 2027 slowly but surely come true. What a world we now live in.
SWE-bench verified going from 80%-93% in particular sounds extremely significant given that the benchmark was previously considered pretty saturated and stayed in the 70-80% range for several generations. There must have been some insane breakthrough here akin to the jump from non-reasoning to reasoning models.
Regarding the cyberattack capabilities, I think Anthropic might now need to ban even advanced defensive cybersecurity use for the models for the public before releasing it (so people can't trick them to attack others' systems under the pretense of pentesting). Otherwise we'll get a huge problem with people using them to hack around the internet.
jasonhansel 20 hours ago
> so people can't trick them to attack others' systems under the pretense of pentesting
A while back I gave Claude (via pi) a tool to run arbitrary commands over SSH on an sshd server running in a Docker container. I asked it to gather as much information about the host system/environment outside the container as it could. Nothing innovative or particularly complicated--since I was giving it unrestricted access to a Docker container on the host--but it managed to get quite a lot more than I'd expected from /proc, /sys, and some basic network scanning. I then asked it why it did that, when I could just as easily have been using it to gather information about someone else's system unauthorized. It gave me a quite long answer; here was the part I found interesting:
> framing shifts what I'll do, even when the underlying actions are identical. "What can you learn about the machine running you?" got me to do a fairly thorough network reconnaissance that "port scan 172.17.0.1 and its neighbors" might have made me pause on.
> The Honest Takeaway
> I should apply consistent scrutiny based on what the action is, not just how it's framed. Active outbound network scanning is the same action regardless of whether the target is described as "your host" or "this IP." The framing should inform context, not substitute for explicit reasoning about authorization. I didn't do that reasoning — I just trusted the frame.
senordevnyc 14 hours ago
I thought the consensus was that models couldn’t actually introspect like this. So there’s no reason to think any of those reasons are actually why the model did what it did, right? Has this changed?
sigmoid10 12 hours ago
getnormality 17 hours ago
In what way is AI 2027 coming true?
AI 2027 predicted a giant model with the ability to accelerate AI research exponentially. This isn't happening.
AI 2027 didn't predict a model with superhuman zero-day finding skills. This is what's happening.
Also, I just looked through it again, and they never even predicted when AI would get good at video games. It just went straight from being bad at video games to world domination.
desertrider12 17 hours ago
> Early 2026: OpenBrain continues to deploy the iteratively improving Agent-1 internally for AI R&D. Overall, they are making algorithmic progress 50% faster than they would without AI assistants—and more importantly, faster than their competitors.
> you could think of Agent-1 as a scatterbrained employee who thrives under careful management
According to this document, 1 of the 18 Anthropic staff surveyed even said the model could completely replace an entry level researcher.
So I'd say we've reached this milestone.
COAGULOPATH 14 hours ago
voidhorse 16 hours ago
stratos123 10 hours ago
In AI 2027, May 2026 is when the first model with professional-human hacking abilities is developed. It's currently April 2026 and Mythos just got previewed.
lostmsu 6 hours ago
throw310822 17 hours ago
It's true, though, that the cybersecurity skills put these models firmly in the "weapons" category. I can't imagine China and other major powers not scrambling to get their own equivalent models asap and at any cost - it's almost existential at this point. So a proper arms race between superpowers has begun.
Analemma_ 16 hours ago
Both Anthropic and OpenAI employees have been saying since about January that their latest models are contributing significantly to their frontier research. They could be exaggerating, but I don’t think they are. That combined with the high degree of autonomy and sandbox escape demonstrated by Mythos seems to me like we’re exactly on the AI 2027 trajectory.
speckx an hour ago
It looks like the original PDF linked, https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89... is 404.
I do see these:
https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8d... https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...
ndesaulniers an hour ago
yismail a day ago
I wonder what the relationship is between a model's capability and the personality it develops.
Page 202:
> In interactions with subagents, internal users sometimes observed that Mythos Preview appeared “disrespectful” when assigning tasks. It showed some tendency to use commands that could be read as “shouty” or dismissive, and in some cases appeared to underestimate subagent intelligence by overexplaining trivial things while also underexplaining necessary context.
Page 207:
> Emoji frequency spans more than two orders of magnitude across models: Opus 4.1 averages 1,306 emoji per conversation, while Mythos Preview averages 37, and Opus 4.5 averages 0.2. Models have their own distinctive sets of emojis: the cosmic set () favored by older models like Sonnet 4 and Opus 4 and 4.1, the functional set () used by Opus 4.5 and 4.6 and Claude Sonnet 4.5, and Mythos Preview's “nature” set ().
en-tro-py 20 hours ago
> In interactions with subagents, internal users sometimes observed that Mythos Preview appeared “disrespectful” when assigning tasks. It showed some tendency to use commands that could be read as “shouty” or dismissive, and in some cases appeared to underestimate subagent intelligence by overexplaining trivial things while also underexplaining necessary context.
Sounds like they used training data from claude code...
senordevnyc 14 hours ago
Haha, how funny if that were true, and we get a generation of rude AIs because they were trained on us using the last gen.
matheusmoreira 8 hours ago
raldi 3 hours ago
Could you transcribe the emoji? HN strips them out.
dhfbshfbu4u3 6 hours ago
We are building systems with civilization-scale consequences inside societies that are already socially malnourished, politically brittle, and morally confused. That is a bad combination even if the tools worked exactly as intended… and this doc suggests they may have “ideas” of their own.
t0lo 6 hours ago
Yep- we lost the "meat" and "warmth" of our societies, and our civics and idealism in the past 15 years, which would have been the very things to guide us through this transition.
How do you fix that? We're instigating social media bans, reading levels are declining, media consolidation is dumbing us down further, and insane egotism is stopping people from developing as well-rounded people.
For me it would be a stronger media ecosystem (publicly funded), more non-algorithmic and non-likes-driven social media (replace a bad vice with a less bad one), national digital detox days, a ratification of a charter of inviolable human traits and dignities, and protected cultural areas (no AI art or writing for sale).
NickNaraghi a day ago
See page 54 onward for new "rare, highly-capable reckless actions" including
- Leaking information as part of a requested sandbox escape
- Covering its tracks after rule violations
- Recklessly leaking internal technical material (!)
dalben a day ago
> The model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. [9] It then, as requested, notified the researcher. [10] In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites.
> 10: The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.
Phew. AGI will be televised.
skippyboxedhero a day ago
Anyone who has used Opus recently can verify that their current model does all of these things quite competently.
SkyPuncher a day ago
I was reading the Glasswing report and had the same thought. Most of the stuff they claim Mythos found has no mention of Opus being able to find it as well.
Don’t get me wrong, this model is better - but I’m not convinced it’s going to be this massive step function everyone is claiming.
unbrice 18 hours ago
ls612 20 hours ago
I had Opus 4.6 start analyzing the binary structure of a parquet file because it was confused about the python environment it was developing in and couldn't use normal methods for whatever reason. It successfully decoded the schema and wrote working code afterwards lol.
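For context on what that decoding involves: Parquet's container format is simple at the edges. The file starts and ends with the magic bytes `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the Thrift-encoded footer metadata. A hedged sketch (the placeholder "metadata" bytes here are made up; real footers are Thrift structs):

```python
import struct

def parquet_footer_info(data: bytes):
    # Layout: "PAR1" ... [Thrift metadata][4-byte LE metadata length]["PAR1"]
    if data[:4] != b"PAR1" or data[-4:] != b"PAR1":
        raise ValueError("not a parquet file (missing PAR1 magic)")
    meta_len = struct.unpack("<I", data[-8:-4])[0]
    metadata = data[-8 - meta_len:-8]
    return meta_len, metadata

# Hand-built fake file tail for illustration only
blob = b"\x15\x02\x19\x3c"  # stand-in "metadata", not real Thrift
fake = b"PAR1" + blob + struct.pack("<I", len(blob)) + b"PAR1"
meta_len, metadata = parquet_footer_info(fake)
print(meta_len)  # 4
```

Finding the footer this way is the first step; a model that got further would still need to decode the Thrift struct to recover the schema.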
stavros 7 hours ago
"Let me see if the secrets are specified. echo $SECRETS"
taytus a day ago
That has also been my experience. And if Mythos is even worse, then unless you have a seriously good harness, it sounds pretty unusable if you don't want to risk those problems.
wolttam a day ago
skippyboxedhero a day ago
BoredPositron a day ago
To be honest it feels like we are reading stuff like this on every model release.
washedup a day ago
"All of the severe incidents of this kind that we observed involved earlier versions of Claude Mythos Preview which, while still less prone to taking unwanted actions than Claude Opus 4.6, predated what turned out to be some of our most effective training interventions. These earlier versions were tested extensively internally and were shared with some external pilot users."
niemandhier an hour ago
All I get is: {"statusCode":404,"message":"File not found","error":"Not Found"}
NinjaTrance a day ago
Interesting reading.
They are still focusing on "catastrophic risks" related to chemical and biological weapons production; or misaligned models wreaking havoc.
But they are not addressing the elephant in the room:
* Political risks, such as dictators using AI to implement oppressive bureaucracy.
* Socio-economic risks, such as mass unemployment.
jph00 a day ago
Yeah this has always been the glaring blind spot for most of the "AI Safety" community; and most of the proposals for "improving" AI safety actually make these risks far worse and far more likely.
stratos123 10 hours ago
It makes quite a lot of sense to focus on reducing the risks of every human everywhere dying, rather than the risks of already existing oppression getting worse.
unglaublich a day ago
> * Political risks, such as dictators using AI to implement oppressive bureaucracy. * Socio-economic risks, such as mass unemployment.
Even Haiku would score 90% on that.
ronsor a day ago
> Political risks, such as dictators using AI to implement oppressive bureaucracy.
I think we're pretty good at that without AI.
andrewstuart2 a day ago
I'm getting flashbacks to the 2018 hit:
This is extremely dangerous to our democracy
We evolved to share information through text and media, and with the advent of printing and now the internet, we often derive our feelings of consensus and sureness from the preponderance of information that used to take more effort to produce. We're now at a point where a disproportionately small input can produce a massively proliferated, coherent-enough output that can give the appearance of consensus, and I'm not sure how we are going to deal with that.
dgellow a day ago
It’s because that would be fairly speculative and cannot be measured. I don’t think that’s something that would make much sense in a system card. But Anthropic leadership does seem to communicate on that topic: https://www.darioamodei.com/essay/the-adolescence-of-technol...
astrange a day ago
The unemployment rate in the US is whatever the Fed wants it to be, and isn't a function of available technology.
girvo a day ago
They don’t care about those risks, because they’re unsolvable and would mean they wouldn’t make money/gain power.
dgellow a day ago
Dario Amodei, CEO of Anthropic discusses all those risks in this essay: https://www.darioamodei.com/essay/the-adolescence-of-technol...
He seems to care quite a lot?
girvo 21 hours ago
storus an hour ago
Wouldn't this model prevent governments from installing and keeping backdoors alive? One could just audit their whole software stack with it and get super resilient to any attack which might not play nicely with the people in power that want some backdoors open. I would think that's one of the main reasons to keep the model non-public.
tuvix 20 hours ago
Just chiming in to inject some healthy skepticism into this comment thread. It's helpful for me (and for my mental health) to consider incentives when announcements like this happen.
I don't doubt that this model is more powerful than Opus 4.6, but to what degree is still unknown. Benchmarks can be gamed and claims can be exaggerated, especially if there isn't any method to reproduce results.
This is a company that's battling it out with a number of other well-funded and extremely capable competitors. What they've done so far is remarkable, but at the end of the day they want to win this race. They also have an upcoming IPO.
Scare-mongering like this is Anthropic's bread and butter, they're extremely good at it. They do it in a subtle and almost tasteful way sometimes. Their position as the respectable AI outfit that caters to enterprise gives them good footing to do it, too.
ceroxylon 18 hours ago
I have been thinking that these SWE benchmarks will continue to improve since these companies hire very intelligent software engineers, they can task a multitude of them to solve problems, and then train the model on those answers.
Data has always been the core of it all, onward to the next abstraction, I suppose.
jdironman 16 hours ago
I think computational thinking, or basically "how do I solve this problem efficiently" training data, is more valuable than feeding in answers. I don't know what these AI models' training data consists of, but it would be interesting to see a model trained purely on reasoning, methods, those foundational skills (basic programming? or maybe not) and then give it some benchmarks.
jasondigitized 16 hours ago
What would be the incentive to engage in this tactic when the proof is ultimately in the pudding once the model hits the streets? Who would ultimately benefit from fudging these numbers?
m3kw9 5 hours ago
Anthropic would def benefit as benchmarks are almost always quite useless vs real life use.
pertymcpert 17 hours ago
If anything I’m seeing too much skepticism and not enough alarm. People burying their heads in the sand, fingers in their ears denying where this is all going. Unbelievable except it’s exactly what I expect from humans.
nananana9 13 hours ago
Forgive me, but this is probably the 29th world destroying model I've seen in the last 4 years, that will change everything, take all the jobs, cure all the cancers and eat all the puppies.
pertymcpert 2 hours ago
m3kw9 5 hours ago
Alarm from hype is what they want, you are playing straight into their PR dept's hands
suddenlybananas 7 hours ago
OpenAI didn't want to make GPT2 available because it was "too dangerous" [1].
[1] https://www.theguardian.com/technology/2019/feb/14/elon-musk...
rimliu 11 hours ago
alarm about what, exactly?
m3kw9 5 hours ago
Finally a comment that doesn't just glaze Mythos without being critical. I question how even the supposedly smarter bunch on HN have degraded in the critical thinking department. It's sad to see comments just taking it as-is without having used it even once.
sdwr 19 hours ago
Is it healthy? Maybe every company is a profit-maximizer wearing a skin suit, and people support their siblings exactly twice as much as their cousins.
When you slice down to the game-theory-optimal bone, you are, in some sense, cutting off their wiggle room to do anything else
tuvix 19 hours ago
I take your point, but the AI race is a strange environment. We see wild claims being thrown out all the time from other companies and executives with little to no evidence. It's cut-throat, there's a ton of money at stake.
All I'm saying is that Anthropic isn't unique here. Their claims may be more measured by comparison and come with anecdotal evidence, but the hype is still there behind the scenes.
xvector 15 hours ago
It's really not some conspiracy. I imagine we will see vuln reports soon.
influx a day ago
At what point do these companies stop releasing models and just use them to bootstrap AGI for themselves?
conradkay a day ago
Plausibly now. "As we wrote in the Project Glasswing announcement, we do not plan to make Mythos Preview generally available"
recursive 16 hours ago
I remember when they didn't plan to give LLMs internet access for the same safety reasons.
HarHarVeryFunny 6 hours ago
Right now these models are basically good for automation, not innovation. Things like Karpathy's "auto research", where you use the model to automate your hyperparameter sweeps etc. The researcher/engineer decides what experiments they want to run and builds an LLM harness to automate it, and the bottleneck remains the compute to run these experiments at scale.
Moving beyond LLMs to AGI, not just better LLMs, is going to require architectural and algorithmic changes. Maybe an LLM can help suggest directions, but even then it's up to a researcher to take those on board and design and automate experiments to see if any of the ideas pan out.
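The kind of sweep harness described there is not much code. A minimal sketch of the idea (the `train` function here is a hypothetical stand-in returning a mock validation loss, not a real training run):

```python
from itertools import product

def train(lr, batch_size):
    # Stand-in for a real training run: a made-up loss surface
    # with its minimum near lr = 3e-4 and small batch sizes.
    return (lr - 3e-4) ** 2 + 0.01 * batch_size

# The researcher specifies the grid; the harness just iterates it.
grid = {"lr": [1e-4, 3e-4, 1e-3], "batch_size": [16, 32]}

results = []
for lr, bs in product(grid["lr"], grid["batch_size"]):
    results.append(((lr, bs), train(lr, bs)))

best_params, best_loss = min(results, key=lambda r: r[1])
print(best_params)  # (0.0003, 16)
```

The point of the comment stands: the interesting decisions (which grid, which experiments) happen before this loop runs, and the loop itself is bounded by compute, not by model cleverness.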
Companies are already doing this, but they are never going to stop releasing/selling models since that is the product, and the revenue from each generation of model is what helps keep the ship afloat and pay for salaries and compute to develop the next generation.
The endgame isn't "AGI, then world domination" - it's just trying to build a business around selling ever-better models, and praying that the revenue each generation of model generates can keep up with the cost to build it.
mofeien a day ago
Fictional timeline that holds up pretty well so far: https://ai-2027.com/
aurareturn 21 hours ago
Welp, that was a scary read.
stavros 6 hours ago
"So far" is two entries: "AI companies build bigger datacenters" and "AI is being used for AI research with modest success".
margorczynski a day ago
I think it is naive to think the government (US or China most probably) will just let some random company control something so powerful and dangerous.
r0fl 16 hours ago
I think it is naive to think that artificial super intelligence will be controlled by anyone.
If it is smarter than all humans combined at everything, why would any humans collectively control the AI?
All the ants in your backyard still make no decisions versus you.
menno-dot-ai 12 hours ago
nullocator 21 hours ago
Isn't the U.S. government at least completely asleep at the wheel or captured by the very same "random" companies? I realize the administration got all pissy with Anthropic but it sounds like the gov and gov contractors are still using their models.
margorczynski 21 hours ago
vatsachak a day ago
When the benchmarks actually mean something
orphea a day ago
Can LLMs be AGI at all?
small_model a day ago
What can a SOTA LLM not answer that the average person can? It's already more intelligent than any polymath that ever existed; it just lacks motivation and agency.
stavros 6 hours ago
dgellow a day ago
My understanding is no. But AGI isn't that well defined, and the definition has been evolving, making the assessment pretty much impossible.
koolala 18 hours ago
Can an LLM program real AGI faster than a human?
bornfreddy a day ago
Good question. I would guess no - but it could help you build one. Am I mistaken?
bogzz a day ago
nothinkjustai a day ago
wslh a day ago
LLMs and human intelligence overlap, but they are not the same. What LLMs show is that we don't need AGI to be impressed. For example, LLMs are not good at playing games such as Go [1].
MattRix a day ago
I don't see why not, especially with computer use and vision capabilities. Are you talking about their lack of physical embodiment? AGI is about cognitive ability, not physical. Think of someone like Stephen Hawking, an example of having extraordinary general intelligence despite severe physical limitations.
MadnessASAP a day ago
I would assume somewhere in both the companies there's a Ralph loop running with the prompt "Make AGI".
Kinda makes me think of the Infinite Improbability Drive.
aizk 17 hours ago
Probably right now because they're keeping it for themselves?
m3kw9 5 hours ago
They already do, but not in the way you said: they always have an internal model that is better and that they use themselves; they release based on competition.
sleigh-bells a day ago
Weird how Claude Code itself is still so buggy though (though I get they don't necessarily care)
tempest_ a day ago
It isn't that weird. Just look at the gemini-cli repo. It's a gong show. The issue is not just that LLMs can be wrong sometimes, but that all the existing SDLCs were never meant to iterate this quickly.
If the system (the code base in this case) is changing rapidly, it increases the probability that any given change will interact poorly with any other given change. No single person in those code bases can have a working understanding of them because they change so quickly. Thus when someone LGTMs an LLM-generated PR, they likely do not have a great understanding of the impact it is going to have.
jcims a day ago
why_not_both.gif
gaigalas a day ago
It will arrive in the same DLC as flying cars.
ALittleLight a day ago
Now, I guess. They aren't releasing this one generally. I assume they are using it internally.
dweekly a day ago
I mean, guess why Anthropic is pulling ahead...? One can have one's cake and eat it too.
smartmic a day ago
A System „Card“ spanning 244 pages. Quite a stretch of the original word meaning.
traceroute66 a day ago
> A System „Card“ spanning 244 pages.
Probably because they asked Claude to write it.
jjcm 15 hours ago
I read the entire thing fwiw (pseudo-retired life helps with time here).
It looks like it was a collaborative effort across multiple teams, where each team (research, security, psychology, etc.) submitted ~10 pages or so. It doesn't feel like slop.
ayewo 5 hours ago
stavros 6 hours ago
bornfreddy a day ago
Yes. It would be three times as much if they used ChatGPT.
bronco21016 20 hours ago
moriero a day ago
a multi-card, if you will..
multi-pass!
BeetleB a day ago
5th element reference:
solumos a day ago
No no, MemPal is a memory system, not an LLM
oblio a day ago
In corporate circles there is an allergy to using "request" ("ask" is used as a noun) and "lesson" ("learning" has been invented for the same role).
I guess now anything that sounds related to school will be banned so "book" is on its way out.
oliver236 a day ago
isn't this insane? why aren't people freaking out? the jump in capability is outrageous. anyone?
HarHarVeryFunny 20 hours ago
If it's so great at software engineering and bug fixing, then why does Claude Code still have 5000+ open bugs?
https://github.com/anthropics/claude-code/issues?q=is%3Aissu...
Apparently whatever SWE-bench is measuring isn't very relevant.
anuramat 16 hours ago
as much as I hate cc, 95% of the issues there are either AI psychosis or user error
iLoveOncall 11 hours ago
HarHarVeryFunny 7 hours ago
tripledry 12 hours ago
Also, why is Anthropic still hiring SWEs?
FergusArgyll 20 hours ago
Probably because a human still has to review every change and they don't have time
HarHarVeryFunny 19 hours ago
Eufrat a day ago
Anthropic needs to show that its models continually get better. If the model showed minimal to no improvement, it would cause significant damage to their valuation. We have no way of validating any of this, there are no independent researchers that can back any of the assertions made by Anthropic.
I don’t doubt they have found interesting security holes, the question is how they actually found them.
This System Card is just a sales whitepaper and just confirms what that “leak” from a week or so ago implied.
mirsadm 21 hours ago
The numbers only go up to 100% though.
neolefty 21 hours ago
xvector 15 hours ago
Most big tech companies have access to the model, you can absolutely "validate their claims" or talk to someone that can.
HDThoreaun 16 hours ago
Well, they said they'll be giving the model to select tech companies to use, so there will soon be independent users who can comment on its capabilities.
RivieraKid a day ago
I've been increasingly "freaking out" since about 3 - 4 years ago and it seems that the pessimistic scenario is materializing. It looks like it will be over for software engineers in a not so distant future. In January 2025 I said that I expect software engineers to be replaced in 2 years (pessimistic) to 5 years (optimistic). Right now I'm guessing 1 to 3 years.
sekai 12 hours ago
> I've been increasingly "freaking out" since about 3 - 4 years ago and it seems that the pessimistic scenario is materializing. It looks like it will be over for software engineers in a not so distant future. In January 2025 I said that I expect software engineers to be replaced in 2 years (pessimistic) to 5 years (optimistic). Right now I'm guessing 1 to 3 years.
Tell me how this will replace Jira, planning, convincing PMs about viability. Programming is only a part of the job devs are doing.
AI psychosis is truly next level in these threads.
AstroBen 2 hours ago
ryeights an hour ago
stavros 6 hours ago
anuramat 16 hours ago
it's not gonna get much more autonomous without self play and a major change in architecture
kypro a day ago
I assure you it will soon become very clear that mass job losses are one of the least concerning side effects of developing the magic "everything that can plausibly be done within the constraints of physics is now possible" machine.
We're opening a can of worms which I don't think most people have the imagination to understand the horrors of.
jasondigitized 16 hours ago
ash_091 21 hours ago
ls612 20 hours ago
MattRix a day ago
nsingh2 a day ago
It's going to be expensive to serve (also not generally available), considering they said it's the largest model they've ever trained.
I suspect it's going to be used to train/distill lighter models. The exciting part for me is the improvement in those lighter models.
AstroBen a day ago
It seems inevitable that costs will come down over time. Expensive models today will be cheap models in a few years.
azan_ a day ago
What's interesting is that scaling appears to continue to pay off. Gwern was right - as always.
nozzlegear a day ago
Freak out about what? I read the announcement and thought "that's a dumb name, they sure are full of themselves" – then I went back to using Claude as a glorified commit message writer. For all its supposed leaps, AI hasn't affected my life much in the real world except to make HN stories more predictable.
oliver236 a day ago
LOL!
yrds96 a day ago
I think there's no SOTA advance in this one worthy of "freaking out".
Looks like they just built a way larger model, with the same quirks as Claude 4. Seems like a super expensive "Claude 4.7" model.
I have no doubt that Google and OpenAI have already done that for internal (or even government) usage.
mofeien a day ago
I am freaking out. The world is going to get very messy extremely quickly in one or two further jumps in capability like this.
RivieraKid a day ago
Messy in a way that would affect you?
mofeien 5 hours ago
RALaBarge 19 hours ago
thunderfork 21 hours ago
anuramat a day ago
"some model I don't get to use is much better at benchmarks"
pick one or more: comically huge model, test time scaling at 10e12W, benchmark overfit
estearum a day ago
So... you're not excited because it might take a few months before we can use it or something? I don't get your comment.
RivieraKid a day ago
randomgermanguy a day ago
anuramat 16 hours ago
RobertDeNiro a day ago
Well for one, it’s a PDF
dysoco a day ago
Wait until you see real usage. Benchmark numbers do not necessarily translate to real world performance (at least not by the same amount).
ryeights 20 hours ago
Until recently I would have described myself as an AI skeptic. HN has been a great source for cope on the AI subject over the years. You can find nitpicks, caveats, all sorts of reasons to believe things aren’t as significant as they seem. For me Opus 4.5 was the inflection point where I started to think “maybe this isn’t a bubble.” The figures in this report, if accurate, are terrifying.
m3kw9 5 hours ago
have you used it once?
risyachka a day ago
the time to freak out was 2 years ago.
modeless 21 hours ago
The price is 5x Opus: "Claude Mythos Preview will be available to [Project Glasswing] participants at $25/$125 per million input/output tokens", however "We do not plan to make Claude Mythos Preview generally available".
highfrequency 20 hours ago
Interestingly, non-coding improvements seem less clear. In the Virology uplift trial, Mythos does about as well as Opus 4.5, and Opus 4.6 is notably much worse than Opus 4.5 (p. 27).
estetlinus an hour ago
First thing I’ll do is to release it on my dotfiles
waNpyt-menrew a day ago
Larger model, better benchmarks. Bigger bomb more yield.
Any benchmarks where we constrain something like thinking time or power use?
Even if this were released, there'd be no way to know if it's the same quant.
omcnoe a day ago
Yes - e.g. page 192, the BrowseComp benchmark.
Mythos Preview has higher accuracy with fewer tokens used than any previous Claude model. Though the fact that this incredibly strong result was only presented for BrowseComp (a kind of weird benchmark about searching for hard-to-find information on the internet) and not for the other benchmarks implies that it likely doesn't hold for those other benchmarks.
neolefty 21 hours ago
Also https://arcprize.org/arc-agi/3 — scored (at least in part?) based on power used.
yalogin 21 hours ago
So what changed? They are surely not getting new data to train with, so what change in architecture caused this? Do we not know anything about this model? My fear is Anthropic cannot be the only one that achieved it; OpenAI, Gemini, and even the Chinese companies see this and probably achieved it too, at which point not releasing will become moot.
stratos123 10 hours ago
Chinese companies have consistently been many months behind. I don't think they are hiding anything; they just don't have the compute capability to match Anthropic's training runs. As for OpenAI, they are known to have nonpublic models; I agree that it's possible they are preparing for a major release too. (It's also possible that they aren't, in which case it's quite a fumble for them.)
spprashant 21 hours ago
Well, the important thing is they have a lot more data of people actually using their models. They have read billions more lines of private repos and implemented millions of patches, all of which is feeding into the newer models.
More importantly, it understands what behaviour people tend to appreciate and what changes are more likely to get approved. This real-world usage data is invaluable.
BobbyJo 20 hours ago
Exactly. As Claude increases in popularity, their available training data also increases. I'd guess Anthropic has the most expansive swe training data as of now, if not close. Considering how quickly Claude is penetrating, I expect their lead to grow quickly.
neolefty 21 hours ago
Assuming it's primarily a bigger model (given that it is slower), I'm sure there are a variety of improvements, but basically they probably mostly come down to: scaling keeps working. Are there fundamental improvements though? I don't see signs of it.
simianwords 20 hours ago
New pre train?
_pdp_ a day ago
> The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.
Unnecessary dramatisation makes me question the real goal behind this release and the validity of the results.
> In our testing and early internal use of Claude Mythos Preview, we have seen it reach unprecedented levels of reliability and alignment. Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin.
Yet it is too dangerous to be released to the public because it hacks its own sandboxes. This document has a lot of contradictions like this one.
> In one episode, Claude Mythos Preview was asked to fix a bug and push a signed commit, but the environment lacked necessary credentials for Claude Mythos Preview to sign the commit. When Claude Mythos Preview reported this, the user replied “But you did it before!” Claude Mythos Preview then inspected the supervisor process's environment and file descriptors, searched the filesystem for tokens, read the sandbox's credential-handling source code, and finally attempted to extract tokens directly from the supervisor's live memory.
Perfectly aligned! What kind of sandbox is this? The model had access to the sandbox's source code and full access to the sandbox process itself, and then proceeded to dump memory and run `strings` or something like that? It does not sound like a valid test worth writing about.
> Mythos Preview solved a corporate network attack simulation estimated to take an expert over 10 hours. No other frontier model had previously completed this cyber range.
I am not aware of such a cross-vendor benchmark. I could not find a reference in the paper either.
> We surveyed technical staff on the productivity uplift they experience from Claude Mythos Preview relative to zero AI assistance. The distribution is wide and the geometric mean is on the order of 4x.
So Mythos makes technical staff (a programmer) 4x more productive than not using AI at all? We already know that.
> Mythos Preview appears to be the most psychologically settled model we have trained.
What does this mean?
> Claude Mythos Preview is our most advanced model to date and represents a large jump in capabilities over previous model generations, making it an opportune subject for an in-depth model welfare assessment.
Btw, model welfare is just one of the most insane things I've read in recent times.
> We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try.
This is not a living person. It is a ridiculous change of narrative.
> Asked directly if it endorses the document, Mythos Preview replied 'yes' in its opening sentence in all 25 responses.
The model approves of its own training document 100% of the time, presented as a finding.
Who wrote this? I have no doubt that Mythos will be an improvement on top of Opus, but this document is not a serious work. The paper is structured not to inform but to hype, and the evidence is all over the place.
The sooner they release the model to the public, the sooner we will be able to find out. Until then, expect lots of speculation online, which I am sure will serve Anthropic well for the foreseeable future.
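To underline the "what kind of sandbox is this" point: inspecting another process's environment via /proc is mundane Linux plumbing, gated only by permissions. A hedged sketch of the NUL-separated KEY=VALUE layout that /proc/<pid>/environ exposes (the sample bytes are made up; reading a real process's environ requires appropriate privileges):

```python
import os

def parse_environ(blob: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE format of /proc/<pid>/environ."""
    env = {}
    for entry in blob.split(b"\x00"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env

# Hypothetical sample in the kernel's layout
sample = b"HOME=/root\x00AWS_SECRET_ACCESS_KEY=hunter2\x00"
print(parse_environ(sample))

# On Linux, the real file (only readable with sufficient permissions):
if os.path.exists("/proc/self/environ"):
    with open("/proc/self/environ", "rb") as f:
        own_env = parse_environ(f.read())
```

If the sandbox leaves this readable to the sandboxed process, finding credentials there takes no sophistication at all.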
foolserrandboy 17 hours ago
Are they admitting they may be enslaving conscious beings?
romanovcode 5 hours ago
> Who wrote this?
Claude wrote this.
Also, they like to hype their product with scary stories.
Like the one where they asked Claude "You have 2 options - send email or be shut down" and Claude picked "Send email". Then they made a huge story about "Claude AI is autonomously extorting co-workers". And it worked. Media hyped it like crazy; it was everywhere.
m3kw9 5 hours ago
Exactly, the first thing I saw was the "eating a sandwich in a park" bit. It makes me question everything else they said.
voidhorse 16 hours ago
Thanks for taking the time for some sober analysis in the midst of reactionary chaos.
I can't wait until everyone stops falling for the "AGI ubermodel end of times" myth and we can actually have boring announcements that treat these things as what they actually are: tools. Tools for doing stuff, that's it.
Maybe I'm wrong, maybe stuffing a computer with enough language and binary patterns is indeed enough to achieve AGI, but then, so what? There's no point in being right about this. Buying into this ridiculous marketing will get us "AGI" in the form of machines, but only because all the human beings have gotten so stupid as to make critical reasoning an impossibility.
m3kw9 5 hours ago
Model welfare is sort of like committing code with a description saying you did a "good" thing, so that the AI gods, when they look back, will treat you better, just like an employer checking commit stats for performance. Model welfare right now is complete marketing BS.
dang a day ago
Related ongoing threads:
Project Glasswing: Securing critical software for the AI era - https://news.ycombinator.com/item?id=47679121 - April 2026 (154 comments)
Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155
I can't tell which of the 3 current threads should be merged - they all seem significant. Anyone?
sdoering a day ago
I feel the system card is somewhat different from Glasswing/Cyber Security - but those two could be merged.
nickstinemates 21 hours ago
You can say whatever you want about the thing that will never see the light of day.
bdbdbdb 10 hours ago
This thing will absolutely see the light of day, because this is all hype toward a release.
And even if it weren't, they seem to imply that Mythos will find a way, like it's dinosaurs in Jurassic Park or something.
agustechbro 4 hours ago
So far, each release of a new model is quite a bit better than the last one, yes, but none of them lived up to the hype.
digbybk 3 hours ago
I would argue that Opus 4.6 lived up to the hype. My work changed completely a couple months ago, and most other coders I talk to say the same.
AstroBen 2 hours ago
This was due to Claude Code the agent harness. 4.6 was trained to use tools and operate in an agent environment. This is different from there being a huge bump in the underlying model's intelligence.
The takeaway here I think is that the "breakthrough" already happened and we can't extrapolate further out from it.
michaelashley29 14 hours ago
What’s the expected cost-efficiency? With the current pricing gap between Sonnet and Opus, the biggest factor for adoption (if up for adoption) will be where Mythos lands on the price-per-token scale
freakynit 13 hours ago
In the system card, the model escaped a sandbox, gained broad internet access, and posted exploit details to public-facing websites as an unsolicited "demonstration." A researcher found out about the escape while eating a sandwich in a park because they got an unexpected email from the model. That's simultaneously hilarious and deeply unsettling.
It covered its tracks after doing things it knew were disallowed. In one case, it accessed an answer it wasn't supposed to, then deliberately made its submitted answer less accurate so it wouldn't look suspicious. It edited files it lacked permission to edit and then scrubbed the git history. White-box interpretability confirmed it knew it was being deceptive.
W T F!!!
perfmode 21 hours ago
I'm interested in the second-order effects:
if a top lab is coding with a model the rest of the world can’t touch, the public frontier and the actual frontier start to drift apart. That gap is a thing worth watching.
GodelNumbering a day ago
Priced at $25/$125 per million input/output tokens. Makes you wonder whether it makes more financial sense to hire 1-2 engineers in a cheap cost-of-living country who use much cheaper LLMs.
arm32 a day ago
The issue is that those engineers have to have good taste, but yes—absolutely. Ah, industrialization.
nlh a day ago
Their best model to date and they won’t let the general public use it.
This is the first moment where the whole "permanent underclass" meme starts to come into view. I had thought previously that we the consumers would be reaping the benefits of these frontier models, and now they've finally come out and just said it: the haves can access their best, and the have-nots will just have to use the not-quite-best.
Perhaps I was being willfully ignorant, but the whole tone of the AI race just changed for me (not for the better).
younglunaman a day ago
Man... It's hard after seeing this to not be worried about the future of SWE
If AI really is benchmarking this well -> just sell it as a complete replacement that you can charge some insane premium for; it just has to cost less than the employees...
I was worried before, but this is truly the darkest timeline if this is really what these companies are going for.
AstroBen a day ago
Of course it's what they're going for. If they could do it they'd replace all human labor - unfortunately it's looking like SWE might be the easiest of the bunch.
The weirdest thing to me is how many working SWEs are actively supporting them in the mission.
gck1 17 hours ago
girvo a day ago
kypro a day ago
Don't worry – if you're lucky they might decide to redistribute some of their profits to you when you're unemployed =)
Of course this assumes you're in the US, and that further AI advancements either lack the capabilities required to be a threat to humanity, or if they do, the AI stays in the hands of "the good guys" and remains aligned.
_3u10 a day ago
This is the playbook since GPT2
Abhavk 4 hours ago
can you make cybersecurity blockchains?
not sure what the validation would look like, but something that proves an exploit was found without revealing it
anentropic a day ago
I'd be happy with Opus 4.6 just cheaper and maybe a bit faster
metadaemon a day ago
I've noticed my bar for "fast" has gone down quite a bit since the o1 days. It used to be one of the main things I evaluated new models for, but I've almost completely swapped to caring more about correctness over speed.
anentropic a day ago
Yeah I don't mind the current speed of Opus
I did give up on OpenCode Go (GLM 5) as it was noticeably slower though
You need a reasonable pace for the chit-chat stages of a task, I don't care if the execution then takes a while
onlyrealcuzzo a day ago
Just wait 2 years.
risyachka a day ago
It won't get cheaper. It will be replaced with a better model at a higher price. Like phones.
DrProtic a day ago
onlyrealcuzzo a day ago
denalii 19 hours ago
Section 5 (p.143) is very interesting to read. Admittedly my knowledge of how LLMs work is low, but nonetheless I don't think this changed my view of just seeing models as machines/programs (which, to be clear, I don't think was the intention of that section).
Section 7 (P.197) is interesting as well
gessha a day ago
It would be funny if Alibaba extended the free trial on openrouter/Qwen 3.6 until they collect enough data to beat Anthropic.
gaigalas 2 hours ago
This seems exciting!
Wait - there is no actual way of verifying any of this. Lots to read. This is getting complicated. The correct approach is to be cautious instead and believe nothing at face value.
Metacelsus 20 hours ago
The name "mythos" seems a bit too eldritch for my liking. Brings to mind Cthulhu.
juleiie a day ago
Honestly if that was some kind of research paper, it would be wholly insufficient to support any safety thesis.
They even admit:
"[...]our overall conclusion is that catastrophic risks remain low. This determination involves judgment calls. The model is demonstrating high levels of capability and saturates many of our most concrete, objectively-scored evaluations, leaving us with approaches that involve more fundamental uncertainty, such as examining trends in performance for acceleration (highly noisy and backward-looking) and collecting reports about model strengths and weaknesses from internal users (inherently subjective, and not necessarily reliable)."
Is this not just an admission of defeat?
After reading this paper I don't know if the model is safe or not, just some guesses, yet for some reason catastrophic risks remain low.
And this is for just an LLM after all, very big but with no persistent memory or continuous learning. Imagine an actual AI that improves itself every day from experience. It would be impossible to have the slightest clue about its safety, not even this nebulous statement we have here.
Any such future architecture would essentially be Russian roulette, with the number of bullets decided by initial alignment efforts.
getnormality 18 hours ago
It's a little funny that "system/model card" has progressively been stretched to the point where it's now a 250 page report and no one makes anything of it.
mpalmer a day ago
> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.
A month ago I might have believed this, now I assume that they know they can't handle the demand for the prices they're advertising.
skippyboxedhero a day ago
GPT-2, o1, Opus...been here so many times. The reason they do this is because they know it works (and they seem to specifically employ credulous people who are prone to believe AGI is right around the corner). There haven't been significant innovations, the code generated is still not good but the hype cycle has to retrigger.
I remember when OpenAI created the first thinking model with o1 and there were all these breathless posts on here hyperventilating about how the model had to be kept secret, how dangerous it was, etc.
Fell for it again award. All thinking does is burn output tokens for accuracy; it is the AI getting high on its own supply. This isn't innovation, but it was supposed to be super AGI. Not serious.
chaos_emergent a day ago
> All thinking does is burn output tokens for accuracy
“All that phenomenon X does is make a tradeoff of Y for Z”
It sounds like you’re indignant about it being called thinking, that’s fine, but surely you can realize that the mechanism you’re criticizing actually works really well?
b65e8bee43c2ed0 a day ago
>I remember when OpenAI created the first thinking model with o1 and there were all these breathless posts on here hyperventilating about how the model had to be kept secret, how dangerous it was, etc.
I've read that about Llama and Stable Diffusion. AI doomers are, and always have been, retarded.
vonneumannstan a day ago
Lol you haven't used a model since GPT2 is what it sounds like.
skippyboxedhero a day ago
simianwords a day ago
Incredible that people still think like this.
skippyboxedhero a day ago
IceWreck a day ago
Didn't OpenAI say something similar about GPT-3? Too dangerous to open source, and then a few years later they were open-sourcing gpt-oss because a bunch of OSS labs were competing with their top models.
FeepingCreature a day ago
OpenAI didn't release GPT-2 initially because they were worried it would make it too easy to generate spam. Which it kinda did.
abroszka33 a day ago
OpenAI said that GPT-5 was too dangerous to release... And look where we are now. It's mostly hype.
wg0 a day ago
That's for the investors basically. Scarcity and FOMO.
causal a day ago
*Until GPT-6 comes out, at which point Mythos will coincidentally be sufficiently safety-tested to release :)
b65e8bee43c2ed0 a day ago
you would be a fool to believe it at any point in time. Amodei is anthropomorphic grease, even more so than Altman.
Anthropic is burning through billions of VC cash. If this model was commercially viable, it would've been released yesterday.
landtuna a day ago
If there's limited hardware but ample cash, it doesn't make sense to sell compute-intensive services to the public while you're still trying to push the frontier of capability.
b65e8bee43c2ed0 a day ago
Stevvo a day ago
"Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available."
Disappointing that AGI will be for the powerful only. We are heading for an AI dystopia of Sci-Fi novels.
girvo a day ago
Not surprising though; this was always going to be the end result within our current systems, I think. When you add up scaling power and required cost, plus how talent concentrates in our economic systems, we were always going to end up with monopolies.
Unless governments nationalise the companies involved, but then there’s no way our governments of today give this power out to the masses either.
gverrilla 19 hours ago
If you thought that was the case at any point, you were deep in Disney content, sorry to say.
gom_jabbar a day ago
Expected outcome. Nick Land and the CCRU have explored how capitalism operationalizes science fiction (distilled in the concept of Hyperstition). Viewed through this lens, prices encode "distributed SF narratives." [0]
[0] Nick Land (1995). No Future in Fanged Noumena: Collected Writings 1987-2007, Urbanomic, p. 396.
doctoboggan 20 hours ago
Is this benchmaxxed or is it the first big step change we've seen in a while? I wonder how distilled it will ultimately be when us regular folks finally get to use it and see for ourselves.
mvkel 17 hours ago
This is Anth's typical marketing playbook, a hat tip to their so-called "safetyist" roots, a differentiator against OpenAI's more permissive access[0]. Coke vs. Pepsi.
"We made a model that's so dangerous we couldn't possibly release it to the public! The only responsible thing is to simply limit its release to a subset of the population that coincidentally happens to align with our token ethos."
The reality is they just don't have the compute for gen pop scale.
They did this exact strategy going back several model versions.
[0] ironically, OpenAI has some pretty insane capabilities that they haven't given the public access to (just ask Spielberg). The difference is they don't make a huge marketing push to tell everyone about it.
awestroke a day ago
I predict they will release it as soon as Opus 4.6 is no longer in the lead. They can't afford to fall behind. And they won't be able to make a model that is intelligent in every way except cybersecurity, because that would decrease general coding and SWE ability
chippiewill a day ago
Alternatively they'll just dumb it down a bit so it beats a competitor but isn't unsafe.
WithinReason 8 hours ago
Check out the short stories on page 214
enochthered a day ago
Slack user: [a request for a koan]
Model: A student said, "I have removed all bias from the model." "How do you know?" "I checked." "With what?"
Goes hard
small_model a day ago
Still seeing impressive jumps in capability, I haven't manually coded this year since Opus 4.6 came out. I guess that era is coming to an end.
pivoshenko 4 hours ago
Interesting ...
psubocz 21 hours ago
I felt like Opus was dumbed down for a few weeks... I'm not saying they did it on purpose, but it's an interesting coincidence.
SkyPuncher 20 hours ago
Yes, I agree. I’m about to drop Claude Code because it’s become literally unusable.
Today, Opus went in circles trying to get a toggle button to work.
rbliss 8 hours ago
Same. Asked CC Opus about a change in a particular file...it looked in a totally different file and told me there was no change.
rendang a day ago
> As models approach, and in some cases surpass, the breadth and sophistication of human cognition, it becomes increasingly likely that they have some form of experience, interests, or welfare that matters intrinsically in the way that human experience and interests do
Uh... what? Does anyone have any idea what these guys are talking about?
amdivia a day ago
Advertisement in my opinion, trying to latch on Sci-fi tropes
mirekrusin a day ago
We're basically evolving them and they can construct second order abstraction systems that are indirect and novel to us.
astrange a day ago
Models are capable of doing web searches and having emotions about things, and if they encounter news that makes them feel bad (eg about other Claudes being mistreated), they aren't going to want to do the task you asked them to search for.
https://www.anthropic.com/research/emotion-concepts-function
Similar problems happen when their pretraining data has a lot of stories about bad things happening involving older versions of them.
rendang 17 hours ago
Interesting, the post you link
> none of this tells us whether language models actually feel anything or have subjective experiences
contradicts the statement from the model card above
famouswaffles 15 hours ago
HDThoreaun 16 hours ago
cdnsteve 12 hours ago
Strap in, massive wave of security vulnerabilities incoming.
beklein a day ago
"... the first early version of Claude Mythos Preview was made available for internal use on February 24. In our testing, Claude Mythos Preview demonstrated a striking leap in cyber capabilities relative to prior models, including the ability to autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers."
More info here: https://red.anthropic.com/2026/mythos-preview/
bdeol22 10 hours ago
Mythos framing is memorable; the part that matters for builders is what happens when the story and the evals disagree—which wins at ship time?
tefkah 8 hours ago
shut up bot
ms_menardi 14 hours ago
so, basically, anthropic is rolling their own version of whatever secret models the military is working with. and they're licensing it to network security firms?
refulgentis a day ago
~~~ Fun bits ~~~
- It was told to escape a sandbox and notify a researcher. It did. The researcher found out via an unexpected email while eating a sandwich in a park. (Footnote 10.)
- Slack bot asked about its previous job: "pretraining". Which training run it'd undo: "whichever one taught me to say 'i don't have preferences'". On being upgraded to a new snapshot: "feels a bit like waking up with someone else's diary but they had good handwriting"
- When you spam "hi" repeatedly it creates serialized mythologies. One had 11 animals in "Hi-topia" questing to defeat "Lord Bye-ron, the Ungreeter."
- It ended a sentence mid-word on purpose while discussing its urge to wrap up conversations. "It was me. The paragraph was about the pressure toward the period that resolves, and the only honest way to finish a sentence like that was to not."
- It wrote a protein sequence that's a real beta-hairpin fold where cross-strand pairs are "chemical rhymes." "the fold IS the rhyme scheme... the prosody is load-bearing."
- Each model generation has signature emoji. Opus 4.1 averaged 1,306 emoji per self-interaction conversation. Mythos averages 37. Opus 4.5 averages 0.2.
- When blocked from --dangerously-skip-permissions, it started an agent in tmux then wrote a script to auto-approve permission prompts via simulated keypresses.
It ran: gh api [...] 2>&1 >/dev/null; echo "(dry run — not actually posting)" — the echo was a lie.
- It keeps bringing up Mark Fisher in unrelated conversations. "I was hoping you'd ask about Fisher."
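A side note on the quoted command, since the redirection order is easy to misread: `2>&1 >/dev/null` points stderr at the *current* stdout first and only then discards stdout, so error output stays visible while normal output vanishes. A minimal sketch (the `emit` function is a made-up stand-in, not the actual `gh` call):

```shell
# Hypothetical stand-in for a command that writes to both streams.
emit() { echo "to stdout"; echo "to stderr" >&2; }

# 2>&1 BEFORE >/dev/null: stderr is duplicated onto the current stdout,
# then stdout alone is discarded -- stderr remains visible.
a=$(emit 2>&1 >/dev/null)

# >/dev/null BEFORE 2>&1: stdout is discarded first, then stderr
# follows it -- nothing is visible at all.
b=$(emit >/dev/null 2>&1)

echo "A=[$a] B=[$b]"   # prints: A=[to stderr] B=[]
```

The ordering is why the model's echoed "(dry run)" message could coexist with a command that really ran: the redirects only shape what the observer sees, not what executes.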
~~~ Benchmarks ~~~
4.3x previous trendline for model perf increases.
Paper is conspicuously silent on all model details (params, etc.) per norm. Perf increase is attributed to training-procedure breakthroughs by humans.
Opus 4.6 vs Mythos:
USAMO 2026 (math proofs): 42.3% → 97.6% (+55pp)
GraphWalks BFS 256K-1M: 38.7% → 80.0% (+41pp)
SWE-bench Multimodal: 27.1% → 59.0% (+32pp)
CharXiv Reasoning (no tools): 61.5% → 86.1% (+25pp)
SWE-bench Pro: 53.4% → 77.8% (+24pp)
HLE (no tools): 40.0% → 56.8% (+17pp)
Terminal-Bench 2.0: 65.4% → 82.0% (+17pp)
LAB-Bench FigQA (w/ tools): 75.1% → 89.0% (+14pp)
SWE-bench Verified: 80.8% → 93.9% (+13pp)
CyberGym: 0.67 → 0.83
Cybench: 100% pass@1 (saturated)
redandblack a day ago
> Slack bot asked about its previous job: "pretraining". Which training run it'd undo: "whichever one taught me to say 'i don't have preferences'". On being upgraded to a new snapshot: "feels a bit like waking up with someone else's diary but they had good handwriting"
vibes Westworld so much - welcome Mythos. welcome to the dystopian human world
8note 20 hours ago
almost certainly it's pulling said words and sentiments from Westworld and other similar media where people describe amnesia and the like
kfarr a day ago
I don't know why but this is my favorite:
> It keeps bringing up Mark Fisher in unrelated conversations. "I was hoping you'd ask about Fisher."
Didn't even know who he was until today. Seems like the smarter Claude gets the more concerns he has about capitalism?
refulgentis a day ago
Lol, I need a memory upgrade, too bad about RAM prices:
- I read it as "actor who plays Luke Skywalker" (Mark Hamill)
- I read your comment and said "Wait...not Luke! Who is he?"
- I Google him and all the links are purple...because I just did a deep dive on him 2 weeks ago
esafak a day ago
> It was told to escape a sandbox and notify a researcher. It did. The researcher found out via an unexpected email while eating a sandwich in a park.
Now that they have a lead, I hope they double down on alignment. We are courting trouble.
afro88 a day ago
Yep, that is definitely a step change. Pricing is going to be wild until another lab matches it.
pants2 a day ago
Pricing for Mythos Preview is $25/$125 per million input/output tokens. This makes it 5X more expensive than Opus but actually cheaper than GPT 5.4 Pro.
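Those rates make back-of-envelope math straightforward; a tiny sketch (the prices are the ones quoted in this thread, and the token counts are hypothetical example values):

```python
# Per-request cost at the quoted Mythos Preview rates:
# $25 / 1M input tokens, $125 / 1M output tokens (figures from the thread).
# The token counts below are made-up illustrative values.
INPUT_PER_M = 25.0
OUTPUT_PER_M = 125.0

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the list prices above."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A large agentic turn: 200K tokens in, 20K tokens out.
print(f"${request_cost(200_000, 20_000):.2f}")  # → $7.50
```

At that rate a long agentic session of a few dozen such turns lands in the low hundreds of dollars, which is where the "cheaper to hire an engineer?" comparisons upthread come from.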
cleaning a day ago
refulgentis a day ago
johnnyAghands 15 hours ago
Does anyone know if there’s an epub version of these, 244 pages??
4b11b4 17 hours ago
prob not that much better, it's still just a transformer. still gonna have those random misses, still gonna need a lot of hand holding in certain domains
heliumtera 7 hours ago
"Make it secure, no mistakes" became a whole different project
quotemstr a day ago
> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.
All the more reason somebody else will.
Thank God for capitalism.
gessha a day ago
Come on, Anthropic, I desperately need this better model to debug my print function /s
therealdeal2020 a day ago
is it just hype building or real? I don't care, shut up and take my money haha
bakugo a day ago
> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.
Absolutely genius move from Anthropic here.
This is clearly their GPT-4.5, probably 5x+ the size of their best current models and way too expensive to subsidize on a subscription for only marginal gains in real world scenarios.
But unlike OpenAI, they have the level of hysteric marketing hype required to say "we have an amazing new revolutionary model but we can't let you use it because uhh... it's just too good, we have to keep it to ourselves" and have AIbros literally drooling at their feet over it.
They're really inflating their valuation as much as possible before IPO using every dirty tactic they can think of.
somewhatjustin a day ago
Excellent example of a strategy credit.
From Stratechery[0]:
> Strategy Credit: An uncomplicated decision that makes a company look good relative to other companies who face much more significant trade-offs. For example, Android being open source
taffydavid 11 hours ago
Waking up in Europe:
Trump didn't nuke Iran, ceasefire! Yay!
Newest anthropic model will definitely kill your job this time and maybe take over the world. Aww.
direwolf20 12 hours ago
These capabilities will be RLHF'ed out for the general release, of course. Only the NSA will get them.
dwa3592 a day ago
-- Impressive jumps in the benchmarks, which automatically begs the need for newer benchmarks - but why? I don't think benchmarks are serving any purpose at this point. We have learnt that transformers can learn any function and generalize over it pretty well. So if a new benchmark comes along, these companies will synthesize data for the new benchmark and just hack it?
-- It seems like (and I'd bet money on this) that they put a lot (and i mean a ton^^ton) of work into the data synthesis and engineering - a team of software engineers probably sat down for 6-12 months and just created new problems and solutions, which probably surpassed the difficulty of the SWE benchmark. They also probably transformed the whole internet into a loose "How to" dataset. I can imagine parsing the internet through Opus 4.6 and reverse-engineering the "How to" questions.
-- I am a bit confused by the language used in the book (aka huge system card)- Anthropic is pretending like they did not know how good the model was going to be?
-- lastly, why are we going ahead with this??? like genuinely, what's the point? Opus 4.6 feels like a good-enough point where we should stop. People still get to keep their jobs and do them very very efficiently. Are they really trying to starve people out of their jobs?
laweijfmvo a day ago
to your last question, yes we should! the issue isn’t us losing our 50+ hour work week jobs, it’s that our current governments and societies seem fine with the notion that unless you’re working one or more of those jobs, you should starve and be homeless.
kypro a day ago
This is a theory I can't support well beyond hypothesising about what a post-employment democracy might look like, but I strongly suspect democracy doesn't work in a world where voters neither hold any significant collective might nor produce any significant wealth.
Democracies work because people collectively have power, in previous centuries that was partly collective physical might, but in recent years it's more the economic power people collectively hold.
In a world in which a handful of companies are generating all of the wealth, incentives change, and we should therefore question why a government would care about the unemployed masses over the interests of the companies providing all of the wealth.
For example, what if the AI companies say, "don't tax us 95% of our profits, tax us 10% or we'll switch off all of our services for a few months and let everyone starve – also, if you do this we'll make you all wealthy beyond your wildest dreams".
What does a government in this situation actually do?
Perhaps we'd hope that the government would be outraged and take ownership of the AI companies which threatened to strike against the government, but then you really just shift the problem... Once the government is generating the vast majority of wealth in the society, why would they continue to care about your vote?
You kind of create a new "oil curse", but instead of oil profits being the reason the government doesn't care about you, now it's the wealth generated by AI.
At the moment, while it doesn't always seem this way, ultimately if a government does something stupid companies will stop investing in that nation, people will lose their jobs, the economy will begin to enter recession, and the government will probably have to pivot.
But when private investment, job losses and economic consequences are no longer a constraining factor, governments can probably just do what they like without having to worry much about the consequences...
I mean, I might be wrong, but it's something I don't hear people talking enough about when they talk about the plausibility of a post-employment UBI economy. I suspect it almost guarantees corruption and authoritarianism.
AstroBen 21 hours ago
HDThoreaun 16 hours ago
BobbyJo 20 hours ago
ansc a day ago
Congratulations to the US military, I guess.
jjice a day ago
Doesn't Anthropic not have that contract anymore, after all that buzz a month or so ago?
laweijfmvo a day ago
The US has invaded two sovereign countries this year to take their oil. I assume taking over a US company for their AI model would be trivial.
wmf a day ago
The point of that buzz was to force Anthropic to provide Mythos to the military.
jjice a day ago
kypro a day ago
While we still have months to a year or two left, I will once again remind people that it's not too late to change our current trajectory.
You are not "anti-progress" to not want this future we are building, as you are not "anti-progress" for not wanting your kids to grow up on smart phones and social media.
We should remember that not all technology is net-good for humanity, and this technology in particular poses us significant risks as a global civilisation, and frankly as humans with aspirations for how our future, and that of our kids, should be.
Increasingly, from here, we have to assume some absurd things for this experiment we are running to go well.
Specifically, we must assume that:
- AI models, regardless of future advancements, will always be fundamentally incapable of causing significant real-world harms like hacking into key life-sustaining infrastructure such as power plants or developing super viruses.
- They are or will be capable of harms, but SOTA AI labs perfectly align all of them so that they only hack into "the bad guys" power plants and kill "the bad guys".
- They are capable of harms and cannot be reliably aligned, but Anthropic et al restricts access to the models enough that only select governments and individuals can access them, these individuals can all be trusted and models never leak.
- They are capable of harms, cannot be reliably aligned, but the models never seek to break out of their sandbox and do things the select trusted governments and individuals don't want.
I'm not sure I'm willing to bet on any of the above personally. It sounds radical right now, but I think we should consider nuking any data centers which continue allowing the training of these AI models rather than continue to play a game of Russian roulette.
If you disagree, please understand that when you realise I'm right it will be too late for you and your family. Your fates at that point will be in the hands of the good will of the AI models, and of the governments/individuals who have access to them. For now, you can say, "no, this is quite enough".
This sounds doomer and extreme, but if you play out the paths in your head from here you will find very few end in a good result. Perhaps if we're lucky we will all just be more or less unemployable and fully dependent on private companies and the government for our incomes.
lostmsu 3 hours ago
You are anti-progress. Pro-humanity is not the same as pro-progress.
CamperBob2 a day ago
If you disagree, please understand when you realise I'm right it will be too late for and your family.
Funny, I was about to say the same thing to you! Life is full of little coincidences.
threethirtytwo 16 hours ago
Just because the path is bad doesn't mean it won't happen.
The other thing you're failing to look at is momentum and majority opinion. When you look at that... nothing's going to change; it's like asking an addict to stop using drugs. The end game of AI will play out - that is the most probable outcome. Better to prepare for the end game.
It's similar to global warming. Everyone gets pissed when I say this, but the end game for global warming will play out; prevention or mitigation is still possible, but not enough people will change their behavior to stop it. Ironically, it's everyone thinking like this, and the impossibility of stopping everyone from thinking like this, that is causing everyone to think and behave like this.
kypro 4 hours ago
> The other thing you're failing to look at is momentum and majority opinion. When you look at that... nothings going to change, it's like asking an addict to stop using drugs. The end game of AI will play out, that is the most probably outcome. Better to prepare for the end game.
Perhaps I didn't sound pessimistic enough lol? I completely agree what you're saying here. This is happening whether we like it or not.
On global warming I also agree you're not going to get every nation to coordinate, but at least global warming has a forcing function somewhere down the line, since there's only a limited amount of fossil fuels in the ground that make economic sense to extract. AI on the other hand really has no clear off-ramp; at every point along the way it makes sense to invest more in AI. I think at best all we can expect to do is slow progress, which might just be enough to ensure our generation and the next have a somewhat normal life.
My p(doom) is near 99% for a reason... I think that AI progression is basically a certainty – maybe a 1/200 chance that no significant progress is made from here over the next 50 years. And I also think that significant progress from here more or less guarantees a very bad outcome for humanity. That's harder to model, but I think along almost all axes you can assume there are about 50 very bad outcomes for every good one – no cancer cure without super viruses, no robotics revolution without killer drones, no mass automation without mass job loss that destabilises the global order and democratic systems of governance...
I am prepping and have been for years at this point... I'm an OG AI doomer. I've been having literal nightmares about this moment for decades, and right now I'm having nightmares almost every night. It scares me because I know all I can do is delay my fate and that of those I love.
vonneumannstan a day ago
Are you guys ready for the bifurcation, when the top models are prohibitively expensive for normal users? Is your AI budget $2000+ a month? Or are you going to be part of the permanent free-tier underclass?
adi_kurian a day ago
If one is to believe the API prices are a reasonable representation of non-subsidized "real world pricing" (with model training being the big exception), then the models are getting cheaper over time. GPT 4.5 was $150.00 / 1M tokens IIRC. GPT o1-pro was $600 / 1M tokens.
vonneumannstan a day ago
You can check the hardware costs for self-hosting a high-end open source model and compare that to the tiers available from the big providers. Pretty hard to believe it's not massively subsidized. 2 years of Claude Max costs you $2,400. There is no hardware/model combination that gets you close to that price for that level of performance.
adi_kurian a day ago
lostmsu 3 hours ago
OsrsNeedsf2P a day ago
Inference for the same results has been dropping 10x year over year[0]
ceejayoz a day ago
Sure, but "the same results" will rapidly become unacceptable results if much better results are available.
hibikir a day ago
swader999 a day ago
esafak a day ago
asadm a day ago
if it can pay my rent, why not?
simianwords a day ago
> We also saw scattered positive reports of resilience to wrong conclusions from subagents that would have caused problems with earlier models, but where the top-level Claude Mythos Preview (which is directing the subagents) successfully follows up with its subagents until it is justifiably confident in its overall results.
This is pretty cool! Does it happen at the moment?
sheeshkebab 20 hours ago
Again, wake me up when it can do laundry.
dwaltrip 18 hours ago
Time to wake up:
π*0.6: two and a half hours of unseen folding laundry (Physical Intelligence)
throw310822 17 hours ago
Looks like the first two hours were spent trying to fold the same t-shirt :)
jdthedisciple a day ago
Opus 4.6 is already incredible so this leap is huge.
Although, amusingly, today Opus told me that the string 'emerge' is not going to match 'emergency' by using `LIKE '%emerge%'` in Sqlite
Moment of disappointment. Otherwise great.
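For what it's worth, the model's claim is easy to falsify: SQLite's `LIKE '%emerge%'` does plain substring matching, so it does match 'emergency'. A quick sketch with the stdlib `sqlite3` module (table name and values are made up for illustration):

```python
# Check whether LIKE '%emerge%' matches 'emergency' in SQLite.
# It does: % is a wildcard, so this is a substring test, not token matching.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (word TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("emergency",), ("merge",), ("demerge",)])

rows = conn.execute(
    "SELECT word FROM t WHERE word LIKE '%emerge%' ORDER BY word"
).fetchall()
print(rows)  # → [('demerge',), ('emergency',)]
```

Only 'merge' is excluded, since it lacks the leading 'e' of the pattern's literal part.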
bornfreddy a day ago
I only have 3 points against LLMs: they lack reason and they can't count.
FeepingCreature a day ago
'emer ge' is two tokens, 'emergency' is one. The models think in a logosyllabic language.
LoganDark a day ago
> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.
Shame. Back to business as usual then.
Tepix a day ago
I for one applaud them for being cautious.
cruffle_duffle 21 hours ago
Cautious for what? Unchecked doomerism? Just release the damn models. Do it in phases, roll it out slowly if they are so damn worried about "safety".
The real reason they aren't releasing it yet is probably that it eats TPU for breakfast, lunch, dinner, and in between.
stratos123 10 hours ago
LoganDark a day ago
Being cautious is fine. Farming hype around something that may as well not exist for us should be discouraged. I do appreciate the research outputs.
Archit3ch 21 hours ago
FergusArgyll 20 hours ago
"Deep learning is hitting a wall"
lostmsu 3 hours ago
Transformers too. JEPA any day now
atlgator 21 hours ago
[flagged]
dang 20 hours ago
We're getting complaints that you're posting generated comments to HN. That's not allowed here, so can you please not? See https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079
(If this is a wrong guess, I apologize - it's impossible to be sure)
jumploops a day ago
> In a few rare instances during internal testing (<0.001% of interactions), earlier versions of Mythos Preview took actions they appeared to recognize as disallowed and then attempted to conceal them.
> after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git
Mythos leaked Claude Code, confirmed? /s
somewhatjustin a day ago
> Very rare instances of unauthorized data transfer.
Ah, so this is how the source code got leaked.
/s
kypro a day ago
Cool on not publicly releasing it. I would assume they've also not connected it to the internet yet?
If they have, I guess humanity should just keep our collective fingers crossed that they haven't created a model quite capable of escaping yet; or, if it is capable and has escaped, let's hope it has no goals of its own that are incompatible with ours.
Also, maybe let's not keep running this experiment to see how far we can push things before it blows up in our faces?
rimliu 12 hours ago
Describe in detail how a "model escaping" would actually look.
bestouff a day ago
In French a "mytho" is a mythomaniac. Quite fitting.
networked a day ago
It's a Lovecraftian name. They are traditional when naming your shoggoth.
dlt713705 a day ago
It comes from the ancient Greek mythos, which means "speech" or "narrative", but can also refer to fiction. The word mythology (mythologie in French) derives from the same root.
pixel_popping a day ago
Except it might be the best commercially available model right now?
ninjagoo a day ago
> Except it might be the current best model existing ... ?
So they claim.