Arm's Cortex X925: Reaching Desktop Performance (chipsandcheese.com)
253 points by ingve 16 hours ago
Incipient 14 hours ago
Without being a cpu geek, a lot of the branch prediction details go over my head, however generally a good review. I liked the detail of performance on more complex workloads where IPC can get muddy when you need more instructions.
I feel these days however, for any comparison of performance, power envelope needs to be included (I realise this is dependent on the final chip)
adrian_b 13 hours ago
ARM Cortex-X925 achieves indeed a very good IPC, but it has competitive performance only in general-purpose applications that cannot benefit from using array operations (i.e. the vector instructions and registers). The results shown in the parent article for the integer tests of SPEC CPU2017 are probably representative for Cortex-X925 when running this kind of applications.
While the parent article shows AMD Zen 5 having significantly better results in floating-point SPEC CPU2017, these benchmark results are still misleading, because in properly optimized for AVX-512 applications the difference between Zen 5 and Cortex-X925 would be much greater. I have no idea how SPEC has been compiled by the author of the article, but the floating-point results are not consistent with programs optimized for Zen 5.
One disadvantage of Cortex-X925 is having narrower vector instructions and registers, which requires more instructions for the same task and it is only partially compensated by the fact that Cortex-X925 can execute up to 6 128-bit instructions per clock cycle (vs. up to 4 vector instructions per clock cycle for Intel/AMD, but which are wider, 256-bit for Intel and up to 512-bit for Zen 5). This has been shown in the parent article.
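As a rough sketch of the arithmetic behind this comparison, using only the issue widths and vector widths quoted above: the helper name is illustrative, and the result is an upper bound that ignores clock speed, port restrictions and memory limits.

```python
def peak_vector_bits_per_cycle(instructions_per_cycle, vector_width_bits):
    """Upper bound on vector bits processed per clock cycle."""
    return instructions_per_cycle * vector_width_bits

# Figures taken from the comment above, not measured here.
cores = {
    "Cortex-X925 (6 x 128-bit)": peak_vector_bits_per_cycle(6, 128),
    "Intel (4 x 256-bit)": peak_vector_bits_per_cycle(4, 256),
    "Zen 5 (4 x 512-bit)": peak_vector_bits_per_cycle(4, 512),
}

for name, bits in cores.items():
    print(f"{name}: {bits} bits/cycle")
```

On these numbers the six narrow pipes close part of the gap to 256-bit designs but not to 512-bit ones.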
The second disadvantage of Cortex-X925 is that it has an unbalanced microarchitecture for vector operations. For decades most CPUs with good vector performance had an equal throughput for fused multiply-add operations and for loads from the L1 cache memory. This is required to ensure that the execution units are fed all the time with operands in many applications.
However, Cortex-X925 can do at most 4 loads per clock cycle, while it can do 6 FMAs. Because of this lower load throughput, Cortex-X925 can reach the maximum FMA throughput much less frequently than the AMD or Intel CPUs. This is compounded by the fact that achieving better FMA-to-load ratios requires more storage space in the architectural vector registers, and Cortex-X925 is also disadvantaged here, having vector registers 4 times smaller than those of Zen 5.
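A minimal sketch of why register space limits the FMA-to-load ratio, assuming a simplified outer-product matrix-multiply micro-kernel (my choice of example, not something from the article). In this model an m x n tile of accumulators performs m*n FMAs per step, loads m + n operand vectors, and occupies m*n + m + n vector registers. The limit of 32 architectural vector registers holds for both AArch64 and AVX-512; everything else here is illustrative.

```python
def tile_stats(m, n):
    """FMAs, loads and registers used per step by an m x n accumulator tile."""
    fmas = m * n
    loads = m + n
    registers = m * n + m + n
    return fmas, loads, registers

def best_ratio(max_registers=32):
    """Search for the tile with the highest FMA:load ratio that fits."""
    best = None
    for m in range(1, max_registers):
        for n in range(1, max_registers):
            fmas, loads, regs = tile_stats(m, n)
            if regs <= max_registers:
                ratio = fmas / loads
                if best is None or ratio > best[0]:
                    best = (ratio, m, n)
    return best

# 6 FMAs per cycle fed by 4 loads per cycle needs at least 1.5 FMAs per load.
required = 6 / 4
ratio, m, n = best_ratio(32)
print(f"best tile {m}x{n}: {ratio:.2f} FMAs per load (need {required})")

# Total architectural vector register capacity in bytes.
neon_bytes = 32 * 128 // 8
avx512_bytes = 32 * 512 // 8
print(neon_bytes, avx512_bytes)
```

In this toy model a large enough tile clears the 1.5 ratio, but a small one (a 1x1 tile gives 0.5) does not, and the tile covers four times fewer elements when each register is 128 bits instead of 512.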
my123 11 hours ago
> While the parent article shows AMD Zen 5 having significantly better results in floating-point SPEC CPU2017, these benchmark results are still misleading, because in properly optimized for AVX-512 applications the difference between Zen 5 and Cortex-X925 would be much greater. I have no idea how SPEC has been compiled by the author of the article, but the floating-point results are not consistent with programs optimized for Zen 5.
The arithmetic intensity of most SPECfp subtests is quite low. You see this wall because it ends up reaching bandwidth limitations long before running out of compute on cores with beefy SIMD.
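The wall described here is usually drawn as a roofline: attainable throughput is the smaller of peak compute and arithmetic intensity times memory bandwidth. A minimal sketch, with peak and bandwidth numbers made up purely for illustration (they are not measurements of any core):

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline model: limited by either compute or memory bandwidth."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

BANDWIDTH_GB_S = 50.0   # hypothetical memory bandwidth
NARROW_PEAK = 100.0     # hypothetical core with narrow SIMD
WIDE_PEAK = 400.0       # hypothetical core with wide SIMD

# Low arithmetic intensity: both cores hit the same bandwidth ceiling.
low_narrow = attainable_gflops(NARROW_PEAK, BANDWIDTH_GB_S, 0.5)
low_wide = attainable_gflops(WIDE_PEAK, BANDWIDTH_GB_S, 0.5)

# High arithmetic intensity: the wider core can use its extra compute.
high_narrow = attainable_gflops(NARROW_PEAK, BANDWIDTH_GB_S, 16.0)
high_wide = attainable_gflops(WIDE_PEAK, BANDWIDTH_GB_S, 16.0)

print(low_narrow, low_wide, high_narrow, high_wide)
```

With low intensity the two hypothetical cores score the same, which is the effect that would hide a SIMD advantage in such subtests.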
hajile 3 hours ago
SIMD workloads on CPU tend to be bursty. If your workload is all SIMD with few other instructions or branches, it's almost certainly going to be faster on a GPU or SME co-processor.
If there's space between the SIMD instructions, then double-pumping or even quad-pumping isn't very expensive (and with 6 SIMD ports, it might even be basically free).
DeathArrow 12 hours ago
Still, what percentage of software uses AVX512 for its core functionality, so vector performance matters in practice?
galangalalgol 12 hours ago
CyberDildonics 8 hours ago
I don't know where the focus on vector instructions comes from. 6 128-bit instructions per clock is not bad at all. Actual uses of 512-bit wide vector instructions are exotic.
What most people want is interactivity and fast web pages which doesn't have much to do with wide vector instructions (except possibly for optimized video decoding).
barrkel 9 hours ago
In my view, power consumption isn't relevant to a desktop or workstation (and increasingly, desktop machines are workstations since almost everyone uses laptops instead). When I'm plugged into a wall socket, I will take performance over efficiency at every decision point. Power consumption matters to the degree that the resulting heat needs to be dissipated, and if you can't get rid of the heat fast enough, you lose performance.
rbanffy an hour ago
There is a whole universe of good-enough desktop computers that doesn't care that much about performance, but where power consumption is important, because it makes the computer bulky, noisy, and expensive.
I'd love to have a Xeon 6, a big EPYC, or an AmpereOne (or a loaded IBM LinuxOne Express) as my daily driver, but that's just not something I can justify. It'd not be easy to come up with something for all this compute capacity to do. A reasonable GPU is a much better match for most of my workloads, which aren't even about pushing pixels anymore - iGPUs are enough these days - but multiplying matrices with embarrassingly low precision, so it can pretend to understand programming tasks.
dinglo 14 hours ago
If ARM starts dominating in desktop and laptop spaces with a quite different set of applications, might we start seeing more software bugs around race conditions? Caused by developers writing software with X86 in mind, with its differing constraints on memory ordering.
vardump 13 hours ago
That's a possibility. Some code still assumes (without realizing!) x86 style ordered loads and stores. This is called a strong memory model, specifically TSO, Total Store Order. If you tell x86 to execute "a=1; b=2;", it will always store value to 'a' first. Of course compilers might reorder stores and loads, but that's another matter.
ARM is free to reorder stores and loads. This is called a weak memory model. So unless it's explicitly told to the compiler, like C++ memory_order::acquire and memory_order::release, you might get invalid behavior. Heisenbugs in the worst case.
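To make the difference concrete, here is a toy model of the classic message-passing litmus test. It is a deliberately simplified sketch, not a faithful model of either architecture: "tso" keeps each thread's operations in program order (store-to-load reordering, which x86 does allow, cannot occur in this test), while "weak" lets each thread's two independent operations execute in either order. All names are illustrative.

```python
from itertools import permutations

# Writer publishes data, then sets a flag. Reader checks the flag, then reads data.
WRITER = (("store", "data", 1), ("store", "flag", 1))
READER = (("load", "flag", "r1"), ("load", "data", "r2"))

def thread_orders(ops, model):
    """Execution orders one thread may exhibit under the toy model."""
    if model == "tso":
        return [tuple(ops)]
    return list(set(permutations(ops)))

def interleavings(a, b):
    """All merges of two sequences that keep each sequence's own order."""
    if not a:
        yield b
        return
    if not b:
        yield a
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest

def outcomes(model):
    """Set of (r1, r2) results reachable under the given model."""
    results = set()
    for w in thread_orders(WRITER, model):
        for r in thread_orders(READER, model):
            for schedule in interleavings(w, r):
                mem = {"data": 0, "flag": 0}
                regs = {}
                for kind, var, x in schedule:
                    if kind == "store":
                        mem[var] = x
                    else:
                        regs[x] = mem[var]
                results.add((regs["r1"], regs["r2"]))
    return results

print(sorted(outcomes("tso")))
print(sorted(outcomes("weak")))
```

The only extra outcome under the weak model is r1 == 1 with r2 == 0: the reader sees the flag but stale data, which is exactly the bug that acquire/release ordering is meant to rule out.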
IshKebab 2 hours ago
I think that's less likely than you'd expect because the memory ordering model used by C++ and others essentially requires you to write code that works even without x86's total store order. If you don't then you can get bugs even on x86, because the compiler will violate the ordering you thought you had in your program, even if the CPU doesn't.
Also most software runs on ARM now and I don't think that has actually happened in practice.
rbanffy an hour ago
> Also most software runs on ARM now and I don't think that has actually happened in practice.
At least in my house, ARM cores outnumber x86 cores by at least four to one. And I'm not even counting the 32-bit ARM cores in embedded devices.
There is a lot of space for memory ordering bugs to manifest in all those devices.
dd_xplore 14 hours ago
The major issue is these days most software is Electron-based or a webapp. I miss the days of 98/XP, where you'd find tons of desktop software. A PC actually felt like something that had a purpose. Even if you spin up an XP/98 VM (especially 98/2000) now, you'd see the entire OS feels like something that you can spend some time on. Nowadays most PCs feel like a random terminal where I open the browser and do some basic work (except for gaming, of course). I really hate the UX of Win 11, and even 10 isn't much better compared to XP. I really hope we go back to that old era.
rbanffy an hour ago
> Nowadays most PCs feel like a random terminal
It's a fun perception. For the longest time, all the "serious" computers were used through networks and terminals and didn't even come with any ability to connect a monitor or a keyboard (although a serial terminal would work as the system console). I used to joke (usually looking at Unisys Windows-based big servers), if the computer had VGA and PS/2 ports, it wasn't a computer, but a toy. Those Unisys servers weren't toys, but you could run Pinball and Minesweeper directly on them, which kind of said otherwise.
I think we got used to such levels of platform bloat that we don't care if the UI toolkit these days is bigger than the entire operating system that runs 95% of the world's payment transactions.
cmrdporcupine 11 hours ago
This is actually one reason I feel like developing my systems level stuff on ARM64 instead of x86 (I have a DGX Spark box) is not a bad idea. Building lower level concurrent data structures, etc. it just seems wiser to have to deal with this more immanently.
That said, I've never actually run into one of these issues.
Zardoz84 12 hours ago
If it is programmed in assembly. This kind of nasty detail should be handled by the compilers.
askl 12 hours ago
If it's programmed in assembly, it just won't compile for a different architecture.
runeks 14 hours ago
Wouldn't the compiler take care of producing the correct machine code?
octachron 13 hours ago
The issue is that the C memory model allows more behaviours than the memory model of x86-64 processors. You can thus write code which is incorrect according to the C language specification but will happen to work on x86-64 processors. Moving to arm64 (with its weaker memory model than x86-64) will then reveal the latent bug in your program.
rbanffy an hour ago
Someone 12 hours ago
mrweasel 12 hours ago
OpenBSD famously keeps a lot of esoteric platforms around, because running the same code on multiple architectures reveals a lot of bugs. At least that was one of the arguments previously.
lproven 6 hours ago
mhh__ 14 hours ago
The compiler relies on the language and programmer to enforce and follow a memory consistency model
ivolimmen 14 hours ago
If you go around your OS, yes, that could be the case, but you can already have issues using the application from machine to machine with the same OS having different amounts of RAM and different CPUs. But I am not an expert in these matters.
jordiburgos 13 hours ago
Only for the hand-written assembly parts of the source code. The rest will be handled by the compilers.
bpye 13 hours ago
You don't need to be writing assembly. Anything sharing memory between multiple threads could have bugs with ARM's memory model, even if written in C, C++, etc.
silon42 13 hours ago
Not even close. Except maybe in Rust /s
galangalalgol 12 hours ago
pdpi 14 hours ago
Kind of weird to see an article about high-performance ARM cores without a single reference to Apple or how this hardware compares to M4 or M5 cores.
ezst 14 hours ago
That would only matter (to me, at least) if those Apple chips were propping up an open platform that suits my needs. As things stand today, procuring an M chip represents a commitment to the Apple software ecosystem, which Apple made abundantly clear doesn't optimize for user needs. Those marginally faster CPU cycles happen on a time scale that anyway can't offset the wasted time fighting MacOS and re-building decades-long muscle memory, so thanks but no thanks.
pdpi 13 hours ago
Sure. Insofar as Apple Silicon beats these things, "I'll take less powerful hardware if it means I'm not stuck with the Apple ecosystem" is a perfectly reasonable tradeoff to make. Two things, though.
First, I don't like making blind tradeoffs. If what I need (for whatever reason) is a really beefy ARM CPU, I'd like to know what the "Apple-less tax" costs me (if anything!)
Second, the status quo is that Apple Silicon is the undisputed king of ARM CPU performance, so it's the obvious benchmark to compare this thing against. Providing that context is just basic journalistic practice, even if just to say "but it's irrelevant because we can't use the hardware without the software".
rbanffy an hour ago
bluGill 12 hours ago
jayd16 8 hours ago
__alexs 7 hours ago
guerrilla 5 hours ago
amelius 12 hours ago
flembat 13 hours ago
When purchasing any ARM based computer a key question for me, is how many of those can I purchase for the cost of a Mac mini, and how many Mac mini can I purchase for the cost of that, and does that have working drivers...
ezst 12 hours ago
synergy20 7 hours ago
Totally true. For me it's useless until Apple hardware can run Linux first-class; till then it's irrelevant. Sad to say this, but macOS sucks.
truelinux1 7 hours ago
This echoed my thoughts exactly - Linux only.
tucnak 13 hours ago
FWIW, Apple Virtualization framework is fantastic, and Rosetta 2 is unmatched on other Arm desktops where QEMU is required. For example, you can get Vivado working on Debian guest, macOS host trivially like that.
ezst 12 hours ago
drzaiusx11 10 hours ago
upcoming-sesame 12 hours ago
still matters as a benchmark imo
renewiltord 12 hours ago
Last time I tried, getting Linux working on Apple Silicon actually worked better than on Qualcomm ARM machine (which only support strange Windows).
drzaiusx11 10 hours ago
spiderfarmer 13 hours ago
> represents a commitment to the Apple software ecosystem
I don't see how that's holding you back from using these tools for your work any more than using a Makita power tool with an LXT battery pack.
ezst 12 hours ago
atwrk 13 hours ago
Those are of almost zero use for people wishing to run Linux etc.
Yes, Asahi exists, and props to the developers, but I don't think I'm alone in being unwilling to buy hardware from a manufacturer who obviously is not interested in supporting open operating systems
promiseofbeans 12 hours ago
I mean… Apple went out of their way to build a GUI OS picker that supports custom names and icons into their boot loader.
So they don’t actively help (or even make it easy by providing clear docs), but they do still do enough to enable really motivated people
amelius 13 hours ago
Apple does not produce general purpose computing parts.
This is an industry blog, not a consumer oriented blog.
hajile 3 hours ago
Chips and Cheese covers Apple products in a LOT of their posts.
The real reason is probably because they are supported by patrons and can only get new equipment to review when people donate (either money or sometimes the hardware itself).
If you like what they do (as pretty much the last in-depth hardware reviewers), consider supporting them.
charcircuit 11 hours ago
M4 and M5 are literally general purpose computing parts. Apple literally owns the most profitable general purpose computing platform with the iPhone.
senko 10 hours ago
layer8 9 hours ago
GoblinSlayer 8 hours ago
SG- 14 hours ago
Same, I wish Chips and Cheese would compare some of these cores to Apple Silicon, especially in this case where they're talking about another ARM core.
A few years ago they were writing articles about Apple Silicon.
GeekyBear 6 hours ago
It does make me miss the deep dives for new core designs from Anandtech.
Running the SPEC benchmark integer and floating point suites takes all day, but it's hard to game a benchmark with that much depth.
It's a shame that nobody has been willing to offer that level of detail.
geerlingguy 8 hours ago
Chips and Cheese focuses on architecture and chip design, and I think a lot of the tooling is less refined on macOS, so the comparison graphs can't quite get the same depth on Apple's chips. That's just a guess.
But I did some comparisons when I tested the same Dell GB10 hardware late last year: https://www.jeffgeerling.com/blog/2025/dells-version-dgx-spa...
hank808 6 hours ago
They are talking specifically about ARM cores designed by and licensable from ARM Holdings (the company), not other designs that don't use ARM's designs (like the Apple silicon).
close04 5 hours ago
They repeatedly compare to Intel and AMD cores though, which are x86. If they’re worth a mention, then so are some of the other ARM consumer desktop chips on the market regardless of who designed them. Apple was one of the closest ARM chips they could have compared to.
Your “specifically ARM cores designed by and licensable from ARM Holdings” argument doesn’t hold any water.
DeathArrow 12 hours ago
>Kind of weird to see an article about high-performance ARM cores without a single reference to Apple
And Qualcomm.
cubefox 11 hours ago
Kind of weird that you pick Apple CPU cores when Qualcomm cores would be a far more appropriate comparison.
llm_nerd 12 hours ago
The core they're talking about was released about two years ago. nvidia stuck it on their grace blackwell (e.g. DGX Spark) as basically a coordinator on the system.
Anyway, here it is in GB10 form-
https://browser.geekbench.com/v6/cpu/14078585
And here is a comparable M5 in a laptop-
https://browser.geekbench.com/macs/macbook-pro-14-inch-2025
M5 has about a 32% per core advantage, though the DGX obviously has a much richer power budget so they tossed in 10 high performance cores and 10 efficiency cores (versus the 4 performance and 6 efficiency in the latter). Given the 10/10 vs 4/6 core layouts I would expect the former to massively trounce the latter on multicore, while it only marginally does.
Samsung used the same X925 core in their Exynos 2500 that they use on a flip phone. Mediatek put it in a couple of their chips as well.
"Reaching desktop" is always such a weird criterion though. It's kind of a meaningless bar.
drzaiusx11 9 hours ago
Afaict the "desktop" target is meaningless these days. Desktops aren't really a thing anymore in the general sense are they? Only folks I know still hanging on to desktop hardware are gamers and even those I see going by the wayside with external video cards becoming more reliable.
"Daily driver" is probably a better term, but everyone's daily usage patterns will vary. I could do my day job with a VT100 emulator on a phone for example.
ThrowawayR2 6 hours ago
wmf 6 hours ago
KingOfCoders 11 hours ago
Perhaps you're not the target audience of the article.
Numerlor 11 hours ago
Apple doesn't expose the kind of introspection necessary to compare with the data the article is about. Any mention would just be about Apple's chips existing and being better
hrmtst93837 12 hours ago
You make a valid point; Apple has indeed set a high standard for ARM cores in performance. A comparison with their M4 and M5 cores would provide valuable context for these new developments.
dgacmu 12 hours ago
Most of your comment history reads like LLM generated trite comments. Are you human?
hrmtst93837 11 hours ago
xarope 13 hours ago
I can't seem to find any power draw or efficiency figures (e.g. <perf>/watts).
Only found this, which talks about performance-per-area (PPA) and performance-per-clock (I assume cycle) (PPC): https://www.reddit.com/r/hardware/comments/1gvo28c/latest_ar...
wmf 6 hours ago
We should have N1X vs. X2 vs. M5 laptop battery life reviews in a few months.
phkahler 9 hours ago
Nor do they say what process it's fabricated with.
voidmain0001 10 hours ago
Already usurped by Arm C1 Ultra.
https://www.androidauthority.com/arm-c1-cpu-mali-g1-gpu-deep...
adgjlsfhk1 9 hours ago
The C1 Ultra looks really powerful. The 128 KB L1D cache on its own is a ~10% IPC improvement that should let it pull firmly ahead of the x86 competition, which is very stuck at 32 KB due to the legacy 4 KB page size.
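The usual reasoning behind this link, as a hedged sketch: a virtually indexed, physically tagged (VIPT) L1 cache avoids aliasing without extra hardware only if its index bits fit inside the page offset, which caps the cache at page size times associativity. The helper name is mine, the page sizes and way counts are for illustration only, and real designs can exceed this bound by adding ways or handling aliases in hardware.

```python
def max_alias_free_l1_kib(page_size_kib, ways):
    """Largest VIPT L1 (in KiB) whose index bits stay within the page offset."""
    return page_size_kib * ways

small_page_8_way = max_alias_free_l1_kib(4, 8)
small_page_12_way = max_alias_free_l1_kib(4, 12)
large_page_8_way = max_alias_free_l1_kib(16, 8)

print(small_page_8_way, small_page_12_way, large_page_8_way)
```

Under this constraint, growing the L1 with 4 KiB pages means adding ways, which costs power and latency, while a larger page size raises the cap directly.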
joha4270 8 hours ago
I'm sorry, I'm clearly missing something but why would page size impact L1 cache size?
aseipp 7 hours ago
adgjlsfhk1 8 hours ago
throwaway85825 9 hours ago
Why would I care about desktop performance without the PC desktop ecosystem where everything 'just works'? Universal ARM linux distros aren't supported by anything.
guerrilla 5 hours ago
Why would you not be able to build a PC around it? That's what you do with PowerPC.
Supersaiyan_IV 12 hours ago
Another good read is about ARM's SVE2 extensions: https://gist.github.com/zingaburga/805669eb891c820bd220418ee...
It has some interesting conclusions, such as that it covers certain AVX512 gaps:
"AVX512 plugs many of the holes that SSE had, whilst SVE2 adds more complex operations (such as histogramming and bit permutation), and even introduces new ‘gaps’ (such as 32/64-bit element only COMPACT, no general vector byte left-shift, non-universal predication etc)."
And also that rusty x86 developers might face skill issues:
"Depending on your application, writing code for SVE2 can bring about new challenges. In particular, tailoring fixed-width problems and swizzling data around vectors may become much more difficult when the length is unknown."
rayiner 9 hours ago
ARM designs are effectively paper launches. You get these press releases saying the new ARM matches Apple and AMD, but it's years before you can buy a product with it. Google Pixels that came out in the fall are still on the X4, which was introduced in 2023. At this rate, Pixel 11 will launch with X925, which is an Apple A17/M3 tier core, when Apple is on the A20: https://wccftech.com/apple-a20-and-a20-pro-all-technological.... Outsourcing the core design creates a major lag in product availability.
ac29 8 hours ago
> ARM designs are effectively paper launches. You get these press releases saying the new ARM matches Apple and AMD, but it's years before you can buy a product with it.
This is an article testing shipping hardware you can buy today.
Symmetry 7 hours ago
Yeah, the paper launch OP is talking about happened way back in May 2024.
aseipp 7 hours ago
I feel like that was much more true in the past but the X925 was only spec'd 18 months ago(?) and you can buy it today (I'm using one since October). Intel and AMD also give lots of advance notice on new designs well ahead of anything you can buy. ARM is also moving towards providing completely integrated solutions, so customers like Samsung don't have to take only CPU core and fill in the blanks themselves. They'll probably only get better at shipping complete solutions faster.
Honestly, Apple is the strange one because they never discuss CPUs until they are available to buy in a product; they don't need to bother.
rbanffy 32 minutes ago
> ARM designs are effectively paper launches.
Won't ARM have validation silicon available to their licensees?
my123 an hour ago
Google outright has worst in class SoCs on both CPU and GPU unfortunately.
If you want something more perf competitive, pick Dimensity, Exynos, or Snapdragon.
hajile 3 hours ago
This core was released in the MediaTek 9400 in October 2024 some 16 months ago.
The successor of x925 is C1 Ultra and even that was released 6 months ago in September 2025 with the MediaTek 9500 and GeekerWan even has a phone review they did with that chip last year.
exabrial 5 hours ago
Hoping someday we can get ARM System76 laptops that meet Apple M* chip performance.
jadbox 5 hours ago
For most, it doesn't need to 'meet' Apple's performance. It just needs to be competitive to general hardware of around the -the same price point- category. This is the same problematic statement I hear that a ~$1500 PC laptop just isn't as good as a ~$3000 macbook.
megous 4 hours ago
BTW, does anyone have some pointers to where one can find an oldish in-order Cortex-A core (like A53) in verilog RTL form? I know ARM must give this out to companies that implement ARM based SoCs for eg. purpose of validation on FPGA.
So far I've only found various M cores online. It would be fun to have something to experiment with on a cheapish FPGA like Kintex XC7-K480T, that may have enough resources for some in-order A core, and can be had for $50 or so.
wmf an hour ago
Arm lawyers have the RTL locked down tight. If you find it, it means you are already dead.
adgjlsfhk1 4 hours ago
You're going to have a much better time finding RISC-V cores.
megous 4 hours ago
Yeah, I don't need help with that one. :)
sylware 13 hours ago
But with hardware IP locks like x86_64.
Better favor as much as possible RISC-V implementations.
But, I don't know if there are already good modern-desktop-grade RISC-V implementations (in the US, Sifive is moving fast as far as I know)... and the hard part: accessing the latest and greatest silicon process of TSMC, aka ~5GHz.
Those markets are completely saturated, namely at best, it will be very slow unless something big does happen: for instance AMD adapts its best micro-architecture to RISC-V (ISA decoding mostly), etc.
And if Valve starts to distribute a client with a strong RISC-V game compilation framework...
dmitrygr 5 hours ago
> Sifive is moving fast as far as I know)
worked with their cores in $pastJob. I'd say their main products are flowery promises and long errata sheets.
sylware 2 hours ago
Which models? Which nasty issues did you encounter?
DeathArrow 12 hours ago
This is kind of a solution in search of a problem. RISC-V will grow only if people find some value in it, if it solves their actual problems in ways that other architectures can't.
hylaride 11 hours ago
Yeah, the primary reason RISC-V exists is political (the desire to have an "open source" CPU architecture). As noble as that may be, it's not enough to get people or companies to use (or even manufacture!) it. It'll either be economical (costs) and/or performance (including efficiency) that drives people.
It took ARM decades to get to where it is, and that involved a long stint in low-margin niche applications like embedded or appliances where x86 was poorly suited due to heat and power consumption.
Symmetry 6 hours ago
cmrdporcupine 11 hours ago
ddtaylor 14 hours ago
Can't zoom any of the content on mobile so most of the charts are unreadable.
sfdlkj3jk342a 13 hours ago
Zoom works fine with Firefox on Android.
GaggiX 14 hours ago
Browsers usually have an accessibility option to force the ability to zoom on all websites.
ddtaylor 2 hours ago
This website has those features disabled in Chrome or Brave. Apparently the Zoom option will only appear for "sites that support this feature". This is because they set this in the viewport meta tag:
user-scalable=0