DeepSeek v4 (api-docs.deepseek.com)
1072 points by impact_sy 8 hours ago
jari_mustonen 5 hours ago
As open source as it gets in this space, top-notch developer documentation, and insanely low prices, all while delivering frontier model capabilities. So basically, this is from hackers to hackers. Loving it!
Also, note that there's zero CUDA dependency. It runs entirely on Huawei chips. In other words, the Chinese ecosystem has delivered a complete AI stack. Like it or not, that's big news. But what's there not to like when monopolies break down?
chvid 4 hours ago
The incredible arrogance and hubris of the American-initiated tech war - it is just a beautiful thing to see it slowly fall apart.
The US-China contest aside - it is in the application layer that LLMs will show their value. There the field, with LLM commoditization and no clear monopolies, is wide open.
There was a point in time where it looked like LLMs would be the domain of a single well-guarded monopoly - that would have been a very dark world. Luckily we are not there now, and there is plenty of ground for optimism.
sigmoid10 4 hours ago
Still not sure how I feel about China, of all places, controlling the only alternative AI stack, but I guess it's better than leaving everything to the US alone. If China ever feels emboldened enough to go for Taiwan and the US descends into complete chaos, the rest of the world running on AI will be at the mercy of authoritarian regimes. At the very least you can be sure no one is in this for the good of the people anymore. This is about who will dominate the world of tomorrow. And China has officially thrown its hat in the ring.
srameshc an hour ago
As much as I appreciate the sentiment, I think it is too early to declare that the well-guarded monopoly is over. Yes, these models have answers, but don't expect all the large enterprises to switch to them. The other aspect is that scaling to serve these models will take a lot of time, even if Huawei succeeds. Not all governments trust China, and there will be a lot of resistance to working with these models, even if they're cheaper.
spaceman_2020 2 hours ago
I've been baffled watching America double down on the same strategy even when it failed to produce results
They sanctioned the hell out of Huawei and now Huawei is bigger than ever
America is just not able to digest the idea that another country can be as good, if not better, at innovation
lanthissa 2 hours ago
Not really; China has gone domestic for everything as soon as it could.
It's naive to think they would have stayed on a 'Western' stack.
Most of the time, 'losing' isn't making a bad choice; it's being put in a situation where you have no good choices.
ifwinterco 4 hours ago
As a Brit I'm here for it to be honest, I'm tired of America with everything that's going on.
China is not perfect but a bit of competition is healthy and needed
jurgenburgen 4 hours ago
I don’t know if we’re ahead of the curve but that tired feeling has started turning into hate here in the EU. I guess being threatened with invasion does that to you.
The next decade is going to look very different with America Alone.
hsiudh 4 hours ago
"not perfect" is a _very_ big simplification of what China is though
barrenko 4 hours ago
This is such a tired argument, and morally repugnant. Where is the UK in the race? Where is the EU? Let's get off our asses and stop moralizing.
(China wiped out entire EU industries through a "quiet" trade war over roughly the last 15 years, and we're not really talking about that, are we...)
falkenstein 4 hours ago
America is a continent. Let's take back our vocabulary (fellow European here). The little orange man showed very well what I mean when he started giving names to the Gulf of Mexico.
nailer an hour ago
As someone who lived in Britain for 15 years until 2024, I'm not sure a nation with a GDP per capita lower than Poland's, that is now poorer than every state in America, and with a gang rape epidemic the government tried to suppress investigating, should really concern itself with how other countries are run.
lifeisstillgood 4 hours ago
As a different Brit I do not accept such moral relativism.
China’s governments actions are on a completely different level - for example:
“””
Since 2014, the government of the People's Republic of China has committed a series of ongoing human rights abuses against Uyghurs and other Turkic Muslim minorities in Xinjiang which has often been characterized as persecution or as genocide.
“”” https://en.wikipedia.org/wiki/Persecution_of_Uyghurs_in_Chin...
https://www.amnesty.org/en/location/asia-and-the-pacific/eas...
Yes, Trump is clearly attempting totalitarianism in America, but it is orders of magnitude different from what is happening in China.
timmmk 4 hours ago
Fellow countryman here. I came here to say the same thing
dzonga an hour ago
Jensen Huang said this in his recent interview: China has the best and the most engineers, it has the chip-making ability, and it's a good thing they want to build on an Nvidia stack; but if you push them, they will build on an all-Chinese stack. The interviewer, though, was being a numbskull who kept parroting the propaganda of Western tech supremacy.
wener 12 minutes ago
As a Chinese, I feel tired. It's like the Cold War: whatever it takes to stay competitive in every aspect, it's just another win for the country and the corp.
accountofthaha an hour ago
Does the 'zero CUDA dependency' also count for running it on my own device? I have an AMD card, older model. Would love to have a small version of this running for coding purposes.
Really nice to see the Chinese are competing this strongly with the rest of the world. Competition is always nice for the end-consumer.
d3Xt3r 2 hours ago
> Also, note that there's zero CUDA dependency.
So does this mean I can run this on AMD? And on a consumer 9000 series card?
HarHarVeryFunny 7 minutes ago
If you don't have the source code then it makes no difference. If you have the weights and are running some model via llama.cpp, then you are using whatever API llama.cpp is using, not the API that was used to train the model or that anyone else may be using to serve it.
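To make the point concrete, a minimal sketch of talking to open weights through a local llama.cpp `llama-server` on its default port; the model name is a hypothetical local quant. The serving API here is llama.cpp's OpenAI-compatible one, independent of whatever the lab trained or serves with.

```python
# A minimal sketch, assuming a local llama.cpp `llama-server` on its default
# port 8080; the model name is a hypothetical local quant.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "deepseek-v4-flash-q4",  # hypothetical local quantized weights
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```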
randomgermanguy 2 hours ago
If you found a rare 9000 card with 200+ GB of VRAM, sure
TrackerFF 4 hours ago
Let's see how long it takes before the big US AI companies start lobbying to outright ban use of Chinese AI, even the open source / local models. For "national security" reasons, of course.
chronc6393 2 hours ago
> Let's see how long it takes before the big US AI companies start lobbying to outright ban use of Chinese AI, even the open source / local models. For "national security" reasons, of course.
Already do on EVs.
barnabee 4 hours ago
Hopefully the US’ self imposed isolation will mean that when they do, they aren’t able to force the rest of the world to follow suit.
ibic 4 hours ago
"Open Source" is the ultimate romance understood by software engineers.
nsoonhui an hour ago
Sorry, but where exactly did you get the idea that DS V4 runs entirely on Huawei?
I asked DS itself and it denied this. It says: 'Nvidia chips are absolutely used for DeepSeek V4. The reality is a pragmatic "both-and" strategy, not an "either-or."'
And based on the DS V4 technical report (https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...), it is mentioned that:
We validated the fine-grained EP scheme on both NVIDIA GPUs and HUAWEI Ascend NPUs platforms. Compared against strong non-fused baselines, it achieves 1.50 ~ 1.73× speedup for general inference workloads, and up to 1.96× for latency-sensitive scenarios such as RL rollouts and high-speed agent serving.
(In all honesty I relied on DS to give me the above, so I haven't vetted the information in full.) It mentions that Nvidia is still used; it doesn't even say that Huawei chips are used in production, only in testing and validation.
taytus 29 minutes ago
>I asked DS itself and it denied this
Bro, seriously?
khalic 2 hours ago
Open weight and open source are not the same
SquareWheel 2 hours ago
This is a pretty banal comment at this point. Open source is the term used in the LLM community. It's common and understood. Nobody is going to release petabytes of copyrighted training data, so the distinction between open source vs weights is a rather pointless one.
frankdenbow 2 hours ago
Jensen was saying this in that interview last week and the interviewer dismissed it.
kitd 4 hours ago
I can't find any info on what exactly is open sourced.
And in any case, what does open source actually mean for an LLM? It's not like you can look inside it to see what it's doing.
gommm 3 hours ago
For me, open source means that the entire training data is open sourced, as well as the code used for training it; otherwise it's open weight. You can run it where you like, but it's a black box. Nomic's models are a good example of open source.
laurentiurad 2 hours ago
not a full AI stack. Training still runs on NVIDIA chips.
nailer an hour ago
It's also not fake open source like Meta's models: the weights are actually under a real open source license (MIT), see https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
slekker 5 hours ago
But remember to not ask about Taiwan!
tigrezno 4 hours ago
You talk like there isn't censorship in American AIs on topics like Israel.
spaceman_2020 2 hours ago
I can't wait for Taiwan to peacefully reunify with the mainland so the west with its constant war waging won't even have this talking point
eunos 3 hours ago
> China asks other countries not to meddle with internal separatism
> They also don't support separatism in my country
Understandable.
spiderfarmer 4 hours ago
Just ask it for a summary of the USA’s role in Iran, Gaza, Lebanon and its recent threats against Panama, Cuba and Greenland! It might be able to keep track.
Lionga 4 hours ago
Quite a bit better than being made to bomb little girls' schools in Iran.
Markoff 3 hours ago
Pretty sure you can ask whatever you want, and it will tell you the official stance, agreed upon by almost all countries in the world: that Taiwan is part of China, as recognized by your own country (I don't even know where you are from, but there's like a 98% chance I'm right).
sudo_cowsay 4 hours ago
I sometimes wonder if there are any security risks in using Chinese LLMs. Are there?
dalemhurley 4 hours ago
Theoretically, yes. It is entirely possible to poison the training data for a supply chain attack against vibe coders. The trick would be to make it extremely specific to a high-value target so it is not picked up by a wide range of people. You could also target a specific open source project that is used by another widely used product.
However, there are so many factors involved beyond your control that it would not be a viable option compared to other possible security attacks.
oliwarner 4 hours ago
If there is, couldn't they exist in any model?
I don't mean that flippantly. These things are dumped in the wild, used on common (largely) open source execution chains. If you find a software exploit, it's going to affect your population too.
Wet exploits are a bit harder to track. I'd assume there are plenty of biases based on training material but who knows if these models have a MKUltra training programme integrated into them?
rhubarbtree 3 hours ago
Backdooring software at scale.
Spearphishing.
Building reliance and exploiting it, through state subsidies, dumping, and market manipulation.
Handicapping provision to the west for competitive advantage.
cassianoleal 4 hours ago
What about LLMs from other origins? What makes them less risky?
eucyclos 3 hours ago
From my experience, kinda the opposite? It's like Chinese software is... Harder to weaponize or hurt yourself on. Deepseek is definitely censored, but I've never caught it being dishonest in a sneaky way.
Hamuko 4 hours ago
There must be. The executives at my company wouldn't have banned them all for no reason after all.
baal80spam 4 hours ago
Is this a serious comment? It honestly reads like famous last words.
Of course there are risks.
hodgehog11 4 hours ago
There are quite a few comments here about benchmark and coding performance. I would like to offer some opinions regarding its capacity for mathematics problems in an active research setting.
I have a collection of novel probability and statistics problems at the masters and PhD level with varying degrees of feasibility. My test suite involves running each problem through the model first (often with about 2-6 papers for context) and then requesting a rigorous proof as a follow-up. Since the problems are pretty tough, there is no quantitative measure of performance here; I'm just judging by how useful the output is toward outlining a solution that would hopefully become publishable.
Just prior to this model, Gemini led the pack, with GPT-5 as a close second. No other model came anywhere near these two (no, not even Claude). Gemini would sometimes have incredible insight on some of the harder problems (insightful guesses at relevant procedures are often most useful in research), but both of them tend to struggle with outlining a concrete proof in a single follow-up prompt. This DeepSeek V4 Pro with max thinking does remarkably well here. I'm not seeing the same level of insight in the first response as Gemini (closer to GPT-5), but it often gets much better in the follow-up, and the proofs can be _very_ impressive; nearly complete in several cases.
Given that both Gemini and DeepSeek also seem to lead on token performance, I'm guessing that might play a role in their capacity for these types of problems. It's probably more a matter of just how far they can get in a sensible computational budget.
Despite what the benchmarks seem to show, this feels like a huge step up for open-weight models. Bravo to the DeepSeek team!
ozgune 2 hours ago
I reviewed how DeepSeek V4-Pro, Kimi 2.6, Opus 4.6, and Opus 4.7 compare across the same AI benchmarks. All results are for Max editions, except for Kimi.
Summary: Opus 4.6 forms the baseline all three are trying to beat. DeepSeek V4-Pro roughly matches it across the board, Kimi K2.6 edges it on agentic/coding benchmarks, and Opus 4.7 surpasses it on nearly everything except web search.
DeepSeek V4-Pro Max shines in competitive coding benchmarks. However, it trails both Opus models on software engineering. Kimi K2.6 is remarkably competitive as an open-weight model. Its main weakness is in pure reasoning (GPQA, HMMT) where it trails Opus.
Speculation: The DeepSeek team wanted to come out with a model that surpassed proprietary ones. However, OpenAI dropped 5.4 and 5.5 and Anthropic released Opus 4.6 and 4.7. So they chose to just release V4 and iterate on it.
Basis for speculation? (i) The original reported timeline for the model was February. (ii) Their Hugging Face model card starts with "We present a preview version of DeepSeek-V4 series". (iii) V4 isn't multimodal yet (unlike the others) and their technical report states "We are also working on incorporating multimodal capabilities to our models."
lifty 4 hours ago
Wondering how GPT 5.5 is doing in your test. Happy to hear that DeepSeek performs well, because my experience seems to correlate with yours for the coding problems I am working on. Claude doesn't seem to be so good if you stray away from writing HTTP handlers (the modern web app stack in its various incarnations).
hodgehog11 3 hours ago
Very cool to hear there is agreement with (probably quite challenging?) coding problems as well.
Just ran a couple of them through GPT 5.5, but this is a single attempt, so take any of this with a grain of salt. I'm on the Plus tier with memory off so each chat should have no memory of any other attempt (same goes for other models too).
It seems to be getting more of the impressive insights that Gemini got and doing so much faster, but I'm having a really hard time getting it to spit out a proper lengthy proof in a single prompt, as it loves its "summaries". For the random matrix theory problems, it also doesn't seem to adhere to the notation used in the documents I give it, which is a bit weird. My general impression at the moment is that it is probably on par with Gemini for the important stuff, and both are a bit better than DeepSeek.
I can't stress enough how much better these three models are than everything else (at least on my type of math problems). Claude can't get anything nontrivial on any of the problems within ten (!!) minutes of thinking, so I have to shut it off before I run into usage limits. I have colleagues who love using Claude for tiny lemmas and such, so your mileage may vary, but it seems pretty bad at the hard stuff. Kimi and GLM are so vague as to be useless.
lifty 2 hours ago
nibbleyou 4 hours ago
Curious to know what kind of problems you are talking about here
hodgehog11 4 hours ago
I don't want to give away too much due to anonymity reasons, but the problems are generally in the following areas (in order from hardest to easiest):
- One problem on using quantum mechanics and C*-algebra techniques for non-Markovian stochastic processes. The interchange between the physics and probability languages often trips the models up, so pretty much everything tends to fail here.
- Three problems in random matrix theory and free probability; these require strong combinatorial skills and a good understanding of novel definitions, requiring multiple papers for context.
- One problem in saddle-point approximation; I've just recently put together a manuscript for this one with a masters student, so it isn't trivial either, but does not require as much insight.
- One problem pertaining to bounds on integral probability metrics for time-series modelling.
throwa356262 5 hours ago
Seriously, why can't huge companies like OpenAI and Google produce documentation that is half this good??
https://api-docs.deepseek.com/guides/thinking_mode
No BS, just a concise description of exactly what I need to write my own agent.
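For illustration, here is roughly what "write my own agent" against an OpenAI-compatible endpoint like DeepSeek's looks like. The base URL is the documented one, but the exact model id and thinking toggle should be taken from the linked guide; treat the ones below as placeholders.

```python
# A minimal sketch, assuming DeepSeek's OpenAI-compatible API; the model id
# and the thinking toggle are placeholders - see the linked guide.
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-chat",  # placeholder: substitute the v4 id from the docs
    messages=[{"role": "user", "content": "Plan the next step of the task."}],
    # extra_body forwards vendor-specific options untouched:
    extra_body={"thinking": {"type": "enabled"}},  # hypothetical flag
)
print(resp.choices[0].message.content)
```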
u_sama 4 hours ago
I am very partial to Mistral's API docs https://docs.mistral.ai/api
vitorgrs 5 hours ago
Meanwhile, they don't actually say which model you are running on the DeepSeek Chat website.
lykr0n 5 hours ago
It's because they're optimizing for a different problem.
Western models are being optimized to be used as an interchangeable product. Chinese models are being optimized to be built upon.
Barbing 4 hours ago
>Western models are being optimized to be used as an interchangeable product.
But so much investment in their platforms, not just their APIs?
raincole 5 hours ago
> Western models are being optimized to be used as an interchangeable product
Why? It sounds like the stupidest idea ever. Interchangeability = no lock-in = no moat.
Alifatisk 5 hours ago
You might enjoy Z.ai's API docs as well.
kubb 4 hours ago
Western orgs have been captured by Silicon Valley style patrimonialism, and aren’t based on merit anymore.
orbital-decay 5 hours ago
>we implement end-to-end, bitwise batch-invariant, and deterministic kernels with minimal performance overhead
Pretty cool. I think they're the first to guarantee determinism with a fixed seed or at temperature 0. Google came close but never guaranteed it, AFAIK. DeepSeek show their roots: it may not strictly be a SotA model, but there's a ton of low-level optimization nobody else pays attention to.
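A quick way to sanity-check the determinism claim yourself: same prompt, fixed seed, temperature 0, run twice, compare. A sketch assuming an OpenAI-compatible endpoint that honors the seed parameter; the model id is a placeholder.

```python
# Same prompt, fixed seed, temperature 0; run twice and compare bitwise.
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

def sample() -> str:
    resp = client.chat.completions.create(
        model="deepseek-chat",  # placeholder model id
        messages=[{"role": "user", "content": "List three prime numbers."}],
        temperature=0,
        seed=42,
    )
    return resp.choices[0].message.content

a, b = sample(), sample()
print("bitwise identical:", a == b)  # True if the determinism guarantee holds
```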
sergiopreira 8 minutes ago
DeepSeek is commoditizing frontier capability... Opus 4.6-level benchmarks at a fraction of the cost also changes who can access these tools.
Stuff that was prohibitive six months ago is now up for grabs. We keep working at the infra level now, switching models whenever we run out of credits or want a different result. The question is how we build context and architecture and ensure the agent is effective and efficient... wouldn't it be good if we simply used less energy to make these AI calls?
xingyi_dev 3 hours ago
Deepseek v4 is basically that quiet kid in the back of the class who never says a word but casually ruins the grading curve for everyone else on the final exam.
chenzhekl 3 hours ago
It's interesting that they mentioned in the release notes:
"Limited by the capacity of high-end computational resources, the current throughput of the Pro model remains constrained. We expect its pricing to decrease significantly once the Ascend 950 has been deployed into production."
https://api-docs.deepseek.com/zh-cn/news/news260424#api-%E8%...
nsoonhui an hour ago
Sorry, but where exactly in the article you linked is the "Ascend 950" mentioned?
chenzhekl an hour ago
It's in the footnote text of the first figure of the section the link points to, where "昇腾950" means "Ascend 950".
revolvingthrow 6 hours ago
> pricing "Pro" $3.48 / 1M output tokens vs $4.40
I'd like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing inference at an insane rate" make sense in light of a humongous model like v4 Pro costing $4 per 1M. I'd bet even the subscriptions are profitable, let alone the API prices.
edit: $1.74/M input $3.48/M output on OpenRouter
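For a sense of scale, some napkin math at those OpenRouter rates; the token volumes are made-up but in the range of a heavy agentic-coding day.

```python
# Napkin math at the OpenRouter rates above. Token volumes are assumptions.
INPUT_PER_M, OUTPUT_PER_M = 1.74, 3.48  # $ per 1M tokens, v4 Pro

input_tokens = 20_000_000    # assumed: agents re-read a lot of context
output_tokens = 2_000_000    # assumed

cost = input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M
print(f"${cost:.2f}/day")    # $41.76, and prompt caching cuts the input side further
```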
menzoic 4 hours ago
API prices may be profitable. Subscriptions may still be subsidized for power users. Free tiers almost certainly are. And frontier labs may be subsidizing overall business growth, training, product features, and peak capacity, even if a normal metered API call is profitable on marginal inference.
dannyw 3 hours ago
Research and training costs have to be amortized from somewhere, and labs are always training. I'm definitely keen for the financials when the two file for IPOs, though; it would be interesting to see, although I'm sure it won't be broken down much.
schneehertz 5 hours ago
This price is only this high because of the current shortage of inference cards available to DeepSeek; they claimed in their press release that once the Ascend 950 compute cards launch in the second half of the year, the price of the Pro version will drop significantly.
Bombthecat 4 hours ago
In six months DeepSeek won't be SotA anymore and usage will be way down.
LinXitoW an hour ago
They got loans to buy inference hardware on the promise of potential AGI, or at least something approaching ASI, all leading to stupid amounts of profit for those investors.
We therefore cannot just look at inference costs directly, training is part of the pitch. Without the promises of continuous improvement and chasing the elusive AGI, money for investments for inference evaporates.
m00x 5 hours ago
They are profitable relative to opex, but not capex under current depreciation schedules, though those are now edging higher than expected.
nl 3 hours ago
Amazingly, the current depreciation overestimates the retained value of GPUs.
In 2023, the depreciation schedule for H100s was 2 years, but they are still oversubscribed and generating significant income.
Coreweve has upped their depreciation for GPUs to 6 years(!) now, which seems more realistic.
https://www.silicondata.com/blog/h100-rental-price-over-time
amunozo 5 hours ago
I was thinking the same. How can it be that other providers can offer third-party open source models of roughly similar quality (this one, Kimi K2.6, or GLM 5.1) for ten times less? How can it be that GPT 5.5 is suddenly twice the price of GPT 5.4 while being faster? I don't believe it's a bigger, more expensive model to run; they're just starting to raise prices because they can and their product is good (which is honest as long as they're transparent about it). Honestly, the messaging about subscriptions costing the company 20 times more than we pay is just PR to justify the price hike.
peepee1982 4 hours ago
I'm pretty sure OpenAI and Anthropic are overpricing their token billed API usage mainly as an incentive to commit to get their subscriptions instead.
raincole 5 hours ago
Insert always has been meme.
But seriously, it just stems from the fact some people want AI to go away. If you set your conclusion first, you can very easily derive any premise. AI must go away -> AI must be a bad business -> AI must be losing money.
zarzavat 5 hours ago
Before the AI bubble that will burst any time now, there was the AI winter that would magically arrive before the models got good enough to rival humans.
crazylogger 4 hours ago
I haven't seen anyone claiming that API prices are subsidized.
At some point (from the very beginning until ~2025Q4) Claude Code's usage limit was so generous that you could get roughly $10~20 (API-price-equivalent) worth of usage out of a $20/mo Pro plan each day (2 * 5h windows), and for good reason: LLM agentic coding is extremely token-heavy, and people simply wouldn't come back to Claude Code if the provided usage wasn't generous or if every prompt cost $1. And then Codex started trying to poach Claude Code users by offering even greater limits and constantly resetting everyone's limits in recent months. The API price would have to be 30x operating cost to make this not a subsidy. That would be an extraordinary claim.
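Spelling out that arithmetic (the figures are the estimates above, not measured data):

```python
# The subsidy arithmetic spelled out; figures are the comment's estimates.
api_equivalent_per_day = 15.0  # midpoint of the $10-20/day estimate
plan_price = 20.0              # $20/mo Pro plan
days = 30

ratio = api_equivalent_per_day * days / plan_price
print(f"~{ratio:.1f}x the plan price in API-equivalent usage")  # ~22.5x
```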
nl 3 hours ago
The claim that APIs are subsidized is very common.
eg:
Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this.
https://news.ycombinator.com/item?id=47684887
(the claims don't make any sense, but they are widely held)
dannyw 3 hours ago
Yeah, subscriptions used to be extraordinarily generous. I miss those days, but the reinvigoration of open weight models is super exciting.
I'm still playing with the new Qwen3.6 35B and impressed, now DeepSeek v4 drops; with both base and instruction-tuned weights? There goes my weekend :P
mirzap 5 hours ago
My thoughts exactly. I also believe that subscription services are profitable, and the talk about subsidies is just a way to extract higher profit margins from the API prices businesses pay.
Bombthecat 4 hours ago
Google stated a while back that with TPUs they are able to sell at cost or with profit.
In other words: everyone who uses Nvidia can't sell at cost, because Nvidia is so expensive.
vitorgrs 5 hours ago
And they actually say the prices will be "significantly" lower in the second half of the year, when the Huawei Ascend 950 chips come in.
jimmydoe 5 hours ago
They’ve also announced Pro price will further drop 2H26 once they have more HUAWEI chips.
masafej536 5 hours ago
Point taken, but there aren't any Western providers there yet. Power is cheaper in China.
3uler 5 hours ago
These models are open and there are tons of western providers offering it at comparable rates.
NitpickLawyer 5 hours ago
As this is a new arch with tons of optimisations, it'll take some time for inference engines to support it properly, and we'll see more 3rd party providers offer it. Once that settles we'll have a median price for an optimised 1.6T model, and can "guesstimate" from there what the big labs can reasonably serve for the same price. But yeah, it's been said for a while that big labs are ok on API costs. The only unknown is if subscriptions were profitable or not. They've all been reducing the limits lately it seems.
Flavius 2 hours ago
It's because investors in OpenAI/Anthropic want to get their money back in 10 months, not in 10 years.
casey2 3 hours ago
It's the decades of performance-doesn't-matter SV/web culture. I'd be surprised if over 1% of OpenAI/Anthropic staff know how any non-toy computer system works.
dminik 5 hours ago
I mean, not one "bleeding edge" lab has stated they are profitable. They don't publish financials aside from revenue. And in Anthropic's case, they fuck with pricing every week. Clearly something is wrong here.
npn an hour ago
You know, if you didn't have to pay insane salaries for your top engineers, and didn't have to pay billions for internet shills to control the narrative, then all of the labs would be insanely profitable.
sekai 5 hours ago
> I'd like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing inference at an insane rate" make sense in light of a humongous model like v4 Pro costing $4 per 1M. I'd bet even the subscriptions are profitable, let alone the API prices.
One answer: the Chinese Communist Party. They are being subsidized by the state.
lbreakjai 3 hours ago
When China does it it's communism. When companies in the west get massive tax cuts, rebates, incentives and subsidies, that's just supporting the captains of industry.
fblp 7 hours ago
There's something heartwarming about the developer docs being released before the flashy press release.
taurath 4 hours ago
Their audience is people who build stuff; Big Tech's audience is enterprise CEOs and politicians, and anyone else happy to hype up all the questionably timed releases and warnings of danger, white-collar irrelevance, or promises of utopian paradise right before a funding round.
onchainintel 7 hours ago
Insert obligatory "this is the way" Mando scene. Indeed!
necovek 7 hours ago
Where's the training data and training scripts since you are calling this open source?
Edit: it seems "open source" was edited out of the parent comment.
b65e8bee43c2ed0 6 hours ago
Doesn't it get tiring after a while? Using the same (perceived) gotcha, over and over again, for three years now?
No one is ever going to release their training data, because it contains every copyrighted work in existence. Everyone, even the hecking-wholesome safety-first Anthropic, is using copyrighted data without permission to train their models. There you go.
woctordho 3 hours ago
They are exactly open source. The training data is the internet. Don't say it's on the internet; it IS the internet.
The training scripts are in Megatron and vLLM.
bl4ckneon 6 hours ago
Aww yes, let me push a couple petabytes to my git repo for everyone to download...
0-_-0 5 hours ago
Weights are the source, training data is the compiler.
yanis_t 22 minutes ago
Assuming it is almost as good as Opus 4.6 (which the benchmarks seem to give evidence for), and assuming we have a good enough harness (Pi, OpenCode), it is now more than 5x cheaper.
I just want to remind you that this is happening at the same time as Anthropic A/B tests removing Code from the Pro plan, and as OpenAI releases GPT 5.5 at 2x the price of GPT 5.4...
stingraycharles 19 minutes ago
> Assuming it is almost as good as Opus 4.6 (which benchmarks seem to give evidence for)
That’s a big if. It’s my experience that models that perform very well on benchmarks do not necessarily perform well in real life.
I’ve mostly started ignoring the benchmarks and run my own evals.
dizhn an hour ago
I like deepseek. It works very well. I haven't tried v4 yet but on their web chat interface, just typing "Taiwan" causes it to give you a lecture about how Taiwan is part of China. :)
jyscao a minute ago
What a gotcha
gbnwl 8 hours ago
I’m deeply interested and invested in the field but I could really use a support group for people burnt out from trying to keep up with everything. I feel like we’ve already long since passed the point where we need AI to help us keep up with advancements in AI.
satvikpendem 6 hours ago
Don't keep up. Much like with news, you'll know when you need to know, because someone else will tell you first.
vessenes 2 hours ago
This is only good advice if you don’t have the need to understand what’s happening on the edge of the frontier. If you do, then you’ll lose on compounding the knowledge from staying engaged with the major developments.
wordpad 7 hours ago
The players barely ever change. People don't have problems following sports; you shouldn't struggle so much with this once you accept that the top spot changes.
ehnto 7 hours ago
It is funny seeing people ping pong between Anthropic and ChatGPT, with similar rhetoric in both directions.
At this point I would just pick the one whose "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps they are on the fringes of some domain.
Personally I am trying out the open models cloud hosted, since I am not interested in being rug pulled by the big two providers. They have come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.
DiscourseFan 7 hours ago
gbnwl 6 hours ago
I didn't express this well, but my interest isn't who is in the top spot so much as the _why_ and _how_ of the results the various labs get. This is also magnified by the fact that I'm not only interested in hosted providers of inference but in local models as well. What's your take on the best model to run for coding on 24GB of VRAM locally after the last few weeks of releases? Which harness do you prefer? What quants do you think are best? To use your sports metaphor, it's more than following the national leagues; it's following the college and even high school leagues as well. And the real interest isn't even who's doing well but WHY, at each level.
vrganj 6 hours ago
It honestly has all kinda felt like more of the same ever since maybe GPT4?
New model comes out, has some nice benchmarks, but the subjective experience of actually using it stays the same. Nothing's really blown my mind since.
Feels like the field has stagnated to a point where only the enthusiasts care.
ifwinterco 4 hours ago
For coding Opus 4.5 in q3 2025 was still the best model I've used.
Since then it's just been a cycle of the old model being progressively lobotomised and a "new" one coming out that if you're lucky might be as good as the OG Opus 4.5 for a couple of weeks.
Subjective but as far as I can tell no progress in almost a year, which is a lifetime in 2022-25 LLM timelines
trueno 6 hours ago
holy shit im right there with you
sho 4 hours ago
So, this is the version that's able to serve inference from Huawei chips, although it was still trained on Nvidia. So unless I'm very much mistaken, this is the biggest and best model yet served on (sort of) readily available Chinese-native tech. Performance and stability will be interesting to see; OpenRouter is currently saying about 1.12s latency and 30 tps, which isn't wonderful, but it's day one after all.
For reference, the Huawei Ascend 950 that this thing runs on is supposed to be roughly comparable to Nvidia's H100 from 2022. In other words, things are hotting up in the GPU war!
alpineman 4 hours ago
Can't see how NVIDIA justifies its valuation/forward P/E ratio with these developments, and with on-device also becoming viable for 98% of people's AI needs.
aurareturn 3 hours ago
On-device is incredibly far away from being viable. A $20 ChatGPT subscription beats the hell out of the 8B model that a $1,000 computer can run.
Nvidia's forward PE ratio is only 20 for 2026. That's much lower than companies like Walmart and Costco. It's also growing nearly 100% YoY and has a $1 trillion backlog.
I think Nvidia is cheap.
npodbielski 4 hours ago
Great! Can't wait to buy a decent GPU for inference for under $1k.
primaprashant 5 hours ago
While SWE-bench Verified is not a perfect benchmark for coding, AFAIK, this is the first open-weights model that has crossed the threshold of 80% score on this by scoring 80.6%.
Back in Nov 2025, Opus 4.5 (80.9%) was the first proprietary model to do so.
stared 4 hours ago
SWE-bench Verified is, at this point, contaminated https://openai.com/index/why-we-no-longer-evaluate-swe-bench...
So it is hard to tell how much of a model's gain is due to skill, and how much to overfitting.
yanis_t 7 hours ago
Already on OpenRouter. The Pro version is $1.74/M input, $3.48/M output, while Flash is $0.14/M input, $0.28/M output.
nl 3 hours ago
The Pro model is giving 429 Overload errors
astrod 6 hours ago
Getting 'Api Error' here :( Every other model is working fine.
poglet 6 hours ago
Try interacting with it through the website, it will give an error and some explanation on the issue. I had to relax my guardrail settings.
77ko 6 hours ago
It's on OR, but currently not available on their Anthropic endpoint. OR, if you read this, please enable it there! I am using Kimi 2.6 with Claude Code, which works well, but DeepSeek V4 gives an error:
When calling https://openrouter.ai/api/messages with model=deepseek/deepseek-v4-pro, OR returns an error because their Anthropic-compat translator doesn't cover V4 yet. The Claude CLI dutifully surfaces that error as "model...does not exist".
amunozo 4 hours ago
For those who rely on open source models but don't want to stop using frontier models, how do you manage it? Do you pay for any of the Chinese subscription plans? Do you pay for the API directly? After the GPT 5.5 release, however good it is, I am a bit tired of the price hikes and reduced quotas every week. I am now unemployed and cannot afford more expensive plans for the moment.
regularfry 15 minutes ago
I've been on Kimi K2.5 on openrouter for a couple of months for anything I can't run locally. Really is dirt cheap for how good it is. Haven't assessed K2.6 yet but the price is higher so it needs to be more efficient, not just more capable.
But more broadly: openrouter solves the problem of making a broad range of models available with a single payment endpoint, so you can just switch around as much as you like.
solarkraft an hour ago
At home I currently use MiniMax via OpenRouter - it’s pretty good and very cheap. They have a subscription plan, but I’m not ready to commit to it yet.
Another way to keep the ability to try out new models is to buy a reseller subscription like Cursor’s.
amunozo 15 minutes ago
I tried OpenRouter, but I feel the money flies even with these models; it's not comparable to a subscription. But yes, it's very good for trying things out. Maybe I should test other models alongside GPT 5.5 to see which one fits me.
azuanrb 3 hours ago
I have $20 ChatGPT subscription. Stopped Anthropic $20 subscription since the limit ran out too fast. That's my frontier model(s).
For OSS model, I have z.ai yearly subscription during the promo. But it's a lot more expensive now. The model is good imo, and just need to find the right providers. There are a lot of alternatives now. Like I saw some good reviews regarding ollama cloud.
amunozo 34 minutes ago
I am thinking about getting some 1 year promotion as a student before defending my PhD.
the_gipsy 3 hours ago
Have you considered... not subscribing? You can ask the top models via chats for specific stuff, and then set up some free CLI like mistral.
If you're trying to make a buck while unemployed, sure get a subscription. Otherwise learn how to work again without AI, just focus on the interesting stuff.
amunozo 3 hours ago
I just want to try to make something useful out of my time; that's why I'm subscribed to Codex at the moment. 20€ is affordable, not really a problem. But yes, maybe I would do myself a favor by unsubscribing and going back to the old ways to learn properly.
cmrdporcupine an hour ago
For DeepSeek you can use their API and if you ran it constantly you'd still be under what OpenAI or Anthropic charge for a coding plan.
anentropic 29 minutes ago
I had Claude make me a quick tool to combine my Claude Code token usage (via ccusage util) with OpenRouter pricing from the models API
I'm on Max x5 plan and any of the 'good' models like Kimi 2.6, GLM, DeepSeek would have cost 3-5x in per-token billing for what I used on my Claude plan the last three months
So unless my Claude fudged the maths to make itself look better, seems like I'm getting a good deal
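For anyone who wants to reproduce the comparison: a sketch that prices your own token counts (e.g. ccusage totals) using OpenRouter's public models API, which reports pricing as USD-per-token strings. The model id and token counts below are placeholders.

```python
# Price your own token counts with OpenRouter's models API.
import requests

models = requests.get("https://openrouter.ai/api/v1/models", timeout=30).json()["data"]
pricing = next(m["pricing"] for m in models if m["id"] == "deepseek/deepseek-v4-pro")

prompt_tokens, completion_tokens = 300_000_000, 30_000_000  # your own totals
cost = (prompt_tokens * float(pricing["prompt"])
        + completion_tokens * float(pricing["completion"]))
print(f"per-token equivalent: ${cost:,.2f}")
```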
amunozo 14 minutes ago
I am not so sure; credits fly when using any model through the API if I use it as much as I use Codex.
seanobannon 8 hours ago
Weights available here: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
BoorishBears 6 hours ago
https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-Base https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-Base
And we got new base models, wonderful, truly wonderful
mchusma 7 hours ago
For comparison on openrouter DeepSeek v4 Flash is slightly cheaper than Gemma 4 31b, more expensive than Gemma 4 26b, but it does support prompt caching, which means for some applications it will be the cheapest. Excited to see how it compares with Gemma 4.
MillionOClock 5 hours ago
I wonder why there aren't more open weights model with support for prompt caching on OpenRouter.
mzl 4 hours ago
It is tricky to build good infrastructure for prompt caching.
sidcool 7 hours ago
Truly open source coming from China. This is heartwarming. I know of the potential ulterior motives.
b65e8bee43c2ed0 5 hours ago
American companies want a scan of your asshole for the privilege of paying to access their models, and unapologetically admit to storing, analyzing, training on, and freely giving your data to any authorities if requested. Chinese ulteriority is hypothetical, American is blatant.
elefanten 5 hours ago
It's not remotely hypothetical; you'd have to be living under a rock to believe that. And the fusion with a one-party state government that doesn't tolerate huge swathes of thoughtspace being freely discussed is completely streamlined, not mediated by any guardrails or accountability.
This “no harm to me” meme about a foreign totalitarian government (with plenty of incentive to run influence ops on foreigners) hoovering your data is just so mind-bogglingly naive.
thesmtsolver2 4 hours ago
As someone with Tibetan friends and as someone from India, Chinese ulterior motives are way more clear.
Quothling 5 hours ago
It's a little sad that tech now comes down to geopolitics, but if you're not in the USA then what is the difference? I'm Danish, would I rather give my data to China or to a country which recently threatened the kingdom I live in with military invasion? Ideally I'd give them to Mistral, but in reality we're probably going to continue building multi-model tools to make sure we share our data with everyone equally.
spaceman_2020 5 hours ago
I don’t care about whatever “ulterior motives” they might have
My country’s per capita income is $2500 a year. We can’t pay perpetual rent to OAI/Anthropic
djyde 4 hours ago
Same
try-working 7 hours ago
if you want to understand why labs open source their models: http://try.works/why-chinese-ai-labs-went-open-and-will-rema...
wraptile 5 hours ago
> Internet comments say that open sourcing is a national strategy, a loss maker subsidized by the government. On the contrary, it is a commercial strategy and the best strategy available in this industry.
This sounds a whole lot like potato, potahto. I think the former argument is very much the correct one: China can undercut everyone and win, even at a loss. It happened with solar panels, steel, EVs, seafood; it's a well-tested strategy and it works really well despite the many flavors it comes in.
That being said, a job well done for the wrong reasons is still a job well done, so we should very much welcome these contributions. And maybe it's good to upset Western big tech a bit so it remains competitive.
I_am_tiberius 7 hours ago
Open weight!
alecco 6 hours ago
Please don't slander the most open AI company in the world. Even more open than some non-profit labs from universities. DeepSeek is famous for publishing everything. They might take a bit to publish source code, but it's almost always there. And their papers are extremely pro-social, helping the broader open AI community. This is why they struggle to get funded: investors hate openness. And in China they struggle against the political and hiring power of the big tech companies.
Just this week they published a serious foundational library for LLMs https://github.com/deepseek-ai/TileKernels
Others worth mentioning:
https://github.com/deepseek-ai/DeepGEMM a competitive foundational library
https://github.com/deepseek-ai/Engram
https://github.com/deepseek-ai/DeepSeek-V3
https://github.com/deepseek-ai/DeepSeek-R1
https://github.com/deepseek-ai/DeepSeek-OCR-2
They have 33 repos and counting: https://github.com/orgs/deepseek-ai/repositories?type=all
And DeepSeek often has very cool new approaches to AI copied by the rest. Many others copied their tech. And some of those have 10x or 100x the GPU training budget and that's their moat to stay competitive.
The models from Chinese Big Tech and some of the small ones are open weights only. (and allegedly benchmaxxed) (see https://xcancel.com/N8Programs/status/2044408755790508113). Not the same.
0-_-0 5 hours ago
Weights are the source, training data is the compiler
zerr 4 hours ago
Do they also open-source the censoring filter rules? Like, you can't ask what happened at Tiananmen Square in 1989.
harladsinsteden 4 hours ago
> I know of the potential ulterior motives.
And you think the US tech giants don't have any ulterior motives?!
FuckButtons 3 hours ago
I think their motives are pretty transparent, as are china’s, as ever, you have to pick the lesser of two evils.
nthypes 8 hours ago
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
Model was released and it's amazing. Frontier level (better than Opus 4.6) at a fraction of the cost.
0xbadcafebee 7 hours ago
I don't think we need to compare models to Opus anymore. Opus users don't care about other models, as they're convinced Opus will be better forever. And non-Opus users don't want the expense, lock-in or limits.
As a non-Opus user, I'll continue to use the cheapest fastest models that get my job done, which (for me anyway) is still MiniMax M2.5. I occasionally try a newer, more expensive model, and I get the same results. I have a feeling we might all be getting swindled by the whole AI industry with benchmarks that just make it look like everything's improving.
versteegen 6 hours ago
Which model's best depends on how you use it. There's a huge difference in behaviour between Claude and GPT and other models which makes some poor substitutes for others in certain use cases. I think the GPT models are a bad substitute for Claude ones for tasks such as pair-programming (where you want to see the CoT and have immediate responses) and writing code that you actually want to read and edit yourself, as opposed to just letting GPT run in the background to produce working code that you won't inspect. Yes, GPT 5.4 is cheap and brilliant but very black-box and often very slow IME. GPT-5.4 still seems to behave the same as 5.1, which includes problems like: doesn't show useful thoughts, can think for half an hour, says "Preparing the patch now" then thinks for another 20 min, gives no impression of what it's doing, reads microscopic parts of source files and misses context, will do anything to pass the tests including patching libraries...
ind-igo 6 hours ago
Agree with your assessment. I think after models reached around Opus 4.5 level, it's been almost indistinguishable for most tasks. Intelligence has been commoditized; what's important now is the workflows, prompting, and context management. And that is unique to each model.
sandos 5 hours ago
Is Opus nerfed somehow in Copilot? I've tried it numerous times and it has never really wowed me. They seem to have awfully small context windows, but still, it's mostly the reasoning that has been off.
Codex is just so much better, as are the general GPT models.
spaceman_2020 5 hours ago
I found Opus 4.7 to be actually worse than Opus 4.6 for my use case
Substantially worse at following instructions and overoptimized for maximizing token usage
kmarc 6 hours ago
This resonates with me a lot.
I do some stuff with gemini flash and Aider, but mostly because I want to avoid locking myself into a walled garden of models, UIs and company
post-it 6 hours ago
What do you run these on? I've gotten comfortable with Claude but if folks are getting Opus performance for cheaper I'll switch.
sandGorgon 6 hours ago
Actually, this is not the reason; the harness is significantly better. There is no comparable harness to Claude Code with skills, etc.
OpenCode was getting there, but it seems the founders lost interest. Pi could be it, but it's very focused on OpenClaw. Even Codex CLI doesn't have all of it.
Which harness works well with DeepSeek v4?
avereveard 5 hours ago
Eh, I don't know. Until yesterday Opus was the one that got spatial reasoning right (I had to do some head-pose stuff; neither GLM 5.1 nor Codex 5.3 could "get" it), and Codex 5.3 was my champion at making UX work.
So while I agree mixed-model is the way to go, Opus is still my workhorse.
szundi 6 hours ago
I don't know what people are doing, but MiniMax produced 16 bug reports, of which 15 were false positives (literally mistakes).
In contrast, ChatGPT 5.3 and Opus both hit at least a 90% rate on this same project (embedded).
All other tests were the same. What are you doing with these models?
onchainintel 7 hours ago
How does it compare to Opus 4.7? I've been immersed in 4.7 all week participating in the Anthropic Opus 4.7 hackathon and it's pretty impressive even if it's ravenous from a token perspective compared to 4.6
greenknight 7 hours ago
The thing is, it doesn't need to beat 4.7; it just needs to do somewhat well against it.
This is free... as in you can download it, run it on your own systems, and finetune it to be the way you want it to be.
spaceman_2020 5 hours ago
Tbh I was more productive with 4.6 than ever before and if AI progress locks in permanently at 4.6 tier, I’d be pretty happy
rvz 7 hours ago
It is more than good enough and has effectively caught up with Opus 4.6 and GPT 5.4 according to the benchmarks.
It's about 2 months behind GPT 5.5 and Opus 4.7.
As long as it is cheap for hosting providers to run and it is frontier level, it is a very competitive model and impressive against the others. I give it 2 years maximum before consumer hardware can run quantized 500B-800B models locally.
It should be obvious now why Anthropic really doesn't want you to run local models on your machine.
creamyhorror 4 hours ago
No, the Deepseek V4 paper itself says that DS-V4-Pro-Max is close to Opus 4.5 in their staff evaluations, not better than 4.6:
> In our internal evaluation, DeepSeek-V4-Pro-Max outperforms Claude Sonnet 4.5 and approaches the level of Opus 4.5.
doctoboggan 7 hours ago
Is it honestly better than Opus 4.6 or just benchmaxxed? Have you done any coding with an agent harness using it?
If its coding abilities are better than Claude Code with Opus 4.6 then I will definitely be switching to this model.
bokkies 6 hours ago
Apparently GLM 5.1 and the latest Qwen Coder are as good as Opus 4.6 on benchmarks. So I tried both seriously for a week (GLM Pro using CC, Qwen using Qwen Companion), thinking I could save $80 a month. Unfortunately, after 2 days I had switched back to Max. The speed (slower on both, although Qwen is much faster) and the errors (stupid layout mistakes, inserting 2 footers then refusing to remove one, not seeing obvious problems in screenshots, major f-ups of functionality, not being able to view URLs properly, etc.) did me in. I'll give DeepSeek a go, but I suspect it will be similar. The model is only half the story. I've also been testing GPT 5.4 with Codex and it is very nearly as good as CC; better on long-running tasks in the background. Not keen on the ChatGPT Codex "personality", so I'll stick with CC for the most part.
madagang 7 hours ago
Their Chinese announcement says that, based on internal employee testing, it is not as good as Opus 4.6 Thinking, but is slightly better than Opus 4.6 without Thinking enabled.
NitpickLawyer 7 hours ago
> (better than Opus 4.6)
There we go again :) It seems we have a release each day claiming that. What's weird is that even DeepSeek doesn't claim it's better than Opus with thinking. No idea why you'd say that, but anyway.
DSv3 was a good model. Not benchmaxxed at all; it was pretty stable where it was. It did well on tasks that were OOD for benchmarks, even if it was behind SotA.
This seems similar. Behind SotA, but not by much, and at a much lower price. The big one is being served (by DS themselves for now; more providers will come and we'll see the median price) at $1.74 in / $3.48 out / $0.14 cache per 1M tokens. Really cheap for what it offers.
The small one is at $0.14 in / $0.28 out / $0.028 cache, which is pretty much "too cheap to matter". This will be what people can realistically run "at home", and should be a contender against the likes of Haiku/Gemini Flash, if it can deliver at those levels.
slopinthebag 6 hours ago
Anthropic fans would claim God itself is behind Opus by 3-6 months and then willingly be abused by Boris and one of his gaslighting tweets.
LMAO
bbor 7 hours ago
For the curious, I did some napkin math on their posted benchmarks and it racks up a 20.1 percentage point total difference across the 20 metrics where both were scored, for an average improvement of about 2% (non-pp). I really can't decide if that's mind-blowing or boring.
Claude 4.6 was almost 10pp better at answering questions from long contexts ("corpuses" in CorpusQA and "multiround conversations" in MRCR), while DSv4 was a staggering 14pp better at one math challenge (IMOAnswerBench) and 12pp better at basic Q&A (SimpleQA-Verified).
Quasimarion 7 hours ago
FWIW it's also like 10x cheaper.
sergiotapia 8 hours ago
The dragon awakes yet again!
kindkang2024 7 hours ago
There appears a flight of dragons without heads. Good fortune.
That's literally what the I Ching calls "good fortune."
Competition, when no single dragon monopolizes the sky, brings fortune for all.
rapind 7 hours ago
Pop?
vinhnx 2 hours ago
The king is back! I vividly remember being amazed and deeply appreciative reading DeepSeek's reasoning on Chat.DeepSeek.com, even before the DeepSeek moment the following January. I can't quite remember the date, but it's the most profound moment I have ever had. After OpenAI's o1, no other model had "reasoning" capability yet, and DeepSeek opened the full trace for us. Seeing DeepSeek's "wait, aha…" moments is something hard to describe. I learned strategy and reasoning skills for myself, too. I am always rooting for them.
buenolot 2 hours ago
Instead of King DeepSeek we got DeepShit Clown
zargon 7 hours ago
The Flash version is 284B A13B in mixed FP8 / FP4 and the full native precision weights total approximately 154 GB. KV cache is said to take 10% as much space as V3. This looks very accessible for people running "large" local models. It's a nice follow up to the Gemma 4 and Qwen3.5 small local models.
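Rough check of those numbers; the FP8/FP4 split below is a guess, picked so the total lands near the quoted ~154 GB.

```python
# Rough size check: 284B params in mixed FP8/FP4; the split is an assumption.
total_params = 284e9
fp8_fraction = 0.1  # assumed share of weights kept in FP8 (1 byte each)
bytes_per_param = fp8_fraction * 1.0 + (1 - fp8_fraction) * 0.5  # FP4 = 0.5 byte

print(f"~{total_params * bytes_per_param / 1e9:.0f} GB of weights")  # ~156 GB
```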
regularfry 13 minutes ago
I'm going to blow my bandwidth allowance again this month, aren't I.
sbinnee 7 hours ago
Price is appealing to me. I have been using Gemini 3 Flash mainly for chat. I may give it a try.
input/output: $0.14/$0.28 (whereas Gemini is $0.5/$3)
Does anyone know why output prices have such a big gap?
girvo 5 hours ago
Output is what the compute is used for above all else; it costs more hardware time than prompt processing (input), which is a lot faster.
tokenmaxxinej 5 hours ago
input tokens are processed at 10-50 times the speed of output tokens, since you can process them in batches rather than one at a time like output tokens
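To make that concrete, a toy timing model. A sketch only; both throughput numbers are assumptions picked to sit inside the 10-50x range above, not measurements:

    # Prefill (input) is parallel/batched; decode (output) emits one token
    # at a time, so it dominates hardware time even for input-heavy requests.
    PREFILL_TPS = 20_000   # assumed input tokens/sec
    DECODE_TPS = 600       # assumed output tokens/sec

    def request_seconds(in_tokens, out_tokens):
        return in_tokens / PREFILL_TPS + out_tokens / DECODE_TPS

    # 10x more input than output, yet decode still takes ~77% of the time:
    print(request_seconds(8_000, 800))  # 0.4s prefill + ~1.33s decode ≈ 1.73s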
zkmon 5 hours ago
They released 1.6 T pro base model on huggingface. First time I'm seeing a "T" model here.
mzl 4 hours ago
Kimi K2.5 and K2.6 are both >1T
quadruple 3 hours ago
In their paper, point 5.2.5 talks about their sandboxing platform(DeepSeek Elastic Compute). It seems like they have 4 different execution methods: function calls, container, microVM and fullVM.
This is a pretty interesting thing they've built in my opinion, and not something I'd expect to be buried in the model paper like this. Does anyone have any details about it? Google doesn't seem to find anything of note, and I'd love to dive a bit deeper into DSec.
sixhobbits 4 hours ago
I know people don't like Twitter links here but the main link just goes to their main docs site generic 'getting started' page.
The website now has a link to the announcement on Twitter here https://x.com/deepseek_ai/status/2047516922263285776
Copying text of that below
DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at http://chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
Tech Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
Open Weights: https://huggingface.co/collections/deepseek-ai/deepseek-v4
alpineman 4 hours ago
Just use xcancel by adding 'cancel' to the link
Imanari 5 hours ago
Just tested it via OpenRouter in the Pi coding agent and it regularly fails to use the read and write tools correctly, very disappointing. Anyone know a fix besides prompting "always use the provided tools instead of writing your own call"?
rane 4 hours ago
FWIW, works great in Claude Code.
https://api-docs.deepseek.com/guides/coding_agents#integrate...
abstracthinking 5 hours ago
They have just released it, give it some time, they probably haven't pretested it with Pi
Imanari 5 hours ago
How can they fix it after the release? They would have to retrain/finetune it further, no?
zargon 5 hours ago
mark33vh an hour ago
Yeah hope they fix this for PI
coderssh 5 hours ago
Feels like the real story here is cost/performance tradeoff rather than raw capability. Benchmarks keep moving incrementally, but efficiency gains like this actually change who can afford to build on top.
jessepcc 7 hours ago
At this point 'frontier model release' is a monthly cadence (Kimi 2.6, Claude 4.6, GPT 5.5); the interesting question is which evals will still be meaningful in 6 months.
mixtureoftakes 4 hours ago
more like weekly or almost daily, gpt 5.5 was literally 12 hours ago
simonw 7 hours ago
I like the pelican I got out of deepseek-v4-flash more than the one I got from deepseek-v4-pro.
https://simonwillison.net/2026/Apr/24/deepseek-v4/
Both generated using OpenRouter.
For comparison, here's what I got from DeepSeek 3.2 back in December: https://simonwillison.net/2025/Dec/1/deepseek-v32/
And DeepSeek 3.1 in August: https://simonwillison.net/2025/Aug/22/deepseek-31/
And DeepSeek v3-0324 in March last year: https://simonwillison.net/2025/Mar/24/deepseek/
JSR_FDED 7 hours ago
No way. The Pro pelican is fatter, has a customized front fork, and the sun is shining! He’s definitely living the best life.
chronogram 5 hours ago
The pro pelican is a work of art! It goes to dimensions that no other LLM has gone before.
w4yai 7 hours ago
yeah. look at these 4 feathers (?) on his bum too.
oliver236 6 hours ago
a lot of dumplings
torginus 5 hours ago
This is just a random thought, but have you tried doing an 'agentic' pelican?
As in have the model consider its generated SVG, and gradually refine it, using its knowledge of the relative positions and proportions of the shapes generated, and have it spin for a while, and hopefully the end result will be better than just oneshotting it.
Or maybe going even one step further - most modern models have tool use and image recognition capabilities - what if you have it generate an SVG (or parts/layers of it, as per the model's discretion) and feed it back to itself via image recognition, and then improve on the result.
I think it'd be interesting to see, as for a lot of models, their oneshot capability in coding is not necessarily correlated with their in-harness ability, the latter being what really matters. Roughly, I mean a loop like the sketch below.
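A minimal sketch of that loop, under stated assumptions: `llm_text` and `llm_vision` are hypothetical stand-ins for whatever model client you use (not DeepSeek's API), and cairosvg is just one way to rasterize:

    # Hypothetical self-refinement loop: generate an SVG, render it, feed the
    # render back to a vision-capable model, ask for fixes, repeat.
    import cairosvg  # any SVG -> PNG rasterizer works

    def refine_svg(llm_text, llm_vision, rounds=4):
        svg = llm_text("Generate an SVG of a pelican riding a bicycle")
        for _ in range(rounds):
            png = cairosvg.svg2png(bytestring=svg.encode("utf-8"))
            critique = llm_vision(
                "Here is a render of your SVG. List the worst geometric and "
                "anatomical problems (wheels, frame, pelican pose).",
                image=png,
            )
            svg = llm_text(
                f"Previous SVG:\n{svg}\n\nCritique:\n{critique}\n"
                "Return only an improved SVG."
            )
        return svg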
simonw 5 hours ago
I tried that for the GPT-5 launch - a self-improving loop that renders the SVG, looks at it and tries again - and the results were surprisingly disappointing.
I should try it again with the more recent models.
torginus 3 hours ago
nickvec 7 hours ago
The Flash one is pretty impressive. Might be my favorite so far in the pelican-riding-a-bicycle series
murkt 6 hours ago
DeepSeek pelicans are the angriest pelicans I’ve seen so far.
kristopolous 6 hours ago
they're just late for work.
lazycatjumping 5 hours ago
996 Pelican, lol
mikae1 6 hours ago
Being a bicycle geometry nerd I always look at the bicycle first.
Let me tell you how much the Pro one sucks... It looks like a failed Pedersen[1]. The rear wheel intersects with the bottom bracket, so it wouldn't even roll. Or rather, this bike couldn't exist.
The Flash one looks surprisingly correct, with some wild fork offset and the slackest of seat tubes. It's got some lowrider[2] aspirations with the small wheels, but with longer, Rivendellish[3], chainstays. The seat post has a different angle than the seat tube, so good luck lowering that.
[1] https://en.wikipedia.org/wiki/Pedersen_bicycle
simonw 6 hours ago
This is an excellent comment. Thanks for this - I've only ever thought about whether the frame is the right shape, I never thought about how different illustrations might map to different bicycle categories.
mikae1 6 hours ago
jojobas 6 hours ago
The Pedersen looks like someone failed the "draw a bicycle" test and decided to adjust the universe.
catelm 6 hours ago
I think the pelican on a bike is known widely enough that it ceases to be useful as a benchmark. There is even a pelican briefly appearing in the promo video for GPT-5, if I'm not mistaken: https://openai.com/gpt-5/. So the companies are apparently aware of it.
simonw 5 hours ago
It was a bigger deal in the Gemini 3.1 launch: https://x.com/JeffDean/status/2024525132266688757
brutal_chaos_ 6 hours ago
What was your prompt for the image? Apologies if this should be obvious.
shawn_w 6 hours ago
>Generate an SVG of a pelican riding a bicycle
at the top of the linked pages.
nsoonhui 6 hours ago
To me this is the perfect proof that
1) LLMs are not AGI. Surely, if they were AGI, Pro would do better than Flash?
2) and because of the above, the pelican example is most likely already being benchmaxxed.
chvid 6 hours ago
Is it then Deepseek hosted by Deepseek?
How much does the drawing change if you ask it again?
ycui1986 7 hours ago
I really like the pro version. The pelican is so cute.
theanonymousone 6 hours ago
Where is the GPT 5.5 Pelican?
simonw 5 hours ago
culopatin 5 hours ago
In the 5.5 topic
lobochrome 6 hours ago
Why they so angry?
EnPissant 6 hours ago
This should not be the top comment on every model release post. It's getting tiring.
blitzar 5 hours ago
This should be the bottom comment on the pelican comment on every model release post.
EnPissant 5 hours ago
Aliabid94 7 hours ago
MMLU-Pro:
Gemini-3.1-Pro at 91.0
Opus-4.6 at 89.1
GPT-5.4, Kimi2.6, and DS-V4-Pro tied at 87.5
Pretty impressive
ant6n 7 hours ago
Funny how Gemini is theoretically the best, but in practice all the bugs in the interface mean I don't want to use it anymore. The worst is that it forgets context (and lies about it), and it's very unreliable at reading PDFs (and lies about that too). There's also no branching, so once the context is lost/polluted, you have to start projects over and build up the context from scratch again.
spaceman_2020 5 hours ago
The sheer number of bugs and lack of meaningful improvements in Google products is a clear counterargument to the AI bull thesis
If AI was so good at coding, why can’t it actually make a usable Gemini/AI Studio app?
barnabee 4 hours ago
hodgehog11 2 hours ago
Most of these tests are one-prompt in nature. I've also noticed issues with the PDF reader in Gemini which was very frustrating, although it is significantly better now than it was even two weeks ago. On the contrary, now GPT-5 seems to be giving me issues.
In my experience, Gemini is the most insightful model for hard problems (particularly math problems that I work on).
lazycatjumping 5 hours ago
I gave up on Gemini 3.1 Pro in VSCode after 2 hours. They fully refunded me.
esperent 5 hours ago
Yeah if I could use Gemini with pi.dev that would be my choice. But Gemini CLI is just so, so bad.
rohanm93 6 hours ago
This is shockingly cheap for a near-frontier model. This is insane.
For context, for an agent we're working on, we're using 5-mini, which is $2/1M tokens. This is $0.30/1M tokens. And it's Opus 4.6 level - this can't be real.
I am uncomfortable about sending user data which may contain PII to their servers in China, so I won't be using this, as appealing as it sounds. I need this to come to a US-hosted environment at an equivalent price.
Hosting this on my own + renting GPUs is much more expensive than DeepSeek's quoted price, so not an option.
esperent 5 hours ago
> I am uncomfortable about sending user data which may contain PII to their servers in China
As a European I feel deeply uncomfortable about sending data to US companies where I know for sure that the government has access to it.
I also feel uncomfortable sending it to China.
If you'd asked me ten years ago which one made me more uncomfortable. China.
But now I'm not so sure, in fact I'm starting to lean towards the US as being the major risk.
fractalf 6 hours ago
Right now I'm much more worried about sending data to the US and A.. At least there's less chance it will be misused against -me-
swiftcoder 4 hours ago
> For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.
It doesn't seem all that out there compared to the other Chinese models' price/performance? Kimi 2.6 is even cheaper than this, and is pretty close in performance
rohanm93 3 hours ago
Kimi is indeed somewhat cheap for frontier-level intelligence, but is still $4-5 per 1M tokens. DeepSeek is at least an order of magnitude cheaper.
swiftcoder 2 hours ago
gardnr 6 hours ago
865 GB: I am going to need a bigger GPU.
npodbielski 4 hours ago
Or several bigger GPUs! :)
lifeisstillgood 4 hours ago
On a separate note, I am guessing that all the new models have been announced in the space of a few days because the time to train a model is the same for each AI company.
Which strikes me as odd - I would have assumed someone had an edge in terms of at least 10% extra GPUs.
namenotrequired 3 hours ago
But why would they all start at the same time?
lifeisstillgood 3 hours ago
Because they all (if my memory serves) did this release at the same time thing last time. I have not looked into it but I am guessing that not letting one model pull ahead for a month means everyone keeps up - which implies the “stickiness” of any one model is a lot lower than we think
yanhangyhy 2 hours ago
Somehow I cannot open the link, but in their Chinese version's release article, at the end, there is a quote from Xunzi (https://en.wikipedia.org/wiki/Xunzi_(philosopher)):
"Not seduced by praise, not terrified by slander; following the Way in one's conduct, and rectifying oneself with dignity." (不诱于誉,不恐于诽,率道而行,端然正己)
(It is mainly used to express the way a Confucian gentleman conducts himself in the world. It reminds me of an interview I once watched with an American politician, who said that, at its core, China is still governed through a Confucian meritocratic elite system. It seems some things have never really changed.
In some respects, Liang Wenfeng can be compared to Linux. The political parallel here is that the advantages of rational authoritarianism are often overlooked because of the constraints imposed by modern democratic systems. )
Oxlamarr an hour ago
The speed of progress here is wild. It feels like the hard part is shifting from having access to a strong model to actually building trustworthy systems around it.
CJefferson 6 hours ago
What's the current best framework to have a 'claude code' like experience with Deepseek (or in general, an open-source model), if I wanted to play?
deaux 6 hours ago
TranquilMarmot 6 hours ago
whoopdeepoo 6 hours ago
You can use deepseek with Claude code
esperent 5 hours ago
You can, but does it work well? I assume CC has all kinds of Claude-specific prompts in it; wouldn't you be better off with a harness designed to be model-agnostic, like pi.dev or OpenCode?
rane 4 hours ago
Alifatisk 5 hours ago
You can use CC with other models, you aren’t forced to use Claude model.
0x142857 6 hours ago
claude-code-cli/opencode/codex
nba456_ an hour ago
Wow, never seen a post with so many comments posted overnight like this.
Grp1 2 hours ago
DeepSeek’s docs say V4 has a 1M context length. Is that actually usable in practice, or just the model/API limit?
Codex shows ~258k for me and Claude Code often shows ~200k, so I’m curious how DeepSeek is exposing such a large window.
lucrbvi 2 hours ago
They have added a lot of optimization focused on the KV cache, so they can have a much larger window without eating all the VRAM.
The 1M window might be usable, but it will probably underperform compared to a smaller window, of course.
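For a sense of scale, the generic dense-attention KV cache formula; the dimensions below are made-up round numbers for illustration, not DeepSeek-V4's actual config (V4's compressed attention stores a fraction of this per token - the report claims 10% of V3.2's cache):

    # Dense KV cache: 2 (K and V) x layers x kv_heads x head_dim x bytes, per token.
    def kv_cache_gb(tokens, layers=60, kv_heads=8, head_dim=128, bytes_per=2):
        return tokens * 2 * layers * kv_heads * head_dim * bytes_per / 1e9

    print(kv_cache_gb(128_000))     # ~31 GB at 128k context
    print(kv_cache_gb(1_000_000))   # ~246 GB at 1M: why the compression matters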
luyu_wu 8 hours ago
For those who didn't check the page yet, it just links to the API docs being updated with the upcoming models, not the actual model release.
talim 8 hours ago
Weights are on Huggingface FWIW. https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/tree/main
cmrdporcupine 7 hours ago
My submission here https://news.ycombinator.com/item?id=47885014 done at the same time was to the weights.
dang, probably the two should be merged and that be the link
culi 7 hours ago
there's no pinging. Someone's gotta email dang
bandrami 5 hours ago
I don't mind that High Flyer completely ripped off Anthropic to do this so much as I mind that they very obviously waited long enough for the GAB to add several dozen xz-level easter eggs to it.
cedws an hour ago
He who is a ripper off-er cannot be ripped off.
jdeng 7 hours ago
Excited that the long awaited v4 is finally out. But feel sad that it's not multimodal native.
storus 6 hours ago
Oh well, I should have bought 2x 512GB RAM MacStudios, not just one :(
aquir 5 hours ago
It is great! I asked the question I always ask of new models ("what would Iain M. Banks think about the current state of AI") and it gave me a brilliant answer! Funnily enough, the answer contained multiple criticisms of its own creators ("Chinese state entities", "Social Credit System").
fbrncci an hour ago
Take that Anthropic and your shenanigans.
thefounder 2 hours ago
They still don’t support JSON schema or a batch API. It’s like DeepSeek doesn’t want to make money.
kiproping 2 hours ago
What do you currently use for JSON and batch? I was doing some analysis, and my results show that gpt-oss-120b (non-batch via OpenRouter) is the best for now for my use case, better than the Gemini Flash models (batch on Google). How is your experience?
dannyw 3 hours ago
Are there better providers for inferencing this right now? I know it's launch day, but openrouter showing 30tps isn't looking great.
yanis_t 4 hours ago
Is there a harness that is as good as cloud code that can be used with open weight models?
laurentiurad 2 hours ago
Try Opencode or Comrade. Both OSS and working great with OSS models too.
barnabee 4 hours ago
I prefer OpenCode over Claude Code, and it works with basically everything. Give it a try. ymmv
Numerlor 4 hours ago
I've liked Hermes agent, but never used Claude code so don't know how it compares
sixhobbits 4 hours ago
Try pi coding agent!
npodbielski 4 hours ago
Never used Claude myself, but there are agents that can use a local model, e.g. JetBrains Junie or Mistral Vibe.
xnx 6 hours ago
Such a different time now than early 2025, when people thought DeepSeek was going to kill the market for Nvidia.
antirez 3 hours ago
Actually, the fact that inference of a SOTA model is completely Nvidia-free is the biggest attack on Nvidia carried out so far. Even American frontier AI labs may start to buy Chinese hardware if they need to continue the AI race; they can't keep paying so much money for GPUs, especially once the training versions of Huawei's GPUs ship.
eunos an hour ago
That's like saying Raytheon would outsource building drones to the Shahed makers (don't know who exactly).
Not gonna happen
Ifkaluva 5 hours ago
They might still kill the market for NVIDIA, if future releases prioritize Huawei chips
taosx 8 hours ago
clark1013 6 hours ago
Looking forward to DeepSeek Coding Plan
m_abdelfattah 5 hours ago
I came here to say the same :) !
jfxia 5 hours ago
Is V4 still not a multi-modal model?
vitorgrs 5 hours ago
Not yet... Which is a shame.
namegulf 7 hours ago
Is there a Quantized version of this?
mordae 4 hours ago
They have released mixed fp8/fp4 for efficiency. It's still hundreds of gigabytes, though. Give up on local for these.
JonChesterfield 3 hours ago
Anyone worked out how much hardware one needs to self host this one?
GuardCalf 3 hours ago
I like this. The more competitors there are, the more we the users benefit.
sibellavia 6 hours ago
A few hours after GPT5.5 is wild. Can’t wait to try it.
KaoruAoiShiho 7 hours ago
SOTA MRCR (or would've been a few hours earlier... beaten by 5.5), I've long thought of this as the most important non-agentic benchmark, so this is especially impressive. Beats Opus 4.7 here
apexalpha 5 hours ago
This Flash model might be affordable for OpenClaw. I run it on my Mac with 48GB of RAM now, but it's slowish.
reenorap 7 hours ago
Which version fits in a Mac Studio M3 Ultra 512 GB?
simonw 7 hours ago
The Flash one should - it's 160GB on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/tree/ma...
ycui1986 7 hours ago
So, dual RTX PRO 6000
swrrt 7 hours ago
Any visualised benchmark/scoreboard for comparing the latest models? DeepSeek v4 and GPT-5.5 seem to be groundbreaking.
cztomsik 4 hours ago
So is this the first AI lab using MUON for their frontier model?
hodgehog11 4 hours ago
No, Muon was developed by Moonshot; they've been using it in their Kimi models since Kimi K2 in 2025.
cztomsik 28 minutes ago
Keller Jordan worked at Moonshot? Or am I missing something? I thought he was the original author. https://x.com/kellerjordan0/status/1842300916864844014
WhereIsTheTruth 5 hours ago
Interesting note:
"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."
So it's going to be even cheaper
aliljet 7 hours ago
How can you reasonably try to get near frontier (even at all tps) on hardware you own? Maybe under 5k in cost?
revolvingthrow 6 hours ago
For Flash? 4-bit quant, 2x 96GB GPUs (fast and expensive) or 1x 96GB GPU + 128GB RAM (still expensive but probably usable, if you're patient).
A Mac with 256GB memory would run it but be very slow, and so would a 256GB RAM + cheapo GPU desktop, unless you leave it running overnight.
The big model? Forget it, not this decade. You can theoretically load from SSD, but waiting for the reply will be a religious experience.
Realistically, the biggest models you can run on local-as-in-worth-buying-as-a-person hardware are between 120B and 200B, depending on how far you're willing to go on quantization. Even this is fairly expensive, and that's before RAM went to the moon.
zargon 6 hours ago
Flash is less than 160 GB. No need to quantize to fit in 2x 96 GB. Not sure how much context fits in 30 GB, but it should be a good amount.
redrove 6 hours ago
mordae 3 hours ago
Look at GB/s.
Strix Halo has 256 GB/s of bandwidth for $2500. The Flash model has 13 GB of activations.
256 / 13 = 19.7 tokens per second
Except you cannot fit it into the maximum 128 GB of RAM that Strix Halo supports. So move on.
Another option is Threadripper. That's 8 memory channels. Using older DDR4-3200 you get roughly 200 GB/s. For $2000.
200 / 13 = 15.4 tokens per second
But, a chunk of per-token weights is actually always the same and not MoE, so you would offload that to a GPU and get a decent speedup. Say 25 tokens per second total.
Then likely some expensive Mac. No idea.
Eventually you arrive at a mining rig chassis with a beefy board and multiple GPUs. That has the benefit of pipelining. You run part of the model on one GPU and move on, so another batch can start on the first one. Low (say 30-100) tps individually, but a lot more in parallel. Best get it with other people.
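The same napkin math as a reusable helper, for plugging in other hardware. The first two numbers are from above; the third bandwidth figure is an assumption for a high-memory-bandwidth machine, not a spec I checked:

    # Decode is memory-bandwidth-bound: each output token reads the active
    # weights once, so tokens/sec ≈ bandwidth / active-weight bytes.
    def est_tps(bandwidth_gb_s, active_gb):
        return bandwidth_gb_s / active_gb

    print(est_tps(256, 13))  # Strix Halo: ~19.7 tok/s
    print(est_tps(200, 13))  # 8-channel DDR4-3200 Threadripper: ~15.4 tok/s
    print(est_tps(800, 13))  # a Mac-Studio-class machine (assumed): ~61 tok/s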
awakeasleep 7 hours ago
The same way you fit a bucket wheel excavator in your garage
floam 6 hours ago
Very carefully
zozbot234 5 hours ago
Run on an old HEDT platform with a lot of parallel attached storage (probably PCIe 4) and fetch weights from SSD. You'd ultimately be limited by the latency of these per-layer fetches, since MoE weights are small. You could reduce the latencies further by buying cheap Optane memory on the second-hand market.
datadrivenangel 6 hours ago
A loaded MacBook Pro can get you to the frontier from 24 months ago at ~10-40 tok/s, which is plenty fast enough for regular chatting.
542458 6 hours ago
The low end could be something like an eBay-sourced server with a truckload of DDR3 ram doing all-cpu inference - secondhand server models with a terabyte of ram can be had for about 1.5K. The TPS will be absolute garbage and it will sound like a jet engine, but it will nominally run.
The flash version here is 284B A13B, so it might perform OK with a fairly small amount of VRAM for the active params and all regular ram for the other params, but I’d have to see benchmarks. If it turns out that works alright, an eBay server plus a 3090 might be the bang-for-buck champ for about $2.5K (assuming you’re starting from zero).
jdoe1337halo 7 hours ago
More like 500k
mariopt 7 hours ago
Does DeepSeek have any coding plan?
jeffzys8 6 hours ago
no
cl08 4 hours ago
Any way to connect this to claude code?
showmexyz 4 hours ago
As posted below https://api-docs.deepseek.com/guides/coding_agents#integrate...
mordae 4 hours ago
It's literally in the linked docs.
raincole 7 hours ago
History doesn't always repeat itself.
But if it does, then in the following week we'll see DeepSeek v4 flood every AI-related online space. Thousands of posts swearing it's better than the latest models OpenAI/Anthropic/Google have, but only costs pennies.
Then a few weeks later it'll be forgotten by most.
sbysb 7 hours ago
It's difficult because even if the underlying model is very good, not having a pre-built harness like Claude Code makes it very un-sticky for most devs. Even at equal quality, the friction (or at least perceived friction) is higher than the mainstream models.
raincole 7 hours ago
OpenCode? Pi?
If one finds it difficult to set up OpenCode to use whatever providers they want, I won't call them 'dev'.
The only real friction (if the model is actually as good as SOTA) is to convince your employer to pay for it. But again if it really provides the same value at a fraction of the cost, it'll eventually cease to be an issue.
throwa356262 6 hours ago
2ndorderthought 16 minutes ago
You can literally run it from Claude code. Easily too
cmrdporcupine 7 hours ago
They have instructions right on their page on how to use claude code with it.
tcbrah 5 hours ago
giving meta a run for its money, esp when it was supposed to be the poster child for OSS models. deepseek is really overshadowing them rn
alpineman 3 hours ago
Meta is totally directionless
rvz 7 hours ago
The paper is here: [0]
I was expecting the release to be this month [1], since everyone forgot about it and wasn't reading the papers they were releasing, and 7 days later here we have it.
One of the key points of this model is the optimization DeepSeek made to the residual design of the LLM's neural network architecture: manifold-constrained hyper-connections (mHC), from this paper [2], which make it possible to train the model efficiently, especially together with the hybrid attention mechanism designed for it.
There was not much discussion around it here [3] some months ago, but again the paper is a recommended read.
I wouldn't trust the benchmarks directly, but would wait for others to try it for themselves to see if it matches the performance of frontier models.
Either way, this is why Anthropic wants to ban open weight models and I cannot wait for the quantized versions to release momentarily.
[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
[1] https://news.ycombinator.com/item?id=47793880
jeswin 7 hours ago
> this is why Anthropic wants to ban open weight models
Do you have a source?
louiereederson 6 hours ago
More like he wants to ban accelerator chip sales to China, which may be about “national security” or self preservation against a different model for AI development which also happens to be an existential threat to Anthropic. Maybe those alternatives are actually one and the same to him.
cubefox 3 hours ago
Abstract of the technical report [1]:
> We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens. DeepSeek-V4 series incorporate several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; (3) and the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, redefines the state-of-the-art for open models, outperforming its predecessors in core tasks. Meanwhile, DeepSeek-V4 series are highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at https://huggingface.co/collections/deepseek-ai/deepseek-v4.
1: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
zurfer 4 hours ago
lots of great stuff, but the plot in the paper is just chart crime. different shades of gray for references where sometimes you see 4 models and sometimes 3.
sergiotapia 6 hours ago
Using it with opencode, it sometimes generates commands like:

    bash({"command":"gh pr create --title "Improve Calendar module docs and clean up idiomatic Elixir" --body "$(cat <<'EOF'
    Problem
    The Calendar modu...

like generating the output, but not actually running the bash command, so not creating the PR ultimately. I wonder if it's a model thing or an opencode thing.
casey2 3 hours ago
Already over a billion tokens on OpenRouter in under 5 hours
tariky 6 hours ago
Anyone tried making a web UI with it? How good is it? For me, Opus is only worth it because of that.
augment_me 5 hours ago
Amaze amaze amaze
ls612 7 hours ago
How long does it usually take for folks to make smaller distills of these models? I really want to see how this will do when brought down to a size that will run on a Macbook.
simonw 7 hours ago
Unsloth often turn them around within a few hours, they might have gone to bed already though!
Keep an eye on https://huggingface.co/unsloth/models
Update ten minutes later: https://huggingface.co/unsloth/DeepSeek-V4-Pro just appeared but doesn't have files in yet, so they are clearly awake and pushing updates.
mohsen1 5 hours ago
"2 minutes ago" https://huggingface.co/unsloth/DeepSeek-V4-Pro
EnPissant 6 hours ago
Those are quants, not distills.
inventor7777 7 hours ago
Weren't there some frameworks recently released to allow Macs to stream weights from fast SSDs and thus fit way more parameters than what would normally fit in RAM?
I have never tried one yet but I am considering trying that for a medium sized model.
simonw 7 hours ago
I've been calling that the "streaming experts" trick, the key idea is to take advantage of Mixture of Expert models where only a subset of the weights are used for each round of calculations, then load those weights from SSD into RAM for each round.
As I understand it if DeepSeek v4 Pro is a 1.6T, 49B active that means you'd need just 49B in memory, so ~100GB at 16 bit or ~50GB at 8bit quantized.
v4 Flash is 284B, 13B active so might even fit in <32GB.
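For the curious, that math in runnable form. It's just active params times bytes per weight; it deliberately ignores KV cache and whatever shared/attention layers you'd keep resident:

    # Minimum RAM for just the active weights under the streaming-experts trick.
    def active_weight_gb(active_params_billions, bits_per_weight):
        return active_params_billions * bits_per_weight / 8

    for name, active in [("v4-pro", 49), ("v4-flash", 13)]:
        for bits in (16, 8, 4):
            print(f"{name} {bits}-bit: ~{active_weight_gb(active, bits):.1f} GB")
    # v4-pro: ~98 / ~49 / ~25 GB; v4-flash: ~26 / ~13 / ~7 GB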
zozbot234 6 hours ago
zargon 6 hours ago
inventor7777 7 hours ago
EnPissant 5 hours ago
zozbot234 6 hours ago
These are more like experiments than a polished release as of yet. And the reduction in throughput is high compared to having the weights in RAM at all times, since you're bottlenecked by the SSD which even at its fastest is much slower than RAM.
the_sleaze_ 7 hours ago
Do you have the links for those? Very interested
inventor7777 7 hours ago
gigatexal 5 hours ago
Has anyone used it? How does it compare to gpt 5.5 or opus 4.7?
coolThingsFirst 5 hours ago
I got an API key without credit card details. I didn't know they had a free plan.
luew 6 hours ago
We will be hosting it soon at getlilac.com!
punkpeye 6 hours ago
Incredible model quality to price ratio
donbreo 4 hours ago
Aaaand it still can't name all the states in India, or say what happened in 1989
mordae 4 hours ago
Ask Claude how to overthrow a Nazi dictatorship in the US.
frozenseven 7 hours ago
hongbo_zhang 7 hours ago
congrats
dhruv3006 6 hours ago
Ah now !
shafiemoji 7 hours ago
I hope the update is an improvement. Losing 3.2 would be a real loss, it's excellent.