Mistral AI Releases Forge (mistral.ai)
687 points by pember a day ago
kioleanu 11 hours ago
I like Mistral, it hits the exact sweet spot between cost and my data staying in the EU, without a significant drop in quality, but man are their model naming conventions confusing af. They mention they have a model called Devstral 2, which is neither Codestral nor Devstral. I want to use it, but the API only lists devstral-2512, devstral-latest, devstral-medium-latest, devstral-medium-2507, devstral-small, devstral-small-2507.
I think devstral-latest should be it, no? So I write to support and get an answer 12 hours later that says oh no, Devstral 2 is definitely called Devstral 2, and then a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it refers to don't exist and never did.
IanCal 9 hours ago
I got really lost on their site too, but to help a bit, according to their model page:
devstral-2512 devstral-latest and devstral-medium-latest are all devstral 2 https://docs.mistral.ai/models/devstral-2-25-12
labs-devstral-small-2512 and devstral-small-latest are devstral small 2
devstral-medium-2507 is devstral 1.0
and devstral-small-2507 is devstral small 1.1
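If it helps, the mapping above can be pinned down in code. This is just a sketch transcribed from the list in this comment, not an authoritative source, and the aliases will change as Mistral ships new versions:

```python
# Alias -> marketing-name mapping, transcribed from the summary above.
# Treat it as a snapshot of the docs at the time of this thread.
MODEL_ALIASES = {
    "devstral-2512": "Devstral 2",
    "devstral-latest": "Devstral 2",
    "devstral-medium-latest": "Devstral 2",
    "labs-devstral-small-2512": "Devstral Small 2",
    "devstral-small-latest": "Devstral Small 2",
    "devstral-medium-2507": "Devstral 1.0",
    "devstral-small-2507": "Devstral Small 1.1",
}

def marketing_name(api_id: str) -> str:
    """Resolve an API model id to its marketing name, if known."""
    return MODEL_ALIASES.get(api_id, f"unknown model id: {api_id}")

print(marketing_name("devstral-latest"))  # -> Devstral 2
```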
kioleanu 9 hours ago
wow, thank you, this is great. I was thinking they should have a page like this, but I couldn't find it myself.
Manfred 10 hours ago
I had the same experience. It's even more confusing when you want to create an API key because they are separated by product, maybe?
kioleanu 10 hours ago
no, the key is actually universal, you can't choose a specific product
lis 8 hours ago
newswasboring 9 hours ago
I have a general impression that they're not too interested in individual devs or in making their product suit dev workflows. They want to be a B2B company and deliver a custom workflow per company.
Or it could just be a Google-like problem, where one part of a big company doesn't talk to the other.
soco 9 hours ago
But wouldn't winning over devs be a neat stepping stone to winning B2B contracts? Or do they think golf courses are enough for success? Okay, they might be right here, but they still make it so confusing for no obvious reason.
MidnightRider39 8 hours ago
lelanthran 7 hours ago
R0m41nJosh 3 hours ago
philipallstar 7 hours ago
newswasboring 9 hours ago
kioleanu 9 hours ago
you might be correct. For example, they have an IntelliJ plugin that allows integration without the AI Assistant, but it is only available for Enterprise customers.
butILoveLife 7 hours ago
>data staying in the EU
This is really why Mistral has any support.
The models are bottom of the barrel, but it's the best Europe has...
Although you could use Chinese models on European servers.
ogou 14 hours ago
Don't sleep on Mistral. Highly underrated as a general service LLM. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted access stores that can benefit from their approach. Especially in highly regulated EU.
Not everyone is obsessed with code generation. There is a whole world out there.
lelanthran 11 hours ago
I also think that this is the best approach for businesses wanting to adopt AI to automate, streamline, etc their business.
The problem they have is that this is not a moat - their approach is easily reproducible.
If they can pull ahead in having the largest number of pre-trained models (one for this ERP, one for that CRM, etc.) and can then close sales to companies using those products, selling them on post-training (give us your specific ERP customisations and we'll give you access to a model tailored to your business), then THAT is a moat.
But they need to do this without fanfare. Just close sales, and keep closing, basically. After all, even if other AI providers copy the process, the moat would already have been established for Mistral.
Lapel2742 10 hours ago
> The problem they have is that this is not a moat - their approach is easily reproducible.
My 2ct: Currently the moat may be that they are not US-American which is not reproducible by any of the US alternatives.
lelanthran 9 hours ago
Bombthecat 4 hours ago
drstewart 9 hours ago
soco 9 hours ago
erispoe 8 hours ago
Except the evidence today points to SOTA model + harness rather than fine-tuned models.
lelanthran 7 hours ago
srivmo 13 hours ago
> Their emphasis on bespoke modelling over generalized megaliths will pay off.
Isn't the entire deal with LLMs that they are trained as megaliths? How can bespoke modelling overcome the treasure trove of knowledge that megaliths can generically bring in, even in bespoke scenarios?
wodenokoto 10 hours ago
ChatGPT is already a small agent that receives your message and decides which agent needs to respond. Within those, agents can have sub agents (like when it does research).
When generating images most services will have a small agent that rewrites your request and hands it off to the generative image model.
So from the treasure trove point of view, optimized agents have their place. From companies building pipelines, they also have their place.
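That routing layer can be sketched as below. This is purely illustrative: the handler names are made up, and real services use an LLM classifier rather than keyword matching:

```python
# Toy request router: a "small agent" that inspects a message and hands it
# off to a specialized handler. Production routers use a model, not keywords.
def route(message: str) -> str:
    text = message.lower()
    if any(w in text for w in ("draw", "image", "picture")):
        return "image_agent"     # would rewrite the prompt, then call the image model
    if any(w in text for w in ("research", "sources", "cite")):
        return "research_agent"  # may spawn sub-agents for web search
    return "chat_agent"          # default general-purpose model

print(route("Can you draw a picture of a cat?"))  # -> image_agent
```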
TeMPOraL 9 hours ago
lelanthran 11 hours ago
> Isn't the entire deal with LLMs that they are trained as megaliths? How can bespoke modelling overcome the treasure trove of knowledge that megaliths can generically bring in, even in bespoke scenarios?
Think of it as a base model (the megalith) which then has the weights adjusted towards a specific use-case (SAP, for example).
Bombthecat 4 hours ago
The companies I work with want on-prem models, and no Chinese ones. Does Mistral support on-prem? (For a price)
Stromgren 12 hours ago
Agreed. I’ve used their platform to train smaller, specialized models. Something I could have done in Colab or some other tool, but their platform lets me just upload a training set, and as soon as it finishes I have a hosted model available at an endpoint. It obviously has some constraints compared to running the training yourself, but it also opens this up to way more people.
isodev 13 hours ago
Indeed, but even for coding use cases, Vibe is more of a focused “refactor/ write this function” aid than “write me an app” and it can work locally. For me that’s a lot more valuable as an accelerator to my workflow where the developer stays in control and fully involved in the process.
haraldooo 13 hours ago
I agree. Just started using it. Can you give some examples of fields you maybe even prefer Mistral?
Forgeties79 5 hours ago
I use a pretty lightweight local Mistral model in LM studio for both creative and technical writing/iterating and it’s fantastic.
spiderfarmer 12 hours ago
Yes, since it's not American, it will be the de-facto choice for most big European companies.
jstummbillig 12 hours ago
Why would that be? Most big EU companies use MS Teams or Google Workspace, for example.
schubidubiduba 11 hours ago
umeridrisi 12 hours ago
Is this the best Grok alternative?
spiderfarmer 12 hours ago
Any model is.
grosswait 7 hours ago
butILoveLife 7 hours ago
If you couldn't use the word Europe to describe why you'd choose Mistral, you'd have no good reason to choose Mistral.
It's just not good. It's bottom floor for LLMs.
danelski 5 hours ago
> Its bottom floor for LLMs.
What? That's just demonstrably false. The market doesn't consist of 5 providers.
butILoveLife 4 hours ago
mark_l_watson 19 hours ago
I am rooting for Mistral with their different approach: not really competing on the largest and advanced models, instead doing custom engineering for customers and generally serving the needs of EU customers.
ChrisGreenHeur 13 hours ago
I found it to be the best model if you want to talk about philosophical topics. It has no problem going deep and technical, while other models tend to be afraid of overshooting the reader's comprehension.
jerrygoyal 17 hours ago
their ocr model is goated
SyneRyder 12 hours ago
Did they make significant improvements in OCR 3? The quality I was getting from Mistral OCR 2 was nowhere near as good as what I could get from just sending the same files to Claude Sonnet via an API call.
I have been finding Voxtral useful though.
oakpond 11 hours ago
probably yes. considering that even some of their non-ocr models can recognize my shitty handwritten math
stavros 16 hours ago
Better than Qwen? I guess the best overall is Gemini, right?
ph4rsikal 15 hours ago
thefounder 14 hours ago
nicman23 13 hours ago
also offering support for local deployments
w4yai 18 hours ago
Go Mistral !
doctorpangloss 15 hours ago
first, there was .ai
next, it sounds like it's going to be .eu
but what about ai.eu
fnord123 9 hours ago
> but what about ai.eu
oh, .. why?
upghost 16 hours ago
> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.
> Post-training methods allow teams to refine model behavior for specific tasks and environments.
How do you suppose this works? They say "pretraining", but I'm certain the amount of clean data available in proper dataset format is nowhere near enough to make a "foundation model". Do you suppose what they call "pretraining" is actually SFT, and then "post-training" is... more SFT?
There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
qntty 7 hours ago
Pre-training means exposing an already-trained model to more raw text like PDF extracts etc. (aka continued pre-training). You wouldn't be starting from scratch, but it's still pre-training because the objective is just next-token prediction of the text you expose it to.
Post-training means everything else: SFT, DPO, RL, etc. Anything that involves things like prompt/response pairs, reward models, or benefits from human feedback of any kind.
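To make "the objective is just next-token prediction" concrete, here's a toy version with a smoothed bigram count model standing in for the network. This is only an illustration of the objective, not Mistral's stack; continued pre-training on a real LLM optimizes the same loss, just over transformer weights via gradient descent:

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count bigrams: the toy analogue of fitting next-token probabilities."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def nll(counts, tokens, vocab_size):
    """Average negative log-likelihood of next-token prediction (add-one smoothing)."""
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        c = counts.get(prev, Counter())
        p = (c[nxt] + 1) / (sum(c.values()) + vocab_size)
        total -= math.log(p)
    return total / (len(tokens) - 1)

corpus = "the model reads the text and predicts the next token".split()
vocab = len(set(corpus))
trained = train_bigram(corpus)
# A model exposed to the domain text predicts it better than an untrained one.
print(nll(trained, corpus, vocab), nll({}, corpus, vocab))
```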
losvedir 7 hours ago
Er, then what is the "already trained" model? I thought pre-training was the gradient descent through the internet part of building foundational models.
mirekrusin 14 hours ago
Probably marketing speak for full fine-tuning vs PEFT/LoRA.
lelanthran 12 hours ago
I would guess:
Pre-training: refining the weights in an existing model using more training data.
Post-training: Adding some training data to the prompt (RAG, basically).
anon373839 16 hours ago
I think they are referring to “continued pretraining”.
stingraycharles 16 hours ago
I can imagine that, as usual, you start with a few examples and then instruct an LLM to synthesize more examples out of that, and train using that. Sounds horrible, but actually works fairly well in practice.
gunalx 12 hours ago
Probably just means SFT fine-tuning a base model, vs behavioural dpo and/or SFT fine-tuning a instruction model.
jcmartinezdev 9 hours ago
Mistral is doing some really great stuff lately. Sure, it's hard to compete with OpenAI and Anthropic and their models, but they are taking up some interesting takes and designing their product in unique ways.
I really like what they're doing and I'll be watching them much more closely. I'd love to work for them, btw!
roxolotl 20 hours ago
Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. Was just talking at work about how hard model training is for a small company, so we’d probably never do it. But with tools like this, and the new unsloth release, training feels more in reach.
ryeguy_24 17 hours ago
How many proprietary use cases truly need pre-training or even fine-tuning as opposed to RAG approach? And at what point does it make sense to pre-train/fine tune? Curious.
troyvit 4 hours ago
I'm thinking stuff like this:
https://denverite.com/2026/03/12/ai-recycling-facility-comme...
You could take a model like the one referenced in the article, retool it with Forge for, oh I don't know, compost, and use it to flag batches that contain too much paper, for instance.
These kinds of applications would work across industries, basically anywhere where you have a documented process and can stand to have automated oversight.
mirekrusin 14 hours ago
You can fine-tune small, very fast and cheap-to-run specialized models, e.g. to react to logs, for tool use and domain knowledge, possibly removing network LLM comms altogether, etc.
Shitty-kitty 15 hours ago
rag basically gives the llm a bunch of documents to search through for the answer. What it doesn't do is make the model itself any better. pre-training and fine-tuning improve the llm's ability to reason about your task.
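The retrieval half of that can be sketched in a few lines. The documents and scoring here are made up for illustration; production RAG uses embedding vectors and a vector store rather than this bag-of-words overlap:

```python
# Toy RAG retrieval step: score documents by word overlap with the query,
# then paste the best match into the prompt. The model itself is unchanged.
def retrieve(query: str, docs: list[str]) -> str:
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

docs = [
    "Refund requests are processed within 14 days of purchase.",
    "Our office is open Monday to Friday, 9 to 5.",
    "Shipping to the EU takes 3 to 5 business days.",
]
question = "when will my refund be processed"
context = retrieve(question, docs)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(context)  # -> the refund document
```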
baby 17 hours ago
RAG is dead
charcircuit 17 hours ago
Using tools and skills to retrieve data or files is anything but dead.
nathanappere 12 hours ago
CharlesW 17 hours ago
And yet your blog says you think NFTs are alive. Curious.
But seriously, RAG/retrieval is thriving. It'll be part of the mix alongside long context, reranking, and tool-based context assembly for the foreseeable future.
WesleyJohnson an hour ago
nl 15 hours ago
prophesi 11 hours ago
elicash 16 hours ago
strongly-typed 17 hours ago
loeg 17 hours ago
Is it??
bigyabai 17 hours ago
In what, X's hype circles? Embeddings are used in production constantly.
dmix 18 hours ago
This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.
dash2 13 hours ago
I think it’s interesting what this approach suggests about who will profit from AI. I’m sceptical that having huge numbers of GPUs is a moat. After all, real humans – even geniuses – are trained on much much less data than the whole Internet. But proprietary and specialised data could very well be a moat. It’s hard to train a scientist/lawyer/analyst without reading a lot of science/law/finance. Companies’ proprietary data might encode a great deal of irreplaceable knowledge. Seems as if Mistral is taking this bet.
copirate 10 hours ago
> After all, real humans – even geniuses – are trained on much much less data than the whole Internet.
It's certainly different data, but one could argue that real humans have been trained on 3.5 billion years of evolution data.
losvedir 7 hours ago
> Forge enables enterprises to build models that internalize their domain knowledge. Organizations can train models on large volumes of internal documentation, codebases, structured data, and operational records. During training, the model learns the vocabulary, reasoning patterns, and constraints that define that environment.
I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
mikodin 4 hours ago
I was under this impression as well - I'd love to hear from someone who's deeper in the know about this!
csunoser 20 hours ago
Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering with customers on the pretraining side as well. And RL as well? Jeez, RL envs are really hard to get right. Best wishes, I guess.
todteera 10 hours ago
Interesting how Mistral is investing in training models for industry-specific use cases. With the commoditization of intelligence by base models, they're probably looking to create value from specialized verticals.
jbverschoor 12 hours ago
ASML and ESA as clients means something. I don't expect to see the first name on anyone else's logo list.
alansaber 6 hours ago
I find the Mistral "middle" between small LMs and 1T LMs compelling. Models that are sufficiently big to be performant but specialised for domains and tasks - this is what I assumed we'd always head towards.
andai 18 hours ago
They mention pretraining too, which surprises me. I thought that was prohibitively expensive?
It's feasible for small models, but I thought small models were not reliable for factual information?
simsla 16 hours ago
Typical stages of training for these models are:
Foundational:
- Pretraining
- Mid/post-training (SFT)
- RLHF or alignment post-training (RL)
And sometimes...
- Some more customer-specific fine-tuning.
Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
vincentbusch an hour ago
lol the AI-generated support reply about their own AI model is peak 2026
the naming mess is wild though. i ran into similar confusion trying to set up mistral for a side project — ended up just guessing which endpoint was the right one
zby 13 hours ago
My bet is that the solution to continuous learning is external storage. There is a lot of talk about context engineering, but I have not seen anyone treat context as the main bottleneck and build a system around that. Even "context engineering" is arguably the wrong term, because context does not enter the LLM in some mysterious way: it goes through the prompt, and passing the whole chat history back and forth is not the most efficient use of a limited prompt.
mhl47 12 hours ago
"External storage", whatever that is, cannot be the same as continuous learning, as it does not have the strong connections / capture the interdependencies of knowledge.
That said, I think we will see more effort on the business side to have models that can help you build a knowledge base in some kind of standardized way that the model is trained to read, or to synthesize some sort of instructions for navigating your knowledge base.
Currently, e.g., Copilot tries to navigate a hot mess of an MS knowledge graph that is very different for each company. And due to its amnesia, it has to repeat the discovery in every session. No wonder that doesn't work. We have to either standardize, or store somewhere (model, instructions) how to find information efficiently.
zby 11 hours ago
The key to making Copilot useful is taking the limited-context problem seriously enough. There are many dimensions to it: https://zby.github.io/commonplace/notes/context-efficiency-i... and it should be the starting point for designing systems that extensively use LLMs.
Centigonal 13 hours ago
What do you mean when you say "external storage?"
zby 11 hours ago
A knowledge base - something where the LLM knows how to find the knowledge it needs for a given task. I am working on this idea in https://zby.github.io/commonplace/
ithkuil 11 hours ago
A form of context engineering
hermit_dev 16 hours ago
The future of AI is specialization, not just achieving benevolent knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.
reverius42 14 hours ago
Ironically that was also the past of AI. In 2016 it was all about specialized models (not just training data, everything including architecture and model class/type) for specific tasks and that's the way things had been for a long time.
Are you suggesting that it's an aberration that from ~2019 to ~2026 the AI field has been working on general intelligence (I assume this is what you mean by "achieving benevolent knowledge")?
Personally I think it's remarkable how much a simple transformer model can do when scaled up in size. LLMs are an incredible feat of generalization. I don't see why the trajectory should change back towards specialization now.
holoduke 13 hours ago
I don't think that's true. Nothing points to specialized LLMs being better. General purpose LLMs are just much more useful in daily work.
hermit_dev 6 hours ago
To be more specific, I think the future is local and specialized. IBM, among others, thought the same way with their giant centralized mainframes and the way people originally used software in the 70s. It's an interesting parallel to today's cloud if you think about it. It's just not scalable from a resource (hardware), energy, and cost perspective. I think we're living in a unique time, but it's going to change. Without continued massive funding and a pivot to sustainability, things will (and should) change.
Don't get me wrong, general intelligence will always be important and should be part of specialist models to a degree for understanding, but it doesn't make sense to use an 800B+ parameter model to help write an email or research company trends. Hell, look at what China has been able to do. Qwen 3.5 9B exceeds Claude 3.5 Haiku and nears Sonnet 3.5 levels. The 27B variant of Qwen 3.5 is superior to both in many ways and even rivals newer models. There is obviously an inherent lag behind, but we will gradually see a shift as these models become more capable.
Right now we are chasing 1-2% improvements at the cost of billions. Local models are already absurdly capable (more and more by the day, same with cloud of course) and smarter than most people in specific areas. Can we honestly say most jobs require a PhD-level understanding to perform? We're chasing something that is less and less needed from a day-to-day perspective. AGI is outstanding, but not practical (at least today). I think we'll get there anyway at our current trajectory (though it's dangerous), but I suspect things will shift.
tho23i42342397 9 hours ago
Interesting. Does this actually scale, though? I've never seen enterprises that have "internal knowledge" in proper readable form - it's often in code, and more importantly in the people who wrote it.
I recall that even at Google - with its own search engine and so on - the best way to understand anything was to read the code or reach out to those who wrote it. I don't know how it works in places that work with the "real world", like ASML.
Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code documentation comes to mind).
Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.
bob001 5 hours ago
My sense is that it sounds amazing in theory to executives who have never had to look at the internal data themselves. In reality, the internal knowledge base is a mix of the incomplete, the inaccurate, self-serving lies, the out of date, and so on. At worst, the data is explicitly biased to hide reality from executives, so the AI will look extra good to them. Of course, a business that makes all tactical decisions based on lies is not going to do well.
rorylawless 18 hours ago
The fine tuning endpoint is deprecated according to the API docs. Is this the replacement?
aavci 17 hours ago
Interesting to see. I thought they were promoting fine tuning
thecopy 10 hours ago
Looks interesting. But how do you explore, test, or use it? The product page (https://mistral.ai/products/forge) also doesn't contain anything useful. Just "Contact us".
Disappointing.
Aldipower 11 hours ago
I cannot keep up with their products, model names and releases. Their marketing texts do not make sense to me. Is there a nice overview somewhere?
I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)
speedgoose 13 hours ago
I was enthusiastic but it’s "contact us" priced for now. I was expecting a classic cloud LLM forge with a public pricing.
apexalpha 9 hours ago
This looks good but how much money are we talking here? Are we 'retraining' an entire model but adding enterprise data to the public data set?
whatever1 13 hours ago
I thought that for pretraining to work and reasoning to emerge, you need internet-scale data. How can Forge achieve that with just internal company data (unless said company is AT&T or something)?
krinne 11 hours ago
I wasn't able to find a way to access this - is it something accessible only to enterprises?
Would love to take it for a spin, if that is even possible.
Havoc 7 hours ago
Good for them. Really hope they find market fit
spacesh1psoda 10 hours ago
Go EU!
aavci 17 hours ago
How does this compare to fine tuning?
Otterly99 11 hours ago
It seems to me that it is broadly the same thing, except they give you the resources and expert knowledge to do it.
burgerquizz 10 hours ago
can i use mistral to read my source code and learn it, so i don't need to inject the whole doc and consume tokens every single time?
supernes 14 hours ago
> Code agents are becoming the primary users of developer tools, so we built Forge for them first, not
... for humans.
bsjshshsb 18 hours ago
Is training or FT > context? Anyone have experience?
Is it possible to retrain daily or hourly as info changes?
dragochat 7 hours ago
where sample notebook/script? where github? where signup?
...learn a thing or two from NVIDIA or gtfo
troyvit 4 hours ago
lol
> Mistral AI has already partnered with world-leading organizations, like ASML, DSO National Laboratories Singapore, Ericsson, European Space Agency, Home Team Science and Technology Agency (HTX) Singapore, and Reply to train models on the proprietary data that powers their most complex systems and future-defining technologies.
When you can actually represent somebody like the ESA get in touch with them. Otherwise, uh, gtfo.