Mistral AI Unveils Forge: A Bold Challenge to Cloud Giants in the Battle for Enterprise AI Control
In a week marked by aggressive expansion and strategic positioning, Mistral AI has launched Forge, a groundbreaking enterprise model training platform that could fundamentally reshape how organizations build and deploy artificial intelligence. The French AI company’s latest offering represents far more than just another fine-tuning tool—it’s a direct challenge to Amazon, Microsoft, and Google’s dominance in enterprise AI infrastructure.
The Enterprise AI Paradigm Shift
For the past two years, enterprise AI adoption has followed a predictable pattern: companies select a general-purpose model from OpenAI, Anthropic, Google, or an open-source provider, then apply lightweight fine-tuning through cloud APIs to adjust the model’s behavior for specific tasks. This approach has worked well for proof-of-concept deployments and many production use cases.
But Mistral AI is arguing that this model has reached its limits.
“We had a fine-tuning API relying on supervised fine-tuning. I think it was kind of what was the standard a couple of months ago,” said Elisa Salamanca, head of product at Mistral AI, in an exclusive interview with VentureBeat. “It gets you to a proof-of-concept state. Whenever you actually want to have the performance that you’re targeting, you need to go beyond.”
Forge: Beyond Fine-Tuning to Full Model Training
Forge goes significantly beyond the fine-tuning APIs that Mistral and its competitors have offered for the past year. The platform supports the full model training lifecycle: pre-training on large internal datasets, post-training through supervised fine-tuning, DPO (Direct Preference Optimization), and ODPO (Online Direct Preference Optimization), and critically, reinforcement learning pipelines designed to align models with internal policies, evaluation criteria, and operational objectives over time.
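To give a sense of what a preference-optimization method like DPO actually computes, here is a minimal, illustrative sketch of the per-pair DPO loss. This is a textbook formulation, not Mistral's implementation; the function name and arguments are assumptions for the example. Each argument is the summed log-probability a model assigns to a response, and `beta` controls how far the trained policy may drift from the reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    The loss shrinks as the policy prefers the chosen response over
    the rejected one more strongly than the reference model does.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference exactly, the loss sits at log 2; optimization pushes it below that by widening the margin between chosen and rejected responses. The "online" variant (ODPO) applies the same idea to preference pairs generated during training rather than from a fixed dataset.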
The platform packages the training methodology that Mistral’s own AI scientists use internally to build the company’s flagship models—including data mixing strategies, data generation pipelines, distributed computing optimizations, and battle-tested training recipes.
“There’s no platform out there that provides you real-world training recipes that work,” Salamanca said. “Other open-source repositories or other tools can give you generic configurations or community tutorials, but they don’t give you the recipe that’s been validated—that we’ve been doing for all of our flagship models today.”
Real-World Applications That Off-the-Shelf AI Can’t Handle
The obvious question facing any product like Forge is demand. In a market where GPT-5, Claude, Gemini, and a growing fleet of open-source models can handle an enormous range of tasks, why would an enterprise invest the time, compute, and expertise required to train its own model from scratch?
Salamanca acknowledged the question head-on but argued that the need emerges quickly once companies move beyond generic use cases.
“A lot of the existing models can get you very far,” she said. “But when you’re looking at what’s going to make you competitive compared to your competition—everyone can adopt and use the models that are out there. When you want to go a step beyond that, you actually need to create your own models. You need to leverage your proprietary information.”
The real-world examples she cited illustrate the edges of the current model ecosystem. In one case, Mistral worked with a public institution that had ancient manuscripts with missing text from damaged sections. “The models that were available were not able to do this because they’ve never seen the data,” Salamanca explained. “Digitization was not very good. There were some unique patterns and characters, and so we actually created a model for them to fill in the spans.”
In another engagement, Mistral partnered with Ericsson to customize its Codestral model for legacy-to-modern code translation. Ericsson has built up half a decade of proprietary knowledge around an internal calling language—a codebase so specialized that no off-the-shelf model has ever encountered it.
Perhaps the most telling example involves hedge funds. Salamanca described working with financial firms to customize models for proprietary quantitative languages—the kind of deeply guarded intellectual property that these firms keep on-premises and never expose to cloud-hosted AI services.
The Business Model: License Fees, Data Pipelines, and Embedded AI Scientists
Forge’s business model reflects the complexity of enterprise model training. According to Salamanca, it operates across several revenue streams. For customers who run training jobs on their own GPU clusters—a common requirement in highly regulated or IP-sensitive industries—Mistral does not charge for compute. Instead, the company charges a license fee for the Forge platform itself, along with optional fees for data pipeline services and what Mistral calls “forward-deployed scientists”—embedded AI researchers who work alongside the customer’s team.
“No competitor out there today is kind of selling this embedded scientist as part of their training platform offering,” Salamanca said.
This model has clear echoes of Palantir’s early playbook, where forward-deployed engineers served as the critical bridge between powerful software and the messy reality of enterprise data. It also suggests that Mistral recognizes a fundamental truth about the current state of enterprise AI: the technology alone is not enough. Most organizations lack the internal expertise to design effective training recipes, curate data at scale, or navigate the treacherous optimization landscape of distributed GPU training.
Data Privacy as a Competitive Advantage
One of the sharpest points of differentiation Mistral is pressing with Forge is data privacy. When customers train on their own infrastructure, Salamanca emphasized that Mistral never sees the data at all.
“It’s on their clusters, it’s with their data—we don’t see anything of it, and so it’s completely under their control,” she said. “I think this is something that sets us apart from the competition, where you actually need to upload your data, and you have a black box effect.”
This matters enormously in sectors like defense, intelligence, financial services, and healthcare, where the legal and reputational risks of exposing proprietary data to a third-party cloud service can be deal-breakers.
The Agent-First Future: Custom Models Still Matter
The timing of Forge’s launch raises an important strategic question. The AI industry in 2026 has been consumed by agents—autonomous AI systems that can use tools, navigate multi-step workflows, and take actions on behalf of users. If the future belongs to agents, why does the underlying model matter? Can’t companies simply plug into the best available frontier model through an MCP server or API and focus their energy on orchestration?
Salamanca pushed back on this framing with conviction. “The customers that we’ve been working on—some of these specific problems are things that no MCP server would ever solve,” she said. “You actually need that intelligence. You actually need to create that model that will help you solve your most critical business problem.”
She also argued that model customization is essential even in purely agentic architectures. “There are some agentic behaviors that you need to bring to the model,” Salamanca said. “It can be about reasoning patterns, specific types of documentation, making sure that you have the right reasoning traces. Even in these cases where people are going completely agentic, you still need model customization—like reinforcement learning techniques—to actually get the right level of performance.”
A Week of Aggressive Expansion
To fully appreciate Forge’s significance, it helps to view it alongside the other announcements Mistral made in the same week—a barrage of releases that together represent the most ambitious expansion in the company’s short history.
Just yesterday, Mistral released Leanstral, the first open-source code agent for Lean 4, the proof assistant used in formal mathematics and software verification. Leanstral operates with just 6 billion active parameters and is designed for realistic formal repositories—not isolated math competition problems.
On the same day, Mistral launched Mistral Small 4, a mixture-of-experts model with 119 billion total parameters but only 6 billion active per query, running 40 percent faster than its predecessor while handling three times more queries per second. Both models ship under the Apache 2.0 license—the most permissive open-source license in wide use.
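The mixture-of-experts trick behind that ratio (119 billion total parameters, roughly 6 billion active) comes down to a routing step: for each query, a gate scores every expert and only the top few run. A toy sketch of top-k routing, with names and shapes purely illustrative and no relation to Mistral's internals:

```python
import math

def route_topk(gate_scores, k=2):
    """Toy mixture-of-experts router.

    Given one gate score per expert, select the top-k experts and
    renormalize their softmax weights, so only those k experts'
    parameters are exercised for this token.
    """
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    exps = {i: math.exp(gate_scores[i]) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}
```

With, say, dozens of experts and k=2, most of the model's weights sit idle on any given query, which is how a large total parameter count can coexist with low per-query compute.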
And then there is the Nvidia Nemotron Coalition. Announced at Nvidia’s GTC conference, the coalition is a first-of-its-kind collaboration between Nvidia and a group of AI labs—including Mistral, Perplexity, LangChain, Cursor, Black Forest Labs, Reflection AI, Sarvam, and Thinking Machines Lab—to co-develop open frontier models.
“Open frontier models are how AI becomes a true platform,” said Arthur Mensch, cofounder and CEO of Mistral AI, in Nvidia’s announcement. “Together with Nvidia, we will take a leading role in training and advancing frontier models at scale.”
Taking Aim at the Cloud Giants
Forge enters a market that is already crowded—at least on the surface. Amazon Bedrock, Microsoft Azure AI Foundry, and Google Cloud Vertex AI all offer model training and customization capabilities. But Salamanca argued that these offerings are fundamentally limited in two respects.
First, they are cloud-only. “In one set of cases, it’s very easy to answer—they want to run this on their premises, and so all these tools that are available on the cloud are just not available for them,” Salamanca said.
Second, she argued that the hyperscalers’ training tools largely offer simplified API interfaces that don’t provide the depth of control that serious model training requires.
There is also the dependency question. Salamanca described digital-native companies that had built products on top of closed-source models, only to have a new model release—more verbose than its predecessor—crash their production pipelines. “When you’re relying on closed-source models, you are also super dependent on the updates of the model that have side effects,” she warned.
The Talent Competition and Institutional Knowledge
Forge's launch also arrives against a backdrop of fierce talent competition. As FinTech Weekly reported on March 14, Devendra Singh Chaplot, a co-founder of Mistral AI who headed the company's multimodal group and contributed to training Mistral 7B, Mixtral 8x7B, and Mistral Large, left to join Elon Musk's xAI, where he will work on Grok model training.
The loss of a co-founder is never insignificant, but Mistral appears to be compensating with institutional capability rather than individual brilliance. Forge is, in essence, a productization of the company’s collective training expertise—the recipes, the pipelines, the distributed computing optimizations—in a form that can scale beyond any single researcher.
Mistral’s Big Bet: The Companies That Own Their AI Models Will Be the Ones That Win
Forge is a bet on a specific theory of the enterprise AI future: that the most valuable AI systems will be those trained on proprietary knowledge, governed by internal policies, and operated under the organization’s direct control. This stands in contrast to the prevailing paradigm of the past two years, in which enterprises have largely consumed AI as a cloud service—powerful but generic, convenient but uncontrolled.
The question is whether enough enterprises will be willing to make the investment. Model training is expensive, technically demanding, and requires sustained organizational commitment. Forge lowers the barriers—through its infrastructure automation, its battle-tested recipes, and its embedded scientists—but it does not eliminate them.
What Mistral appears to be banking on is that the organizations with the most to gain from AI—the ones sitting on decades of proprietary knowledge in highly specialized domains—are precisely the ones for whom generic models are least sufficient. These are the companies where the gap between what a general-purpose model can do and what the business actually needs is widest, and where the competitive advantage of closing that gap is greatest.
Forge supports both dense and mixture-of-experts architectures, accommodating different trade-offs between performance, cost, and operational constraints. It handles multimodal inputs. It is designed for continuous adaptation rather than one-time training, with built-in evaluation frameworks that let enterprises test models against internal benchmarks before production deployment.
For the past two years, the enterprise AI playbook has been straightforward: pick a model, call an API, ship a feature. Mistral is now asking a harder question—whether the organizations willing to do the difficult, expensive, unglamorous work of training their own models will end up with something the API-callers never get.
An unfair advantage.