On March 16, 2026, Mistral AI released something that should make every developer reconsider their model stack. Mistral Small 4 is a 119-billion-parameter Mixture-of-Experts model, released under Apache 2.0, that unifies reasoning, multimodal understanding, and agentic coding into a single model. For the first time, you do not need to choose between a fast instruction-following model, a powerful reasoning engine, or a multimodal assistant. One model does all three.
At TEN INVENT, we run multiple models for different tasks — reasoning models for complex analysis, coding models for development workflows, and instruction models for general automation. The idea of consolidating that into a single, open-source model is compelling enough that we started testing Mistral Small 4 on day one.
The Architecture: 128 Experts, 6 Billion Active
Mistral Small 4 uses a Mixture-of-Experts (MoE) architecture with 128 expert networks, activating only 4 experts per token. While the total model weighs in at 119 billion parameters, only 6 billion are active for any given token (8 billion including embedding and output layers).
This is a critical design decision. MoE models give you the knowledge capacity of a massive model with the inference cost of a much smaller one. Running Mistral Small 4 does not require the GPU fleet you would need for a dense 119B model. Instead, the computational cost per token is comparable to running a 6-8 billion parameter dense model.
For developers who need to run models locally or on modest GPU setups, this changes the calculus entirely. You get the breadth of a large model — broad knowledge, strong reasoning, multimodal capabilities — at a fraction of the compute cost.
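The per-token saving comes from the gating step: a small router scores all 128 experts, and only the top 4 actually run for each token. Here is a toy sketch of top-k gating in pure Python (the real router is a learned layer; the logits below are made up for illustration):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_gate(router_logits, k=4):
    """Pick the k highest-scoring experts and renormalize their weights.

    Only these k experts run a forward pass for this token; the other
    128 - k experts cost nothing. This is why a 119B-total model can
    have only ~6B active parameters per token.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

# Toy example: 128 experts, router happens to prefer experts 7, 42, 99, 3
logits = [0.0] * 128
for i, v in [(7, 3.0), (42, 2.5), (99, 2.0), (3, 1.5)]:
    logits[i] = v
active = top_k_gate(logits, k=4)
print(active)  # 4 (expert_index, weight) pairs; weights sum to 1
```

Each token's output is then the weighted sum of just those four expert outputs, so compute scales with active parameters, not total parameters.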
Three Models in One
The most significant aspect of Mistral Small 4 is what it replaces. Previously, Mistral maintained separate model families for different capabilities:
- Magistral for complex reasoning and chain-of-thought tasks
- Pixtral for multimodal understanding of images and documents
- Devstral for agentic coding, code generation, and development workflows
Mistral Small 4 absorbs all three capabilities into a single model. This is not just a convenience — it eliminates a fundamental problem in production AI systems: model routing.
When you run multiple specialized models, you need a routing layer that decides which model handles each request. That routing layer adds latency, introduces failure points, and requires maintenance. It also creates edge cases where a request needs capabilities from multiple models — a question about code that references an image, for example, or a reasoning task that requires code execution.
With a unified model, the routing problem disappears. Every request goes to the same model, which has all the capabilities needed to handle it. Simpler architecture, fewer failure modes, lower operational cost.
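To make the routing problem concrete, here is a toy version of the dispatch layer a multi-model stack needs, and the edge case that breaks it. The keyword heuristic and the way model names are used are purely illustrative, not anything Mistral ships:

```python
def route(request: dict) -> str:
    """Naive dispatch layer for a multi-model stack (illustrative only)."""
    if request.get("image"):
        return "pixtral"            # multimodal requests
    if "```" in request.get("text", ""):
        return "devstral"           # code-heavy requests
    return "magistral"              # default: reasoning model

# The edge case: code *and* an image. The router must pick one model
# and lose the other capability; a unified model needs no router at all.
mixed = {"text": "Why does this ```button.render()``` look wrong?",
         "image": "screenshot.png"}
print(route(mixed))  # "pixtral" -- the code context lands on a non-coding model
```

Real routers use classifiers rather than keyword checks, but the structural problem is the same: every branch is a place to be wrong, slow, or both.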
Performance That Matters
The performance numbers tell a clear story:
- 40% reduction in end-to-end completion time in latency-optimized configurations compared to Mistral Small 3
- 3x increase in requests per second in throughput-optimized setups
- 256K token context window supporting long-form document analysis and multi-turn conversations
- Configurable reasoning that allows developers to control the depth of chain-of-thought processing
The latency improvement is particularly important for agentic workflows. When an AI agent is executing a multi-step task — searching for information, analyzing results, writing code, running tests — every millisecond of model latency compounds across dozens of inference calls. A 40% latency reduction across an agent workflow can mean the difference between a task completing in 30 seconds versus 50 seconds.
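The compounding is easy to quantify. Assuming an agent run of 25 inference calls at 2 seconds each (illustrative numbers chosen to match the 50-second figure above):

```python
calls = 25          # inference calls in one agent task (assumed)
latency_s = 2.0     # seconds per call before the upgrade (assumed)
reduction = 0.40    # 40% end-to-end latency reduction

before = calls * latency_s
after = calls * latency_s * (1 - reduction)
print(f"{before:.0f}s -> {after:.0f}s")  # 50s -> 30s
```

The per-call saving is under a second, but across the whole run it is the difference a user actually feels.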
Apache 2.0: Truly Open
Mistral Small 4 is released under the Apache 2.0 license, which means:
- Commercial use without restrictions: You can deploy it in production, embed it in products, and charge customers for services built on it without paying Mistral a licensing fee
- Modification and distribution: You can fine-tune the model, modify the architecture, and distribute your modified version
- No copyleft obligations: Unlike some open-source licenses, Apache 2.0 does not require you to open-source your modifications
This matters enormously for enterprise adoption. Many organizations have legal teams that block models with ambiguous or restrictive licenses. Apache 2.0 is one of the most permissive and well-understood licenses in software, which removes a significant adoption barrier.
For comparison, Meta's Llama models use a custom license with usage restrictions above 700 million monthly active users. Google's Gemma models have similar custom terms. Mistral Small 4's Apache 2.0 license has no such constraints.
Practical Applications We Are Testing
At TEN INVENT, we have been running Mistral Small 4 across several use cases:
Code Review with Visual Context
Developers can submit a screenshot of a UI bug alongside the relevant code, and the model understands both the visual problem and the code context. Previously, this required routing to a multimodal model for the screenshot analysis and then to a coding model for the fix. Now it is a single inference call.
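A single request along these lines replaces the two-model pipeline. The sketch below builds a payload in the OpenAI-style multimodal chat format that most serving stacks (vLLM's OpenAI-compatible server, Ollama, Mistral's API) accept; the model id and the inline bytes are placeholders, and the exact schema your server expects may differ:

```python
import base64
import json

def build_review_request(screenshot_bytes: bytes, code: str) -> dict:
    """One request carrying both the UI screenshot and the code under review."""
    image_b64 = base64.b64encode(screenshot_bytes).decode("ascii")
    return {
        "model": "mistral-small-4",  # placeholder model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"This UI bug appears with the code below. Suggest a fix.\n\n{code}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }

req = build_review_request(b"\x89PNG...", "button.render()  # overlaps the navbar")
print(json.dumps(req)[:80])
```

The point is structural: one message carries both modalities, so the model reasons over the screenshot and the code in the same context window.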
Document Analysis with Reasoning
Clients often send us contracts, specifications, or technical documentation as images or PDFs. Mistral Small 4 can read the document visually, extract the relevant information, reason about its implications, and generate actionable recommendations — all in one pass.
Agentic Development Workflows
For development tasks that require planning, code writing, and testing, the unified model maintains consistent context across all three phases. The reasoning it applies during planning carries through to code generation, which carries through to test design. No information is lost in model-to-model handoffs because there are no handoffs.
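In practice, "no handoffs" means one message history flows through every phase. A minimal sketch with a stubbed model call (`fake_model` is a stand-in for a real inference client and just echoes its input; the phase prompts are illustrative):

```python
def fake_model(messages):
    """Stub standing in for a real inference call; echoes the last request."""
    return f"[model output for: {messages[-1]['content'][:30]}]"

def run_task(task: str) -> list:
    """Plan, code, and test phases sharing one conversation history.

    Because the same model (and the same messages list) handles every
    phase, the plan produced in step 1 is still in context when the
    code is written in step 2 and the tests in step 3.
    """
    history = [{"role": "user", "content": f"Plan this task: {task}"}]
    for phase in ("Write the code for the plan above",
                  "Write tests for the code above"):
        history.append({"role": "assistant", "content": fake_model(history)})
        history.append({"role": "user", "content": phase})
    history.append({"role": "assistant", "content": fake_model(history)})
    return history

history = run_task("add retry logic to the upload client")
print(len(history))  # 6 messages: 3 user turns, 3 assistant turns
```

With separate models, each phase boundary would require summarizing the history into a fresh prompt, which is exactly where context gets lost.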
How It Compares
The competitive landscape for open-source models is fierce:
- Llama 3.3 70B (Meta): Strong general-purpose model but dense architecture means higher compute costs and no native multimodal support in the open-weight version
- Gemma 3 27B (Google): Efficient and capable but significantly smaller knowledge capacity than Mistral Small 4's 119B parameters
- Qwen 3 235B-A22B (Alibaba): Large MoE model with more active parameters but a more restrictive license for commercial use
Mistral Small 4 occupies a unique position: large enough to be genuinely capable across diverse tasks, efficient enough to run on reasonable hardware, and permissive enough to deploy anywhere without legal concerns.
Running It Yourself
Getting started with Mistral Small 4 locally:
- Hardware requirements: Minimum 48GB VRAM for full precision. With 4-bit quantization (GGUF format), it runs on GPUs with 24GB VRAM like the RTX 4090 or A10
- Frameworks: Supported by vLLM, TGI, Ollama, and llama.cpp from day one
- API access: Available through Mistral's API at competitive pricing, or self-hosted with no licensing costs
- MCP integration: Works with Model Context Protocol servers for tool use, making it compatible with the growing MCP ecosystem of over 6,400 registered servers
For teams already running local models with Ollama or vLLM, switching to Mistral Small 4 is straightforward:
```shell
ollama run mistral-small-4
```
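From code, you can hit the local Ollama server's `/api/chat` endpoint directly. A minimal sketch using only the standard library; the model tag is assumed, so check `ollama list` for the exact name your install pulled:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_payload(prompt: str, model: str = "mistral-small-4") -> dict:
    """One non-streaming chat turn in Ollama's /api/chat request format."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False}

def chat(prompt: str, model: str = "mistral-small-4") -> str:
    """Send one chat turn to a local Ollama server and return the reply text."""
    req = request.Request(OLLAMA_URL,
                          data=json.dumps(build_payload(prompt, model)).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Requires a running Ollama daemon with the model pulled:
# print(chat("Summarize the MoE tradeoff in one sentence."))
```

vLLM users can do the same against its OpenAI-compatible endpoint; only the URL and payload shape change.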
The Strategic Implication
Mistral Small 4 represents a broader trend in the AI model landscape: convergence. The era of maintaining separate models for separate capabilities is ending. The models that win enterprise adoption will be the ones that do everything well enough in a single package, rather than doing one thing exceptionally in isolation.
This is good news for developers. Fewer models to manage means simpler infrastructure, lower costs, and faster iteration. It also means that the skills you build working with one model — prompt engineering, fine-tuning, evaluation — transfer directly rather than being model-specific.
At TEN INVENT, we see Mistral Small 4 as the strongest open-source foundation model available today for teams that need reasoning, multimodal, and coding capabilities without vendor lock-in. The Apache 2.0 license, the MoE efficiency, and the unified capability set make it a model worth building on.
The open-source AI ecosystem just took a significant step forward. Mistral delivered exactly what the community needed: one model, three jobs, zero license headaches.