On March 16, 2026, Mistral AI released something that should make every developer reconsider their model stack. Mistral Small 4 is a 119-billion-parameter Mixture-of-Experts model, released under Apache 2.0, that unifies reasoning, multimodal understanding, and agentic coding into a single model. For the first time, you do not need to choose between a fast instruction-following model, a powerful reasoning engine, or a multimodal assistant. One model does all three.
At TEN INVENT, we run multiple models for different tasks — reasoning models for complex analysis, coding models for development workflows, and instruction models for general automation. The idea of consolidating that into a single, open-source model is compelling enough that we started testing Mistral Small 4 on day one.
The Architecture: 128 Experts, 6 Billion Active
Mistral Small 4 uses a Mixture-of-Experts (MoE) architecture with 128 expert networks, activating only 4 experts per token. While the total model weighs in at 119 billion parameters, only 6 billion are active for any given token (8 billion including embedding and output layers).
This is a critical design decision. MoE models give you the knowledge capacity of a massive model with the inference cost of a much smaller one. Running Mistral Small 4 does not require the GPU fleet you would need for a dense 119B model. Instead, the computational cost per token is comparable to running a 6-8 billion parameter dense model.
For developers who need to run models locally or on modest GPU setups, this changes the calculus entirely. You get the breadth of a large model — broad knowledge, strong reasoning, multimodal capabilities — at a fraction of the compute cost.
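The per-token saving comes from the gating step: a small router scores all 128 experts, and only the top 4 actually run for each token. Here is a toy sketch of top-k gating in pure Python (the real router is a learned layer; the logits below are made up for illustration):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_gate(router_logits, k=4):
    """Pick the k highest-scoring experts and renormalize their weights.

    Only these k experts run a forward pass for this token; the other
    128 - k experts cost nothing. This is why a 119B-total model can
    have only ~6B active parameters per token.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

# Toy example: 128 experts, router happens to prefer experts 7, 42, 99, 3
logits = [0.0] * 128
for i, v in [(7, 3.0), (42, 2.5), (99, 2.0), (3, 1.5)]:
    logits[i] = v
active = top_k_gate(logits, k=4)
print(active)  # 4 (expert_index, weight) pairs; weights sum to 1
```

Each token's output is then the weighted sum of just those four expert outputs, so compute scales with active parameters, not total parameters.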
Three Models in One
The most significant aspect of Mistral Small 4 is what it replaces. Previously, Mistral maintained separate model families for different capabilities:
- Magistral for complex reasoning and chain-of-thought tasks
- Pixtral for multimodal understanding of images and documents
- Devstral for agentic coding, code generation, and development workflows
Mistral Small 4 absorbs all three capabilities into a single model. This is not just a convenience — it eliminates a fundamental problem in production AI systems: model routing.
When you run multiple specialized models, you need a routing layer that decides which model handles each request. That routing layer adds latency, introduces failure points, and requires maintenance. It also creates edge cases where a request needs capabilities from multiple models — a question about code that references an image, for example, or a reasoning task that requires code execution.
With a unified model, the routing problem disappears. Every request goes to the same model, which has all the capabilities needed to handle it. Simpler architecture, fewer failure modes, lower operational cost.
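To make the routing problem concrete, here is a toy version of the dispatch layer a multi-model stack needs, and the edge case that breaks it. The keyword heuristic and the way model names are used are purely illustrative, not anything Mistral ships:

```python
def route(request: dict) -> str:
    """Naive dispatch layer for a multi-model stack (illustrative only)."""
    if request.get("image"):
        return "pixtral"            # multimodal requests
    if "```" in request.get("text", ""):
        return "devstral"           # code-heavy requests
    return "magistral"              # default: reasoning model

# The edge case: code *and* an image. The router must pick one model
# and lose the other capability; a unified model needs no router at all.
mixed = {"text": "Why does this ```button.render()``` look wrong?",
         "image": "screenshot.png"}
print(route(mixed))  # "pixtral" -- the code context lands on a non-coding model
```

Real routers use classifiers rather than keyword checks, but the structural problem is the same: every branch is a place to be wrong, slow, or both.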
Performance That Matters
The performance numbers tell a clear story:
- 40% reduction in end-to-end completion time in latency-optimized configurations compared to Mistral Small 3
- 3x increase in requests per second in throughput-optimized setups
- 256K token context window supporting long-form document analysis and multi-turn conversations
- Configurable reasoning that allows developers to control the depth of chain-of-thought processing
The latency improvement is particularly important for agentic workflows. When an AI agent is executing a multi-step task — searching for information, analyzing results, writing code, running tests — every millisecond of model latency compounds across dozens of inference calls. A 40% latency reduction across an agent workflow can mean the difference between a task completing in 30 seconds versus 50 seconds.
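The compounding is easy to quantify. Assuming an agent run of 25 inference calls at 2 seconds each (illustrative numbers chosen to match the 50-second figure above):

```python
calls = 25          # inference calls in one agent task (assumed)
latency_s = 2.0     # seconds per call before the upgrade (assumed)
reduction = 0.40    # 40% end-to-end latency reduction

before = calls * latency_s
after = calls * latency_s * (1 - reduction)
print(f"{before:.0f}s -> {after:.0f}s")  # 50s -> 30s
```

The per-call saving is under a second, but across the whole run it is the difference a user actually feels.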
Apache 2.0: Truly Open
Mistral Small 4 is released under the Apache 2.0 license, which means:
- Commercial use without restrictions: You can deploy it in production, embed it in products, and charge customers for services built on it without paying Mistral a licensing fee
- Modification and distribution: You can fine-tune the model, modify the architecture, and distribute your modified version
- No copyleft obligations: Unlike some open-source licenses, Apache 2.0 does not require you to open-source your modifications
This matters enormously for enterprise adoption. Many organizations have legal teams that block models with ambiguous or restrictive licenses. Apache 2.0 is one of the most permissive and well-understood licenses in software, which removes a significant adoption barrier.
For comparison, Meta's Llama models use a custom license with usage restrictions above 700 million monthly active users. Google's Gemma models have similar custom terms. Mistral Small 4's Apache 2.0 license has no such constraints.
Practical Applications We Are Testing
At TEN INVENT, we have been running Mistral Small 4 across several use cases:
Code Review with Visual Context
Developers can submit a screenshot of a UI bug alongside the relevant code, and the model understands both the visual problem and the code context. Previously, this required routing to a multimodal model for the screenshot analysis and then to a coding model for the fix. Now it is a single inference call.
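A single request along these lines replaces the two-model pipeline. The sketch below builds a payload in the OpenAI-style multimodal chat format that most serving stacks (vLLM's OpenAI-compatible server, Ollama, Mistral's API) accept; the model id and the inline bytes are placeholders, and the exact schema your server expects may differ:

```python
import base64
import json

def build_review_request(screenshot_bytes: bytes, code: str) -> dict:
    """One request carrying both the UI screenshot and the code under review."""
    image_b64 = base64.b64encode(screenshot_bytes).decode("ascii")
    return {
        "model": "mistral-small-4",  # placeholder model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"This UI bug appears with the code below. Suggest a fix.\n\n{code}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }

req = build_review_request(b"\x89PNG...", "button.render()  # overlaps the navbar")
print(json.dumps(req)[:80])
```

The point is structural: one message carries both modalities, so the model reasons over the screenshot and the code in the same context window.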
Document Analysis with Reasoning
Clients often send us contracts, specifications, or technical documentation as images or PDFs. Mistral Small 4 can read the document visually, extract the relevant information, reason about its implications, and generate actionable recommendations — all in one pass.
Agentic Development Workflows
For development tasks that require planning, code writing, and testing, the unified model maintains consistent context across all three phases. The reasoning it applies during planning carries through to code generation, which carries through to test design. No information is lost in model-to-model handoffs because there are no handoffs.
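In practice, "no handoffs" means one message history flows through every phase. A minimal sketch with a stubbed model call (`fake_model` is a stand-in for a real inference client and just echoes its input; the phase prompts are illustrative):

```python
def fake_model(messages):
    """Stub standing in for a real inference call; echoes the last request."""
    return f"[model output for: {messages[-1]['content'][:30]}]"

def run_task(task: str) -> list:
    """Plan, code, and test phases sharing one conversation history.

    Because the same model (and the same messages list) handles every
    phase, the plan produced in step 1 is still in context when the
    code is written in step 2 and the tests in step 3.
    """
    history = [{"role": "user", "content": f"Plan this task: {task}"}]
    for phase in ("Write the code for the plan above",
                  "Write tests for the code above"):
        history.append({"role": "assistant", "content": fake_model(history)})
        history.append({"role": "user", "content": phase})
    history.append({"role": "assistant", "content": fake_model(history)})
    return history

history = run_task("add retry logic to the upload client")
print(len(history))  # 6 messages: 3 user turns, 3 assistant turns
```

With separate models, each phase boundary would require summarizing the history into a fresh prompt, which is exactly where context gets lost.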
How It Compares
The competitive landscape for open-source models is fierce:
- Llama 3.3 70B (Meta): Strong general-purpose model but dense architecture means higher compute costs and no native multimodal support in the open-weight version
- Gemma 3 27B (Google): Efficient and capable but significantly smaller knowledge capacity than Mistral Small 4's 119B parameters
- Qwen 3 235B-A22B (Alibaba): Large MoE model with more active parameters but a more restrictive license for commercial use
Mistral Small 4 occupies a unique position: large enough to be genuinely capable across diverse tasks, efficient enough to run on reasonable hardware, and permissive enough to deploy anywhere without legal concerns.
Running It Yourself
Getting started with Mistral Small 4 locally:
- Hardware requirements: Minimum 48GB VRAM for full precision. With 4-bit quantization (GGUF format), it runs on GPUs with 24GB VRAM like the RTX 4090 or A10
- Frameworks: Supported by vLLM, TGI, Ollama, and llama.cpp from day one
- API access: Available through Mistral's API at competitive pricing, or self-hosted with no licensing costs
- MCP integration: Works with Model Context Protocol servers for tool use, making it compatible with the growing MCP ecosystem of over 6,400 registered servers
For teams already running local models with Ollama or vLLM, switching to Mistral Small 4 is straightforward:
```shell
ollama run mistral-small-4
```
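From code, you can hit the local Ollama server's `/api/chat` endpoint directly. A minimal sketch using only the standard library; the model tag is assumed, so check `ollama list` for the exact name your install pulled:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_payload(prompt: str, model: str = "mistral-small-4") -> dict:
    """One non-streaming chat turn in Ollama's /api/chat request format."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False}

def chat(prompt: str, model: str = "mistral-small-4") -> str:
    """Send one chat turn to a local Ollama server and return the reply text."""
    req = request.Request(OLLAMA_URL,
                          data=json.dumps(build_payload(prompt, model)).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Requires a running Ollama daemon with the model pulled:
# print(chat("Summarize the MoE tradeoff in one sentence."))
```

vLLM users can do the same against its OpenAI-compatible endpoint; only the URL and payload shape change.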
The Strategic Implication
Mistral Small 4 represents a broader trend in the AI model landscape: convergence. The era of maintaining separate models for separate capabilities is ending. The models that win enterprise adoption will be the ones that do everything well enough in a single package, rather than doing one thing exceptionally in isolation.
This is good news for developers. Fewer models to manage means simpler infrastructure, lower costs, and faster iteration. It also means that the skills you build working with one model — prompt engineering, fine-tuning, evaluation — transfer directly rather than being model-specific.
At TEN INVENT, we see Mistral Small 4 as the strongest open-source foundation model available today for teams that need reasoning, multimodal, and coding capabilities without vendor lock-in. The Apache 2.0 license, the MoE efficiency, and the unified capability set make it a model worth building on.
The open-source AI ecosystem just took a significant step forward. Mistral delivered exactly what the community needed: one model, three jobs, zero license headaches.