On May 20, 2026, Cohere released Command A+ — its most powerful model to date and its first ever shipped under a full Apache 2.0 license. For teams across MENA and Europe that need frontier-grade AI without sending data to a hyperscaler, this is the most consequential open-source release of the quarter. This guide walks through the architecture, the benchmarks, and how to put it to work.
What Command A+ actually is
Command A+ is a sparse Mixture-of-Experts (MoE) decoder model with 218 billion total parameters and 25 billion active per token, roughly 11% of the total. It uses 128 experts, with 8 routed experts plus one shared expert active on every token. The design choice is deliberate: a dense model with 218 billion parameters would be prohibitively expensive to serve, whereas MoE routing keeps per-token compute close to that of a 25-billion-parameter model while total capacity stays large.
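To make the routing idea concrete, here is a minimal toy sketch of a single MoE layer for one token. It is not Cohere's implementation: the softmax over the selected logits, the scalar "experts", and the names `route_token` and `moe_layer` are all assumptions chosen for illustration. Only the counts (128 routed experts, 8 active, one shared expert) come from the figures above.

```python
import math
import random

NUM_EXPERTS = 128   # routed experts, per the figures above
TOP_K = 8           # routed experts activated per token


def route_token(router_logits, top_k=TOP_K):
    """Return {expert_index: weight} for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only. That is an
    assumption; real routers differ in how they normalize.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: v / total for i, v in exps.items()}


def moe_layer(x, router_logits, routed_experts, shared_expert):
    """Add the always-on shared expert to the weighted routed experts."""
    weights = route_token(router_logits)
    routed = sum(w * routed_experts[i](x) for i, w in weights.items())
    return shared_expert(x) + routed


# Toy experts: each one just scales a scalar input by its own factor.
rng = random.Random(0)
routed_experts = [lambda x, s=rng.uniform(0.5, 1.5): s * x
                  for _ in range(NUM_EXPERTS)]
shared_expert = lambda x: 0.1 * x
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

weights = route_token(logits)
output = moe_layer(2.0, logits, routed_experts, shared_expert)
active_share = round(25 / 218, 3)

print(len(weights), "of", NUM_EXPERTS, "routed experts ran for this token")
print("active parameter share:", active_share)
```

The point of the sketch is the cost profile: only the selected experts are evaluated, so per-token work scales with the 8 routed experts and the shared one rather than with all 128.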
Headline numbers worth committing to memory:
- 128K input context, 64K maximum generation
- 48 supported languages, up from 23 in the previous Command A
- W4A4 quantization — runs on 2x H100 or a single B200
- Apache 2.0 license — commercial use, modification, redistribution, all permitted
- Day-0 vLLM support for inference at scale
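A rough back-of-the-envelope check shows why 4-bit weights matter for the hardware claim. The sketch below counts weight storage only, ignoring quantization scales, KV cache, and activations, so real usage will be higher. The per-GPU memory sizes (80 GB for an H100, 192 GB for a B200) are assumed nominal figures, not numbers from Cohere, and the function name is hypothetical.

```python
def weight_footprint_gb(total_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 218e9  # total parameters, from the article

# Assumed nominal per-GPU memory; usable capacity varies by SKU.
H100_GB = 80
B200_GB = 192

w4 = weight_footprint_gb(TOTAL_PARAMS, 4)     # 4-bit weights
bf16 = weight_footprint_gb(TOTAL_PARAMS, 16)  # 16-bit weights

print(f"4-bit weights: {w4:.0f} GB")
print(f"BF16 weights:  {bf16:.0f} GB")
print(f"headroom on 2x H100: {2 * H100_GB - w4:.0f} GB")
print(f"headroom on 1x B200: {B200_GB - w4:.0f} GB")
```

Under these assumptions the 4-bit weights take about 109 GB, which fits in either configuration with room left for the KV cache, while the BF16 weights at about 436 GB would not.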
Cohere positions the model around what it calls "sovereign AI": the ability for governments, banks, telcos, and regulated enterprises to deploy frontier capability on infrastructure they control. The licensing and the small hardware footprint make that claim concrete rather than aspirational.
The benchmark picture
Cohere reports a sharp jump on several agentic and reasoning workloads compared with the prior Command A generation:
| Benchmark | Command A+ | Prior Command A |
|---|---|---|
| Terminal-Bench Hard (agentic coding) | 25% | 3% |
| τ²-Bench (telecom reasoning) | 85% | 37% |
| MMMU (multimodal) | 75.1% | — |
| MMMU Pro | 63% | — |
| MathVista (math reasoning) | 80.6% | — |
The model is also rated 37 on the Artificial Analysis Intelligence Index, putting it in the same conversation as the top closed frontier models for many enterprise tasks.
The honest caveat: independent observers noted on launch day that Command A+ does not beat Qwen 3.6 head-to-head on every overlapping benchmark, despite activating roughly eight times more parameters per token. The story here is not raw leaderboard supremacy. It is the combination of permissive license, native multilingual support, native citations, and small deploy footprint in one package.
Why the Arabic and multilingual story matters
This is the part most coverage will undersell. Cohere reports tokenization efficiency improvements of 20% for Arabic, 16% for Korean, and 18% for Japanese. Translation: the same Arabic paragraph costs roughly a fifth fewer tokens to process than on the previous generation, which directly cuts inference cost and latency for Arabic workloads.
For a Tunisian fintech, a Saudi government portal, or an Emirati legal-tech team, that efficiency gain compounds across millions of requests. Combined with on-prem deployment, you get a credible answer to two recurring questions from regional CIOs:
- Can we keep regulated Arabic content inside our own datacenter? Yes.
- Will Arabic inference still be economically viable at scale? Yes, more so than before.
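As a worked example of how the gain compounds, the sketch below follows the article's reading that a 20% efficiency improvement means about 20% fewer tokens for the same text. If the figure instead meant 20% more text per token, the reduction would be closer to 17%. The workload numbers (5 million requests a month, 1,200 tokens each) are invented purely for illustration.

```python
def tokens_after_gain(tokens_before, reduction):
    """Token count after a fractional reduction (0.20 means 20% fewer)."""
    return tokens_before * (1.0 - reduction)


# Reductions as read from the figures quoted above.
REDUCTION = {"arabic": 0.20, "korean": 0.16, "japanese": 0.18}

# Hypothetical workload, not taken from any real deployment.
requests_per_month = 5_000_000
tokens_per_request_before = 1_200

before = requests_per_month * tokens_per_request_before
after = tokens_after_gain(before, REDUCTION["arabic"])
saved = before - after

print(f"tokens before: {before:,.0f}")
print(f"tokens after:  {after:,.0f}")
print(f"tokens saved:  {saved:,.0f}")
```

With these made-up volumes, the Arabic workload drops from 6.0 billion to about 4.8 billion tokens a month. Whatever you pay per token, in API fees or GPU time, scales down by the same fraction.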
Getting started
The model is available three ways:
1. Open weights on Hugging Face. Pull CohereLabs/command-a-plus-05-2026-w4a4 for the quantized build, or the BF16/FP8 variants for higher-precision serving. Apache 2.0 means no separate license dance.
2. Managed inference via Cohere Model Vault. If you want the model but not the operational burden of running it.
3. Cohere API. Hit the standard chat endpoint with the new model id command-a-plus-05-2026.
A minimal vLLM serve command using the W4A4 build looks like this:
```shell
vllm serve CohereLabs/command-a-plus-05-2026-w4a4 \
  --tensor-parallel-size 2 \
  --max-model-len 131072 \
  --quantization compressed-tensors \
  --enable-auto-tool-choice \
  --tool-call-parser cohere
```

And a Python call once the server is up:
```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # placeholder; vLLM only checks it if started with --api-key
)

response = client.chat.completions.create(
    model="CohereLabs/command-a-plus-05-2026-w4a4",
    messages=[
        {"role": "system", "content": "You answer with citations to provided sources."},
        {"role": "user", "content": "Summarize the 2026 PDPL amendments in two sentences."},
    ],
    temperature=0.3,
)

print(response.choices[0].message.content)
```

vLLM exposes an OpenAI-compatible API, so most existing client code drops in unchanged.
Native citations and RAG
Command A+ ships with a structured citation mode that emits source spans alongside generated text. For retrieval-augmented generation pipelines that need to show users where an answer came from — think internal knowledge bases, legal research, compliance Q&A — this removes a layer of brittle prompt engineering that most teams still maintain by hand.
The pattern is straightforward: pass your retrieved chunks as part of the input, and the model returns the answer with inline references to the chunk ids it actually used. Audit trails become a first-class output rather than an afterthought.
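The general shape of that pattern can be sketched without a running model. The example below is a generic illustration, not Command A+'s actual citation format. The bracketed `[doc_1]` reference style, the chunk ids, the helper names `build_context` and `extract_citations`, and the sample answer are all assumptions; the real citation mode may return structured spans instead of inline markers.

```python
import re

# Retrieved chunks keyed by id. Contents are placeholders.
chunks = {
    "doc_1": "Placeholder text of the first retrieved passage.",
    "doc_2": "Placeholder text of the second retrieved passage.",
    "doc_3": "Placeholder text of the third retrieved passage.",
}


def build_context(chunks):
    """Render chunks as an id-tagged block to place in the prompt."""
    return "\n".join(f"[{cid}] {text}" for cid, text in chunks.items())


def extract_citations(answer, chunks):
    """Return chunk ids referenced in the answer, in first-seen order.

    Assumes references look like [doc_1]. Ids that were never supplied
    are dropped, which guards against invented sources.
    """
    seen = []
    for cid in re.findall(r"\[([A-Za-z0-9_]+)\]", answer):
        if cid in chunks and cid not in seen:
            seen.append(cid)
    return seen


# Hand-written stand-in for a model reply, not real model output.
answer = (
    "The policy applies to all branches [doc_2]. "
    "Exceptions are listed separately [doc_1][doc_2]. See also [doc_9]."
)

context = build_context(chunks)
cited = extract_citations(answer, chunks)
audit_trail = [{"chunk_id": cid, "text": chunks[cid]} for cid in cited]
print(cited)
```

Keeping the id-to-chunk mapping on your side means the audit trail is built from the text you supplied, not from whatever the model claims a source says.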
When to choose Command A+
It is the right pick when you need:
- On-premise or VPC-only deployment for regulatory reasons
- Strong Arabic, Japanese, or Korean inference without paying a token tax
- Native citations in RAG systems where source attribution is non-negotiable
- Apache 2.0 freedom to fork, fine-tune, and redistribute
It is not the right pick when raw benchmark wins on coding or pure reasoning are the only criterion — Qwen 3.6, DeepSeek, and the closed frontier models from OpenAI, Anthropic, and Google still trade blows on individual leaderboards. Pick the tool that matches the constraint that actually binds you.
What this means for MENA tech teams
Three takeaways for teams shipping in Tunisia, the GCC, and the broader region:
- Sovereign AI is now buildable, not aspirational. Two H100s in a Tunis datacenter is a reachable budget. So is the legal clarity of Apache 2.0.
- Arabic-first products got cheaper overnight. The 20% token efficiency gain is real economic leverage.
- Cohere is differentiating on deployment, not benchmarks. That is a useful signal for how to position your own AI products: in regulated markets, deployment posture beats leaderboard rank.
If you are evaluating LLM infrastructure for a regulated MENA workload, Command A+ deserves a place on the shortlist next to whichever closed-frontier model you already use. The interesting question is not whether it wins every benchmark. It is whether it removes constraints — legal, geographic, economic — that your current stack still imposes.