Nous Research has rolled out Mixture of Agents 2.0 for Hermes Agent, turning multi-model orchestration into a first-class primitive. Co-founder and lead engineer Teknium announced the feature on June 26, 2026, framing it as a way to "combine any provider's models into a mixture of your own" and access the resulting preset "as if it were a normal model in Hermes."
The update arrives just one week after Hermes Agent v0.17.0 — "The Reach Release" — and reinforces the project's position as the most-used open-source agent on OpenRouter, with more than 140,000 GitHub stars accumulated since its February 2026 debut.
Key Highlights
- Mixture of Agents presets are now exposed as virtual models inside Hermes — pick one from the model selector and it behaves like any other LLM.
- Each preset runs multiple frontier models in parallel on the same query, then synthesizes the answers through an aggregator model.
- Nous Research reports that a preset scores 8 percent higher than Claude Opus 4.8 alone and 11 percent higher than GPT-5.5 alone on its forthcoming HermesBench evaluation.
- Any provider can be slotted in: Anthropic, OpenAI, xAI Grok, local inference, OpenRouter — the same preset can mix hosted frontier models with on-device runtimes.
- Released on top of Hermes Agent v0.17.0 (June 19, 2026), which already added background subagents, automation blueprints, and iMessage integration via Photon Spectrum.
Details
Mixture of Agents is not a new academic idea, but Hermes Agent is one of the first agent frameworks to expose it as a drop-in model rather than a code-level pipeline. A preset is a saved configuration of two or more underlying models plus an aggregator. Once defined, it appears alongside Claude, GPT, Grok, Gemini, and self-hosted endpoints in the standard model picker, and any Hermes feature that consumes a model (chat, subagents, automation blueprints, skills) can target it transparently.
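To make the "preset as a virtual model" idea concrete, here is a minimal Python sketch. It is not Hermes's actual schema or API, which the announcement does not document. The class name `MoAPreset`, its fields, the `list_models` helper, and all model identifiers are hypothetical, chosen only to show a saved configuration of two or more models plus an aggregator appearing in the same list as ordinary models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoAPreset:
    """Hypothetical shape of a saved preset; not Hermes's real schema."""
    name: str
    members: tuple[str, ...]   # model identifiers queried in parallel
    aggregator: str            # model that synthesizes the final answer

    def __post_init__(self) -> None:
        # The article describes a preset as two or more models plus an aggregator.
        if len(self.members) < 2:
            raise ValueError("a preset needs at least two member models")
        if not self.aggregator:
            raise ValueError("a preset needs an aggregator model")


def list_models(base_models: list[str], presets: list[MoAPreset]) -> list[str]:
    """Return one flat picker list: ordinary models followed by preset names."""
    return base_models + [p.name for p in presets]


preset = MoAPreset(
    name="moa/opus-plus-gpt",                 # made-up preset name
    members=("claude-opus-4.8", "gpt-5.5"),   # placeholder identifiers
    aggregator="claude-opus-4.8",
)
picker = list_models(["claude-opus-4.8", "gpt-5.5", "local/llama"], [preset])
```

In this sketch the caller only ever sees a name in `picker`, which is the property that lets chat, subagents, or automation blueprints target a preset without knowing it is an ensemble.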
Behind the scenes, Hermes fans the prompt out to every model in the preset, collects the responses, and asks the aggregator to compose a single answer. Teknium's announcement specifically highlighted a configuration mixing Claude Opus 4.8 with GPT-5.5 as the source of the headline benchmark gains.
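The fan-out, collect, and aggregate flow can be sketched in a few lines of Python. This is an illustration of the general pattern, not Hermes's implementation: the models are stub coroutines rather than real provider clients, the synthesis prompt wording is invented, and the choice to skip failed members instead of aborting is an assumption the announcement does not confirm.

```python
import asyncio
from typing import Awaitable, Callable

# A "model" here is any async callable from prompt to text. These are
# stand-ins for illustration, not real provider clients.
Model = Callable[[str], Awaitable[str]]


async def mixture_of_agents(
    prompt: str, members: dict[str, Model], aggregator: Model
) -> str:
    # Fan out: send the same prompt to every member concurrently.
    names = list(members)
    results = await asyncio.gather(
        *(members[n](prompt) for n in names), return_exceptions=True
    )
    # Collect: keep successful drafts, skip members that raised (assumption).
    drafts = [(n, r) for n, r in zip(names, results) if isinstance(r, str)]
    if not drafts:
        raise RuntimeError("every member model failed")
    # Aggregate: ask one model to compose a single answer from the drafts.
    joined = "\n\n".join(f"[{n}]\n{text}" for n, text in drafts)
    synthesis_prompt = (
        f"Question:\n{prompt}\n\nCandidate answers:\n{joined}\n\n"
        "Write one final answer."
    )
    return await aggregator(synthesis_prompt)


async def model_a(prompt: str) -> str:
    return "draft from A"


async def model_b(prompt: str) -> str:
    raise TimeoutError("simulated rate limit")


async def model_c(prompt: str) -> str:
    return "draft from C"


async def echo_aggregator(prompt: str) -> str:
    # Stub aggregator: reports how many labelled drafts it was given.
    return f"synthesized from {prompt.count('[model_')} drafts"


answer = asyncio.run(
    mixture_of_agents(
        "What is 2 + 2?",
        {"model_a": model_a, "model_b": model_b, "model_c": model_c},
        echo_aggregator,
    )
)
```

Here `model_b` fails on purpose, so the aggregator receives two drafts instead of three. A real deployment would also need timeouts and a policy for how many members must succeed before the answer is trusted.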
The numbers come from HermesBench, an internal evaluation that Nous Research describes as "soon-to-release." That caveat matters: at launch there are no independent third-party benchmarks comparing MoA 2.0 presets to the underlying single models, and the HermesBench scoring methodology has not been published.
Impact
The release lands in the middle of a sharp debate about frontier-model access. Earlier this month, the US government restricted worldwide availability of Anthropic's Fable 5 and Mythos under export controls, and OpenAI was asked to stagger the GPT-5.6 release customer by customer to roughly 20 government-vetted partners. Several commentators framed Mixture of Agents 2.0 as a practical workaround: developers who hold legitimate API access to two or three frontier providers can compose their own "synthetic frontier" without waiting for any single lab to unblock them.
For development teams, the model-orchestration pattern has more immediate consequences than the geopolitics. Treating a multi-model ensemble as a single endpoint means existing application code does not have to change. The same prompt template, retry policy, and tool definition that worked against Opus can be pointed at a preset and, if the vendor-reported gains hold, produce stronger results. The trade-offs are higher latency, higher per-call spend (each model in the preset is paid for in full), and additional failure modes when one provider rate-limits or returns malformed output.
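A back-of-the-envelope estimate shows how the cost and latency trade-off adds up. The sketch below assumes members run fully in parallel and that the aggregator is billed as one extra call. The `CallEstimate` type and every figure are made up for easy arithmetic and are not real provider pricing or timings.

```python
from dataclasses import dataclass


@dataclass
class CallEstimate:
    cost_usd: float    # price of one call to this model
    latency_s: float   # wall-clock time of one call


def estimate_preset(
    members: list[CallEstimate], aggregator: CallEstimate
) -> CallEstimate:
    # Every member is billed in full, and the aggregator call is billed on top.
    cost = sum(m.cost_usd for m in members) + aggregator.cost_usd
    # Members run in parallel, so the slowest one gates the aggregation step.
    latency = max(m.latency_s for m in members) + aggregator.latency_s
    return CallEstimate(cost_usd=cost, latency_s=latency)


# Made-up figures, not real provider pricing.
members = [CallEstimate(0.25, 4.0), CallEstimate(0.5, 6.0)]
aggregator = CallEstimate(0.25, 3.0)
total = estimate_preset(members, aggregator)
# total.cost_usd == 1.0 and total.latency_s == 9.0
```

With these illustrative numbers, the preset costs as much as all three calls combined and takes as long as the slowest member plus the aggregator. In practice the aggregator call is likely to cost more than a plain call, because its prompt carries every member's draft.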
The freedom to mix hosted and local models is the more interesting structural shift. A preset can pair a frontier hosted model for reasoning with a small on-device Qwen, Mistral, or Llama variant for sensitive data extraction. NVIDIA's recent Hermes optimization for RTX PCs and DGX Spark — the latter running 120-billion-parameter models in 128 GB of unified memory — gives this configuration a credible hardware story for teams that want some of the workload to stay on their own machines.
Background
Hermes Agent launched in February 2026 as an open-source autonomous agent built by Nous Research, the lab behind the long-running Hermes fine-tune series. It crossed 140,000 GitHub stars in three months and overtook earlier general-purpose agents on OpenRouter usage. The platform now ships native desktop apps for macOS, Linux, and Windows, a browser-based admin dashboard, persistent memory, scheduling, sandboxed code execution, and adapters for seventeen messaging platforms including WhatsApp Business Cloud, Slack, and iMessage.
Mixture of Agents itself first appeared in earlier Hermes releases as a configurable pipeline. Version 2.0 reframes the same concept as a virtual model — a packaging change that materially lowers the integration cost. The aggregator-and-ensemble pattern echoes published research on multi-LLM systems but operationalizes it inside a tool that thousands of developers are already running daily.
What's Next
HermesBench is the immediate thing to watch. Until Nous Research publishes the methodology and the comparison numbers are reproduced by an outside party, the headline "8 percent above Opus 4.8" claim should be treated as vendor-reported. The next Hermes Agent point release is likely to include preset templates contributed by the community, and Teknium has signaled that aggregator-model customization — letting users swap in their own synthesizer — is on the near-term roadmap.
For teams in the MENA region weighing their obligations under personal data protection laws, Mixture of Agents 2.0 is worth a closer look precisely because the same primitive that mixes Anthropic and OpenAI can mix a hosted provider with a local model. Routing the sensitive part of a prompt through an on-device aggregator while letting the heavy reasoning happen in the cloud is now a configuration change, not a custom integration. That flexibility, more than the benchmark headline, is what makes this release relevant beyond the agent-framework crowd.
Source: Nous Research — Hermes Agent