writing/news/2026/06
NewsJun 23, 2026·6 min read

Sakana AI Launches Fugu: One Model API That Orchestrates a Pool of Frontier LLMs

Tokyo lab Sakana AI has released Fugu, a multi-agent orchestration system that behaves like a single OpenAI-compatible model while routing tasks across a swappable pool of frontier LLMs — matching Fable 5 and Mythos benchmarks without training a frontier model of its own, and routing around export controls.

Tokyo-based research lab Sakana AI on June 22, 2026 released Sakana Fugu, a multi-agent orchestration system that behaves like a single foundation model. Instead of training one giant model, Fugu is itself a language model trained to coordinate a swappable pool of other frontier LLMs — deciding when to delegate, how agents should communicate, and how to synthesize their outputs — all behind one OpenAI-compatible API.

The launch follows a beta that began on April 25, 2026, and ships in two variants: Fugu for everyday coding, chat, and review, and Fugu Ultra for demanding multi-step work such as AI research, paper reproduction, cybersecurity analysis, and patent investigations.

Key Highlights

  • One model, many models: Fugu manages selection, delegation, verification, and synthesis internally, exposing a single endpoint that drops into existing OpenAI-compatible tooling.
  • Frontier-level scores without a frontier model: Fugu Ultra reportedly reaches 73.7 on SWE-bench Pro, 95.5 on GPQA-Diamond, 90.8 on LiveCodeBench Pro, and 50.0 on Humanity's Last Exam.
  • Built to route around restrictions: Teams can exclude specific agents from the pool for compliance, and Fugu reroutes automatically if a provider becomes unavailable.
  • Founded by a transformer co-author: Sakana AI was co-founded in 2023 by Llion Jones, one of the authors of the 2017 paper "Attention Is All You Need."

How It Works

Fugu is a trainable orchestrator that "dynamically coordinates multiple language models from a swappable pool while behaving like a single model through one API." Given a hard task, it can decompose the problem, spin up specialist models for the sub-parts, call a fresh instance of itself to manage a sub-problem, and then verify and synthesize the pieces into one clean response — without that machinery surfacing in the request.

The approach builds on two papers accepted at ICLR 2026: Trinity ("An Evolved LLM Coordinator"), which evolves a lightweight coordinator that assigns Thinker, Worker, and Verifier roles, and Conductor ("Learning to Orchestrate Agents in Natural Language"), alongside a dedicated Sakana Fugu Technical Report.

Benchmarks

According to coverage from The Decoder, Fugu Ultra lands ahead of strong baselines on several public benchmarks: SWE-bench Pro 73.7 versus Opus 4.8's 69.2 and GPT-5.5's 58.6; GPQA-Diamond 95.5 versus Opus 4.8's 92.0; LiveCodeBench Pro 90.8 versus 84.8; and Humanity's Last Exam at 50.0, edging Opus 4.8's 49.8. Sakana says Fugu Ultra "stands shoulder-to-shoulder" with Fable 5 and Mythos Preview — though those two are not in Fugu's pool because they are unavailable under export controls, so the comparison relies on published provider results.

Pricing

Fugu is offered through three subscription tiers — Standard at 20 dollars per month for lightweight use, Pro at 100 dollars per month for roughly 10x usage, and Max at 200 dollars per month for 20x heavy workloads. A pay-as-you-go API plan is also available, with Fugu Ultra priced at 5 dollars per million input tokens and 30 dollars per million output tokens.

Why It Matters for MENA

The pitch leans directly on a recent shock. On June 12, 2026, US export controls pulled Anthropic's Fable 5 and Mythos from worldwide availability essentially overnight — a move that hit MENA developers in Tunisia, Saudi Arabia, and across the region. Sakana frames the lesson bluntly: "For an organization or a nation, relying on a single company's APIs for critical infrastructure, finance, or governance is a material vulnerability."

A swappable orchestration layer offers resilience: if one provider is restricted, the pool reroutes and the application keeps running. For organizations bound by data-residency rules such as Tunisia's INPDP framework and the wider MENA push for PDPL compliance, the ability to curate which models sit in the pool — and to exclude any that cannot meet local requirements — is a practical lever, even if true sovereignty would require multiple providers being restricted at once to fully test it.

What's Next

Fugu positions orchestration itself as the product, a notable departure from the race to train ever-larger single models. If the approach holds up under real workloads, expect rivals to ship their own router-style endpoints — and enterprise buyers to weigh resilience and compliance alongside raw benchmark scores.


Source: Sakana AI