Moonshot AI Releases Kimi K2.6: Open-Source Model Matches Opus 4.6 on SWE-Bench and Orchestrates 300-Agent Swarms

Beijing-based Moonshot AI has released Kimi K2.6, a one-trillion-parameter open-weights model that posts the top score among measured frontier models on Humanity's Last Exam with tools and narrowly beats GPT-5.4 on SWE-Bench Pro. Announced on April 20, 2026, the model ships under a Modified MIT License and is immediately available on Kimi.com, the Kimi app, the official API, and the Kimi Code CLI. The release narrows the gap between Chinese open-source models and proprietary Western systems to a few benchmark points.

Key Highlights

58.6 on SWE-Bench Pro, ahead of GPT-5.4 (57.7), Claude Opus 4.6 (53.4) and Gemini 3.1 Pro (54.2)
54.0 on HLE-Full with tools — the leading score across all measured frontier models
Agent Swarms scale to 300 sub-agents executing 4,000 coordinated steps, up from 100 and 1,500 in K2.5
256K context window, 1T total parameters with 32B activated per token via 384-expert MoE
Open weights on Hugging Face under a Modified MIT License permitting commercial use

Benchmark Performance

Kimi K2.6 posts the strongest numbers yet for an open-weights model on agentic coding workloads. It reaches 80.2 on SWE-Bench Verified, 76.7 on SWE-Bench Multilingual, and 89.6 on LiveCodeBench v6, edging past Claude Opus 4.6's 88.8. Terminal-Bench 2.0 comes in at 66.7 and BrowseComp at 86.3, both sizeable jumps over the K2.5 baseline released earlier this year.

The headline result is on Humanity's Last Exam with tools, where K2.6 leads the field at 54.0 against GPT-5.4 at 52.1, Claude Opus 4.6 at 53.0, and Gemini 3.1 Pro at 51.4. On SWE-Bench Pro, a benchmark designed to resist contamination, K2.6 sits a single point shy of Claude Opus 4.7 — the closest any open-source model has come to Anthropic's latest.
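The margins behind these claims are easy to check. The Python sketch below uses only the scores reported in this article (the SWE-Bench Pro figures come from the Key Highlights list) and computes K2.6's lead over its best-scoring rival on each benchmark. The dictionary layout and the function name lead_over_best_rival are illustrative choices, not anything published by Moonshot.

```python
# Scores as reported in this article; higher is better.
SCORES = {
    "SWE-Bench Pro": {
        "Kimi K2.6": 58.6,
        "GPT-5.4": 57.7,
        "Claude Opus 4.6": 53.4,
        "Gemini 3.1 Pro": 54.2,
    },
    "HLE-Full (with tools)": {
        "Kimi K2.6": 54.0,
        "GPT-5.4": 52.1,
        "Claude Opus 4.6": 53.0,
        "Gemini 3.1 Pro": 51.4,
    },
}


def lead_over_best_rival(scores, model="Kimi K2.6"):
    """Return the model's margin over the highest-scoring other entry."""
    best_rival = max(value for name, value in scores.items() if name != model)
    # Round to one decimal place to hide floating-point noise.
    return round(scores[model] - best_rival, 1)


if __name__ == "__main__":
    for benchmark, scores in SCORES.items():
        print(benchmark, lead_over_best_rival(scores))
```

On both benchmarks the lead over the nearest listed rival is about one point or less, which is why the result is better described as parity with the frontier than as a decisive win.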

Architecture and Technical Specs

Under the hood, K2.6 is a sparse Mixture-of-Experts model with 384 experts. Each token is routed to eight of them plus one shared expert, activating 32 billion parameters out of a trillion total. The architecture uses 61 layers, an attention hidden dimension of 7,168, and 64 attention heads. Native multimodal understanding is provided by MoonViT, a 400-million-parameter vision encoder fused directly into the model rather than bolted on.
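To make the routing idea concrete, here is a minimal, standard-library Python sketch of top-k expert selection with an always-on shared expert. It is a toy under stated assumptions, not Moonshot's implementation: the article does not describe the gating function, so the softmax over the selected logits is an assumption, tokens are reduced to single floats, and route_token and moe_output are hypothetical names.

```python
import math

NUM_EXPERTS = 384  # routed experts, per the article
TOP_K = 8          # experts selected per token, per the article


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and return (index, weight) pairs.

    Assumption: weights are a softmax over the selected logits only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_output(token, router_logits, experts, shared_expert):
    """Combine the shared expert with the weighted outputs of the routed experts."""
    output = shared_expert(token)  # the shared expert runs for every token
    for index, weight in route_token(router_logits):
        output += weight * experts[index](token)
    return output


if __name__ == "__main__":
    # Toy experts: expert i scales its input by (i + 1).
    experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]
    logits = [float(i % 17) for i in range(NUM_EXPERTS)]
    print(moe_output(1.0, logits, experts, shared_expert=lambda x: x))
    print(f"routed experts used per token: {TOP_K / NUM_EXPERTS:.1%}")
    print(f"parameters active per token: {32e9 / 1e12:.1%}")
```

Only eight of the 384 routed experts run for a given token, which is how a trillion-parameter model can keep its per-token compute close to that of a 32-billion-parameter dense model.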

The 256K-token context window supports long-horizon agent runs, and Moonshot recommends deployment via vLLM, SGLang, or KTransformers, with transformers version 4.57.1 or later. Two operating modes ship at launch: Thinking mode for extended reasoning and Instant mode for low-latency responses.
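As a rough pre-flight check before deployment, the version floor can be tested in a few lines of Python. This is a minimal sketch: the helper names are hypothetical, the parser reads only the leading numeric components (so a pre-release such as 4.57.1.dev0 is treated as 4.57.1), and the one library call it relies on is the standard library's importlib.metadata.version.

```python
from importlib import metadata

MIN_TRANSFORMERS = "4.57.1"  # minimum version cited in the article


def parse_version(text):
    """Turn '4.57.1' into (4, 57, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """Compare versions numerically, so 4.100.0 correctly ranks above 4.57.1."""
    return parse_version(installed) >= parse_version(minimum)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif meets_minimum(found):
        print(f"transformers {found} satisfies >= {MIN_TRANSFORMERS}")
    else:
        print(f"transformers {found} is older than {MIN_TRANSFORMERS}")
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "4.100.0" below "4.57.1".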

Agent Swarms and Long-Horizon Work

The release's biggest bet is on agentic scale. K2.6 ships with Agent Swarms capable of running 300 sub-agents in parallel across 4,000 coordinated steps, triple the sub-agent count and more than double the step budget of K2.5 (100 sub-agents and 1,500 steps). Moonshot also introduced Claw Groups for heterogeneous multi-agent coordination, allowing K2.6 to orchestrate third-party agents alongside its own.
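The article does not describe how Moonshot schedules a swarm, but the general shape of a bounded fan-out can be sketched with Python's asyncio. Everything here is an assumption made for illustration: run_sub_agent is a hypothetical stand-in for a real model call, each task is counted as one step against the budget, and a semaphore enforces the parallelism cap.

```python
import asyncio

MAX_SUB_AGENTS = 300  # parallel sub-agent cap reported for K2.6
MAX_STEPS = 4000      # coordinated step budget reported for K2.6


async def run_sub_agent(task_id):
    """Hypothetical sub-agent: a real one would call a model or a tool here."""
    await asyncio.sleep(0)  # yield control, standing in for I/O
    return f"task-{task_id}: done"


async def run_swarm(num_tasks, max_agents=MAX_SUB_AGENTS, max_steps=MAX_STEPS):
    """Run num_tasks sub-agent tasks with at most max_agents in flight.

    Returns (results, peak), where peak is the highest observed concurrency.
    Simplification: every task counts as exactly one step.
    """
    if num_tasks > max_steps:
        raise ValueError("task count exceeds the step budget")
    semaphore = asyncio.Semaphore(max_agents)
    active = 0
    peak = 0

    async def guarded(task_id):
        nonlocal active, peak
        async with semaphore:  # blocks once max_agents tasks are running
            active += 1
            peak = max(peak, active)
            try:
                return await run_sub_agent(task_id)
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(i) for i in range(num_tasks)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_swarm(1000))
    print(len(results), "tasks finished; peak concurrency", peak)
```

The semaphore is the whole design: the orchestrator can queue far more work than it has sub-agents, and the cap decides how much of it runs at once.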

In Moonshot's demonstrations, K2.6 autonomously optimized a financial matching engine over a 13-hour uninterrupted run, delivering a 185 percent throughput improvement. A separate showcase saw the model run for five continuous days on infrastructure management tasks. Moonshot says K2.6 can also ingest PDFs, spreadsheets, and slide decks and turn them into reusable "Skills" — a capability that mirrors the skills standard gaining traction across the coding-agent ecosystem.

Impact on the Open Model Race

The release lands at a pivotal moment for open-source AI. DeepSeek is expected to ship V4 in the coming weeks, and Alibaba's Qwen and Zhipu's GLM-5 are already narrowing the gap with Western frontier labs. Kimi K2.6 is now arguably the strongest open-weights agentic coding model available, and its Modified MIT License means developers, startups, and enterprises can deploy it without vendor lock-in.

For price-sensitive teams, Moonshot's Kimi Code subscription prices the hosted model at 39 yuan per month — roughly a quarter of comparable Claude or GPT-5 coding tiers. Combined with support for integration into Cursor, Cline, OpenClaw, and other agent frameworks, the economic argument for open-weights coding agents is becoming harder to dismiss.

What's Next

Moonshot has signalled that K2.6 is the foundation for a broader agent platform rather than a one-off release. Expect progressively longer autonomous-run showcases, deeper Claw Group integrations, and an expanding catalogue of shareable Skills. For CIOs and engineering leaders in the MENA region, the shift is straightforward: the cost of running a near-frontier coding agent on your own infrastructure dropped again this week.

Source: MarkTechPost