Miami-based startup Subquadratic emerged from stealth on May 5, 2026 with $29 million in seed funding and a frontier large language model called SubQ that the company says is the first to abandon quadratic attention entirely. The model supports a 12-million-token context window in its research configuration and, according to internal benchmarks, uses roughly one-thousandth the attention compute of comparable frontier models at full context length.
Key Highlights
- $29 million seed round at a reported $500 million valuation, backed by Justin Mateen (Tinder co-founder), Javier Villamizar, Grant Gittlin, and Jaclyn Rice Nelson.
- Founders are CEO Justin Dangel, a five-time founder, and CTO Alex Whedon, formerly Head of Generative AI at TribeAI and a Meta software engineer.
- New attention mechanism called Subquadratic Sparse Attention (SSA) scales linearly with context length, with no quadratic fallback layers.
- Three products launched in private beta: SubQ API, SubQ Code (a command-line coding agent), and SubQ Search.
Details
The launch centers on a single architectural claim. Modern transformer models rely on dense attention whose compute and memory grow with the square of the input length, which is why million-token context windows are expensive and tens-of-millions-token windows have essentially never shipped. SubQ replaces that with a sparse mechanism that, for each query token, selects a small subset of positions to attend to based on content rather than fixed patterns, then computes exact attention only over those positions.
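The article does not include Subquadratic's implementation, but the basic pattern it describes, content-based selection followed by exact attention over only the selected positions, can be sketched. The block pooling, block size, and top-k values below are illustrative assumptions, not details of SSA.

```python
# A minimal sketch of content-based sparse attention, assuming block-level
# selection. This is an illustration, NOT Subquadratic's published SSA.
import torch
import torch.nn.functional as F

def content_sparse_attention(q, k, v, block_size=64, top_blocks=8):
    """q, k, v: [seq_len, dim]. Each query picks the key blocks whose
    mean-pooled summary scores highest against it, then exact softmax
    attention runs only over those positions."""
    n, d = q.shape
    n_blocks = n // block_size                  # tail tokens dropped for simplicity
    k_b = k[: n_blocks * block_size].view(n_blocks, block_size, d)
    v_b = v[: n_blocks * block_size].view(n_blocks, block_size, d)

    # Content-based selection: score each query against coarse block summaries.
    summaries = k_b.mean(dim=1)                 # [n_blocks, dim]
    idx = (q @ summaries.T).topk(min(top_blocks, n_blocks), dim=-1).indices

    # Gather only the selected keys/values for every query.
    sel_k = k_b[idx].reshape(n, -1, d)          # [n, top_blocks*block_size, d]
    sel_v = v_b[idx].reshape(n, -1, d)

    # Exact attention over the selected positions: this step grows linearly
    # with n, since each query touches a fixed number of positions.
    scores = (q.unsqueeze(1) @ sel_k.transpose(1, 2)) / d ** 0.5
    return (F.softmax(scores, dim=-1) @ sel_v).squeeze(1)
```

In this sketch the exact-attention step is linear in sequence length; the coarse scoring against block summaries is the part a production system would have to make cheaper still, for example hierarchically or with cached selections, to keep the whole pass subquadratic.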
In published numbers, the company reports its sparse attention runs about 52 times faster than FlashAttention at one million tokens and uses 63 percent less compute. At the full 12-million-token window, Subquadratic claims a roughly 1,000x reduction in attention compute relative to other frontier models. On RULER 128K, SubQ posts 95.0 percent accuracy against Claude Opus 4.6's 94.8 percent. On SWE-Bench Verified the company reports 81.8 percent, edging Opus 4.6 at 80.8 percent. The research configuration scores 83 on MRCR v2, while the production model exposed to early-access users — branded SubQ 1M-Preview — scores 65.9 at one million tokens, behind Opus 4.6 at 78.3 and GPT-5.5 at 74.
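A back-of-the-envelope check helps make sense of the 1,000x figure. The per-query selection size below is an assumption chosen for illustration, not a published SSA parameter: if each query attends to a roughly fixed set of s positions, attention cost falls from about n² to about n·s.

```python
# Order-of-magnitude check (assumed numbers, not Subquadratic's published math):
# dense attention scales ~n^2, a content-sparse pass ~n*s for s selected
# positions per query, so the reduction at context length n is roughly n/s.
n = 12_000_000   # research-configuration context length
s = 12_000       # hypothetical positions attended per query
print(f"approximate attention-compute reduction: {n / s:,.0f}x")   # ~1,000x
```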
CEO Justin Dangel framed the bet bluntly in the company's launch post: "Quadratic scaling has been that constraint for AI. The most valuable applications of AI remain unbuilt because the existing architecture can't support them."
Impact
If the numbers hold up under independent evaluation, the economics of long-context inference change categorically. Subquadratic's reference comparison on RULER 128K (95 percent accuracy at roughly $8 of compute, against Claude Opus's 94 percent at roughly $2,600) implies a cost reduction of more than two orders of magnitude at competitive accuracy. That would directly threaten the retrieval-augmented-generation stack the industry has built around the cost ceiling of quadratic attention, since the obvious response to "context is now cheap" is to stop chunking documents and just paste the whole thing in.
For developers, SubQ Code is the most concrete near-term hook: a CLI agent designed to load entire codebases into a single context window rather than relying on retrieval over chunks. The API exposes the production model through OpenAI-compatible endpoints with tool use support.
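Because the API is described as OpenAI-compatible with tool use, calling it should look like any other OpenAI-style client. The base URL, model identifier, and tool schema below are placeholders for illustration, not documented SubQ values.

```python
# Hedged sketch of an OpenAI-compatible call; base_url, model name, and the
# tool definition are illustrative placeholders, not published SubQ values.
from openai import OpenAI

client = OpenAI(base_url="https://api.subq.ai/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="subq-1m-preview",                      # placeholder model id
    messages=[{"role": "user", "content": "Map the call graph of this repo."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "read_file",                  # hypothetical tool
            "description": "Read one file from the loaded codebase",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
)
print(response.choices[0].message)
```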
Background
The skepticism inside the AI community is structured and pointed. The 12-million-token figure is a research result, not the production artifact: the shipping model exposes one million tokens, and the published benchmarks largely cap there. AI engineer Will Depue argued on X that SubQ is plausibly a sparse-attention finetune of an existing open model rather than a from-scratch architecture, and that the linear-scaling and speedup numbers do not obviously line up. A pre-launch survey of prior subquadratic architectures (Mamba, RWKV, Kimi Linear, DeepSeek Sparse Attention) concluded that earlier approaches struggled with precise memory retrieval and exact copying at frontier scale, a prior that SubQ would need to overturn.
Subquadratic also chose a closed posture for the launch, one that contrasts sharply with concavity-ai's open-weights release earlier in the year: no full technical report, no public weights, gated early access, and a 50-million-token next-model announcement before the current one has been independently verified. Several developers asked the obvious question on launch day: if SSA is so cheap, why is access gated?
What's Next
Independent benchmarks at the full 12-million-token window are the only thing that will resolve the split reaction. Subquadratic has pre-announced a 50-million-token successor model, which raises the stakes for verification of the current claims. Early-access slots are open at subq.ai for the API and the coding and search products, and pricing is reportedly around $1.50 per million tokens for the production model, roughly a tenth of the headline rate for comparable frontier models if that price survives public exposure.
Source: SiliconANGLE