Mira Murati, former CTO of OpenAI, has broken 14 months of public silence with the most significant announcement yet from her startup Thinking Machines Lab. The company revealed TML-Interaction-Small, a new class of AI model it calls an "interaction model" — built from scratch to listen, speak, and act all at the same time.
Key Highlights
- TML-Interaction-Small is a 276-billion-parameter Mixture-of-Experts (MoE) model with 12 billion active parameters
- Turn-taking latency of 0.40 seconds — beating GPT-realtime-1.5 (0.59s) and Gemini-3.1-Flash-Live (0.57s)
- Processes audio, video, and text in 200-millisecond micro-turns, enabling full-duplex conversation
- Founded in February 2025 with a USD 12 billion valuation; team includes PyTorch co-founder Soumith Chintala
A New Paradigm: From Turn-Based to Real-Time
Every AI assistant you've used operates in turns: you finish speaking, the model processes, then it responds. Thinking Machines argues this design is not a limitation of intelligence — it is a limitation of architecture. Their interaction models are trained natively for simultaneity.
"People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way," the company wrote in its official announcement.
The model processes input and output in 200-millisecond micro-turns — roughly the speed of a human conversational response — without waiting for an explicit end-of-turn signal. It can be interrupted, interject on its own, translate speech live, count exercise reps from video, and offer unsolicited but contextually relevant observations.
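Thinking Machines has not published its internals, but the micro-turn idea can be illustrated with a toy simulation: input arrives in fixed 200 ms slices, and the model is free to emit output on any slice rather than waiting for the speaker to finish. Everything here — `full_duplex_session`, the `coach` policy, the frame labels — is hypothetical, invented purely to make the concept concrete.

```python
MICRO_TURN_MS = 200  # the article's reported micro-turn length

def full_duplex_session(frames, respond):
    """Step through input one micro-turn at a time. The respond()
    policy may return a reply on ANY slice -- including while the
    user is still mid-utterance -- or None to stay silent."""
    outputs = []
    for t, frame in enumerate(frames):
        reply = respond(frame, t * MICRO_TURN_MS)
        if reply is not None:
            outputs.append((t * MICRO_TURN_MS, reply))
    return outputs

def coach(frame, t_ms):
    """Toy policy: acknowledge each detected 'rep' immediately,
    without an end-of-turn signal (cf. counting reps from video)."""
    if frame == "rep":
        return f"that's a rep at {t_ms} ms"
    return None

frames = ["speech", "rep", "speech", "rep"]
print(full_duplex_session(frames, coach))
# [(200, "that's a rep at 200 ms"), (600, "that's a rep at 600 ms")]
```

The key contrast with turn-based assistants is that nothing in the loop marks an utterance boundary: silence, speech, and events are all just micro-turns the model may or may not react to.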
Dual-Model Architecture
Under the hood, TML-Interaction-Small runs a two-tier system:
- Interaction Model: lightweight, always listening, handles conversational flow and immediate responses
- Background Model: performs heavy reasoning, tool calls, web searches, and complex task planning asynchronously — while the interaction layer keeps the conversation alive
This split addresses one of the hardest tradeoffs in voice AI: systems typically offer speed or depth, rarely both. By separating the two concerns, the architecture attempts to deliver both at once.
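The two-tier split above can be sketched with Python's `asyncio`: a fast interaction loop acknowledges every user turn within a micro-turn, while a slow background task does the heavy reasoning and surfaces its result once ready. This is a minimal sketch of the concept, not Thinking Machines' implementation; the function names, delays, and the `ack:` protocol are all assumptions for illustration.

```python
import asyncio

async def background_reason(task):
    """Slow path: stand-in for heavy reasoning, tool calls, or
    web search running asynchronously (here, a 0.5 s sleep)."""
    await asyncio.sleep(0.5)
    return f"detailed answer to {task!r}"

async def interaction_layer(user_turns):
    """Fast path: reply to every turn immediately, hand hard work
    to the background model, and splice its result back into the
    conversation once it completes."""
    transcript = []
    pending = None
    for turn in user_turns:
        if pending and pending.done():
            transcript.append(pending.result())  # background result lands
            pending = None
        transcript.append(f"ack: {turn}")        # instant conversational reply
        if pending is None:
            pending = asyncio.create_task(background_reason(turn))
        await asyncio.sleep(0.2)                 # one 200 ms micro-turn
    if pending:
        transcript.append(await pending)
    return transcript

transcript = asyncio.run(interaction_layer(["plan my trip", "ok", "thanks"]))
print(transcript)
```

The point of the sketch: the conversation never blocks on the slow path. Each user turn gets an immediate acknowledgment, and the background result arrives whenever it is ready.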
Benchmark Performance
Turn-taking latency of competing systems on the FD-bench leaderboard:
| Model | Turn Latency |
|---|---|
| TML-Interaction-Small | 0.40s |
| Gemini-3.1-Flash-Live | 0.57s |
| GPT-realtime-1.5 | 0.59s |
| GPT-realtime-2.0 (thinking) | 1.63s |
The model scores 77.8 on FD-bench v1.5 interaction quality and 43.4% on Audio MultiChallenge accuracy. On structured reasoning benchmarks such as IFEval, it reaches 89.7 — competitive, but below GPT-realtime-2.0's 95.2.
Thinking Machines also released new benchmarks alongside the model, arguing that existing evaluations were designed for turn-based systems and do not capture real-time interaction quality.
The Team Behind the Model
Thinking Machines Lab was founded in February 2025, shortly after Murati's departure from OpenAI. The company has since assembled a team of AI veterans:
- Mira Murati (CEO) — former CTO, OpenAI
- Soumith Chintala (CTO) — co-creator of PyTorch
- John Schulman — former researcher, OpenAI
The lab is backed by a USD 2 billion seed round at a USD 12 billion valuation, with participation from a16z among others.
What's Next
TML-Interaction-Small is available now to a limited set of research partners. A wider public release is planned for later in 2026. The company has not disclosed pricing or API details.
The announcement has drawn immediate attention across the AI industry. Analysts note that if the latency and interaction-quality claims hold up to third-party verification, the model could shift the competitive landscape for voice AI products — particularly in customer service, telehealth, live translation, and real-time coding assistance.
Source: Thinking Machines Lab · TechCrunch