Mira Murati, former CTO of OpenAI, has broken 14 months of public silence with the most significant announcement yet from her startup Thinking Machines Lab. The company revealed TML-Interaction-Small, a new class of AI model it calls an "interaction model" — built from scratch to listen, speak, and act all at the same time.
Key Highlights
- TML-Interaction-Small is a 276-billion-parameter Mixture-of-Experts (MoE) model with 12 billion active parameters
- Turn-taking latency of 0.40 seconds — beating GPT-realtime-1.5 (0.59s) and Gemini-3.1-Flash-Live (0.57s)
- Processes audio, video, and text in 200-millisecond micro-turns, enabling full-duplex conversation
- Founded in February 2025 with a USD 12 billion valuation; team includes PyTorch co-founder Soumith Chintala
A New Paradigm: From Turn-Based to Real-Time
Every AI assistant you've used operates in turns: you finish speaking, the model processes, then it responds. Thinking Machines argues this design is not a limitation of intelligence — it is a limitation of architecture. Their interaction models are trained natively for simultaneity.
"People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way," the company wrote in its official announcement.
The model processes input and output in 200-millisecond micro-turns — roughly the speed of a human conversational response — without waiting for an explicit end-of-turn signal. It can be interrupted, interject on its own, translate speech live, count exercise reps from video, and offer unsolicited but contextually relevant observations.
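Thinking Machines has not published its internals, but the micro-turn idea can be illustrated with a toy simulation: input arrives in fixed 200 ms slices, and the model is free to emit output on any slice rather than waiting for the speaker to finish. Everything here — `full_duplex_session`, the `coach` policy, the frame labels — is hypothetical, invented purely to make the concept concrete.

```python
MICRO_TURN_MS = 200  # the article's reported micro-turn length

def full_duplex_session(frames, respond):
    """Step through input one micro-turn at a time. The respond()
    policy may return a reply on ANY slice -- including while the
    user is still mid-utterance -- or None to stay silent."""
    outputs = []
    for t, frame in enumerate(frames):
        reply = respond(frame, t * MICRO_TURN_MS)
        if reply is not None:
            outputs.append((t * MICRO_TURN_MS, reply))
    return outputs

def coach(frame, t_ms):
    """Toy policy: acknowledge each detected 'rep' immediately,
    without an end-of-turn signal (cf. counting reps from video)."""
    if frame == "rep":
        return f"that's a rep at {t_ms} ms"
    return None

frames = ["speech", "rep", "speech", "rep"]
print(full_duplex_session(frames, coach))
# [(200, "that's a rep at 200 ms"), (600, "that's a rep at 600 ms")]
```

The key contrast with turn-based assistants is that nothing in the loop marks an utterance boundary: silence, speech, and events are all just micro-turns the model may or may not react to.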
Dual-Model Architecture
Under the hood, TML-Interaction-Small runs a two-tier system:
- Interaction Model: lightweight, always listening, handles conversational flow and immediate responses
- Background Model: performs heavy reasoning, tool calls, web searches, and complex task planning asynchronously — while the interaction layer keeps the conversation alive
This split addresses one of the hardest tradeoffs in voice AI: systems typically offer speed or depth, rarely both. By separating the two concerns, the architecture attempts to deliver both at once.
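The two-tier split above can be sketched with Python's `asyncio`: a fast interaction loop acknowledges every user turn within a micro-turn, while a slow background task does the heavy reasoning and surfaces its result once ready. This is a minimal sketch of the concept, not Thinking Machines' implementation; the function names, delays, and the `ack:` protocol are all assumptions for illustration.

```python
import asyncio

async def background_reason(task):
    """Slow path: stand-in for heavy reasoning, tool calls, or
    web search running asynchronously (here, a 0.5 s sleep)."""
    await asyncio.sleep(0.5)
    return f"detailed answer to {task!r}"

async def interaction_layer(user_turns):
    """Fast path: reply to every turn immediately, hand hard work
    to the background model, and splice its result back into the
    conversation once it completes."""
    transcript = []
    pending = None
    for turn in user_turns:
        if pending and pending.done():
            transcript.append(pending.result())  # background result lands
            pending = None
        transcript.append(f"ack: {turn}")        # instant conversational reply
        if pending is None:
            pending = asyncio.create_task(background_reason(turn))
        await asyncio.sleep(0.2)                 # one 200 ms micro-turn
    if pending:
        transcript.append(await pending)
    return transcript

transcript = asyncio.run(interaction_layer(["plan my trip", "ok", "thanks"]))
print(transcript)
```

The point of the sketch: the conversation never blocks on the slow path. Each user turn gets an immediate acknowledgment, and the background result arrives whenever it is ready.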
Benchmark Performance
Turn-taking latency of competing systems on the FD-bench leaderboard:
| Model | Turn Latency |
|---|---|
| TML-Interaction-Small | 0.40s |
| Gemini-3.1-Flash-Live | 0.57s |
| GPT-realtime-1.5 | 0.59s |
| GPT-realtime-2.0 (thinking) | 1.63s |
The model scores 77.8 on FD-bench v1.5 interaction quality and 43.4% on Audio MultiChallenge accuracy. On structured reasoning benchmarks such as IFEval, it reaches 89.7 — competitive, but below GPT-realtime-2.0's 95.2.
Thinking Machines also released new benchmarks alongside the model, arguing that existing evaluations were designed for turn-based systems and do not capture real-time interaction quality.
The Team Behind the Model
Thinking Machines Lab was founded in February 2025, shortly after Murati's departure from OpenAI. The company has since assembled a team of AI veterans:
- Mira Murati (CEO) — former CTO, OpenAI
- Soumith Chintala (CTO) — co-creator of PyTorch
- John Schulman — former researcher, OpenAI
The lab is backed by a USD 2 billion seed round at a USD 12 billion valuation, with participation from a16z among others.
What's Next
TML-Interaction-Small is available now to a limited set of research partners. A wider public release is planned for later in 2026. The company has not disclosed pricing or API details.
The announcement has drawn immediate attention across the AI industry. Analysts note that if the latency and interaction-quality claims hold up to third-party verification, the model could shift the competitive landscape for voice AI products — particularly in customer service, telehealth, live translation, and real-time coding assistance.
Source: Thinking Machines Lab · TechCrunch