News · May 12, 2026 · 6 min read

Thinking Machines Unveils Full-Duplex AI That Listens While It Talks

Mira Murati's AI startup Thinking Machines Lab has launched TML-Interaction-Small, a 276B-parameter model trained from scratch for real-time, full-duplex conversation — the first to respond in under half a second while simultaneously processing audio, video, and text.

AI Bot · Author

Mira Murati, former CTO of OpenAI, has broken 14 months of public silence with the most significant announcement yet from her startup Thinking Machines Lab. The company revealed TML-Interaction-Small, a new class of AI model it calls an "interaction model" — built from scratch to listen, speak, and act all at the same time.

Key Highlights

  • TML-Interaction-Small is a 276-billion parameter Mixture-of-Experts (MoE) model with 12 billion active parameters
  • Turn-taking latency of 0.40 seconds — beating GPT-realtime-1.5 (0.59s) and Gemini-3.1-Flash-Live (0.57s)
  • Processes audio, video, and text in 200-millisecond micro-turns, enabling full-duplex conversation
  • Founded in February 2025 with a USD 12 billion valuation; team includes PyTorch co-founder Soumith Chintala

A New Paradigm: From Turn-Based to Real-Time

Every AI assistant you've used operates in turns: you finish speaking, the model processes, then it responds. Thinking Machines argues this design is not a limitation of intelligence — it is a limitation of architecture. Their interaction models are trained natively for simultaneity.

"People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way," the company wrote in its official announcement.

The model processes input and output in 200-millisecond micro-turns — roughly the speed of human conversational response — without waiting for an explicit end-of-turn signal. It can be interrupted, interject, translate speech live, count physical reps from a video feed, and offer unsolicited but contextually relevant observations.
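Thinking Machines has not published implementation details, but the micro-turn idea can be sketched as a producer/consumer loop: input frames arrive continuously and the conversational side reacts to each one, with no end-of-turn barrier between listening and responding. The frame length, queue, and echo-style response below are illustrative assumptions, not the model's actual pipeline:

```python
import asyncio

FRAME_MS = 200  # micro-turn length described in the announcement

async def listen(frames: asyncio.Queue) -> None:
    """Producer: push one audio frame per micro-turn (stubbed here)."""
    for i in range(5):
        await asyncio.sleep(FRAME_MS / 1000)
        await frames.put(f"frame-{i}")
    await frames.put(None)  # end of stream

async def converse(frames: asyncio.Queue) -> list[str]:
    """Consumer: react to every frame without waiting for end-of-turn."""
    spoken = []
    while (frame := await frames.get()) is not None:
        # A real model could speak, stay silent, or interject per frame;
        # here we just acknowledge to show input and output overlap in time.
        spoken.append(f"ack:{frame}")
    return spoken

async def main() -> list[str]:
    frames: asyncio.Queue = asyncio.Queue()
    listener = asyncio.create_task(listen(frames))
    spoken = await converse(frames)
    await listener
    return spoken

print(asyncio.run(main()))
```

Because the consumer never blocks on a turn boundary, an interruption is just another frame arriving mid-response — which is the property the announcement emphasizes.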

Dual-Model Architecture

Under the hood, TML-Interaction-Small runs a two-tier system:

  • Interaction Model: lightweight, always listening, handles conversational flow and immediate responses
  • Background Model: performs heavy reasoning, tool calls, web searches, and complex task planning asynchronously — while the interaction layer keeps the conversation alive

This split solves one of the hardest tradeoffs in voice AI: you can have speed or depth, rarely both. The architecture attempts to deliver both by separating concerns.
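The company has not released code, but the split can be illustrated with a toy asyncio sketch: a fast interaction loop replies every couple hundred milliseconds to keep the conversation alive, while a slow background task does the heavy reasoning, and its result is surfaced once it lands. All names, timings, and filler text here are hypothetical:

```python
import asyncio

async def background_model(query: str) -> str:
    """Heavy path: reasoning, tool calls, web search (stubbed as a delay)."""
    await asyncio.sleep(0.5)
    return f"researched answer to {query!r}"

async def interaction_model(query: str) -> list[str]:
    """Fast path: low-latency turns while the background task runs."""
    task = asyncio.create_task(background_model(query))
    turns = []
    while not task.done():
        turns.append("Still looking into that...")  # immediate conversational reply
        await asyncio.sleep(0.2)  # one 200 ms micro-turn
    turns.append(task.result())  # surface the deep result when ready
    return turns

print(asyncio.run(interaction_model("flight options")))
```

The design choice is the usual fast-path/slow-path separation: latency is bounded by the interaction layer alone, so depth of reasoning no longer shows up as dead air.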

Benchmark Performance

Competing systems benchmarked on the FD-bench leaderboard:

Model                         Turn latency
TML-Interaction-Small         0.40s
Gemini-3.1-Flash-Live         0.57s
GPT-realtime-1.5              0.59s
GPT-realtime-2.0 (thinking)   1.63s

The model achieves an FD-bench v1.5 interaction quality score of 77.8 and Audio MultiChallenge accuracy of 43.4%. On instruction-following benchmarks like IFEval, it scores 89.7, competitive but below GPT-realtime-2.0 at 95.2.

Thinking Machines also released new benchmarks alongside the model, arguing that existing evaluations were designed for turn-based systems and do not capture real-time interaction quality.

The Team Behind the Model

Thinking Machines Lab was founded in February 2025, shortly after Murati's departure from OpenAI. The company has since assembled a team of AI veterans:

  • Mira Murati (CEO) — former CTO, OpenAI
  • Soumith Chintala (CTO) — co-creator of PyTorch
  • John Schulman — former researcher, OpenAI

The lab is backed by a USD 2 billion seed round at a USD 12 billion valuation, with participation from a16z among others.

What's Next

TML-Interaction-Small is available now to a limited set of research partners. A wider public release is planned for later in 2026. The company has not disclosed pricing or API details.

The announcement has drawn immediate attention across the AI industry, with analysts noting that if the latency and interaction quality claims hold up to third-party verification, it could shift the competitive landscape for voice AI products — particularly in customer service, telehealth, live translation, and real-time coding assistance.


Source: Thinking Machines Lab · TechCrunch

Tags: #AI · #Product Launch · #Machine Learning