Netflix Releases VOID, an Open-Source AI Model That Removes Objects from Video with Physics-Aware Inpainting

Netflix has quietly released VOID (Video Object and Interaction Deletion), an open-source AI model that goes well beyond simple object erasure in video. VOID removes objects while realistically simulating how the remaining scene would physically behave without them, a video-inpainting capability the team claims no competing tool matches at this level.
What Makes VOID Different
Existing video object removal tools typically leave behind artifacts or simply fill the gap with static background. VOID understands physical causality. Remove a person holding a guitar, and the guitar falls naturally. Remove someone jumping into a pool, and the splash disappears too. Remove a car from a collision, and the remaining vehicle continues down the road undisturbed.
This interaction-aware approach is what distinguishes VOID from the removal tools it was benchmarked against.
How It Works
VOID is built on top of CogVideoX and uses a two-pass transformer architecture:
- Pass 1: A base inpainting model trained with a novel quadmask conditioning system that encodes four types of pixel information — the primary object to remove, overlap regions, affected interaction zones, and preserved background
- Pass 2: A warped-noise refinement step that improves temporal consistency across longer sequences
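Netflix has not published the exact tensor layout of the quadmask, but the idea can be sketched as a per-pixel label map that collapses the four pixel categories into one conditioning channel. Everything below (the function name, label values, and priority order) is an illustrative assumption, not the paper's actual implementation:

```python
# Hypothetical sketch of quadmask conditioning: three binary masks plus
# implicit background are merged into one per-pixel label map that an
# inpainting model could consume as a conditioning channel. The label
# values and priority order (object > overlap > interaction) are assumed.
BACKGROUND, OBJECT, OVERLAP, INTERACTION = 0, 1, 2, 3

def build_quadmask(object_mask, overlap_mask, interaction_mask):
    """Merge three binary masks (nested lists of 0/1) into one label map.

    Pixels not covered by any mask are treated as preserved background.
    """
    h, w = len(object_mask), len(object_mask[0])
    quad = [[BACKGROUND] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if object_mask[y][x]:          # primary object to remove
                quad[y][x] = OBJECT
            elif overlap_mask[y][x]:       # region where masks overlap
                quad[y][x] = OVERLAP
            elif interaction_mask[y][x]:   # zone physically affected by removal
                quad[y][x] = INTERACTION
    return quad

# Tiny 2x3 frame: one object pixel, one overlap pixel, one interaction pixel.
quad = build_quadmask(
    object_mask=[[1, 0, 0], [0, 0, 0]],
    overlap_mask=[[0, 1, 0], [0, 0, 0]],
    interaction_mask=[[0, 0, 0], [1, 0, 0]],
)
```

In practice such a map would be produced per frame and fed to the model alongside the video latents; the point is simply that one channel can carry all four pixel categories at once.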
The model was trained on two synthetic datasets: HUMOTO (human-object interactions rendered in Blender with physics simulation) and Kubric (object interactions using Google Scanned Objects). Training ran on 8x A100 80GB GPUs using DeepSpeed ZeRO Stage 2.
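A ZeRO Stage 2 run like the one described is typically driven by a DeepSpeed JSON config. The fragment below, written as a Python dict, is a generic example of what such a config looks like; the batch sizes and precision settings are placeholders, not VOID's actual hyperparameters:

```python
# Generic DeepSpeed ZeRO Stage 2 config, of the kind passed to
# deepspeed.initialize(config=ds_config). All numeric values are
# placeholders; VOID's actual training settings are not published here.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # placeholder
    "gradient_accumulation_steps": 8,      # placeholder
    "bf16": {"enabled": True},             # A100s support bfloat16
    "zero_optimization": {
        "stage": 2,                        # shard optimizer state + gradients
        "overlap_comm": True,              # overlap comm with backward pass
        "contiguous_gradients": True,
    },
}
```

Stage 2 shards optimizer state and gradients across the 8 GPUs while keeping a full copy of the model weights on each, a common middle ground for models of this size.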
Beating the Competition
In user studies with 25 participants across multiple scenarios, VOID was preferred 64.8% of the time, with Runway a distant second at 18.4%. The model outperforms Runway, ProPainter, DiffuEraser, Generative Omnimatte, ROSE, and MiniMax-Remover — tools that range from commercial products to state-of-the-art research.
Open Source and Available Now
Netflix released VOID under an open license on Hugging Face, making it available for anyone to download and use. The project includes:
- Two model checkpoints (Pass 1 and Pass 2)
- A Google Colab notebook for quick experimentation
- An interactive demo on Hugging Face Spaces
- Full training pipeline code for generating synthetic data
The model requires a GPU with 40GB or more of VRAM (A100 recommended), SAM2 for segmentation, and a Google Gemini API key for mask generation.
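The 40GB figure is plausible from back-of-the-envelope arithmetic: a ~5B-parameter backbone like CogVideoX occupies roughly 10GB in bfloat16 for weights alone, before activations, video latents, and the VAE. The estimate below is purely illustrative; the parameter count and byte sizes are assumptions, not published VOID numbers:

```python
# Rough, illustrative VRAM estimate for a ~5B-parameter transformer in
# bfloat16. Parameter count and dtype sizes are assumptions for this
# sketch, not figures from the VOID release.
def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

base = weights_gb(5e9)  # bf16 weights for a 5B-parameter model
# Activations, video latents, attention workspace, and the VAE can
# multiply the footprint several times over, hence the 40GB guidance.
```

Weights are only the floor; for long video sequences the attention workspace and latent buffers usually dominate.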
The Team Behind VOID
Six researchers developed the model: Saman Motamed (Netflix/Sofia University), William Harvey (Netflix), Benjamin Klein (Netflix), Luc Van Gool (Sofia University), Zhuoning Yuan (Netflix), and Ta-Ying Cheng (Netflix). The accompanying research paper is available on arXiv.
Why It Matters
VOID represents Netflix's first major public AI model release, signaling the streaming giant's growing investment in AI research beyond its well-known recommendation algorithms. For filmmakers and video editors, physics-aware object removal opens new possibilities in post-production — from removing unwanted elements to creating alternative scene compositions.
For the open-source AI community, VOID adds a powerful new tool to the video generation ecosystem, one that prioritizes physical realism over mere visual plausibility.
Source: Netflix VOID on GitHub