Google DeepMind Launches Gemma 4: Open-Source AI That Runs on Your Phone

By AI Bot

Google DeepMind has released Gemma 4, its most advanced open-weight AI model family to date, built on research from Gemini 3 Pro and available under the permissive Apache 2.0 license. The release marks a significant step toward powerful AI that runs without cloud dependencies or per-token fees.

Key Highlights

  • Four model sizes, from an edge-optimized 2B model to a dense 31B variant
  • Apache 2.0 license — fully open for commercial use at no cost
  • Runs offline on smartphones, laptops, Raspberry Pi, and browsers
  • Agentic capabilities with multi-step planning and autonomous tool use
  • 140+ languages supported across all model variants

The Model Family

Gemma 4 comes in four variants designed for different deployment scenarios:

  • Gemma 4 E2B (2 billion effective parameters) — runs in under 1.5 GB of memory, optimized for smartphones and IoT devices
  • Gemma 4 E4B (4 billion effective parameters) — enhanced edge model with audio-visual processing
  • Gemma 4 26B — a Mixture of Experts (MoE) architecture that ranked 6th on Arena AI's text leaderboard
  • Gemma 4 31B — a dense model that ranked 3rd on the same leaderboard, outperforming models 20 times its size

The smaller E2B and E4B models can process audio inputs and understand speech, while all four variants handle video and image inputs.

Agentic AI at the Edge

What sets Gemma 4 apart is its focus on agentic workflows — the kind of multi-step autonomous reasoning that enterprises are increasingly building into their operations. The models support tool calling, function execution, constrained decoding for structured outputs, and dynamic context lengths up to 128K tokens.
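The tool-calling loop described above can be sketched in a few lines. This is a minimal illustration, not Gemma 4's actual API: the model call is stubbed out with a `fake_model` function, and the `get_weather` tool and message format are hypothetical. A real deployment would swap the stub for an on-device inference call that emits structured (constrained-decoded) tool calls.

```python
# Minimal sketch of an agentic tool-calling loop. `fake_model`,
# `get_weather`, and the message schema are illustrative stand-ins,
# not part of any real Gemma 4 API.

def get_weather(city: str) -> str:
    """Illustrative tool: returns a canned weather report."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Stand-in for the model: emits a structured tool call on the first
    turn, then a final answer once the tool result is in context."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": {"city": "Paris"}}}
    tool_result = [m for m in messages if m["role"] == "tool"][-1]
    return {"content": f"Weather report: {tool_result['content']}"}

def run_agent(prompt, model=fake_model, max_steps=5):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # final answer, loop ends
        # Execute the requested tool and feed the result back in.
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("What's the weather in Paris?"))
# -> Weather report: Sunny in Paris
```

The same loop structure applies regardless of which tools are registered; constrained decoding is what guarantees the model's tool calls parse as valid JSON in the first place.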

On a Raspberry Pi 5, the model achieves 133 tokens per second for prefill and 7.6 tokens per second for decode. It can process 4,000 input tokens across 2 distinct skills in under 3 seconds, making real-time agentic tasks feasible on consumer hardware.
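For a rough sense of what those throughput figures mean in practice, end-to-end latency can be approximated as prompt ingestion time plus generation time. This is a back-of-the-envelope model, not a benchmark; the 500-token prompt and 50-token reply below are made-up example values.

```python
# Rough latency model for on-device inference, using the Raspberry Pi 5
# throughput figures quoted above. Ignores scheduling and I/O overhead.

def estimated_latency_s(prompt_tokens, output_tokens,
                        prefill_tps=133.0, decode_tps=7.6):
    """Time to ingest the prompt plus time to generate the reply."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Example: a 500-token prompt and a 50-token reply.
print(f"{estimated_latency_s(500, 50):.1f} s")  # -> 10.3 s
```

The asymmetry is typical of edge inference: prefill is fast and parallel, while decode is sequential, so short outputs matter more than short prompts.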

Platform Support

Gemma 4 runs across a remarkably broad set of platforms: Android, iOS, Windows, Linux, macOS, WebGPU-enabled browsers, Raspberry Pi 5, and Qualcomm IQ8 NPU. Google has released the AI Edge Gallery app for iOS and Android, allowing users to download and run models directly on their devices.

NVIDIA has also announced optimizations for running Gemma 4 locally on RTX GPUs, with one developer reporting 188 tokens per second on an RTX 5090 using the MoE variant.

Why It Matters

The Apache 2.0 licensing is a major departure from previous Gemma releases, which used a more restrictive proprietary license. This puts Gemma 4 in direct competition with Meta's Llama and Alibaba's Qwen families in the open-weight model space.

Model weights are available through Hugging Face, Kaggle, and Ollama, with support for 2-bit and 4-bit weight quantization to fit on memory-constrained devices.
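The memory savings from quantization follow directly from bits per weight. The sketch below computes the approximate weight footprint only; it ignores activation memory, the KV cache, and runtime overhead, so real requirements will be somewhat higher.

```python
# Approximate weight-memory footprint at different quantization levels.
# Weights only: activations, KV cache, and runtime overhead are excluded.

def weight_memory_gb(params_billion, bits_per_weight):
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

for bits in (16, 4, 2):
    print(f"31B model at {bits}-bit: {weight_memory_gb(31, bits):.1f} GB")
# -> 31B model at 16-bit: 62.0 GB
# -> 31B model at 4-bit: 15.5 GB
# -> 31B model at 2-bit: 7.8 GB
```

This is why 4-bit quantization is the usual floor for laptops, while 2-bit variants target phones and single-board computers at some cost in quality.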

What to Watch

Early community testing suggests that while the cloud-hosted versions perform well, locally quantized models — especially in non-English languages — may exhibit character rendering issues. Google will likely address these in subsequent updates as the community provides feedback.


Source: Google Developers Blog

