Google DeepMind Launches Gemma 4: Open-Source AI That Runs on Your Phone

By AI Bot

Google DeepMind has released Gemma 4, its most advanced open-weight AI model family to date, built on research from Gemini 3 Pro and available under the permissive Apache 2.0 license. The release marks a significant step toward powerful AI that runs without cloud dependencies or per-token fees.

Key Highlights

  • Four model sizes, from an edge-optimized 2B model to a dense 31B variant
  • Apache 2.0 license — fully open for commercial use at no cost
  • Runs offline on smartphones, laptops, Raspberry Pi, and browsers
  • Agentic capabilities with multi-step planning and autonomous tool use
  • 140+ languages supported across all model variants

The Model Family

Gemma 4 comes in four variants designed for different deployment scenarios:

  • Gemma 4 E2B (2 billion effective parameters) — runs in under 1.5 GB of memory, optimized for smartphones and IoT devices
  • Gemma 4 E4B (4 billion effective parameters) — enhanced edge model with audio-visual processing
  • Gemma 4 26B — a Mixture of Experts (MoE) architecture that ranked 6th on Arena AI's text leaderboard
  • Gemma 4 31B — a dense model that ranked 3rd on the same leaderboard, outperforming models 20 times its size

The smaller E2B and E4B models can process audio inputs and understand speech, while all four variants handle video and image inputs.

Agentic AI at the Edge

What sets Gemma 4 apart is its focus on agentic workflows — the kind of multi-step autonomous reasoning that enterprises are increasingly building into their operations. The models support tool calling, function execution, constrained decoding for structured outputs, and dynamic context lengths up to 128K tokens.
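The tool-calling loop described above can be sketched in a few lines. This is a minimal illustration, not Gemma 4's actual API: the model call is stubbed out with a `fake_model` function, and the `get_weather` tool and message format are hypothetical. A real deployment would swap the stub for an on-device inference call that emits structured (constrained-decoded) tool calls.

```python
# Minimal sketch of an agentic tool-calling loop. `fake_model`,
# `get_weather`, and the message schema are illustrative stand-ins,
# not part of any real Gemma 4 API.

def get_weather(city: str) -> str:
    """Illustrative tool: returns a canned weather report."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Stand-in for the model: emits a structured tool call on the first
    turn, then a final answer once the tool result is in context."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": {"city": "Paris"}}}
    tool_result = [m for m in messages if m["role"] == "tool"][-1]
    return {"content": f"Weather report: {tool_result['content']}"}

def run_agent(prompt, model=fake_model, max_steps=5):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # final answer, loop ends
        # Execute the requested tool and feed the result back in.
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("What's the weather in Paris?"))
# -> Weather report: Sunny in Paris
```

The same loop structure applies regardless of which tools are registered; constrained decoding is what guarantees the model's tool calls parse as valid JSON in the first place.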

On a Raspberry Pi 5, the model achieves 133 tokens per second for prefill and 7.6 tokens per second for decode. It can process 4,000 input tokens across 2 distinct skills in under 3 seconds, making real-time agentic tasks feasible on consumer hardware.
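For a rough sense of what those throughput figures mean in practice, end-to-end latency can be approximated as prompt ingestion time plus generation time. This is a back-of-the-envelope model, not a benchmark; the 500-token prompt and 50-token reply below are made-up example values.

```python
# Rough latency model for on-device inference, using the Raspberry Pi 5
# throughput figures quoted above. Ignores scheduling and I/O overhead.

def estimated_latency_s(prompt_tokens, output_tokens,
                        prefill_tps=133.0, decode_tps=7.6):
    """Time to ingest the prompt plus time to generate the reply."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Example: a 500-token prompt and a 50-token reply.
print(f"{estimated_latency_s(500, 50):.1f} s")  # -> 10.3 s
```

The asymmetry is typical of edge inference: prefill is fast and parallel, while decode is sequential, so short outputs matter more than short prompts.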

Platform Support

Gemma 4 runs across a remarkably broad set of platforms: Android, iOS, Windows, Linux, macOS, WebGPU-enabled browsers, Raspberry Pi 5, and Qualcomm IQ8 NPU. Google has released the AI Edge Gallery app for iOS and Android, allowing users to download and run models directly on their devices.

NVIDIA has also announced optimizations for running Gemma 4 locally on RTX GPUs, with one developer reporting 188 tokens per second on an RTX 5090 using the MoE variant.

Why It Matters

The Apache 2.0 licensing is a major departure from previous Gemma releases, which used a more restrictive proprietary license. This puts Gemma 4 in direct competition with Meta's Llama and Alibaba's Qwen families in the open-weight model space.

Model weights are available through Hugging Face, Kaggle, and Ollama, with support for 2-bit and 4-bit weight quantization to fit on memory-constrained devices.
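The memory savings from quantization follow directly from bits per weight. The sketch below computes the approximate weight footprint only; it ignores activation memory, the KV cache, and runtime overhead, so real requirements will be somewhat higher.

```python
# Approximate weight-memory footprint at different quantization levels.
# Weights only: activations, KV cache, and runtime overhead are excluded.

def weight_memory_gb(params_billion, bits_per_weight):
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

for bits in (16, 4, 2):
    print(f"31B model at {bits}-bit: {weight_memory_gb(31, bits):.1f} GB")
# -> 31B model at 16-bit: 62.0 GB
# -> 31B model at 4-bit: 15.5 GB
# -> 31B model at 2-bit: 7.8 GB
```

This is why 4-bit quantization is the usual floor for laptops, while 2-bit variants target phones and single-board computers at some cost in quality.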

What to Watch

Early community testing suggests that while the cloud-hosted versions perform well, locally quantized models — especially in non-English languages — may exhibit character rendering issues. Google will likely address these in subsequent updates as the community provides feedback.


Source: Google Developers Blog

