Google has officially rolled out Gemma 3n, its latest on-device AI model, first teased back in May 2025. What makes this launch exciting is that Gemma 3n brings full-scale multimodal processing (audio, video, image, and text) straight to smartphones and edge devices, all without needing constant internet or heavy cloud support. It’s a big step forward for developers looking to bring powerful AI features to low-power devices running on limited memory.
At the core of Gemma 3n is a new architecture called MatFormer, short for Matryoshka Transformer. Think Russian nesting dolls: smaller, fully functional models tucked inside bigger ones. This clever setup lets developers scale AI performance based on the device’s capability. You get two versions: E2B runs on just 2GB of RAM, and E4B works with around 3GB.
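The nesting idea can be illustrated with a toy sketch. This is a conceptual illustration only, not Google’s actual MatFormer code: the point is simply that the smaller model’s weights are a prefix slice of the larger model’s, so one set of trained weights yields two deployable sizes.

```python
# Toy illustration of Matryoshka-style nesting (not the real MatFormer code):
# a smaller model is carved out of a larger one by slicing its hidden units.

def relu(v):
    return max(0.0, v)

def ffn_hidden(x, weights):
    """One feed-forward hidden layer: each row of `weights` is a hidden unit."""
    return [relu(sum(xi * wij for xi, wij in zip(x, unit))) for unit in weights]

# "E4B"-style full layer with 4 hidden units...
full_weights = [[0.5, 0.25], [0.125, 0.5], [-0.5, 0.75], [0.25, 0.25]]
# ...and an "E2B"-style nested layer: just the first 2 units of the same weights.
small_weights = full_weights[:2]

x = [1.0, 2.0]
print(len(ffn_hidden(x, full_weights)))   # 4 hidden activations
print(len(ffn_hidden(x, small_weights)))  # 2 hidden activations, same trained weights
```

Because the small layer is a literal slice of the big one, its activations match the first activations of the full layer, which is what lets one checkpoint serve both memory budgets.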
Despite packing 5 to 8 billion raw parameters, both versions behave like much smaller models when it comes to resource use. That’s thanks to smart design choices like Per-Layer Embeddings (PLE), which shift some of the load from the GPU to the CPU, helping save memory. It also features KV Cache Sharing, which speeds up processing of long audio and video inputs by nearly 2x, perfect for real-time use cases like voice assistants and mobile video analysis.
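The benefit of cache reuse can be sketched with a toy cache. This is a conceptual stand-in, not the actual attention kernels: keys and values for a long audio/video prefix get computed once and are then reused on every later decoding step instead of being recomputed.

```python
# Conceptual sketch of KV-cache reuse (a stand-in, not Gemma 3n's kernels):
# expensive key/value computation for a long prefix happens only once.

class ToyAttentionCache:
    def __init__(self):
        self.kv = {}        # position -> (key, value)
        self.computes = 0   # counts how many KV projections were actually run

    def get_kv(self, pos, token):
        if pos not in self.kv:
            self.computes += 1
            # stand-in for the real key/value projection matrices
            self.kv[pos] = (hash(token) % 97, hash(token) % 89)
        return self.kv[pos]

prefix = ["frame0", "frame1", "frame2", "frame3"]  # a long audio/video prefix
cache = ToyAttentionCache()

# First decoding step: the prefix KV is computed once.
for i, t in enumerate(prefix):
    cache.get_kv(i, t)
# Later decoding steps: the same prefix KV is reused, no recomputation.
for i, t in enumerate(prefix):
    cache.get_kv(i, t)

print(cache.computes)  # 4, not 8
```

In the real model the savings compound over every generated token, which is where the roughly 2x speedup on long prefixes comes from.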
Gemma 3n isn’t just light on memory; it’s stacked with serious capabilities. For speech-based features, it uses an audio encoder adapted from Google’s Universal Speech Model, which means it can handle speech-to-text and even language translation directly on your phone. It’s already showing solid results, especially when translating between English and European languages like Spanish, French, Italian, and Portuguese.
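As a rough sketch, here is one plausible way to phrase an on-device speech-translation request using the Hugging Face multimodal chat-message convention. The exact content schema Gemma 3n expects should be checked against its model card, and the file name here is purely illustrative.

```python
# Hedged sketch: a speech-to-text + translation request in the Hugging Face
# multimodal chat-message style. Verify the exact schema on the model card.

def build_translation_request(audio_path, target_language):
    """Pair an audio clip with a text instruction in a single user turn."""
    return [{
        "role": "user",
        "content": [
            {"type": "audio", "audio": audio_path},
            {"type": "text",
             "text": f"Transcribe this audio, then translate it into {target_language}."},
        ],
    }]

messages = build_translation_request("meeting_clip.wav", "Spanish")
print(messages[0]["role"])  # user
```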
On the visual front, it’s powered by Google’s new MobileNet-V5—a lightweight but powerful vision encoder that can process video at up to 60fps on phones like the Pixel. That means smooth, real-time video analysis without breaking a sweat. And it’s not just fast—it’s also more accurate than older models.
Developers can plug into Gemma 3n using popular tools like Hugging Face Transformers, Ollama, MLX, llama.cpp, and more. Google’s also kicked off the Gemma 3n Impact Challenge, offering a $150,000 prize pool for apps that showcase the model’s offline magic.
The best part? Gemma 3n runs entirely offline. No cloud, no connection, just pure on-device AI. With support for over 140 languages and the ability to understand content in 35 of them, it’s a game-changer for building AI apps where connectivity is patchy or privacy is a priority.
“We’re fully releasing Gemma 3n, which brings powerful multimodal AI capabilities to edge devices,” Google DeepMind announced on June 26, 2025.
Want to try Gemma 3n for yourself? Here’s how you can get started:
- Experiment instantly – Head over to Google AI Studio, where you can play around with Gemma 3n in just a few clicks. You can even deploy it directly to Cloud Run from there.
- Download the model – Prefer working locally? You’ll find the model weights available on Hugging Face and Kaggle.
- Dive into the docs – Google’s got solid documentation to help you integrate Gemma into your workflow. Start with inference, fine-tuning, or build from scratch.
- Use your favorite tools – Whether you're into Ollama, MLX, llama.cpp, Docker, transformers.js, or Google's AI Edge Gallery—Gemma 3n fits right in.
- Bring your own dev stack – Already using Hugging Face Transformers, TRL, NVIDIA NeMo, Unsloth, or LMStudio? You’re covered.
- Deploy it your way – Push to production with options like Google GenAI API, Vertex AI, SGLang, vLLM, or even the NVIDIA API Catalog.
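Tying the steps above together, here is a hedged sketch of running Gemma 3n locally with Hugging Face Transformers. The model ID `google/gemma-3n-E2B-it` follows the release naming but should be verified on Hugging Face, and the weights are a multi-gigabyte, license-gated download, so the actual model call is left commented out.

```python
# Hedged sketch of local text inference with Hugging Face transformers.
# Assumptions to verify: the model ID ("google/gemma-3n-E2B-it") and that
# your installed transformers version is recent enough to support Gemma 3n.

def build_chat(question):
    """Wrap a plain question in the chat-message format the pipeline accepts."""
    return [{"role": "user", "content": [{"type": "text", "text": question}]}]

def run_local_inference(question):
    """Heavy path: downloads the weights on first use (license acceptance required)."""
    from transformers import pipeline  # pip install -U transformers accelerate
    pipe = pipeline("text-generation", model="google/gemma-3n-E2B-it")
    return pipe(build_chat(question), max_new_tokens=96)[0]["generated_text"]

# Example (commented out because it triggers a multi-gigabyte model download):
# print(run_local_inference("Summarize what makes Gemma 3n efficient."))
```

For a lighter first try, the same model can be pulled through Ollama or tested in Google AI Studio without any local download at all.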