At the edge, our E2B and E4B models redefine on-device utility, prioritizing multimodal functionality, low-latency processing, and seamless ecosystem integration over raw parameter count.
Powerful, Accessible, and Open
To power the next generation of pioneering research and products, we designed Gemma 4 model sizes to run and fine-tune efficiently on widely available hardware: from the world's billions of Android devices and laptop GPUs to developer workstations and accelerators.
You can fine-tune these highly optimized models to achieve state-of-the-art performance on specific tasks, and we have already seen remarkable success with this approach. For example, INSAIT created the pioneering Bulgarian-first language model BgGPT, and we collaborated with Yale University on Cell2Sentence-Scale to discover new pathways for cancer treatment and more.
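As a rough illustration of that workflow, a parameter-efficient fine-tune with LoRA might look like the sketch below. This is a minimal sketch under assumptions: the model identifier is a placeholder rather than an official release name, and the training loop itself is left to your preferred trainer.

```python
# Hypothetical sketch: parameter-efficient fine-tuning of a Gemma-style
# checkpoint with LoRA. The model id below is a placeholder, not an
# official release name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-4-4b-it"  # placeholder id (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA trains a small set of low-rank adapter matrices instead of all
# weights, which is what makes single-GPU fine-tuning practical.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# From here, train with your usual loop or a library trainer on a
# task-specific dataset (e.g. domain text or instruction pairs).
```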
Here’s why Gemma 4 is the most capable open model family to date.
Advanced reasoning: Capable of multi-step planning and deep logic, Gemma 4 shows significant improvements on math and instruction-following benchmarks that demand it.
Agent workflows: With native support for function calling, structured JSON output, and system instructions, you can build autonomous agents that interact with a variety of tools and APIs to reliably execute your workflows (see the sketch after this list).
Code generation: Gemma 4 delivers high-quality code offline, turning your workstation into a local-first AI coding assistant.
Vision and audio: All models process images and video natively, support variable resolutions, and excel at visual tasks like OCR and chart understanding. Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.
Longer context: Handle long-form content seamlessly. The edge models feature a 128K context window, and the larger models offer up to 256K, so you can pass entire repositories and long documents in a single prompt.
140+ languages: Natively trained on 140+ languages, Gemma 4 helps developers build comprehensive, high-performance applications for users around the world.
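To make the agent-workflow point concrete, here is a minimal sketch of the round trip involved: describe a tool as a JSON schema, have the model emit a structured JSON call, then parse and dispatch it. The tool name, schema, and reply shown are illustrative assumptions, not an official API; the model's reply is stubbed so the snippet runs on its own.

```python
# Hypothetical sketch of an agent-style tool call with structured JSON output.
import json

# A tool the agent is allowed to call, described as a JSON schema.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(tool_call: dict) -> str:
    """Route a parsed tool call to real code (stubbed here)."""
    if tool_call["name"] == "get_weather":
        return f"Sunny in {tool_call['arguments']['city']}"
    raise ValueError(f"Unknown tool: {tool_call['name']}")

# Suppose the model, given the tool schema and a user question, replied with:
model_reply = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'

tool_call = json.loads(model_reply)   # structured JSON output -> Python dict
print(dispatch(tool_call))            # feed the result back for the next turn
```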
Versatile models for a wide range of hardware
We release Gemma 4 model weights sized for specific hardware and use cases, so you get frontier-class inference wherever you need it.
26B and 31B models: Frontier Intelligence, offline on a personal computer
Optimized to give researchers and developers state-of-the-art inference on accessible hardware, the unquantized bfloat16 weights fit efficiently on a single 80GB NVIDIA H100 GPU. For local setups, the quantized versions run natively on consumer GPUs, powering IDEs, coding assistants, and agent workflows. The 26 billion parameter Mixture of Experts (MoE) model is built for latency, activating only 3.8 billion of its total parameters during inference to deliver extremely fast tokens per second, while the 31 billion parameter dense model maximizes raw quality and provides a strong foundation for fine-tuning.
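For rough sizing: 26B parameters in bfloat16 is about 26e9 × 2 bytes ≈ 52 GB of weights (and ~62 GB for 31B), which is why the unquantized checkpoints fit on a single 80GB H100, while 4-bit quantization brings the 26B model to roughly 13 GB of weights plus overhead, within reach of consumer GPUs. The sketch below shows one way to load a quantized checkpoint locally; it assumes the Hugging Face transformers and bitsandbytes stack, and the model id is a placeholder rather than an official quantized release.

```python
# Hypothetical sketch: loading a 4-bit quantized checkpoint on a consumer GPU.
# The model id is a placeholder; bitsandbytes 4-bit is one common option,
# not necessarily the official quantized release format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-4-26b-it"  # placeholder id (assumption)

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store weights in 4-bit, compute in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s), spilling to CPU if needed
)

prompt = "Explain mixture-of-experts routing in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```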

