Google CEO Sundar Pichai announced the launch of Gemini 2.0, a model that represents the next step in Google’s ambitions to revolutionize AI.
One year after the introduction of Gemini 1.0, this major upgrade brings enhanced multimodality, new agentic capabilities, and innovative user tools designed to push the boundaries of AI-driven technology.
Leap to revolutionary AI
Reflecting on Google’s 26-year mission to organize the world’s information and make it accessible, Pichai said, “If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful.”
Gemini 1.0, released in December 2023, was notable as Google’s first natively multimodal AI model. The first iteration excelled at understanding and processing text, video, images, audio, and code. Its enhanced 1.5 version was widely adopted by developers for its long-context understanding, enabling productivity-focused applications such as NotebookLM.
With Gemini 2.0, Google now aims to accelerate the role of AI as a universal assistant with native image and voice generation, better reasoning and planning, and real-world decision-making capabilities. In Pichai’s words, this development represents the beginning of the “age of agency.”
“We have invested in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision,” Pichai explained.
Gemini 2.0: Core features and availability
At the heart of today’s announcement is the experimental release of Gemini 2.0 Flash, the second-generation flagship model. It builds on the foundation laid by its predecessor to deliver faster response times and improved performance.
Gemini 2.0 Flash supports multimodal input and output, including natively generated images combined with text and steerable, multilingual text-to-speech audio. Users also benefit from native tool use, with built-in support for Google Search and third-party user-defined functions.
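To make the tool-use idea concrete, here is a minimal sketch of how a request body combining the native Google Search tool with a user-defined function might look. The field names follow the public v1beta REST schema as best understood, and the `get_stock_price` function is purely hypothetical for illustration:

```python
# Illustrative request body for a Gemini generateContent call that attaches
# both the native Google Search tool and a hypothetical third-party function.
# Field names are assumptions based on the public REST schema.

def build_request(prompt: str) -> dict:
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [
            {"googleSearch": {}},  # native tool: grounding via Google Search
            {
                # hypothetical user-defined function the model may choose to call
                "functionDeclarations": [
                    {
                        "name": "get_stock_price",
                        "description": "Look up the latest price for a ticker symbol.",
                        "parameters": {
                            "type": "OBJECT",
                            "properties": {"ticker": {"type": "STRING"}},
                            "required": ["ticker"],
                        },
                    }
                ]
            },
        ],
    }

body = build_request("What moved GOOG today?")
print(len(body["tools"]))
```

When the model decides to use the declared function, the response contains a `functionCall` part that the caller executes before sending the result back in a follow-up turn.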
Developers and enterprises can now access Gemini 2.0 Flash through Google AI Studio and the Gemini API in Vertex AI. Wider availability, along with more model sizes, will follow in January 2025.
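For developers curious what a call to the experimental model looks like over the raw REST API, the sketch below builds and (only if an API key is present) sends a minimal `generateContent` request. The model ID `gemini-2.0-flash-exp`, the endpoint path, and the `GEMINI_API_KEY` environment variable are all assumptions based on the public `generativelanguage` API conventions, not an official quickstart:

```python
import json
import os
import urllib.request

# Assumed model ID and endpoint path for the experimental Flash release.
MODEL = "gemini-2.0-flash-exp"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def generate(prompt: str, api_key: str) -> dict:
    """POST a minimal generateContent request and return the parsed JSON."""
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # assumed env var name
    if key:
        reply = generate("Summarize Gemini 2.0 in one sentence.", key)
        print(reply["candidates"][0]["content"]["parts"][0]["text"])
    else:
        print("Set GEMINI_API_KEY to run this sketch.")
```

Google’s official SDKs wrap this same endpoint, so production code would typically go through Google AI Studio’s client libraries or Vertex AI rather than hand-built requests.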
To enable global accessibility, the Gemini app features a chat-optimized version of the 2.0 Flash experimental model. Early adopters can try this updated assistant on desktop and mobile web, with the mobile app coming soon.
Products like Google Search are also being enhanced with Gemini 2.0 to handle complex queries such as advanced math problems, coding inquiries, and multimodal questions.
Comprehensive suite of AI innovations
The release of Gemini 2.0 comes with exciting new tools to showcase its capabilities.
One such feature, Deep Research, acts as an AI research assistant, simplifying the investigation of complex topics by compiling information into comprehensive reports. Another upgrade powers Search with Gemini-enabled AI summaries that can tackle complex, multi-step user queries.
The model was trained on Trillium, Google’s sixth-generation Tensor Processing Unit (TPU), which Pichai said “powered 100% of Gemini 2.0’s training and inference.”
Trillium is now available to external developers, who can benefit from the same infrastructure that supports Google’s own advances.
Pioneering agent experience
Gemini 2.0 comes with experimental “agent” prototypes built to explore the future of human-AI collaboration, including:
Project Astra: The universal AI assistant
First introduced at I/O earlier this year, Project Astra leverages Gemini 2.0’s multimodal understanding to improve real-world AI interactions. Trusted testers have tried the assistant on Android, providing feedback that helped improve its multilingual dialogue, memory retention, and integration with Google tools such as Search, Lens, and Maps. Astra has also achieved near human-like conversational latency, and research is underway into applications for wearable technology, such as prototype AI glasses.
Project Mariner: Redefining web automation
Project Mariner is an experimental web-browsing assistant that uses Gemini 2.0 to reason about interactive elements in the browser, such as text, images, and forms. In initial testing, Mariner achieved an 83.5% success rate in completing end-to-end web tasks on the WebVoyager benchmark. Early testers using a Chrome extension are helping refine Mariner’s capabilities while Google evaluates safety measures to ensure the technology is user-friendly and secure.
Jules: Coding agent for developers
Jules is an AI-powered assistant for developers that integrates directly into GitHub workflows to tackle coding tasks. It can autonomously propose solutions, generate plans, and execute code-based tasks, all under human supervision. This experimental effort is part of Google’s long-term goal of building versatile AI agents across a variety of domains.
Game applications and beyond
To extend Gemini 2.0’s reach into virtual environments, Google DeepMind is working with gaming partners such as Supercell to develop intelligent game agents. These experimental AI companions can interpret game actions in real time, suggest strategies, and access broader knowledge through Search. Research is also underway into how Gemini 2.0’s spatial reasoning could support robotics, opening the door to future applications in the physical world.
Commitment to responsibility in AI development
As AI capabilities expand, Google emphasizes the importance of prioritizing safety and ethical considerations.
Google claims that Gemini 2.0 has undergone extensive risk assessment, enhanced by oversight from its Responsibility and Safety Committee, to mitigate potential risks. Additionally, the model’s built-in reasoning capabilities enable advanced “red teaming,” allowing developers to evaluate security scenarios and optimize safety measures at scale.
Google is also considering safeguards to address user privacy, prevent abuse, and ensure the trustworthiness of its AI agents. For example, Project Mariner is designed to prioritize user instructions while resisting malicious prompt injection and preventing threats such as phishing and fraudulent transactions. Meanwhile, Project Astra’s privacy controls allow users to easily manage session data and deletion settings.
Pichai reaffirmed the company’s commitment to responsible development, saying, “We strongly believe that the only way to build AI is to be responsible from the beginning.”
With the Gemini 2.0 Flash release, Google is inching closer to its vision of building a universal assistant that can transform interactions across domains.
Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo, taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.
Learn about other upcoming enterprise technology events and webinars from TechForge here.