Google's AI can now surf the web for you, click on buttons, and fill out forms with Gemini 2.5 Computer Use

Leading providers of large language models (LLMs) are increasingly extending them into “agents” that can take actions on users’ behalf across websites. Notable examples include OpenAI’s ChatGPT Agent and Anthropic’s Computer Use, both released recently.

Google has now entered the field with Gemini 2.5 Computer Use, a model built by Google DeepMind on top of Gemini 2.5 Pro. It can operate a virtual browser to gather information, fill out forms, and take other actions in response to text prompts. Google CEO Sundar Pichai called the model a significant step toward general-purpose agents.

Gemini 2.5 Computer Use is not yet available directly to consumers. Instead, Google has partnered with Browserbase, a company that provides virtual “headless” web browsers built for AI applications. On the Browserbase platform, users can try the model and compare its performance against offerings from OpenAI and Anthropic.

For AI developers, the model is available through the Gemini API in Google AI Studio, enabling rapid prototyping and application development. It builds on Gemini 2.5 Pro, which was released in March 2025 and has since received multiple updates, and is specialized for interacting with user interfaces.

Initial hands-on tests showed that Gemini 2.5 Computer Use can automate web-browsing tasks, though with limitations: unlike some competitors, it cannot create files directly. The model posts competitive results against other AI systems on benchmark tests, and it completes tasks autonomously, drawing on a set of supported UI actions such as clicking, typing, and scrolling.
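Agents of this kind generally work in an observe-act loop: the client sends the model the user’s goal plus a screenshot, the model proposes one UI action, the client executes it in the browser and captures a fresh screenshot, and the cycle repeats until the model signals it is finished. The sketch below illustrates that general pattern under stated assumptions. `Action`, `FakeModel`, `FakeBrowser`, and `run_agent` are hypothetical stand-ins, not the Gemini API or the Browserbase SDK, and the action names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical schema)."""
    name: str                                  # e.g. "navigate", "click_at", "type_text_at", "done"
    args: dict = field(default_factory=dict)


class FakeBrowser:
    """Hypothetical stand-in for a headless browser session; records what it was asked to do."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.log: list[str] = []

    def screenshot(self) -> str:
        # A real client would return image bytes; a string keeps the sketch self-contained.
        return f"screenshot of {self.url}"

    def execute(self, action: Action) -> None:
        if action.name == "navigate":
            self.url = action.args["url"]
        self.log.append(action.name)


class FakeModel:
    """Hypothetical stand-in for the model: replays a fixed plan, one action per turn."""

    def __init__(self, plan: list[Action]) -> None:
        self.plan = list(plan)

    def next_action(self, goal: str, screenshot: str, history: list[Action]) -> Action:
        # A real model would reason over the goal, the screenshot, and the action history.
        if len(history) < len(self.plan):
            return self.plan[len(history)]
        return Action("done")


def run_agent(goal: str, model: FakeModel, browser: FakeBrowser, max_steps: int = 10) -> list[Action]:
    """Observe-act loop: screenshot -> model proposes an action -> client executes it."""
    history: list[Action] = []
    for _ in range(max_steps):                 # hard cap so a confused agent cannot loop forever
        action = model.next_action(goal, browser.screenshot(), history)
        if action.name == "done":
            break
        browser.execute(action)
        history.append(action)
    return history


if __name__ == "__main__":
    plan = [
        Action("navigate", {"url": "https://example.com/signup"}),
        Action("type_text_at", {"x": 400, "y": 220, "text": "Ada"}),
        Action("click_at", {"x": 400, "y": 300}),
    ]
    browser = FakeBrowser()
    run_agent("Sign up for the newsletter", FakeModel(plan), browser)
    print(browser.url, browser.log)
```

Keeping the browser on the client side, as in this sketch, is what lets a developer run the loop in any environment (a local browser or a hosted headless one) while the model only ever sees screenshots and returns actions.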

Safety features are built into the model: the actions it proposes can be inspected, and users can be asked to confirm them, to prevent unauthorized tasks. Pricing for Gemini 2.5 Computer Use is similar to that of Gemini 2.5 Pro, with some differences in access, data handling, and usage tiers.
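One way to picture the confirmation safeguard is as a gate between the model’s proposal and its execution: every proposed action is logged for inspection, and actions judged risky run only if a person approves them. This is a hypothetical sketch. The `RISKY_ACTIONS` set, the keyword rule, and the `confirm` callback are assumptions for illustration and do not reflect how Google’s safety checks actually classify actions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical schema)."""
    name: str
    args: dict = field(default_factory=dict)


# Hypothetical policy: action types and text keywords that require human sign-off.
RISKY_ACTIONS = {"submit_payment", "delete_account"}
RISKY_KEYWORDS = ("buy", "purchase", "pay", "delete")


def needs_confirmation(action: Action) -> bool:
    """Return True if this (hypothetical) policy says a human must approve the action."""
    if action.name in RISKY_ACTIONS:
        return True
    text = str(action.args.get("text", "")).lower()
    # Crude substring check; a real policy would be far more careful.
    return any(word in text for word in RISKY_KEYWORDS)


def execute_if_allowed(
    action: Action,
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    audit_log: list[Action],
) -> bool:
    """Record the proposal, ask for confirmation if risky, and execute only if allowed."""
    audit_log.append(action)                   # every proposal is kept for later inspection
    if needs_confirmation(action) and not confirm(action):
        return False                           # blocked: the user declined
    execute(action)
    return True


if __name__ == "__main__":
    log: list[Action] = []
    ran: list[Action] = []
    deny_all = lambda a: False                 # simulate a user who rejects every prompt
    print(execute_if_allowed(Action("click_at", {"x": 10, "y": 20}), ran.append, deny_all, log))
    print(execute_if_allowed(Action("type_text_at", {"text": "Buy now"}), ran.append, deny_all, log))
```

In this design the audit log records blocked proposals as well as executed ones, so a reviewer can see what the agent tried to do, not only what it did.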

Source: https://venturebeat.com/ai/googles-ai-can-now-surf-the-web-for-you-click-on-buttons-and-fill-out-forms
