Researchers at Salesforce and the University of Southern California have introduced a novel technique enabling computer agents to execute code while interacting with graphical user interfaces (GUIs). This system, termed CoAct-1, integrates coding capabilities with cursor manipulation and button clicks, aiming to enhance workflow efficiency and minimize errors.
CoAct-1 aims to overcome the limitations of traditional GUI-based agents, which often struggle in complex workflows because they rely on cumbersome sequences of mouse clicks. By combining coding with visual interface interaction, CoAct-1 is designed, according to the researchers, to make automation more effective. The multi-agent system consists of an Orchestrator, a Programmer, and a GUI Operator, each specializing in different tasks. The Orchestrator delegates tasks and manages the workflow, the Programmer generates and executes scripts, and the GUI Operator handles visual tasks using vision-language models.
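The division of labor among the three agents can be pictured with a minimal Python sketch. It is an illustration only, not the researchers' implementation: the names `Subtask`, `programmer`, `gui_operator`, and `Orchestrator.run` are hypothetical, the workers are stubs that return a description instead of calling a model or touching a real desktop, and routing by a fixed `kind` label is an assumption standing in for whatever decision process the actual Orchestrator uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    """One unit of work handed out by the orchestrator (hypothetical type)."""
    description: str
    kind: str  # "code" for scriptable work, "gui" for visual interaction


def programmer(subtask: Subtask) -> str:
    # Stub: a real Programmer agent would generate and execute a script here.
    return f"[Programmer] script for: {subtask.description}"


def gui_operator(subtask: Subtask) -> str:
    # Stub: a real GUI Operator would drive the interface with a vision-language model.
    return f"[GUI Operator] clicks for: {subtask.description}"


class Orchestrator:
    """Routes each subtask to the worker registered for its kind (assumed design)."""

    def __init__(self, workers: Dict[str, Callable[[Subtask], str]]) -> None:
        self.workers = workers

    def run(self, subtasks: List[Subtask]) -> List[str]:
        log = []
        for subtask in subtasks:
            worker = self.workers.get(subtask.kind)
            if worker is None:
                raise ValueError(f"no worker for kind {subtask.kind!r}")
            log.append(worker(subtask))
        return log


# Usage: a file-management step goes to code, a settings dialog goes to the GUI.
plan = [
    Subtask("rename all .txt files in a folder", "code"),
    Subtask("toggle dark mode in the settings dialog", "gui"),
]
orchestrator = Orchestrator({"code": programmer, "gui": gui_operator})
log = orchestrator.run(plan)
for entry in log:
    print(entry)
```

The point of the sketch is the routing: work that a script can do in one step never passes through the click-by-click path, which is the efficiency argument the researchers make for the hybrid design.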
On the OSWorld benchmark, which comprises 369 real-world tasks, CoAct-1 achieves a success rate of 60.76%, outperforming traditional methods while requiring fewer steps to complete tasks. It excels in particular where programmatic control is advantageous, such as operating system tasks involving file management.
Despite the promising results, deploying CoAct-1 in real-world enterprise environments presents challenges, including compatibility with legacy software and the need for human oversight. Researchers acknowledge concerns regarding the agent’s ability to execute its own code, emphasizing the importance of secure systems with controlled access.
The researchers suggest that while CoAct-1 has significant potential in automating multi-tool processes, full autonomy in critical functions may not be feasible without human validation. Continued training and feedback in simulated environments are deemed necessary to enhance robustness and security before deployment.
This development raises questions about the adaptability of AI systems in diverse environments and the balance between automation and the necessity of human involvement.
Source: https://venturebeat.com/ai/salesforces-new-coact-1-agents-dont-just-point-and-click-they-write-code-to-accomplish-tasks-faster-and-with-greater-success-rates/

