MCP-Universe benchmark shows GPT-5 fails more than half of real-world orchestration tasks
Salesforce Research has released a new benchmark for assessing model and agent performance on real-world enterprise tasks. It measures how effectively AI systems handle the kinds of tasks commonly encountered in business settings. The evaluation covers the GPT-5 model and reveals that it fails to successfully […]










