Amazon’s AI leader, Rohit Prasad, has urged the industry to look past model benchmarks, arguing that real-world application is what actually matters. Speaking ahead of AWS re:Invent in Las Vegas, he said current benchmarking methods do not reflect true model performance because they often lack standardized training data and holdout evaluations.
This perspective contrasts sharply with that of other AI research labs, which frequently tout their models’ leaderboard achievements. Notably, Prasad’s comments come as Amazon’s flagship model, Nova, ranks 79th on LMArena. To help companies train custom AI models affordably, Amazon introduced Nova Forge, which lets businesses build on the company’s Nova model checkpoints from various stages of training.
The service aims to address a common dilemma: companies have typically had to choose between fine-tuning an existing model and building a new one from scratch. With Forge, businesses can incorporate their proprietary data early in training, when the model can absorb it most effectively. Prasad said Forge grew out of requests from Amazon’s internal teams, which wanted effective tools for enhancing existing models without incurring exorbitant costs.
Reddit is among the companies already using Forge, building tailored safety models that leverage its extensive community moderation data. Reddit’s chief technology officer said the platform has significantly empowered the company’s engineers in model development. The goal is to consolidate its various safety models into one with a better understanding of community nuances.
Amazon’s strategy appears to favor specialized AI solutions over competing directly in the general model performance race. Prasad argues that the emphasis should shift from benchmark standings to real-world utility, a case that will ultimately be tested by continued developer adoption.
Source: https://www.theverge.com/column/836902/amazons-ai-benchmarks-dont-matter