Together AI's ATLAS adaptive speculator delivers 400% inference speedup by learning from workloads in real-time

Enterprises deploying AI are running into the limits of static speculators, which struggle to keep up as workloads change. Speculators are smaller AI models that work alongside a large language model during inference. They draft several tokens ahead, and the large model then verifies those drafts in a single pass instead of generating one token at a time. This raises throughput and reduces latency.
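The draft-then-verify loop described above can be sketched as a generic greedy form of speculative decoding. This is not Together AI's implementation, which the article does not describe. The "models" here are hypothetical stand-ins that return their single most likely next token, and the function name `speculative_step` is illustrative only.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: given the tokens so far,
# return the single most likely next token (greedy decoding).
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    """One draft-then-verify round of greedy speculative decoding.

    The draft model proposes k tokens. The target model keeps the longest
    run of drafts it agrees with and then adds one token of its own, so
    each round yields between 1 and k + 1 new tokens.
    """
    # 1. Draft: the small model guesses k tokens one at a time (cheap).
    drafted: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. Verify: the target model checks every drafted position. A real system
    #    does this in one batched forward pass. This loop only shows the logic.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target(ctx)
        if expected != tok:
            # First disagreement: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft was accepted: the same target pass yields one bonus token.
    accepted.append(target(ctx))
    return accepted


# Toy usage: the "target" predicts the current sequence length.
perfect_draft = lambda ctx: len(ctx)
weak_draft = lambda ctx: len(ctx) if len(ctx) < 3 else -1
print(speculative_step(lambda c: len(c), perfect_draft, [0], 4))  # [1, 2, 3, 4, 5]
print(speculative_step(lambda c: len(c), weak_draft, [0], 4))     # [1, 2, 3]
```

With greedy decoding, the output always matches what the target model would have produced on its own. The speculator only changes how many tokens each expensive verification pass yields, so a more accurate speculator means more tokens per pass.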

Together AI has developed a new system named ATLAS (AdapTive-LeArning Speculator System) to address these performance issues. Static speculators are trained once on a fixed dataset. As enterprises scale and their AI workloads shift, those speculators' predictions become less accurate and inference slows down. ATLAS adds self-learning inference optimization instead, which can potentially deliver up to a 400% performance improvement over current inference technologies.
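Falling speculator accuracy slows inference because it directly reduces how many tokens each verification pass produces. A standard back-of-the-envelope model from the speculative decoding literature (Leviathan et al., 2023) makes this concrete. It assumes that each drafted token is accepted independently with probability `alpha` and that the speculator drafts `k` tokens per round. That independence assumption is a simplification, and the numbers below are illustrative rather than measurements of ATLAS.

```python
def expected_tokens_per_round(alpha: float, k: int) -> float:
    """Expected tokens produced per verification round of speculative decoding.

    Assumes each of the k drafted tokens is accepted independently with
    probability alpha (a simplifying assumption). The closed form is
    (1 - alpha**(k + 1)) / (1 - alpha), which equals k + 1 when alpha == 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the bonus token
    return (1 - alpha ** (k + 1)) / (1 - alpha)


# A speculator that drifts from 80% to 50% acceptance on a shifted workload:
print(round(expected_tokens_per_round(0.8, 4), 4))  # 3.3616
print(round(expected_tokens_per_round(0.5, 4), 4))  # 1.9375
```

In this toy model, a drop in acceptance from 0.8 to 0.5 cuts the tokens gained per expensive target-model pass by roughly 40%. This kind of degradation is what an adaptive speculator aims to recover.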

ATLAS uses a dual-speculator architecture that balances stability and adaptation. A static speculator provides consistent baseline performance, while an adaptive speculator learns continuously from real-time data. A confidence-aware controller decides which of the two speculators to use, adjusting its choice to optimize performance.
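The article does not say how the controller measures confidence. One plausible reading can be sketched as follows: track each speculator's recent acceptance rate, and trust the adaptive speculator only once it has enough evidence and clearly outperforms the static baseline. Everything here is an assumption, including the class names `AcceptanceTracker` and `SpeculatorController`, the parameters `min_samples`, `margin` and `window`, and the use of acceptance rate as the confidence signal. The sketch also assumes both speculators' drafts can be scored against the target model's output, for example by occasionally running the adaptive one in shadow mode.

```python
from collections import deque
from typing import Deque


class AcceptanceTracker:
    """Rolling mean of the fraction of drafted tokens the target model accepted."""

    def __init__(self, window: int = 100) -> None:
        # Oldest entries fall off automatically once the window is full.
        self.history: Deque[float] = deque(maxlen=window)

    def record(self, accepted: int, drafted: int) -> None:
        self.history.append(accepted / drafted)

    def rate(self) -> float:
        return sum(self.history) / len(self.history) if self.history else 0.0

    def __len__(self) -> int:
        return len(self.history)


class SpeculatorController:
    """Hypothetical confidence-aware switch between a static and an adaptive speculator."""

    def __init__(self, min_samples: int = 20, margin: float = 0.05, window: int = 100) -> None:
        self.static = AcceptanceTracker(window)
        self.adaptive = AcceptanceTracker(window)
        self.min_samples = min_samples  # evidence required before trusting the adaptive model
        self.margin = margin            # how much better the adaptive model must be

    def choose(self) -> str:
        if len(self.adaptive) < self.min_samples:
            return "static"  # too little evidence: stay on the stable baseline
        if self.adaptive.rate() >= self.static.rate() + self.margin:
            return "adaptive"
        return "static"

    def record(self, which: str, accepted: int, drafted: int) -> None:
        # Assumes drafts from either speculator can be scored (e.g. shadow mode).
        tracker = self.adaptive if which == "adaptive" else self.static
        tracker.record(accepted, drafted)


controller = SpeculatorController(min_samples=3)
for _ in range(3):
    controller.record("static", accepted=2, drafted=4)
    controller.record("adaptive", accepted=3, drafted=4)
print(controller.choose())  # adaptive
```

Requiring both a minimum sample count and a margin keeps the controller from flipping to a freshly updated adaptive model on a few lucky rounds. This matches the stability-versus-adaptation balance the article describes, though the real ATLAS policy may differ.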

In testing, ATLAS reportedly reached inference speeds of up to 500 tokens per second on certain GPUs. These software-only gains are competitive with specialized inference hardware, suggesting that algorithmic improvements may be closing the gap with custom silicon.

ATLAS’s benefits extend to several enterprise scenarios, including reinforcement learning training and evolving workloads, where the adaptive speculator’s ability to keep pace with changing conditions is especially valuable. ATLAS is currently available on Together AI’s platform at no extra cost. It reflects a broader industry shift from static to adaptive AI systems that learn and optimize continuously.

Source: https://venturebeat.com/ai/together-ais-atlas-adaptive-speculator-delivers-400-inference-speedup-by
