Researchers find that retraining only small parts of AI models can cut costs and prevent forgetting

Enterprise efforts to fine-tune large language models (LLMs) can sometimes result in a phenomenon known as “catastrophic forgetting,” where models lose their ability to perform previously learned tasks. A study from the University of Illinois Urbana-Champaign introduces a new training method aimed at addressing this issue, focusing in particular on large multimodal models (LMMs) that generate responses from images, specifically LLaVA and Qwen 2.5-VL.

The researchers propose a strategy that retrains only narrow components of the model rather than the whole network, since full retraining carries significant computational cost. They argue that catastrophic forgetting is not true memory loss but a byproduct of bias drift in the model’s output. Because training new LMMs from scratch is expensive and resource-intensive, more efficient methods for updating existing models are needed.

To investigate the existence and causes of catastrophic forgetting, the researchers designed a set of target tasks for the models and evaluated their performance after fine-tuning. They found that while the models showed initial performance drops on some benchmarks, those capabilities often recovered after subsequent fine-tuning on other specialized tasks. Notably, tuning only specific layers, such as the self-attention projection layers, produced promising results: strong performance on the target tasks without significant accuracy drops on held-out tasks.
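The layer-selective tuning described above can be sketched in PyTorch by freezing every parameter and then re-enabling gradients only on the self-attention projections. This is a minimal illustration, not the authors' code: the tiny model and the parameter names (`q_proj`, `k_proj`, `v_proj`, `o_proj`) are assumptions modeled on common transformer implementations, not the actual LLaVA or Qwen 2.5-VL module names.

```python
# Sketch of layer-selective fine-tuning: train only the self-attention
# projection layers and freeze everything else. Module names here are
# illustrative, not the real LLaVA / Qwen 2.5-VL parameter names.
import torch.nn as nn

class TinyBlock(nn.Module):
    """A stand-in transformer block with named submodules."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.attn = nn.ModuleDict({
            "q_proj": nn.Linear(dim, dim),
            "k_proj": nn.Linear(dim, dim),
            "v_proj": nn.Linear(dim, dim),
            "o_proj": nn.Linear(dim, dim),
        })
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

def freeze_except_attn_proj(model: nn.Module) -> list[str]:
    """Disable gradients everywhere, then re-enable them only on
    parameters whose names mark them as attention projections."""
    keys = ("q_proj", "k_proj", "v_proj", "o_proj")
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in keys)
        if param.requires_grad:
            trainable.append(name)
    return trainable

model = TinyBlock()
trainable = freeze_except_attn_proj(model)
# The MLP stays frozen; only the attention projections receive updates.
```

An optimizer built over `(p for p in model.parameters() if p.requires_grad)` would then update only the projection layers, which is what keeps the compute cost of the update small relative to full retraining.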

The researchers concluded that what appears to be forgetting could actually stem from shifts in task distribution rather than permanent knowledge loss. Their approach of narrowly retraining specific segments of the model allows for cost efficiency and better management of output drift.

While the study’s findings are focused on the two models under investigation, the researchers suggest that these methods could potentially be applied to other LLMs as well, expanding the implications of their work beyond the current scope.

Source: https://venturebeat.com/ai/researchers-find-that-retraining-only-small-parts-of-ai-models-can-cut-costs