Researchers from the University of Washington and the Allen Institute for AI have introduced a new approach to fine-tuning LLMs. The study, led by Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, and Noah A. Smith, introduces a concept known as “proxy-tuning,” a method that promises to make the adaptation of large pretrained language models far more efficient.
Traditionally, large language models like GPT and BERT have required extensive resources for fine-tuning to meet specific needs or to enhance their performance. This process often poses a challenge, especially when model weights are inaccessible or resource constraints are a concern.
The team’s research addresses this gap by presenting a resource-efficient alternative that maintains, and in some cases enhances, the efficacy of these models. This is where proxy-tuning comes into play.
Proxy-tuning is a lightweight, decoding-time algorithm that works in conjunction with black-box language models, which are typically large-scale and pretrained. The core of the technique is to tune a smaller language model, then apply the difference between the predictions of the tuned and untuned small models to the large model's output at decoding time.
This adjustment effectively shifts the predictions of the base model toward the desired tuning goal. The beauty of this method lies in its ability to leverage the advantages of larger, more comprehensive models without directly modifying them.
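The logit arithmetic described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes all three models share a vocabulary, and uses hypothetical toy logit values for a four-token vocabulary.

```python
import numpy as np

def proxy_tuned_logits(base, small_tuned, small_untuned):
    """Shift the base model's next-token logits by the difference
    between a tuned and an untuned small model (proxy-tuning sketch)."""
    return base + (small_tuned - small_untuned)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical next-token logits over a toy 4-token vocabulary.
base = np.array([2.0, 1.0, 0.5, 0.2])           # large base model
small_tuned = np.array([0.5, 2.5, 0.3, 0.1])    # small model after tuning
small_untuned = np.array([1.5, 0.5, 0.4, 0.2])  # small model before tuning

probs = softmax(proxy_tuned_logits(base, small_tuned, small_untuned))
```

In this toy example, the base model alone would favor token 0, but the small model's tuning raised its preference for token 1, and that shift carries over to the combined distribution without ever touching the base model's weights.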
The effectiveness of proxy-tuning is underscored by its application to Llama2-70B, a prominent language model. Using a proxy model of only 7B parameters, the researchers closed 88% of the performance gap between the standard and fully tuned versions of Llama2-70B.
This was achieved across various benchmarks covering knowledge, reasoning, and safety. Notably, on TruthfulQA, a benchmark that assesses the truthfulness of model responses, the proxy-tuned models outperformed their directly tuned counterparts, suggesting that this method may better preserve factual accuracy.
The implications of this study extend beyond these initial experiments. The researchers have also demonstrated the versatility of proxy-tuning in other domains, including adapting models to code and task-specific tuning for question answering and mathematical problems.
This flexibility points to the method's broader potential as it is applied at ever larger scales.