October 11, 2024

One Initialization to Rule Them All: Fine-tuning LLMs via Explained Variance Adaptation

In the fast-moving field of artificial intelligence (AI), large language models (LLMs) play an increasingly important role. Pre-trained on massive datasets, these models can handle a wide variety of tasks, from generating text to translating languages. To perform well in a specific application, however, they usually need to be fine-tuned. A new paper titled "One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation" introduces a fine-tuning method that initializes low-rank adapters using the directions that explain the most variance in the model's activations on the fine-tuning data.

Background

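Full fine-tuning updates all of a pretrained model's weights, which is expensive for models with billions of parameters. Parameter-efficient methods such as Low-Rank Adaptation (LoRA) have therefore become popular. LoRA freezes the pretrained weights and adds a pair of small low-rank matrices to selected layers; only these new matrices are updated during training. In standard LoRA, one matrix of each pair is initialized with random values and the other with zeros, so the initialization carries no information about the downstream data, which can lead to slow convergence and suboptimal results. In addition, every adapted layer typically receives the same rank, which can limit the model's adaptability.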
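To make the baseline concrete, here is a minimal NumPy sketch of a standard LoRA layer. The class name `LoRALinear`, the initialization scale of 0.02, and the toy dimensions are assumptions chosen for illustration; the structure (frozen weight W, trainable matrices A initialized randomly and B initialized to zero, with the update scaled by alpha / r) follows the original LoRA formulation.

```python
import numpy as np


class LoRALinear:
    """Minimal LoRA-adapted linear layer (illustrative sketch, not a library API)."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight  # pretrained weight, kept frozen during fine-tuning
        # Standard LoRA init: A is random Gaussian (std 0.02 is an assumption here),
        # B is zero, so the low-rank update B @ A starts at exactly zero.
        self.A = rng.normal(0.0, 0.02, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T


rng = np.random.default_rng(42)
W = rng.normal(size=(8, 16))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(3, 16))
# Because B is zero, the adapted layer initially reproduces the frozen layer.
print(np.allclose(layer(x), x @ W.T))  # True
```

Training therefore starts from the pretrained model's behavior, but the random A has no relationship to the data the model will be fine-tuned on.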

Explained Variance Adaptation (EVA)

The method presented in the research paper, Explained Variance Adaptation (EVA), aims to overcome the limitations of LoRA through data-driven initialization and an adaptive rank distribution. EVA consists of two main steps:

  • Data-Driven Initialization: EVA first passes mini-batches of the fine-tuning data through the model and computes the singular value decomposition (SVD) of the activation vectors that feed into each adapted layer. The leading right singular vectors, which capture the directions along which those activations vary most, are then used to initialize the LoRA matrices. As a result, the initialization reflects the specific data on which the model is being fine-tuned.
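  • Adaptive Rank Distribution: Rather than giving every layer the same rank, EVA redistributes a fixed total rank budget across layers so that the selected components explain as much of the variance in the activations as possible. Layers whose activations need more directions to be described well receive higher ranks, while layers whose variance is concentrated in a few directions receive fewer.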
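The two steps can be sketched together as follows. This is an illustrative NumPy sketch under simplifying assumptions, not the authors' implementation: it computes one full SVD of a single centered activation matrix per layer instead of processing mini-batches incrementally, defines a component's explained variance ratio as σ_i² / Σ_j σ_j², and caps each layer with a hypothetical `max_rank` parameter. The function name `eva_init` and the layer names are also hypothetical. Each returned matrix would serve as a layer's initial LoRA A matrix, with B assumed to start at zero as in standard LoRA.

```python
import numpy as np


def eva_init(activations: dict[str, np.ndarray], rank: int, max_rank: int) -> dict[str, np.ndarray]:
    """Sketch of EVA-style initialization (illustrative; names are hypothetical).

    activations: layer name -> activation matrix of shape (n_samples, d_in)
    rank:        average rank per layer; total budget = rank * number of layers
    max_rank:    upper bound on the rank any single layer may receive
    Returns:     layer name -> initial A matrix of shape (allocated_rank, d_in)
    """
    components = {}  # layer -> leading right singular vectors (rows of Vt)
    candidates = []  # (explained variance ratio, layer, component index)
    for name, X in activations.items():
        Xc = X - X.mean(axis=0)  # center so squared singular values track variance
        _, s, vt = np.linalg.svd(Xc, full_matrices=False)
        ratios = s**2 / np.sum(s**2)  # explained variance ratio of each direction
        k = min(max_rank, len(s))
        components[name] = vt[:k]
        candidates += [(ratios[i], name, i) for i in range(k)]

    # Global allocation: keep the components with the highest ratios across all layers.
    budget = rank * len(activations)
    candidates.sort(key=lambda c: c[0], reverse=True)
    allocated = {name: 0 for name in activations}
    for _, name, _ in candidates[:budget]:
        allocated[name] += 1

    # Within a layer, ratios decrease with the index (and the sort is stable),
    # so the selected components are always the leading ones.
    return {name: components[name][: allocated[name]] for name in activations}


rng = np.random.default_rng(0)
low_rank = 0.01 * rng.normal(size=(256, 8))
low_rank[:, :2] += rng.normal(size=(256, 2))  # variance concentrated in two directions
isotropic = rng.normal(size=(256, 8))  # variance spread evenly over all directions

init = eva_init({"layer_a": low_rank, "layer_b": isotropic}, rank=3, max_rank=6)
print({name: A.shape for name, A in init.items()})  # {'layer_a': (2, 8), 'layer_b': (4, 8)}
```

In this toy example, `layer_a` receives only two ranks because two directions already explain nearly all of its variance, while the isotropic `layer_b` receives the remaining four of the six available.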

Results

The researchers evaluated EVA on a range of tasks, including language generation, language understanding, image classification, and reinforcement learning. The results show that EVA achieves faster convergence and higher average scores across multiple tasks compared to other fine-tuning methods, including LoRA.

Significance

EVA is a notable step forward for fine-tuning LLMs. Its data-driven initialization and adaptive rank distribution make adapting models to specific tasks more efficient and effective. This could improve the performance of LLMs in a wide range of applications, from chatbots and language assistants to medical diagnosis and autonomous driving.

Future Research

The authors of the research paper highlight several areas for future research, including:

  • Investigating the effects of EVA on the generalization ability of LLMs.
  • Developing methods for automatically determining the optimal number of ranks for the LoRA matrices.
  • Applying EVA to other fine-tuning techniques, such as prompt-tuning.

Conclusion

Explained Variance Adaptation is a promising new method for fine-tuning LLMs. By combining data-driven initialization with adaptive rank distribution, EVA achieves faster convergence and higher performance. The technique could substantially change how we adapt and deploy LLMs across a wide variety of applications.