
doi: 10.3390/math12233744
Recent studies on parameter-efficient fine-tuning (PEFT) have introduced effective and efficient methods for fine-tuning large language models (LLMs) on downstream tasks using far fewer parameters than full fine-tuning requires. Low-rank adaptation (LoRA) reduces the trainable parameter count to 0.03% of that of full fine-tuning while maintaining satisfactory performance, training only two low-rank matrices per adapted weight. However, limitations remain because few task-specific parameters are involved in training. To mitigate these issues, we propose the Lottery Rank-Pruning Adaptation (LoRPA) method, which draws on the Lottery Ticket Hypothesis to prune less significant parameters based on their magnitudes after an initial training phase. LoRPA first trains with a relatively large rank and then applies pruning to enhance performance in subsequent training with fewer parameters. We conducted experiments comparing LoRPA with LoRA baselines, including a setting with a relatively large rank. Experimental results on the GLUE benchmark with RoBERTa demonstrate that LoRPA achieves comparable results at the base scale while outperforming LoRA with various rank sizes by 0.04% to 0.74% at the large scale across multiple tasks. Additionally, on generative summarization tasks using BART-base on the CNN/DailyMail and XSum datasets, LoRPA outperformed LoRA at the standard rank size, as well as other PEFT methods, on most metrics. These results validate the efficacy of lottery pruning for LoRA on downstream natural-language understanding and generation tasks.
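The pipeline described above (train a LoRA adapter at a large rank, score rank components by magnitude, keep only the strongest, then continue training at the reduced rank) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the scoring rule (Frobenius norm of each rank-1 component of the update ΔW = BA) are assumptions for illustration only.

```python
import numpy as np

def lora_delta(A, B):
    """LoRA weight update: Delta W = B @ A, with A of shape (r, d_in)
    and B of shape (d_out, r)."""
    return B @ A

def prune_lora_rank(A, B, keep_rank):
    """Hypothetical magnitude-based rank pruning: score each rank-1
    component B[:, i] (outer) A[i, :] by its Frobenius norm and keep
    the `keep_rank` highest-scoring components."""
    r = A.shape[0]
    # ||outer(u, v)||_F == ||u|| * ||v||, so score without forming the outer product
    scores = np.array(
        [np.linalg.norm(B[:, i]) * np.linalg.norm(A[i, :]) for i in range(r)]
    )
    keep = np.argsort(scores)[::-1][:keep_rank]  # indices of strongest components
    return A[keep, :], B[:, keep]

# Toy usage: prune a rank-4 adapter down to rank 2, then (in a real run)
# resume training the smaller A2, B2.
rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 4
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
A2, B2 = prune_lora_rank(A, B, keep_rank=2)
print(A2.shape, B2.shape)  # (2, 8) (8, 2)
```

The reduced matrices keep the dominant directions of the large-rank update, so subsequent training starts from the "winning ticket" rather than a fresh low-rank initialization.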
parameter-efficient fine-tuning, QA1-939, deep learning, large language model, transfer learning, low-rank adaptation, Mathematics
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the "Influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 1 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
