
Natural language generation (NLG) tasks have received significant research attention in recent years. The Transformer [27] is now the consensus building block for tackling various NLG tasks. In the literature, there are three main Transformer variants for NLG: the full Transformer, Encoder-Only (using only the encoder part of the Transformer), and Decoder-Only (using only the decoder part). A natural question to ask is: which architecture is the best choice? According to previous studies, when training data is sufficient, the full Transformer is the preferred choice for NLG tasks. However, we find this is not the case when training data is insufficient. In this paper, we report experimental results of applying the three architectures to four different tasks under low-resource settings. In contrast to the conclusion of previous studies, we find no consistent evidence indicating which architecture is best under low-resource settings. Based on these experimental results, we further comment on architecture selection under the low-resource constraint.
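To make the three architectural variants concrete, the following is a minimal illustrative sketch (not from the paper) showing how each can be instantiated with standard PyTorch modules; the hyperparameters (d_model, nhead, num_layers) and dummy tensors are placeholder assumptions.

```python
# Illustrative sketch: the three Transformer variants compared in the abstract,
# built from standard PyTorch modules. Hyperparameters are assumed defaults.
import torch
import torch.nn as nn

d_model, nhead, num_layers = 512, 8, 6

# Full Transformer: encoder-decoder, as in Vaswani et al. [27].
full_transformer = nn.Transformer(
    d_model=d_model, nhead=nhead,
    num_encoder_layers=num_layers, num_decoder_layers=num_layers,
)

# Encoder-Only: a stack of encoder layers with self-attention only.
encoder_only = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead),
    num_layers=num_layers,
)

# Decoder-Only: a stack of decoder layers. Note: nn.TransformerDecoder still
# expects a cross-attention memory tensor; GPT-style decoder-only models drop
# cross-attention entirely, so this is only a rough stand-in for illustration.
decoder_only = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead),
    num_layers=num_layers,
)

# Shape check on dummy token embeddings: (sequence_length, batch, d_model).
src = torch.randn(10, 2, d_model)
tgt = torch.randn(7, 2, d_model)
print(full_transformer(src, tgt).shape)     # torch.Size([7, 2, 512])
print(encoder_only(src).shape)              # torch.Size([10, 2, 512])
print(decoder_only(tgt, memory=src).shape)  # torch.Size([7, 2, 512])
```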
