
Image caption generation is an important research area at the intersection of computer vision and natural language processing. This paper compares two popular Convolutional Neural Network (CNN) architectures, DenseNet201 and ResNet50, as feature extractors for the caption generation task. The study analyzes the impact of these architectures on the quality of the generated captions by comparing their learning curves and Bilingual Evaluation Understudy (BLEU) scores. The results show that the choice of CNN architecture significantly affects the performance of the captioning model. DenseNet201 and ResNet50 yield different learning curves and BLEU scores, indicating that the former is more effective at capturing high-level features, while the latter is better suited to capturing local features. These results will help in developing more accurate and efficient image captioning models.
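
Since BLEU is the quality metric named above, the following is a minimal sketch of what it measures: clipped n-gram precision against one or more reference captions, combined with a brevity penalty. It assumes whitespace-tokenized captions, uniform n-gram weights, and no smoothing. The function names `ngrams` and `sentence_bleu` are illustrative, and nothing here is taken from the paper, which does not state which BLEU implementation or settings were used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights.

    references: list of tokenized reference captions
    candidate:  tokenized generated caption
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical captions, for illustration only.
reference = "a dog runs on the grass".split()
candidate = "a dog runs on grass".split()
score = sentence_bleu([reference], candidate, max_n=2)
```

In this made-up example the candidate omits one word, so it is penalized twice: once through the missing bigram and once through the brevity penalty. Corpus-level BLEU, which is what captioning papers usually report, aggregates the n-gram counts over all captions before taking the ratio rather than averaging per-sentence scores.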
