
Recognizing the attributes of an unknown artwork relies on more than visual information: prior knowledge and emotional context can play a crucial role. Building an AI system that mimics this perception requires a multi-modal model integrating computer vision and contextual factors. In this paper, we propose a new model that uses vision transformers and graph attention networks to learn the visual and contextual features of new artworks and predict their style, genre, and emotion. Contextual features are acquired from an extended version of our ArtGraph knowledge graph, enriched with emotion information from the ArtEmis dataset. Our inductive, end-to-end, multi-task architecture enables real-time execution and resilience to the graph's evolution. Combining computer vision and knowledge graphs could facilitate a deeper understanding of the fine arts, bridging the gap between computer science and the humanities. (The new version of the graph is available at https://doi.org/10.5281/zenodo.8172374, while the code is available at https://github.com/CILAB-ArtGraph/multi-modal-end-to-end-art-classifier.)
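The architecture described above can be pictured as two encoders feeding shared task heads. The following is a minimal, illustrative sketch in PyTorch and PyTorch Geometric, assuming a ViT backbone for the visual branch, GAT layers for the contextual branch, fusion by concatenation, and one linear head per task; all class names, dimensions, and label counts are assumptions for illustration, not the paper's exact implementation.

```python
# Minimal sketch of a multi-modal, multi-task art classifier: a vision
# transformer encodes the artwork image, a graph attention network encodes
# contextual features from the knowledge graph, and the fused representation
# feeds three heads (style, genre, emotion). Illustrative only.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv
from torchvision.models import vit_b_16

class MultiModalArtClassifier(nn.Module):
    def __init__(self, node_dim=128, hidden_dim=256,
                 n_styles=32, n_genres=18, n_emotions=9):
        super().__init__()
        # Visual branch: a ViT backbone whose classification head is
        # replaced so it outputs a feature vector instead of logits.
        self.vit = vit_b_16(weights=None)
        vis_dim = self.vit.heads.head.in_features
        self.vit.heads = nn.Identity()
        # Contextual branch: two GAT layers over the knowledge graph.
        self.gat1 = GATConv(node_dim, hidden_dim, heads=4, concat=False)
        self.gat2 = GATConv(hidden_dim, hidden_dim, heads=4, concat=False)
        # One classification head per task on the fused representation.
        fused_dim = vis_dim + hidden_dim
        self.style_head = nn.Linear(fused_dim, n_styles)
        self.genre_head = nn.Linear(fused_dim, n_genres)
        self.emotion_head = nn.Linear(fused_dim, n_emotions)

    def forward(self, image, x, edge_index, artwork_idx):
        v = self.vit(image)                        # (B, vis_dim)
        h = torch.relu(self.gat1(x, edge_index))   # node embeddings
        h = self.gat2(h, edge_index)               # (N, hidden_dim)
        c = h[artwork_idx]                         # (B, hidden_dim)
        z = torch.cat([v, c], dim=-1)              # fuse both modalities
        return self.style_head(z), self.genre_head(z), self.emotion_head(z)
```

In this sketch the GAT runs over whatever nodes and edges are supplied at inference time rather than relying on precomputed per-node embeddings, which is the property that lets an inductive model tolerate an evolving graph, as the abstract claims for the proposed architecture.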
Knowledge graphs, Computer vision, Deep learning, Emotion recognition, Digital humanities
