References

Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774
Chollet, F. (2015). Keras. GitHub. https://github.com/fchollet/keras
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs]. http://arxiv.org/abs/1810.04805
Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 28(5). https://doi.org/10.18637/jss.v028.i05
Li, K., Zhang, Y., Li, K., Li, Y., & Fu, Y. (2022). Image-Text Embedding Learning via Visual and Textual Semantic Reasoning. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2022.3148470
Li, L. H., Yatskar, M., Yin, D., Hsieh, C.-J., & Chang, K.-W. (2019). VisualBERT: A Simple and Performant Baseline for Vision and Language. arXiv:1908.03557 [cs]. http://arxiv.org/abs/1908.03557
Niu, Y., Lu, Z., Wen, J.-R., Xiang, T., & Chang, S.-F. (2019). Multimodal Multi-Scale Deep Learning for Large-Scale Image Annotation. IEEE Transactions on Image Processing, 28(4), 1720–1731. https://doi.org/10.1109/TIP.2018.2881928
Özdemir, S., & Rauh, C. (2022). A Bird's Eye View: Supranational EU Actors on Twitter. Politics and Governance, 10(1), 133–145. https://doi.org/10.17645/pag.v10i1.4686
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. The Journal of Machine Learning Research, 12, 2825–2830.
Tseng, S.-Y., Narayanan, S., & Georgiou, P. (2021). Multimodal Embeddings From Language Models for Emotion Recognition in the Wild. IEEE Signal Processing Letters, 28, 608–612. https://doi.org/10.1109/LSP.2021.3065598
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T. L., Gugger, S., … Rush, A. M. (2020). HuggingFace's Transformers: State-of-the-art Natural Language Processing. arXiv:1910.03771 [cs]. http://arxiv.org/abs/1910.03771
Wu, P. Y., & Mebane, W. R. (2022). MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks. Computational Communication Research, 4(1). https://doi.org/10.5117/CCR2022.1.008.WU
Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y., & Girshick, R. (2019). Detectron2. https://github.com/facebookresearch/detectron2
Content analysis has always been one of the key methods in communication research, and advances in computational methods often focus on processing vast quantities of text. Yet communication rarely happens via a single modality. For example, the European Union, one of the key political actors in Europe, includes images in about 40% of its tweets (Özdemir & Rauh, 2022). Dictionary-based and shallow learning (SL) methods struggle to incorporate multimodality into the analysis, whereas deep learning (DL) makes it possible to extend content analysis to multimodal materials. Previous studies have demonstrated the flexibility of embeddings for analyzing multimodal data (Li et al., 2022; Niu et al., 2019; Tseng et al., 2021; Wu & Mebane, 2022). In this paper, we evaluate, in a computational experiment, the feasibility of using multimodal DL embeddings to classify political messages delivered through a combination of visual and textual modalities. We build a series of unimodal SL models and multimodal DL embedding-based models to classify manually annotated tweets from European Union (EU) executives, and we compare the classification performance of these models. Our results indicate that multimodal signals are difficult to capture in a way that is meaningful to a classifier. We conclude with recommendations for researchers who would like to use multimodal data in automated content analysis.
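For readers unfamiliar with the embedding-based setup described above, a common baseline is "early fusion": the text and image of each tweet are encoded separately into embedding vectors, the vectors are concatenated, and a standard classifier is trained on the fused features. The sketch below uses random placeholder vectors in place of real embeddings (e.g. BERT outputs for the tweet text, CNN pooled features for the attached image) and a logistic regression classifier; the dimensions, label set, and classifier choice are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_tweets, d_text, d_image = 400, 768, 512  # illustrative sizes

# Placeholder vectors standing in for real per-modality embeddings,
# e.g. transformer text embeddings and CNN image features.
X_text = rng.normal(size=(n_tweets, d_text))
X_image = rng.normal(size=(n_tweets, d_image))
y = rng.integers(0, 2, size=n_tweets)  # binary message-type label

# Early fusion: concatenate the per-modality embeddings into one feature vector
X_multimodal = np.hstack([X_text, X_image])
print(X_multimodal.shape)  # (400, 1280)

# Train a shallow classifier on the fused representation
X_tr, X_te, y_tr, y_te = train_test_split(
    X_multimodal, y, test_size=0.25, random_state=42
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
predictions = clf.predict(X_te)
```

In practice the quality of such a model hinges on the upstream encoders; as the abstract notes, a classifier will not automatically pick up multimodal signals just because both modalities are present in the feature vector.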
Scripts can be found here: https://github.com/SinaOzdemir/ifkw_mmDL
classification, twitter data, political communication, european union, multimodal embeddings
