
This dataset provides utterance-level annotations for multimodal sentiment analysis derived from publicly available YouTube videos. Each data instance corresponds to a single utterance and aligns three modalities: the transcribed text, the corresponding audio segment, and visual information extracted as video keyframes. Sentiment labels are manually annotated at the utterance level to capture fine-grained affective expressions within conversational contexts. The dataset is designed to support research in multimodal learning, affective computing, and large language model (LLM)-based sentiment analysis. It can be used for benchmarking sentiment classification models, evaluating multimodal fusion strategies, and exploring zero-shot or fine-tuning approaches with vision–language and audio–text models. All data are provided for research and educational purposes only.
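
The description above does not specify a file format or field schema, so the sketch below illustrates one plausible per-utterance record layout in Python. All field names (`utterance_id`, `audio_path`, `keyframe_paths`, etc.) and the example values are assumptions made for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UtteranceInstance:
    """One utterance-level record with aligned multimodal fields.

    Field names are illustrative; the dataset's actual schema may differ.
    """
    utterance_id: str          # unique identifier for the utterance
    video_id: str              # identifier of the source YouTube video
    transcript: str            # transcribed text of the utterance
    audio_path: str            # path to the utterance's audio segment
    keyframe_paths: List[str]  # paths to the extracted video keyframes
    sentiment: str             # manually annotated utterance-level label

# Example instance with placeholder values (hypothetical paths and label).
example = UtteranceInstance(
    utterance_id="vid001_utt03",
    video_id="vid001",
    transcript="I really enjoyed this part of the talk.",
    audio_path="audio/vid001_utt03.wav",
    keyframe_paths=["frames/vid001_utt03_f0.jpg", "frames/vid001_utt03_f1.jpg"],
    sentiment="positive",
)
print(example.sentiment)
```

Storing the keyframes as a list keeps the visual modality variable-length per utterance, which suits keyframe extraction where the number of frames depends on utterance duration.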
multimodal sentiment analysis, video sentiment, utterance-level annotation, large language models, affective computing
