
The vast volume of chat messages generated during esports events on Twitch is a valuable source of data for understanding audience behavior. However, the sheer quantity and dynamic nature of this data make manual analysis impractical. This study addresses the challenge by introducing FinTwitchBERT, a model fine-tuned to classify Twitch chat messages into four categories based on their uniqueness: original content, repetitive messages such as emote spamming, formulaic messages, and interactive commands that chat participants use to interact with channel bots. FinTwitchBERT was pre-trained on over 18 million Finnish Twitch chat messages and trained with a combination of semi-supervised learning and iterative pseudo-labeling with human-in-the-loop validation. Starting from a limited initial dataset of only 7,529 manually annotated messages, it achieves 97.42% accuracy on a test set of unseen chat messages.
peerReviewed
Contemporary Culture; chat; esports; online discussion; text mining; Data Analytics, Data Mining, and Machine Learning for Social Media; social media; game culture; Computational Science; Twitch; machine learning; online chat; natural language processing; social media analysis
